Download a pre-trained model
Warning
This command is partially implemented.
usage: radon-defect-predictor download-model [-h] {ansible,tosca} {github,gitlab} repository
positional arguments:
{ansible,tosca} the language the model is trained on
{github,gitlab} the platform the user's repository is hosted to
repository the user's remote repository in the form <namespace>/<repository> (e.g., radon-h2020/radon-defect-prediction-cli)
optional arguments:
-h, --help show this help message and exit
Warning
It is important to set up the following variables in your environment:
-
GITHUB_ACCESS_TOKEN=<paste your token here>
for Github, and/or -
GITLAB_ACCESS_TOKEN=<paste your token here>
for Gitlab. -
TMP_REPOSITORIES_DIR=/tmp/
if not using the Docker image. It is the directory where the tool clones the repository to extract the information to get the best matching model.
language {ansible, tosca}
Every models are trained for a specific language.
To download the proper model the user must specify the language on which they want to use the model.
Ansible and TOSCA are currently supported.
If the project contains both Ansible and Tosca files, the user can download two models by running the command twice,
passing the option ansible
and tosca
, respectively.
host {github, gitlab}
The hosting platform for software development and version control using Git. Github and Gitlab are supported.
This option is required to use the appropriate APIs (pygithub
or
python-gitlab
) to compute some of the aforementioned criteria.
repository
The user's repository full name namespace/repository (e.g., radon-h2020/radon-defect-prediction-cli).
It is necessary to select the appropriate model for the repository at hand.
Indeed, the downloaded model is the model trained on the most similar repository based on the following criteria:
- Core contributors: the number of contributors whose total number of commits accounts for 80% or more of the total contributions.
- Continuous integration (CI): the repository has evidence of a CI service, determined by the presence of a configuration file required by that service (e.g., a.travis.ymlfor TravisCI).
- Comments ratio: ratio between comments and lines of code.
- Commit frequency: the average number of commits per month.
- Issue frequency: the average number of issue events transpired per month.
- License availability: the repository has evidence of a license (i.e., a LICENSE file).
- Lines of Code: the number of executable lines of code.
- Ratio of IaC scripts: ratio between Infrastructure-as-Code (IaC) files and total files.
The value of each criterion is automatically extracted by the radon-repository-scorer
this tool depends on.
Examples
Ansible
Let's assume the user wants to download an Ansible model suitable for ANXS/postgresql.
For the sake of the example, let's create and move to a working directory:
mkdir radon-wd-ansible && cd radon-wd-ansible
The user can now get an Ansible model by running:
radon-defect-predictor download-model ansible github ANXS/postgresql
The model is saved in the current working directory:
ls
radondp_model.joblib
The model can be used later for predictions.
Tosca
Let's assume the user wants to download a TOSCA model suitable for UoW-CPC/COLARepo.
For the sake of the example, let's create and move to a working directory:
mkdir radon-wd-tosca && cd radon-wd-tosca
The user can now get an Ansible model by running:
radon-defect-predictor download-model tosca github UoW-CPC/COLARepo
The model is saved in the current working directory:
ls
radondp_model.joblib
The model can be used later for predictions.