Mining using Command-Line
usage: repo-miner mine [-h] [--branch BRANCH] [--exclude-commits EXCLUDE_COMMITS] [--exclude-files EXCLUDE_FILES] [--verbose] {fixing-commits,fixed-files,failure-prone-files} {github,gitlab} {ansible,tosca} repository dest
positional arguments:
  {fixing-commits,fixed-files,failure-prone-files}
                        the information to mine
  {github,gitlab}       the source code versioning host
  {ansible,tosca}       mine only commits modifying files of this language
  repository            the repository full name: <owner/name> (e.g., radon-h2020/radon-repository-miner)
  dest                  destination folder for the reports
optional arguments:
  -h, --help            show this help message and exit
  -b, --branch BRANCH   the repository branch to mine (default: master)
  --exclude-commits EXCLUDE_COMMITS
                        the path to a JSON file containing the list of commit hashes to exclude
  --include-commits INCLUDE_COMMITS
                        the path to a JSON file containing the list of commit hashes to include
  --exclude-files EXCLUDE_FILES
                        the path to a JSON file containing the list of FixedFiles to exclude
  --verbose             show log
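The --exclude-commits option takes the path to a JSON file listing commit hashes. The sketch below shows one way such a file might be prepared. It assumes the file is a flat JSON array of hash strings (the help text only says "list of commit hashes"); the hashes, the file name excluded-commits.json, and the helper mine_without_excluded are placeholders introduced for illustration.

```shell
#!/bin/sh
# Write a JSON file listing the commit hashes to exclude.
# ASSUMPTION: the file is a flat JSON array of strings.
# The hashes below are placeholders, not real commits.
cat > excluded-commits.json <<'EOF'
[
  "0000000000000000000000000000000000000001",
  "0000000000000000000000000000000000000002"
]
EOF

# Hypothetical helper: mine fixing-commits while skipping the hashes above.
#   $1: repository full name (<owner/name>)
#   $2: destination folder for the reports
mine_without_excluded() {
  repo-miner mine fixing-commits github ansible "$1" "$2" \
    --exclude-commits excluded-commits.json --verbose
}
```

With the access token exported, the helper could be called as mine_without_excluded adriagalin/ansible.motd . to write the reports to the current directory.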
Note
Running this command will generate the following report files:
dest/fixing-commits.json, containing the list of fixing-commit hashes;
dest/fixed-files.json, containing the list of FixedFile objects (if mined fixed-files or failure-prone-files);
dest/failure-prone-files.json, containing the list of FailureProne objects (if mined failure-prone-files).
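Since the set of report files depends on what was mined, a quick check of the destination folder can confirm which ones were produced. This is a minimal sketch; the function name check_reports is hypothetical, while the file names are the three listed in the note.

```shell
#!/bin/sh
# Report which of the three report files exist in a destination folder.
#   $1: destination folder (defaults to the current directory)
check_reports() {
  dest="${1:-.}"
  for name in fixing-commits.json fixed-files.json failure-prone-files.json; do
    if [ -f "$dest/$name" ]; then
      echo "found:   $name"
    else
      echo "missing: $name"
    fi
  done
}
```

For example, check_reports /tmp/repo-miner would list the state of each report in that folder. A "missing" line is expected for reports that the chosen mining target does not generate.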
Warning
To use this command properly, you MUST add the following to your environment variables:
GITHUB_ACCESS_TOKEN=<paste your token here> if you are using the github argument. See how to create a personal access token.
GITLAB_ACCESS_TOKEN=<paste your token here> if you are using the gitlab argument. See how to create a personal access token.
TMP_REPOSITORIES_DIR=<path/to/tmp/repositories/> to temporarily clone the remote repository for analysis. Note that the repository is cloned into this folder but not deleted afterwards; removing it is left to the user, when and if needed. This variable is not needed if using the Docker image.
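A missing variable is easy to overlook, so it may help to check the environment before mining. The following is a sketch only: check_env is a hypothetical helper, not part of repo-miner, and it applies to a local run (with the Docker image, TMP_REPOSITORIES_DIR is not required).

```shell
#!/bin/sh
# Verify that the variables required for the chosen host are set.
#   $1: github or gitlab
# Returns 0 if everything is set, 1 if something is missing, 2 on a bad host.
check_env() {
  case "$1" in
    github) token="${GITHUB_ACCESS_TOKEN:-}"; token_name=GITHUB_ACCESS_TOKEN ;;
    gitlab) token="${GITLAB_ACCESS_TOKEN:-}"; token_name=GITLAB_ACCESS_TOKEN ;;
    *) echo "unknown host: $1" >&2; return 2 ;;
  esac
  status=0
  if [ -z "$token" ]; then
    echo "$token_name is not set" >&2
    status=1
  fi
  if [ -z "${TMP_REPOSITORIES_DIR:-}" ]; then
    echo "TMP_REPOSITORIES_DIR is not set" >&2
    status=1
  elif [ ! -d "$TMP_REPOSITORIES_DIR" ]; then
    echo "TMP_REPOSITORIES_DIR is not a directory: $TMP_REPOSITORIES_DIR" >&2
    status=1
  fi
  return "$status"
}
```

A typical use would be to chain it with the miner, for instance check_env github && repo-miner mine fixing-commits github ansible adriagalin/ansible.motd . so that mining starts only when the environment is complete.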
Examples
Using Docker
Set up environment variables
export GITHUB_ACCESS_TOKEN=***************
export GITLAB_ACCESS_TOKEN=***************
Pull the Docker image
docker pull radonconsortium/repo-miner:latest
Create a folder to share results
mkdir /tmp/repo-miner
Mine
(using github)
docker run -v /tmp/repo-miner:/app -e GITHUB_ACCESS_TOKEN=$GITHUB_ACCESS_TOKEN radonconsortium/repo-miner:latest repo-miner mine failure-prone-files github ansible adriagalin/ansible.motd . --verbose
(using gitlab; replace <owner/name> with the full name of a GitLab repository)
docker run -v /tmp/repo-miner:/app -e GITLAB_ACCESS_TOKEN=$GITLAB_ACCESS_TOKEN radonconsortium/repo-miner:latest repo-miner mine failure-prone-files gitlab ansible <owner/name> . --verbose
Access reports
ls /tmp/repo-miner
On local machine
Set up environment variables
export GITHUB_ACCESS_TOKEN=*****
export GITLAB_ACCESS_TOKEN=*****
export TMP_REPOSITORIES_DIR=/tmp/
Create a working directory and move there
mkdir radon-example && cd radon-example
(Optional) Create a virtualenv to avoid affecting the original environment
sudo apt install python3-venv
python3 -m venv repo-miner-env
source repo-miner-env/bin/activate
Install the package
pip install repository-miner
Mine
repo-miner mine failure-prone-files github ansible adriagalin/ansible.motd . --verbose
Access reports
ls .
(Recall the working directory is radon-example)
Either way, you’ll get output similar to the following:
Mining adriagalin/ansible.motd [started at: 15:29]
Identifying fixing-commits from closed issues related to bugs
Identifying fixing-commits from commit messages
Saving fixing-commits
JSON created at ./fixing-commits.json
Identifying ansible files modified in fixing-commits
Saving fixed-files
JSON created at ./fixed-files.json
Identifying and labeling failure-prone files
Saving failure-prone files
JSON created at ./failure-prone-files.json
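Once the run finishes, a quick sanity check is to count how many fixing-commits were found. The sketch below assumes that fixing-commits.json stores each hash as a full 40-character hexadecimal string, which the output above does not confirm; count_fixing_commits is a hypothetical helper.

```shell
#!/bin/sh
# Count distinct commit hashes in a fixing-commits report.
# ASSUMPTION: hashes are full 40-character lowercase hexadecimal strings.
#   $1: path to the report (defaults to ./fixing-commits.json)
count_fixing_commits() {
  grep -oE '[0-9a-f]{40}' "${1:-./fixing-commits.json}" | sort -u | wc -l | tr -d ' '
}
```

Calling count_fixing_commits ./fixing-commits.json prints a single number. Matching on the hash pattern rather than parsing the JSON keeps the check free of extra tools, at the cost of depending on the assumed hash format.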