Miners¶
Base Miner¶
-
class
repominer.mining.base.
BaseMiner
(url_to_repo: str, branch: str = 'master')¶ This is the base class to mine a software repository.
It allows for mining bug-fixing commits, fixed files, and bug-introducing commits. It can be extended to instantiate the miner in the context of specific languages (e.g., Ansible and Tosca).
-
__init__
(url_to_repo: str, branch: str = 'master')¶ The class constructor. Initialize a new BaseMiner.
- Parameters
url_to_repo (str) – the URL of a remote GitHub or GitLab repository
branch (str) – the branch to analyze. Defaults to ‘master’
-
host
¶ Source code versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
repository
¶ Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
branch
¶ Repository’s branch to analyze.
- Type
str
-
commit_hashes
¶ List of commit hashes on the repository’s branch, ordered by creation date.
- Type
List[str]
-
exclude_commits
¶ Set of commit hashes to exclude from mining.
Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Hashes listed here are skipped when mining bug-fixing commits
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}
- Type
Set[str]
-
exclude_fixed_files
¶ Set of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.
When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.
If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:
Example
from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Each entry pairs a file path with the fixing commit (fic) in which to ignore it
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]
- Type
List[str]
-
fixing_commits
¶ List of bug-fixing commit hashes.
Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages. However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Commits already known to fix bugs
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']
This is useful when you have to run the miner again on future commits and you already have results from past runs.
- Type
List[str]
-
fixed_files
¶ List of FixedFile objects. Fixed files are files modified in bug-fixing commits.
They are identified by the method get_fixed_files. Unlike fixing_commits, it cannot be used to include fixed files in advance, as it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.
- Type
List[FixedFile]
-
discard_undesired_fixing_commits
(commits: List[str]) → None¶ Discard undesired commits.
Given a list of commit hashes, this method discards those that are deemed undesired. Which commits are undesired depends on the problem being formulated. For example, if the user is mining fixing commits for Ansible, an undesired commit might be one that modifies only non-Ansible files.
Note that the update occurs in place. That is, the original list is updated.
- Parameters
commits (List[str]) – List of commit hashes
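The in-place behaviour described above can be sketched with plain Python lists. This is an illustration only, not the library's implementation: the extra parameters modified_files (a mapping from commit hash to modified paths) and is_desired_file (a predicate on a path) are hypothetical, as are the short hashes and the extension check used as a stand-in for "Ansible file".

```python
from typing import Callable, Dict, List


def discard_undesired_fixing_commits(commits: List[str],
                                     modified_files: Dict[str, List[str]],
                                     is_desired_file: Callable[[str], bool]) -> None:
    """Keep only commits that modify at least one desired file.

    Slice assignment rewrites the list object the caller passed in,
    so the update is visible to the caller without returning anything.
    """
    commits[:] = [
        sha for sha in commits
        if any(is_desired_file(path) for path in modified_files.get(sha, []))
    ]


# Hypothetical data: short hashes and paths invented for the example.
fixing_commits = ['a1', 'b2', 'c3']
changes = {
    'a1': ['README.md'],
    'b2': ['roles/web/tasks/main.yml', 'README.md'],
    'c3': ['setup.py'],
}
# Placeholder predicate; the real miners may detect files differently.
discard_undesired_fixing_commits(
    fixing_commits, changes, lambda path: path.endswith(('.yml', '.yaml')))
print(fixing_commits)
```

Because the list is modified rather than replaced, any other reference to the same list sees the filtered result as well.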
-
get_fixed_files
() → List[repominer.files.FixedFile]¶ Return a list of FixedFile objects.
A FixedFile is a file modified in a bug-fixing commit. It consists of a filename, the hash of the commit that fixed it, and the hash of the commit that introduced the bug.
The method uses the SZZ algorithm implemented in PyDriller to identify the oldest commit that introduced the bug, referred to as the bug-introducing commit.
Note: before calling this method, you must run at least one of get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages.
- Returns
List of FixedFile objects
- Return type
List[FixedFile]
-
get_fixing_commits_from_closed_issues
(labels: Set[str] = None) → List[str]¶ Return a list of bug-fixing commit hashes.
This method returns the commits linked to closed issues that are related to bugs (i.e., with labels such as bug, bugfix, etc.). GitHub and GitLab issue trackers link commits to the corresponding issue reports, along with labels that are used to organize issues. The search for bug-related issues is based on the following default labels:
'bug', 'Bug', 'bug :bug:', 'Bug - Medium', 'Bug - Low', 'Bug - Critical', 'ansible_bug',
'Type: Bug', 'Type: bug', 'Type/Bug', 'type: bug 🐛', 'type:bug', 'type: bug', 'type/bug',
'kind/bug', 'kind/bugs', 'bug/bugfix', 'bugfix', 'critical-bug', '01 type: bug', 'bug_report',
'minor-bug'
However, the user can specify different labels.
- Parameters
labels (Set[str]) – Set of bug-related labels (e.g., bug, bugfix, type: bug). If none is passed, the default labels are used.
- Returns
The list of bug-fixing commit hashes
- Return type
List[str]
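The label-based selection described above can be illustrated without contacting any issue tracker. The sketch below is an assumption-laden stand-in: the issue dictionaries (keys state, labels, commits), the short hashes, and the function name are all hypothetical, and only a subset of the default labels is reproduced.

```python
from typing import Dict, Iterable, List, Optional, Set

# A subset of the default labels listed above, kept short for readability.
DEFAULT_LABELS = {'bug', 'Bug', 'bugfix', 'kind/bug', 'type: bug', 'Type: Bug'}


def fixing_commits_from_issues(issues: Iterable[Dict],
                               labels: Optional[Set[str]] = None) -> List[str]:
    """Collect commits linked to closed issues carrying a bug-related label."""
    wanted = labels if labels else DEFAULT_LABELS
    commits: List[str] = []
    for issue in issues:
        if issue['state'] != 'closed':
            continue  # only closed issues are considered
        if wanted.isdisjoint(issue['labels']):
            continue  # no bug-related label on this issue
        for sha in issue['commits']:
            if sha not in commits:  # avoid duplicates, keep first-seen order
                commits.append(sha)
    return commits


# Hypothetical issue records, shaped for this example only.
issues = [
    {'state': 'closed', 'labels': ['bug'], 'commits': ['a1', 'b2']},
    {'state': 'open', 'labels': ['bug'], 'commits': ['c3']},
    {'state': 'closed', 'labels': ['enhancement'], 'commits': ['d4']},
    {'state': 'closed', 'labels': ['kind/bug', 'docs'], 'commits': ['b2', 'e5']},
]
print(fixing_commits_from_issues(issues))
```

Passing a custom set, for example {'enhancement'}, replaces the defaults entirely, which mirrors the behaviour of the labels parameter described above.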
-
get_fixing_commits_from_commit_messages
(regex: str = None) → List[str]¶ Return a list of bug-fixing commit hashes.
This method returns the commits whose message indicates defective scripts. Specifically, when analyzing the commit messages, it first removes all words ending with bug or fix (apart from bugfix), since those terms can be affixes of other words, such as “debug” and “prefix”. A commit message is tagged as bug-fixing if it matches the following regular expression:
(bug|fix|error|crash|problem|fail|defect|patch)
However, the user can specify a different regex.
Note: Besides returning the list of bug-fixing commits, it also updates the attribute fixing_commits.
- Parameters
regex (str) – A regular expression to match against commit messages to identify bug-fixing commits. If none is passed, the default regex is used.
- Returns
The list of bug-fixing commit hashes
- Return type
List[str]
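The two-step message check described above (drop words that merely end in bug or fix, then match the regex) can be sketched as follows. This is one possible reading, not the library's code: the helper names, the case-insensitive matching, and the exact pattern used to strip affixed words are assumptions.

```python
import re
from typing import Dict, List

# Default pattern quoted in the text above.
DEFAULT_REGEX = r'(bug|fix|error|crash|problem|fail|defect|patch)'

# Assumption: a "word ending with bug or fix" has at least one character
# before the suffix (e.g. "debug", "prefix"); "bugfix" itself is spared.
AFFIXED_WORD = re.compile(r'\b(?!bugfix\b)\w+(?:bug|fix)\b', re.IGNORECASE)


def is_fixing_message(message: str, regex: str = DEFAULT_REGEX) -> bool:
    """Return True if the cleaned message matches the bug-fixing regex."""
    cleaned = AFFIXED_WORD.sub(' ', message)
    # Assumption: matching ignores case.
    return re.search(regex, cleaned, re.IGNORECASE) is not None


def fixing_commits_from_messages(messages: Dict[str, str],
                                 regex: str = DEFAULT_REGEX) -> List[str]:
    """Return the hashes whose message is tagged as bug-fixing."""
    return [sha for sha, msg in messages.items() if is_fixing_message(msg, regex)]


# Hypothetical commit messages keyed by made-up short hashes.
messages = {
    'a1': 'Fix crash when parsing roles',
    'b2': 'Add debug logging',
    'c3': 'bugfix: handle empty playbook',
    'd4': 'Rename prefix option',
}
print(fixing_commits_from_messages(messages))
```

Without the first step, "Add debug logging" and "Rename prefix option" would match the default regex through the substrings "bug" and "fix", which is the false positive the cleaning is meant to avoid.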
-
ignore_file
(path_to_file: str, content: str = None) → bool¶ Ignore a file.
When looking for fixed files in get_fixed_files, you might want to consider only files with some characteristics and ignore all the others. For example, when instantiating a ToscaMiner, this method ignores all the non-TOSCA files, based on their filepath and content. That is, only files ending with .yml, .yaml, or .tosca, or whose content contains the keyword tosca_definitions_version, are kept.
- Parameters
path_to_file (str) – The filepath (e.g., repominer/mining/base.py).
content (str) – The file content.
- Returns
True if the file must be ignored. False, otherwise.
- Return type
bool
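The TOSCA example above can be sketched as a standalone predicate. It takes the description literally (keep a file if its extension matches or its content contains the keyword); the actual ToscaMiner may combine the two conditions differently, and the constant names are introduced here for illustration.

```python
from typing import Optional

# Values taken from the description above; the constant names are hypothetical.
TOSCA_EXTENSIONS = ('.yml', '.yaml', '.tosca')
TOSCA_KEYWORD = 'tosca_definitions_version'


def ignore_file(path_to_file: str, content: Optional[str] = None) -> bool:
    """Return True if the file should be ignored, False if it should be kept."""
    has_tosca_extension = path_to_file.endswith(TOSCA_EXTENSIONS)
    has_tosca_keyword = content is not None and TOSCA_KEYWORD in content
    return not (has_tosca_extension or has_tosca_keyword)


print(ignore_file('templates/service.tosca'))
print(ignore_file('repominer/mining/base.py', 'import os'))
```

The return value follows the convention of the method documented above: True means "ignore this file".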
-
label
() → Generator[repominer.files.FailureProneFile, None, None]¶ For each FixedFile object, yield a FailureProneFile object for each commit between the FixedFile’s bug-introducing commit and its fixing commit.
Note: make sure to run the method get_fixed_files first.
- Yields
FailureProneFile – A FailureProneFile object.
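The labeling step can be pictured as walking the commit history between two hashes. The sketch below uses stand-in dataclasses rather than the classes in repominer.files; the fields of FailureProneFileSketch are hypothetical, and so is the choice to include the bug-introducing commit and exclude the fixing commit, since the description above does not state whether the bounds are inclusive.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FixedFileSketch:
    """Stand-in for FixedFile, using the field names shown in the examples above."""
    filepath: str
    fic: str  # fixing commit
    bic: str  # bug-introducing commit


@dataclass
class FailureProneFileSketch:
    """Hypothetical stand-in; the real FailureProneFile fields may differ."""
    filepath: str
    commit: str
    fixing_commit: str


def label(fixed_files: List[FixedFileSketch],
          commit_hashes: List[str]) -> Iterator[FailureProneFileSketch]:
    """Yield one record per commit from the bic up to, but not including, the fic."""
    position = {sha: index for index, sha in enumerate(commit_hashes)}
    for fixed in fixed_files:
        start, end = position[fixed.bic], position[fixed.fic]
        # Assumption: bic is included, fic is excluded.
        for sha in commit_hashes[start:end]:
            yield FailureProneFileSketch(fixed.filepath, sha, fixed.fic)


# Hypothetical history, ordered by creation date like commit_hashes.
history = ['c0', 'c1', 'c2', 'c3', 'c4']
fixed = [FixedFileSketch(filepath='roles/web/tasks/main.yml', fic='c3', bic='c1')]
for record in label(fixed, history):
    print(record.commit, record.filepath)
```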
-
sort_commits
(commits: List[str]) → None¶ Sort a list of commits in chronological order.
- Parameters
commits (List[str]) – List of commit hashes to sort.
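A minimal sketch of an in-place chronological sort, assuming the reference order comes from a list such as commit_hashes (documented above as ordered by creation date). The second parameter is an addition for this example; the real method takes only the list to sort.

```python
from typing import List


def sort_commits(commits: List[str], commit_hashes: List[str]) -> None:
    """Sort `commits` in place by their position in `commit_hashes`."""
    position = {sha: index for index, sha in enumerate(commit_hashes)}
    # list.sort reorders the existing list object and returns None
    commits.sort(key=position.__getitem__)


# Hypothetical short hashes standing in for full commit hashes.
history = ['c0', 'c1', 'c2', 'c3']
to_sort = ['c3', 'c0', 'c2']
sort_commits(to_sort, history)
print(to_sort)
```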
-
Ansible Miner¶
-
class
repominer.mining.ansible.
AnsibleMiner
(url_to_repo: str, branch: str = 'master')¶ Bases:
repominer.mining.base.BaseMiner
This class extends BaseMiner to mine Ansible-based repositories
-
__init__
(url_to_repo: str, branch: str = 'master')¶ The class constructor. Initialize a new BaseMiner.
- Parameters
url_to_repo (str) – the URL of a remote GitHub or GitLab repository
branch (str) – the branch to analyze. Defaults to ‘master’
-
host
¶ Source code versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
repository
¶ Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
branch
¶ Repository’s branch to analyze.
- Type
str
-
commit_hashes
¶ List of commit hashes on the repository’s branch, ordered by creation date.
- Type
List[str]
-
exclude_commits
¶ Set of commit hashes to exclude from mining.
Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Hashes listed here are skipped when mining bug-fixing commits
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}
- Type
Set[str]
-
exclude_fixed_files
¶ Set of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.
When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.
If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:
Example
from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Each entry pairs a file path with the fixing commit (fic) in which to ignore it
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]
- Type
List[str]
-
fixing_commits
¶ List of bug-fixing commit hashes.
Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages. However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Commits already known to fix bugs
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']
This is useful when you have to run the miner again on future commits and you already have results from past runs.
- Type
List[str]
-
fixed_files
¶ List of FixedFile objects. Fixed files are files modified in bug-fixing commits.
They are identified by the method get_fixed_files. Unlike fixing_commits, it cannot be used to include fixed files in advance, as it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.
- Type
List[FixedFile]
-
discard_undesired_fixing_commits
(commits: List[str])¶ Given a list of commits, discard commits that do not modify at least one Ansible file.
Note that the update occurs in place. That is, the original list is updated.
- Parameters
commits (List[str]) – List of commit hashes
-
ignore_file
(path_to_file: str, content: str = None)¶ Ignore non-Ansible files.
- Parameters
path_to_file (str) – The filepath (e.g., repominer/mining/base.py).
content (str) – The file content.
- Returns
True if the file is not an Ansible file, and must be ignored. False, otherwise.
- Return type
bool
-
Tosca Miner¶
-
class
repominer.mining.tosca.
ToscaMiner
(url_to_repo: str, branch: str = 'master')¶ Bases:
repominer.mining.base.BaseMiner
This class extends the BaseMiner to mine TOSCA-based repositories
-
__init__
(url_to_repo: str, branch: str = 'master')¶ The class constructor. Initialize a new BaseMiner.
- Parameters
url_to_repo (str) – the URL of a remote GitHub or GitLab repository
branch (str) – the branch to analyze. Defaults to ‘master’
-
host
¶ Source code versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
repository
¶ Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from the parameter url_to_repo.
- Type
str
-
branch
¶ Repository’s branch to analyze.
- Type
str
-
commit_hashes
¶ List of commit hashes on the repository’s branch, ordered by creation date.
- Type
List[str]
-
exclude_commits
¶ Set of commit hashes to exclude from mining.
Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Hashes listed here are skipped when mining bug-fixing commits
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}
- Type
Set[str]
-
exclude_fixed_files
¶ Set of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.
When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.
If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:
Example
from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Each entry pairs a file path with the fixing commit (fic) in which to ignore it
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]
- Type
List[str]
-
fixing_commits
¶ List of bug-fixing commit hashes.
Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages. However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining as follows:
Example
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
# Commits already known to fix bugs
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']
This is useful when you have to run the miner again on future commits and you already have results from past runs.
- Type
List[str]
-
fixed_files
¶ List of FixedFile objects. Fixed files are files modified in bug-fixing commits.
They are identified by the method get_fixed_files. Unlike fixing_commits, it cannot be used to include fixed files in advance, as it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.
- Type
List[FixedFile]
-
discard_undesired_fixing_commits
(commits: List[str])¶ Given a list of commits, discard commits that do not modify at least one TOSCA file.
Note that the update occurs in place. That is, the original list is updated.
- Parameters
commits (List[str]) – List of commit hashes
-
ignore_file
(path_to_file: str, content: str = None)¶ Ignore non-TOSCA files.
- Parameters
path_to_file (str) – The filepath (e.g., repominer/mining/base.py).
content (str) – The file content.
- Returns
True if the file is not a TOSCA file, and must be ignored. False, otherwise.
- Return type
bool
-