Miners

Base Miner

class repominer.mining.base.BaseMiner(url_to_repo: str, branch: str = 'master')

This is the base class for mining software repositories.

It allows for mining bug-fixing commits, fixed files, and bug-introducing commits. It can be extended to instantiate the miner in the context of specific languages (e.g., Ansible and Tosca).

__init__(url_to_repo: str, branch: str = 'master')

The class constructor. Initialize a new BaseMiner.

Parameters
  • url_to_repo (str) – the URL of a remote GitHub or GitLab repository

  • branch (str) – the branch to analyze. Default ‘master’

host

Source Code Versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from parameter url_to_repo.

Type

str

repository

Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from parameter url_to_repo.

Type

str
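The host and repository attributes are both derived from url_to_repo. A minimal sketch of how such an extraction could work is shown below; split_repo_url is a hypothetical helper written for illustration, not the library's own code, and it assumes the host can be recognized from the URL's domain.

```python
from urllib.parse import urlparse


def split_repo_url(url_to_repo: str) -> tuple:
    """Return (host, repository) for a GitHub or GitLab URL (illustrative helper)."""
    parsed = urlparse(url_to_repo)
    netloc = parsed.netloc.lower()

    # Assumption: the host is identified by a substring of the domain name.
    if 'github' in netloc:
        host = 'github'
    elif 'gitlab' in netloc:
        host = 'gitlab'
    else:
        raise ValueError(f'Unsupported host: {parsed.netloc}')

    # Drop surrounding slashes and an optional ".git" suffix.
    repository = parsed.path.strip('/')
    if repository.endswith('.git'):
        repository = repository[:-len('.git')]
    return host, repository


host, repository = split_repo_url('https://github.com/radon-h2020/radon-repository-miner')
print(host, repository)
```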

branch

Repository’s branch to analyze.

Type

str

commit_hashes

List of commit hashes on the repository’s branch, ordered by creation date.

Type

List[str]

exclude_commits

Set of commit hashes to exclude from mining.

Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}

Type

Set[str]

exclude_fixed_files

List of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.

When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.

If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:

Example

from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]

Type

List[FixedFile]

fixing_commits

List of bug-fixing commit hashes.

Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages.

However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']

This is useful when you re-run the miner on newer commits and already have results from previous runs.

Type

List[str]

fixed_files

List of FixedFile objects. Fixed files are files modified in bug-fixing commits.

They are identified by the method get_fixed_files. Unlike fixing_commits, this attribute cannot be used to include fixed files in advance, because it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.

Type

List[FixedFile]

discard_undesired_fixing_commits(commits: List[str]) → None

Discard undesired commits.

Given a list of commit hashes, this method discards those deemed undesired. Which commits are undesired depends on the problem at hand. For example, when mining fixing commits for Ansible, an undesired commit might be one that modifies no Ansible files.

Note that the update occurs in place; that is, the original list is modified.

Parameters

commits (List[str]) – List of commit hashes
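Because the method returns None and updates its argument, the caller's list object itself changes. The sketch below illustrates that in-place behaviour with a hypothetical discard_in_place helper and a caller-supplied predicate; the real miners decide what is undesired internally.

```python
from typing import Callable, List


def discard_in_place(commits: List[str], is_undesired: Callable[[str], bool]) -> None:
    """Remove undesired hashes from `commits` without creating a new list object."""
    # Slice assignment mutates the caller's list instead of rebinding a local name.
    commits[:] = [sha for sha in commits if not is_undesired(sha)]


commits = ['aaa111', 'bbb222', 'ccc333']
alias = commits  # a second reference to the same list
discard_in_place(commits, lambda sha: sha.startswith('bbb'))
print(alias)  # the alias sees the update because the list was modified in place
```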

get_fixed_files() → List[repominer.files.FixedFile]

Return a list of FixedFile objects.

A FixedFile is a file modified in a bug-fixing commit. It consists of a file path, the hash of the commit that fixed it, and the hash of the commit that introduced the bug.

The method uses the SZZ algorithm implemented in PyDriller to identify the oldest commit that introduced the bug, referred to as the bug-introducing commit.

Note: before calling this method, you must run at least one of get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages.

Returns

List of FixedFile objects

Return type

List[FixedFile]

get_fixing_commits_from_closed_issues(labels: Set[str] = None) → List[str]

Return a list of bug-fixing commit hashes.

This method returns the commits linked to closed issues that are related to bugs (i.e., with labels such as bug, bugfix, etc.). The GitHub and GitLab issue trackers link commits to the corresponding issue reports and provide labels to organize issues. The search for bug-related issues is based on the following default labels:

'bug', 'Bug', 'bug :bug:', 'Bug - Medium', 'Bug - Low', 'Bug - Critical', 'ansible_bug', 'Type: Bug', 'Type: bug', 'Type/Bug', 'type: bug 🐛', 'type:bug', 'type: bug', 'type/bug', 'kind/bug', 'kind/bugs', 'bug/bugfix', 'bugfix', 'critical-bug', '01 type: bug', 'bug_report', 'minor-bug'

The user can, however, specify different labels.

Parameters

labels (Set[str]) – Set of bug-related labels (e.g., bug, bugfix, type: bug). If none is passed, the default labels are used.

Returns

The list of bug-fixing commit hashes

Return type

List[str]
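The label-based selection described above amounts to checking whether an issue carries at least one bug-related label. A minimal sketch follows; is_bug_issue is a hypothetical helper, and fetching issues and their linked commits from the host's API is deliberately left out.

```python
from typing import Iterable, Optional, Set

# Default labels as listed in this reference.
DEFAULT_BUG_LABELS = {
    'bug', 'Bug', 'bug :bug:', 'Bug - Medium', 'Bug - Low', 'Bug - Critical',
    'ansible_bug', 'Type: Bug', 'Type: bug', 'Type/Bug', 'type: bug 🐛',
    'type:bug', 'type: bug', 'type/bug', 'kind/bug', 'kind/bugs', 'bug/bugfix',
    'bugfix', 'critical-bug', '01 type: bug', 'bug_report', 'minor-bug',
}


def is_bug_issue(issue_labels: Iterable[str], labels: Optional[Set[str]] = None) -> bool:
    """Return True if the issue has at least one bug-related label."""
    # Fall back to the defaults when the caller passes no labels.
    wanted = labels if labels is not None else DEFAULT_BUG_LABELS
    return not wanted.isdisjoint(issue_labels)


print(is_bug_issue(['kind/bug', 'docs']))
print(is_bug_issue(['regression'], labels={'regression'}))
```

Note that the comparison in this sketch is exact and case-sensitive, which is why the default set lists several spellings of the same label.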

get_fixing_commits_from_commit_messages(regex: str = None) → List[str]

Return a list of bug-fixing commit hashes.

This method returns the commits whose messages indicate defective scripts. Specifically, when analyzing commit messages, it first removes all words ending with bug or fix (apart from bugfix), since those terms can be affixes of other words such as “debug” and “prefix”. A commit message is then tagged as bug-fixing if it matches the following regular expression:

(bug|fix|error|crash|problem|fail|defect|patch)

The user can, however, specify a different regex.

Note: besides returning the list of bug-fixing commits, this method also updates the attribute fixing_commits.

Parameters

regex (str) – A regular expression to match against commit messages to identify bug-fixing commits. If none is passed, the default regex is used.

Returns

The list of bug-fixing commit hashes

Return type

List[str]
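The two steps described above (dropping words that merely end in bug or fix, then matching the regex) can be sketched as follows. This is one possible reading of the description, not the library's implementation: looks_like_bug_fix and AFFIXED_WORD are hypothetical names, and lowercasing the message first is an assumption.

```python
import re

# Default regex as given in this reference.
DEFAULT_REGEX = r'(bug|fix|error|crash|problem|fail|defect|patch)'

# Words that end in "bug" or "fix" but are longer than those terms
# (e.g. "debug", "prefix"); "bugfix" is explicitly kept.
AFFIXED_WORD = re.compile(r'\b(?!bugfix\b)\w+(?:bug|fix)\b')


def looks_like_bug_fix(message: str, regex: str = DEFAULT_REGEX) -> bool:
    """Return True if the cleaned commit message matches the regex."""
    # Assumption: matching is done on the lowercased message.
    cleaned = AFFIXED_WORD.sub('', message.lower())
    return re.search(regex, cleaned) is not None


for msg in ('Fix crash on startup', 'bugfix for parser', 'Add debug prefix option'):
    print(msg, '->', looks_like_bug_fix(msg))
```

Without the cleaning step, a message such as "Add debug prefix option" would match on the "bug" inside "debug", which is the false positive the description aims to avoid.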

ignore_file(path_to_file: str, content: str = None) → bool

Ignore a file.

When looking for fixed files in get_fixed_files, you might want to consider only files with certain characteristics and ignore all the others. For example, when instantiating a ToscaMiner, this method ignores all non-TOSCA files, based on their file path and content. That is, only files ending with .yml, .yaml, or .tosca, or whose content contains the keyword tosca_definitions_version, are kept.

Parameters
  • path_to_file (str) – The filepath (e.g., repominer/mining/base.py).

  • content (str) – The file content.

Returns

True if the file must be ignored. False, otherwise.

Return type

bool
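As an illustration of the TOSCA rule described above, the sketch below follows the wording literally: a file is kept if its extension matches or its content contains the keyword. ignore_tosca_file is a hypothetical standalone function; the actual ToscaMiner.ignore_file may combine the two checks differently.

```python
from typing import Optional

TOSCA_EXTENSIONS = ('.yml', '.yaml', '.tosca')
TOSCA_KEYWORD = 'tosca_definitions_version'


def ignore_tosca_file(path_to_file: str, content: Optional[str] = None) -> bool:
    """Return True if the file should be ignored, False if it should be kept."""
    has_extension = path_to_file.endswith(TOSCA_EXTENSIONS)
    has_keyword = content is not None and TOSCA_KEYWORD in content
    # Assumption: either condition is enough to keep the file.
    return not (has_extension or has_keyword)


print(ignore_tosca_file('templates/service.tosca'))
print(ignore_tosca_file('README.md'))
```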

label() → Generator[repominer.files.FailureProneFile, None, None]

For each FixedFile object, yield a FailureProneFile object for each commit between the FixedFile’s bug-introducing commit and its fixing commit.

Note: make sure to run the method get_fixed_files first.

Yields

FailureProneFile – A FailureProneFile object.
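The labeling step can be pictured as walking the chronologically ordered commit_hashes from the bug-introducing commit to the fixing commit. The sketch below uses a hypothetical LabeledFile stand-in for FailureProneFile and assumes the bug-introducing commit is included while the fixing commit is excluded; the text above does not state the exact boundaries, and file renames are ignored here.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class LabeledFile:
    """Hypothetical stand-in for FailureProneFile."""
    filepath: str
    commit: str
    fixing_commit: str


def label_between(filepath: str, bic: str, fic: str,
                  commit_hashes: List[str]) -> Iterator[LabeledFile]:
    """Yield one record per commit from bic (included) up to fic (excluded)."""
    start = commit_hashes.index(bic)
    end = commit_hashes.index(fic)
    # Assumption: the file is failure-prone until the fix lands.
    for sha in commit_hashes[start:end]:
        yield LabeledFile(filepath=filepath, commit=sha, fixing_commit=fic)


history = ['c1', 'c2', 'c3', 'c4', 'c5']
for record in label_between('roles/main.yml', 'c2', 'c4', history):
    print(record)
```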

sort_commits(commits: List[str]) → None

Sort a list of commits in chronological order.

Parameters

commits (List[str]) – List of commit hashes to sort.
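Since the method returns None, the sort is applied to the list passed in. One way to sort hashes chronologically is to rank them by their position in commit_hashes, which is already ordered by creation date. The sketch below uses a hypothetical sort_in_place function and assumes every hash appears in commit_hashes (a KeyError is raised otherwise).

```python
from typing import List


def sort_in_place(commits: List[str], commit_hashes: List[str]) -> None:
    """Sort `commits` by their position in the chronologically ordered history."""
    position = {sha: index for index, sha in enumerate(commit_hashes)}
    # list.sort mutates the caller's list, matching the None return type.
    commits.sort(key=position.__getitem__)


history = ['c1', 'c2', 'c3', 'c4']
commits = ['c4', 'c1', 'c3']
sort_in_place(commits, history)
print(commits)
```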

Ansible Miner

class repominer.mining.ansible.AnsibleMiner(url_to_repo: str, branch: str = 'master')

Bases: repominer.mining.base.BaseMiner

This class extends BaseMiner to mine Ansible-based repositories.

__init__(url_to_repo: str, branch: str = 'master')

The class constructor. Initialize a new AnsibleMiner.

Parameters
  • url_to_repo (str) – the URL of a remote GitHub or GitLab repository

  • branch (str) – the branch to analyze. Default ‘master’

host

Source Code Versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from parameter url_to_repo.

Type

str

repository

Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from parameter url_to_repo.

Type

str

branch

Repository’s branch to analyze.

Type

str

commit_hashes

List of commit hashes on the repository’s branch, ordered by creation date.

Type

List[str]

exclude_commits

Set of commit hashes to exclude from mining.

Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}

Type

Set[str]

exclude_fixed_files

List of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.

When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.

If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:

Example

from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]

Type

List[FixedFile]

fixing_commits

List of bug-fixing commit hashes.

Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages.

However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']

This is useful when you re-run the miner on newer commits and already have results from previous runs.

Type

List[str]

fixed_files

List of FixedFile objects. Fixed files are files modified in bug-fixing commits.

They are identified by the method get_fixed_files. Unlike fixing_commits, this attribute cannot be used to include fixed files in advance, because it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.

Type

List[FixedFile]

discard_undesired_fixing_commits(commits: List[str])

Given a list of commits, discard commits that do not modify at least one Ansible file.

Note that the update occurs in place; that is, the original list is modified.

Parameters

commits (List[str]) – List of commit hashes

ignore_file(path_to_file: str, content: str = None)

Ignore non-Ansible files.

Parameters
  • path_to_file (str) – The filepath (e.g., repominer/mining/base.py).

  • content (str) – The file content.

Returns

True if the file is not an Ansible file, and must be ignored. False, otherwise.

Return type

bool

Tosca Miner

class repominer.mining.tosca.ToscaMiner(url_to_repo: str, branch: str = 'master')

Bases: repominer.mining.base.BaseMiner

This class extends BaseMiner to mine TOSCA-based repositories.

__init__(url_to_repo: str, branch: str = 'master')

The class constructor. Initialize a new ToscaMiner.

Parameters
  • url_to_repo (str) – the URL of a remote GitHub or GitLab repository

  • branch (str) – the branch to analyze. Default ‘master’

host

Source Code Versioning host (‘github’ or ‘gitlab’). The value is automatically extracted from parameter url_to_repo.

Type

str

repository

Repository full name (e.g., radon-h2020/radon-repository-miner). The value is automatically extracted from parameter url_to_repo.

Type

str

branch

Repository’s branch to analyze.

Type

str

commit_hashes

List of commit hashes on the repository’s branch, ordered by creation date.

Type

List[str]

exclude_commits

Set of commit hashes to exclude from mining.

Mining bug-fixing commits might lead to several false positives, i.e., commits that do not actually fix bugs. If you are certain that some commits do not fix bugs, you can specify their hashes before mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_commits = {'521515108c4fee9a4bd1147fc42936768297e3b6'}

Type

Set[str]

exclude_fixed_files

List of fixed files to exclude from mining. Fixed files are files modified in a bug-fixing commit.

When fixing a bug, several files might be modified, but not all of them contribute to fixing the bug. For example, someone could fix a bug in one file and, at the same time, modify another file not involved in the fix, such as a README.md.

If you are certain that some files in a given commit are not involved in fixing bugs, you can tell the miner to ignore them as follows:

Example

from repominer.files import FixedFile
from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.exclude_fixed_files = [
    FixedFile(filepath='CHANGELOG.md',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None),
    FixedFile(filepath='repominer/cli.py',
              fic='f350e05696db1c5f78320483e0e44e7aea410449',
              bic=None)
]

Type

List[FixedFile]

fixing_commits

List of bug-fixing commit hashes.

Bug-fixing commits are identified by the methods get_fixing_commits_from_closed_issues and get_fixing_commits_from_commit_messages.

However, if you are certain that some commits fix bugs, e.g., because of a previous manual analysis, you can specify them in advance to speed up the mining, as follows:

Example

from repominer.mining.base import BaseMiner

miner = BaseMiner('https://github.com/radon-h2020/radon-repository-miner')
miner.fixing_commits = ['f350e05696db1c5f78320483e0e44e7aea410449']

This is useful when you re-run the miner on newer commits and already have results from previous runs.

Type

List[str]

fixed_files

List of FixedFile objects. Fixed files are files modified in bug-fixing commits.

They are identified by the method get_fixed_files. Unlike fixing_commits, this attribute cannot be used to include fixed files in advance, because it is reset at every get_fixed_files call. This is due to the algorithm used to identify them.

Type

List[FixedFile]

discard_undesired_fixing_commits(commits: List[str])

Given a list of commits, discard commits that do not modify at least one TOSCA file.

Note that the update occurs in place; that is, the original list is modified.

Parameters

commits (List[str]) – List of commit hashes

ignore_file(path_to_file: str, content: str = None)

Ignore non-TOSCA files.

Parameters
  • path_to_file (str) – The filepath (e.g., repominer/mining/base.py).

  • content (str) – The file content.

Returns

True if the file is not a TOSCA file, and must be ignored. False, otherwise.

Return type

bool