A modular code-differencing and AST-extraction tool that works by mining Git commits.
This tool's approach was statically analyzed on 2023_KCSE_ACEA
ASTChangeAnalyzer is a tool that mines commit IDs from a designated URL, a local path, or a .csv file listing URLs, using the JGit API. For each commit it extracts two ASTs (abstract syntax trees) of the source code and generates change information based on the edit script. The tool is designed to collect change information at very large scale.
- Mines repositories using JGit.
- Parses source code and generates edit scripts using GumTree and LAS.
- Abstracts each generated edit script according to our own criteria: parent node type, changed node type, and edit type.
For details, see this draft: 추상 구문 트리에 기반한 코드 변화 분석.pdf ("Code Change Analysis Based on Abstract Syntax Trees").
Details of the underlying tools: GumTree (https://github.com/GumTreeDiff/gumtree), LAS (https://github.com/thwak/LAS).
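The abstraction step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the class `AbstractEditScript`, its method names, the `parent>node:edit` key layout, and the node type names in the usage note are all assumptions made for the example. It only shows the idea of reducing each edit to its parent node type, changed node type, and edit type, then grouping changes that share the same abstract script.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AbstractEditScript {
    // Reduces one edit to the three attributes kept by the abstraction.
    // The "parent>node:edit" layout is an illustrative assumption.
    static String abstractEdit(String parentType, String nodeType, String editType) {
        return parentType + ">" + nodeType + ":" + editType;
    }

    // Joins the abstracted edits of one change, in order, into a single key.
    // Each edit is a {parentType, nodeType, editType} triple.
    static String abstractScript(List<String[]> edits) {
        List<String> parts = new ArrayList<>();
        for (String[] e : edits) {
            parts.add(abstractEdit(e[0], e[1], e[2]));
        }
        return String.join(";", parts);
    }

    // Counts how many changes share each abstract key (insertion order kept).
    static Map<String, Integer> countByKey(List<List<String[]>> changes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String[]> change : changes) {
            counts.merge(abstractScript(change), 1, Integer::sum);
        }
        return counts;
    }
}
```

For instance, two changes that each insert an `IfStatement` under a `MethodDeclaration` (hypothetical node type names) would map to the same key and be counted together, regardless of the concrete identifiers involved.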
It has the following features:
- Mine commit diffs and extract source code
- Convert a source file into a language-agnostic tree format (Java, Python, C, and C++ supported)
- Compute the differences between the trees
- Visualize these differences at different abstraction levels in terms of the edit script
- Store unique change patterns to collect change data at scale
- Mine issue keys for the corresponding change
- Cluster changes based on the abstract edit script
- Required Options :
  - `-p` option : provide a local path, a URL, or a file containing a list of local paths or URLs for the repository (use an absolute path for a local path).
  - `-java`, `-python`, `c` options : choose 'java', 'python', 'c', or 'c++' ('cpp' in the command) and name the code differencing tool (default: GumTree).
  - `-gitClone` : only clones a GitHub repository, or the repositories listed in a .csv file of URLs, taken from the `-p` option (the clone path is statically set).
  - `-changeCount` : reports the total number of changes found in the path given by the `-p` option.
  - `-save` option : does three things. First, it clones (if not cloned yet), diffs, and produces one `.chg` binary file per project at the path given as its argument. Second, it writes a `Statistics.txt` file for further analysis. Third, it updates or creates an `index.csv` file that lists hash codes and project names mapped to indexes from the `.chg` files.
  - `-combine` option : combines multiple `.chg` files into one `.chg` file (not implemented yet).
  - `-sample` option : takes the absolute path of `index.csv` and provides 20 samples based on the median value.
  - `-hashcode` option : takes a hash code; used together with the `-sample` option.
  - `-issueMine` option : reads the `index.csv` file and produces a new csv file containing the issue number, if one exists, for the commits corresponding to each change.

  Example : `-p https://github.com/centic9/jgit-cookbook -java las -save`
- Dependencies :
  All required dependencies are installed internally, so no manual installation is needed.
  (Currently, the GumTree executable does not run on Windows.)
- About the `index.csv` file :
  - The first column is the hash code generated by the SHA-256 algorithm from the abstract edit script cluster.
  - The following columns hold the project name, commit ID, and file name of the changes that belong to the corresponding abstract edit script.
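
As a rough illustration of how a first-column hash code could be derived, the sketch below hashes an abstract edit script string with SHA-256 using the standard `java.security.MessageDigest` API. The class name `IndexHash`, the UTF-8 encoding, and the lowercase hexadecimal output are assumptions made for the example; the tool's actual serialization of the edit script may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IndexHash {
    // Returns the SHA-256 digest of the given abstract edit script
    // as a 64-character lowercase hex string.
    // UTF-8 input encoding and hex output are assumptions of this sketch.
    static String sha256Hex(String abstractScript) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(abstractScript.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because identical abstract edit scripts always produce the same digest, the hash can serve as a stable row key under which matching changes from different projects are collected.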
Yeawon Na, Zack CG Lee from ISEL