This is the README file for the reproduction package of the paper: "Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study".
The package contains the following artefacts:
- `data_bigvul`: contains the original and latent vulnerable functions, as well as non-vulnerable functions, from the Big-Vul dataset.
- `data_devign`: contains the original and latent vulnerable functions, as well as non-vulnerable functions, from the Devign dataset.
- `linevul`: contains the original code of the state-of-the-art LineVul vulnerability prediction model.
- `manual_analysis_rq`: contains the manual labeling and analysis of 140 samples of latent vulnerable functions from the Big-Vul and Devign datasets, as described in RQ1.
The `data_bigvul` and `data_devign` folders are large, so they are not included in the GitHub repository. Please download them from this link instead.
Before running any code, please install all the required Python packages using the following command: `pip install -r requirements.txt`
The next step is to run the scripts that generate the data splits for training, validating, and testing the models. The scripts for Big-Vul are `python extract_data_bigvul.py` and `python extract_data_bigvul_latest.py`; the scripts for Devign are `python extract_data_devign.py` and `python extract_data_devign_latest.py`. Note that the `*_latest.py` scripts split the data for the LIC scenario (using the same validation and testing sets, but different input latent vulnerable functions).
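As a quick sanity check after these scripts finish, you can inspect the generated splits. The following is only a minimal sketch: it assumes the splits are written as .csv files named `train.csv`, `valid.csv`, and `test.csv` with a binary `target` label column; these names are hypothetical, so adjust them to the actual output of the `extract_data_*` scripts.

```python
import pandas as pd

# Hypothetical file and column names: adapt them to the actual output
# produced by the extract_data_* scripts above.
for split in ["train", "valid", "test"]:
    df = pd.read_csv(f"data_bigvul/{split}.csv")
    # 'target' is assumed to be the vulnerability label (1 = vulnerable).
    print(split, len(df), "functions,", int(df["target"].sum()), "vulnerable")
```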
For RQ2 and RQ3, train and evaluate the models by running `sbatch` with one of the following scripts:
- `sbatch evaluate_linevul_bigvul.sh` (Baseline + Models with V-SZZ-based latent SVs)
- `sbatch evaluate_linevul_bigvul_latest.sh` (Models with V-SZZ-based latent SVs + Latest Introducing Commit)
- `sbatch evaluate_linevul_bigvul_predict.sh` (Models with V-SZZ-based latent SVs + Self-Training)
- `sbatch evaluate_linevul_bigvul_centroid.sh` (Models with V-SZZ-based latent SVs + Centroid-based Removal)
These scripts will generate results at both the function and line levels.
Note that after these training/evaluation scripts finish, they will generate the output folders `results_bigvul/` and `results_devign/`, containing the results (.csv files) of the RQ2 and RQ3 models for the Big-Vul and Devign datasets, respectively.
These .csv result files can then be used for analysis and comparison as described in the paper.
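For reference, the result files can be loaded and compared with a few lines of pandas. This is only a sketch: the file names and the metric column (`f1`) below are hypothetical, so adjust them to the actual .csv files produced in the output folders above.

```python
import pandas as pd

# Hypothetical file/column names: adapt to the .csv files in results_bigvul/.
baseline = pd.read_csv("results_bigvul/baseline.csv")
latent = pd.read_csv("results_bigvul/latent_vszz.csv")

# Compare a function-level metric (assumed to be in a column named 'f1')
# between the baseline model and the model trained with latent SVs.
print("baseline F1:", baseline["f1"].mean())
print("latent-SV F1:", latent["f1"].mean())
```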
The file `Statistical-test-results_RQ2_RQ3.xlsx` contains the results of the statistical tests comparing the performance of the optimal models with latent SVs against the baseline models without latent SVs, for both function-level and line-level tasks/metrics, according to Table 3 in the paper. Note that we only include the results in which using latent SVs outperformed the baseline.
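The spreadsheet already contains the computed test results. If you want to reproduce this kind of paired comparison yourself, a non-parametric Wilcoxon signed-rank test (whether this matches the exact test used in the paper is an assumption here) can be run with SciPy, e.g.:

```python
from scipy.stats import wilcoxon

# Paired per-run scores for the same metric. The values below are
# placeholders for illustration only, not results from the paper.
baseline_scores = [0.61, 0.63, 0.60, 0.62, 0.64]
latent_scores = [0.65, 0.66, 0.68, 0.63, 0.70]

stat, p_value = wilcoxon(latent_scores, baseline_scores)
print(f"Wilcoxon statistic={stat}, p-value={p_value:.4f}")
```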
The file `Low_Resource_Latent_SVs_Results.xlsx` includes the results of using a portion of the original vulnerabilities (SVs) together with latent SVs, compared with using all original SVs, as discussed in "Use of Latent SVs for Low-Resource Projects" in Section 7.1. Specifically, Devign required 50% of the original SVs + latent SVs, while Big-Vul required 70% of the original SVs + latent SVs. On average, 60% of the original SVs + latent SVs were thus required to surpass the performance of using all original SVs, as reported in the paper.
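The 60% figure is simply the average of the two dataset-specific fractions above:

```python
# Average fraction of original SVs (plus latent SVs) needed to surpass
# using all original SVs, per the per-dataset results above.
devign_fraction = 0.5   # Devign: 50% of original SVs + latent SVs
bigvul_fraction = 0.7   # Big-Vul: 70% of original SVs + latent SVs
print((devign_fraction + bigvul_fraction) / 2)  # 0.6 -> 60% on average
```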