The project investigates two research questions about code readability:
- RQ1: Do developers intentionally increase readability?
- RQ2: Can we predict whether the readability of a file will be improved?
The procedure is as follows. Data is collected from repositories, and information is extracted with Pydriller. The data is then saved to a file and analyzed using Jupyter Notebook, Pandas, SciPy, Matplotlib, and Scikit-learn. For RQ1, two word clouds are created, and a set of keywords is derived from them to identify commits in which the developer intends to increase readability.
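The keyword step for RQ1 can be sketched as follows. This is a minimal illustration, not the project's actual filter: the keyword set and the function names are hypothetical, and the real keywords would come from inspecting the word clouds.

```python
import re
from collections import Counter

# Hypothetical keyword set (an assumption for illustration only).
READABILITY_KEYWORDS = {"readability", "readable", "cleanup", "refactor", "simplify"}


def tokenize(message):
    """Lowercase a commit message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", message.lower())


def word_frequencies(messages):
    """Count token occurrences across messages (the input a word cloud needs)."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def has_readability_intent(message, keywords=READABILITY_KEYWORDS):
    """Return True if at least one token of the message is a keyword."""
    return any(token in keywords for token in tokenize(message))


messages = [
    "Improve readability of parser",
    "Fix null pointer in login",
    "Refactor and simplify loop",
]
flagged = [m for m in messages if has_readability_intent(m)]
```

Here `flagged` keeps the first and third messages. Exact token matching is the simplest choice; it misses inflected forms such as "refactoring" unless they are added to the keyword set or the tokens are stemmed first.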
The project requires Python and several data analysis tools to collect and analyze the data needed to answer the research questions.
In practice, multiple repositories are mined for code readability data, and data analysis tools are used to extract and analyze it. The extracted data is then used to answer the research questions through statistical analysis and predictive modeling.
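The mining and saving steps might look like the sketch below. It assumes PyDriller 2.x (where a commit exposes `modified_files`), a CSV output file, and a `.java` file filter; the column names, the file extension, and both function names are assumptions, since the text does not specify them.

```python
import csv

# Hypothetical column layout for the saved file.
FIELDS = ["commit", "file", "message", "source_before", "source_after"]


def modified_file_rows(repo_path, extension=".java"):
    """Yield one row per modified file that has the given extension."""
    # Imported lazily so save_rows() below works without PyDriller installed.
    from pydriller import Repository

    for commit in Repository(repo_path).traverse_commits():
        for mf in commit.modified_files:
            if not mf.filename.endswith(extension):
                continue
            yield {
                "commit": commit.hash,
                "file": mf.new_path or mf.old_path,
                "message": commit.msg,
                # Either side is None when the file was added or deleted.
                "source_before": mf.source_code_before or "",
                "source_after": mf.source_code or "",
            }


def save_rows(rows, out_path):
    """Write rows to a CSV file and return how many rows were written."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

A call such as `save_rows(modified_file_rows("path/to/repo"), "commits.csv")` would then produce a file that Pandas can load for the analysis. Keeping the source before and after each change makes it possible to run the readability tool on both versions later.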
The tools used in this project are Pydriller for mining repositories, a readability tool that measures the readability of code files, and a Random Forest model for predictive modeling.
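A Random Forest classifier for the prediction task could be set up roughly as below, assuming Scikit-learn. The feature names, the label name, and the helper functions are hypothetical; the actual features used in the project are not listed here.

```python
# Hypothetical feature and label columns (assumptions for illustration).
FEATURES = ["lines_changed", "readability_before", "num_authors"]
LABEL = "readability_improved"


def to_matrix(rows, features=FEATURES, label=LABEL):
    """Convert a list of dict rows into a feature matrix X and label vector y."""
    X = [[float(row[name]) for name in features] for row in rows]
    y = [int(row[label]) for row in rows]
    return X, y


def train_random_forest(X, y, seed=0):
    """Fit a Random Forest and return the model with its held-out accuracy."""
    # Imported lazily so to_matrix() has no third-party dependency.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

Fixing `random_state` keeps the split and the forest reproducible between notebook runs. Accuracy on a held-out split is only one possible metric; if improved and unimproved files are unbalanced, precision and recall are likely more informative.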
The algorithms used are a Random Forest model, which predicts whether readability will increase, and text preprocessing steps: text cleaning, stop word removal, and stemming.
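The three preprocessing steps can be illustrated with the standard library alone. Note the assumptions: the stop word list is a small hypothetical sample, and `naive_stem` is a deliberately crude suffix stripper standing in for a real stemmer (such as a Porter stemmer), which the text does not name.

```python
import re

# Small hypothetical stop word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "it", "on", "with"}

# Suffixes tried in order by the naive stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Lowercase the text and replace every run of non-letters with one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Clean the text, drop stop words, and stem the remaining tokens."""
    tokens = clean(text).split()
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("Improved the naming of variables!")
```

For this input, `tokens` is `["improv", "nam", "variabl"]`. Stems do not need to be dictionary words; what matters is that inflected forms of the same word map to the same token before keyword matching or counting.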
The packages used in this project are Pydriller, JupyterLab, Pandas, SciPy, Matplotlib, and Scikit-learn.