This repository contains the data and the random forest algorithm from the paper. The arXiv version of the paper is available (Caplar_Tacchella_Birrer_Quantitative-evaluation-gender.pdf). Note that the arXiv version differs from the version published in Nature Astronomy in style and in the amount of content, as Nature Astronomy asked for a more succinct version of the findings.
Data_Gender_Bias_In_Astronomy.csv
= dataset in CSV form; columns as in Tables 1A and 1B in the manuscript
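As a quick sanity check, the CSV file can be read with the Python standard library. The following is a minimal sketch, not part of the repository's own code: it assumes the file has a header row and is UTF-8 encoded, and the helper names `load_csv_rows` and `summarize` are hypothetical.

```python
import csv
from pathlib import Path


def load_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts (one per row)."""
    # Assumption: the file is UTF-8 encoded and its first line holds column names.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize(rows):
    """Return (number of rows, list of column names) for a quick inspection."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns


if __name__ == "__main__":
    csv_path = Path("Data_Gender_Bias_In_Astronomy.csv")
    if csv_path.exists():
        n_rows, columns = summarize(load_csv_rows(csv_path))
        print(f"{n_rows} rows, columns: {columns}")
```

All values are returned as strings; convert numeric columns explicitly before any analysis.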
dataCleanedWithCountryAndRankingSuperCleanedSandro
= cleaned dataset used in the analysis. To import it in Wolfram Mathematica, navigate to the directory containing the data and call Get["dataMathematicaForm"]. The columns are:
- paper id,
- name as it appears in the publication,
- full name, deduced from the whole database,
- last name,
- sex,
- year of first publication,
- number of citations,
- number of references,
- number of authors,
- institution,
- year of publication,
- journal,
- field (1-6, see Table 1 from the paper),
- number of floats in the manuscript,
- number of equations in the manuscript,
- number of inline math expressions in the manuscript,
- number of words in the manuscript,
- id of first paper by the same author
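If the cleaned dataset is exported for use outside Mathematica, the column list above can be turned into named fields. The sketch below is only an illustration: it assumes each record is a flat sequence of 18 values in the order listed, and the field names in `COLUMN_NAMES` and the helper `label_row` are hypothetical, not identifiers from the repository.

```python
# Hypothetical field names, one per column, in the order listed above.
COLUMN_NAMES = [
    "paper_id",
    "name_in_publication",
    "full_name",
    "last_name",
    "sex",
    "year_first_publication",
    "n_citations",
    "n_references",
    "n_authors",
    "institution",
    "year_publication",
    "journal",
    "field",
    "n_floats",
    "n_equations",
    "n_inline_math",
    "n_words",
    "first_paper_id",
]


def label_row(values):
    """Pair one record's values with the column names, checking the length."""
    if len(values) != len(COLUMN_NAMES):
        raise ValueError(
            f"expected {len(COLUMN_NAMES)} values, got {len(values)}"
        )
    return dict(zip(COLUMN_NAMES, values))


if __name__ == "__main__":
    # Placeholder record (integers 0..17), not real data from the dataset.
    record = label_row(list(range(len(COLUMN_NAMES))))
    print(record["n_citations"])
```

The length check guards against silently mislabeling columns if the exported records do not match the assumed layout.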
Random_Forest
= folder with the random forest algorithm. Inside this folder you can find:
Gender_Random_Forest.ipynb
= IPython routine which does the main part of the analysis
Gender_Random_Forest_Visualization.nb
= Wolfram Mathematica notebook to visualize the results
maleset, femaleset, Male_Train, Male_Test, Female
= auxiliary files from the analysis and visualization parts of the algorithm
If you have problems using or installing the code, please use the GitHub issues page or send us an email.