Hidden Figures
A pipeline for inferring gender for acknowledged individuals in scientific literature on a massive scale
An investigation into the acknowledgments sections of research articles in PubMed Central. Prior literature suggests a discrepancy between men and women in authorship and acknowledgment. Specifically, in a small sample of theoretical population genetics publications, women were observed to be more likely to appear in the acknowledgments than in the author list. We tested this observation at large scale across biomedical research articles and investigated the contributions of acknowledged individuals.
- Women are more likely to appear in the acknowledgments than the author list would suggest.
- The types of tasks for which men and women are acknowledged differ.
- The type of praise given to men and women differs (e.g., "fruitful discussion", "outstanding analysis").
- These trends change over time, reflecting more equality.
Few large-scale studies have been conducted on acknowledgments in research articles; our study is novel in size and scope. Notable previous studies:
- extracted acknowledgments sections from articles in CiteSeerX
- identified individuals and organizations
- built a network graph of acknowledged entities and authors
- extracted acknowledgments sections from articles in Web of Science
- identified acknowledged contributions
- analyzed trends in contributions by field of study
Source: PMC FTP
The PMC XML files have an <ack> tag for the Acknowledgments section.
For example, consider PMC 4959138:
<ack>
<p>
We thank Alexia Prskawetz for the fruitful discussions and remarks.
Further on, we would like to thank the referees and editors for their
valuable comments. This research was partly supported by the Austrian
Science Fund (FWF) under Grant No. P25979-N25 and is an extract out of
the Ph.D. thesis (Moser <xref ref-type="bibr" rid="CR30">2014</xref>).
</p>
</ack>
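A minimal sketch of pulling the acknowledgment text out of such a file, using only Python's standard `xml.etree.ElementTree`. The `<article><back>` wrapper around the sample and the function name `extract_acknowledgments` are illustrative assumptions, not taken from the project's code.

```python
import xml.etree.ElementTree as ET

# The <ack> element from PMC 4959138, wrapped in a minimal article
# skeleton (the wrapper is an assumption for illustration only).
SAMPLE_XML = """<article><back><ack>
<p>
We thank Alexia Prskawetz for the fruitful discussions and remarks.
Further on, we would like to thank the referees and editors for their
valuable comments. This research was partly supported by the Austrian
Science Fund (FWF) under Grant No. P25979-N25 and is an extract out of
the Ph.D. thesis (Moser <xref ref-type="bibr" rid="CR30">2014</xref>).
</p>
</ack></back></article>"""


def extract_acknowledgments(xml_text):
    """Return the plain text of every <ack> element in an article."""
    root = ET.fromstring(xml_text)
    sections = []
    for ack in root.iter("ack"):
        # itertext() also yields text nested in child tags such as <xref>;
        # split/join collapses line breaks and repeated whitespace.
        text = " ".join("".join(ack.itertext()).split())
        if text:
            sections.append(text)
    return sections


if __name__ == "__main__":
    for section in extract_acknowledgments(SAMPLE_XML):
        print(section)
```

Flattening with `itertext()` keeps inline citation text such as "2014" in place, so downstream sentence parsing sees the paragraph as a reader would.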
Sentence parsing using spaCy
Extract names and infer gender using genderize
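The gender inference step could look roughly like the sketch below. It assumes the public genderize.io endpoint (`https://api.genderize.io` with a `name` query parameter) and its `gender` and `probability` response fields. The helper names and the 0.9 probability cutoff are hypothetical; the source does not state whether a cutoff was applied.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GENDERIZE_URL = "https://api.genderize.io"


def first_name(full_name):
    """Naive first-name extraction: the first whitespace-separated token."""
    parts = full_name.split()
    return parts[0] if parts else ""


def build_query(name):
    """Build the request URL for a single first name."""
    return GENDERIZE_URL + "?" + urlencode({"name": name})


def parse_gender(payload, min_probability=0.9):
    """Return 'male' or 'female', or None when the call is too uncertain.

    min_probability is an assumed cutoff, not a value from the source.
    """
    gender = payload.get("gender")
    probability = payload.get("probability") or 0.0
    if gender is None or probability < min_probability:
        return None
    return gender


def infer_gender(full_name, min_probability=0.9):
    """Query genderize.io for a full name (performs a network request)."""
    with urlopen(build_query(first_name(full_name))) as response:
        payload = json.load(response)
    return parse_gender(payload, min_probability)
```

Names given only as initials (for example "J.S.") cannot be resolved this way, so `parse_gender` returning `None` should be treated as "unidentified" rather than dropped silently.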
Acknowledgments and, to a lesser extent, authorship are skewed toward men.
For the PubMed Central subset with acknowledgments (PMCA):
- Number of publications in PMCA with authors of identifiable gender: 312,237
- Fraction of women among authors in PMCA: 0.424
- Fraction of women among acknowledged individuals in PMCA: 0.233
- Median number of people in an acknowledgments section: 5
- Most acknowledgments sections are uni-gender: 80%
- Most of these uni-gender sections are all-male: 202,150 all-male vs. 47,105 all-female
- Publications with acknowledgments have a higher Relative Citation Ratio (RCR) than those without: 0.8 vs. 0.4
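As a sketch of how summary figures like these might be derived, the function below takes one list of inferred genders per acknowledgments section and computes the same kinds of quantities. The function name, the input layout, and the toy data are assumptions for illustration; they are not the project's data or code.

```python
from statistics import median


def summarize(ack_genders):
    """Summarize per-section lists of inferred genders ('male'/'female')."""
    # Ignore sections in which no name could be assigned a gender.
    sections = [genders for genders in ack_genders if genders]
    if not sections:
        raise ValueError("no sections with identifiable genders")
    names = [g for genders in sections for g in genders]
    uni = [genders for genders in sections if len(set(genders)) == 1]
    return {
        "fraction_women": names.count("female") / len(names),
        "median_people": median(len(genders) for genders in sections),
        "fraction_uni_gender": len(uni) / len(sections),
        "all_male": sum(1 for genders in uni if genders[0] == "male"),
        "all_female": sum(1 for genders in uni if genders[0] == "female"),
    }


if __name__ == "__main__":
    # Made-up toy input: four sections with names, one without.
    toy = [["male", "male"], ["female"], ["male", "female", "male"], ["male"], []]
    print(summarize(toy))
```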
Acknowledgment Name Parsing Error | Occurrence | PMCID | Example |
---|---|---|---|
Author's Name Listed | 4.5% | PMC3339585 | Smriti Shrivastava is thankful to CSIR for Senior Research Fellowship |
Fellowship Name | 2.0% | PMC5864053 | J.S. was funded by a Biotechnology and Biological Sciences Research Council (BBSRC) David Phillips Fellowship (BB/L024551/1) |
Organization Name | 2.0% | PMC4160263 | National Institute of Biomedical Imaging and Bioengineering Grant R01 EB006745 Stanford Bio-X, the American Heart Association (Western States Affiliates) |
Award Name | 1.5% | PMC4189622 | Seed Grant provided by Michigan Technological University (MTU) |
Disclosure | 1.5% | PMC4147052 | In addition, Jin Jin also holds stock in Eli Lilly |
Dedication | 0.5% | PMC4831668 | This paper is dedicated to José Luis García Ruano on occasion of his retirement |
Extract MeSH terms and analyze based on presence of acknowledgment
MeSH terms from PMC articles without acknowledgments tend to be clinically focused.
MeSH terms from PMC articles with acknowledgments tend to focus on fundamental research.
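One simple way to surface such differences is to compare how often each MeSH term appears in the two groups. The sketch below ranks terms by the difference in the share of articles carrying them; this particular measure, the function names, and the toy term lists are assumptions, since the source does not describe the comparison method used.

```python
from collections import Counter


def term_share(articles):
    """Fraction of articles in which each MeSH term appears at least once."""
    if not articles:
        return {}
    counts = Counter(term for terms in articles for term in set(terms))
    return {term: count / len(articles) for term, count in counts.items()}


def rank_terms(with_ack, without_ack):
    """Rank terms by (share with acknowledgments) - (share without)."""
    share_with = term_share(with_ack)
    share_without = term_share(without_ack)
    terms = set(share_with) | set(share_without)
    diffs = {
        term: share_with.get(term, 0.0) - share_without.get(term, 0.0)
        for term in terms
    }
    # Positive values lean toward articles with acknowledgments.
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy input: one list of MeSH terms per article.
    with_ack = [["Mice", "Gene Expression"], ["Mice", "Humans"]]
    without_ack = [["Humans", "Treatment Outcome"], ["Humans"]]
    for term, diff in rank_terms(with_ack, without_ack):
        print(f"{term}: {diff:+.2f}")
```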
Words associated with acknowledged individuals, colored by gender: purple words are predominantly associated with men and green words are predominantly associated with women; grey is used for words that are equally associated with both genders. Larger words appear more frequently. Gender-specific words were preferentially selected.
Nouns
Verbs
We manually curated a list of keywords to group acknowledgments into six categories based on the type of contribution being acknowledged: Manuscript, Coordination, Procedure, Analysis, Materials, and Advice. For each category, we calculated the percentage of acknowledged names inferred to be female.
Category | Percent of female names |
---|---|
manuscript | 52.2% |
coordination | 50.2% |
procedure | 43.6% |
analysis | 42.2% |
materials | 37.4% |
advice | 32.7% |
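The keyword grouping can be sketched as a substring match against a per-category keyword list, followed by a per-category tally. The keywords below are hypothetical placeholders, since the curated list itself is not reproduced here, and the function names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical keywords; the project's curated list is not shown in this document.
CATEGORY_KEYWORDS = {
    "manuscript": ("manuscript", "proofread", "editing"),
    "coordination": ("coordinat", "organiz", "administrat"),
    "procedure": ("technical assistance", "experiment", "data collection"),
    "analysis": ("analysis", "statistic"),
    "materials": ("reagent", "sample", "providing"),
    "advice": ("discussion", "advice", "comment", "suggestion"),
}


def categorize(sentence):
    """Return every category whose keywords occur in the sentence."""
    lowered = sentence.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def percent_female(records):
    """records: iterable of (sentence, inferred_gender) pairs, one per name."""
    totals = defaultdict(int)
    women = defaultdict(int)
    for sentence, gender in records:
        for category in categorize(sentence):
            totals[category] += 1
            if gender == "female":
                women[category] += 1
    return {category: 100.0 * women[category] / totals[category] for category in totals}
```

A sentence can match more than one category under this scheme, so the per-category counts need not sum to the number of acknowledged names.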
- Literature review
- Historical acknowledgments research
- Gender in authorship/acknowledgments
- Analysis of features
- Extract acknowledgments from PMC
- Analyze acknowledgment features
- % PMC coverage, years, journals, MeSH terms, etc.
- False negatives?
- Natural Language Processing (NLP)
- Names → infer gender with genderize.io
- Organizations and objects
- Acknowledged tasks
- Task modifiers (stretch goal)