
Commit 1f977ec

Separate and update READMEs

Committed Jun 5, 2019
1 parent 900b0da · commit 1f977ec

File tree

3 files changed: +80 -57 lines changed

 

‎README.annotation_process.md

+52
@@ -0,0 +1,52 @@
## Randomized Control Trials (RCTs), Prompts, and Answers

There are three main types of data for this project, each of which will be described in depth in the following sections.

### RCTs

In this project, we use texts from RCTs, or randomized control trials. These RCTs are articles that directly compare two different treatments. For example, a given article might want to determine the effectiveness of ibuprofen in counteracting headaches in comparison to other treatments, such as tylenol. These papers often compare multiple treatments (e.g., ibuprofen, tylenol) and their effects with respect to various outcomes (e.g., headaches, pain).

### Prompts

A prompt takes the following form: "With respect to *outcome*, characterize the reported difference between patients receiving *intervention* and those receiving *comparator*." The prompt has three fill-in-the-blank slots, each of which lines up nicely with the RCT. For instance, if we use the example described in the RCTs section, we get:

* **Outcome** = 'number of headaches'
* **Intervention** = 'ibuprofen'
* **Comparator** = 'tylenol'
* "With respect to *number of headaches*, characterize the reported difference between patients receiving *ibuprofen* and those receiving *tylenol*"

A given article might contain 10 or more of these comparisons. For example, if the RCT article also compared *ibuprofen* and *tylenol* with respect to *side effects*, this could also be used as a prompt.
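
To make the structure concrete, here is a minimal sketch of how a prompt and its three slots might be represented and filled in. The field names and the `render_prompt` helper are illustrative assumptions, not the project's actual data schema.

```python
# Hypothetical representation of a single prompt; field names are
# illustrative only, not the released data format.
PROMPT_TEMPLATE = (
    "With respect to {outcome}, characterize the reported difference between "
    "patients receiving {intervention} and those receiving {comparator}."
)

prompt = {
    "outcome": "number of headaches",
    "intervention": "ibuprofen",
    "comparator": "tylenol",
}

def render_prompt(p):
    """Fill the three slots of the prompt template."""
    return PROMPT_TEMPLATE.format(**p)

print(render_prompt(prompt))
# With respect to number of headaches, characterize the reported difference
# between patients receiving ibuprofen and those receiving tylenol.
```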

### Answers

Given a prompt, we must characterize the relationship between the two treatments with respect to the outcome. Let us use the prompt described previously:

* "With respect to *number of headaches*, characterize the reported difference between patients receiving *ibuprofen* and those receiving *tylenol*"

There are three answers we could give: 'significantly increased', 'significantly decreased', or 'no significant difference.' Take, for example, three sentences that *could* appear in an article, each of which would result in a different answer.

1. **Significantly increased**: "Ibuprofen relieved 60 headaches, while tylenol relieved 120; therefore ibuprofen is worse than tylenol for reducing the number of headaches (p < 0.05)."
    * This is an answer of significantly increased, since ibuprofen technically *increases* the chance of having a headache if you use it instead of tylenol. We can see this because more people benefited from tylenol than from ibuprofen.
2. **Significantly decreased**: "Ibuprofen relieved twice as many headaches as tylenol, and therefore reduced the number of headaches by a greater amount (p < 0.05)."
    * This is an answer of significantly decreased, since ibuprofen **decreased** the number of headaches in comparison to tylenol.
3. **No significant difference**: "Ibuprofen relieved more headaches than tylenol, but the difference was not statistically significant."
    * We only care about statistical significance here. In this case it is clear that there is no statistical difference between the two, warranting an answer of no significant difference.

As an answer, we would submit two things:

1. The answer (significantly increased/decreased/no significant difference).
2. A quote from the text that supports our answer (one of the sentences described above).
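
As a sketch (with illustrative field names, not the released file format), a single submitted answer could be represented like this:

```python
# Hypothetical shape of a submitted answer; field names are illustrative only.
answer_record = {
    "prompt_id": 17,  # hypothetical identifier linking back to the prompt
    "answer": "significantly decreased",  # one of the three allowed labels
    "evidence": (
        "Ibuprofen relieved twice as many headaches as tylenol, and therefore "
        "reduced the number of headaches by a greater amount (p < 0.05)."
    ),  # direct quote from the article supporting the label
}
```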

## Process Description

Gathering the data involves three main processes: prompt generation, annotation, and verification. We hire M.D.s from Upwork, each of whom works on only one of the processes. We use Flask and AWS to host the servers on which the M.D.s work.

### Prompt Generation

A prompt generator is hired to look at a set of articles taken from PubMed Central Open Access. Each of these articles is an RCT comparing multiple treatments with respect to various outcomes. The prompt generators look for outcome-intervention-comparator triplets that fill in the following sentence: "With respect to outcome, characterize the reported difference between patients receiving intervention and those receiving comparator." To find these prompts, prompt generators will generally locate sentences describing the actual result of the comparison. Thus, we ask prompt generators not only to select an answer to the prompt (how the intervention relates to the outcome relative to the comparator), but also to provide the reasoning behind their answer. The answer will be one of 'significantly increased', 'significantly decreased', or 'no significant difference', while the reasoning will be a direct quote from the text. For each article, the prompt generator attempts to find a maximum of five unique prompts.

The prompt generator instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/prompt-gen-instruction/templates/instructions.html

### Annotator

An annotator is given the article and a prompt. The answer will be one of 'significantly increased', 'significantly decreased', or 'no significant difference', while the reasoning will be a direct quote from the text. The annotator only has access to the prompt and the article, and therefore must search the article for both the evidence and the answer. If the prompt is incoherent or simply invalid, it can be marked as such. The annotators will first attempt to find the answer in the abstract; if it is not available there, they may look at the remaining sections of the article.

The annotator instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/annotation-instruction-written/templates/instructions.html

### Verifier

The verifier is given the prompt, the article, the reasoning and answer of the annotator, and the reasoning and answer of the prompt generator. However, both reasoning/answer pairs are presented as if they came from annotators. This ensures that the verifier does not side with the prompt generator on the assumption that their answer would be more accurate. The verifier determines whether each answer and its reasoning are valid, and likewise whether the prompt itself is valid.

The verifier instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/Verification-Instructions/instructions.html
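
A rough sketch of this blinding step is shown below; the record structure and the shuffling of source order are illustrative assumptions, not the project's actual server code.

```python
import random

# Hypothetical records; in the real workflow these come from the prompt
# generator and the annotator, but the verifier should not know which is which.
responses = [
    {"source": "prompt_generator", "answer": "significantly decreased", "reasoning": "..."},
    {"source": "annotator", "answer": "significantly decreased", "reasoning": "..."},
]

# Shuffle and drop the source field so both are presented as anonymous annotators.
random.shuffle(responses)
blinded = [{"answer": r["answer"], "reasoning": r["reasoning"]} for r in responses]
```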
A link to the description of the data can be found here: https://github.com/jayded/evidence-inference/tree/master/annotations

‎README.md

+27 -56

@@ -6,69 +6,40 @@ The dataset consists of biomedical articles describing randomized control trials

The dataset could be used for automatic data extraction of the results of a given RCT. This would enable readers to discover the effectiveness of different treatments without needing to read the paper.

See [README.annotation_process.md](./README.annotation_process.md) for information about the annotation process.

## Data

Raw documents are generated in both the PubMed nxml format and a plain text version suitable for human and machine readability (you can use your favorite tokenizer and model). Annotations are described in detail in [the annotation description](./annotations/README.md).

We distribute annotation in a csv format ([prompts](./annotations/prompts_merged.csv) and [labels](./annotations/annotations_merged.csv)). If you prefer to work with a json format, we provide a [script](./evidence_inference/preprocess/convert_annotations_to_json.py) to convert from the csv format.
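
For instance, a minimal sketch of loading the two csv files with pandas might look like the following; the column names (e.g. `PromptID`) are assumptions, so check the headers of the released files before relying on them.

```python
# Sketch: load the released csv annotations and join labels onto their prompts.
# Column names such as "PromptID" are assumptions; inspect the csv headers first.
import pandas as pd

prompts = pd.read_csv("annotations/prompts_merged.csv")
labels = pd.read_csv("annotations/annotations_merged.csv")

merged = labels.merge(prompts, on="PromptID", suffixes=("_label", "_prompt"))
print(merged.head())
```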

## Reproduction

See [SETUP.md](./SETUP.md) for information about how to configure and reproduce primary paper results.

## Citation

### Standard Form Citation

Eric Lehman, Jay DeYoung, Regina Barzilay, and Byron C. Wallace. 2019. Inferring which medical treatments work from reports of clinical trials. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3705–3717, Minneapolis, Minnesota. Association for Computational Linguistics.

### Bibtex Citation

When citing this project, please use the following bibtex citation:

<pre>
@inproceedings{lehman-etal-2019-inferring,
    title = "Inferring Which Medical Treatments Work from Reports of Clinical Trials",
    author = "Lehman, Eric  and
      DeYoung, Jay  and
      Barzilay, Regina  and
      Wallace, Byron C.",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1371",
    pages = "3705--3717",
}
</pre>

‎SETUP.md

+1 -1

@@ -2,7 +2,7 @@ Download embeddings to "embeddings" from

http://evexdb.org/pmresources/vec-space-models/PubMed-w2v.bin

You should create a virtualenv satisfying requirements.txt. We recommend using
-conda. You will need to install a very recent PyTorch (1.0-nightly or later).
+conda. You will need to install a very recent PyTorch (1.1 or later).

Experiments were run on a mix of 1080Tis, K40ms, K80ms, and P100s.
You should be able to (approximately) reproduce the main experiments via the
