## Randomized Control Trials (RCTs), Prompts, and Answers
There are three main types of data for this project, each of which will be described in depth in the following sections.
### RCTs
In this project, we use texts from RCTs, or randomized control trials. These RCTs are articles that directly compare two different treatments. For example, a given article might aim to determine the effectiveness of ibuprofen in counteracting headaches compared to other treatments, such as tylenol. These papers often compare multiple treatments (e.g., ibuprofen, tylenol) and their effects with respect to various outcomes (e.g., headaches, pain).
### Prompts
A prompt takes the following form: "With respect to *outcome*, characterize the reported difference between patients receiving *intervention* and those receiving *comparator*." The prompt has three fill-in-the-blank slots, each of which maps directly onto the RCT. For instance, if we use the example described in the RCTs section, we get:
* **Outcome** = 'number of headaches'
* **Intervention** = 'ibuprofen'
* **Comparator** = 'tylenol'
* "With respect to *number of headaches*, characterize the reported difference between patients receiving *ibuprofen* and those receiving *tylenol*"
A given article might contain 10+ of these comparisons. For example, if the RCT article also compared *ibuprofen* and *tylenol* with respect to *side effects*, this could also be used as a prompt.
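To make the structure concrete, here is a minimal sketch (hypothetical Python, not part of the released code) of how a prompt can be represented as an (outcome, intervention, comparator) triplet and rendered from the template above:

```python
from typing import NamedTuple

class Prompt(NamedTuple):
    """One outcome-intervention-comparator triplet drawn from an RCT article."""
    outcome: str
    intervention: str
    comparator: str

TEMPLATE = ("With respect to {outcome}, characterize the reported difference "
            "between patients receiving {intervention} and those receiving {comparator}.")

def render(prompt: Prompt) -> str:
    """Fill the three blanks of the prompt template."""
    return TEMPLATE.format(**prompt._asdict())

# The example from the RCTs section:
example = Prompt(outcome="number of headaches",
                 intervention="ibuprofen",
                 comparator="tylenol")
print(render(example))
```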
### Answers
Given a prompt, we must characterize the reported relationship between the two treatments with respect to the outcome. Let us use the prompt described previously:
* "With respect to *number of headaches*, characterize the reported difference between patients receiving *ibuprofen* and those receiving *tylenol*"
There are three answers we could give: 'significantly increased', 'significantly decreased', or 'no significant difference'. Take, for example, three sentences that *could* appear in an article, each of which would lead to a different answer.
1. **Significantly increased**: "Ibuprofen relieved 60 headaches, while tylenol relieved 120; therefore ibuprofen is worse than tylenol for reducing the number of headaches (p < 0.05)."
* This can be seen as an answer of significantly increased, since ibuprofen technically *increases* the chance of having a headache if you use it instead of tylenol. We can see this because more people benefited from the use of tylenol in comparison to ibuprofen.
2. **Significantly decreased**: "Ibuprofen reduced twice the number of headaches that tylenol did, and therefore reduced a greater number of headaches (p < 0.05)."
* This is an answer of significantly decreased since ibuprofen **decreased** the number of headaches in comparison to tylenol.
3. **No significant difference**: "Ibuprofen relieved more headaches than tylenol, but the difference was not statistically significant."
* We only care about statistical significance here. In this case there is no statistically significant difference between the two, warranting an answer of no significant difference.
As an answer, we would submit two things (a minimal example record is sketched after this list):
1. The answer (significantly inc./dec./no-diff.).
2. A quote from the text that supports our answer (one of the sentences described above).
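Here is a minimal sketch of such an answer record, again in hypothetical Python (the field names are illustrative, not the released schema):

```python
from dataclasses import dataclass

# The three allowed labels.
LABELS = ("significantly increased", "significantly decreased", "no significant difference")

@dataclass
class Answer:
    """An answer to a prompt: a label plus a supporting quote from the article."""
    label: str      # one of LABELS
    evidence: str   # verbatim sentence from the article that supports the label

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")

# Example for the headaches prompt, using sentence 3 above:
answer = Answer(
    label="no significant difference",
    evidence=("Ibuprofen relieved more headaches than tylenol, "
              "but the difference was not statistically significant."),
)
print(answer.label)
```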
## Process Description
Gathering the data involves three main processes: prompt generation, annotation, and verification. We hire M.D.s from Upwork, each of whom works on only one of the processes. We use Flask and AWS to host the servers on which the M.D.s work.
### Prompt Generation
A prompt generator is hired to look at a set of articles taken from PubMed Central Open Access. Each of these articles is an RCT comparing multiple treatments with respect to various outcomes. Prompt generators look for triplets of outcome, intervention, and comparator that fill in the following sentence: "With respect to outcome, characterize the reported difference between patients receiving intervention and those receiving comparator." To find these prompts, prompt generators will generally locate sentences describing the actual result of the comparison. Thus, we ask prompt generators not only to select an answer to the prompt (how the intervention relates to the outcome relative to the comparator), but also to give the reasoning behind their answer. The answer will be one of 'significantly increased', 'significantly decreased', or 'no significant difference', while the reasoning will be a direct quote from the text. For each article, the prompt generator attempts to find a maximum of five unique prompts.
The prompt generator instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/prompt-gen-instruction/templates/instructions.html
### Annotator
An annotator is given the article and a prompt. As with prompt generation, the answer will be one of 'significantly increased', 'significantly decreased', or 'no significant difference', while the reasoning will be a direct quote from the text. The annotator has access only to the prompt and the article, and therefore must search the article for the evidence and the answer. If the prompt is incoherent or simply invalid, it can be marked as such. Annotators first attempt to find the answer in the abstract; if it is not available there, they may look at the remaining sections of the article.
The annotator instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/annotation-instruction-written/templates/instructions.html
### Verifier
The verifier is given the prompt, the article, the reasoning and answer of the annotator, and the reasoning and answer of the prompt generator. However, both reasoning-and-answer pairs are presented as if they came from annotators. This ensures that the verifier does not side with the prompt generator on the intuition that their answer would be more accurate. The verifier determines whether all answers and reasonings are valid, and likewise whether the prompt itself is valid.
The verifier instructions can be found here: http://www.ccs.neu.edu/home/lehmer16/Verification-Instructions/instructions.html
A link to the description of the data can be found here: https://github.com/jayded/evidence-inference/tree/master/annotations
## README.md
The dataset could be used for automatic data extraction of the results of a given RCT. This would enable readers to discover the effectiveness of different treatments without needing to read the paper.
See [README.annotation_process.md](./README.annotation_process.md) for information about the annotation process.

## Data

Raw documents are generated in both the PubMed nxml format and a plain text version suitable for human and machine readability (you can use your favorite tokenizer and model). Annotations are described in detail in [the annotation description](./annotations/README.md).

We distribute annotation in a csv format ([prompts](./annotations/prompts_merged.csv) and [labels](./annotations/annotations_merged.csv)). If you prefer to work with a json format, we provide a [script](./evidence_inference/preprocess/convert_annotations_to_json.py) to convert from the csv format.
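As a quick sanity check on the release, here is a small sketch for loading the csv files (assuming pandas is installed; the column names are not listed in this document, so the snippet only inspects them rather than assuming a schema):

```python
# Sketch: load the released annotation csv files and inspect their structure.
# Paths follow the repository layout linked above.
import pandas as pd

prompts = pd.read_csv("annotations/prompts_merged.csv")
annotations = pd.read_csv("annotations/annotations_merged.csv")

# Print the available columns and row counts rather than assuming a schema.
print("prompt columns:", list(prompts.columns))
print("annotation columns:", list(annotations.columns))
print(len(prompts), "prompts /", len(annotations), "annotation rows")
```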
## Reproduction

See [SETUP.md](./SETUP.md) for information about how to configure and reproduce primary paper results.

## Citation

### Standard Form Citation

Eric Lehman, Jay DeYoung, Regina Barzilay, and Byron C. Wallace. 2019. Inferring which medical treatments work from reports of clinical trials. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3705–3717, Minneapolis, Minnesota. Association for Computational Linguistics.

### Bibtex Citation

When citing this project, please use the following bibtex citation:
<pre>
@inproceedings{lehman-etal-2019-inferring,
    title = "Inferring Which Medical Treatments Work from Reports of Clinical Trials",
    author = "Lehman, Eric  and
      DeYoung, Jay  and
      Barzilay, Regina  and
      Wallace, Byron C.",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    pages = "3705--3717",
}
</pre>