======= Header metadata =======

Evaluation on 1998 random PDF files out of 1998 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

abstract             78           2.21         2.16         2.18   1988
authors              94.7         77.17        76.51        76.84  1997
first_author         98.97        96.21        95.49        95.85  1995
keywords             95.83        58.78        59.83        59.3   839
title                92.32        66.67        65.97        66.31  1998

all (micro avg.)     91.97        60.59        60.07        60.33  8817
all (macro avg.)     91.97        60.21        59.99        60.1   8817

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

abstract             90.67        59.72        58.55        59.13  1988
authors              94.86        77.88        77.22        77.55  1997
first_author         99.04        96.52        95.79        96.15  1995
keywords             96.33        63.93        65.08        64.5   839
title                93.1         70.16        69.42        69.79  1998

all (micro avg.)     94.8         74.94        74.29        74.61  8817
all (macro avg.)     94.8         73.64        73.21        73.42  8817

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

abstract             95.18        80.19        78.62        79.4   1988
authors              96.85        86.77        86.03        86.4   1997
first_author         99.13        96.92        96.19        96.55  1995
keywords             97.83        79.51        80.93        80.21  839
title                95.87        82.55        81.68        82.11  1998

all (micro avg.)     96.97        85.94        85.19        85.56  8817
all (macro avg.)     96.97        85.19        84.69        84.94  8817

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

abstract             94.45        76.91        75.4         76.15  1988
authors              95.75        81.87        81.17        81.52  1997
first_author         98.97        96.21        95.49        95.85  1995
keywords             97.05        71.43        72.71        72.06  839
title                94.6         76.88        76.08        76.48  1998

all (micro avg.)     96.17        81.86        81.15        81.51  8817
all (macro avg.)     96.17        80.66        80.17        80.41  8817

===== Instance-level results =====

Total expected instances:   1998
Total correct instances:    32 (strict)
Total correct instances:    612 (soft)
Total correct instances:    1075 (Levenshtein)
Total correct instances:    901 (RatcliffObershelp)

Instance-level recall:  1.6 (strict)
Instance-level recall:  30.63 (soft)
Instance-level recall:  53.8 (Levenshtein)
Instance-level recall:  45.1 (RatcliffObershelp)

======= Citation metadata =======

Evaluation on 1998 random PDF files out of 1998 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

authors              98.4         88.17        83.27        85.65  97081
date                 98.87        91.73        86.34        88.95  97527
doi                  99.13        70.85        83.81        76.79  16893
first_author         99.31        95.07        89.71        92.31  97081
inTitle              97.69        82.85        79.45        81.11  96328
issue                99.61        94.34        92.04        93.18  30282
page                 97.52        94.97        78.34        85.86  88503
pmcid                99.95        66.38        86.12        74.97  807
pmid                 99.87        70.06        84.95        76.79  2093
title                97.98        84.89        83.59        84.23  92367
volume               99.46        96.23        95.23        95.73  87618

all (micro avg.)     98.89        89.86        85.36        87.55  706580
all (macro avg.)     98.89        85.05        85.71        85.05  706580

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

authors              98.55        89.33        84.37        86.77  97081
date                 98.87        91.73        86.34        88.95  97527
doi                  99.26        75.34        89.12        81.65  16893
first_author         99.37        95.49        90.11        92.72  97081
inTitle              98.97        92.33        88.54        90.39  96328
issue                99.61        94.34        92.04        93.18  30282
page                 97.52        94.97        78.34        85.86  88503
pmcid                99.96        75.64        98.14        85.44  807
pmid                 99.89        74.47        90.3         81.62  2093
title                99.09        93.23        91.81        92.51  92367
volume               99.46        96.23        95.23        95.73  87618

all (micro avg.)     99.14        92.67        88.04        90.29  706580
all (macro avg.)     99.14        88.46        89.49        88.62  706580

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

authors              99.25        94.59        89.34        91.89  97081
date                 98.87        91.73        86.34        88.95  97527
doi                  99.32        77.61        91.8         84.11  16893
first_author         99.39        95.64        90.25        92.87  97081
inTitle              99.1         93.31        89.48        91.35  96328
issue                99.61        94.34        92.04        93.18  30282
page                 97.52        94.97        78.34        85.86  88503
pmcid                99.96        75.64        98.14        85.44  807
pmid                 99.89        74.47        90.3         81.62  2093
title                99.46        96.05        94.59        95.31  92367
volume               99.46        96.23        95.23        95.73  87618

all (micro avg.)     99.26        94           89.3         91.59  706580
all (macro avg.)     99.26        89.51        90.53        89.66  706580

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

authors              98.85        91.54        86.46        88.93  97081
date                 98.87        91.73        86.34        88.95  97527
doi                  99.28        76.04        89.95        82.42  16893
first_author         99.32        95.11        89.75        92.36  97081
inTitle              98.8         91.07        87.33        89.16  96328
issue                99.61        94.34        92.04        93.18  30282
page                 97.52        94.97        78.34        85.86  88503
pmcid                99.95        66.38        86.12        74.97  807
pmid                 99.87        70.06        84.95        76.79  2093
title                99.37        95.36        93.91        94.63  92367
volume               99.46        96.23        95.23        95.73  87618

all (micro avg.)     99.17        93.03        88.38        90.64  706580
all (macro avg.)     99.17        87.53        88.22        87.54  706580

===== Instance-level results =====

Total expected instances:       98695
Total extracted instances:      98006
Total correct instances:        43738 (strict)
Total correct instances:        54739 (soft)
Total correct instances:        58936 (Levenshtein)
Total correct instances:        55658 (RatcliffObershelp)

Instance-level precision:   44.63 (strict)
Instance-level precision:   55.85 (soft)
Instance-level precision:   60.14 (Levenshtein)
Instance-level precision:   56.79 (RatcliffObershelp)

Instance-level recall:  44.32 (strict)
Instance-level recall:  55.46 (soft)
Instance-level recall:  59.72 (Levenshtein)
Instance-level recall:  56.39 (RatcliffObershelp)

Instance-level f-score: 44.47 (strict)
Instance-level f-score: 55.66 (soft)
Instance-level f-score: 59.92 (Levenshtein)
Instance-level f-score: 56.59 (RatcliffObershelp)

Matching 1 :    79247
Matching 2 :    4436
Matching 3 :    4338
Matching 4 :    2106

Total matches : 90127

======= Citation context resolution =======

Total expected references:       98693 - 49.4 references per article
Total predicted references:      98006 - 49.05 references per article

Total expected citation contexts:    142690 - 71.42 citation contexts per article
Total predicted citation contexts:   135516 - 67.83 citation contexts per article

Total correct predicted citation contexts:   116580 - 58.35 citation contexts per article
Total wrong predicted citation contexts:     18936 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib. ref. in NLM)

Precision citation contexts:     86.03
Recall citation contexts:        81.7
fscore citation contexts:        83.81

======= Fulltext structures =======

Evaluation on 1998 random PDF files out of 1998 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

availability_stmt    99.83        29.69        25.62        27.5   445
figure_title         90.57        4.23         2            2.72   22953
funding_stmt         98.64        4.16         24.43        7.11   745
reference_citation   75.63        71.04        71.33        71.19  147299
reference_figure     91.71        70.59        67.76        69.15  47879
reference_table      98.19        48.15        83.09        60.97  5950
section_title        94.75        72.6         69.67        71.11  32359
table_title          98.2         4.35         2.89         3.47   3914

all (micro avg.)     93.44        65.46        63.42        64.42  261544
all (macro avg.)     93.44        38.1         43.35        39.15  261544

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1     support

availability_stmt    99.86        50           43.15        46.32  445
figure_title         94.27        69.44        32.89        44.64  22953
funding_stmt         98.52        4.37         25.64        7.46   745
reference_citation   84.55        83.03        83.36        83.2   147299
reference_figure     91.16        71.22        68.36        69.76  47879
reference_table      98.06        48.6         83.87        61.54  5950
section_title        95.05        76.48        73.39        74.9   32359
table_title          98.81        51.29        34.03        40.92  3914

all (micro avg.)     95.03        76.37        73.99        75.16  261544
all (macro avg.)     95.03        56.8         55.59        53.59  261544

===== Document-level ratio results =====

label                accuracy     precision    recall       f1     support

availability_stmt    65.87        84.77        86.29        85.52  445

all (micro avg.)     65.87        84.77        86.29        85.52  445
all (macro avg.)     65.87        84.77        86.29        85.52  445

====================================================================================
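
The summary figures above can be cross-checked from the raw counts in the report. The following is a minimal Python sketch, not the evaluation tool's own code: the function names `prf` and `macro_avg` are hypothetical, and it assumes that precision is correct/predicted, recall is correct/expected, f-score is their harmonic mean, and the macro average is the unweighted mean of the per-field values. Under those assumptions it reproduces the strict instance-level citation scores, the citation context scores, and the strict macro-average precision for header metadata.

```python
def prf(correct, predicted, expected):
    """Return (precision, recall, f-score) as percentages rounded to 2 decimals."""
    precision = 100.0 * correct / predicted
    recall = 100.0 * correct / expected
    f_score = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f_score, 2)


def macro_avg(per_field_values):
    """Unweighted mean of per-field scores (assumed definition of the macro average)."""
    return round(sum(per_field_values) / len(per_field_values), 2)


# Citation metadata, instance level, strict matching:
# 43738 correct, 98006 extracted, 98695 expected.
print(prf(43738, 98006, 98695))

# Citation context resolution:
# 116580 correct, 135516 predicted, 142690 expected.
print(prf(116580, 135516, 142690))

# Header metadata, strict matching, per-field precision
# (abstract, authors, first_author, keywords, title).
print(macro_avg([2.21, 77.17, 96.21, 58.78, 66.67]))
```

The micro average, by contrast, pools the counts across fields before computing the ratio, which is why it is dominated by high-support fields; it cannot be recomputed from this report alone because per-field true-positive counts are not listed.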