======= Header metadata =======

Evaluation on 969 random PDF files out of 982 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

abstract             77.27      9.37       9.08       9.22       969
authors              91.62      67.43      66.53      66.98      968
first_author         97.65      91.94      90.8       91.36      967
title                95.9       85.9       83.59      84.73      969

all (micro avg.)     90.61      63.82      62.48      63.14      3873
all (macro avg.)     90.61      63.66      62.5       63.07      3873

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

abstract             80.42      22.36      21.67      22.01      969
authors              91.67      67.64      66.74      67.19      968
first_author         97.65      91.94      90.8       91.36      967
title                98.01      94.59      92.05      93.31      969

all (micro avg.)     91.94      69.25      67.8       68.52      3873
all (macro avg.)     91.94      69.13      67.81      68.47      3873

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

abstract             86.46      47.28      45.82      46.54      969
authors              96.31      86.49      85.33      85.91      968
first_author         97.73      92.25      91.11      91.68      967
title                98.48      96.5       93.91      95.19      969

all (micro avg.)     94.74      80.72      79.03      79.87      3873
all (macro avg.)     94.74      80.63      79.04      79.83      3873

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

abstract             85.71      44.2       42.83      43.5       969
authors              93.5       75.08      74.07      74.57      968
first_author         97.65      91.94      90.8       91.36      967
title                98.45      96.39      93.81      95.08      969

all (micro avg.)     93.83      76.98      75.37      76.16      3873
all (macro avg.)     93.83      76.9       75.38      76.13      3873

===== Instance-level results =====

Total expected instances:   969
Total correct instances:    58  (strict)
Total correct instances:    160 (soft)
Total correct instances:    350 (Levenshtein)
Total correct instances:    295 (RatcliffObershelp)

Instance-level recall:      5.99  (strict)
Instance-level recall:      16.51 (soft)
Instance-level recall:      36.12 (Levenshtein)
Instance-level recall:      30.44 (RatcliffObershelp)
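A note on the four matching levels used throughout this report: "strict" requires exact string equality, "soft" ignores punctuation, case and space mismatches, "Levenshtein" accepts pairs whose normalized edit similarity reaches at least 0.8, and "Ratcliff/Obershelp" requires a similarity of at least 0.95. The sketch below illustrates how such predicates can be implemented in Python; the helper names and the exact normalization are illustrative assumptions, and GROBID's own Java evaluation code remains authoritative.

    import re
    from difflib import SequenceMatcher

    def soft_normalize(s: str) -> str:
        # "Soft" matching: drop punctuation and whitespace, fold case.
        return re.sub(r"[\W_]+", "", s).lower()

    def levenshtein_ratio(a: str, b: str) -> float:
        # Classic dynamic-programming edit distance, turned into a
        # 0..1 similarity by normalizing with the longer string length.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i] + [0] * len(b)
            for j, cb in enumerate(b, start=1):
                cur[j] = min(prev[j] + 1,                  # deletion
                             cur[j - 1] + 1,               # insertion
                             prev[j - 1] + (ca != cb))     # substitution
            prev = cur
        return 1 - prev[len(b)] / max(len(a), len(b), 1)

    def field_matches(expected: str, predicted: str, level: str) -> bool:
        if level == "strict":
            return expected == predicted
        if level == "soft":
            return soft_normalize(expected) == soft_normalize(predicted)
        if level == "levenshtein":
            return levenshtein_ratio(expected, predicted) >= 0.8
        if level == "ratcliff":
            # difflib's ratio() computes a Ratcliff/Obershelp-style similarity.
            return SequenceMatcher(None, expected, predicted).ratio() >= 0.95
        raise ValueError(f"unknown matching level: {level}")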
======= Citation metadata =======

Evaluation on 969 random PDF files out of 982 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

authors              96.99      79.38      78.3       78.84      61728
date                 99.39      95.85      94.16      94.99      62107
first_author         99.22      94.74      93.42      94.08      61728
inTitle              99.38      95.77      94.82      95.3       61677
issue                99.86      2.08       75         4.05       16
page                 99.36      96.26      95.32      95.78      52105
title                98.58      90.3       90.84      90.57      60559
volume               99.65      97.85      98.29      98.07      59595

all (micro avg.)     99.05      92.67      92.07      92.37      419515
all (macro avg.)     99.05      81.53      90.02      81.46      419515

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

authors              97.01      79.51      78.43      78.97      61728
date                 99.39      95.85      94.16      94.99      62107
first_author         99.23      94.82      93.5       94.16      61728
inTitle              99.45      96.25      95.29      95.77      61677
issue                99.86      2.08       75         4.05       16
page                 99.36      96.26      95.32      95.78      52105
title                99.4       95.94      96.52      96.23      60559
volume               99.65      97.85      98.29      98.07      59595

all (micro avg.)     99.17      93.59      92.98      93.29      419515
all (macro avg.)     99.17      82.32      90.81      82.25      419515

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

authors              99.01      93.28      92.01      92.64      61728
date                 99.39      95.85      94.16      94.99      62107
first_author         99.3       95.27      93.94      94.6       61728
inTitle              99.5       96.56      95.6       96.08      61677
issue                99.86      2.08       75         4.05       16
page                 99.36      96.26      95.32      95.78      52105
title                99.65      97.66      98.24      97.95      60559
volume               99.65      97.85      98.29      98.07      59595

all (micro avg.)     99.47      95.97      95.34      95.65      419515
all (macro avg.)     99.47      84.35      92.82      84.27      419515

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

authors              98.05      86.69      85.51      86.1       61728
date                 99.39      95.85      94.16      94.99      62107
first_author         99.22      94.76      93.44      94.09      61728
inTitle              99.45      96.24      95.28      95.76      61677
issue                99.86      2.08       75         4.05       16
page                 99.36      96.26      95.32      95.78      52105
title                99.63      97.51      98.1       97.81      60559
volume               99.65      97.85      98.29      98.07      59595

all (micro avg.)     99.33      94.86      94.24      94.55      419515
all (macro avg.)     99.33      83.4       91.89      83.33      419515

===== Instance-level results =====

Total expected instances:   62109
Total extracted instances:  62910

Total correct instances:    41373 (strict)
Total correct instances:    44134 (soft)
Total correct instances:    51595 (Levenshtein)
Total correct instances:    48283 (RatcliffObershelp)

Instance-level precision:   65.77 (strict)
Instance-level precision:   70.15 (soft)
Instance-level precision:   82.01 (Levenshtein)
Instance-level precision:   76.75 (RatcliffObershelp)

Instance-level recall:      66.61 (strict)
Instance-level recall:      71.06 (soft)
Instance-level recall:      83.07 (Levenshtein)
Instance-level recall:      77.74 (RatcliffObershelp)

Instance-level f-score:     66.19 (strict)
Instance-level f-score:     70.6  (soft)
Instance-level f-score:     82.54 (Levenshtein)
Instance-level f-score:     77.24 (RatcliffObershelp)

Matching 1 : 57278
Matching 2 : 980
Matching 3 : 1211
Matching 4 : 357
Total matches : 59826

======= Citation context resolution =======

Total expected references:                 62109  - 64.1 references per article
Total predicted references:                62910  - 64.92 references per article

Total expected citation contexts:          106379 - 109.78 citation contexts per article
Total predicted citation contexts:         97185  - 100.29 citation contexts per article

Total correct predicted citation contexts: 93525  - 96.52 citation contexts per article
Total wrong predicted citation contexts:   3660 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib. ref. in NLM)

Precision citation contexts: 96.23
Recall citation contexts:    87.92
F-score citation contexts:   91.89
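The headline citation-context scores follow directly from the raw counts above; a quick sanity check in Python (variable names are ours, chosen for illustration):

    # Counts taken verbatim from the "Citation context resolution" block.
    expected  = 106379  # citation contexts in the gold standard (NLM)
    predicted = 97185   # citation contexts produced by GROBID
    correct   = 93525   # predicted contexts that agree with the gold standard

    precision = 100 * correct / predicted                       # -> 96.23
    recall    = 100 * correct / expected                        # -> 87.92
    f_score   = 2 * precision * recall / (precision + recall)   # -> 91.89
    print(f"P={precision:.2f}  R={recall:.2f}  F1={f_score:.2f}")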
======= Fulltext structures =======

Evaluation on 969 random PDF files out of 982 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

availability_stmt    99.76      29.69      26.48      27.99      574
figure_title         87.41      0.02       0.01       0.01       31073
funding_stmt         98.58      5.24       23.84      8.59       906
reference_citation   70.69      55.41      55.64      55.53      106306
reference_figure     81.63      56.95      49.99      53.25      67647
reference_table      99.59      69.03      74.84      71.82      2254
section_title        97.43      85.35      74.07      79.31      21462
table_title          99.23      0.46       0.16       0.24       1832

all (micro avg.)     91.79      54.89      47.8       51.1       232054
all (macro avg.)     91.79      37.77      38.13      37.09      232054

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy   precision  recall     f1         support

availability_stmt    99.75      38.87      34.67      36.65      574
figure_title         89.02      49         15.16      23.16      31073
funding_stmt         98.35      5.24       23.84      8.59       906
reference_citation   93.3       91.05      91.42      91.23      106306
reference_figure     78.82      57.24      50.24      53.51      67647
reference_table      99.53      69.11      74.93      71.9       2254
section_title        97.15      86.23      74.83      80.13      21462
table_title          99.5       81.02      28.66      42.34      1832

all (micro avg.)     94.43      76.49      66.61      71.21      232054
all (macro avg.)     94.43      59.72      49.22      50.94      232054

===== Document-level ratio results =====

label                accuracy   precision  recall     f1         support

availability_stmt    83.39      96.24      89.2       92.59      574

all (micro avg.)     83.39      96.24      89.2       92.59      574
all (macro avg.)     83.39      96.24      89.2       92.59      574

====================================================================================

Evaluation report in markdown format saved under /home/lfoppiano/grobid/grobid-trainer/../grobid-home/tmp/report.md

:grobid-trainer:jatsEval (Thread[Daemon worker,5,main]) completed. Took 30 mins 38.449 secs.
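A final note on reading the field-level tables: the micro average pools counts over all fields, so high-support fields dominate, while the macro average is an unweighted mean of the per-field scores, so a weak low-support field (e.g. "issue", support 16, precision 2.08 in the citation tables) drags the macro numbers well below the micro ones. A toy sketch of the assumed computation, with hypothetical counts rather than values from this run:

    # Hypothetical per-field true/false positive counts, for illustration only.
    fields = [
        {"tp": 900, "fp": 50},  # a high-support, high-precision field
        {"tp": 1,   "fp": 47},  # a low-support, low-precision field
    ]

    def micro_precision(fields):
        # Pool counts first, then divide: big fields dominate the result.
        tp = sum(f["tp"] for f in fields)
        fp = sum(f["fp"] for f in fields)
        return 100 * tp / (tp + fp)

    def macro_precision(fields):
        # Average the per-field precisions: every field weighs the same.
        scores = [100 * f["tp"] / (f["tp"] + f["fp"]) for f in fields]
        return sum(scores) / len(scores)

    print(round(micro_precision(fields), 2))  # 90.28 - dominated by the big field
    print(round(macro_precision(fields), 2))  # 48.41 - pulled down by the small one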