Evaluate on subsets: x-shot and synthetic clusters #48

simsa-st · 2023-02-01T11:54:00Z

By default, the report is now printed without per-fieldtype results and with these subsets: 0-shot, 1-3-shot and 4+-shot.

simsa-st · 2023-02-01T12:00:00Z

Default report:

poetry run docile_print_evaluation_report --evaluation-result-path data/example_val_results_KILE.json --dataset-path data/docile

Evaluation report for docile221221-0:val subsets

KILE

Primary metric (AP): 0.43411554364183247

subsets	AP	f1	precision	recall	TP	FP	FN
docile221221-0:val	0.434	0.640	0.688	0.599	3510	1589	2352
Dataset(docile:val-0-shot)	0.299	0.520	0.552	0.491	623	505	647
Dataset(docile:val-1-3-shot)	0.441	0.640	0.698	0.592	783	339	540
Dataset(docile:val-4+-shot)	0.489	0.688	0.739	0.644	2104	745	1165

Notes:

'{dataset}-x-shot' means that the evaluation is restricted to documents from layout clusters with x documents for training available. Here 'training' means trainval for test and train for val.
'{dataset}-synth-clusters-only' means that the evaluation is restricted to documents from layout clusters for which synthetic data exists.
For AP all predictions are used. For f1, precision, recall, TP, FP and FN predictions explicitly marked with flag use_only_for_ap=True are excluded.
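The x-shot rule in the notes above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names and the plain dict inputs (document id mapped to layout cluster id) are hypothetical, and only the bucket boundaries (0, 1-3, 4+) come from the report.

```python
from collections import Counter
from typing import Dict, List


def x_shot_bucket(num_training_docs: int) -> str:
    """Map the number of training documents in a layout cluster to a bucket name."""
    if num_training_docs < 0:
        raise ValueError("number of training documents cannot be negative")
    if num_training_docs == 0:
        return "0-shot"
    if num_training_docs <= 3:
        return "1-3-shot"
    return "4+-shot"


def split_by_x_shot(
    eval_doc_to_cluster: Dict[str, int],
    train_doc_to_cluster: Dict[str, int],
) -> Dict[str, List[str]]:
    """Group evaluation documents by how many training documents share their cluster.

    Both arguments are hypothetical inputs mapping document id -> layout cluster id.
    'Training' would be train when evaluating on val, and trainval when evaluating on test.
    """
    # Count training documents per layout cluster; missing clusters count as 0.
    train_counts = Counter(train_doc_to_cluster.values())
    subsets: Dict[str, List[str]] = {"0-shot": [], "1-3-shot": [], "4+-shot": []}
    for doc_id, cluster_id in eval_doc_to_cluster.items():
        subsets[x_shot_bucket(train_counts[cluster_id])].append(doc_id)
    return subsets
```

Every evaluation document lands in exactly one bucket, which is consistent with the TP, FP and FN counts of the three x-shot rows adding up to the counts of the full val row.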

With synthetic subsets:

poetry run docile_print_evaluation_report --evaluation-result-path data/example_val_results_KILE.json --dataset-path data/docile --evaluate-synthetic-subsets

Evaluation report for docile221221-0:val subsets

KILE

Primary metric (AP): 0.43411554364183247

subsets	AP	f1	precision	recall	TP	FP	FN
docile221221-0:val	0.434	0.640	0.688	0.599	3510	1589	2352
Dataset(docile:val-synth-clusters-only)	0.509	0.698	0.743	0.658	1218	421	633
Dataset(docile:val-0-shot)	0.299	0.520	0.552	0.491	623	505	647
Dataset(docile:val-1-3-shot)	0.441	0.640	0.698	0.592	783	339	540
Dataset(docile:val-1-3-shot-synth-clusters-only)	0.482	0.671	0.710	0.636	456	186	261
Dataset(docile:val-4+-shot)	0.489	0.688	0.739	0.644	2104	745	1165
Dataset(docile:val-4+-shot-synth-clusters-only)	0.527	0.715	0.764	0.672	762	235	372

Notes:

'{dataset}-x-shot' means that the evaluation is restricted to documents from layout clusters with x documents for training available. Here 'training' means trainval for test and train for val.
'{dataset}-synth-clusters-only' means that the evaluation is restricted to documents from layout clusters for which synthetic data exists.
For AP all predictions are used. For f1, precision, recall, TP, FP and FN predictions explicitly marked with flag use_only_for_ap=True are excluded.
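As a sanity check on the tables, the f1, precision and recall columns can be recomputed from the TP, FP and FN columns. The sketch below uses the standard definitions; the `Counts` type and helper names are hypothetical and not part of docile, and returning 0.0 for an empty denominator is an assumption (the report's own handling of that case is not shown here). AP cannot be recomputed this way because it needs the scored predictions.

```python
from typing import Iterable, NamedTuple


class Counts(NamedTuple):
    tp: int
    fp: int
    fn: int


def precision(c: Counts) -> float:
    # Assumption: 0.0 when there are no predictions at all.
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    # Assumption: 0.0 when there are no ground-truth fields at all.
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Counts) -> float:
    denominator = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denominator if denominator else 0.0


def pool(rows: Iterable[Counts]) -> Counts:
    """Sum counts over several subsets (micro-averaging style pooling)."""
    rows = list(rows)
    return Counts(
        sum(r.tp for r in rows), sum(r.fp for r in rows), sum(r.fn for r in rows)
    )


# Rows copied from the report above.
full_val = Counts(tp=3510, fp=1589, fn=2352)
x_shot_rows = [
    Counts(623, 505, 647),    # val-0-shot
    Counts(783, 339, 540),    # val-1-3-shot
    Counts(2104, 745, 1165),  # val-4+-shot
]

print(f"{f1(full_val):.3f} {precision(full_val):.3f} {recall(full_val):.3f}")
# prints: 0.640 0.688 0.599
print(pool(x_shot_rows) == full_val)
# prints: True
```

The second check shows that the three x-shot subsets together cover the whole val split: their pooled counts equal the counts of the `docile221221-0:val` row.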

Without subsets but with fieldtypes:

poetry run docile_print_evaluation_report --evaluation-result-path data/example_val_results_KILE.json --evaluate-x-shot-subsets "" --evaluate-fieldtypes

Evaluation report for docile221221-0:val

KILE

Primary metric (AP): 0.43411554364183247

fieldtype	AP	f1	precision	recall	TP	FP	FN
-> micro average	0.434	0.640	0.688	0.599	3510	1589	2352
account_num	0.000	0.000	0.000	0.000	0	4	9
amount_due	0.579	0.750	0.791	0.713	371	98	149
amount_paid	0.533	0.622	0.667	0.583	14	7	10
amount_total_gross	0.514	0.697	0.710	0.684	355	145	164
amount_total_net	0.377	0.548	0.607	0.500	34	22	34
amount_total_tax	0.584	0.729	0.775	0.689	31	9	14
bank_num	0.611	0.571	0.500	0.667	4	4	2
bic	0.000	0.000	0.000	0.000	0	0	0
currency_code_amount_due	0.038	0.084	0.320	0.048	16	34	315
customer_billing_address	0.612	0.748	0.745	0.752	318	109	105
customer_billing_name	0.620	0.763	0.777	0.749	384	110	129
customer_delivery_address	0.182	0.333	0.296	0.381	8	19	13
customer_delivery_name	0.257	0.456	0.419	0.500	13	18	13
customer_id	0.601	0.726	0.748	0.705	122	41	51
customer_order_id	0.275	0.410	0.447	0.378	17	21	28
customer_other_address	0.594	0.760	0.864	0.679	19	3	9
customer_other_name	0.402	0.598	0.641	0.560	75	42	59
customer_registration_id	0.000	0.000	0.000	0.000	0	0	2
customer_tax_id	0.000	0.000	0.000	0.000	0	0	0
date_due	0.731	0.812	0.867	0.765	65	10	20
date_issue	0.732	0.819	0.835	0.803	411	81	101
document_id	0.610	0.740	0.729	0.753	341	127	112
iban	0.000	0.000	0.000	0.000	0	0	0
order_id	0.225	0.448	0.534	0.386	117	102	186
payment_reference	0.000	0.000	0.000	0.000	0	0	0
payment_terms	0.560	0.692	0.670	0.715	118	58	47
tax_detail_gross	0.371	0.588	0.645	0.541	20	11	17
tax_detail_net	0.298	0.529	0.600	0.474	18	12	20
tax_detail_rate	0.458	0.533	0.571	0.500	4	3	4
tax_detail_tax	0.460	0.615	0.667	0.571	24	12	18
vendor_address	0.250	0.489	0.541	0.447	244	207	302
vendor_email	0.253	0.488	0.512	0.467	21	20	24
vendor_name	0.264	0.507	0.575	0.453	290	214	350
vendor_order_id	0.336	0.529	0.474	0.600	9	10	6
vendor_registration_id	0.500	0.667	0.500	1.000	1	1	0
vendor_tax_id	0.324	0.517	0.508	0.525	31	30	28

Notes:

For AP all predictions are used. For f1, precision, recall, TP, FP and FN predictions explicitly marked with flag use_only_for_ap=True are excluded.

It is also possible to combine subsets and fieldtypes, which prints a subsets summary followed by individual per-subset reports (not shown here because the output is too long).

simsa-st force-pushed the sts-few-shot-clusters-eval branch from c61dba4 to 129f3d9 on February 1, 2023 14:40

Evaluate on subsets: x-shot and synthetic clusters

eb10af9

simsa-st force-pushed the sts-few-shot-clusters-eval branch from 129f3d9 to eb10af9 on February 1, 2023 14:47

simsa-st requested a review from ahHamdi February 1, 2023 14:56

ahHamdi approved these changes Feb 6, 2023

simsa-st merged commit 41c2db8 into main Feb 6, 2023

simsa-st deleted the sts-few-shot-clusters-eval branch February 6, 2023 15:02
