COVR-10

Code and data for the paper Unobserved Local Structures Make Compositional Generalization Hard.

COVR-10

COVR is a synthetic semantic parsing dataset used to evaluate the compositional generalization of sequence-to-sequence models. COVR-10 contains 10 compositional splits; the test set of each split contains a particular kind of unseen program.

The splits

#	Acc.¹ (FT) BART/T5	Acc.² (ICL) GPT-3	2-ULSs (unobserved local structures³)	Example
8	0.34	0.51	eq+triangle eq+brown eq+gray eq+round eq+query_attr[color] eq+black eq+white eq+query_attr[shape] eq+square	Both the color of cat that is chasing black triangle mouse that is playing with ... and (🟠eq (🔵query_attr [color] (with_relation (find (cat), chasing, with_relation (...
25	0.59	0.23	and+some none+filter filter+scene some+filter most+filter exists+filter all+filter	None of square square cat are playing with dog that is looking at white animal... 🟠none (🔵filter (square, filter (square, find (cat))), with_relation (scene (), pla...
34	0.35	0.38	all+with_relation with_relation+scene exists+with_relation none+with_relation most+with_relation some+with_relation	Either the number of white animal that is looking at square brown animal that is... or (eq (count (🔵with_relation (filter (white, find (animal)), looking at, ...), 4...
43	0.2	0.11	and+some and+most or+all and+all or+none and+none or+most or+some	Both the color of cat is equal to brown and some of cat are brown ... 🟠and (eq (query_attr [color] (find (cat)), brown), 🔵some (find (cat), filter (brow...
48	0	0.85	<s>+query_attr[shape] <s>+query_attr[color]	What is the shape of square cat that is looking at black brown animal that is lo... 🟤query_attr [shape] (with_relation (filter (square, find (cat)), looking at, with...
51	0.64	0.35		Either the color of mouse that is playing with mouse that is chasing triangle br... or (eq (query_attr [color] (with_relation (find (mouse), playing with, with_rela...
99	0	0.89	<s>+count	What is the number of gray animal that is chasing gray mouse that is playing wit... 🟤count (with_relation (filter (gray, find (animal)), chasing, with_relation (filt...
100	0.02	0.18	and+exists exists+find or+exists	Both the shape of cat is equal to white and there is triangle black cat ... 🟠and (eq (query_attr [shape] (find (cat)), white), 🔵exists (filter (triangle, filt...
110	0.18	0.33	with_relation+filter	Either the number of animal is equal to the number of round dog that is chasing ... or (eq (count (find (animal)), count (🟠with_relation (🔵filter (round, find (dog)),...
115	0.28	0.05	all+with_relation with_relation+scene none+with_relation most+with_relation some+with_relation	Either all of cat that is chasing triangle triangle cat that is playing with mou... or (🟠all (🔵with_relation (find (cat), chasing, with_relation (filter (triangle, fi...

🟠 and 🔵 represent an unseen pair of symbols in a given example. 🟤 represents a symbol that was unseen as a first token in the output sequence.
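To make the notion of an unseen pair of symbols concrete, here is a minimal sketch of how parent-child pairs could be collected from program trees and compared between a training set and a test set. It is not the repository's implementation: the nested-tuple tree representation and the function names `local_structures` and `unobserved_local_structures` are assumptions made for illustration, and `<s>` is used as the parent of the first output symbol, mirroring entries such as `<s>+count` in the table.

```python
from typing import Iterable, Set, Tuple, Union

# Assumed representation: a program tree is either a leaf symbol (str)
# or a tuple of the form (symbol, child, child, ...).
Tree = Union[str, tuple]
Pair = Tuple[str, str]


def local_structures(tree: Tree, parent: str = "<s>") -> Set[Pair]:
    """Collect every (parent, child) symbol pair in a program tree."""
    symbol = tree if isinstance(tree, str) else tree[0]
    pairs = {(parent, symbol)}
    if not isinstance(tree, str):
        for child in tree[1:]:
            pairs |= local_structures(child, parent=symbol)
    return pairs


def unobserved_local_structures(
    train: Iterable[Tree], test: Iterable[Tree]
) -> Set[Pair]:
    """Return the pairs that occur in test programs but in no training program."""
    seen: Set[Pair] = set()
    for tree in train:
        seen |= local_structures(tree)
    unseen: Set[Pair] = set()
    for tree in test:
        unseen |= local_structures(tree) - seen
    return unseen


# Toy programs (hypothetical, not taken from the dataset files).
train = [("exists", ("filter", "brown", ("find", "cat")))]
test = [("exists", ("find", "cat"))]

unseen = unobserved_local_structures(train, test)
print(sorted("+".join(pair) for pair in unseen))  # ['exists+find']
```

In this toy case `exists` and `find` both appear in training, but never as parent and child, so `exists+find` is the only unobserved pair.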

Splits are created from the synchronous context-free grammar (SCFG) rules that generated this dataset, by holding out sets of rules so that they are never seen together during training.

For details on this splitting method, see our paper (Appendix B.2).
You can see the set of unseen grammar rules for each split, along with training and test examples, by clicking on Details for any desired split.
See the list of all grammar splits, which includes splits that were not selected for COVR-10. This list only includes grammar splits and not n-LS splits.
Download COVR-10
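The idea of holding out rules that never occur together in training can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from Appendix B.2: it assumes each example is annotated with the set of grammar rule identifiers used to generate it, and the type `AnnotatedExample`, the function `split_by_held_out_rules`, and the rule names are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class AnnotatedExample(NamedTuple):
    """Hypothetical example record: utterance, program, and generating rules."""
    source: str
    target: str
    rules: FrozenSet[str]


def split_by_held_out_rules(
    examples: List[AnnotatedExample], held_out: FrozenSet[str]
) -> Tuple[List[AnnotatedExample], List[AnnotatedExample]]:
    """Send an example to test if it uses all held-out rules together.

    Every other example goes to training, so the held-out rules are
    never seen together in a single training example.
    """
    train, test = [], []
    for example in examples:
        if held_out <= example.rules:
            test.append(example)
        else:
            train.append(example)
    return train, test


# Toy data with made-up rule identifiers.
examples = [
    AnnotatedExample("q1", "p1", frozenset({"r_and", "r_some"})),
    AnnotatedExample("q2", "p2", frozenset({"r_and", "r_all"})),
    AnnotatedExample("q3", "p3", frozenset({"r_some"})),
]
train, test = split_by_held_out_rules(examples, frozenset({"r_and", "r_some"}))
print([e.source for e in train], [e.source for e in test])
```

Each held-out rule can still appear on its own in training (as in `q2` and `q3`); only the combination is reserved for the test set.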

¹Average exact match accuracy for BART-Base, BART-Large, T5-Base and T5-Large, fine-tuned (FT) separately on each split (see implementation details in the paper).

²Exact match accuracy of GPT-3 (engine text-davinci-002), using the OpenAI API. For each split, we evaluate on a subset of 100 test examples. We use in-context learning (ICL): for each test instance, we randomly sample 10 examples from the training set and add their source and target to the prompt. Click on the GPT-3 accuracy to see samples of prompts and outputs.

³Unobserved local structures of size 2 (2-LS), considering only parent-child relations.
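The in-context learning setup described in footnote ² (10 randomly sampled training examples per test instance, scored by exact match) can be sketched as below. The prompt template with `source:` and `target:` labels is an assumption, as are the function names `build_prompt` and `exact_match_accuracy`; the actual prompts are the ones linked from the GPT-3 accuracy column. The sketch does not call any API, and comparing after stripping surrounding whitespace is likewise an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (source utterance, target program)


def build_prompt(
    train: Sequence[Example],
    test_source: str,
    k: int = 10,
    seed: Optional[int] = None,
) -> str:
    """Sample k training examples and format them ahead of the test source."""
    rng = random.Random(seed)
    demos = rng.sample(list(train), k)  # raises ValueError if k > len(train)
    lines: List[str] = []
    for source, target in demos:
        lines.append(f"source: {source}")  # assumed label format
        lines.append(f"target: {target}")
    lines.append(f"source: {test_source}")
    lines.append("target:")
    return "\n".join(lines)


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference (whitespace-stripped)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


# Toy usage with placeholder examples.
train = [(f"utterance {i}", f"program {i}") for i in range(12)]
prompt = build_prompt(train, "What is the number of cat?", k=10, seed=0)
print(prompt.splitlines()[-2:])
```

Because a fresh set of demonstrations is sampled for every test instance, fixing `seed` per instance is one way to keep such an evaluation reproducible.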

Download datasets and splits used in paper

Dataset	Split Method	# Splits	Download Dataset and splits	Comments
COVR-10	Grammar	10	covr10.zip
COVR	Grammar/ n-LS	124/ 22	covr.zip
Overnight	Template	5 (per domain)	overnight.zip
Schema2QA	Template	5	s2q.zip	Both utterances and targets are normalized for better evaluation, and are anonymized to resolve column ambiguity
Atis	Template	5	atis.zip	Normalized variables for better evaluation

Experiments

Code to run experiments.

Code to compute easiness.
