benbogin/unobserved-local-structures


Code and data for the paper Unobserved Local Structures Make Compositional Generalization Hard.

COVR-10

COVR is a synthetic semantic parsing dataset used to evaluate the compositional generalization of sequence-to-sequence models. COVR-10 contains 10 compositional splits; the test set of each split contains a particular kind of unseen program.

The splits

| # | Acc.¹ (FT) BART/T5 | Acc.² (ICL) GPT-3 | 2-ULSs (unobserved local structures³) | Example utterance | Example program |
|---|---|---|---|---|---|
| 8 | 0.34 | 0.51 | eq+triangle, eq+brown, eq+gray, eq+round, eq+query_attr[color], eq+black, eq+white, eq+query_attr[shape], eq+square | Both the color of cat that is chasing black triangle mouse that is playing with ... | and (🟠eq (🔵query_attr [color] (with_relation (find (cat), chasing, with_relation (... |
| 25 | 0.59 | 0.23 | and+some, none+filter, filter+scene, some+filter, most+filter, exists+filter, all+filter | None of square square cat are playing with dog that is looking at white animal... | 🟠none (🔵filter (square, filter (square, find (cat))), with_relation (scene (), pla... |
| 34 | 0.35 | 0.38 | all+with_relation, with_relation+scene, exists+with_relation, none+with_relation, most+with_relation, some+with_relation | Either the number of white animal that is looking at square brown animal that is... | or (eq (count (🔵with_relation (filter (white, find (animal)), looking at, ...), 4... |
| 43 | 0.2 | 0.11 | and+some, and+most, or+all, and+all, or+none, and+none, or+most, or+some | Both the color of cat is equal to brown and some of cat are brown ... | 🟠and (eq (query_attr [color] (find (cat)), brown), 🔵some (find (cat), filter (brow... |
| 48 | 0 | 0.85 | <s>+query_attr[shape], <s>+query_attr[color] | What is the shape of square cat that is looking at black brown animal that is lo... | 🟤query_attr [shape] (with_relation (filter (square, find (cat)), looking at, with... |
| 51 | 0.64 | 0.35 | | Either the color of mouse that is playing with mouse that is chasing triangle br... | or (eq (query_attr [color] (with_relation (find (mouse), playing with, with_rela... |
| 99 | 0 | 0.89 | <s>+count | What is the number of gray animal that is chasing gray mouse that is playing wit... | 🟤count (with_relation (filter (gray, find (animal)), chasing, with_relation (filt... |
| 100 | 0.02 | 0.18 | and+exists, exists+find, or+exists | Both the shape of cat is equal to white and there is triangle black cat ... | 🟠and (eq (query_attr [shape] (find (cat)), white), 🔵exists (filter (triangle, filt... |
| 110 | 0.18 | 0.33 | with_relation+filter | Either the number of animal is equal to the number of round dog that is chasing ... | or (eq (count (find (animal)), count (🟠with_relation (🔵filter (round, find (dog)),... |
| 115 | 0.28 | 0.05 | all+with_relation, with_relation+scene, none+with_relation, most+with_relation, some+with_relation | Either all of cat that is chasing triangle triangle cat that is playing with mou... | or (🟠all (🔵with_relation (find (cat), chasing, with_relation (filter (triangle, fi... |

🟠 and 🔵 represent an unseen pair of symbols in a given example. 🟤 represents a symbol that was unseen as a first token in the output sequence.
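To make the notion of an unseen pair of symbols concrete, the following is a minimal sketch of how size-2 local structures (parent-child pairs) could be extracted from a program string and compared against a training set. It is not the repository's implementation: the function names (`tokenize`, `parse`, `local_structures`, `unobserved_structures`), the treatment of `<s>` as the parent of the root symbol, and the toy programs are all assumptions made for illustration, and the programs are assumed to be plain strings without the colored markers.

```python
import re


def tokenize(program):
    """Split a program string into symbols, parentheses and commas."""
    tokens = []
    for part in re.split(r"([(),])", program):
        part = part.strip()
        if part:
            # Normalize "query_attr [color]" to "query_attr[color]".
            tokens.append(part.replace(" [", "["))
    return tokens


def parse(tokens, pos=0):
    """Parse tokens starting at pos; return ((symbol, children), next_pos)."""
    symbol = tokens[pos]
    pos += 1
    children = []
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1  # consume "("
        while tokens[pos] != ")":
            if tokens[pos] == ",":
                pos += 1
                continue
            child, pos = parse(tokens, pos)
            children.append(child)
        pos += 1  # consume ")"
    return (symbol, children), pos


def local_structures(program):
    """Return the set of (parent, child) symbol pairs in a program.

    Assumption: the root symbol is paired with a start symbol "<s>".
    """
    root, _ = parse(tokenize(program))
    pairs = {("<s>", root[0])}
    stack = [root]
    while stack:
        symbol, children = stack.pop()
        for child in children:
            pairs.add((symbol, child[0]))
            stack.append(child)
    return pairs


def unobserved_structures(train_programs, test_program):
    """Pairs that occur in the test program but in no training program."""
    observed = set()
    for program in train_programs:
        observed |= local_structures(program)
    return local_structures(test_program) - observed


# Toy programs written for this sketch (not taken from the dataset files).
train_programs = [
    "and (exists (find (cat)), exists (find (dog)))",
    "some (find (cat), filter (brown, find (cat)))",
    "eq (query_attr [color] (find (cat)), brown)",
]
test_program = (
    "and (eq (query_attr [color] (find (cat)), brown), "
    "some (find (cat), filter (brown, find (cat))))"
)
print(sorted(unobserved_structures(train_programs, test_program)))
# → [('and', 'eq'), ('and', 'some')]
```

In this toy case every symbol of the test program appears in training, but the pairs `and+eq` and `and+some` never do, which is the kind of structure the 🟠 and 🔵 markers highlight.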

Splits are created from the synchronous context-free grammar (SCFG) rules that generated this dataset, by holding out sets of rules that are never seen together during training.

  • For details on this splitting method, see our paper (Appendix B.2).
  • You can see the set of unseen grammar rules for each split, along with training and test examples, by clicking on Details for any desired split.
  • See the list of all grammar splits, which includes splits that were not selected for COVR-10. This list only includes grammar splits and not n-LS splits.
  • Download COVR-10

¹ Average exact-match accuracy for BART-Base, BART-Large, T5-Base and T5-Large, each fine-tuned (FT) separately on each split (see implementation details in the paper).

² Exact-match accuracy of GPT-3 (engine text-davinci-002), accessed through the OpenAI API. For each split, we evaluated on a subset of 100 test examples. We use in-context learning (ICL): for each test instance, we randomly sample 10 examples from the training set and add their sources and targets to the prompt. Click on the GPT-3 accuracy to see samples of prompts and outputs.

³ Unobserved local structures of size 2 (2-LS), considering only parent-child relations.
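The in-context learning setup and the exact-match metric described in the notes above can be sketched as follows. This is an illustration under assumptions, not the code used for the reported numbers: the `source:` / `target:` prompt layout, the function names `build_prompt` and `exact_match_accuracy`, and the whitespace stripping before comparison are hypothetical choices, and the call to the model itself is left out.

```python
import random


def build_prompt(train_examples, test_source, k=10, seed=None):
    """Build an ICL prompt from k randomly sampled (source, target) pairs.

    The "source:" / "target:" layout is an assumed format for illustration.
    """
    rng = random.Random(seed)
    demonstrations = rng.sample(train_examples, k)
    lines = []
    for source, target in demonstrations:
        lines.append(f"source: {source}")
        lines.append(f"target: {target}")
    # The test instance comes last, with its target left for the model.
    lines.append(f"source: {test_source}")
    lines.append("target:")
    return "\n".join(lines)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after stripping."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        prediction.strip() == reference.strip()
        for prediction, reference in zip(predictions, references)
    )
    return matches / len(references)


# Toy training set written for this sketch (not taken from the dataset files).
toy_train = [(f"utterance {i}", f"program {i}") for i in range(50)]
prompt = build_prompt(toy_train, "What is the number of cat ?", k=10, seed=0)
```

Sampling a fresh set of demonstrations for each test instance, as described above, corresponds to calling `build_prompt` once per instance without a fixed seed.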

Download datasets and splits used in paper

| Dataset | Split Method | # Splits | Download dataset and splits | Comments |
|---|---|---|---|---|
| COVR-10 | Grammar | 10 | covr10.zip | |
| COVR | Grammar / n-LS | 124 / 22 | covr.zip | |
| Overnight | Template | 5 (per domain) | overnight.zip | |
| Schema2QA | Template | 5 | s2q.zip | Both utterances and targets are normalized for better evaluation, and are anonymized to resolve column ambiguity |
| Atis | Template | 5 | atis.zip | Normalized variables for better evaluation |

Experiments

Code to run experiments.

Code to compute easiness.
