increase N for experiments on Wu et al 2023 baseline, 3 or 4 layer variants reported in results (report additional runs that had not completed by original draft) #3
Small change: report experiments that had not completed by the original draft deadline, increasing to n=10 Transformers trained from the Wu et al 2023 baseline, 3- or 4-layer variants.
draft PDF link: https://raw.githubusercontent.com/willy-b/RASP-for-ReCOGS/fdde271505961a7adf5d6b51b63d2d52108c98f2/rasp-for-recogs_pos-wbruns-2024-draft.pdf
Layers here are Transformer blocks (we wanted to check whether adding more blocks would get the model to learn a more tree-like representation of the grammar and avoid the mistakes predicted for the non-tree/non-recursive solution reported by Wu et al 2023 and confirmed by me).
Specifically, when I say 4 layers, I mean 4 x BertLayer in both the Encoder and the Decoder. Here is Wu et al 2023's baseline Transformer in that configuration:
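For illustration only, a minimal sketch of how an equivalent 4-layer encoder/decoder model could be instantiated with the Hugging Face transformers library; `BertConfig` and `EncoderDecoderModel` are my assumed stand-ins here, not necessarily the exact construction used in Wu et al 2023's code:

```python
# Sketch (assumption): building a BERT-style encoder-decoder with 4 Transformer
# blocks (BertLayer) in each of the encoder and the decoder, using Hugging Face
# transformers. Other hyperparameters are left at library defaults for brevity.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

encoder_config = BertConfig(num_hidden_layers=4)
decoder_config = BertConfig(num_hidden_layers=4, is_decoder=True, add_cross_attention=True)

config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)

# Printing the model shows 4 x BertLayer under the encoder and 4 x BertLayer
# under the decoder (plus cross-attention in the decoder blocks).
print(model)
```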