This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Refactor] Refactor BERT with new data preprocessing #1124

Merged
leezu merged 71 commits into dmlc:master on Jan 30, 2020

Conversation

zburning
Contributor

Description

New data preprocessing.
Refactor BERT squad script.
Add XLNet squad script.
Update & add corresponding results.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mli
Member

mli commented Jan 22, 2020

Job PR-1124/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/7/index.html

@leezu
Contributor

leezu commented Jan 22, 2020

@zburning please resolve the conflicts. As the BERT results match the reported performance and some XLNet results still show a gap, how about removing the XLNet changes from this PR to unblock it?

@mli
Member

mli commented Jan 23, 2020

Job PR-1124/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/9/index.html

@mli
Member

mli commented Jan 23, 2020

Job PR-1124/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/10/index.html

@zburning zburning changed the title from "[Refactor] Refactor BERT squad and add XLNet squad scripts with new data preprocessing" to "[Refactor] Refactor BERT with new data preprocessing" on Jan 23, 2020
parser.add_argument('--bert_dataset',
                    type=str,
                    default='book_corpus_wiki_en_uncased',
                    choices=[
Member

I think we need to add an API nlp.data.list_datasets() to list all available datasets in gluonnlp. Otherwise, every time a new model is added, we need to revise the choices list in the script.
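
To make the suggestion concrete, here is a minimal sketch of how a listing function could feed the argparse choices so the script no longer hard-codes them. Everything here is an assumption for illustration: the registry, `register_dataset`, and this `list_datasets` are stand-ins written for the example, not gluonnlp's actual API, and `example_new_dataset` is a made-up name.

```python
import argparse

# Hypothetical registry standing in for whatever gluonnlp would track internally.
_REGISTERED_DATASETS = set()


def register_dataset(name):
    """Record a dataset name (illustrative helper, not a gluonnlp API)."""
    _REGISTERED_DATASETS.add(name)


def list_datasets():
    """Return all registered dataset names in sorted order."""
    return sorted(_REGISTERED_DATASETS)


# The first name comes from the diff under review; the second is a placeholder.
register_dataset('book_corpus_wiki_en_uncased')
register_dataset('example_new_dataset')


def build_parser():
    """Build a parser whose choices come from the registry, not a literal list."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--bert_dataset',
                        type=str,
                        default='book_corpus_wiki_en_uncased',
                        choices=list_datasets())
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(['--bert_dataset', 'example_new_dataset'])
print(args.bert_dataset)
```

With this shape, registering a new dataset is enough for the script to accept it, and the choices list in the script stays untouched.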

Contributor

@leezu leezu left a comment

Please delete scripts/bert/test-682b5d15.bpe. I think it's included by mistake.

@mli
Member

mli commented Jan 24, 2020

Job PR-1124/13 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/13/index.html

@mli
Member

mli commented Jan 29, 2020

Job PR-1124/14 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/14/index.html

@mli
Member

mli commented Jan 29, 2020

Job PR-1124/15 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/15/index.html

@mli
Member

mli commented Jan 29, 2020

Job PR-1124/16 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1124/16/index.html

Contributor

@leezu leezu left a comment

Thank you!

@leezu leezu merged commit 2e6d73a into dmlc:master Jan 30, 2020

4 participants