[basicdataset] Add PennTreebank dataset #1580
Thanks for the changes. I made a small change to fix the license. If you haven't done much with licenses, the Apache 2.0 license is typically used for code, but different licenses are used for other kinds of content (books, movies, documents, papers, datasets, artwork, etc.). It is important to get the license right and follow its terms, because the license describes what you are legally allowed to do with the content.
Besides that, the metadata is uploaded, so you can now modify your tests to use the DJL central repository instead of the local repository.
Codecov Report
@@             Coverage Diff              @@
##             master    #1580      +/-   ##
============================================
- Coverage     72.08%   70.87%   -1.22%
- Complexity     5126     5427     +301
============================================
  Files           473      507      +34
  Lines         21970    23757    +1787
  Branches       2351     2587     +236
============================================
+ Hits          15838    16837     +999
- Misses         4925     5630     +705
- Partials       1207     1290      +83
Description
Add a version of the Penn Treebank dataset that is freely available on GitHub. Like the version used by Torchtext, it has no POS tags and has already been pre-processed.
Closes #1579.