Rebasing tpu branch on a more recent fairseq upstream commit #19

Merged 214 commits on Nov 19, 2019.

Commits
b002d00
v0.7.1 -> v0.7.2 (#891)
Jul 19, 2019
be5821b
Switch to torch.nn.functional.gelu when available
Jul 19, 2019
8af5554
Improve interactive generation (support --tokenizer and --bpe)
Jul 19, 2019
c811e0e
Store task in the criterion base class
Jul 19, 2019
ffe53d6
Create standalone label_smoothed_nll_loss
Jul 19, 2019
7efde22
Allow not specifying --warmup-init-lr
Jul 19, 2019
69d0f7f
Rename _load_model_ensemble -> load_model_ensemble_and_task
Jul 19, 2019
f812e52
Rename data.transforms -> data.encoders
Jul 21, 2019
1f96d28
Fix topp sampling issues (#882)
Jul 21, 2019
5f78106
Default to mmap and infer dataset implementations automatically
Jul 21, 2019
62b5498
Update GPT-2 BPE
Jul 21, 2019
9c89e88
Misc improvements to torch hub interface
Jul 22, 2019
47fd985
Move Masked LM components to legacy/ -- new ones are coming
Jul 22, 2019
bccfa7d
Add fallback for SLURM config
Jul 22, 2019
906411d
Fix --reset-meters
Jul 22, 2019
51ba352
Simplify hubconf
Jul 22, 2019
654affc
Add new Datasets
Jul 22, 2019
e8d609a
Add new Masked LM task + criterion
Jul 22, 2019
a03fe6f
Implement sparse transformer fixed attention pattern (#804)
Jul 22, 2019
30123e2
Fix read_binarized.py script
Jul 23, 2019
af6b361
Initializing mask as a tensor of ints (not long) (#875)
taylanbil Jul 23, 2019
208295d
Update README.md
Jul 23, 2019
b49ea81
check save_dir before beginning training
Jul 24, 2019
3d764a3
Update torch.hub usage
Jul 25, 2019
8835d93
Standardize on 'teacher forcing' rather than 'input feeding' which is…
Jul 25, 2019
17fcc72
Add RoBERTa README
Jul 27, 2019
40f1687
Add return_all_hiddens flag to hub interface
Jul 27, 2019
5218a7c
Fix compatibility with PyTorch 1.0.x (Fixes #906)
Jul 28, 2019
abc13e2
Make hub_utils.generator inherit from nn.Module
Jul 28, 2019
8207f26
Misc dataset improvements
Jul 28, 2019
1362b21
Correctly zero padding index in TransformerSentenceEncoder
Jul 28, 2019
c446c44
Add Adamax optimizer
Jul 28, 2019
76ff39f
Change default --num-workers to 1
Jul 28, 2019
a80cade
Update BPE library code
Jul 29, 2019
8d036c2
Add RoBERTa
Jul 29, 2019
ce7f044
Add instructions to load RoBERTa models on PyTorch 1.0
Jul 29, 2019
36df0da
Fix RoBERTa model import (fixes #918)
Jul 29, 2019
2f6d8b3
Add missing files for RoBERTa hub interface
Jul 29, 2019
2fe45f0
Update README.md to add top-p sampling (#783)
xingz9 Jul 29, 2019
33597e5
Support different --max-positions and --tokens-per-sample
Jul 29, 2019
138dc8e
adding glue data preprocessing scripts (#771)
Jul 29, 2019
c132b9b
Fix tokenization (fixes #926) (#929)
Jul 30, 2019
e75cff5
Relicense fairseq under MIT license (#786)
Jul 30, 2019
3b2cecd
1) replaced fstring 2) fixed error from max-positions arg
Jul 30, 2019
d82517e
Add roberta.decode to hub interface to decode BPE (#931)
Jul 30, 2019
b651b00
Wmt19 models (#767)
nng555 Jul 31, 2019
37eb9f2
Use commandline interface in preprocess_GLUE_tasks.sh (#937)
villmow Jul 31, 2019
c5650bf
Update language_model README.md (#941)
nadongguri Jul 31, 2019
fe8a163
Roberta add classification finetuning example readme (#790)
ngoyal2707 Jul 31, 2019
94722a9
Fix citation errors (#791)
nng555 Jul 31, 2019
3e0e5be
Fix small syntax error in hub_utils.py (fixes #942)
Aug 1, 2019
5b2be87
Update PyTorch Hub interface
Aug 1, 2019
4abadbd
Fix sampling with beam>1
Aug 1, 2019
430905d
Changed tensor comparison return type from uint8 to bool (#21113)
izdeby Aug 1, 2019
45f23f6
Add more details for bulk BPE encoding
Aug 1, 2019
ea6cc1d
Use ==/!= to compare str, bytes, and int literals (#948)
cclauss Aug 1, 2019
ccb5dea
Fix wmt19 links (#796)
nng555 Aug 1, 2019
5f34252
Update beam search code to support torch.bool change
Aug 2, 2019
abb7ed4
Update READMEs for torch.hub
Aug 2, 2019
f02f70c
Add single-models for WMT'19 for hub tutorial
Aug 2, 2019
3903f46
Fewer torch.hub requirements (#959)
Aug 2, 2019
9012e87
Avoid cast in PositionalEmbeddings to fix BLEU drop in pytorch native…
cndn Aug 2, 2019
12258e5
Fix generating with a fixed prefix
Aug 3, 2019
c728b86
remove default params from args so architecture works properly
alexeib Aug 3, 2019
1684e16
Add doc string for Roberta.encode function
Aug 4, 2019
5d543f9
fixed roberta finetuning with --find-unused-parameters on multiGPU
Aug 5, 2019
e40e4b2
Add back set_epoch functionality lost in RoBERTa merge
Aug 6, 2019
2b7843d
Add code to realign RoBERTa features to word-level tokenizers
Aug 7, 2019
1e55bbd
Fix tests and GLUE finetuning (fixes #989)
Aug 7, 2019
a9eda73
Added mask_fill api and some examples in README (#807)
Aug 7, 2019
9a1038f
fixed reloading from checkpoint (#811)
Aug 7, 2019
72f9364
Asr initial push (#810)
Aug 8, 2019
439ead5
Integrate with Apache Arrow/Plasma in-memory store for large datasets…
Aug 8, 2019
6398aa9
replace 'mkdir' with 'mkdir -p' (#997)
gmhafiz Aug 8, 2019
3563e59
added superglue dev set results to readme
Aug 9, 2019
838e108
MacOS requires c++ flag (#1000)
vincentqb Aug 9, 2019
b6c55b6
added sentence ranking task and loss (#809)
jingfeidu Aug 9, 2019
a00ce13
Fix Python 3.5 compat
Aug 10, 2019
8324919
Add WSC task and criterion
Aug 10, 2019
c0a5d29
Fix torch.hub for MNLI
Aug 10, 2019
3bbdc55
Update --restore-file logic (partially fixes #999)
Aug 12, 2019
969f447
Remove LAMB optimizer (at least until we can test it more)
Aug 12, 2019
2b68e91
Lint
Aug 12, 2019
d003664
Minor fixes for RACE finetuning (#818)
Aug 12, 2019
0563d87
ignore files starting with . e.g. .ipynb_checkpoints (#819)
uralik Aug 12, 2019
577e4fa
fix cosine scheduler docstring
Aug 13, 2019
a171c2d
added readme code for inference with GLUE finetuned model
Aug 13, 2019
a33ac06
Add Commonsense QA task
Aug 13, 2019
d015d23
Add fairseq-validate
Aug 13, 2019
baa8ce1
Updates for PyTorch 1.2 masking/bool behavior
Aug 14, 2019
7c89e13
Fix tests
Aug 14, 2019
ffffe04
v0.7.2 -> v0.8.0 (#1017)
Aug 14, 2019
b870468
Update READMEs
Aug 14, 2019
f840564
initial light and dynamic convolution kernels (#547)
nng555 Aug 14, 2019
1d44cc8
added effcient wsc task/criterion for winogrande (#825)
ngoyal2707 Aug 15, 2019
ac66df4
Update README
Aug 15, 2019
49177c9
Backward reranking public (#667)
nng555 Aug 15, 2019
a8e3211
Update README
Aug 15, 2019
ed27ed8
BMUF Resetting local state param
Aug 15, 2019
a3cfd51
added hf bert bpe
Aug 16, 2019
851c022
added check in token block dataset for multiple consecutive blank lines
Aug 16, 2019
732d15a
implement tri-stage lr_scheduler (#1028)
Aug 17, 2019
0c75c76
Fix bug (the returned value has a dimension mismatch) in label-smooth…
violet-zct Aug 19, 2019
02cb5a4
remove shlex.quote in scripts/spm_train.py (#972)
freewym Aug 19, 2019
79460d3
add constrains when checking multiple consecutive blank lines (#1031)
Trinkle23897 Aug 19, 2019
2eb53b8
Add instructions to resume training from released RoBERTa models (fix…
Aug 19, 2019
6ce55e4
Small fixes
Aug 19, 2019
c81fed4
Back out "[fairseq][PR] Fix bug (the returned value has a dimension m…
Aug 19, 2019
4812f64
Fix method has same name as property
Aug 20, 2019
9e5edc1
Give path when checkpoint can't be found (#1040)
aryamccarthy Aug 20, 2019
7a31fe0
vggblock support without pooling and pooling_kernel_size missing self…
siddalmia Aug 21, 2019
a2f5361
Multiset (#838)
alexeib Aug 21, 2019
ba5f829
Parameterized criterions (#808)
0xjc Aug 21, 2019
93057cc
fix string format to work in python 3.5 (#1050)
Trinkle23897 Aug 21, 2019
3c2cf3b
Misc changes
Aug 22, 2019
8c509a9
Add links to cuda models (#828)
nng555 Aug 22, 2019
d4c9136
Fix year in noisy channel citation (#842)
nng555 Aug 22, 2019
6e2bd79
wav2vec everstore support
Aug 23, 2019
4fc3953
Cythonize token block dataset (#834)
Aug 23, 2019
833f053
Suppress leaked semaphore warnings
Aug 23, 2019
8a8c069
fix cython dependency in the setup (#847)
Aug 26, 2019
3ab8e0f
wav2vec everstore support fix
Aug 27, 2019
396ff7f
installing numpy headers for cython
Aug 27, 2019
920b85d
Minor update of README.md of language model example (#1063)
soskek Aug 27, 2019
d2410c4
Minor cleanup for setup.py
Aug 27, 2019
108f94b
use numpy function for filter by size when possible (#845)
Aug 28, 2019
0a96d22
Fix multi-gpu training (fixes #1088)
Aug 29, 2019
8777465
Adopt Contributor Covenant
zpao Aug 30, 2019
4a7cd58
set numpy seed explicitly + other minor fixes (#850)
alexeib Aug 30, 2019
c1951aa
add missing colorize dataset
alexeib Aug 31, 2019
746e59a
Improve support for `python setup.py build_ext --inplace`
Aug 31, 2019
8d4588b
Cleaner handling of numpy-based extensions in setup.py
Aug 31, 2019
20dfba7
fixed numpy based size filtering (#854)
Sep 1, 2019
6c00b33
Fix an error in the command about Hierarchical Neural Story Generatio…
altale Sep 3, 2019
1f0f7cd
added cython to install_requires
Sep 3, 2019
1566cfb
Fix multilingual translation bug for to-many case
pipibjc Sep 4, 2019
3e3fe72
Return predicted token for RoBERTa filling mask
raedle Sep 5, 2019
1fd8943
Average local optimizer param after warmup and during bmuf sync
Sep 12, 2019
e1ba32a
added fast stats sync option (#858)
Sep 16, 2019
a3882ab
Update README.md
Sep 17, 2019
31dd13f
Fix link to RACE fine-tuning instructions.
nelson-liu Sep 17, 2019
718677e
dont project maske tokens for mlm loss (#859)
Sep 18, 2019
8dbee4a
Minor fix to make adafactor work for >2d conv kernels (#1122)
akhileshgotmare Sep 18, 2019
f994c9b
Add autogenerated cython files to gitignore (#860)
jma127 Sep 18, 2019
0eaaf35
Add cython language_level hints
Sep 19, 2019
a8a85c2
Add dataset class for weighted sampling with replacement. (#861)
jma127 Sep 19, 2019
3233540
added multilingual masked LM training (#849)
Sep 20, 2019
e869c80
Update README.race.md
Sep 20, 2019
10f9349
Remove extraneous call to RNG in multi-GPU code path
Sep 20, 2019
3b09b98
fixed train valid epoch iter
Sep 23, 2019
3f4fc50
Miscellaneous documentation improvements: (#868)
jma127 Sep 23, 2019
2ed65b6
fixed corner case in mlm criterion when all tokens get masked
Sep 23, 2019
fa7dea6
Issue 1146: Minor fix to roberta pre-training readme (#1165)
mortonjt Sep 24, 2019
e073ddf
PR for Issue #1154: Two comments in lstm.py seem to be incorrect
vineetk1 Sep 26, 2019
2314979
Update getting_started.rst (#1188)
Michaelvll Sep 27, 2019
62e65c4
Explain the language modelling format in RoBERTa pretraining readme
louismartin Sep 27, 2019
6c1da0f
Fixing BMUF warmup and sync strategy
Sep 27, 2019
86857a5
Levenshtein Transformer paper code
kahne Sep 27, 2019
1cb267e
Fixing example of batched predictions for Roberta (#1195)
justachetan Sep 27, 2019
ea1a410
RoBERTa now supported on TPU and TensorFlow via transformers library
Sep 28, 2019
4ac2c5f
Implementation of the WeCNLP abstract "Cross+Self-Attention for Trans…
stephanpeitz Sep 29, 2019
1351972
fix typo in README of examples/translation
Sep 29, 2019
acb6fba
Fix torch.hub to not depend on libnat
Sep 30, 2019
1c66792
Implementation of the paper "Jointly Learning to Align and Translate …
sarthakgarg Sep 30, 2019
58e43cb
extract FP16OptimizerMixin for share the same logic in PyText (#1180)
chenyangyu1988 Oct 1, 2019
de348d1
Native Torchscript Wordpiece Tokenizer Op for BERTSquadQA, Torchscrip…
Oct 4, 2019
315c463
Add periodic CUDA cache cleanup (#882)
jma127 Oct 4, 2019
4cb895b
add pre-trained wav2vec model
alexeib Oct 5, 2019
6f58e15
Setting Global sync to 50 in BMUF
Oct 7, 2019
c216522
fix max lengths in Levenshtein Tramsformer
kahne Oct 8, 2019
34e79c5
ensemble levts
Oct 8, 2019
63b6b3f
Add printing of PyTorch memory summary on OOM (#885)
jma127 Oct 8, 2019
b6e001f
Fix data loading memory issue in pyspeech
Oct 9, 2019
33646ac
wav2letter integration
0xjc Oct 10, 2019
c4893ca
Add ctc loss to ASR task (#1233)
Oct 10, 2019
cce92bd
add new_arange function + FIX BUGS of returning attn values
MultiPath Oct 11, 2019
02b74c5
fix the random mask function for CMLM model
MultiPath Oct 11, 2019
d80ad54
Added option to save checkpoints using Path Manager.
sujitoc Oct 12, 2019
e3a40d9
fix libnat imports
kahne Oct 15, 2019
b5f41f8
Add Unit test cases for BMUF
Oct 15, 2019
3dcb5c7
fix levenshtein transfromer attn
kahne Oct 18, 2019
c8a7b62
fixed a bug in preprocess glue dataset dev filename (#1270)
DikshaMeghwal Oct 18, 2019
b8d024e
add missing function to FairseqLanguageModel
Oct 18, 2019
a3c629b
Fix typos on Examples for Nonautoregressive translation
MultiPath Oct 20, 2019
66d24dc
Enable separate models for insertion and deletion;
MultiPath Oct 20, 2019
34e6a5e
Fix load_dataset signature (#1281)
louismartin Oct 22, 2019
2d51e04
Rename "loaded {} batches" to "loaded {} blocks" (#1279)
louismartin Oct 22, 2019
e49b302
fix score
kahne Oct 22, 2019
8defa9d
Add warmup support in reduce_on_plateau lr schedule
Oct 23, 2019
5a2f76e
NAT productionization
cndn Oct 24, 2019
39faa0a
Reset both WPS and UPS on first minibatch (#891)
jma127 Oct 24, 2019
d0358bb
fix inconsistency w/ recent pytorch cuda device logic
jma127 Oct 24, 2019
5b086a0
OSS tracing compliant transformer to unbreak master (#1299)
cndn Oct 24, 2019
fdf4c3e
Simplify fairseq multihead attention (#888)
halilakin Oct 25, 2019
c07362c
Convert matmuls to quantizable nn.Linear modules (#1304)
halilakin Oct 25, 2019
eb68afc
fix a type mismatch in NAT quantization run
xianxl Oct 26, 2019
dabbef4
adding layerdrop code for training, pruning, and readme (#890)
huihuifan Oct 27, 2019
50cf3bb
Fix LevT generator interface
cndn Oct 28, 2019
856d8b8
layer drop
xianxl Oct 30, 2019
f30fc7d
Fix MultiheadAttention and torch hub
Oct 31, 2019
99c524c
Fix fairspeq unit test
Oct 31, 2019
4c6b689
Remove in_proj_weight/in_proj_bias in multihead attention and fix the…
halilakin Nov 1, 2019
828c1ca
Fix BPE for dual learning
chtran Nov 1, 2019
a0f7599
Fix building of docs
Nov 2, 2019
fd7dcac
option to suppress loss report
taylanbil Nov 8, 2019
7a23b93
Making tpu training work
taylanbil Jun 13, 2019
f17ad03
send meters to device
taylanbil Nov 14, 2019
734b14f
Revert inplace masked_fill_s so convergence occurs
taylanbil Nov 16, 2019
d370e6b
Merge branch 'tpu-rebase-master' of github.com:taylanbil/fairseq into…
taylanbil Nov 16, 2019
043b6a9
git wtf
taylanbil Nov 16, 2019
12aaf54
Clean up comments, unused imports, and reuse var in checkpoint saving
taylanbil Nov 18, 2019
8de1826
Added comments to various places of tpu related code change, and fixe…
taylanbil Nov 18, 2019
5120a2b
Added comments to various places of tpu related code change, and fixe…
taylanbil Nov 18, 2019
bbfeec9
More documentation for sequence padding
taylanbil Nov 18, 2019
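Two TPU-related patterns recur in the commits above: masks are built as `bool` tensors (per the PyTorch 1.2 uint8-to-bool migration that several upstream commits adapt to), and in-place `masked_fill_` calls are reverted to their out-of-place form, which commit 734b14f links to convergence. The sketch below is illustrative only, not code from this PR; the helper name `apply_padding_mask` is invented for the example.

```python
import torch

def apply_padding_mask(scores: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """Mask out padded positions in a batch of attention scores.

    `pad_mask` is a bool tensor (True at padding positions), matching the
    uint8 -> bool migration in PyTorch 1.2 noted in the commit list.
    """
    # Out-of-place masked_fill returns a new tensor instead of mutating
    # `scores` in place, the style the "Revert inplace masked_fill_s"
    # commit restores for TPU training.
    return scores.masked_fill(pad_mask, float("-inf"))

scores = torch.randn(2, 4)
pad_mask = torch.tensor([[False, False, True, True],
                         [False, True, True, True]])
print(apply_padding_mask(scores, pad_mask).softmax(dim=-1))
```

Per the commit message above, reverting the in-place variants was needed for convergence on TPU, plausibly because out-of-place ops keep the traced XLA computation functional instead of mutating buffers.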
Files changed

11 changes: 10 additions & 1 deletion .gitignore
@@ -110,7 +110,16 @@ ENV/
 .mypy_cache/
 
 # Generated files
-fairseq/temporal_convolution_tbc
+/fairseq/temporal_convolution_tbc
+/fairseq/modules/*_layer/*_forward.cu
+/fairseq/modules/*_layer/*_backward.cu
 
 # data
 data-bin/
+
+# reranking
+/examples/reranking/rerank_data
+
+# Cython-generated C++ source files
+/fairseq/data/data_utils_fast.cpp
+/fairseq/data/token_block_utils_fast.cpp
77 changes: 76 additions & 1 deletion CODE_OF_CONDUCT.md
@@ -1,2 +1,77 @@
 # Code of Conduct
-Facebook has adopted a Code of Conduct that we expect project participants to adhere to. Please [read the full text](https://code.fb.com/codeofconduct) so that you can understand what actions will and will not be tolerated.
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to make participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within all project spaces, and it also applies when
+an individual is representing the project or its community in public spaces.
+Examples of representing a project or community include using an official
+project e-mail address, posting via an official social media account, or acting
+as an appointed representative at an online or offline event. Representation of
+a project may be further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at <opensource-conduct@fb.com>. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq

10 changes: 4 additions & 6 deletions CONTRIBUTING.md
@@ -1,4 +1,4 @@
-# Contributing to FAIR Sequence-to-Sequence Toolkit (PyTorch)
+# Contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq)
 We want to make contributing to this project as easy and transparent as
 possible.
 
@@ -22,9 +22,7 @@ Complete your CLA here: <https://code.facebook.com/cla>
 We use GitHub issues to track public bugs. Please ensure your description is
 clear and has sufficient instructions to be able to reproduce the issue.
 
-## Coding Style
-We try to follow the PEP style guidelines and encourage you to as well.
-
 ## License
-By contributing to FAIR Sequence-to-Sequence Toolkit, you agree that your contributions will be licensed
-under the LICENSE file in the root directory of this source tree.
+By contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq),
+you agree that your contributions will be licensed under the LICENSE file in
+the root directory of this source tree.
43 changes: 17 additions & 26 deletions LICENSE
@@ -1,30 +1,21 @@
-BSD License
+MIT License
 
-For fairseq software
+Copyright (c) Facebook, Inc. and its affiliates.
 
-Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
-* Redistributions of source code must retain the above copyright notice, this
-  list of conditions and the following disclaimer.
-
-* Redistributions in binary form must reproduce the above copyright notice,
-  this list of conditions and the following disclaimer in the documentation
-  and/or other materials provided with the distribution.
-
-* Neither the name Facebook nor the names of its contributors may be used to
-  endorse or promote products derived from this software without specific
-  prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
-ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
-ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
33 changes: 0 additions & 33 deletions PATENTS

This file was deleted.

112 changes: 67 additions & 45 deletions README.md
@@ -1,33 +1,53 @@
-# Introduction <img src="fairseq_logo.png" width="50">
+# <img src="fairseq_logo.png" width="30"> Introduction
 
 Fairseq(-py) is a sequence modeling toolkit that allows researchers and
 developers to train custom models for translation, summarization, language
-modeling and other text generation tasks. It provides reference implementations
-of various sequence-to-sequence models, including:
+modeling and other text generation tasks.
+
+### What's New:
+
+- September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
+- August 2019: [WMT'19 models released](examples/wmt19/README.md)
+- July 2019: fairseq relicensed under MIT license
+- July 2019: [RoBERTa models and code released](examples/roberta/README.md)
+- June 2019: [wav2vec models and code released](examples/wav2vec/README.md)
+
+### Features:
+
+Fairseq provides reference implementations of various sequence-to-sequence models, including:
 - **Convolutional Neural Networks (CNN)**
-  - [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
-  - [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
-  - [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
-  - [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
-  - **_New_** [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
+  - [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
+  - [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
+  - [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
+  - [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
+  - [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
 - **LightConv and DynamicConv models**
-  - **_New_** [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
+  - [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
 - **Long Short-Term Memory (LSTM) networks**
-  - [Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)
-  - [Wiseman and Rush (2016): Sequence-to-Sequence Learning as Beam-Search Optimization](https://arxiv.org/abs/1606.02960)
+  - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
 - **Transformer (self-attention) networks**
-  - [Vaswani et al. (2017): Attention Is All You Need](https://arxiv.org/abs/1706.03762)
-  - [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
-  - [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
-  - **_New_** [Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](examples/language_model/transformer_lm/README.md)
-  - **_New_** [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
-
-Fairseq features:
+  - Attention Is All You Need (Vaswani et al., 2017)
+  - [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
+  - [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
+  - [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/transformer_lm/README.md)
+  - [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
+  - [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
+  - [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
+  - [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md )
+- **Non-autoregressive Transformers**
+  - Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
+  - Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)
+  - Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)
+  - Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
+  - [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
+
+
+**Additionally:**
 - multi-GPU (distributed) training on one machine or across multiple machines
 - fast generation on both CPU and GPU with multiple search algorithms implemented:
   - beam search
   - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
-  - sampling (unconstrained and top-k)
+  - sampling (unconstrained, top-k and top-p/nucleus)
 - large mini-batch training even on a single GPU via delayed updates
 - mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
 - extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
@@ -39,35 +39,33 @@ translation and language modeling datasets.
 
 # Requirements and Installation
 
-* [PyTorch](http://pytorch.org/) version >= 1.0.0
+* [PyTorch](http://pytorch.org/) version >= 1.2.0
 * Python version >= 3.5
 * For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
+* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option
 
+To install fairseq:
+```bash
+pip install fairseq
+```
 
-Please follow the instructions here to install PyTorch: https://github.com/pytorch/pytorch#installation.
+On MacOS:
+```bash
+CFLAGS="-stdlib=libc++" pip install fairseq
+```
 
 If you use Docker make sure to increase the shared memory size either with
 `--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
 
-After PyTorch is installed, you can install fairseq with `pip`:
-```
-pip install fairseq
-```
-
 **Installing from source**
 
 To install fairseq from source and develop locally:
-```
+```bash
 git clone https://github.com/pytorch/fairseq
 cd fairseq
 pip install --editable .
 ```
-
-**Improved training speed**
-
-Training speed can be further improved by installing NVIDIA's
-[apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option.
-fairseq will automatically switch to the faster modules provided by apex.
 
 # Getting Started
 
 The [full documentation](https://fairseq.readthedocs.io/) contains instructions
@@ -80,28 +80,32 @@ We provide pre-trained models and pre-processed, binarized test sets for several
 as well as example training and evaluation commands.
 
 - [Translation](examples/translation/README.md): convolutional and transformer models are available
-- [Language Modeling](examples/language_model/README.md): convolutional models are available
+- [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available
 - [wav2vec](examples/wav2vec/README.md): wav2vec large model is available
 
 We also have more detailed READMEs to reproduce results from specific papers:
-- [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
-- [Shen et al. (2019) Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
-- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
-- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
-- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
-- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
-- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
-- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
-- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
+- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md )
+- [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
+- [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
+- [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
+- [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
+- [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
+- [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
+- [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
+- [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
+- [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
+- [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
+- [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
+- [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
 
 # Join the fairseq community
 
 * Facebook page: https://www.facebook.com/groups/fairseq.users
 * Google group: https://groups.google.com/forum/#!forum/fairseq-users
 
 # License
-fairseq(-py) is BSD-licensed.
+fairseq(-py) is MIT-licensed.
 The license applies to the pre-trained models as well.
-We also provide an additional patent grant.
 
 # Citation