Merge r1.7.1 to main (#3824)
* Tn bug 1.7.0 (#3730)

* fix es and fr bug

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add file

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [TTS] Fix bugs in E2E TTS, Mixer-TTS and FastPitch (#3740)

* fix bugs

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix bug in e2e tts and mixer tts

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* Mirror AN4 data while servers are down (#3743)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Bugfix for GPT eval  (#3744)

* use tokens_cut not tokens

Signed-off-by: ericharper <complex451@gmail.com>

* remove precision conversion and comment jit for bias gelu

Signed-off-by: ericharper <complex451@gmail.com>

* revert comment update mbs in config

Signed-off-by: ericharper <complex451@gmail.com>

* calculate micro_batch_size during complete and compute_logprobs

Signed-off-by: ericharper <complex451@gmail.com>

* ASR SSL update (#3746)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* Fix SSL configs for 1.7 (#3748)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* punct process bug fix (#3747)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* updated conformer models. (#3741)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* Yuya/megatron t5 glue eval (#3751)

* Add megatron t5 glue eval-only script

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update megatron t5 glue eval default configs

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update megatron t5 glue eval configs

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update config comments

Signed-off-by: Yu Yao <yuya@nvidia.com>

Co-authored-by: Yu Yao <yuya@nvidia.com>

* Specify gpus in SSL notebook (#3753)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* specify gpus

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* Duplex model inference fix, money encoder fix (#3754)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Update docs for RNNT and overriding fused batch size (#3755)

Signed-off-by: smajumdar <titu1994@gmail.com>

* fix consumed samples calculation + PTune Model bugs (#3738)

* fix the way computing consumed samples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fixed ptune model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure notebook is working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added try-catch

Signed-off-by: Yi Dong <yidong@nvidia.com>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* fix directories in ssl notebook (#3758)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* specify gpus

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* update dirs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* TN docs update (#3735)

* TN docs update: audio based docs added, quick start, ref fixed, etc

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deployment script dir and Sp TN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>

* Update Tacotron2_Training.ipynb (#3769)

Signed-off-by: Jason <jasoli@nvidia.com>

* fix dockerfile (#3778)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Prompt-Tuning-Documentation (#3777)

* Update megatron.rst

* Updated example prompt tuning script's doc string

* Update megatron.rst

* Update megatron.rst

Co-authored-by: Eric Harper <complex451@gmail.com>

* Prompt tuning bug fix (#3780)

* Making updated code backwards compatible with previous prompt tuned models

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Fixed backward compatibility bug

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Removed random import

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* revert changes (#3785)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Fixed soft prompt eval loading bug (#3805)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* mT5 whole word masking and T5 finetuning config fixes (#3776)

* O2 and whole word masking changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update yaml

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Tok and O2 fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix arg passing

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix checkpoint path

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Raise error if FP16 training is tried with O2 recipe. (#3806)

* raise error

Signed-off-by: ericharper <complex451@gmail.com>

* update assert

Signed-off-by: ericharper <complex451@gmail.com>

* update error message

Signed-off-by: ericharper <complex451@gmail.com>

* update error message

Signed-off-by: ericharper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* remove test

Signed-off-by: ericharper <complex451@gmail.com>

* revert bad merges

Signed-off-by: ericharper <complex451@gmail.com>

* revert change partitions

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
14 people authored and fayejf committed Mar 22, 2022
1 parent 50952f5 commit 4211f17
Showing 12 changed files with 88 additions and 73 deletions.
55 changes: 28 additions & 27 deletions Jenkinsfile
@@ -2181,33 +2181,34 @@ pipeline {
}


stage('L2: Megatron GPT Convert from Megatron-LM checkpoing and Eval') {
when {
anyOf {
branch 'main'
changeRequest target: 'main'
}
}
failFast true
steps {
sh "python -m torch.distributed.launch --nproc_per_node=2 \
examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py \
--checkpoint_folder=/home/TestData/nlp/megatron_gpt/data/gpt/iter_0008700 \
--checkpoint_name=model_optim_rng.pt \
--hparams_file=/home/TestData/nlp/megatron_gpt/data/gpt/iter_0008700/hparams.yaml \
--nemo_file_path=examples/nlp/language_modeling/small_gpt.nemo \
--model_type=gpt \
--pipeline_model_parallel_size=1 \
--gpus_per_node=2 \
--tensor_model_parallel_size=2"
sh "python examples/nlp/language_modeling/megatron_gpt_eval.py \
--model_file=examples/nlp/language_modeling/small_gpt.nemo \
--tokens_to_generate=32 \
--tensor_model_parallel_size=2 \
--prompt='This is a test.'"
sh "rm examples/nlp/language_modeling/small_gpt.nemo"
}
}
// TODO: Add this test back. Test was failing on CI machines due to HW error
// stage('L2: Megatron GPT Convert from Megatron-LM checkpoing and Eval') {
// when {
// anyOf {
// branch 'main'
// changeRequest target: 'main'
// }
// }
// failFast true
// steps {
// sh "python -m torch.distributed.launch --nproc_per_node=2 \
// examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py \
// --checkpoint_folder=/home/TestData/nlp/megatron_gpt/data/gpt/iter_0008700 \
// --checkpoint_name=model_optim_rng.pt \
// --hparams_file=/home/TestData/nlp/megatron_gpt/data/gpt/iter_0008700/hparams.yaml \
// --nemo_file_path=examples/nlp/language_modeling/small_gpt.nemo \
// --model_type=gpt \
// --pipeline_model_parallel_size=1 \
// --gpus_per_node=2 \
// --tensor_model_parallel_size=2"
// sh "python examples/nlp/language_modeling/megatron_gpt_eval.py \
// --model_file=examples/nlp/language_modeling/small_gpt.nemo \
// --tokens_to_generate=32 \
// --tensor_model_parallel_size=2 \
// --prompt='This is a test.'"
// sh "rm examples/nlp/language_modeling/small_gpt.nemo"
// }
// }
stage('L2: Megatron Change Partitions') {
when {
anyOf {
16 changes: 11 additions & 5 deletions docs/source/nlp/megatron.rst
@@ -173,18 +173,20 @@ Prompt tuning is a continuous or soft prompt approach to finding the optimal pro
Implementation Overview
^^^^^^^^^^

Our current prompt tuning implementation adapts Lester et al.’s EMNLP 2021 "`The Power of Scale for Parameter-Efficient Prompt Tuning <https://arxiv.org/abs/2104.08691>`_" to prompt tuning for GPT-style models. In this implementation, a number of soft tokens specified by the user are prepended to the beginning of the discrete token input embeddings during the forward pass. During training, all model parameters are frozen except for those corresponding to the soft tokens. Only the soft prompt parameters are updated via gradient descent in the backward pass. Each soft token has the same dimensionality as a regular token embedding from the model’s vocabulary, corresponding to the ``hidden_size`` hyperparameter. Soft token embeddings can be initialized randomly or with selected existing embeddings from the pretrained model.

As of NeMo 1.7, prompt tuning works with tensor model parallel size > 1.

Data Formatting
^^^^^^^^^^

The dataset should be a .json file where each json object has 2 fields: ``prompt_tag`` and ``text``.
The dataset should be a .jsonl file where each json object has 3 fields: ``prompt_tag``, ``text``, and ``answer``.

.. code::
{"prompt_tag": [tag1], "text": [text1]}
{"prompt_tag": [tag1], "text": [text2]}
{"prompt_tag": [tag1], "text": [text3]}
{"prompt_tag": [tag1], "text": [text1], "answer": [answer1]}
{"prompt_tag": [tag1], "text": [text2], "answer": [answer2]}
{"prompt_tag": [tag1], "text": [text3], "answer": [answer3]}
.. _data-example-label:
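
As a rough illustration of the ``prompt_tag``/``text``/``answer`` layout documented above, the short Python sketch below writes such a .jsonl file; the file name, tag, and records are hypothetical examples, not part of this commit.

# Minimal sketch of producing a prompt-tuning dataset in the .jsonl layout
# described above. File name, tag, and records are hypothetical examples.
import json

records = [
    {"prompt_tag": "winogrande", "text": "The trophy does not fit in the suitcase because it is too", "answer": "large"},
    {"prompt_tag": "winogrande", "text": "The key does not fit in the lock because it is too", "answer": "small"},
]

with open("winogrande_prompt_tuning_train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON object per line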

Expand Down Expand Up @@ -218,6 +220,9 @@ Prompt Tuning Specific Config Values
* - **model.new_prompt_init_text**
- list of strings
- The text you want to use for soft prompt initialization if ``model.new_prompt_init_methods`` is set to ['text']. The text is tokenized and clipped or tiled to match ``model.num_prompt_tokens``. The vocab embeddings associated with each token are copied and used to initialize the soft prompts.
* - **model.calc_loss_on_answer_only**
- bool
- Whether to calculate cross entropy loss on the full text input or only the answer portion of the input during prompt tuning.
* - **model.data.train_ds**
- string
- path to training dataset .json or .jsonl file. See `Data Formatting`_ for an example
@@ -228,6 +233,7 @@ Prompt Tuning Specific Config Values

Example Prompt Tuning Command for the First Task
^^^^^^^^^^

.. code::
EXPR_NAME='winogrande_prompt_tuning'
Next changed file:
@@ -10,9 +10,9 @@ trainer:
enable_checkpointing: False
replace_sampler_ddp: False
max_epochs: null
max_steps: 1000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches
max_steps: 3000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches
log_every_n_steps: 10
val_check_interval: 50
val_check_interval: 250
limit_val_batches: 50
limit_test_batches: 500
accumulate_grad_batches: 1 # do not modify, grad acc is automatic for training megatron models
@@ -43,7 +43,7 @@ model:
# specify micro_batch_size, global_batch_size, and model parallelism
# gradient accumulation will be done automatically based on data_parallel_size
micro_batch_size: 4 # limited by GPU memory
global_batch_size: 16 # will use more micro batches to reach global batch size
global_batch_size: 8 # will use more micro batches to reach global batch size
tensor_model_parallel_size: 1 # intra-layer model parallelism
pipeline_model_parallel_size: 1 # inter-layer model parallelism
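
The comment on ``max_steps`` above defines how consumed samples are tracked. As a quick sanity check of that formula, here is a minimal sketch; the ``data_parallel_size`` value is an assumed example, not something this config sets.

# consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches,
# per the config comment above. data_parallel_size below is an assumed example value.

def consumed_samples(global_step, micro_batch_size, data_parallel_size, accumulate_grad_batches):
    return global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches

# With micro_batch_size=4 and accumulate_grad_batches=1 from this config and a
# hypothetical data_parallel_size of 2, reaching max_steps=3000 consumes:
print(consumed_samples(3000, 4, 2, 1))  # 24000 samples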

@@ -117,7 +117,7 @@ model:

optim:
name: fused_adam
lr: 2e-4
lr: 1e-5
weight_decay: 0.01
betas:
- 0.9
@@ -126,4 +126,4 @@ model:
name: CosineAnnealing
warmup_steps: 50
constant_steps: 10
min_lr: 2e-5
min_lr: 1e-6
Next changed file:
@@ -27,11 +27,11 @@ exp_manager:
resume_ignore_no_checkpoint: True
create_checkpoint_callback: True
checkpoint_callback_params:
monitor: val_acc
monitor: validation_acc
save_top_k: 10
mode: max
always_save_nemo: False # TODO: add support
filename: 'megatron_t5--{val_acc:.3f}-{step}'
filename: 'megatron_t5--{validation_acc:.3f}-{step}'
model_parallel_size: ${model.tensor_model_parallel_size}
save_best_model: True

Next changed file:
@@ -27,11 +27,11 @@ exp_manager:
resume_ignore_no_checkpoint: True
create_checkpoint_callback: True
checkpoint_callback_params:
monitor: val_acc
monitor: validation_acc
save_top_k: 10
mode: max
always_save_nemo: False # TODO: add support
filename: 'megatron_t5--{val_acc:.3f}-{step}'
filename: 'megatron_t5--{validation_acc:.3f}-{step}'
model_parallel_size: ${model.tensor_model_parallel_size}
save_best_model: True
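
Both GLUE eval configs above switch the monitored metric from ``val_acc`` to ``validation_acc``. As a rough sketch (not NeMo's exact internals), these ``checkpoint_callback_params`` correspond to a PyTorch Lightning ``ModelCheckpoint`` along these lines:

from pytorch_lightning.callbacks import ModelCheckpoint

# Illustrative stand-alone equivalent of the checkpoint settings above;
# NeMo's exp_manager builds a comparable callback internally.
checkpoint_callback = ModelCheckpoint(
    monitor="validation_acc",   # must match a metric name the model actually logs
    save_top_k=10,
    mode="max",                 # higher accuracy is better
    filename="megatron_t5--{validation_acc:.3f}-{step}",
)
# If the callback monitors a metric the model never logs (e.g. the old val_acc name),
# top-k selection cannot work, which is presumably what the rename addresses.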

Next changed file:
@@ -162,18 +162,13 @@ def create_tokens_and_tokentypes(tokens_a, tokens_b, cls_id, sep_id):
MaskedLmInstance = collections.namedtuple("MaskedLmInstance", ["index", "label"])


def is_start_piece(piece, tokenizer_type='wordpiece'):
def is_start_piece(piece):
"""Check if the current word piece is the starting piece. (BERT)"""
# When a word has been split into
# WordPieces, the first token does not have any marker and any subsequence
# tokens are prefixed with ##. So whenever we see the ## token, we
# append it to the previous set of word indexes.
if tokenizer_type == 'wordpiece':
return not piece.startswith("##")
elif tokenizer_type == 'sentencepiece':
return piece.startswith('▁')
else:
raise ValueError(f"Tokenizer type {tokenizer_type} is not supported.")
return not piece.startswith("##")


def create_masked_lm_predictions(
@@ -217,15 +212,11 @@ def create_masked_lm_predictions(
# Note that Whole Word Masking does *not* change the training code
# at all -- we still predict each WordPiece independently, softmaxed
# over the entire vocabulary.
if (
whole_word_masking
and len(cand_indexes) >= 1
and not is_start_piece(vocab_id_to_token_dict[token], tokenizer_type=tokenizer_type)
):
if whole_word_masking and len(cand_indexes) >= 1 and not is_start_piece(vocab_id_to_token_dict[token]):
cand_indexes[-1].append(i)
else:
cand_indexes.append([i])
if is_start_piece(vocab_id_to_token_dict[token], tokenizer_type=tokenizer_type):
if is_start_piece(vocab_id_to_token_dict[token]):
token_boundary[i] = 1

output_tokens = list(tokens)
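
The hunks above drop the ``tokenizer_type`` argument so that ``is_start_piece`` only handles WordPiece markers. A small self-contained sketch (not the NeMo code itself) of how the ``##`` convention drives whole word masking:

# Standalone sketch of WordPiece whole-word grouping: a token starting with "##"
# continues the previous word, so it joins the previous candidate group.

def is_start_piece(piece):
    return not piece.startswith("##")

def group_whole_words(tokens):
    cand_indexes = []
    for i, token in enumerate(tokens):
        if cand_indexes and not is_start_piece(token):
            cand_indexes[-1].append(i)   # continuation piece extends the current word
        else:
            cand_indexes.append([i])     # start of a new word
    return cand_indexes

print(group_whole_words(["un", "##afford", "##able", "prices"]))
# [[0, 1, 2], [3]] -> masking a group masks the whole word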
Next changed file:
@@ -91,6 +91,10 @@ def __init__(
if not self.tokenizer.legacy:
raise ValueError("Sentencepiece Tokenizer must have legacy = False to add special tokens.")
self.tokenizer_type = 'sentencepiece'
if whole_word_masking:
raise ValueError(
"Whole word masking is not supported with sentencepiece tokenizers and only with wordpiece tokenizers. Please set it to False."
)

self.cls_id = tokenizer.cls_id
self.sep_id = tokenizer.sep_id
Next changed file:
@@ -139,10 +139,6 @@ def __init__(self, cfg: DictConfig, trainer: Trainer):
tensor_model_parallel_size=cfg.get('tensor_model_parallel_size', 1),
)

# TODO: Not sure how to use lists of modules with PTL.
# This means we can only use pipeline parallelism without the interleaved schedule.
self.model = build_model(model_provider_func=self.model_provider_func, wrap_with_ddp=False)[0]

# Prompt tuning initialization
self.use_soft_prompts = self.cfg.get('use_soft_prompts', False)

@@ -156,12 +152,27 @@ def __init__(self, cfg: DictConfig, trainer: Trainer):
self.num_prompt_tokens = cfg.get('num_prompt_tokens', 100)

if self.cfg.get('existing_prompt_tags', None):
# Assign prompt tag ids if none were present in the config
if type(self.cfg.existing_prompt_tags[0]) == str:
existing_prompt_tags = self.cfg.existing_prompt_tags
num_prompt_tags = len(existing_prompt_tags)
existing_prompt_tags = [
(existing_prompt_tags[tag_id], tag_id + 1) for tag_id in range(num_prompt_tags)
]

with open_dict(self.cfg):
self.cfg.existing_prompt_tags = existing_prompt_tags

# Fill table with prev tuned prompt tags and their ids
self.prompt_table = set(self.cfg.existing_prompt_tags)

# Get max prompt id from table for starting point of new prompt ids
self.next_prompt_id = max(self.prompt_table, key=lambda x: x[1])[1]

# TODO: Not sure how to use lists of modules with PTL.
# This means we can only use pipeline parallelism without the interleaved schedule.
self.model = build_model(model_provider_func=self.model_provider_func, wrap_with_ddp=False)[0]

self.setup_optimizer_param_groups()

self.megatron_amp_o2 = cfg.get('megatron_amp_O2', False)
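
A condensed sketch of the backward-compatibility conversion above: legacy configs stored ``existing_prompt_tags`` as plain strings, which are promoted to (tag, id) pairs so the prompt table and the next prompt id can be derived. The tag names below are hypothetical.

# Hypothetical legacy config value: plain strings instead of (tag, id) pairs.
existing_prompt_tags = ["task-a", "task-b"]

if isinstance(existing_prompt_tags[0], str):
    existing_prompt_tags = [(tag, tag_id + 1) for tag_id, tag in enumerate(existing_prompt_tags)]

prompt_table = set(existing_prompt_tags)                   # {("task-a", 1), ("task-b", 2)}
next_prompt_id = max(prompt_table, key=lambda x: x[1])[1]  # 2 -- new prompt ids continue from here
print(prompt_table, next_prompt_id)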
@@ -662,13 +673,13 @@ def setup(self, stage=None):
init_consumed_samples = 0
self.init_consumed_samples = init_consumed_samples

# Initalize soft prompts before loading datasets and training
if self.use_soft_prompts:
self.init_new_prompts()

if stage == 'predict':
return
else:
# Initalize soft prompts before loading datasets and training
if self.use_soft_prompts:
self.init_new_prompts()

# TODO: consider adding a ModelPT guard to check if model is being restored.
# allowing restored models to optionally setup datasets
self.build_train_valid_test_datasets()
@@ -737,6 +748,9 @@ def configure_optimizers(self):
fp32_grad_accum = False
# TODO: contiguous grad bucket for fp16 is also planned to be supported
contiguous_grad_bucket = False
raise ValueError(
"fp16 training is not yet supported with O2. Please set megatron_amp_O2 to False in the model config."
)

# TODO: this should be true when not using pipeline parallelism
# we will support that for bf16 when we have async handler from apex
Next changed file:
@@ -325,7 +325,7 @@ def build_pretraining_data_loader(self, dataset, consumed_samples):
)

def setup(self, stage=None):
resume_checkpoint_path = self.trainer.checkpoint_connector.resume_checkpoint_path
resume_checkpoint_path = self.trainer.checkpoint_connector.resume_from_checkpoint_fit_path
if resume_checkpoint_path:
try:
init_consumed_samples = int(
Next changed file:
@@ -125,12 +125,12 @@ def build_train_valid_test_datasets(self):
seed=self._cfg.seed,
skip_warmup=self._cfg.data.skip_warmup,
dataset_type=self._cfg.data.get('dataset_type', 't5'),
max_ngram_size=self._cfg.get('max_ngram_size', 10),
mean_ngram_size=self._cfg.get('mean_ngram_size', None),
geometric_dist=self._cfg.get('geometric_dist', True),
permutation=self._cfg.get('permutation', False),
whole_word_masking=self._cfg.get('whole_word_masking', True),
favor_long_ngrams=self._cfg.get('favor_long_ngrams', False),
max_ngram_size=self._cfg.data.get('max_ngram_size', 10),
mean_ngram_size=self._cfg.data.get('mean_ngram_size', None),
geometric_dist=self._cfg.data.get('geometric_dist', True),
permutation=self._cfg.data.get('permutation', False),
whole_word_masking=self._cfg.data.get('whole_word_masking', True),
favor_long_ngrams=self._cfg.data.get('favor_long_ngrams', False),
)
logging.info(f'Length of train dataset: {len(self._train_ds)}')
logging.info(f'Length of val dataset: {len(self._validation_ds)}')
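
The hunk above is the config-lookup fix: the masking options live under ``model.data``, so reading them via ``self._cfg.get`` silently returned the defaults instead of the user's values. A minimal OmegaConf sketch of the difference (config contents here are illustrative):

from omegaconf import OmegaConf

cfg = OmegaConf.create({"data": {"whole_word_masking": False, "max_ngram_size": 3}})

print(cfg.get("whole_word_masking", True))       # True  -- wrong level, falls back to the default
print(cfg.data.get("whole_word_masking", True))  # False -- reads the value actually set under data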
2 changes: 1 addition & 1 deletion nemo/core/optim/optimizer_with_main_params.py
@@ -126,7 +126,7 @@ def __init__(
'which is supposed to be accumulated after grad op.'
)
assert contiguous_grad_bucket, (
'currently async_grad_allreduce is supported only ' 'with async_grad_allreduce.'
'currently async_grad_allreduce is supported only ' 'with contiguous_grad_bucket.'
)

self._fp32_grad_accum = fp32_grad_accum
5 changes: 2 additions & 3 deletions tools/text_processing_deployment/Dockerfile
@@ -31,9 +31,8 @@ RUN apt-get install build-essential -y && apt-get install wget -y
RUN wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
RUN tar xzvf protobuf-2.5.0.tar.gz
RUN cd protobuf-2.5.0 && ./configure && make && make install && ldconfig
COPY ../../nemo_text_processing/ /tmp/nemo/nemo_text_processing/
RUN bash /tmp/nemo/nemo_text_processing/setup.sh
RUN conda install -c conda-forge thrax=1.3.4 -y
RUN git clone https://github.com/yzhang123/sparrowhawk.git
RUN cd sparrowhawk && git checkout test && apt-get install -y autoconf && bash autoreconf && ./configure && make && make install && ldconfig
RUN git clone https://github.com/kward/shunit2.git
RUN echo "DONE"
RUN echo "DONE"
