This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add AMP + Update Benchmarking Script #1405

Merged · 64 commits merged into dmlc:master on Nov 6, 2020

Conversation

sxjscience (Member) commented Oct 26, 2020

Description

Add AMP support + update benchmarking script.

  • Add a testing utility for fp16 backbones
  • Update tests and the benchmarking script
  • Update SQuAD finetuning to use AMP (see the first sketch after this list)
  • Use boolean masking (see the second sketch after this list)
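
For context, here is a minimal sketch of the usual MXNet AMP training step. This is the generic mxnet.contrib.amp flow, not the exact code added to run_squad.py in this PR; the module path and details may differ on the MXNet 2.x line that GluonNLP master targets.

    import mxnet as mx
    from mxnet import autograd, gluon
    from mxnet.contrib import amp

    amp.init()  # patch MXNet for mixed precision; call before building the network

    ctx = mx.gpu(0)  # AMP targets GPU contexts (requires a CUDA build of MXNet)
    net = gluon.nn.Dense(2)
    net.initialize(ctx=ctx)
    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})
    amp.init_trainer(trainer)  # enable dynamic loss scaling on this trainer

    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    data = mx.nd.random.uniform(shape=(8, 16), ctx=ctx)
    label = mx.nd.array([0, 1] * 4, ctx=ctx)

    with autograd.record():
        loss = loss_fn(net(data), label)
        # scale the loss so fp16 gradients do not underflow, then backprop
        with amp.scale_loss(loss, trainer) as scaled_loss:
            autograd.backward(scaled_loss)
    trainer.step(batch_size=8)

And an illustrative sketch of boolean attention masking, with plain NumPy standing in for the actual ops in attention_cell.py: positions where the boolean mask is False are pushed to a large negative score before the softmax, so they receive zero attention weight.

    import numpy as np

    def masked_softmax(scores, mask, axis=-1):
        """scores: float array of attention logits; mask: boolean array broadcastable to scores."""
        neg = -1e4  # sufficiently negative, and still representable in fp16
        masked_scores = np.where(mask, scores, neg)
        masked_scores -= masked_scores.max(axis=axis, keepdims=True)  # numerical stability
        weights = np.exp(masked_scores) * mask  # force exactly zero on masked positions
        return weights / np.maximum(weights.sum(axis=axis, keepdims=True), 1e-20)

    scores = np.random.randn(2, 4).astype(np.float32)
    mask = np.array([[True, True, False, False],
                     [True, True, True, False]])
    print(masked_softmax(scores, mask))  # masked positions get weight 0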

Some issues:

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

cc @dmlc/gluon-nlp-team


codecov bot commented Oct 26, 2020

Codecov Report

Merging #1405 into master will increase coverage by 0.42%.
The diff coverage is 87.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1405      +/-   ##
==========================================
+ Coverage   85.13%   85.55%   +0.42%     
==========================================
  Files          53       53              
  Lines        6928     6987      +59     
==========================================
+ Hits         5898     5978      +80     
+ Misses       1030     1009      -21     
Impacted Files Coverage Δ
src/gluonnlp/models/transformer.py 98.94% <ø> (ø)
src/gluonnlp/optimizer.py 82.22% <ø> (-0.20%) ⬇️
src/gluonnlp/models/transformer_xl.py 80.80% <14.28%> (-1.65%) ⬇️
src/gluonnlp/utils/testing.py 94.16% <94.11%> (-0.04%) ⬇️
src/gluonnlp/attention_cell.py 80.39% <100.00%> (ø)
src/gluonnlp/data/sampler.py 96.57% <100.00%> (+0.02%) ⬆️
src/gluonnlp/models/bart.py 93.75% <100.00%> (+7.50%) ⬆️
src/gluonnlp/models/gpt2.py 98.26% <100.00%> (+0.01%) ⬆️
src/gluonnlp/utils/misc.py 58.58% <0.00%> (-0.62%) ⬇️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1726dd2...6b2e1ea.

sxjscience changed the title from "[WIP]Add AMP + Use Boolean Mask + Update Benchmarking Script" to "[WIP]Add AMP + Update Benchmarking Script" on Oct 26, 2020
Commits pushed to this pull request (in order):

  • Update testing.py (×15)
  • Update attention_cell.py
  • Update testing.py (×3)
  • Update test_models_bert.py
  • Update run_batch_squad.sh
  • Update generate_commands.py
  • Update run_batch_squad.sh (×3)
  • Add region
  • Update generate_commands.py
  • Update run_squad.template
  • Try to use clip 1.0
  • update
  • Update README.md
  • Update attention_cell.py
  • Update benchmark_gluonnlp.py
  • Update attention_cell.py
  • Update testing.py
  • Update run_squad.py
  • Update attention_cell.py (×3)
  • update
  • Update attention_cell.py
  • update
  • Update numbers + log + weight
  • update (×2)
  • Update testing.py
sxjscience marked this pull request as ready for review on November 5, 2020 16:21
sxjscience requested a review from a team as a code owner on November 5, 2020 16:21
sxjscience changed the title from "[WIP]Add AMP + Update Benchmarking Script" to "Add AMP + Update Benchmarking Script" on Nov 5, 2020
barry-jin (Contributor) left a comment:

LGTM

Comment on lines 188 to 189:

### Run with AWS Batch
We can quickly run the squad finetuning via the [AWS Batch support](../../tools/batch).

Member: Is the batch support refactored so that it can be bootstrapped in any AWS account, including the VPC setup, security groups, etc.?

Member: If it doesn't work, we should avoid advertising this as a user-facing feature.

sxjscience (Member, Author): I think it's there so that the experiments can be run quickly if you have Batch access. Otherwise, we will end up writing everything inside the batch folder.

Member: It's also OK to just keep the script somewhere local or in a gist.

sxjscience (Member, Author): I've also considered this. The problem is that it would increase the communication time, and it can add some value for users once we later have the CloudFormation script.

Member: If the feature doesn't work for a user as is, it will cause confusion and friction.

Member: Keeping it as part of the infra code is also OK.

sxjscience (Member, Author): Maybe let's still keep it in the batch folder.

Comment on lines 1 to 10:

#!/bin/bash
# Sync the S3 results of each AWS Batch job listed in a log file to a local directory.
# Usage: <this script> LOG_PATH [SAVE_DIR_NAME]

set -ex

LOG_PATH=$1
SAVE_DIR_NAME=${2:-squad_2.0}

# Each line of LOG_PATH is expected to contain "<job_name> <job_id>".
while read -r job_name job_id; do
    aws s3 sync s3://gluon-nlp-dev/batch/${job_id}/temp ${SAVE_DIR_NAME}/${job_name}
done < ${LOG_PATH}

szha (Member), Nov 6, 2020: I don't think we should check in all the convenience scripts in the scripts folder, because this folder is user-facing.

sxjscience (Member, Author): For me, I feel it's fine, since it helps reproduce the experiments for our own purposes, and we can also later add it to the scheduled check.

sxjscience (Member, Author): Should be resolved now.

szha previously requested changes on Nov 6, 2020

szha (Member) left a comment:

Let's remove the non-user-facing scripts from the user-facing scripts folder.

szha dismissed their stale review on November 6, 2020 00:37:

Concerns addressed. Great addition and a step closer to full prod support.

szha merged commit dd45270 into dmlc:master on Nov 6, 2020
Labels: none · Projects: none · 5 participants