
Conversation

@A-Jacobson (Contributor) commented Nov 21, 2022

ADE20k benchmark in the style of the ResNet benchmark. Definitely not generated by GPT.

Notes:

  • Added download_ade20k.py - I considered adding a download=True flag to the main script, but it felt complicated to include alongside the streaming object. This seemed like a simpler compromise, but I'm open to suggestions.
  • The mcloud run config still points to my fork; it will need to be changed once this is merged.
  • Light on tests... probably?
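A standalone download script along those lines might look roughly like the sketch below. This is an illustrative assumption, not the PR's actual download_ade20k.py; the URL is the public MIT mirror of the ADE20K challenge data.

```python
# Hypothetical sketch of a standalone download_ade20k.py; the real script's
# layout may differ.
import os
import urllib.request

# Public MIT mirror of the ADE20K scene-parsing challenge data.
ADE20K_URL = "http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip"

def target_path(root: str) -> str:
    """Destination of the archive under the dataset root."""
    return os.path.join(root, os.path.basename(ADE20K_URL))

def download_ade20k(root: str) -> str:
    """Download the ADE20k archive if it is not already present locally."""
    os.makedirs(root, exist_ok=True)
    dest = target_path(root)
    if not os.path.exists(dest):  # skip re-download when the archive exists
        urllib.request.urlretrieve(ADE20K_URL, dest)
    return dest
```

Keeping the download out of main.py means the training script never needs to reason about whether data is local or streamed.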

Additionally:

Did some renaming and refactoring to keep this in line with the resnet blog. Examples include:

  1. Local dataset splits arg renamed from training, validation, test -> train, val, test
  2. build_ade20k_dataloader -> build_ade20k_dataspec - this is what the ResNet example does, and these functions actually return dataspecs
  3. composer_deeplab -> build_composer_deeplab - now takes pytorch initialization functions rather than composer Initializer objects
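Point 3 could be sketched roughly as follows; the model body is a tiny stand-in rather than the actual DeepLabv3+ builder, and the signature is an assumption, but it shows the shape of the change: a plain callable applied via `module.apply` instead of a composer Initializer object.

```python
# Illustrative sketch only: plain pytorch init functions instead of composer
# Initializer objects. The model here is a stand-in, not DeepLabv3+.
import torch
import torch.nn as nn

def kaiming_init(m):
    """A plain pytorch init function, usable directly with module.apply."""
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)

def build_composer_deeplab(num_classes: int, init_fn=None):
    # Stand-in for the real segmentation model builder.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.Conv2d(8, num_classes, 1),
    )
    if init_fn is not None:
        model.apply(init_fn)  # recursively applies init_fn to every submodule
    return model
```

Using bare callables keeps the builder free of any composer-specific initialization machinery.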

@Landanjs (Contributor) left a comment

This was a lot to review haha, great job getting this up!

Do you have wandb runs for this?

Requested big changes:

  1. Combine build_ade20k_dataspec and build_streaming_ade20k_dataspec, then add an is_streaming argument. This isn't perfect, but it does reduce the data.py and main.py code a bit.
  2. Should the synthetic ADE20k dataset be removed?
  3. I think we should remove the option for v1 streaming since this will be deprecated in the future.
  4. Can we remove the plus argument for the model and associated code? Reduces the code a bit and I don't have a good reason to keep the code around anymore.
  5. I think a smoke test (e.g. test_trainer in the ResNet benchmark) is a main component for benchmarks. Would it be possible to add this? @dblalock for clarification on its importance
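The merge suggested in point 1 could look roughly like the following. The dataset classes here are minimal stand-ins for the benchmark's real ones, and the exact signature is an assumption; only the dispatch pattern is the point.

```python
# Minimal sketch of one builder with an is_streaming flag; dataset classes
# are stand-ins for the real ones in the benchmark's data.py.
class ADE20k:
    """Stand-in for the local (on-disk) ADE20k dataset."""
    def __init__(self, path, split):
        self.path, self.split = path, split

class StreamingADE20k:
    """Stand-in for the mosaicml-streaming ADE20k dataset."""
    def __init__(self, remote, split):
        self.remote, self.split = remote, split

def build_ade20k_dataspec(path, batch_size, split='train', is_streaming=False):
    if is_streaming:
        dataset = StreamingADE20k(remote=path, split=split)
    else:
        dataset = ADE20k(path=path, split=split)
    # ...wrap `dataset` in a dataloader and a composer DataSpec here...
    return dataset
```

One flag keeps main.py to a single build call, at the cost of a slightly busier builder.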

Nits:

  • Add a newline between the docstring TL;DR and Args:, and between Args: and Return: (mostly in data.py, I think).
  • Double-check that there is one newline at the end of each file.

@dblalock (Contributor) left a comment

This is awesome. I'm so excited to finally train a segmentation model.

But anyway, LGTM other than some local nitpicking. We'll also have to dedup the MCLI vs standalone configs, but that's a separate PR.

Comment on lines +165 to +166
if sync_bn and world_size > 1:
local_world_size = dist.get_local_world_size()

Do we need sync batchnorm? Just looking for ways to simplify what we give to users + customers.

@A-Jacobson (Author) replied:

It seems that we aren't using it, but that's very much batch-size dependent. It's likely to be useful if someone has a smaller GPU, or a large enough image size to force their per-device batch size down to ~4 or less.
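For context, enabling sync batchnorm typically amounts to a one-line module conversion like the sketch below. `convert_sync_batchnorm` is the standard PyTorch API; the guard mirrors the diff snippet above, but the helper itself is illustrative, not the PR's code.

```python
# Hedged sketch of the usual sync batchnorm setup in a distributed
# training script; the helper is illustrative, not the PR's code.
import torch.nn as nn

def maybe_sync_bn(model: nn.Module, sync_bn: bool, world_size: int) -> nn.Module:
    # SyncBN pools batch statistics across ranks, which matters when the
    # per-device batch size is tiny (roughly <= 4 samples per GPU).
    if sync_bn and world_size > 1:
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return model
```

The conversion is a pure module swap, so it is safe to apply before the process group does any collective work.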

@A-Jacobson A-Jacobson merged commit 0cf9ef7 into mosaicml:main Nov 23, 2022
sophiawisdom added a commit to sophiawisdom/benchmarks-adafactor that referenced this pull request Dec 12, 2022
* ResNet Benchmark (mosaicml#25)

Updates the ResNet benchmark to no longer use yahp.

Co-authored-by: Matthew <growlix@users.noreply.github.com>
Co-authored-by: dblalock <dwb4ke@virginia.edu>

* correcting license and headers (mosaicml#29)

* Forgot to change branch in resnet benchmark... (mosaicml#30)

Forgot to change branch...

* Update LLM benchmark with eval, HF models, bugfixes (mosaicml#26)

* Ade20k benchmark (mosaicml#27)

* Vitaliy/compare hf mosaic (mosaicml#28)

* compare mosaic GPT vs HF GPT2

* cleanup

* updt abhi cmts

Co-authored-by: Vitaliy Chiley <vitaliy@moasic.com>

* Vitaliy/sync foundry tests (mosaicml#32)

* porting foundry tests

* rm unittest

* xfail test when dataset is not set up

Co-authored-by: Vitaliy Chiley <vitaliy@moasic.com>

* CIFAR benchmark  (mosaicml#31)

* Add cifar benchmark

Co-authored-by: dblalock <dwb4ke@virginia.edu>

* Add codeowners for each existing benchmark (mosaicml#36)

add codeowners for each existing benchmark

* fix torch attn init (mosaicml#38)

* add init for nn.MultiheadAttention

* Bert pre-training and fine-tuning on GLUE (mosaicml#24)

Adds benchmark examples for BERT pre-training and fine-tuning on GLUE, and features support for HF models as well as a Mosaic BERT which is implemented and introduced here.

See README.md for a detailed description of these components.

TODO in a future PR:
- README: Add final speedup results
- YAMLs: Add configuration files for experiments shown in results
- Tests

* Add precommit linting + get repo passing all checks (mosaicml#37)

This adds the following pre-commit checks:
    yapf
    pyright
    pycln
    isort
    inserting our license header into every python file
    docformatter
    pydocstyle
    yamllint
    yamlfmt

This is mostly a copy-pasted .pre-commit-config.yaml, pyproject.toml, and .yamllint.yaml 
from the streaming repo, along with the associated code autoformating. There was some manual
intervention to fix license headers and occasional edge cases where the linters/formatters couldn't
all agree (e.g., spacing before a function in another function without a docstring).

I also just tightened up some of the docstrings when I was already making them satisfy the linters.

* Update flash attn version (mosaicml#40)

* Update flash attn version

* Update llm/requirements.txt

Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>

Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>

* Update URL to streaming docs

* printing state

* change batch number for fast forwarding

* fast forward batches properly

* Raise error if global batch size not divisible by world size (mosaicml#41)

Adds a batch size error so that a user won't accidentally use an incorrect batch size.

* fixed save_overwrite

* hard peg version of mosaicml-streaming

* set mosaicml-streaming constraint to <0.2.0

* fixed part of bad merge

* mostly back to sophia's version

* mcli-streaming<0.2.0

* fixed lambada task

* attempt to bump streaming version

* attempt at using StreamingDataset

* used wrong StreamingDataset spec

* one more misaligned field

* passing shuffle seed to train dataloader

* didn't upload changes to data_c4

* updated config

Co-authored-by: Landan Seguin <landan@mosaicml.com>
Co-authored-by: Matthew <growlix@users.noreply.github.com>
Co-authored-by: dblalock <dwb4ke@virginia.edu>
Co-authored-by: Vitaliy Chiley <vitaliy@mosaicml.com>
Co-authored-by: Jeremy D <115047575+bmosaicml@users.noreply.github.com>
Co-authored-by: Austin <A-Jacobson@users.noreply.github.com>
Co-authored-by: Vitaliy Chiley <vitaliy@moasic.com>
Co-authored-by: dblalock <davis@mosaicml.com>
Co-authored-by: Alex Trott <alex@mosaicml.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>
Co-authored-by: Bandish Shah <bandish@mosaicml.com>
Co-authored-by: bcui19 <bcui8377@gmail.com>
Co-authored-by: Sophia Wisdom <sophiawisdom1999@gmail.com>