CIFAR benchmark #31
I think it's fine to just have runnable code for CIFAR. The benefit of this benchmark (to me) is that it's quick and easy to run. I doubt people care much for specific recipes on CIFAR; it's so quick and easy that you'd likely end up with a weird combination of methods overfitting the eval process anyway.
LGTM; just some minor local stuff
Co-authored-by: dblalock <dwb4ke@virginia.edu>
Co-authored-by: dblalock <dwb4ke@virginia.edu>
…benchmarks into landan/cifar_benchmark
* ResNet Benchmark (mosaicml#25): update the ResNet benchmark to no longer use yahp. Co-authored-by: Matthew <growlix@users.noreply.github.com>, dblalock <dwb4ke@virginia.edu>
* Correct license and headers (mosaicml#29)
* Forgot to change branch in resnet benchmark (mosaicml#30)
* Update LLM benchmark with eval, HF models, bugfixes (mosaicml#26)
* Ade20k benchmark (mosaicml#27)
* Vitaliy/compare hf mosaic (mosaicml#28): compare Mosaic GPT vs HF GPT2; cleanup; update per Abhi's comments. Co-authored-by: Vitaliy Chiley <vitaliy@moasic.com>
* Vitaliy/sync foundry tests (mosaicml#32): port foundry tests; remove unittest; xfail test when dataset is not set up. Co-authored-by: Vitaliy Chiley <vitaliy@moasic.com>
* CIFAR benchmark (mosaicml#31): add CIFAR benchmark. Co-authored-by: dblalock <dwb4ke@virginia.edu>
* Add codeowners for each existing benchmark (mosaicml#36)
* Fix torch attn init (mosaicml#38): add init for nn.MultiheadAttention
* Bert pre-training and fine-tuning on GLUE (mosaicml#24): adds benchmark examples for BERT pre-training and fine-tuning on GLUE, with support for HF models as well as a Mosaic BERT which is implemented and introduced here; see README.md for a detailed description. TODO in a future PR: add final speedup results to the README, add YAML configuration files for the experiments shown in results, and add tests.
* Add precommit linting + get repo passing all checks (mosaicml#37): adds pre-commit checks for yapf, pyright, pycln, isort, license-header insertion into every Python file, docformatter, pydocstyle, yamllint, and yamlfmt. This is mostly a copy-pasted .pre-commit-config.yaml, pyproject.toml, and .yamllint.yaml from the streaming repo, along with the associated code autoformatting. Some manual intervention was needed to fix license headers and occasional edge cases where the linters/formatters couldn't all agree (e.g., spacing before a function nested in another function without a docstring); some docstrings were also tightened while making them satisfy the linters.
* Update flash attn version (mosaicml#40): update flash attn version; update llm/requirements.txt. Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>
* Update URL to streaming docs
* Print state; change batch number for fast-forwarding; fast-forward batches properly
* Raise error if global batch size not divisible by world size (mosaicml#41): add a batch-size error so a user won't accidentally use an incorrect batch size
* Fix save_overwrite
* Hard-peg mosaicml-streaming version; set mosaicml-streaming constraint to <0.2.0
* Fix part of a bad merge; mostly back to Sophia's version
* Fix lambada task
* Bump streaming version; attempt at using StreamingDataset; fix wrong StreamingDataset spec and one more misaligned field
* Pass shuffle seed to train dataloader; upload changes to data_c4; update config

Co-authored-by: Landan Seguin <landan@mosaicml.com>, Matthew <growlix@users.noreply.github.com>, dblalock <dwb4ke@virginia.edu>, Vitaliy Chiley <vitaliy@mosaicml.com>, Jeremy D <115047575+bmosaicml@users.noreply.github.com>, Austin <A-Jacobson@users.noreply.github.com>, Vitaliy Chiley <vitaliy@moasic.com>, dblalock <davis@mosaicml.com>, Alex Trott <alex@mosaicml.com>, Mihir Patel <mihir.v.patel7@gmail.com>, Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>, Bandish Shah <bandish@mosaicml.com>, bcui19 <bcui8377@gmail.com>, Sophia Wisdom <sophiawisdom1999@gmail.com>
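The pre-commit setup described in the commit above could look roughly like this minimal sketch. The hook repos listed are real pre-commit sources for these tools, but the pinned `rev` values and the exact hook selection here are my assumptions, not the repo's actual config:

```yaml
# Hypothetical minimal .pre-commit-config.yaml sketch; revs below are
# placeholders, not the versions actually pinned in the repo.
repos:
  - repo: https://github.com/google/yapf
    rev: v0.32.0        # assumed version
    hooks:
      - id: yapf
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0         # assumed version
    hooks:
      - id: isort
  - repo: https://github.com/PyCQA/docformatter
    rev: v1.5.0         # assumed version
    hooks:
      - id: docformatter
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.28.0        # assumed version
    hooks:
      - id: yamllint
```

Running `pre-commit run --all-files` once after installing the hooks would apply the same kind of repo-wide autoformatting pass the commit describes.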
CIFAR benchmark.
I couldn't figure out how to do tests with synthetic local data, since the torchvision CIFAR10 dataset is very particular. I was able to set up a local MDS dataset that can be used for testing, though I'm not sure whether we should have streaming in this benchmark.
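For the synthetic-data angle, one possible sketch (my own assumption, not code from this PR) is to fabricate CIFAR-shaped arrays with numpy and feed those to tests instead of the real torchvision dataset; the `make_synthetic_cifar` helper below is hypothetical:

```python
import numpy as np

def make_synthetic_cifar(num_samples: int = 16, seed: int = 0):
    """Generate random CIFAR-10-shaped data: 32x32 RGB uint8 images, labels 0-9.

    A stand-in for the real dataset in tests; not a torchvision dataset.
    """
    rng = np.random.default_rng(seed)
    images = rng.integers(0, 256, size=(num_samples, 32, 32, 3), dtype=np.uint8)
    labels = rng.integers(0, 10, size=(num_samples,), dtype=np.int64)
    return images, labels

images, labels = make_synthetic_cifar()
print(images.shape, labels.shape)  # (16, 32, 32, 3) (16,)
```

Because the shapes and dtypes match real CIFAR-10, dataloader and model-forward tests could run against this without downloading anything.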
I copied as much as possible from Composer. We didn't have previous recipes (I think?), so I added something random.
I didn't add the dataset check used in the other benchmarks, since it is fairly easy to download the data with the torchvision dataset. I also didn't include multi-node information, since I didn't think that made sense for CIFAR10.
WandB results: 93.25 baseline; 93.92 recipe (one seed each).
Any ideas for a figure? I can do wandb plots, but need to figure out dark mode like Abhi 😅