SparseZoo v1.6.0
Model Additions: Generative AI
- CodeGen Mono 2B and 350M trained on BigQuery and ThePile datasets for base and one-shot pruned, quantized, and sparse quantized models (view)
- CodeGen Multi 2B and 350M trained on BigQuery and ThePile datasets for base and one-shot pruned, quantized, and sparse quantized models (view)
- Llama-2 7B baseline and quantized models for pre-trained and chat datasets (view)
- Llama-2 7B dense, pruned, quantized, and sparse quantized models and recipes for the platypus instruction tuning dataset and gsm8k arithmetic reasoning dataset (view)
- MPT 7B baseline, quantized, and sparse quantized models for pre-trained and chat datasets (view)
- MPT 7B dense, pruned, quantized, and sparse quantized models and recipes for dolly instruction tuning dataset and gsm8k arithmetic reasoning dataset (view)
- OPT 1.3B, 2.7B, 6.7B, and 13B dense, pruned, quantized, and sparse quantized models and recipes for the OPT pre-trained dataset (view)
Model Additions: Computer Vision
- YOLOv8 n, s, m, l, and x dense, pruned, quantized, and sparse quantized models and recipes for COCO and VOC datasets (view)
New Features:
- Version support added:
- The initial feature set for the SparseZoo V3 web UI is now live; the home page has been restructured to include and highlight generative AI models.
- SparseZoo V2 model file structure and V2 stubs enabled, which expands the number of supported files and reduces the number of bytes that need to be downloaded for model checkpoints, folders, and files. It also simplifies the stubs used to access models in the SparseZoo. (Documentation: V2 file structure and stubs docs will be added in v1.7) (#286, #271, #355, #359, #354, #361, #363, #368, #370, #373)
- SparseZoo Analyze CLI and APIs added to enable simple functions for quickly checking general and sparsification info for params, operations, reads/writes, and overall model layouts. (#288, #344, #345)
- RegistryMixin class and patterns added, enabling a centralized and universal registry across Neural Magic's repos and products. (#365)
Model Changes: Computer Vision
- EfficientNet-B0 to B5 and EfficientNet V2 S, M, and L have been updated with example transfer recipes for base and quantized versions from the ImageNet dataset. (view)
- MobileNet V1 models have been updated with corrected metrics, and their model cards now include updated instructions for transfer and sparsification across dense, sparse, and quantized versions for the ImageNet dataset. (view)
Model Changes: Natural Language Processing
To address DeepSparse deployment pipelines failing due to missing files, the following models have been updated to include new tokenizer files in the deployment directory across dense, sparse, and sparse quantized versions for the targeted datasets:
- BERT-Base, BERT-Base cased: CoNLL-2003, MNLI, QQP, SQuAD, SST-2, Twitter Financial News, and general Wikipedia BookCorpus
- BERT-Large, DistilBERT: CoNLL-2003, GoEmotions, MNLI, QQP, SQuAD, SST-2, Wikipedia BookCorpus
- BioBERT: BC2GM, BC5CDR chem, BC5CDR Disease, BioASQ, Chemprot, DDI, GAD, JNLPBA, NCBI Disease, PubMed, PubMedQA
- BioBERT cased: BC2GM, BC5CDR chem, BC5CDR Disease, BioASQ, Chemprot, GAD, JNLPBA, NCBI Disease, PubMed, PubMedQA
- MobileBERT: SQuAD
- oBERT base: CoNLL-2003, GoEmotions, IMDB, MNLI, Wikipedia pre-trained, QQP, SQuAD, SST-2
- oBERT large, medium, and small, oBERTa medium, and small: Wikipedia pre-trained, SQuAD
- oBERTa base: CoNLL-2003, GoEmotions, IMDB, MNLI, Wikipedia pre-trained, QQP, SQuAD, SQuAD v2, SST-2
- RoBERTa base, RoBERTa large: CoNLL-2003, IMDB, MNLI, QQP, SQuAD, SQuAD v2, SST-2
Product Changes:
- Extra information about the benchmarking device each benchmark was run on has been added to the Python interface for benchmarking results. (#294)
- README and documentation updated to include: Slack Community name change, Contact Us form introduction, Python version changes; corrections for YOLOv5 torchvision, Transformers, and SparseZoo broken links; and installation command. (#307)
- Support for large ONNX files has been improved to speed up loading and limit memory issues, especially for LLMs. (#308, #320)
- SparseZoo model folders that are downloaded through the Python API will now be saved locally under their repo name instead of model id. This is to enable easier tracking of which models have been downloaded to a user's system. (#317, #369)
- Python 3.7 support is deprecated. (#348)
- Pydantic version pinned to <2.0, preventing potential issues with untested versions. (#339)
- File path endings are added to download logs, enabling more useful output information when downloading models. (#346, #360)
- ONNX utility functions have been broken out into multiple files, enabling better structure for future enhancements. The namespace and imports all remain the same. (#353)
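For environments where dependencies are installed by hand, the Pydantic `<2.0` pin noted above can be checked with a few lines of Python. This is a minimal sketch rather than anything shipped with SparseZoo: the helper names `satisfies_pin` and `installed_pydantic_ok` are hypothetical, only the standard-library `importlib.metadata` module is used, and the comparison looks at the major version only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def satisfies_pin(version_string: str, max_major_exclusive: int = 2) -> bool:
    """Return True if the major version is below the exclusive upper bound.

    Assumes a plain "X.Y.Z"-style version string (hypothetical helper).
    """
    major = int(version_string.split(".")[0])
    return major < max_major_exclusive


def installed_pydantic_ok() -> Optional[bool]:
    """Check the installed Pydantic against the <2.0 pin.

    Returns None when Pydantic is not installed at all.
    """
    try:
        return satisfies_pin(version("pydantic"))
    except PackageNotFoundError:
        return None


print(installed_pydantic_ok())
```

A result of `False` would indicate that a Pydantic 2.x release is installed, which falls outside the versions this release was tested against.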
Resolved Issues:
- A test for checking throughput values no longer fails. (#306)
- Metric names were not matching due to different formatting, such as spaces and casing. For example, top1accuracy and Top 1 Accuracy will now match. (#310)
- Google Analytics errors are no longer shown to the user when the libraries are used too frequently on the same system. (#318, #322, #324, #327)
- In some cases, logging information was duplicated due to multiple streams being registered, such as when DeepSparse benchmarks were run. This is now fixed to ensure logs are no longer duplicated. (#330)
- Unit and integration tests now remove temporary test files, which were not being properly deleted, and limit test file creation. (#329)
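The metric-name fix above (#310) comes down to comparing names after stripping formatting differences. The snippet below is a minimal sketch of that idea, not SparseZoo's actual implementation; the function names `normalize_metric_name` and `metric_names_match` are hypothetical.

```python
import re


def normalize_metric_name(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def metric_names_match(first: str, second: str) -> bool:
    """Compare two metric names while ignoring casing, spaces, and punctuation."""
    return normalize_metric_name(first) == normalize_metric_name(second)


# The two spellings from the release note compare equal after normalization.
print(metric_names_match("top1accuracy", "Top 1 Accuracy"))  # True
```

Normalizing both sides before comparing keeps the stored metric names untouched, so display formatting is unaffected.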
Known Issues:
- The compile time for dense LLMs can be very slow. This will be addressed in a forthcoming release.
- Docker images are not currently pushing. A resolution is forthcoming for functional Docker builds. [RESOLVED]