RFC: TensorFlow Official Model Garden Redesign #130
# TensorFlow Official Model Garden Redesign

| Status        | Proposed |
| :------------ | :------- |
| **Author(s)** | Jing Li (jingli@google.com), Hongkun Yu (hongkuny@google.com), Xiaodan Song (xiaodansong@google.com) |
| **Sponsor**   | Edd Wilder-James (ewj@google.com) |
| **Updated**   | 2019-08-02 |

## Objective

This document proposes a redesign of the TensorFlow official model garden.
We aim to provide a central, reliable place for popular examples,
state-of-the-art models, and tutorials that demonstrate TF 2.0 best practices
and illustrate real-world use cases.

## Motivation

The current [TF official model garden](https://github.com/tensorflow/models/tree/master/official)
has mainly ad hoc support. Example models are implemented with a mix of TensorFlow
APIs in inconsistent coding styles, and some of them have convergence and/or
performance regressions. With the TensorFlow 2.0 launch, there is a strong desire to
give TensorFlow users a clear, central place that showcases reliable TF 2.0
models along with the best practices to follow.

We want to take this opportunity to substantially improve the state of the
official model garden and provide a seamless end-to-end training and inference
experience on a wide range of accelerators and mobile device chips. We hope
to encourage the community to contribute innovations that improve TensorFlow
efficiency and usability.

## User Benefit

We aim to provide the best modeling experience via this revamp effort:

* Usability and reliability
    * Keep official models well-maintained and tested for both performance and
      convergence.
    * Provide accessible model distribution via [TensorFlow Hub](https://www.tensorflow.org/hub) and share state-of-the-art research accomplishments.
    * Make training on both GPU and TPU an easy switch.
    * Provide reusable components for research and production.
* End-to-end solutions
    * Provide seamless end-to-end training and inference solutions, where inference covers serving on TPU, GPU, mobile, and edge devices.
    * Provide hyperparameter sets to tune models for various resource constraints.
    * Provide solutions with hyperparameters to scale model training to TPU pods or multi-worker GPUs.
    * Provide variants derived from standard models to tackle various practical tasks.
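
The hyperparameter sets mentioned above could, for example, be plain dictionaries with a strict override mechanism. This is only a sketch; the names, default values, and helper below are illustrative and not the model garden's actual API:

```python
import copy

# Illustrative default hyperparameters for a hypothetical ResNet-50 config.
# These values are placeholders, not official settings.
RESNET50_DEFAULTS = {
    "train_batch_size": 256,
    "learning_rate": 0.1,
    "num_epochs": 90,
}


def override_params(defaults, overrides):
    """Return a copy of `defaults` updated with `overrides`.

    Unknown keys raise KeyError so that typos in overrides fail loudly
    instead of being silently ignored.
    """
    params = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in params:
            raise KeyError(f"Unknown hyperparameter: {key}")
        params[key] = value
    return params


# Example: shrink the batch size for a memory-constrained GPU.
params = override_params(RESNET50_DEFAULTS, {"train_batch_size": 64})
```

The strictness here is a deliberate choice: a resource-constrained tuning workflow touches many configs, and silent typos in override files are a common source of irreproducible results.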

## Design Proposal

### Official model directory reorganization

We are going to reorganize the official model directory to provide:

* common libraries, mainly two types:
    * Common training util library in TF 2.0, with model configuration and
      hyperparameter definition in a consistent style.
    * Model-category-specific common libraries, e.g. primitives as basic
      building blocks for NLP models. We will follow the fundamental design of
      Keras layer/network/model to define and utilize model building blocks.
* popular state-of-the-art (SOTA) models for end users as a product.
* reference models for performance benchmark testing.
    * For models provided as SOTA models, we will share the network and
      modeling code, but have separate *main* modules. The main
      module for benchmark testing will have additional flags and setups for
      performance testing.
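
The Keras layer/network/model design referenced above favors a functional model composed of custom layers. A minimal sketch of that pattern (the layer and dimensions here are toy examples, not actual model garden components):

```python
import tensorflow as tf


class ScaledDense(tf.keras.layers.Layer):
    """Toy custom layer: a dense projection with a learnable output scale."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Sub-layers and weights are created lazily, once input shape is known.
        self.dense = tf.keras.layers.Dense(self.units)
        self.scale = self.add_weight(
            name="scale", shape=(), initializer="ones", trainable=True)

    def call(self, inputs):
        return self.scale * self.dense(inputs)


# A functional model built from the custom layer, following the Keras
# layer -> network -> model progression.
inputs = tf.keras.Input(shape=(128,))
outputs = ScaledDense(10)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Functional models built this way keep full serialization and introspection support, which is part of why the RFC discussion prefers them over subclassed models.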

The following table shows the detailed view of the proposed model directory
structure. The SOTA model list will be updated to cover more categories.

| Directory | Subdirectories | | Explanations |
| :------------- | :--------------------- | :-- | :------------------------------ |
| modeling | | | Common modeling libraries |
| | layers | | Common modules/layers not yet built into TensorFlow |
| | networks | | Well-known networks built on top of layers, e.g. transformer |
| | optimizers | | New or customized optimizers |
| | training | | Training utils, e.g. an example custom training loop |
| utils | | | Miscellaneous utilities |
| | hyperparameters | | Common flags and model parameters |
| | ... | | |
| benchmarks | | | Benchmark testing and reference models to validate TensorFlow |
| | utils | | |
| | examples | | Reference models for testing/validating end-to-end TensorFlow |
| | | ResNet | |
| | | BERT | |
| | | Transformer | |
| | | NCF | |
| nlp | | | Models/tasks for natural language processing |
| | utils | | NLP-specific utils, e.g. input datasets |
| | BERT | | |
| | | BERT core modeling | |
| | | tasks | Specific tasks on open public datasets, e.g. SQuAD, MNLI |
| | XLNet | | |
| | GPT | | |
| | Transformer | | |
| | GNMT | | |
| | ... | | |
| vision | | | Models/tasks for computer vision |
| | image_classification | | |
| | | resnet | |
| | | EfficientNet | |
| | | MnasNet | |
| | | ... | |
| | object_detection | | |
| | | RetinaNet | |
| | | Mask-RCNN | |
| | | UNet | |
| | | ShapeMask | |
| | | ... | |
| recommendation | | | |
| | NCF | | |
| staging | | | Accepting community contributions |
| archive | | | Deprecated models, not officially supported |
| r1 | | | TF 1.x models and utils |
| | utils | | |
| | resnet50 | | |
| | transformer | | |
| | ncf | | |
| | wide_deep | | |
| | boosted_trees | | |
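
The `modeling/training` utilities above would host, among other things, example TF 2.0 custom training loops. A minimal sketch of such a loop, using a placeholder model and synthetic data rather than real model garden code:

```python
import tensorflow as tf

# Placeholder model, optimizer, and loss; a real training util would take
# these as arguments rather than hard-coding them.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

# Synthetic regression data batched through tf.data.
x = tf.random.normal((64, 8))
y = tf.random.normal((64, 1))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)


@tf.function
def train_step(features, labels):
    # Record the forward pass, then apply gradients manually.
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss = loss_fn(labels, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


for epoch in range(2):
    for features, labels in dataset:
        loss = train_step(features, labels)
```

Wrapping the step in `tf.function` keeps eager-style code while getting graph-level performance, which is the usual TF 2.0 trade-off such utilities aim to demonstrate.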

### Pretrained model repository

We are going to provide pretrained models for research exploration and
real-world application development. The plan is to integrate with [TensorFlow Hub](https://www.tensorflow.org/hub),
where users can access the Hub modules and SavedModels for pretrained
checkpoints, along with links to the code in the model garden.
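
The SavedModel format mentioned above is what makes a pretrained checkpoint reusable by downstream users without the original Python class definitions. A minimal round-trip sketch (the model and export path are placeholders, and real distribution would go through TensorFlow Hub rather than a local directory):

```python
import tempfile

import tensorflow as tf

# A stand-in for a pretrained model garden network; building it with a
# forward pass fixes the input signature before export.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model(tf.zeros((1, 8)))

export_dir = tempfile.mkdtemp()
tf.saved_model.save(model, export_dir)

# A downstream user reloads the SavedModel; the traced call function is
# restored, so no model-building code is required.
restored = tf.saved_model.load(export_dir)
```
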

### Convergence and Performance Testing

We have a benchmark testing framework to execute continuous performance and
accuracy tests for TensorFlow on different types of accelerators. All official
TF 2.0 models are required to provide accuracy tests, and these tests will be
automatically expanded to performance tests for continuous regression testing
and monitoring.
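
At its core, an accuracy regression check compares a run's final metric against a published target within a tolerance. A framework-agnostic sketch; the model names, targets, and tolerances below are invented for illustration, not official numbers:

```python
# Published target metrics per model, e.g. top-1 accuracy or F1 from the
# original paper. Placeholders only.
ACCURACY_TARGETS = {
    "resnet50_imagenet": {"target": 0.76, "tolerance": 0.005},
    "bert_squad_f1": {"target": 0.88, "tolerance": 0.01},
}


def check_convergence(model_name, achieved_metric):
    """Return True if the achieved metric is within tolerance of the target."""
    spec = ACCURACY_TARGETS[model_name]
    return achieved_metric >= spec["target"] - spec["tolerance"]
```

A continuous testing job would call `check_convergence` on every run's final metric and alert when a commit pushes a model below its tolerance band.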

## Model Garden Sustainability

### Model Launch Criteria

To ensure that official models are well-maintained and tested, we are going to enforce the following criteria for launching a new model in the official model garden (the staging folder excepted):

* Follow the best-practice guideline for each model category.
* Provide unit tests to verify the basics of the model.
* Integrate the model into benchmark testing to ensure the model's accuracy is on par with the original paper / SOTA results.
* Provide a README with commands and procedures to reproduce the SOTA results, including:
    * Input data generation if necessary.
    * Model execution, including all hyperparameters.
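
The unit-test requirement above might look like the following sketch, which checks a model's output shape and that a forward pass produces finite values. The model is a placeholder, and the checks are written as plain asserts; real model garden tests would likely use `tf.test.TestCase`:

```python
import tensorflow as tf


def build_toy_classifier(num_classes=10):
    """Placeholder for a model garden network under test."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes),
    ])


def test_output_shape():
    # The most basic check: batch and class dimensions come out as expected.
    model = build_toy_classifier(num_classes=10)
    logits = model(tf.zeros((4, 16)))
    assert logits.shape == (4, 10)


def test_forward_pass_is_finite():
    # Guards against NaN/Inf from bad initializations or numeric bugs.
    model = build_toy_classifier()
    logits = model(tf.random.normal((4, 16)))
    assert bool(tf.reduce_all(tf.math.is_finite(logits)))


test_output_shape()
test_forward_pass_is_finite()
```

Cheap structural tests like these run on every commit; the convergence checks described in the previous section run less frequently because they require full training.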

### Community contribution and staging

Given the pace of ML development, we can't possibly keep all best-in-class
models up to date on our own. We highly encourage users to contribute to the
official model garden. After the model garden refactoring (Phase 1), we plan to
provide a full list of wanted models to the TensorFlow community and encourage
TensorFlow users to claim and contribute these models.

Requirements differ across unifying interfaces, supporting all the chips and
platforms, and enabling benchmarks for reference models, so models can be at
different stages. As we may have immediate needs to add some quick models for
benchmarking and debugging, we will provide a staging folder to host drafts of
SOTA or popular models. Once a staging model converges and supports the major
functionality of standard official models, we will judge whether it meets the
launch standard and migrate it to the official models or to the benchmark
references.

### Maintenance and Deprecation

Given the nature of this repository, old models may become less and less
useful to the community as time goes on. To keep the repository sustainable,
we will perform bi-annual reviews of our models to ensure everything still
belongs in the repo. For models to be retired, the current plan is to move
them to the archive directory; archived models won't run the regression tests
that ensure quality and convergence.

The following details the policy for models in the mature and staging phases:

* Models graduated from the staging subdirectory

  These models will be maintained by the model garden team. After we start to
  accept community contributions, we will list the contributors as model owners.

  These models will have continuous convergence and performance testing to
  guard against regressions. In general, we won't deprecate these models unless:
    * the model is no longer compatible with the TF APIs and has to be replaced by a new version, or
    * a strictly better model shows up and the old model isn't needed by the community/market.

* Models in staging

  The model garden team will do a quarterly review to check status with the
  model contributors, covering:
    * model convergence
    * unit tests
    * convergence tests
    * coding style meeting the TF 2.0 best practices

  If there's no further commitment to improve the status in the next 90 days,
  we will mark the model as deprecated, and it will be subject to deletion.

### Official Model Releases

We will cut model garden releases starting from TF 2.0. Unit tests and
regression tests need to pass against each TF release. Deprecated models will
be removed from the release branch.

We will also create a pip package per release version.

## Milestones

| Phases | Milestones | Notes |
| :------ | :--------------- | :---------------------- |
| Phase_1 | 1. Finish directory reorganization. 2. Add common modeling libraries. 3. Have 2-3 SOTA models for both NLP and vision. | Not accepting community contributions during refactoring. |
| Phase_2 | Expand the repository to cover more model types. | Will accept community contributions on the solicited model list. |