RFC: TensorFlow Official Model Garden Redesign #130

Merged
merged 7 commits into from
Mar 6, 2020
207 changes: 207 additions & 0 deletions rfcs/20190802-model-garden-redesign.md
@@ -0,0 +1,207 @@
# TensorFlow Official Model Garden Redesign

| Status | Proposed |
|:-------------- |:---------------------------------------------------- |
| **Author(s)** | Jing Li (jingli@google.com), Hongkun Yu (hongkuny@google.com), Xiaodan Song (xiaodansong@google.com) |
| **Sponsor** | Edd Wilder-James (ewj@google.com) |
| **Updated** | 2019-08-02 |

## Objective

This document presents a proposal to redesign the TensorFlow official model garden.
We aim to provide a central and reliable place that hosts popular examples,
state-of-the-art models and tutorials, demonstrating best practices in TF2.0
and illustrating real-world use cases.

## Motivation

The current [TF official model garden](https://github.com/tensorflow/models/tree/master/official)

Will the new model garden still lie in https://github.com/tensorflow/models/tree/master/official or as a "root" repo? Putting such important resources in a sub directory may make (new) users confused I guess.

I have this same question.

Member Author

The current plan is to keep the model garden in the current directory. We are going to work with TF Hub on a unified UI that offers both pretrained models and links to the model code, which will hopefully make it easier for users to find. Thanks!

@zhenhuaw-me Aug 12, 2019

I am not sure what others think, but to me the model garden directory is https://github.com/tensorflow/models/tree/master, and I find it hard to understand what the official sub directory means, because in the current directory hierarchy the MobileNets are published in research/slim, which I also take as official...

Anyway, UI and links will be great! Thank you for reply.

mainly has ad hoc support. Example models are implemented using mixed TensorFlow
APIs in different coding styles and some of them have convergence and/or
performance regressions. With the TensorFlow 2.0 launch, there’s a great desire to
provide TensorFlow users a clear and central place to showcase reliable TF2.0
models with the best practices to follow.

We want to take this opportunity to substantially improve the state of the
official model garden, and provide a seamless end-to-end training and inference
user experience on a wide range of accelerators and mobile device chips. We hope
to encourage the community to contribute innovations and improve TensorFlow
efficiency and usability.

## User Benefit

We aim to provide the best modeling experience via this revamp effort:

* Usability and reliability
    * keep official models well-maintained and tested for both performance and
      convergence.
    * provide accessible model distribution via [TensorFlow Hub](https://www.tensorflow.org/hub) and share state-of-the-art research accomplishments.
    * make training on both GPU and TPU an easy switch.
    * provide reusable components for research and production.
* End-to-end solutions
    * provide seamless end-to-end training and inference solutions, where inference covers serving on TPU, GPU, mobile and edge devices.
Contributor

Will this include Google Coral devices?

    * provide hyperparameter sets to tune models for various resource constraints.
    * provide solutions with hyperparameters to scale model training to TPU pods or multi-worker GPUs.
    * provide variants derived from standard models to tackle various practical tasks.

## Design Proposal

### Official model directory reorganization

We are going to reorganize the official model directory to provide:

* common libraries, mainly two types:
@jayfurmanek Aug 5, 2019

Moving common code into a central location is always a good design practice, but let's be careful not to create another SLIM here. There are still models (Deeplab anyone) that are chained to SLIM.

Remember, part of what would be valuable here is showing good examples on how to write a model, not necessarily good examples on how to write a model garden with intertwined and complicated dependencies.

Member

I agree on the last part. models/research is a typical example of this. The whole garden there is so complex and huge that it is nowhere near usable for a project unless you:

  • Just want the model off the shelf without any modification
  • Know the ins and outs of each and every part of it

Same concern. Software engineering design may not be applied to deep learning model garden these days.

Personally, a model garden is simply somewhere user can obtain the whole runnable model from a very sub directory. Sharing common library, and any similar things, sounds like a SDK built on top of TF to me. Maybe we can have TF SDK, and then the garden.

Member Author

Thank you for raising this concern! We definitely need a good balance between implementation duplication and complicated dependencies. The current design is to split the modeling process into two stages: 1) common networks (in modeling directory), such as resnet, transformer, so that these networks can be reused in specific model/task, e.g. resnet50, mask r-cnn, sequence model; 2) specific models/tasks along with public datasets (e.g. in the NLP and vision directories), where each model is defined in its own sub directory, and data preprocessing, eval and other utils for the same type of tasks/datasets can be shared.

If you have better suggestions, we are definitely open to them. Thanks again!

The current design is to split the modeling process into two stages: 1) common networks (in modeling directory), such as resnet, transformer, so that these networks can be reused in specific model/task, e.g. resnet50, mask r-cnn, sequence model

How will these be different from https://www.tensorflow.org/api_docs/python/tf/keras/applications ?

Contributor

Also, Keras applications are going on an external repo x Keras migration RFC at #202.
Right?

    * Common training util library in TF2.0, model configuration and
      hyperparameter definition in a consistent style.
    * Model category related common library, e.g. primitives as basic building
      blocks for NLP models. We will follow the fundamental design of Keras
      layer/network/model to define and utilize model building blocks.
Contributor

Note: Network is a private class, so objects will need to subclass either Layer or Model.

Also note that whenever possible, we should prefer the pattern of Functional model + custom layers, rather than subclassed models (which have less functionality).

Member Author

Thanks for the suggestions! We do plan to follow the pattern of functional model + custom layers.

* popular state-of-the-art (SOTA) models for end users as a product.
* reference models for performance benchmark testing.
    * For models provided as SOTA models, we will share the network and
      modeling code, but have separate *main* modules. The main
      module for benchmark testing will have additional flags and setups for
      performance testing.
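
One way to read the "shared modeling code, separate *main* modules" idea is sketched below. This is only an illustration under stated assumptions: the RFC does not name any flags or functions, so `define_common_flags`, `--log_steps`, `--report_throughput` and the other names are hypothetical, and plain `argparse` stands in for whatever flag library the model garden actually uses.

```python
import argparse


def define_common_flags(parser):
    """Adds flags shared by every entry point (hypothetical flag names)."""
    parser.add_argument("--model_dir", default="/tmp/model")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--train_steps", type=int, default=1000)
    return parser


def build_model_parser():
    """Parser for the user-facing SOTA model main module."""
    return define_common_flags(argparse.ArgumentParser(prog="model_main"))


def build_benchmark_parser():
    """Parser for the benchmark main module: common flags plus extras."""
    parser = define_common_flags(argparse.ArgumentParser(prog="benchmark_main"))
    # Benchmark-only flags; these are assumptions made for illustration.
    parser.add_argument("--log_steps", type=int, default=100)
    parser.add_argument("--report_throughput", action="store_true")
    return parser


# Both entry points accept the shared flags; only the benchmark one
# understands the performance-testing flags.
model_args = build_model_parser().parse_args(["--batch_size", "64"])
bench_args = build_benchmark_parser().parse_args(
    ["--batch_size", "64", "--report_throughput"])
```

In this sketch the two parsers share one flag definition, so a change to a common flag reaches both entry points, while the performance-testing setup stays out of the user-facing module.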

The following table shows the detailed view of proposed model directory
structure. The SOTA model list will be updated to cover more categories.

| Directory | Subdirectories | | Explanations |
|:-------------- |:---------------------|:--|:------------------------------ |
| modeling | | | Common modeling libraries |
| | layers | | Common modules/layers, not built-in tensorflow layers yet.|
@zhenhuaw-me Aug 5, 2019

Putting layers here is dangerous to me. By not built-in tensorflow layers yet, I assume they are layers introduced by newly published papers. If we put the layers here, and the layers were included in tensorflow later, will we rewrite the related code in this model garden? If not, this will be another slim/contrib.layers of TF 1.x. Maybe to push the implementation in TF hard?

Contributor

How are these layers different from tf-addons?

Member

@seanpmorgan Aug 7, 2019

Splintering the TF ecosystem with multiple repos that contain new layers/optimizers seems counter-productive. Addons is already going to manage the burden of graduating layers into core so why have 2 repos for the same thing.

cc @karmel @facaiy for visibility

Member

With this, we are going in the direction of TF1.x again.

@rachellj218 , I believe we discussed having a process to ensure any broadly useful layers/etc make it to Addons and/or tf.text. We should update the doc to reflect that, as it seems to be a common concern-- maybe a section describing what you would like the evaluation/graduation process to be?

Member Author

Thanks for raising this! As karmel@ mentioned, we do have the plan to graduate the common layers to tensorflow/addons. I clarified in the RFC. Thanks!

I removed optimizers/ subdirectory. I agree it should be good to add to Addons directly.

Thanks for your clarification! In the latest RFC:

Temporary folder to hold common modules/layers, not built-in tensorflow layers yet during refactoring. Broadly used layers will be graduated to tensorflow/addons and/or tf.text.

It seems to me that graduation will be another SLIM. The graduation will create different versions of one network which are based on different API, say a network uses 3 layers A, B and C firstly introduced in model garden, these 3 layers move to TF Addons one by one, thus we have four versions of network code. That is pretty confusing. And, as the interfaces of these layers are probably going to be different, it won't be as easy as re-write one line. Eventually, the effort to maintain these code will be huge.

What about implement the layers directly in TF Addons, which I think is much more flexible than the TF repo?

Thanks, hoping that my concern won't be too annoying...

Member

I agree with this concern. IMO it's less of a headache if they are moved from the temporary folder to Addons prior to a model-garden pip release. We can promise a quick review and patch release for official model additions. To ensure this we could have a model-garden team member on addons team.

Member

I think we need a consolidation stage for modeling components. This is not to develop infra but write models. We will guarantee layers moved to tf-addon must be removed in model garden. It is also up to how do you define "layers" as some of them are not common components as the ones in tf-addon now.

Member

To ensure this we could have a model-garden team member on addons team.

+1, good idea. We can work in close coordination.

Maybe model garden can put its private layers in each model's module. And when you find that some layers need be shared by two models or more, I believe it's a good time to move those layers to tf.addons, rather than modeling.layers. What do you think?

| | networks | | Well-known networks built on top of layers, e.g. transformer |
| | optimziers | | New or customized optimizers |

Similar to layers, I think we are creating TF dialect here.

Member

Same concern, addons has additional optimizers that can be graduated to core.

Any optimizer/layer that isn't useful as a supplementary package can just be committed to the model garden repository IMO.

Member Author

Removed optimizers subdirectory.

Contributor

Suggested change
- | | optimziers | | New or customized optimizers |
+ | | optimizers | | New or customized optimizers |

Member Author

Thanks! We do have the plan to graduate the common layers to tensorflow/addons. I clarified in the RFC. Thanks!

optimizers/ subdirectory was removed from RFC. I agree it should be good to add to Addons directly.

| | training | | Training utils, e.g. example custom training loop |
| utils | | | Miscellaneous Utilities |
| | hyperparameters | | Common flags and model parameters. |
| | ... | | |
| benchmarks | | | benchmark testing and reference models to validate tensorflow |
| | utils | | |
| | examples | | reference models for testing/validating end-to-end tensorflow |
| | | Resnet | |
| | | BERT | |
| | | Transformer | |
| | | NCF | |
| nlp | | | models/tasks for Natural Language Processing |
| | utils | | NLP specific utils, e.g. input dataset |
| | BERT | | NLP specific utils, e.g. input dataset |
| | | BERT core modeling | |
| | | tasks | specific tasks on open public datasets, e.g. SQuAD, MNLI |
| | XLNET | | |
| | GPT | | |
| | Transformer | | |
| | GNMT | | |
| | ... | | |
| vision | | | models/tasks for Computer Vision |
| | image_classification | | |
| | | resnet | |
| | | EfficientNet | |
| | | MnasNet | |
| | | ... | |
| | object_detection | | |
Member

Unet is mainly used for segmentation. So instead of object_detection, rather make two sub-directories object_detection and segmentation. segmentation will then contain:

  • DeepLab v3/v3+
  • Unet
  • FastSCNN
    ...

Member Author

Thanks! Updated to reflect the suggestion. We are still debating the best structure for vision models, so this may be subject to change in the near future.

| | | RetinaNet | |
| | | Mask-RCNN | |
| | | UNet | |
| | | ShapeMask | |
| | | ... | |
| recommendation| | | |
| | NCF | | |
| staging | | | accepting community contributions |
| archive | | | deprecated models, not officially supported |
| r1 | | | tf1.x models and utils |
| | utils | | |
| | resnet50 | | |
| | transformer | | |
| | ncf | | |
| | wide_deep | | |
| | boosted_trees | | |

### Pretrained model repository

We are going to provide the pretrained models for research exploration and
real-world application development. The plan is to integrate with [TensorFlow Hub](https://www.tensorflow.org/hub),
where users can access the Hub modules and SavedModel for pretrained checkpoints and links to the code in the model
garden.
Member

Please avoid use of configs as much as possible. The code should be flexible enough to change the parameters in the code itself. Changing things through a config file makes sense for some people but not for all. It is easy to maintain the changes in the code but it is hard to track the changes in the config file and, at the same time, it is hard to validate the changes as well.

Member Author

Thanks for your suggestion! We plan to follow the practice in cloud tpu model repo, for example, https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_main.py#L674. A default parameter dictionary will be provided, but users can overwrite the default value or extend to new key/value with string, dict or yaml.
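
The "default parameter dictionary with overrides" pattern described in this reply could look roughly like the sketch below. It is not the code from the linked mnasnet module: `DEFAULT_PARAMS`, `override_params` and the `key=value,nested.key=value` string syntax are assumptions chosen for illustration, and YAML input is left out because it would need a third-party parser.

```python
import copy

# Hypothetical defaults; a real model would define its own keys.
DEFAULT_PARAMS = {
    "learning_rate": 0.1,
    "batch_size": 128,
    "optimizer": {"name": "sgd", "momentum": 0.9},
}


def _parse_value(text):
    """Converts a string to bool, int or float when possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def _parse_override_string(text):
    """Parses 'a=1,b.c=2' into a nested dict {'a': 1, 'b': {'c': 2}}."""
    result = {}
    for item in filter(None, text.split(",")):
        key, value = item.split("=", 1)
        *parents, leaf = key.strip().split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = _parse_value(value.strip())
    return result


def override_params(defaults, overrides):
    """Returns a copy of `defaults` updated with `overrides` (dict or str)."""
    if isinstance(overrides, str):
        overrides = _parse_override_string(overrides)
    merged = copy.deepcopy(defaults)

    def _merge(dst, src):
        for key, value in src.items():
            if isinstance(value, dict) and isinstance(dst.get(key), dict):
                _merge(dst[key], value)  # recurse into nested sections
            else:
                dst[key] = value  # overwrite an existing key or add a new one

    _merge(merged, overrides)
    return merged


# Override existing values from a string, then extend with a new key.
params = override_params(
    DEFAULT_PARAMS, "learning_rate=0.01,optimizer.momentum=0.95")
params = override_params(params, {"warmup_steps": 500})
```

Because the defaults live in code and overrides are merged into a copy, the baseline configuration stays reviewable in the source tree, which partly addresses the tracking concern raised above.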


### Convergence and Performance Testing

We have a benchmark testing framework to execute continuous performance and
accuracy tests for TensorFlow on different types of accelerators. All official
TF2.0 models are required to provide accuracy tests and these tests will be
automatically expanded to performance tests for continuous regression testing
and monitoring.
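
As a rough illustration of how one accuracy test can also yield performance numbers, consider the sketch below. The RFC does not describe the benchmark framework's API, so `run_accuracy_test`, `train_and_evaluate` and the report keys are hypothetical, and the accuracy target and tolerance are placeholder values, not measured results.

```python
import time


def train_and_evaluate(num_steps):
    """Stand-in for a real training run; returns a made-up accuracy."""
    return min(0.76, 0.5 + 0.001 * num_steps)


def run_accuracy_test(run_fn, num_steps, target, tolerance):
    """Runs the model once and reports accuracy plus timing statistics."""
    start = time.perf_counter()
    accuracy = run_fn(num_steps)
    wall_time = time.perf_counter() - start
    return {
        "accuracy": accuracy,
        # The accuracy check: stay within `tolerance` below the target.
        "passed": accuracy >= target - tolerance,
        # The same run doubles as a performance measurement.
        "wall_time_sec": wall_time,
        "steps_per_sec": num_steps / wall_time if wall_time > 0 else float("inf"),
    }


report = run_accuracy_test(
    train_and_evaluate, num_steps=300, target=0.763, tolerance=0.005)
```

Tracking `wall_time_sec` and `steps_per_sec` across runs is one way such a report could feed continuous regression monitoring.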

## Model Garden Sustainability

### Model Launch Criteria
To ensure that official models are well-maintained and tested, we are going to enforce the following criteria for launching a new model in the official model garden, except for the staging folder:

* Follow the best practice guideline for each model category.
* Unit tests to verify the basics of the model.
* Integrate the model into benchmark testing to ensure that the model’s accuracy is on par with the original paper / SOTA results.
* README with commands and procedures to reproduce the SOTA results, including:
    * Input data generation if necessary
    * Model execution, including all hyperparameters.

### Community contribution and staging

Due to fast ML development, we can’t possibly support all best-in-class models
and keep them up to date on our own. We highly encourage users to contribute to
the official model garden. After model garden refactoring (Phase 1), we plan to
provide a full list of wanted models to the TensorFlow community and encourage
TensorFlow users to claim and contribute the models to the model garden.

Requirements differ between unifying interfaces, supporting all the chips
and platforms, and enabling benchmarks for reference models. Thus, models can be
at different stages. As we may have immediate needs to add some quick
models for benchmarking and debugging, we will provide a staging folder to host
drafts of SOTA or popular models. Once the staging models converge and
support the major functionalities of standard official models, we can judge whether
they meet the launch standard and migrate them either to official models or to
benchmark references.

### Maintenance and Deprecation

Given the nature of this repository, old models may become less and less
useful to the community as time goes on. In order to keep the repository
sustainable, we will perform bi-annual reviews of our models to ensure
everything still belongs in the repo. For models to be retired, the current plan
is to move them to the archive directory, where regression tests for quality
and convergence will no longer be run.

The following details the policy for models in mature and staging phases:

* Models graduated from staging subdirectory

  The models will be maintained by the model garden team. After we start to
  accept community contributions, we will put the contributors as model owners.

  These models will have continuous convergence and performance testing to
  make sure there are no regressions. In general, we won’t deprecate these models unless:
    * the model isn’t compatible with the TF APIs any more and has to be replaced by a new version

Does this imply that the model will only target latest version of TF?

Member Author

We plan to do a model garden release for each major TF release, e.g. TF 2.1, 2.2. Thanks!

Cool! Thanks for your reply.

    * a strictly better model shows up and the old model isn't needed by the community/market.

* Models in staging:

  The model garden team will do a quarterly review to check the status with the
  model contributors, such as:
    * model convergence
    * unit tests
    * convergence tests
    * coding style meets the TF2.0 best practice.

  If there’s no further commitment to improve the status in the next 90 days, we
  will mark the model as deprecated and subject to deletion.

### Official Model Releases
We will release the model garden starting from TF 2.0. Unit tests and
regression tests need to pass against the TF release. Deprecated models will be
removed from the release branch.

We will also create a pip package for each release version.

## Milestones

| Phases | Milestones | Notes |
|:-------- |:-----------------| :----------------------|
| Phase_1 | 1. Finish directory reorganization. 2. Add common modeling library. 3. Have 2-3 SOTA models for both NLP and Vision. | Not accepting community contributions during refactoring.|
| Phase_2 | Expand repository to cover more model types| Will accept community contributions on the solicited model list.|