Apache Submarine (Submarine for short) is an End-to-End Machine Learning Platform to allow data scientists to create end-to-end machine learning workflows. On Submarine, data scientists can finish each stage in the ML model lifecycle, including data exploration, data pipeline creation, model training, serving, and monitoring.
Some open-source and commercial projects are trying to build an end-to-end ML platform. What's the vision of Submarine?
- Many platforms lack easy-to-use user interfaces (API, SDK, and IDE, etc.)
- In the same company, data scientists in different teams usually spend much time on developments of existing feature sets and models.
- Data scientists put emphasis on domain-specific tasks (e.g. Click-Through-Rate), but they need to implement their models from scratch with SDKs provided by existing platforms.
- Many platforms lack a unified workbench to manage each component in the ML lifecycle.
Theodore Levitt once said:
“People don’t want to buy a quarter-inch drill. They want a quarter-inch hole.”
- Run/Track distributed training
experiment
on prem or cloud via easy-to-use UI/API/SDK. - Easy for data scientists to manage versions of
experiment
and dependencies ofenvironment
. - Support popular machine learning frameworks, including TensorFlow, PyTorch, Horovod, and MXNet
- Provide pre-defined template for data scientists to implement domain-specific tasks easily (e.g. using DeepFM template to build a CTR prediction model)
- Support many compute resources (e.g. CPU and GPU, etc.)
- Support Kubernetes and YARN
- Pipeline is also on the backlog, we will look into pipeline for training in the future.
- Submarine aims to provide a notebook service (e.g. Jupyter notebook) which allows users to manage notebook instances running on the cluster.
- Model management for model-serving/versioning/monitoring is on the roadmap.
As mentioned above, Submarine attempts to provide Data-Scientist-friendly UI to make data scientists have a good user experience. Here're some examples.
# New a submarine client of the submarine server
submarine_client = submarine.ExperimentClient(host='http://localhost:8080')
# The experiment's environment, could be Docker image or Conda environment based
environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0')
# Specify the experiment's name, framework it's using, namespace it will run in,
# the entry point. It can also accept environment variables. etc.
# For PyTorch job, the framework should be 'Pytorch'.
experiment_meta = ExperimentMeta(name='mnist-dist',
namespace='default',
framework='Tensorflow',
cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100')
# 1 PS task of 2 cpu, 1GB
ps_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M',
replicas=1)
# 1 Worker task
worker_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M',
replicas=1)
# Wrap up the meta, environment and task specs into an experiment.
# For PyTorch job, the specs would be "Master" and "Worker".
experiment_spec = ExperimentSpec(meta=experiment_meta,
environment=environment,
spec={'Ps':ps_spec, 'Worker': worker_spec})
# Submit the experiment to submarine server
experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)
# Get the experiment ID
id = experiment['experimentId']
submarine_client.get_experiment(id)
submarine_client.wait_for_finish(id)
submarine_client.get_log(id)
submarine_client.list_experiments(status='running')
For a quick-start, see Submarine On K8s
(Available on 0.7.0-SNAPSHOT, see Roadmap)
If you want to know more about Submarine's architecture, components, requirements and design doc, they can be found on Architecture-and-requirement
Detailed design documentation, implementation notes can be found at: Implementation notes
Read the Apache Submarine Community Guide
How to contribute Contributing Guide
Issue Tracking: https://issues.apache.org/jira/projects/SUBMARINE
What to know more about what's coming for Submarine? Please check the roadmap out: https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap
The Apache Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.