- Convert FfDL Model definition to Watson Studio Deep Learning Service definition
- Train and Serve using Watson Studio Deep Learning Service
Watson Studio Deep Learning and FfDL use different model definition files (`manifest.yml`) to define their training jobs, so use the scripts in this directory to convert between the two formats. `convert-to-WML.py` converts an FfDL training job's `manifest.yml` to the Watson Studio Deep Learning format, and `convert-to-FfDL.py` converts in the opposite direction.
- Clone the repository and go into this directory:

      git clone https://github.com/IBM/FfDL
      cd FfDL/etc/converter
- Watson Machine Learning only accepts Cloud Object Storage. If you are using a local S3 object storage on your cluster, refer to the user-guide section Cloud Object Store to provision your cloud object storage, upload your data to it, and configure your FfDL `manifest.yml` to use the cloud object storage endpoint and buckets.
- Use the following commands to install the necessary Python packages and run the Python job that builds your custom Watson Studio Deep Learning/FfDL `manifest.yml`.

      pip install -r requirement.txt
      # Convert FfDL manifest.yml to Watson Studio Deep Learning format
      python convert-to-WML.py -i <inputfile> -o <outputfile> -s <samplefile>
      # Convert Watson Studio Deep Learning manifest.yml to FfDL format
      python convert-to-FfDL.py -i <inputfile> -o <outputfile> -s <samplefile>

  - `<inputfile>`: The manifest file you want to convert.
  - `<outputfile>`: The filename for the converted manifest file. Default is `manifest-WML.yaml`/`manifest-FfDL.yaml`.
  - `<samplefile>`: The sample manifest format file with all the default values. Default is `sample-WML.yaml`/`sample-FfDL.yaml`.

Now, a converted file should be created with all the information in your original manifest file.
- Copy the new YAML file and use it for your FfDL/Watson Studio Deep Learning training job.
- Note that every T-shirt size in Watson Studio Deep Learning requires a GPU, so the conversion defaults to a GPU configuration. If you only want to run on CPU, set the `gpus` field to 0 and adjust `cpus` and `memory` to your needs. Then change the framework version to one that is enabled for CPU; the list of CPU framework versions is in user-guide.md. The table below maps the Watson Studio Deep Learning T-shirt sizes to FfDL resources.

T-shirt Tiers | GPUs | RAM (GB) | CPUs |
---|---|---|---|
k80 | 1 | 24 | 4 |
k80x2 | 2 | 48 | 8 |
k80x4 | 4 | 96 | 16 |
p100 | 1 | 24 | 8 |
p100x2 | 2 | 48 | 16 |
v100 | 1 | 24 | 26 |
v100x2 | 2 | 48 | 52 |
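
As a quick reference, the table can be expressed as a lookup, together with the CPU-only adjustment described above. This is a minimal sketch and not part of the conversion scripts: the names `TSHIRT_TIERS`, `resources_for_tier`, and `cpu_only`, and the `memory_gb` key, are hypothetical, and the CPU and memory values passed to `cpu_only` are whatever your job needs.

```python
# Illustrative lookup built from the table above. The names are hypothetical
# and are not taken from convert-to-WML.py or convert-to-FfDL.py.
TSHIRT_TIERS = {
    "k80":    {"gpus": 1, "memory_gb": 24, "cpus": 4},
    "k80x2":  {"gpus": 2, "memory_gb": 48, "cpus": 8},
    "k80x4":  {"gpus": 4, "memory_gb": 96, "cpus": 16},
    "p100":   {"gpus": 1, "memory_gb": 24, "cpus": 8},
    "p100x2": {"gpus": 2, "memory_gb": 48, "cpus": 16},
    "v100":   {"gpus": 1, "memory_gb": 24, "cpus": 26},
    "v100x2": {"gpus": 2, "memory_gb": 48, "cpus": 52},
}


def resources_for_tier(tier):
    """Return a copy of the GPU/RAM/CPU values for a T-shirt tier."""
    try:
        return dict(TSHIRT_TIERS[tier])
    except KeyError:
        raise ValueError(f"unknown T-shirt tier: {tier!r}") from None


def cpu_only(resources, cpus, memory_gb):
    """Return a CPU-only variant: gpus set to 0, cpus and memory chosen by the caller."""
    adjusted = dict(resources)
    adjusted.update({"gpus": 0, "cpus": cpus, "memory_gb": memory_gb})
    return adjusted


if __name__ == "__main__":
    gpu_job = resources_for_tier("k80x2")
    print(gpu_job)
    print(cpu_only(gpu_job, cpus=4, memory_gb=8))
```

Remember that a CPU-only job also needs a framework version that is enabled for CPU, which this sketch does not check.
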
Please follow the Train and Serve using Watson Studio Deep Learning Service instructions.
- If you are converting a Watson Studio Deep Learning yml to FfDL format, both `training_data_reference` and `training_results_reference` need to be in the same object storage (they can be in different buckets) because FfDL accepts only one object storage connection.
- In Watson Studio Deep Learning, TensorFlow is only available up to version 1.5.
- Caffe2 is not yet available in Watson Studio Deep Learning, so the conversion script does not accept any caffe2 input.
- The conversion script does not accept the `small`, `medium`, and `large` T-shirt sizes because they will be deprecated soon.

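
The limitations above can be checked before running a conversion. The sketch below is a hypothetical pre-flight check over a simplified, already-parsed description of the job. The function name and the flat keys (`framework`, `version`, `size`, `data_endpoint`, `results_endpoint`) are assumptions made for illustration; they do not mirror the real manifest layout or whatever validation the scripts perform themselves.

```python
import re

# Sizes the conversion script is documented not to accept.
DEPRECATED_SIZES = {"small", "medium", "large"}
# Highest TensorFlow version documented as available in Watson Studio Deep Learning.
MAX_TENSORFLOW = (1, 5)


def _major_minor(version):
    """Extract (major, minor) from strings such as '1.5' or '1.5.0-py3'."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_limitations(job, target):
    """Return a list of problems for converting `job` to `target` ('WML' or 'FfDL').

    `job` is a hypothetical flat dict, not the real manifest structure.
    """
    problems = []
    framework = str(job.get("framework", "")).lower()
    if framework == "caffe2":
        problems.append("caffe2 is not accepted by the conversion script")
    if job.get("size") in DEPRECATED_SIZES:
        problems.append("small/medium/large T-shirt sizes are not accepted")
    version = job.get("version")
    if target == "WML" and framework == "tensorflow" and version is not None:
        if _major_minor(version) > MAX_TENSORFLOW:
            problems.append("TensorFlow above 1.5 is not available in Watson Studio Deep Learning")
    if target == "FfDL" and job.get("data_endpoint") != job.get("results_endpoint"):
        problems.append("training data and results must use the same object storage")
    return problems


if __name__ == "__main__":
    job = {"framework": "tensorflow", "version": "1.12", "size": "small"}
    for problem in check_limitations(job, "WML"):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets you report every issue with a manifest in a single pass.
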
An example FfDL `manifest.yml` is provided in `sample-FfDL.yaml`. The description of each field is available in user-guide.md.
An example Watson Studio Deep Learning `manifest.yml` is provided in `sample-WML.yaml`. The description of each field is available in the model definition guide at Watson Studio.
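
For scripted use, the conversion commands and their documented defaults can be assembled programmatically. This is a minimal sketch that assumes it runs from the `FfDL/etc/converter` directory. The helper name `build_convert_command` is hypothetical, while the script names, the `-i`/`-o`/`-s` flags, and the default filenames are the ones documented earlier in this README.

```python
import sys

# Conversion scripts shipped in FfDL/etc/converter, keyed by target format.
SCRIPTS = {
    "WML": "convert-to-WML.py",
    "FfDL": "convert-to-FfDL.py",
}


def build_convert_command(target, inputfile, outputfile=None, samplefile=None):
    """Build the argument list for a conversion to `target` ('WML' or 'FfDL').

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if target not in SCRIPTS:
        raise ValueError(f"target must be one of {sorted(SCRIPTS)}, got {target!r}")
    # Fall back to the documented default filenames when none are given.
    outputfile = outputfile or f"manifest-{target}.yaml"
    samplefile = samplefile or f"sample-{target}.yaml"
    return [
        sys.executable, SCRIPTS[target],
        "-i", inputfile,
        "-o", outputfile,
        "-s", samplefile,
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_convert_command("WML", "manifest.yml")))
```
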