Separate data uploading from job submission for DS2 cloud training and add support for multiple shards uploading. #205

xinghai-sun · 2017-08-15T10:17:22Z

Resolve #205

Separate data uploading from training job submission for DS2 cloud training.
Add support for packing and uploading multiple shards.
Update cloud/README.md.
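The PR does not show how manifest entries are assigned to shards. Below is a minimal sketch that assumes a simple round-robin split. `split_into_shards` is a hypothetical helper, not the function used in `upload_data.py`.

```python
from typing import List


def split_into_shards(items: List[str], num_shards: int) -> List[List[str]]:
    """Distribute manifest lines across num_shards shards round-robin.

    Hypothetical helper for illustration; the PR's actual sharding scheme
    may differ (e.g. contiguous chunks or size-balanced shards).
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for i, item in enumerate(items):
        # Item i goes to shard i mod num_shards, so shard sizes differ by at most one.
        shards[i % num_shards].append(item)
    return shards
```

For example, `split_into_shards(["a", "b", "c", "d", "e"], 2)` yields `[["a", "c", "e"], ["b", "d"]]`. Round-robin keeps shard sizes balanced by entry count. It does not balance by total audio duration.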


wanghaoshuang · 2017-08-15T10:31:13Z

deep_speech_2/cloud/README.md


 ```
 {"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac", "duration": 5.855, "text": "mister quilter is the ..."}
 {"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.flac", "duration": 4.815, "text": "nor is mister ..."}
 ```
+- `OUT_MANIFESTS`: Paths (in local filesystem) to write the updated output manifest files to. Multiple paths can be concatenated with a whitespace delimeter. The values of `audio_filepath` in the output manifests are jjjjjkknew paths in PaddleCloud filesystem.


jjjjjkknew -> new ?
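Each manifest is a JSON-lines file, and the output manifests only change `audio_filepath` to point into the PaddleCloud filesystem. The sketch below illustrates that rewrite. It assumes each audio file lands flat under a cloud data directory, keeping its basename; the real PR may instead pack files into shard archives with a different path scheme. `rewrite_manifest_line` and `rewrite_manifest` are hypothetical names.

```python
import json
import posixpath


def rewrite_manifest_line(line: str, cloud_data_dir: str) -> str:
    """Return the manifest line with audio_filepath moved under cloud_data_dir.

    Assumption: files are stored flat by basename, which is safe for
    LibriSpeech because utterance IDs are unique across speakers/chapters.
    """
    entry = json.loads(line)
    filename = posixpath.basename(entry["audio_filepath"])
    entry["audio_filepath"] = posixpath.join(cloud_data_dir, filename)
    # duration and text are carried over unchanged.
    return json.dumps(entry)


def rewrite_manifest(in_path: str, out_path: str, cloud_data_dir: str) -> None:
    """Rewrite every non-empty line of a local manifest into out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            line = line.strip()
            if line:
                fout.write(rewrite_manifest_line(line, cloud_data_dir) + "\n")
```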

wanghaoshuang · 2017-08-15T10:34:51Z

deep_speech_2/cloud/README.md


-## Step-2  Configure computation resource
+You have to take this step only once, when it is your first time to do the cloud training. Later on, the data is persisitent on the cloud filesystem and is reusable for multple jobs.


multple -> multiple

wanghaoshuang · 2017-08-15T10:42:56Z

deep_speech_2/cloud/pcloud_submit.sh

-MEAN_STD_FILE="../mean_std.npz"
-# Configure output path in PaddleCloud filesystem
-CLOUD_DATA_DIR="/pfs/dlnel/home/sunxinghai@baidu.com/deepspeech2/data"
+TRAIN_MANIFEST="cloud/cloud.manifest.test"


'cloud.manifest.test' -> 'cloud.manifest.train'?

wanghaoshuang · 2017-08-15T10:53:59Z

deep_speech_2/cloud/upload_data.py

-    "--dev_manifest_path",
-    default="../datasets/manifest.dev",
+    "--in_manifest_paths",
+    default=["../datasets/manifest.test", "../datasets/manifest.dev"],


Is the default value here intentionally not set to /manifest.train?

Fixed. It wasn't intentional: I set it temporarily while debugging and forgot to change it back.
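The thread does not show the corrected argument definition. Below is one possible shape, sketched with `argparse`. It assumes the default now includes `../datasets/manifest.train`, that paths are passed whitespace-separated via `nargs="+"`, and that the `--out_manifest_paths` defaults shown are placeholders. `parse_manifest_args` is a hypothetical name.

```python
import argparse
from typing import List, Optional


def parse_manifest_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upload DS2 data (sketch).")
    parser.add_argument(
        "--in_manifest_paths",
        nargs="+",  # accepts one or more whitespace-separated paths
        default=["../datasets/manifest.train", "../datasets/manifest.dev"],
        help="Local manifest files to upload.")
    parser.add_argument(
        "--out_manifest_paths",
        nargs="+",
        # Placeholder defaults; assumed for illustration only.
        default=["./cloud.manifest.train", "./cloud.manifest.dev"],
        help="Local paths for the rewritten manifests.")
    args = parser.parse_args(argv)
    # Each input manifest needs exactly one output manifest.
    if len(args.in_manifest_paths) != len(args.out_manifest_paths):
        parser.error("in/out manifest path counts must match")
    return args
```

`parser.error` prints the usage message and exits with status 2, so a mismatched pair of lists fails before any upload starts.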

wanghaoshuang · 2017-08-15T11:09:01Z

deep_speech_2/cloud/upload_data.py

-    pcloud_cp(args.vocab_file, cloud_vocab_file)
-    pcloud_cp(args.mean_std_file, cloud_mean_file)
+    upload_data(args.in_manifest_paths, args.out_manifest_paths,
+                args.local_tmp_dir, args.cloud_data_dir, 10)


10 -> args.num_shards
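Exposing the shard count as a flag could look like the sketch below. It keeps the previously hard-coded `10` as the default and rejects non-positive values at parse time. The `--num_shards` flag name comes from the review comment. `positive_int` and `build_parser` are hypothetical helpers.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1."""
    n = int(value)  # a non-integer string raises ValueError, which argparse reports
    if n < 1:
        raise argparse.ArgumentTypeError("must be >= 1, got %d" % n)
    return n


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Upload DS2 data (sketch).")
    parser.add_argument(
        "--num_shards",
        type=positive_int,
        default=10,  # the value that was previously hard-coded
        help="Number of shards to pack the data into before uploading.")
    return parser


# The call site would then pass the flag through, e.g.:
# upload_data(args.in_manifest_paths, args.out_manifest_paths,
#             args.local_tmp_dir, args.cloud_data_dir, args.num_shards)
```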

xinghai-sun added 2 commits August 15, 2017 16:58

Separate data uploading from job submission for DS2 cloud training and add support for multiple shards uploading.

55e0a29

Update README for DS2 cloud training.

4490258

xinghai-sun requested review from pkuyym and wanghaoshuang August 15, 2017 10:17

wanghaoshuang requested changes Aug 15, 2017


Update DS2 cloud training according to review comments.

88eabac

wanghaoshuang approved these changes Aug 15, 2017


xinghai-sun merged commit 69ebc58 into PaddlePaddle:develop Aug 15, 2017

xinghai-sun deleted the cloud_shards branch August 15, 2017 14:28
