Fix local recordio reader #3479

helinwang · 2017-08-14T20:29:04Z

No description provided.

gongweibao

LGTM

typhoonzero · 2017-08-15T04:12:03Z

python/paddle/v2/reader/creator.py

+            path = paths
+        else:
+            path = ",".join(paths)
+        f = rec.reader(path)


When training in a distributed environment, each reader should fetch only a part of the data.

@typhoonzero The files will be read one by one. If we need to fetch only part of the data, maybe we can split it into different files. cloud_reader is much better suited to the distributed training environment.

cloud_reader uses the master to dispatch tasks. I think we still need a version without the master, or the master could dispatch plain files rather than only recordio. Many users still feed MapReduce output directly as training input.

Anyway, will discuss in another issue.

I see, thanks. The recordio file will be read entirely to parse the index before reading the first item, so in that case we need to shard the files.
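For illustration, here is a minimal sketch of the file-level sharding idea from this thread. It is not code from this PR: `shard_paths`, `to_reader_arg`, `trainer_id`, and `num_trainers` are hypothetical names, and the `isinstance` check is an assumption about how a string and a list of paths are told apart. The only part taken from the diff is joining a list of paths with commas before handing it to the reader.

```python
def shard_paths(paths, trainer_id, num_trainers):
    """Return the subset of files that one trainer should read.

    Hypothetical helper: each trainer takes every num_trainers-th file,
    so no file is read by two trainers.
    """
    if isinstance(paths, str):
        paths = paths.split(",")
    if not 0 <= trainer_id < num_trainers:
        raise ValueError("trainer_id must be in [0, num_trainers)")
    # Sort first so that every trainer derives its shard from the same order.
    return sorted(paths)[trainer_id::num_trainers]


def to_reader_arg(paths):
    """Mirror the diff: keep a single string as-is, comma-join a list."""
    if isinstance(paths, str):  # assumption: the diff's condition is not shown
        return paths
    return ",".join(paths)


files = ["part-00000", "part-00001", "part-00002", "part-00003", "part-00004"]
mine = shard_paths(files, trainer_id=1, num_trainers=2)
print(to_reader_arg(mine))  # part-00001,part-00003
```

This only balances load if the files are of similar size. Since each recordio file is read entirely to parse its index, sharding has to happen at file granularity as sketched here.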

typhoonzero · 2017-08-15T04:12:32Z

python/paddle/v2/reader/creator.py

@@ -57,7 +57,7 @@ def reader():
    return reader


-def recordio_local(paths, buf_size=100):
+def recordio(paths, buf_size=100):


Maybe rename to recordio_reader to indicate it's a reader?

@typhoonzero It's used as paddle.v2.reader.creator.recordio, so the import path probably already indicates that it's a reader.

typhoonzero

LGTM++

helinwang requested a review from gongweibao August 14, 2017 20:29

fix local recordio reader

2da240c

helinwang force-pushed the recordio branch 2 times, most recently from 3914045 to 785db4e Compare August 14, 2017 22:20

Add recordio as paddle's dependency.

c3bda2a

helinwang force-pushed the recordio branch from 785db4e to c3bda2a Compare August 14, 2017 22:32

Merge branch 'develop' into recordio

b95668d

gongweibao approved these changes Aug 15, 2017

typhoonzero reviewed Aug 15, 2017

typhoonzero approved these changes Aug 15, 2017

typhoonzero mentioned this pull request Aug 15, 2017

Master support to dispatch plain file rather than only recordio, in case of using Mapreduce output directly as training input. #3493

Closed

helinwang merged commit 245f622 into PaddlePaddle:develop Aug 15, 2017

helinwang deleted the recordio branch August 15, 2017 05:27
