
Add feed support for ParallelExecutor #9637

Merged: 3 commits merged on Apr 4, 2018
Conversation

panyx0718 (Contributor)

TODO: This still needs to work with a reader when both the reader and feed are enabled. Normally, feed replaces the reader input or a variable produced by upstream ops.
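For context, here is a minimal sketch of what feeding a ParallelExecutor looks like from the user side. This is not code from the PR: the toy network, the 'image'/'label' names, and the feed_dict/fetch_list keywords follow the fluid API of that period and should be treated as assumptions.

```python
import numpy as np
import paddle.fluid as fluid

# Toy network standing in for the benchmark models.
image = fluid.layers.data(name='image', shape=[784], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
prediction = fluid.layers.fc(input=image, size=10, act='softmax')
loss = fluid.layers.mean(
    fluid.layers.cross_entropy(input=prediction, label=label))
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

startup_exe = fluid.Executor(fluid.CUDAPlace(0))
startup_exe.run(fluid.default_startup_program())

pe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)

# Feed plain numpy arrays; the executor wraps them in LoDTensors and
# splits them across devices (see the review discussion below).
feed = {
    'image': np.random.random((32, 784)).astype('float32'),
    'label': np.random.randint(0, 10, size=(32, 1)).astype('int64'),
}
loss_value = pe.run(fetch_list=[loss.name], feed_dict=feed)
```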

@@ -135,7 +144,9 @@ def bottleneck_block(input, num_filters, stride, cardinality, reduction_ratio):
return fluid.layers.elementwise_add(x=short, y=scale, act='relu')


- def SE_ResNeXt152Small(batch_size=2):
+ def SE_ResNeXt152Small(batch_size=2, use_feed=False):
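The benchmark model builders gain a use_feed flag so the same function can be driven either by a file reader or by fed data. Below is a minimal sketch of that pattern with a toy model; the recordio reader call, file path, and layer names are assumptions, not the PR's actual SE-ResNeXt code.

```python
import paddle.fluid as fluid

def simple_model(use_feed=False):
    """Toy stand-in for the benchmarks' use_feed switch."""
    if use_feed:
        # Inputs are declared as data layers and filled by the feed
        # passed to ParallelExecutor.run().
        img = fluid.layers.data(name='image', shape=[784], dtype='float32')
        label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    else:
        # Inputs come from a file reader, as the benchmarks did before
        # this PR (path and shapes are placeholders).
        reader = fluid.layers.open_recordio_file(
            filename='./mnist.recordio',
            shapes=[[-1, 784], [-1, 1]],
            lod_levels=[0, 0],
            dtypes=['float32', 'int64'])
        img, label = fluid.layers.read_file(reader)
    prediction = fluid.layers.fc(input=img, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    return fluid.layers.mean(loss)
```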
Contributor

SE_ResNeXt152 can be replaced with SE_ResNeXt50 (the example is here). SE_ResNeXt50 consumes less memory than SE_ResNeXt152 and is also faster.

panyx0718 (Contributor, Author), Apr 4, 2018
This change is not from this PR; let's leave it for a follow-up PR.

or numpy array.
:return: fetched value list.
"""
feed_tensor_dict = {}
Contributor

Please check the type of feed_dict.

panyx0718 (Contributor, Author)

Done
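The exact check added in response isn't quoted in the thread; a minimal sketch of such a guard might look like this (hypothetical helper name and message):

```python
def _check_feed_dict(feed_dict):
    # Reject anything that is not a dict of {var_name: LoDTensor or numpy
    # array} before the values are converted below.
    if not isinstance(feed_dict, dict):
        raise TypeError("feed_dict should be a dict, but received %s" %
                        type(feed_dict))
```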

feed_tensor = feed_dict[feed_name]
if not isinstance(feed_tensor, core.LoDTensor):
    # Wrap a non-LoDTensor value (e.g. a numpy array) in an LoDTensor
    # placed on the first device.
    feed_tensor = core.LoDTensor()
    feed_tensor.set(feed_dict[feed_name], self._act_places[0])
Contributor

There are two ways of feeding data:

  1. Transfer all the tensors to GPU(0), then transfer each sub-batch to the other GPUs by P2P.
  2. Keep all the tensors on the CPU side, then transfer each sub-batch to the GPUs by CPU->GPU copies.

Which is better?

Contributor

We can adopt the first; parallel_do also uses this approach.

panyx0718 (Contributor, Author)

Yes, I believe this is the current solution: set() places the tensor on GPU(0), and SplitLoDTensor transfers it to each GPU.
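In other words, the snippet above implements option 1 from the question: the whole batch is placed on the first device, and the per-device split happens inside the executor rather than in user code. A standalone sketch of just the placement step (device count and shapes are assumptions):

```python
import numpy as np
import paddle.fluid.core as core

places = [core.CUDAPlace(i) for i in range(2)]  # assume two GPUs
batch = np.random.random((8, 784)).astype('float32')

# Put the whole batch on the first device; the generated program's
# split logic then scatters per-device slices to the other GPUs.
tensor = core.LoDTensor()
tensor.set(batch, places[0])
```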

chengduoZH (Contributor)

I think it is ok; how about you, @qingqing01?

chengduoZH previously approved these changes Apr 4, 2018
chengduoZH (Contributor) left a comment

Excellent!

panyx0718 merged commit 043c230 into PaddlePaddle:develop on Apr 4, 2018
chengduoZH added the parallel_exe (parallel executor) label on Apr 6, 2018