Add feed support for ParallelExecutor #9637
Conversation
@@ -135,7 +144,9 @@ def bottleneck_block(input, num_filters, stride, cardinality, reduction_ratio):
     return fluid.layers.elementwise_add(x=short, y=scale, act='relu')


-def SE_ResNeXt152Small(batch_size=2):
+def SE_ResNeXt152Small(batch_size=2, use_feed=False):
SE_ResNeXt152 can be replaced with SE_ResNeXt50; the example is here. SE_ResNeXt50 consumes less memory than SE_ResNeXt152 and is also faster.
This change is not from this PR; let's leave it for a follow-up PR.
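For context, here is a minimal sketch of what the new `use_feed` flag is meant to enable. The network, layer sizes, and input names below are illustrative and not taken from this PR: when `use_feed` is true, the model reads from data layers that the caller fills through the feed argument; the reader-based branch is omitted.

```python
import paddle.fluid as fluid


def simple_fc_net(use_feed=False):
    # Illustrative model builder: with use_feed=True the inputs are data
    # layers that are filled via the feed passed to ParallelExecutor.run();
    # the reader-based input path is not shown here.
    if not use_feed:
        raise NotImplementedError("reader-based input is not shown in this sketch")
    img = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = fluid.layers.fc(input=img, size=200, act='relu')
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    return fluid.layers.mean(loss)
```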
            or numpy array.
        :return: fetched value list.
        """
        feed_tensor_dict = {}
Please check the type of feed_dict.
Done
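A sketch of the kind of type check requested above. The helper name is hypothetical, and it assumes only numpy.ndarray and LoDTensor are acceptable feed values:

```python
import numpy as np
import paddle.fluid.core as core


def _as_lod_tensor(feed_value, place):
    # Hypothetical helper: pass an existing LoDTensor through, wrap a numpy
    # array into a LoDTensor on `place`, and reject anything else.
    if isinstance(feed_value, core.LoDTensor):
        return feed_value
    if isinstance(feed_value, np.ndarray):
        tensor = core.LoDTensor()
        tensor.set(feed_value, place)
        return tensor
    raise TypeError("feed value must be a numpy.ndarray or LoDTensor, got %s" %
                    type(feed_value))
```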
        feed_tensor = feed_dict[feed_name]
        if not isinstance(feed_tensor, core.LoDTensor):
            feed_tensor = core.LoDTensor()
            feed_tensor.set(feed_dict[feed_name], self._act_places[0])
There are two ways of feeding data:
- Transfer all the tensors to GPU(0) first, then transfer the sub-tensors to the other GPUs by P2P.
- Keep all the tensors on the CPU side, then transfer the sub-tensors to the GPUs by CPU->GPU copies.
Which is better?
We can adopt the first one; parallel_do also uses this approach.
Yes, I believe this is the current solution: set() places the tensor on GPU(0), and SplitLoDTensor transfers the sub-tensors to each GPU.
I think this is OK. How about you, @qingqing01?
Excellent!
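A conceptual sketch of the split step agreed on above, in numpy only. The real work is done by the SplitLoDTensor op in C++; an even split along the batch dimension is an assumption here:

```python
import numpy as np


def split_batch(batch, num_places):
    # Conceptually what happens after the whole batch lands on the first
    # device: it is sliced along dim 0 so each device receives one shard.
    # Assumes the batch size is divisible by the number of devices.
    return np.split(batch, num_places, axis=0)


shards = split_batch(np.zeros((8, 3, 224, 224), dtype='float32'), 4)
assert all(shard.shape[0] == 2 for shard in shards)
```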
TODO: Need to work with the reader when both the reader and
feed are enabled. Normally, feed replaces the reader input
or a variable produced by upstream ops.
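For reference, a hedged sketch of how feeding might look from the user side once this lands. The program construction, `loss`, and the data-layer names are placeholders, and the exact constructor and run() signatures are assumed from this PR's diff, not from released documentation:

```python
import numpy as np
import paddle.fluid as fluid

# Placeholder setup: assume `loss` was built from data layers named
# 'image' and 'label', and the startup program has already been run.
pe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)

img = np.random.random((32, 784)).astype('float32')
label = np.random.randint(0, 10, size=(32, 1)).astype('int64')

# The fed numpy arrays should override any reader-produced input for
# the corresponding variables.
loss_value = pe.run(fetch_list=[loss.name],
                    feed_dict={'image': img, 'label': label})
```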