-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix decay bug #11520
Fix decay bug #11520
Conversation
# clone ops | ||
for op in origin_block.ops: | ||
self._clone_op(pserver_program, new_sub_block, op) | ||
# self._append_pserver_non_opt_ops(new_sub_block, op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
origin_block_desc = op.attr('sub_block') | ||
origin_block = self.origin_program.block(origin_block_desc.id) | ||
assert isinstance(origin_block, Block) | ||
print("origin_block's parent_idx: ", origin_block.parent_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be cleared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
self._append_pserver_non_opt_ops(lr_decay_block, op, endpoint) | ||
self._append_pserver_non_opt_ops(lr_decay_block, op) | ||
# append sub blocks to pserver_program in lr_decay_op | ||
__clone_sub_block__(op, pserver_program) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function put into _append_pserver_non_opt_ops
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the lr_op is not the opt ops, so the function could be called here
varlist = [varlist] | ||
for var in varlist: | ||
if var not in program.global_block().vars: | ||
print("Find not in var:\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three lines should be cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I forgot to remove these
varlist = [varlist] | ||
for var in varlist: | ||
if var not in program.global_block().vars: | ||
print("Find not in var:\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clear not needed lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have removed these lines in next commit
|
||
# clone ops | ||
for op in origin_block.ops: | ||
self._clone_op(pserver_program, new_sub_block, op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be program
instead of pserver_program
? Since program
passed in explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will change the pserver_program
to program
in next commit
block.clone_variable(var) | ||
|
||
program.global_block().append_op( | ||
type=op.type, inputs=inputs, outputs=outputs, attrs=op.attrs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are the ops appended to global_block instead of block
? I guess these ops belong to the decay sub_block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems all lr_decay_ops belong to global block now, but maybe the block will be nested in the lr_decay_op in the future, so I will put it in the block in next commit, thanks for reviewing
@@ -1116,7 +1142,35 @@ def _is_splited_grad_var(self, var, var_dict): | |||
break | |||
return grad_block | |||
|
|||
def _append_pserver_non_opt_ops(self, optimize_block, opt_op, endpoint): | |||
def _clone_op(self, program, block, op): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can have a better name, such as _clone_lr_decay_block_ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I will change the name
2. Follow the original hierarchy of blocks 3. Change the function's name and remove debug lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
assert isinstance(origin_block, Block) | ||
# we put the new sub block to new block to follow the block | ||
# hierarchy of the original blocks | ||
new_sub_block = program.create_block(new_block.idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding blocks to pserver_program
will corrupt the execution of parameter server, for the sub-blocks may be executed automatically by the listen_and_serv
op. This PR should be reverted.
This close #11429