Enable standalone executor for single-GPU training #45913

Merged
37 changes: 25 additions & 12 deletions python/paddle/fluid/executor.py
@@ -1555,23 +1555,32 @@ def _can_use_interpreter_core(program, place):
place, core.CustomPlace):
return False

use_standalone_executor_for_compiled_program = os.environ.get(
use_standalone_executor_for_distribution = os.environ.get(
Contributor

Nit: why rename this to "for distribution"?

Contributor Author

After this PR, FLAGS_CONVERT_GRAPH_TO_PROGRAM no longer controls whether a CompiledProgram runs with the standalone executor; it now controls whether distributed programs do.
Ideally the flag itself would be renamed at the same time. However, it is only a temporary flag and QA has already set it in their testing environment, so keeping the name saves QA from having to reset it when they test the distributed scenario of the standalone executor.
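
To make the behavior described above concrete, here is a minimal sketch of the gating logic, not the actual Paddle implementation. The helper names `flag_enabled` and `can_use_standalone` and the `is_distributed` parameter are hypothetical and stand in for the fleet and build-strategy checks in the diff; the other unsupported cases (for example data parallel) are left out. Only the string values are checked because `os.environ.get` returns strings, so the `1` and `True` entries in the diff's membership list can never match.

```python
import os

# String spellings treated as "on"; environment values are always strings.
TRUTHY = ('1', 'True', 'true')


def flag_enabled(name, environ=None):
    """Return True if the environment variable `name` is set to a truthy string."""
    environ = os.environ if environ is None else environ
    return environ.get(name) in TRUTHY


def can_use_standalone(is_distributed, environ=None):
    """Hypothetical, simplified stand-in for _can_use_interpreter_core.

    Assumption: `is_distributed` summarizes the fleet / num_trainers checks.
    """
    if is_distributed:
        # Distributed runs stay opt-in through the temporary flag.
        return flag_enabled('FLAGS_CONVERT_GRAPH_TO_PROGRAM', environ)
    # Non-distributed (for example single-GPU) runs no longer need the flag.
    return True
```

With an empty environment, `can_use_standalone(False, {})` is true while `can_use_standalone(True, {})` is false; setting the flag to `'1'` enables the distributed case as well.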

'FLAGS_CONVERT_GRAPH_TO_PROGRAM',
None) in [1, '1', True, 'True', 'true']

# Only support fleet when 'FLAGS_CONVERT_GRAPH_TO_PROGRAM' is set to true
from paddle.distributed.fleet import fleet
if fleet._role_maker is not None and not use_standalone_executor_for_compiled_program:
warnings.warn("Standalone executor is not used for fleet",
UserWarning)
return False

compiled = isinstance(program,
compiler.CompiledProgram) or isinstance(
program._graph, compiler.CompiledProgram)
if compiled:
compiled_program = program if isinstance(
program, compiler.CompiledProgram) else program._graph

# delete this code after supporting compiled_program._graph
if compiled_program._program is None:
warnings.warn("Standalone executor is not used for Graph",
UserWarning)
return use_standalone_executor_for_distribution

# delete this code after supporting distribution
if compiled_program._build_strategy is not None and (
compiled_program._build_strategy.is_distribution
or compiled_program._build_strategy.num_trainers > 1):
warnings.warn(
"Standalone executor is not used for distribution",
UserWarning)
return use_standalone_executor_for_distribution

# Unsupported case 1: data parallel
if compiled_program._is_data_parallel and len(
compiled_program._get_places(
@@ -1611,10 +1620,14 @@ def _can_use_interpreter_core(program, place):
UserWarning)
return False

return use_standalone_executor_for_compiled_program
else:
assert isinstance(program, Program)
return True
# delete this code after supporting fleet
from paddle.distributed.fleet import fleet
if fleet._role_maker is not None:
warnings.warn("Standalone executor is not used for fleet",
UserWarning)
return use_standalone_executor_for_distribution

return True

# NOTE: This is an experimental feature. If `export FLAGS_USE_STANDALONE_EXECUTOR=1 `,
# use StandaloneExecutor to run the program.