[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature #15

Open · wants to merge 12 commits into main

Conversation

@Xinyi-ECNU commented Aug 23, 2024

This PR introduces a design for cluster-level prefill-decode disaggregation in Llumnix. Building on Llumnix's dynamic rescheduling of requests, the design lets Llumnix manage prefill/decode instances and the scheduling of requests across those instances. Specifically, this PR introduces broader scheduling semantics, so that the rules for PD disaggregation can be expressed as customized policies within Llumnix.
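
As a rough illustration of these broader scheduling semantics (names such as PDDisaggregationPolicy, SketchRequest, expected_steps, and dispatch are hypothetical, not this PR's actual API), a PD-disaggregation rule can be sketched as a policy that caps the number of steps a request runs on its first instance:

# Hypothetical sketch only: PD disaggregation expressed as a customized scheduling policy.
from dataclasses import dataclass
from typing import List

@dataclass
class SketchRequest:
    request_id: str
    expected_steps: int = -1      # -1: unlimited; 1: migrate out right after the prefill step

class PDDisaggregationPolicy:
    """Dispatch new requests to prefill instances and cap them at one step, so the
    first decode step makes them candidates for migration to a decode instance."""

    def __init__(self, prefill_instance_ids: List[str], decode_instance_ids: List[str]):
        self.prefill_instance_ids = prefill_instance_ids
        self.decode_instance_ids = decode_instance_ids

    def dispatch(self, request: SketchRequest) -> str:
        request.expected_steps = 1                 # PDD is the special case expected_steps == 1
        return self.prefill_instance_ids[0]        # e.g. pick the least-loaded prefill instance

    def should_migrate_out(self, request: SketchRequest, steps_done: int) -> bool:
        # A request that has completed at least expected_steps steps becomes a migration candidate.
        return request.expected_steps > 0 and steps_done >= request.expected_steps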

@Xinyi-ECNU requested a review from zhypku, August 26, 2024 06:02

@zhypku (Collaborator) left a comment

I'd like to call this feature: scheduling-defined PDD :)


    @abstractmethod
    def get_pre_migration_request(self) -> Optional[MigratingRequest]:
        """Retrieves the request which meets the migration conditions from the running queue.
Contributor:

pre_migration_request is confusing; we need a new name.

"""Retrieves the request which meets the migration conditions from the running queue.

This method iterates over the running queue in reverse order and returns the last request
that has moved past the prefilling stage and met the migration conditions. In the current
Contributor:

The phrase "beyond the prefilling stage" fails to convey our key point: the number of steps exceeding expected_steps. I suggest deleting it. PDD is just the special case where expected_steps is set to 1.

@@ -30,16 +30,19 @@ def __init__(self,
         # instance load and instance info args
         self.load_metric = global_scheduler_config.load_metric
         self.enable_defrag = global_scheduler_config.enable_defrag
+        self.enable_pd_disaggregation = global_scheduler_config.enable_pd_disaggregation
Contributor:

PDD is the external manifestation of expected_steps = 1.
I think self.enable_pd_disaggregation -> self.expected_steps is better.

@@ -61,6 +67,8 @@ def __init__(self, *args, **kwargs) -> None:
         self.prefilling_seq_groups = []
         self.scheduler_lock = threading.Lock()
         self.migrating_out_request_last_stage = []
+        self.pre_migration = True
Contributor:

Is the default value for self.pre_migration really supposed to be True?
Passing this in as a parameter might be better.

@@ -49,3 +51,6 @@ def __init__(
         self.scaling_policy = scaling_policy
         self.scale_up_threshold = scale_up_threshold*(-1)
         self.scale_down_threshold = scale_down_threshold*(-1)
+
+        self.enable_pd_disaggregation = enable_pd_disaggregation
Contributor:

We want to implement a scheduling method based on the number of steps completed by the request.
Don't use prefill_xxx or decode_xxx; that is a special case. This also applies to the other files changed by this PR.

Author:

> We want to implement a scheduling method based on the number of steps completed by the request. Don't use prefill_xxx or decode_xxx; that is a special case. This also applies to the other files changed by this PR.

The scheduling layer and the backend are indeed blind to the PDD configuration and work purely on the number of steps. However, the higher-level management/configuration (such as the global scheduler / LLM engine manager) needs to determine whether to enable PDD; only with this configuration enabled can step-based scheduling actions be initiated.
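
A minimal sketch of that layering (hypothetical names; only the manager knows about PDD, while the scheduler and backend see nothing but a per-request step budget):

# Hypothetical sketch of the layering described above; names are illustrative only.
class ManagerSketch:
    def __init__(self, enable_pd_disaggregation: bool):
        self.enable_pd_disaggregation = enable_pd_disaggregation

    def expected_steps_for_new_request(self) -> int:
        # With PDD enabled, a new request may run exactly one step (its prefill)
        # on the dispatched instance before it becomes a migration candidate.
        return 1 if self.enable_pd_disaggregation else -1

    def dispatch(self, request, global_scheduler):
        request.expected_steps = self.expected_steps_for_new_request()
        # The global scheduler and the instance-level scheduler never consult
        # enable_pd_disaggregation; they only compare completed steps against expected_steps.
        return global_scheduler.dispatch(request)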


        sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration = self._get_migration_pattern(migrate_target)
        return self.pair_migration_policy.pair_migration(sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration)

    def _get_migration_pattern(self, migrate_target: str) -> Dict[str, InstanceInfo]:
Contributor:

The name _get_migration_pattern suggests this function returns a migration pattern.
Consider renaming this function.

        if self.available_prefill_instance_num > 0 and len(self.prefill_inst_ids_set) < self.available_prefill_instance_num:
            self.prefill_inst_ids_set.add(instance_id)
        else:
            self.decoding_inst_ids_set.add(instance_id)
Contributor:

The current implementation seems to prevent us from designating certain instances as prefill_instances or decode_instances.

Author:

> The current implementation seems to prevent us from designating certain instances as prefill_instances or decode_instances.

For now, since the setup is homogeneous, whether or not instances are explicitly designated has little impact. However, it is still necessary to designate certain instances as prefill_instances or decode_instances; that will be the next step.
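
One possible shape for that next step, sketched with hypothetical names (explicit role designation at registration time instead of assigning the first N registered instances to prefill):

# Hypothetical sketch: designate instance roles explicitly at registration time.
from enum import Enum
from typing import Dict, Set

class InstanceType(Enum):
    PREFILL = "prefill"
    DECODE = "decode"
    NO_CONSTRAINT = "no_constraint"

class InstanceRegistrySketch:
    def __init__(self):
        self.instance_types: Dict[str, InstanceType] = {}

    def register_instance(self, instance_id: str,
                          instance_type: InstanceType = InstanceType.NO_CONSTRAINT) -> None:
        # The deployer chooses each instance's role up front; the scheduler only
        # reads it back when pairing migration sources and destinations.
        self.instance_types[instance_id] = instance_type

    def instances_of(self, instance_type: InstanceType) -> Set[str]:
        return {iid for iid, t in self.instance_types.items() if t is instance_type}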

@@ -43,21 +52,58 @@ def __init__(self,
 
         self.num_instances = 0
         self.instance_id_set: Set[str] = set()
+        # prefill_inst_ids_set contains all instances that allow prefilling.
+        self.prefill_inst_ids_set: Set[str] = set()
+        self.decoding_inst_ids_set: Set[str] = set()
Contributor:

No prefill_xxx, decode_xxx

            else:
                asyncio.create_task(self._migrate(MigrationTarget.GENERAL, 1))
        # pylint: disable=W0703
        except Exception as e:
Contributor:

What specific exception might occur here?

@Xinyi-ECNU changed the title from "[Core] Support for Prefill-Decode Disaggregation feature" to "[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature" on Aug 27, 2024

@CLAassistant commented Aug 28, 2024

CLA assistant check
All committers have signed the CLA.

@@ -98,29 +98,36 @@ def from_args(cls,
         llumlet = engine_class.remote(instance_id, backend_type, migration_config, *args, **kwargs)
         return llumlet
 
-    def migrate_out(self, dst_instance_name: str) -> List[str]:
+    def migrate_out(self, dst_instance_name: str, num_requests: int) -> List[str]:
Contributor:

In this function, num_requests is used like a boolean (num_requests == 1).

And for PDD, I think you need a function named migrate_out_singlestage.

Consider refactoring this function.

Author:

> In this function, num_requests is used like a boolean (num_requests == 1).
>
> And for PDD, I think you need a function named migrate_out_singlestage.
>
> Consider refactoring this function.

Removed the logic that treated num_requests as a boolean. We reuse migrate_out_multistage to send the blocks in one stage for PDD, so no additional function is needed. Please check.
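
For illustration, a sketch of num_requests used as a genuine count (helper names such as get_migrating_request are hypothetical; the PR itself reuses migrate_out_multistage rather than adding a new function):

# Hypothetical sketch only; not the PR's actual implementation.
from typing import List

def migrate_out(llumlet, dst_instance_name: str, num_requests: int) -> List[str]:
    """Migrate up to num_requests requests to dst_instance_name."""
    migrated_request_ids: List[str] = []
    for _ in range(num_requests):
        request = llumlet.get_migrating_request()        # illustrative accessor
        if request is None:
            break
        # For PDD (expected_steps == 1) the KV blocks can be sent in a single stage,
        # so the existing multistage path is reused with one stage.
        llumlet.migrate_out_multistage(dst_instance_name, request)
        migrated_request_ids.append(request.request_id)
    return migrated_request_ids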


    # Enable the prefill-decoding disaggregation.
    DECODING_2_DECODING = "DECODING_2_DECODING"
    PREFILL_2_DECODING = "PREFILL_2_DECODING"

Contributor:

I believe we don't need the concepts of prefill or decode types in PairMigration. A set of src InstanceInfo and a set of dst InstanceInfo is enough.

Author:

> I believe we don't need the concepts of prefill or decode types in PairMigration. A set of src InstanceInfo and a set of dst InstanceInfo is enough.

PairMigrationConstraints is primarily used by the global scheduler to express the current migration decision. Within the migration scheduler, an additional step now translates PairMigrationConstraints into the src and dst sets for the different instance types. We discussed this with zhypku earlier and decided to keep this data structure.
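
An illustrative sketch of that translation step (names like resolve_constraint and the NO_CONSTRAINTS member are assumptions; only the mapping from a constraint to (src, dst) instance sets is shown):

# Hypothetical sketch: translating PairMigrationConstraints into src/dst instance-info sets.
from enum import Enum
from typing import List, Tuple

class PairMigrationConstraints(Enum):
    NO_CONSTRAINTS = "NO_CONSTRAINTS"            # assumed default member, for illustration
    DECODING_2_DECODING = "DECODING_2_DECODING"
    PREFILL_2_DECODING = "PREFILL_2_DECODING"

def resolve_constraint(constraint: PairMigrationConstraints,
                       prefill_instance_infos: List[dict],
                       decode_instance_infos: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (src_instance_infos, dst_instance_infos) for the pairing policy."""
    if constraint is PairMigrationConstraints.PREFILL_2_DECODING:
        # Drain requests that have finished prefill towards decode instances.
        return prefill_instance_infos, decode_instance_infos
    if constraint is PairMigrationConstraints.DECODING_2_DECODING:
        # Ordinary load balancing among decode instances.
        return decode_instance_infos, decode_instance_infos
    all_infos = prefill_instance_infos + decode_instance_infos
    return all_infos, all_infos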

Migration bandwidth microbenchmark (transfer speed vs. migration size):

RPC:
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 280.00 MB 288.00 MB 312.00 MB 344.00 MB 368.00 MB 416.00 MB 424.00 MB 432.00 MB 440.00 MB 496.00 MB 720.00 MB 912.00 MB
rpc_speed(GB/s) 1.05 1.53 1.79 1.95 2.04 2.18 2.18 2.22 2.24 2.34 2.34 2.41 2.43 2.40 2.49 2.51 2.62 2.34 2.34 2.59 2.52 2.50 2.52 2.46 2.64 2.78 2.18 2.49 2.80 2.59 2.76 2.92 3.18 3.05 3.13 3.17 2.93 3.45 3.32 3.06

Gloo:
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 240.00 MB 280.00 MB 312.00 MB 384.00 MB 416.00 MB 464.00 MB 480.00 MB 488.00 MB 536.00 MB 544.00 MB 696.00 MB
gloo_speed(GB/s) 0.92 1.60 1.98 2.28 2.44 2.78 2.72 2.99 2.64 2.91 2.90 2.57 2.62 2.68 3.41 2.62 2.36 2.25 2.19 2.13 2.18 2.50 2.92 2.90 3.40 2.56 2.38 2.97 2.51 2.68 2.84 2.63 2.73 2.64 2.81

NCCL:
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 232.00 MB 240.00 MB 256.00 MB 280.00 MB 312.00 MB 320.00 MB 416.00 MB 424.00 MB 448.00 MB 464.00 MB 488.00 MB 528.00 MB 536.00 MB 752.00 MB
nccl_speed(GB/s) 0.19 0.45 0.68 0.85 1.14 1.33 1.50 1.52 2.00 1.76 1.76 2.24 2.38 2.63 2.26 2.63 2.86 2.58 3.38 3.44 4.23 3.41 3.62 3.31 3.22 4.43 5.22 4.46 2.65 2.94 4.44 5.66 5.35 5.49 3.73 3.54 5.44 3.99 3.96

Latency percentiles:

                      p25        p50        p75         p95         p99         mean
prefill latency(ms)   32545.25   99194.50   192927.00   260004.45   292360.40   114263.90
decode latency(ms)    54.14      59.66      73.90       135.57      231.50      72.70
