-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature #15
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to call this feature as: scheduling-defined pdd :)
|
||
@abstractmethod | ||
def get_pre_migration_request(self) -> Optional[MigratingRequest]: | ||
"""Retrieves the request which meets the migration conditions from the running queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre_migration_request is confusing. we need a new name.
"""Retrieves the request which meets the migration conditions from the running queue. | ||
|
||
This method iterates over the running queue in reverse order and returns the last request | ||
that has moved past the prefilling stage and met the migration conditions. In the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "beyond the prefilling stage" fails to convey our key point: the number of steps exceeds the expected_steps. I suggest to delete it. pdd is the special situation when expected_steps is set to 1
@@ -30,16 +30,19 @@ def __init__(self, | |||
# instance load and instance info args | |||
self.load_metric = global_scheduler_config.load_metric | |||
self.enable_defrag = global_scheduler_config.enable_defrag | |||
self.enable_pd_disaggregation = global_scheduler_config.enable_pd_disaggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PDD is the external manifestation of expected_steps = 1.
I thinl that self.enable_pd_disaggregation -> self.expected_steps is better.
llumnix/backends/vllm/scheduler.py
Outdated
@@ -61,6 +67,8 @@ def __init__(self, *args, **kwargs) -> None: | |||
self.prefilling_seq_groups = [] | |||
self.scheduler_lock = threading.Lock() | |||
self.migrating_out_request_last_stage = [] | |||
self.pre_migration = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for self.pre_migration is True?
pass this by parameter maybe better
llumnix/config.py
Outdated
@@ -49,3 +51,6 @@ def __init__( | |||
self.scaling_policy = scaling_policy | |||
self.scale_up_threshold = scale_up_threshold*(-1) | |||
self.scale_down_threshold = scale_down_threshold*(-1) | |||
|
|||
self.enable_pd_disaggregation = enable_pd_disaggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to implement a scheduling method based on the number of steps completed by the request.
Don't use prefill_xxx or decode_xxx. It is a special case. This also includes other files chenaged by this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to implement a scheduling method based on the number of steps completed by the request. Don't use prefill_xxx or decode_xxx. It is a special case. This also includes other files chenaged by this pr.
At the scheduling layer and backend, they are indeed blind to the PDD configuration and work based on the number of steps. However, higher-level management/configuration (such as the global scheduler/LLM engine manager) needs to determine whether to enable PDD. Only with this configuration enabled can scheduling actions based on the step be initiated.
|
||
sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration = self._get_migration_pattern(migrate_target) | ||
return self.pair_migration_policy.pair_migration(sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration) | ||
def _get_migration_pattern(self, migrate_target:str) -> Dict[str, InstanceInfo]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_migration_pattern seems this function will return a migration pattern.
Consider renaming this function.
if self.available_prefill_instance_num > 0 and len(self.prefill_inst_ids_set) < self.available_prefill_instance_num: | ||
self.prefill_inst_ids_set.add(instance_id) | ||
else: | ||
self.decoding_inst_ids_set.add(instance_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation seems to prevent us from designating certain instances as prefill_instances or decode_instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation seems to prevent us from designating certain instances as prefill_instances or decode_instances.
For now, since the setting is homogeneous, whether or not to designate has little impact. However, it is still necessary to designate certain instances as prefill_instances or decode_instances, which will be the next step.
@@ -43,21 +52,58 @@ def __init__(self, | |||
|
|||
self.num_instances = 0 | |||
self.instance_id_set: Set[str] = set() | |||
# prefill_inst_ids_set contains all instances that allow prefilling. | |||
self.prefill_inst_ids_set: Set[str] = set() | |||
self.decoding_inst_ids_set: Set[str] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prefill_xxx, decode_xxx
llumnix/llm_engine_manager.py
Outdated
else: | ||
asyncio.create_task(self._migrate(MigrationTarget.GENERAL, 1)) | ||
# pylint: disable=W0703 | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What specific exception might occur here?
c28584b
to
16a05d1
Compare
ab199a4
to
f50fa8d
Compare
937afce
to
c921ef9
Compare
@@ -98,29 +98,36 @@ def from_args(cls, | |||
llumlet = engine_class.remote(instance_id, backend_type, migration_config, *args, **kwargs) | |||
return llumlet | |||
|
|||
def migrate_out(self, dst_instance_name: str) -> List[str]: | |||
def migrate_out(self, dst_instance_name: str, num_requests: int) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function, num_request is used like a boolean (num_requests == 1)
And for pdd, I think you need a function named migrate_out_singlestage.
consider refactor this fucntion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function, num_request is used like a boolean (num_requests == 1)
And for pdd, I think you need a function named migrate_out_singlestage.
consider refactor this fucntion
Removed the logic to treat num_request as a boolean. We have reused migrate_out_multistage to send blocks in one stage for pdd and dont need additional function. Please check.
|
||
# Enable the prefill-decoding disaggregration. | ||
DECODING_2_DECODING = "DECODING_2_DECODING" | ||
PREFILL_2_DECODING = "PREFILL_2_DECODING" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we don't need the concepts of prefill or decode types in PairMigration.
a set of src instanceInfo and a set of dst instanceInfo are enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we don't need the concepts of prefill or decode types in PairMigration. a set of src instanceInfo and a set of dst instanceInfo are enough.
PairMigrationConstraints are primarily used by the global scheduler to provide the current migration decision. Within the migration scheduler, an additional step is now taken to translate PairMigrationConstraints into the src and dst of different instance type. Have discussed with zhypku before to reserve this data structure.
3c6166e
to
749a93f
Compare
|
|
Design for introducing cluster-level prefill-decode disaggregation design to Llumnix. Based on dynamic rescheduling of requests in Llumnix, this design allows Llumnix to manage prefill/decoding instances and the scheduling of requests on these instances. Specifically, this PR designs broader scheduling semantics, enabling the rules for PD disaggregation to be expressed as customized policies within Llumnix.