[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature #15

Xinyi-ECNU · 2024-08-23T08:54:44Z

This PR introduces a cluster-level prefill-decode (PD) disaggregation design to Llumnix. Building on Llumnix's dynamic rescheduling of requests, the design lets Llumnix manage prefill and decoding instances and schedule requests across them. Specifically, this PR designs broader scheduling semantics, so that the rules for PD disaggregation can be expressed as customized policies within Llumnix.

llumnix/arg_utils.py

llumnix/backends/backend_interface.py

llumnix/backends/vllm/llm_engine.py

llumnix/backends/vllm/scheduler.py

llumnix/global_scheduler/global_scheduler.py

llumnix/global_scheduler/migration_scheduler.py

llumnix/llumlet/local_migration_scheduler.py

zhypku

I'd like to call this feature scheduling-defined PDD :)

KuilongCui · 2024-08-26T11:05:54Z

llumnix/backends/backend_interface.py

+
+    @abstractmethod
+    def get_pre_migration_request(self) -> Optional[MigratingRequest]:
+        """Retrieves the request which meets the migration conditions from the running queue.


The name pre_migration_request is confusing. We need a new one.

KuilongCui · 2024-08-26T11:38:39Z

llumnix/backends/backend_interface.py

+        """Retrieves the request which meets the migration conditions from the running queue.
+
+        This method iterates over the running queue in reverse order and returns the last request
+        that has moved past the prefilling stage and met the migration conditions. In the current


The phrase "beyond the prefilling stage" fails to convey our key point: the number of steps exceeds expected_steps. I suggest deleting it. PDD is the special case where expected_steps is set to 1.
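To illustrate the step-based reading of this docstring, here is a minimal sketch of selecting a migration candidate by completed steps. The `Request` class, the `completed_steps` field, and `pick_migration_candidate` are hypothetical names, not the Llumnix API, and the `>=` comparison is an assumption chosen so that expected_steps = 1 matches "migrate right after prefill".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Hypothetical stand-in for a running request; not the Llumnix class.
    request_id: str
    completed_steps: int  # forward steps finished so far (prefill counts as one)


def pick_migration_candidate(running: List[Request],
                             expected_steps: int) -> Optional[Request]:
    """Return the last request in the running queue that has completed
    at least `expected_steps` steps, or None if no request qualifies."""
    # Walk the queue in reverse order, as the docstring under review describes.
    for request in reversed(running):
        if request.completed_steps >= expected_steps:
            return request
    return None


running_queue = [Request("a", 5), Request("b", 1), Request("c", 0)]
# With expected_steps == 1 the rule behaves like PD disaggregation:
# "c" has not finished prefill yet, so "b" is the last eligible request.
candidate = pick_migration_candidate(running_queue, expected_steps=1)
print(candidate.request_id if candidate else None)
```

Under this sketch, a larger expected_steps simply delays migration; nothing in the selection logic needs to know about prefill or decode.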

KuilongCui · 2024-08-26T12:10:47Z

llumnix/global_scheduler/global_scheduler.py

@@ -30,16 +30,19 @@ def __init__(self,
        # instance load and instance info args
        self.load_metric = global_scheduler_config.load_metric
        self.enable_defrag = global_scheduler_config.enable_defrag
+        self.enable_pd_disaggregation = global_scheduler_config.enable_pd_disaggregation


PDD is the external manifestation of expected_steps = 1.
I think renaming self.enable_pd_disaggregation to self.expected_steps would be better.

KuilongCui · 2024-08-27T02:09:45Z

llumnix/backends/vllm/scheduler.py

@@ -61,6 +67,8 @@ def __init__(self, *args, **kwargs) -> None:
        self.prefilling_seq_groups = []
        self.scheduler_lock = threading.Lock()
        self.migrating_out_request_last_stage = []
+        self.pre_migration = True


Is the default value of self.pre_migration really True?
Passing it in as a parameter may be better.

KuilongCui · 2024-08-27T02:14:54Z

llumnix/config.py

@@ -49,3 +51,6 @@ def __init__(
        self.scaling_policy = scaling_policy
        self.scale_up_threshold = scale_up_threshold*(-1)
        self.scale_down_threshold = scale_down_threshold*(-1)
+
+        self.enable_pd_disaggregation = enable_pd_disaggregation


We want to implement a scheduling method based on the number of steps a request has completed.
Don't use prefill_xxx or decode_xxx names; PDD is a special case. This also applies to the other files changed by this PR.

Xinyi-ECNU · 2024-09-05

At the scheduling layer and in the backend, the code is indeed blind to the PDD configuration and works purely on the number of steps. However, higher-level management and configuration (such as the global scheduler or the LLM engine manager) needs to decide whether to enable PDD. Only when this configuration is enabled can step-based scheduling actions be initiated.
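One way to reconcile the two positions in this thread is to keep the PDD switch at the configuration level and translate it into a step threshold before it reaches the scheduler. The sketch below assumes a hypothetical `ManagerConfig` and `resolve_expected_steps`; only the flag name `enable_pd_disaggregation` appears in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagerConfig:
    # Hypothetical high-level config; only the flag name comes from the PR.
    enable_pd_disaggregation: bool = False


def resolve_expected_steps(config: ManagerConfig) -> Optional[int]:
    """Translate the user-facing PDD switch into the step threshold
    that the scheduling layer and backend would consume."""
    # 1 means "migrate right after the first step", i.e. after prefill.
    # None means step-based migration is not triggered at all.
    return 1 if config.enable_pd_disaggregation else None


print(resolve_expected_steps(ManagerConfig(enable_pd_disaggregation=True)))
print(resolve_expected_steps(ManagerConfig()))
```

With a translation like this, the lower layers only ever see a step count, while the decision to turn the feature on stays with the manager.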

KuilongCui · 2024-08-27T02:25:06Z

llumnix/global_scheduler/migration_scheduler.py

-
+        sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration = self._get_migration_pattern(migrate_target)
+        return self.pair_migration_policy.pair_migration(sorted_src_instance_infos, sorted_dst_instance_infos, pre_migration)
+    def _get_migration_pattern(self, migrate_target:str) -> Dict[str, InstanceInfo]:


The name _get_migration_pattern suggests that this function returns a migration pattern.
Consider renaming it.

KuilongCui · 2024-08-27T02:29:24Z

llumnix/global_scheduler/migration_scheduler.py

+        if self.available_prefill_instance_num > 0 and len(self.prefill_inst_ids_set) < self.available_prefill_instance_num:
+            self.prefill_inst_ids_set.add(instance_id)
+        else:
+            self.decoding_inst_ids_set.add(instance_id)


The current implementation seems to prevent us from designating certain instances as prefill_instances or decode_instances.

Xinyi-ECNU · 2024-08-27

For now, since the setting is homogeneous, whether or not instances are designated has little impact. However, designating specific instances as prefill_instances or decode_instances is still necessary and will be the next step.
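As a rough sketch of what explicit designation could look like on top of the count-based rule in the quoted diff: an optional mapping takes precedence, and undesignated instances fall back to filling the prefill quota first. `InstanceRoles`, `prefill_quota`, and `designated` are hypothetical names for illustration, not part of the PR.

```python
from typing import Dict, Optional, Set


class InstanceRoles:
    """Hypothetical role tracker: explicit designation first, count-based fallback."""

    def __init__(self, prefill_quota: int,
                 designated: Optional[Dict[str, str]] = None) -> None:
        self.prefill_quota = prefill_quota
        # Assumed format: instance_id -> "prefill" or "decode".
        self.designated = designated or {}
        self.prefill_ids: Set[str] = set()
        self.decode_ids: Set[str] = set()

    def add_instance(self, instance_id: str) -> str:
        role = self.designated.get(instance_id)
        if role is None:
            # Fallback mirrors the quoted diff: fill the prefill quota first,
            # then treat every further instance as a decoding instance.
            role = "prefill" if len(self.prefill_ids) < self.prefill_quota else "decode"
        target = self.prefill_ids if role == "prefill" else self.decode_ids
        target.add(instance_id)
        return role


roles = InstanceRoles(prefill_quota=1, designated={"i2": "prefill"})
print([roles.add_instance(i) for i in ("i0", "i1", "i2")])
```

Note that in this sketch an explicitly designated prefill instance is accepted even when the quota is already full; whether that is desirable is a design choice the thread leaves open.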

KuilongCui · 2024-08-27T02:30:05Z

llumnix/global_scheduler/migration_scheduler.py

@@ -43,21 +52,58 @@ def __init__(self,

        self.num_instances = 0
        self.instance_id_set: Set[str] = set()
+        # prefill_inst_ids_set contains all instances that allow prefilling.
+        self.prefill_inst_ids_set: Set[str] = set()
+        self.decoding_inst_ids_set: Set[str] = set()


No prefill_xxx, decode_xxx

KuilongCui · 2024-08-27T03:27:22Z

llumnix/llm_engine_manager.py

+            else:
+                asyncio.create_task(self._migrate(MigrationTarget.GENERAL, 1))
+        # pylint: disable=W0703
+        except Exception as e:


What specific exception might occur here?

CLAassistant · 2024-08-28T03:20:16Z

All committers have signed the CLA.

KuilongCui · 2024-09-11T07:10:15Z

llumnix/llumlet/llumlet.py

@@ -98,29 +98,36 @@ def from_args(cls,
        llumlet = engine_class.remote(instance_id, backend_type, migration_config, *args, **kwargs)
        return llumlet

-    def migrate_out(self, dst_instance_name: str) -> List[str]:
+    def migrate_out(self, dst_instance_name: str, num_requests: int) -> List[str]:


In this function, num_requests is used like a boolean (num_requests == 1).

And for PDD, I think you need a function named migrate_out_singlestage.

Consider refactoring this function.

Xinyi-ECNU · 2024-09-11

Removed the logic that treated num_requests as a boolean. We reuse migrate_out_multistage to send blocks in a single stage for PDD, so no additional function is needed. Please check.

KuilongCui · 2024-09-11T07:26:07Z

llumnix/global_scheduler/migration_scheduler.py

+
+    # Enable the prefill-decoding disaggregration.
+    DECODING_2_DECODING = "DECODING_2_DECODING"
+    PREFILL_2_DECODING = "PREFILL_2_DECODING"



I believe we don't need the concepts of prefill or decode types in PairMigration.
A set of src InstanceInfo and a set of dst InstanceInfo are enough.

Xinyi-ECNU · 2024-09-11

PairMigrationConstraints is primarily used by the global scheduler to convey the current migration decision. Within the migration scheduler, an additional step now translates PairMigrationConstraints into the src and dst sets of the different instance types. We discussed this with zhypku earlier and agreed to keep this data structure.
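The translation step described in this reply can be sketched as follows, so that the pairing policy itself only receives a src set and a dst set. The two PDD member names come from the quoted diff; `NO_CONSTRAINTS` and `select_src_dst` are assumptions made for this illustration.

```python
from enum import Enum
from typing import Iterable, Set, Tuple


class PairMigrationConstraints(str, Enum):
    # The two PDD members are taken from the quoted diff.
    # NO_CONSTRAINTS is an assumed name for the non-PDD case.
    NO_CONSTRAINTS = "NO_CONSTRAINTS"
    DECODING_2_DECODING = "DECODING_2_DECODING"
    PREFILL_2_DECODING = "PREFILL_2_DECODING"


def select_src_dst(constraint: PairMigrationConstraints,
                   prefill_ids: Iterable[str],
                   decode_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Map a constraint onto (src, dst) instance-id sets."""
    prefill, decode = set(prefill_ids), set(decode_ids)
    if constraint is PairMigrationConstraints.PREFILL_2_DECODING:
        return prefill, decode
    if constraint is PairMigrationConstraints.DECODING_2_DECODING:
        return set(decode), set(decode)
    # Without a constraint, every instance may act as source or destination.
    everyone = prefill | decode
    return set(everyone), set(everyone)


src, dst = select_src_dst(PairMigrationConstraints.PREFILL_2_DECODING,
                          prefill_ids={"p0"}, decode_ids={"d0", "d1"})
print(sorted(src), sorted(dst))
```

In this arrangement the reviewer's point still holds for the pairing policy (it sees only two sets), while the enum remains available to the global scheduler as a compact way to state the decision.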

github-actions · 2024-09-24T13:31:39Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	280.00 MB	288.00 MB	312.00 MB	344.00 MB	368.00 MB	416.00 MB	424.00 MB	432.00 MB	440.00 MB	496.00 MB	720.00 MB	912.00 MB
rpc_speed(GB/s)	1.05	1.53	1.79	1.95	2.04	2.18	2.18	2.22	2.24	2.34	2.34	2.41	2.43	2.40	2.49	2.51	2.62	2.34	2.34	2.59	2.52	2.50	2.52	2.46	2.64	2.78	2.18	2.49	2.80	2.59	2.76	2.92	3.18	3.05	3.13	3.17	2.93	3.45	3.32	3.06

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	192.00 MB	200.00 MB	240.00 MB	280.00 MB	312.00 MB	384.00 MB	416.00 MB	464.00 MB	480.00 MB	488.00 MB	536.00 MB	544.00 MB	696.00 MB
gloo_speed(GB/s)	0.92	1.60	1.98	2.28	2.44	2.78	2.72	2.99	2.64	2.91	2.90	2.57	2.62	2.68	3.41	2.62	2.36	2.25	2.19	2.13	2.18	2.50	2.92	2.90	3.40	2.56	2.38	2.97	2.51	2.68	2.84	2.63	2.73	2.64	2.81

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	232.00 MB	240.00 MB	256.00 MB	280.00 MB	312.00 MB	320.00 MB	416.00 MB	424.00 MB	448.00 MB	464.00 MB	488.00 MB	528.00 MB	536.00 MB	752.00 MB
nccl_speed(GB/s)	0.19	0.45	0.68	0.85	1.14	1.33	1.50	1.52	2.00	1.76	1.76	2.24	2.38	2.63	2.26	2.63	2.86	2.58	3.38	3.44	4.23	3.41	3.62	3.31	3.22	4.43	5.22	4.46	2.65	2.94	4.44	5.66	5.35	5.49	3.73	3.54	5.44	3.99	3.96

github-actions · 2024-09-24T13:49:27Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	32545.25	99194.50	192927.00	260004.45	292360.40	114263.90

decode	p25	p50	p75	p95	p99	mean
latency(ms)	54.14	59.66	73.90	135.57	231.50	72.70

zhypku reviewed Aug 23, 2024

Xinyi-ECNU requested a review from zhypku August 26, 2024 06:02

zhypku reviewed Aug 26, 2024

zhypku requested review from s5u13b, KuilongCui and ZeldaHuang August 26, 2024 07:05

zhypku reviewed Aug 26, 2024

KuilongCui reviewed Aug 26, 2024

KuilongCui reviewed Aug 27, 2024

Xinyi-ECNU changed the title ~~[Core] Support for Prefill-Decode Disaggregation feature~~ [Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature Aug 27, 2024

Xinyi-ECNU force-pushed the pd_disagg branch from 598f08b to c28584b Compare August 28, 2024 03:28

Xinyi-ECNU force-pushed the pd_disagg branch from c28584b to 16a05d1 Compare September 5, 2024 03:26

s5u13b reviewed Sep 6, 2024

ZeldaHuang reviewed Sep 6, 2024

Xinyi-ECNU force-pushed the pd_disagg branch from ab199a4 to f50fa8d Compare September 9, 2024 08:57

s5u13b reviewed Sep 9, 2024

zhypku reviewed Sep 10, 2024

Xinyi-ECNU force-pushed the pd_disagg branch from 937afce to c921ef9 Compare September 11, 2024 03:06

KuilongCui reviewed Sep 11, 2024

Xinyi-ECNU and others added 9 commits September 23, 2024 19:05

refactor

1170b82

fix

6082ffc

fix

d0b87d6

fix

47bf01e

fix

780affc

resolve conflict

81a8967

fix

ee41f38

fix

6970461

fix

20d10ea

Xinyi-ECNU added 2 commits September 23, 2024 19:05

fix

add0c7b

rebase

749a93f

Xinyi-ECNU force-pushed the pd_disagg branch from 3c6166e to 749a93f Compare September 24, 2024 01:59

fix ci

751e6c8
