feat(BA-96): metric based model service autoscaling #3277

Open

kyujin-cho wants to merge 46 commits into main from feature/model-service-autoscale

Conversation

@kyujin-cho (Member) commented Dec 19, 2024

Resolves #2659 (BA-96)

What's changed

  • Added the endpoint_auto_scaling_rules table (a sketch of possible columns follows this list)
  • Implemented new scheduling logic under scale_services(), which automatically increases or decreases EndpointRow.replicas based on the defined rules
  • Separated the aiodataloader handler from the actual bulk loading mechanism in both EndpointStatistics and KernelStatistics so that the bulk loader can be reused internally
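
A minimal sketch of what the endpoint_auto_scaling_rules table could contain, inferred from the rule fields described under "How it works" below. The column names, types, enum values, and the endpoints.id foreign key target are assumptions and may differ from the committed migration.

```python
# A hedged sketch, not the actual migration: names and types are assumptions
# inferred from the rule fields described in this PR.
import enum
import uuid

import sqlalchemy as sa
from sqlalchemy.dialects import postgresql as pgsql


class AutoScalingMetricSource(enum.Enum):
    INFERENCE_FRAMEWORK = "inference_framework"
    KERNEL = "kernel"


class AutoScalingMetricComparator(enum.Enum):
    LESS_THAN = "less_than"
    LESS_THAN_OR_EQUAL = "less_than_or_equal"
    GREATER_THAN = "greater_than"
    GREATER_THAN_OR_EQUAL = "greater_than_or_equal"


metadata = sa.MetaData()

# One row per auto-scaling rule; an endpoint may own many rules.
endpoint_auto_scaling_rules = sa.Table(
    "endpoint_auto_scaling_rules",
    metadata,
    sa.Column("id", pgsql.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4),
    sa.Column("endpoint", pgsql.UUID(as_uuid=True), sa.ForeignKey("endpoints.id"), nullable=False),
    sa.Column("metric_source", sa.Enum(AutoScalingMetricSource), nullable=False),
    sa.Column("metric_name", sa.String, nullable=False),
    sa.Column("threshold", sa.Numeric, nullable=False),
    sa.Column("comparator", sa.Enum(AutoScalingMetricComparator), nullable=False),
    sa.Column("step_size", sa.Integer, nullable=False),
    sa.Column("cooldown_seconds", sa.Integer, nullable=False, server_default="300"),
    sa.Column("min_replicas", sa.Integer, nullable=True),
    sa.Column("max_replicas", sa.Integer, nullable=True),
    sa.Column("last_triggered_at", sa.DateTime(timezone=True), nullable=True),
)
```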

How it works

  • Every endpoint (model service) can have one or more auto-scaling rules.
  • An auto-scaling rule is defined by:
    • Metric source: inference runtime or kernel
      • inference framework: the average value taken across all replicas. Supported only when AppProxy reports the inference metrics; check the Backend.AI Enterprise guide for more details.
      • kernel: the average value taken across all kernels backing the endpoint
    • Metric name (e.g. cuda.shares or vllm_avg_prompt_throughput_toks_per_s)
    • Comparator: how the live metric value is compared with the threshold.
      • LESS_THAN: the rule is triggered when the current metric value falls below the defined threshold.
      • LESS_THAN_OR_EQUAL: the rule is triggered when the current metric value falls below or equals the defined threshold.
      • GREATER_THAN: the rule is triggered when the current metric value rises above the defined threshold.
      • GREATER_THAN_OR_EQUAL: the rule is triggered when the current metric value rises above or equals the defined threshold.
    • Step size: the amount by which the replica count changes when the rule is triggered. It can be positive or negative; a negative value decreases the number of replicas.
    • Cooldown seconds: the duration in seconds during which the rule is not reapplied after it has been triggered.
    • Minimum replicas: a lower bound on the endpoint's replica count. The rule is not triggered if the resulting replica count would fall below this value.
    • Maximum replicas: an upper bound on the endpoint's replica count. The rule is not triggered if the resulting replica count would rise above this value. (A sketch of the evaluation logic follows this list.)
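
A minimal sketch of how a single rule could be evaluated against a live metric value, following the semantics above (comparator check, cooldown window, signed step size, and min/max bounds). The function names and the rule/endpoint attributes are illustrative assumptions, not the actual scale_services() implementation; AutoScalingMetricComparator refers to the enum in the table sketch above.

```python
from datetime import datetime, timedelta


def should_trigger(rule, current_value: float) -> bool:
    """Compare the live metric value against the rule's threshold."""
    match rule.comparator:
        case AutoScalingMetricComparator.LESS_THAN:
            return current_value < rule.threshold
        case AutoScalingMetricComparator.LESS_THAN_OR_EQUAL:
            return current_value <= rule.threshold
        case AutoScalingMetricComparator.GREATER_THAN:
            return current_value > rule.threshold
        case AutoScalingMetricComparator.GREATER_THAN_OR_EQUAL:
            return current_value >= rule.threshold
    return False


def apply_rule(rule, endpoint, current_value: float, now: datetime) -> int | None:
    """Return the new replica count for the endpoint, or None if the rule does not fire."""
    # Respect the cooldown window after the rule last fired.
    if rule.last_triggered_at is not None:
        if now < rule.last_triggered_at + timedelta(seconds=rule.cooldown_seconds):
            return None
    if not should_trigger(rule, current_value):
        return None
    new_replicas = endpoint.replicas + rule.step_size  # step_size may be negative
    # Do not trigger if the result would leave the configured replica range.
    if rule.min_replicas is not None and new_replicas < rule.min_replicas:
        return None
    if rule.max_replicas is not None and new_replicas > rule.max_replicas:
        return None
    return new_replicas
```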

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention of the original issue

📚 Documentation preview 📚: https://sorna--3277.org.readthedocs.build/en/3277/


📚 Documentation preview 📚: https://sorna-ko--3277.org.readthedocs.build/ko/3277/

@github-actions github-actions bot added area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@kyujin-cho kyujin-cho added type:feature Add new features and removed area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@github-actions github-actions bot added size:L 100~500 LoC comp:agent Related to Agent component comp:appproxy Related to App Proxy component comp:manager Related to Manager component urgency:5 It is imperative that action be taken right away. labels Dec 19, 2024
@kyujin-cho kyujin-cho added this to the 24.12 milestone Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feature: model service autoscaling feat: model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feat: model service autoscaling feat: metric based model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho marked this pull request as ready for review December 19, 2024 17:08
@kyujin-cho kyujin-cho force-pushed the feature/model-service-autoscale branch from 2e9102e to 9bc0661 Compare December 20, 2024 12:12
kyujin-cho and others added 4 commits December 20, 2024 12:14
Co-authored-by: octodog <mu001@lablup.com>
Co-authored-by: octodog <mu001@lablup.com>
@achimnol achimnol left a comment


As @yomybaby has requested, please update the GraphQL schema to use explicit Enum types for AutoScalingMetricSource and AutoScalingMetricComparator.
Also, the GraphQL query results currently use field names, not values, despite 913569e. Please fix this as well.
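
As a reference point, here is a hedged sketch of how the two enums could be exposed as explicit GraphQL Enum types with graphene. The object type and field names are assumptions rather than the schema committed in this PR, and the Python enums are reused from the table sketch in the description.

```python
# Sketch only; AutoScalingMetricSource and AutoScalingMetricComparator are the
# Python enums from the table sketch above. Field names are illustrative.
import graphene

AutoScalingMetricSourceGQL = graphene.Enum.from_enum(AutoScalingMetricSource)
AutoScalingMetricComparatorGQL = graphene.Enum.from_enum(AutoScalingMetricComparator)


class EndpointAutoScalingRuleNode(graphene.ObjectType):
    metric_source = graphene.Field(AutoScalingMetricSourceGQL, required=True)
    metric_name = graphene.String(required=True)
    comparator = graphene.Field(AutoScalingMetricComparatorGQL, required=True)
    threshold = graphene.String(required=True)
    step_size = graphene.Int(required=True)
    cooldown_seconds = graphene.Int(required=True)
    min_replicas = graphene.Int()
    max_replicas = graphene.Int()
```

With explicit Enum types, GraphQL clients exchange the symbolic member names (e.g. GREATER_THAN) in queries and results, which relates to the later note that graphene expects enum names (not values) as field input.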


@achimnol achimnol left a comment


[screenshots] There are some async-database handling issues as well.

@github-actions github-actions bot added comp:client Related to Client component comp:common Related to Common component comp:cli Related to CLI component labels Jan 2, 2025
@kyujin-cho (Member, Author)

Well, it seems graphene's Enum type expects enum names (not values) as a field input... :(
[screenshot]

@kyujin-cho kyujin-cho requested a review from achimnol January 2, 2025 10:31
Co-authored-by: octodog <mu001@lablup.com>
@HyeockJinKim (Collaborator)

Could you review it again? @achimnol

Labels
  • area:docs (Documentations)
  • comp:agent (Related to Agent component)
  • comp:appproxy (Related to App Proxy component)
  • comp:cli (Related to CLI component)
  • comp:client (Related to Client component)
  • comp:common (Related to Common component)
  • comp:manager (Related to Manager component)
  • require:db-migration (Automatically set when alembic migrations are added or updated)
  • size:XL (500~ LoC)
  • type:feature (Add new features)
  • urgency:blocker (IT SHOULD BE RESOLVED BEFORE NEXT RELEASE!)
  • urgency:5 (It is imperative that action be taken right away.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support auto scaling on Model Service
3 participants