feat(BA-96): metric based model service autoscaling #3277

Open

kyujin-cho wants to merge 46 commits into main from feature/model-service-autoscale

Conversation

@kyujin-cho (Member) commented Dec 19, 2024

Resolves #2659 (BA-96)

What's changed

  • Added the endpoint_auto_scaling_rules table (a sketch of possible columns follows this list)
  • Implemented new scheduling logic under scale_services(), which automatically increases or decreases EndpointRow.replicas based on the defined rules
  • Separated the aiodataloader handler from the actual bulk loading mechanism in both EndpointStatistics and KernelStatistics so that the bulk loader can be reused internally
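
A minimal sketch of what the endpoint_auto_scaling_rules table could contain, inferred from the rule fields described under "How it works" below. The column names, types, enum values, and the endpoints.id foreign key target are assumptions and may differ from the committed migration.

```python
# A hedged sketch, not the actual migration: names and types are assumptions
# inferred from the rule fields described in this PR.
import enum
import uuid

import sqlalchemy as sa
from sqlalchemy.dialects import postgresql as pgsql


class AutoScalingMetricSource(enum.Enum):
    INFERENCE_FRAMEWORK = "inference_framework"
    KERNEL = "kernel"


class AutoScalingMetricComparator(enum.Enum):
    LESS_THAN = "less_than"
    LESS_THAN_OR_EQUAL = "less_than_or_equal"
    GREATER_THAN = "greater_than"
    GREATER_THAN_OR_EQUAL = "greater_than_or_equal"


metadata = sa.MetaData()

# One row per auto-scaling rule; an endpoint may own many rules.
endpoint_auto_scaling_rules = sa.Table(
    "endpoint_auto_scaling_rules",
    metadata,
    sa.Column("id", pgsql.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4),
    sa.Column("endpoint", pgsql.UUID(as_uuid=True), sa.ForeignKey("endpoints.id"), nullable=False),
    sa.Column("metric_source", sa.Enum(AutoScalingMetricSource), nullable=False),
    sa.Column("metric_name", sa.String, nullable=False),
    sa.Column("threshold", sa.Numeric, nullable=False),
    sa.Column("comparator", sa.Enum(AutoScalingMetricComparator), nullable=False),
    sa.Column("step_size", sa.Integer, nullable=False),
    sa.Column("cooldown_seconds", sa.Integer, nullable=False, server_default="300"),
    sa.Column("min_replicas", sa.Integer, nullable=True),
    sa.Column("max_replicas", sa.Integer, nullable=True),
    sa.Column("last_triggered_at", sa.DateTime(timezone=True), nullable=True),
)
```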

How it works

  • Every endpoint (model service) can have one or more auto-scaling rules.
  • An auto-scaling rule is defined by:
    • Metric source: inference runtime or kernel
      • inference framework: the average value taken across all replicas. Supported only when AppProxy reports the inference metrics; check the Backend.AI Enterprise guide for more details.
      • kernel: the average value taken across all kernels backing the endpoint
    • Metric name (e.g. cuda.shares or vllm_avg_prompt_throughput_toks_per_s)
    • Comparator: how the live metric value is compared with the threshold.
      • LESS_THAN: the rule is triggered when the current metric value falls below the defined threshold.
      • LESS_THAN_OR_EQUAL: the rule is triggered when the current metric value falls below or equals the defined threshold.
      • GREATER_THAN: the rule is triggered when the current metric value rises above the defined threshold.
      • GREATER_THAN_OR_EQUAL: the rule is triggered when the current metric value rises above or equals the defined threshold.
    • Step size: the amount by which the replica count changes when the rule is triggered. It can be positive or negative; a negative value decreases the number of replicas.
    • Cooldown seconds: the duration in seconds during which the rule is not reapplied after it has been triggered.
    • Minimum replicas: a lower bound on the endpoint's replica count. The rule is not triggered if the resulting replica count would fall below this value.
    • Maximum replicas: an upper bound on the endpoint's replica count. The rule is not triggered if the resulting replica count would rise above this value. (A sketch of the evaluation logic follows this list.)
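
A minimal sketch of how a single rule could be evaluated against a live metric value, following the semantics above (comparator check, cooldown window, signed step size, and min/max bounds). The function names and the rule/endpoint attributes are illustrative assumptions, not the actual scale_services() implementation; AutoScalingMetricComparator refers to the enum in the table sketch above.

```python
from datetime import datetime, timedelta


def should_trigger(rule, current_value: float) -> bool:
    """Compare the live metric value against the rule's threshold."""
    match rule.comparator:
        case AutoScalingMetricComparator.LESS_THAN:
            return current_value < rule.threshold
        case AutoScalingMetricComparator.LESS_THAN_OR_EQUAL:
            return current_value <= rule.threshold
        case AutoScalingMetricComparator.GREATER_THAN:
            return current_value > rule.threshold
        case AutoScalingMetricComparator.GREATER_THAN_OR_EQUAL:
            return current_value >= rule.threshold
    return False


def apply_rule(rule, endpoint, current_value: float, now: datetime) -> int | None:
    """Return the new replica count for the endpoint, or None if the rule does not fire."""
    # Respect the cooldown window after the rule last fired.
    if rule.last_triggered_at is not None:
        if now < rule.last_triggered_at + timedelta(seconds=rule.cooldown_seconds):
            return None
    if not should_trigger(rule, current_value):
        return None
    new_replicas = endpoint.replicas + rule.step_size  # step_size may be negative
    # Do not trigger if the result would leave the configured replica range.
    if rule.min_replicas is not None and new_replicas < rule.min_replicas:
        return None
    if rule.max_replicas is not None and new_replicas > rule.max_replicas:
        return None
    return new_replicas
```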

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention of the original issue

📚 Documentation preview 📚: https://sorna--3277.org.readthedocs.build/en/3277/


📚 Documentation preview 📚: https://sorna-ko--3277.org.readthedocs.build/ko/3277/

@github-actions github-actions bot added area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@kyujin-cho kyujin-cho added type:feature Add new features and removed area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@github-actions github-actions bot added size:L 100~500 LoC comp:agent Related to Agent component comp:appproxy Related to App Proxy component comp:manager Related to Manager component urgency:5 It is imperative that action be taken right away. labels Dec 19, 2024
@kyujin-cho kyujin-cho added this to the 24.12 milestone Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feature: model service autoscaling feat: model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feat: model service autoscaling feat: metric based model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho marked this pull request as ready for review December 19, 2024 17:08
@kyujin-cho kyujin-cho force-pushed the feature/model-service-autoscale branch from 2e9102e to 9bc0661 Compare December 20, 2024 12:12
kyujin-cho and others added 4 commits December 20, 2024 12:14
Co-authored-by: octodog <mu001@lablup.com>
Co-authored-by: octodog <mu001@lablup.com>
@achimnol achimnol left a comment


As @yomybaby has requested, please update the GraphQL schema to use explicit Enum types for AutoScalingMetricSource and AutoScalingMetricComparator.
Also, the GraphQL query results currently use field names, not values, despite 913569e. Please fix this as well.
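
As a reference point, here is a hedged sketch of how the two enums could be exposed as explicit GraphQL Enum types with graphene. The object type and field names are assumptions rather than the schema committed in this PR, and the Python enums are reused from the table sketch in the description.

```python
# Sketch only; AutoScalingMetricSource and AutoScalingMetricComparator are the
# Python enums from the table sketch above. Field names are illustrative.
import graphene

AutoScalingMetricSourceGQL = graphene.Enum.from_enum(AutoScalingMetricSource)
AutoScalingMetricComparatorGQL = graphene.Enum.from_enum(AutoScalingMetricComparator)


class EndpointAutoScalingRuleNode(graphene.ObjectType):
    metric_source = graphene.Field(AutoScalingMetricSourceGQL, required=True)
    metric_name = graphene.String(required=True)
    comparator = graphene.Field(AutoScalingMetricComparatorGQL, required=True)
    threshold = graphene.String(required=True)
    step_size = graphene.Int(required=True)
    cooldown_seconds = graphene.Int(required=True)
    min_replicas = graphene.Int()
    max_replicas = graphene.Int()
```

With explicit Enum types, GraphQL clients exchange the symbolic member names (e.g. GREATER_THAN) in queries and results, which relates to the later note that graphene expects enum names (not values) as field input.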


@achimnol achimnol left a comment


[screenshots] There are some async-database handling issues as well.

@github-actions github-actions bot added comp:client Related to Client component comp:common Related to Common component comp:cli Related to CLI component labels Jan 2, 2025
@kyujin-cho (Member, Author)

Well, it seems graphene's Enum type expects enum names (not values) as a field input... :(
[screenshot]

@kyujin-cho kyujin-cho requested a review from achimnol January 2, 2025 10:31
Co-authored-by: octodog <mu001@lablup.com>
@HyeockJinKim (Collaborator)

Could you review it again? @achimnol

Labels
  • area:docs (Documentations)
  • comp:agent (Related to Agent component)
  • comp:appproxy (Related to App Proxy component)
  • comp:cli (Related to CLI component)
  • comp:client (Related to Client component)
  • comp:common (Related to Common component)
  • comp:manager (Related to Manager component)
  • require:db-migration (Automatically set when alembic migrations are added or updated)
  • size:XL (500~ LoC)
  • type:feature (Add new features)
  • urgency:blocker (IT SHOULD BE RESOLVED BEFORE NEXT RELEASE!)
  • urgency:5 (It is imperative that action be taken right away.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support auto scaling on Model Service
3 participants