Volcano: support another GPU sharing scheme, mGPU (https://www.volcengine.com/docs/6419/145065) #2711

Closed
fjding opened this issue Feb 27, 2023 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

fjding commented Feb 27, 2023

What would you like to be added:

Hi. In cloud-native environments, Volcano has become the standard for batch scheduling, and more and more cloud users run Volcano as their batch scheduler on ByteDance Volcano Engine. However, the GPU sharing scheme of Volcano Engine (mGPU) does not match the GPU sharing scheme of the Volcano scheduler. We want to raise a PR that lets the Volcano scheduler support ByteDance Volcano Engine mGPU scheduling, so that Volcano Engine users can use mGPU scheduling directly after installing the Volcano scheduler.

mGPU Link: https://www.volcengine.com/docs/6419/145065

Why is this needed:

fjding added the kind/feature label Feb 27, 2023

hwdef (Member) commented Feb 27, 2023

What work needs to be done to adapt mGPU?

Volcano's GPU support is already extensible. Please check whether the current design meets your needs:

#2643

fjding (Author) commented Feb 28, 2023

Thanks for your reply. The device API is well designed, and I think it can accommodate an mGPU scheduling extension. The mGPU implementation differs from the NVIDIA gpushare that Volcano currently supports, so I want to confirm two things:

  1. When will the official version of this device API be released?
  2. If we add mGPU support, we will create a directory named pkg/scheduler/api/devices/mgpu/gpushare. Once our development is complete, can it be merged into the master branch of Volcano and released in a GA version?
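
For illustration only, the kind of logic such a device package might contain can be sketched in Go. Everything named here is an assumption made for the example: the `SharedDevice` interface, the `MGPUDevices` type, the `PodRequest` fields, and the first-fit placement policy are hypothetical. They are not Volcano's actual device API and not Volcano Engine's mGPU implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// PodRequest is a hypothetical, simplified stand-in for a pod's GPU request.
type PodRequest struct {
	Name        string
	CorePercent int // share of one card's compute, 0-100 (assumed unit)
	MemoryMiB   int // GPU memory in MiB (assumed unit)
}

// SharedDevice is an illustrative interface only; it is NOT Volcano's device API.
type SharedDevice interface {
	HasDeviceRequest(p PodRequest) bool
	FilterNode(p PodRequest) error
	Allocate(p PodRequest) (int, error)
	Release(podName string)
}

// card tracks the unallocated capacity of one physical GPU.
type card struct {
	freeCore int
	freeMem  int
}

// grant records which card a pod was placed on and what it was given.
type grant struct {
	card int
	req  PodRequest
}

// MGPUDevices is a hypothetical per-node pool of shareable GPU cards.
type MGPUDevices struct {
	cards  []card
	grants map[string]grant // keyed by pod name
}

// ErrNoFit is returned when no single card can hold the whole request.
var ErrNoFit = errors.New("no card has enough free compute and memory")

// NewMGPUDevices creates n empty cards with memPerCardMiB of memory each.
func NewMGPUDevices(n, memPerCardMiB int) *MGPUDevices {
	d := &MGPUDevices{grants: map[string]grant{}}
	for i := 0; i < n; i++ {
		d.cards = append(d.cards, card{freeCore: 100, freeMem: memPerCardMiB})
	}
	return d
}

// HasDeviceRequest reports whether the pod asks for any GPU share at all.
func (d *MGPUDevices) HasDeviceRequest(p PodRequest) bool {
	return p.CorePercent > 0 || p.MemoryMiB > 0
}

// firstFit returns the index of the first card that can hold p, or -1.
func (d *MGPUDevices) firstFit(p PodRequest) int {
	for i, c := range d.cards {
		if c.freeCore >= p.CorePercent && c.freeMem >= p.MemoryMiB {
			return i
		}
	}
	return -1
}

// FilterNode checks feasibility without changing any state.
func (d *MGPUDevices) FilterNode(p PodRequest) error {
	if d.firstFit(p) < 0 {
		return ErrNoFit
	}
	return nil
}

// Allocate reserves compute and memory on the first card that fits.
func (d *MGPUDevices) Allocate(p PodRequest) (int, error) {
	i := d.firstFit(p)
	if i < 0 {
		return -1, ErrNoFit
	}
	d.cards[i].freeCore -= p.CorePercent
	d.cards[i].freeMem -= p.MemoryMiB
	d.grants[p.Name] = grant{card: i, req: p}
	return i, nil
}

// Release returns a pod's share to its card; unknown pods are ignored.
func (d *MGPUDevices) Release(podName string) {
	g, ok := d.grants[podName]
	if !ok {
		return
	}
	d.cards[g.card].freeCore += g.req.CorePercent
	d.cards[g.card].freeMem += g.req.MemoryMiB
	delete(d.grants, podName)
}

func main() {
	var dev SharedDevice = NewMGPUDevices(2, 16384)
	for _, p := range []PodRequest{{"a", 60, 8192}, {"b", 60, 8192}, {"c", 50, 4096}} {
		idx, err := dev.Allocate(p)
		fmt.Println(p.Name, idx, err)
	}
}
```

The sketch keeps the feasibility check (`FilterNode`) separate from the state change (`Allocate`), which is one plausible way to let a scheduler test many nodes before committing to one. A real integration would have to follow whatever interface the device API in #2643 actually defines.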

hwdef (Member) commented Mar 1, 2023

> Thanks for your reply. The device API is well designed, and I think it can accommodate an mGPU scheduling extension. The mGPU implementation differs from the NVIDIA gpushare that Volcano currently supports, so I want to confirm two things:
>
> 1. When will the official version of this device API be released?
> 2. If we add mGPU support, we will create a directory named pkg/scheduler/api/devices/mgpu/gpushare. Once our development is complete, can it be merged into the master branch of Volcano and released in a GA version?

  1. If nothing unexpected happens, it will be released three months after version 1.7, that is, in April.
  2. This requires a decision from the maintainers.

fjding (Author) commented Mar 1, 2023

Can this issue be discussed at the weekly meeting?
Who can I contact for help?

hwdef (Member) commented Mar 1, 2023

@wangyang0616
Can you offer any help?

hwdef (Member) commented Mar 1, 2023

Could you please share an email address or WeChat ID?

fjding (Author) commented Mar 1, 2023

> Could you please share an email address or WeChat ID?

Thanks. My WeChat ID: fj_ding

wangyang0616 (Member) commented Mar 3, 2023

> 1. If nothing unexpected happens, it will be released three months after version 1.7, that is, in April.

I agree with @hwdef. The community plans to release v1.8 in April, and this feature will be included in that release.

> 2. This requires a decision from the maintainers.

Please provide some help, thanks! @william-wang @archlitchi

stale bot commented Jun 10, 2023

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale bot added the lifecycle/stale label Jun 10, 2023

stale bot commented Aug 10, 2023

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale bot closed this as completed Aug 10, 2023