[Devices] Offer support for hardware-accelerated inference in Firecracker #1179

raduweiss · 2019-07-15T12:53:23Z

Doing hardware-accelerated inference in a serverless environment is a compelling use case.

However, adding straight-up GPU passthrough means that microVMs can't oversubscribe memory, and we would need to add PCI emulation to Firecracker, which comes with a lot of extra complexity and attack surface.

The first step here will be to research the options and alternatives (e.g., GPU passthrough, or something else), and figure out the path forward.

Related issues: #849, #776.

nlauchande · 2019-07-15T16:11:48Z

I am very interested in this use case.

richardliaw · 2020-03-19T20:37:10Z

+1, very interested in this use case. Any update on this? (I understand it's still in the research phase)

zaharidichev · 2020-11-18T14:24:00Z

@raduweiss is this something that anyone is working on atm? Is it still on the roadmap?

ananos · 2020-11-18T15:40:44Z

Hi @zaharidichev,

we have some thoughts on this [1], shared them earlier this year in the Slack workspace [2], but a chat is still pending I'm afraid. We have a rough proof-of-concept implementation on Firecracker, based on the design principles of [1], which exhibits negligible overhead for image inference (jetson-inference backend, using TensorRT, tested on an NVIDIA Jetson Nano & a generic x86_64 machine with an RTX 2060 SUPER & another machine with a T4). We should be able to open-source the whole stack pretty soon. Feel free to drop us a line if you're interested in our early PoC.

Essentially, the idea is that we abstract away the hardware-specific operations via a slim runtime library/system, that supports any kind of backend (ranging from a simple CUDA/OpenCL function to a TensorFlow operation/app). Combined with a simple virtio frontend/backend implementation we are able to forward operations from a guest to the host/monitor, which in turn executes the actual "acceleratable" function on the hardware accelerator.
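As an illustration only, the forwarding idea described above can be sketched in a few lines of Python. This is not vAccel's actual API: the names `BACKENDS`, `register`, `host_handle`, `guest_call` and the JSON message layout are all hypothetical, and a plain function call stands in for the virtio transport between guest and host.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical host-side registry: operation name -> function that would
# run on the hardware accelerator.
BACKENDS: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Register a host-side backend function under an operation name."""
    def wrap(fn):
        BACKENDS[name] = fn
        return fn
    return wrap


@register("image_classify")
def image_classify(image_size: int) -> str:
    # Stand-in for a real accelerated call (for example an inference run);
    # it only echoes its input so the sketch stays self-contained.
    return "label-for-%d-bytes" % image_size


def host_handle(request: bytes) -> bytes:
    """Host/monitor side: decode the request and dispatch to a backend."""
    msg = json.loads(request)
    fn = BACKENDS.get(msg["op"])
    if fn is None:
        return json.dumps({"ok": False, "error": "unknown op"}).encode()
    return json.dumps({"ok": True, "result": fn(*msg["args"])}).encode()


def guest_call(op: str, *args, transport=host_handle):
    """Guest side: serialize the operation and send it over the transport.

    `transport` is a plain function here; in a real system it would be the
    virtio frontend/backend pair.
    """
    request = json.dumps({"op": op, "args": list(args)}).encode()
    reply = json.loads(transport(request))
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

The guest only ever names an operation and passes arguments; which hardware (if any) executes it is decided entirely on the host side, which is the property the paragraph above relies on.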

Another option (if latency is not critical to you) could be to use rCUDA, which we plan to try but haven't had the time yet...

BTW, @raduweiss we should plan to have that chat [2] at some point -- give us a shout when you are available!

cheers,
Tassos

[1] https://blog.cloudkernels.net/posts/vaccel/
[2] https://firecracker-microvm.slack.com/archives/CDL3FUR8B/p1591093992140800

raduweiss · 2020-11-18T16:50:02Z

@ananos , yeah our bad, we totally dropped the ball here. Our apologies! I'll reply directly so we can talk.

ananos · 2020-12-04T16:28:08Z

Hi @zaharidichev, all

just wanted to share our blog post about our approach on the above: https://blog.cloudkernels.net/posts/vaccel_v2/

using nvidia-container-runtime & a docker image we've put together, you are able to run the jetson-inference image classification example from a Firecracker VM. You can find more info in the above post or @ https://vaccel.org. Of course, you can ping us, we will be more than happy to share how to try out vAccel on Firecracker.

cheers,
Tassos

amrragab8080 · 2021-03-24T19:52:30Z

Any update on the GPU support in Firecracker?

raduweiss · 2021-03-31T08:20:48Z

We’ve been thinking about / experimenting in this space in the last months, and we'll keep at it this year, but there’s no ETA for this feature right now. For maximum utility in a serverless platform paradigm [a], a single GPU hardware resource needs to be safely used by multiple microVMs, without trading off the other capabilities that Firecracker users like (e.g., CPU/memory oversubscription, fast snapshot-restore, or high mutation rate of the host’s microVMs). This is a pretty complex problem, and we’re still exploring our options.

As with the other larger features, as we approach what we think is a good design here, we'll post some form of RFC to get community feedback.

We’d be happy to hear of any use cases so we can factor them in – feel free to update this thread, or share them directly on our Slack [b]!

[a] https://github.com/firecracker-microvm/firecracker/blob/master/CHARTER.md
[b] firecracker-microvm Slack workspace link

pdames · 2021-05-26T00:53:02Z

Any updates? My team is interested in running Ray on Firecracker, but the current lack of GPU support would erode the value of doing so.

raduweiss · 2021-07-02T06:52:01Z

> Any updates? My team is interested in running Ray on Firecracker, but the current lack of GPU support would erode the value of doing so.

Sorry for not getting back here sooner, we were still working through our options. We've settled on implementing plain PCIe GPU passthrough, which comes at the cost of requiring microVMs to start with their full memory mapped, will probably negate the advantages of using snapshot-restore, and requires the full GPU to be attached to a microVM. These are all things we wanted to see if we could improve upon, but we didn't find a way that upholds all our tenets.

We will want to get broad feedback from the community here on how to actually present this as a feature (we'll start a discussion in the following weeks). Given the trade-offs above, we will consider building a separate Firecracker mode or Firecracker variant, or something along those lines.

zvonkok · 2022-05-09T10:25:52Z

@raduweiss I am leading the enablement of GPUs and other NV accelerators on Kata containers. I was trying to use the Slack Invite in the README.md but it is invalid.

What would be the best way to get into the loop on the PCIe implementation in firecracker? I fixed and I'm currently fixing several other issues (BAR sizes, MDEV support, ...) in Kata's PCIe (QEMU) implementation.

Would be nice if I could get hands-on with some pre-released artifacts to start testing on our side.

raduweiss · 2022-05-25T13:30:33Z

Hi @zvonkok . We've re-prioritized our roadmap, and for 2022 we're not pursuing the Firecracker PCIe implementation / GPU passthrough work anymore.

DemiMarie · 2022-11-04T01:16:47Z

@raduweiss: what would be needed for a “good” solution? Could https://libvf.io be helpful?

mmcclean-aws · 2023-05-27T15:25:58Z

Any plans to support Inferentia and Trainium based instances ? They expose the accelerators via PCI to the OS but I see PCI support is not planned for firecracker. See docs for more details on the devices exposed.

kalyazin · 2023-06-09T22:58:15Z

Hi @mmcclean-aws . As discussed offline, an immediate obstacle to supporting Inferentia and Trainium instances is that they are virtualised (as opposed to bare metal), so Firecracker can't run on them, because AWS doesn't support nested virtualisation. Besides that, since Inf2 has 12 accelerators, and each accelerator can only be used in a single-tenant manner, the instance can carry at most 12 microVMs at the same time, which rules out the oversubscription that is a key benefit of Firecracker. The only potential benefit (if/when bare metal Inf2* instances are available) could be shorter VM startup time if an instance needs to be partitioned dynamically.
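To make the density argument concrete, here is a rough back-of-the-envelope sketch. The accelerator count of 12 comes from the comment above; the function names and the CPU figures (96 host CPUs, 2 vCPUs per microVM, a 4x oversubscription ratio) are hypothetical values chosen only to show the contrast.

```python
def accelerator_bound_vms(accelerators: int, accelerators_per_vm: int = 1) -> int:
    """MicroVMs that fit when each accelerator is dedicated to one microVM."""
    return accelerators // accelerators_per_vm


def cpu_bound_vms(host_cpus: int, vcpus_per_vm: int, oversubscription: float) -> int:
    """MicroVMs that fit when vCPUs may be oversubscribed by the given ratio."""
    return int(host_cpus * oversubscription) // vcpus_per_vm


# Single-tenant accelerators cap the host at one microVM per accelerator.
by_accelerator = accelerator_bound_vms(12)

# Hypothetical CPU-only host for comparison: 96 CPUs, 2 vCPUs per microVM,
# 4x oversubscription.
by_cpu = cpu_bound_vms(96, 2, 4.0)
```

With these assumed numbers the accelerator limit (12) is far below what CPU oversubscription alone would allow (192), which is why a dedicated accelerator per microVM gives up most of the density benefit.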

kalyazin · 2023-06-13T21:39:12Z

Hi @peterdelevoryas . What is your specific motivation for moving off Qemu? Is that merely because of the Rust safety features? Firecracker was developed with CPU workloads in mind, and design decisions have been often driven by that (eg using MMIO virtio transport vs PCI). Being a live product, we may find it possible to reconsider those if sufficiently compelling reasons for doing so arise.

peterdelevoryas · 2023-06-14T01:02:42Z

> Hi @peterdelevoryas . What is your specific motivation for moving off Qemu? Is that merely because of the Rust safety features? Firecracker was developed with CPU workloads in mind, and design decisions have been often driven by that (eg using MMIO virtio transport vs PCI). Being a live product, we may find it possible to reconsider those if sufficiently compelling reasons for doing so arise.

I don’t have any super strong reasons to migrate off QEMU, I just like the idea of something stripped down and written in Rust, and the fact it’s completely open source, free, and run in production for real aws workloads. I just don’t want to live with QEMU forever, even if just for the fact that I don’t enjoy mailing list development.

Edit: I noticed cloud-hypervisor, and realize that resolves this for me! nvm. I actually agree, firecracker should keep doing non-passthrough stuff, cloud-hypervisor makes more sense for passthrough use cases unless you can manage to integrate PCI passthrough into the microvm environment somehow.

DemiMarie · 2023-08-07T19:57:57Z

A few comments:

https://libvf.io provides support for GPU virtualization. However, multi-tenant GPU virtualization requires trusting the proprietary vendor hardware & firmware to do its job. @raduweiss: Does Amazon consider this sufficient protection?
With PCI passthrough, Amazon can avoid most of the security risks by using a custom board design where the GPU’s SPI flash is write-protected by hardware the GPU has no control of. Passthrough with stock hardware is much riskier.
Memory oversubscription is possible by emulating a nested IOMMU. I’m not sure if Firecracker’s developers are interested in doing so given the performance penalties.

mmcclean-aws · 2023-08-07T20:30:29Z

Thanks. Does that mean that PCI passthrough should work for alternative devices (e.g. Trainium and Inferentia) that expose themselves in /dev ?

DemiMarie · 2023-08-07T21:32:56Z

It should work for almost any PCI device. Whether it is secure is another matter. That depends entirely on choosing a safe device and your ability to prevent early boot DMA attacks and unintended persistence via e.g. on-device flash storage.

jayavanth · 2023-11-20T17:26:55Z

Any updates on GPU support in your roadmap for 2023/2024?

xmarcalx · 2023-12-07T20:48:25Z

Hi @jayavanth ,

Thanks for your question.
No, we are not planning any GPU support in Firecracker at the moment.
Once we consider this task again, we will add it to our GitHub roadmap, which we are going to update and bring up to speed in the following weeks.

fighterhit · 2024-01-02T03:47:42Z

> Hi @jayavanth ,
>
> Thanks for your question. No we are not planning any GPU support in Firecracker at the moment. Once we will consider again this task we will add in our GitHub roadmap, which we are in the following weeks we are going to update and bring up to speed soon.

Hi @xmarcalx , in the current era of rapid AI development, GPU support is very important. I hope the team can seriously consider this feature. Thanks!

Talador12 · 2024-08-20T17:33:18Z

At some point, it was decided that PCI passthrough would be acceptable to the tenets of firecracker
#1179 (comment)

This could actually work well with Nvidia's MIG slice paradigm. You would still have to passthrough a physical slice of the GPU, but you could guarantee that slice to be isolated within that allocation
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#virtualization

I do recommend supporting AMD GPUs and tensor cores as well, but those can come later. Start with Nvidia.

How could we get started on this effort? Open to open source contributors?

DemiMarie · 2024-08-20T19:57:51Z

The best way I know to scalably oversubscribe a GPU is virtio-GPU native contexts, which is what Qubes OS will be using for GPU virtualization in the future. Native contexts expose a subset of the host kernel driver API, which means that the number of clients is limited by available memory rather than by GPU hardware or firmware. Native contexts neither use nor rely on hardware or firmware GPU virtualization support, so they work with any GPU that has the needed support in Mesa and virglrenderer. Proprietary drivers (such as Nvidia) are not supported.

On AMD GPUs, process isolation mode is required for security.

xmarcalx · 2024-09-01T17:14:11Z

Dear @Talador12 et al.,

Sorry for the late reply, and thank you very much for your interest in Firecracker and specifically in this feature.
GPU support is a long-overdue feature request from the community. We are always interested in learning more about potential use cases that justify GPU support for Firecracker workloads, to make sure that it is the right thing for the product and the application.
As per the previous message, we were not planning to support GPUs in Firecracker, but we are more than happy to support the community in this effort, particularly around:

building a plan for the community to develop a first PoC which includes PCIe and GPU support in Firecracker microVMs
analyzing, at each milestone of the project, the performance KPIs and security stance of Firecracker microVMs
and working together to define a plan for delivering these milestones into Firecracker releases.

If you are on board, we should set up an introductory call within the next couple of weeks with you and any other interested users to understand the initial requirements and use cases and lay out the next steps.
We will announce the schedule of this meeting in our Slack channel and in our GitHub discussions section.
I look forward to hearing from you soon.

Kind Regards,
Marco

Talador12 · 2024-09-16T14:32:30Z

@xmarcalx Hey Marco,
For the meeting, September 30th or October 9th work for us. Which would you prefer?
Also - we should include interested users in this thread/slack as well.

xmarcalx · 2024-09-24T06:53:51Z

Hi @Talador12 ,

Sure, let's organize it for the 9th of October.
Which timezones are the participants in?
That way we can try to pick a time that works for at least the majority of people.

For anyone interested in the matter, I will post a link to the meeting, which will be free for everyone to join, in this issue and in the Slack thread in our community https://firecracker-microvm.slack.com/archives/CDL3FUR8B/p1724175189998039

xmarcalx · 2024-09-25T18:02:50Z

Hi everyone,

we set up the meeting for the 9th of October, from 18:00 BST to 19:00 BST, to:

dive deep on the motivation and technical requirements of this request
discuss the next steps we can plan with the community to move forward with this ask.

The meeting details are as follows:

You have been invited to an online meeting, powered by Amazon Chime.
Click to join the meeting: https://chime.aws/7995191427
Meeting ID: 7995 19 1427
A headset is recommended or you may use your computer’s microphone and speakers.
Call in using your phone:
United States Toll-Free (1): +1 855-552-4463
Meeting ID: 7995 19 1427
One-click Mobile Dial-in (United States Toll-Free (1)): +1 855-552-4463,,,7995191427#
United Kingdom Toll-Free (1): +44 800 085 5175
International: https://chime.aws/dialinnumbers/
Dial-in attendees must enter *7 to mute or unmute themselves.
To connect from an in-room video system, use one of the following Amazon Chime bridges:
SIP video system: 7995191427@meet.chime.in or meet.chime.in
H.323 system: 13.248.147.139 or 76.223.18.152
If prompted enter the Meeting PIN: 7995191427#
Download Amazon Chime at https://aws.amazon.com/chime/download
For information about creating an Amazon Chime account, see https://aws.amazon.com/chime/getting-started

See you there!

Talador12 · 2024-09-26T16:40:46Z

We will see you there! Looking forward to it :)

xmarcalx · 2024-10-09T23:04:06Z

We decided to track the notes of the meeting in discussion #4845. The discussion also contains the link to the meeting we will use to sync every 4 weeks.

raduweiss added Feature: Emulation Roadmap: Tracked Items tracked on the roadmap project. labels Jul 15, 2019

This was referenced Jul 15, 2019

GPU Support #849

Closed

Virtio-vfio：Should we support virtio-vfio in the nearly future？ #776

Closed

raduweiss mentioned this issue Jul 15, 2019

[RFC] 2020 Roadmap #1104

Closed

raduweiss changed the title ~~Machine Learning Acceleration~~ Offer support for hardware-accelerated inference in Firecracker Sep 18, 2020

raduweiss changed the title ~~Offer support for hardware-accelerated inference in Firecracker~~ [Devices] Offer support for hardware-accelerated inference in Firecracker Sep 18, 2020

sandreim mentioned this issue Mar 8, 2021

vGPU? #2487

Closed

dianpopa assigned alsrdn Oct 4, 2021

0x2b3bfa0 mentioned this issue Nov 24, 2021

Standardize on container images instead of machine images iterative/terraform-provider-iterative#146

Open

mrgleeco mentioned this issue Jan 5, 2022

GPU support? weaveworks/ignite#890

Open

JonathanWoollett-Light added Type: Enhancement Indicates new feature requests and removed Feature: Emulation labels Mar 23, 2023

anthonycorletti mentioned this issue May 3, 2023

[FEATURE] 👾 gpu support anthonycorletti/hotbox#6

Open

2 tasks

xmarcalx unassigned alsrdn Sep 26, 2023

xmarcalx removed the Roadmap: Tracked Items tracked on the roadmap project. label Dec 7, 2023

ShadowCurse added the Status: Parked Indicates that an issues or pull request will be revisited later label Apr 15, 2024

ShadowCurse added this to Firecracker Roadmap Oct 16, 2024

thundergolfer mentioned this issue Nov 6, 2024

pytorch support NVIDIA/cuda-checkpoint#4

Open
