✨ Short term accelerated compute instance #4120
Comments
@bcrawford-moj do you have a sense of the resource requirements at this time? E.g. do you have an estimate of the file size of the input data being processed by the LLM?
Hi @AntFMoJ, I'm the one working with the model. I'm not too familiar with how the infrastructure/resource provision works, but I imagine we just need something similar to what's available on the AP in terms of CPU and RAM, but with a GPU with 16GB, or even 8GB, of VRAM. Reasoning: the entire model already runs on the AP by splitting the data into smaller chunks. The LLM/transformer component is a fairly small model (44M parameters) which should fit comfortably in 8GB of VRAM. The inputs are only 128 tokens (~words) long, and we will be able to scale the model to the amount of VRAM available. Thanks for your help, and please let me know if something doesn't make sense :)
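For a rough sense of scale, a back-of-the-envelope sketch of why a 44M-parameter model at fp16 sits comfortably within 8GB of VRAM (the hidden size, depth and batch size below are illustrative guesses, not the actual model configuration):

```python
# Back-of-the-envelope VRAM estimate for a 44M-parameter transformer at inference time.
# Hidden size, depth and batch size are illustrative guesses, not the real model config.
PARAMS = 44_000_000        # model size quoted above
BYTES_PER_PARAM = 2        # fp16 weights
SEQ_LEN = 128              # input length in tokens
BATCH_SIZE = 256           # hypothetical batch size; tune to the VRAM actually available
HIDDEN_DIM = 512           # assumed hidden size for a model of this scale
N_LAYERS = 8               # assumed depth

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
# Very rough activation estimate: one fp16 tensor of shape (batch, seq, hidden) per layer.
activations_gb = BATCH_SIZE * SEQ_LEN * HIDDEN_DIM * N_LAYERS * BYTES_PER_PARAM / 1e9

print(f"weights ~{weights_gb:.2f} GB, activations ~{activations_gb:.2f} GB")
# Roughly 0.09 GB of weights plus well under 1 GB of activations, so 8 GB of VRAM is ample.
```

Even allowing generous overheads for the CUDA runtime and framework buffers, the limiting factor is throughput rather than memory.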
Based on user resource requirements we would recommend a p3.2xlarge instance.
Draft PR created to build the GPU node group.
Hi @yznlp @bcrawford-moj, can you please confirm the name of the AMI you have been using in your testing? Thanks
I'm not exactly sure what the AMI name is. Is it what would appear in the dropdown on the control panel? |
Thanks for getting back to me, we will need to investigate further. |
GPU node group created. Tested configuring a pod to access GPU resources, which worked correctly, although this required the taint and label to be removed temporarily. Next step is to resolve the issue with the taint/daemonset interaction.
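For anyone picking this up later, a sketch (using the kubernetes Python client) of the kind of toleration and node selector a GPU workload needs in order to schedule onto the tainted GPU nodes. The taint key, node label and image below are placeholders, not the values used in the actual node group:

```python
# Sketch of a toleration + node selector for a pod targeting the tainted GPU node group.
# The taint key, node label and image are hypothetical placeholders.
from kubernetes import client

gpu_toleration = client.V1Toleration(
    key="nvidia.com/gpu",    # assumed taint key
    operator="Exists",
    effect="NoSchedule",
)

pod_spec = client.V1PodSpec(
    containers=[
        client.V1Container(
            name="gpu-workload",
            image="example/gpu-image:latest",    # placeholder image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}   # request a single GPU
            ),
        )
    ],
    tolerations=[gpu_toleration],
    node_selector={"compute": "gpu"},            # assumed node group label
)

# Print the spec as it would appear in a manifest, without applying it to any cluster.
print(client.ApiClient().sanitize_for_serialization(pod_spec))
```

The NVIDIA device plugin daemonset needs a matching toleration as well, which is the taint/daemonset interaction referred to above.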
GPU node group and pod creation tested successfully.
VSCode deployed on the GPU-enabled node pool from the control panel dev environment.
Summary: Drivers aren't yet available for Ubuntu 24.04, so we've downgraded to NVIDIA's CUDA base image by cutting a new release from 1.2.0, the last Ubuntu 22.04 release (ministryofjustice/analytical-platform-visual-studio-code#69). This deploys and is able to run Ollama with GPU capability.
Notes: With the current taints/tolerations, only one GPU-enabled workload is schedulable per node, meaning each workload needs its own GPU node. EDIT: GPU sharing may address this: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
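For reference, a quick way to sanity-check the deployment from inside the container, assuming the default Ollama port and an already-pulled model (the model name below is a placeholder):

```python
# Minimal check that the GPU is visible to the container and that Ollama responds.
# Assumes the default Ollama endpoint; the model name is a placeholder.
import subprocess
import requests

# 1. Is the GPU visible inside the container?
print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)

# 2. Is Ollama reachable and able to generate?
resp = requests.post(
    "http://localhost:11434/api/generate",   # default Ollama endpoint
    json={"model": "llama3", "prompt": "ping", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```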
Follow-on tickets:
Thanks very much for your work on this! Could the following users be given access, please:
@bcrawford-moj we have added the users you provided above; you should now be able to open Visual Studio Code:1.2.0-nvidia-cuda-base (GPU-Enabled) in the control panel. If you or any of the above users have any issues, please let me know. Just to note, there is initially a limit on how many users can use the GPU at a time, so only one or two of your team will be able to deploy the GPU-enabled VSCode on the control panel. We have raised a story to improve on this limitation and will update you as this progresses.
Thanks so much! Very excited to use this. FYI we have a lot of AL over the next few weeks, so I wouldn't expect that limitation to be an issue in the short term.
@AntFMoJ Hi, currently on the AP there are two GPU-enabled VSCode options. Is there a difference between them, and do we need to move our work across to one of them?
Hi @yznlp, commenting on a closed issue is probably not the best way to ask a question; in future, please use the #analytical-platform-support Slack channel. That said, the difference between the releases is solely to do with the pods' idle time, and there should be no need to move your work as it does not affect any file persistence. I would have thought that the retired version would not open, so please use the other version.
Got it thank you :) |
Describe the feature request.
Short term provisioning of an accelerated compute instance.
Describe the context.
In the BOLD programme we are producing a publication on the number of prisoners with children. We have developed a methodology involving LLMs which checks whether prison case notes imply the prisoner has a child. The output will be an Official Statistics in Development report due for publication around the end of May.
We used the AP to run the LLM over the case notes, but it is quite slow (it takes about a week to churn through them). The QA process has highlighted some changes we need to make.
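To illustrate the shape of the workload (the checkpoint, labels and example text below are placeholders, not our actual fine-tuned model or data), it is essentially batched classification of short case-note chunks, which is exactly the kind of job a single GPU accelerates well:

```python
# Illustrative sketch only: batched classification of short case-note chunks on a GPU.
# The checkpoint, labels and example text are placeholders, not the real pipeline or data.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder checkpoint
    device=0,           # first CUDA GPU; use device=-1 to fall back to CPU
    truncation=True,
    max_length=128,     # inputs are ~128 tokens per chunk
)

case_note_chunks = [
    "Prisoner discussed arrangements for his daughter's school visit.",  # made-up example
    "Routine wing observation, nothing to report.",                      # made-up example
]

# Larger batch sizes amortise GPU overhead; tune to available VRAM.
results = classifier(case_note_chunks, batch_size=64)
for chunk, result in zip(case_note_chunks, results):
    print(result["label"], round(result["score"], 3), "-", chunk[:60])
```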
Value / Purpose
This will allow us to meet our publication deadline.
We believe this will be the first time LLMs have been used in producing official statistics (and indeed one of our models was fine-tuned using generated labeled data, so we think it will also be the first time genAI has been used).
We are happy to have associated costs journaled to the BOLD programme.
User Types
Data scientists
Proposed solution: