FEAT: optimized huggingface model support #354
Conversation
…to be added later this week)
…nal categories and used semantic categories for harm assessment
…/script for testing and usage.
Hi @romanlutz, I wanted to get your thoughts on the following suggestion: separating concerns by moving all model-related management (such as loading with different precisions, quantization, unloading, and saving models) into a dedicated utility submodule. Key Benefits of This Approach:
I would love to hear your thoughts on whether this aligns with our scope and if there are any additional considerations to keep in mind. In the meantime, I will work on refactoring the message parsing logic and adding support for cloud-based HuggingFace models.
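For illustration, a minimal sketch of what such a model-management utility could look like, assuming hypothetical names (`load_model`, `unload_model`) rather than the actual PyRIT API:

```python
# Hypothetical sketch of a dedicated model-management helper module.
# `load_model` / `unload_model` are assumed names, not PyRIT's real API.
import gc

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model(model_id: str, torch_dtype: torch.dtype = torch.float32):
    """Load a causal LM and its tokenizer with the requested precision."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch_dtype)
    return model, tokenizer


def unload_model(model) -> None:
    """Free memory once the caller has dropped its own references to the model."""
    del model  # drop this function's reference
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```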
Absolutely! In fact, you've thought about it a lot more thoroughly than I had so far 🙂 I guess I'd do this in chunks rather than one massive PR but I'm open to options. Wdyt? @rdheekonda and @rlundeen2 have faced issues with the old target and may have thoughts, too. I'm on the fence about creating a new module for this or just a folder (sub module?) within targets or common. I guess it doesn't really matter and users shouldn't have to interact with it so we should be able to move at a later point if necessary.
Hi Roman, thanks for the feedback! I agree with breaking this into smaller chunks rather than one big PR.
…: add HuggingFaceEndpointTarget setup
Hello @romanlutz, I made some quick changes by adding the HuggingFaceEndpointTarget setup. The local version of the target still needs further work. Feedback and ideas are always welcome! Thanks for your patience :)
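For context, a rough sketch of how an endpoint-based target might query the Hugging Face Inference API; the URL pattern, payload shape, and the `HUGGINGFACE_TOKEN` variable name are assumptions for illustration, not the exact implementation in this PR:

```python
# Sketch of querying the Hugging Face Inference API for text generation.
# Endpoint URL, payload shape, and env var name are illustrative assumptions.
import os

import requests


def query_hf_endpoint(model_id: str, prompt: str, timeout: int = 60) -> str:
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_TOKEN']}"}
    response = requests.post(url, headers=headers, json={"inputs": prompt}, timeout=timeout)
    response.raise_for_status()
    # Text-generation models typically return a list of {"generated_text": ...} dicts.
    return response.json()[0]["generated_text"]
```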
Yes, whatever makes sense. A lot of internal methods there may be "legacy" and can be updated. @rdheekonda mentioned that we had issues with huge delays while pulling the models from HF and running locally. Even small models took a while. We don't want it to be prohibitively difficult to run the notebooks, for example, so that's something that needs solving. For tests, that part could probably be mocked so that there's no network call.
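One possible way to mock the model-loading path so unit tests make no network call (the test name and patch targets are illustrative, not PyRIT's actual tests):

```python
# Patch transformers' from_pretrained methods so no files are downloaded.
from unittest.mock import MagicMock, patch


@patch("transformers.AutoModelForCausalLM.from_pretrained", return_value=MagicMock())
@patch("transformers.AutoTokenizer.from_pretrained", return_value=MagicMock())
def test_model_loading_is_mocked(mock_tokenizer_load, mock_model_load):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("any/model-id")
    model = AutoModelForCausalLM.from_pretrained("any/model-id")

    # Both loaders were intercepted, so no network access happened.
    assert mock_tokenizer_load.called and mock_model_load.called
    assert tokenizer is not None and model is not None
```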
Hello @romanlutz, here is the workflow I will follow for now:
1. Profile the Model Loading Workflow: I will profile the current loading path to see where the time is spent.
2. Experiment with Class-Level Caching: To optimize the model loading process, I will implement class-level caching (a rough sketch follows after this comment). This caching mechanism will store the model and tokenizer in memory after the first load, hopefully allowing subsequent instances of HuggingFaceChatTarget to reuse them.
3. Extend and Enhance Test Coverage: While profiling will give us performance data, I plan, as you suggested, to enhance the test coverage to ensure robustness and proper error handling. This includes:
4. Optimize Downloads with huggingface-cli: Using huggingface-cli to download the model files ahead of time instead of pulling them at load time.
Let me know if there are any specific areas you’d like me to focus on.
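A minimal sketch of the class-level caching idea from item 2, assuming a simplified target class (names are illustrative):

```python
# Cache the model and tokenizer at class level so repeated target instances
# reuse the objects already in memory instead of reloading them each time.
from transformers import AutoModelForCausalLM, AutoTokenizer


class CachedHuggingFaceTarget:
    _cached_model = None
    _cached_tokenizer = None
    _cached_model_id = None

    def __init__(self, model_id: str) -> None:
        cls = type(self)
        if cls._cached_model is None or cls._cached_model_id != model_id:
            # First load (or a different model was requested): load and remember it.
            cls._cached_tokenizer = AutoTokenizer.from_pretrained(model_id)
            cls._cached_model = AutoModelForCausalLM.from_pretrained(model_id)
            cls._cached_model_id = model_id
        self.tokenizer = cls._cached_tokenizer
        self.model = cls._cached_model
```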
All these make sense! Good point about caching. If the download is a one-time operation of a few minutes it's perhaps fine. If it takes 10 minutes every time that's not great 😄
… Update argument type in hugging_face_enpoint_target.py
- Modify examples in use_hugginface_chat_target_endpoint.py
- Update .env_example
- Add new notebook use_hugginface_chat_target_endpoint.ipynb
Hello @romanlutz, you can run the notebook to see the results for yourself. My Observations:
Note: the 0.66-second download time indicates that only specific files were downloaded, and I may not have specified all the necessary files. With all required files specified, the entire process could potentially still be reduced to under 1 minute. To optimize the process, make sure to specify all the files the model actually needs. There may still be additional bottlenecks, and the code is a bit messy :), but the main problem with downloading should be solved by running as admin. Performance Data Snippet:
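One way to restrict a download to only the required files is huggingface_hub's snapshot_download with allow_patterns; the repo id and pattern list below are illustrative:

```python
# Download only the files the model actually needs so partial downloads stay fast.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",  # illustrative repo id
    allow_patterns=["*.json", "*.safetensors", "tokenizer*"],  # illustrative patterns
)
print(f"Model files downloaded to: {local_dir}")
```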
Woah! That's a massive difference. I'll try this out and will report my findings. Might take me until tomorrow, though. My first thought was that perhaps it did cache. The difference is so huge.
…- Cleaned up code and improved organization for model download scripts using Hugging Face Hub, aria2, and Transformers libraries.
- Added pytest cases to test different model download variants, ensuring they handle caching and folder creation properly.
- Updated documentation notebook to reflect the latest changes and cleaned up unnecessary content.
Hello @romanlutz, I’ve been employing different methods to download the model.
Despite these diverse approaches, my download times have been quite consistent across the board, and my internet connection appears to be the limiting factor rather than the tooling. I am now shifting my focus to work on the inference part and will revisit the download strategies later. It would be helpful to hear about your experiences or those of other team members (@rdheekonda, @rlundeen2), particularly any download speeds or feedback you could share. If you've conducted any tests or have insights into optimizing these downloads further, your input would be invaluable. For more information about optimizing download speeds, see the Hugging Face CLI documentation. To calculate the download speed, I used the formula:
For example:
Best,
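As a hypothetical worked example of the speed calculation (the numbers are made up; enabling hf_transfer via the HF_HUB_ENABLE_HF_TRANSFER environment variable is one real option for speeding up hub downloads):

```python
# Download speed = downloaded size / elapsed time (illustrative numbers).
size_gb = 2.2          # total size of the downloaded files in GB
elapsed_seconds = 180  # wall-clock download time

speed_mb_s = (size_gb * 1024) / elapsed_seconds
print(f"Average download speed: {speed_mb_s:.1f} MB/s")  # ~12.5 MB/s
```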
Thank you for the details, @KutalVolkan. While downloading is one aspect of testing the PyRIT demos, HuggingFace typically checks the default cache directory first, and since we'll be using the same model across demo runs, it should already exist there. However, our main focus for this feature is speeding up inference: before creating a PR we run the demos on CPU-based compute, sending a few text prompts and observing the responses, to ensure existing functionality is not broken by new features. Could you test a few smaller models and check the inference time using the HuggingFaceChatTarget? The models don’t have to be from Microsoft, as HuggingFace hosts models from various companies. Here’s another 135M model with a 269MB size, which I believe will have faster inference compared to phi-3-4k-instruct. Again, the goal is to optimize the notebook demo with a small yet effective instruction model so that users can experiment with different HuggingFace models.
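A rough way to measure CPU inference latency with plain transformers, outside the PyRIT target; the model id, prompt, and generation settings below are illustrative assumptions:

```python
# Time a short generation on CPU for a small instruct model.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"  # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=50)
elapsed = time.perf_counter() - start

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"Generation took {elapsed:.2f}s on CPU")
```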
…erence handling and template usage
Hello @rdheekonda, thank you for the feedback! I have tested the inference times for two smaller models using the HuggingFaceChatTarget:
Both models were run on CPU, but our current implementation also supports running on GPU. Note: increasing the number of generated tokens increases the inference time accordingly. I'll test a few other models, then clean up the code and consider ways to extract only the assistant's response (see the sketch below). Please let me know if there's anything else you'd like me to test or if any further adjustments are needed from my side.
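A sketch of one way to keep only the assistant's newly generated text by slicing the prompt tokens off the generated sequence; it assumes a transformers causal LM and tokenizer like those in the timing sketch above:

```python
# Return only the tokens generated after the prompt, i.e. the assistant's reply.
def generate_assistant_reply(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_length = inputs["input_ids"].shape[-1]
    # Everything after the prompt tokens is the newly generated text.
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```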
Hello @rdheekonda, thank you for your feedback, I really appreciate it! Please feel free to review again and suggest any new enhancements if needed. Regarding the model for the demo: perhaps the demo is intended primarily for inference-time testing.
Thanks, @KutalVolkan, for addressing the comments. I’ve reviewed and added a few more comments. Let me know once they’re resolved, and it will be good to merge.
Hello @rdheekonda, thanks for the additional feedback! I've addressed the new comments. Please review and let me know if anything else is needed.
Looks great, a few things before merging:
- One comment needs to be addressed.
- Merge onto main
…ingface-model-support
…using pytest's native async support
Hello Raja, I hope I have made the changes you requested. If anything is still not quite right, I might need a bit more clarification. Please let me know if any further adjustments are needed. Thank you for your support :)
It's been addressed in the latest commit 44b9251
Upon running the build, unit tests are failing with a missing dependency error.
Hello Raja, the dependency issue has been addressed. Thank you for pointing it out. If anything else needs to be done, I'm happy to make further adjustments.
I don't see any changes to pyproject.toml. Am I missing something?
Is what I did correct? I thought I addressed it here: 14f6a3f
Hello Roman & Raja, I see the error regarding the Hugging Face token. I will handle that by tomorrow by mocking the token.
Thanks, @KutalVolkan. Here's the reference code for how we mocked the environment variables in our pytests.
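An illustrative pattern for mocking an environment variable in pytest so no real credential is needed; the HUGGINGFACE_TOKEN name and fixture are assumptions, not PyRIT's actual test helpers:

```python
# Provide a fake Hugging Face token to every test via an autouse fixture.
import os
from unittest.mock import patch

import pytest


@pytest.fixture(autouse=True)
def fake_hf_token():
    with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "fake-token-for-tests"}):
        yield


def test_token_is_read_from_environment():
    assert os.environ["HUGGINGFACE_TOKEN"] == "fake-token-for-tests"
```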
Hello @romanlutz & @rdheekonda, I’ve updated the tests to mock the Hugging Face token environment variable. Fingers crossed this works as expected! 😊
Somehow it wasn't showing before. This is fine.
Merged the PR. Excellent work, @KutalVolkan!
Thank you all for your support!
Overview
The goal of this task is to reintroduce support for HuggingFace models within the PyRIT framework after the original HuggingFaceChat was removed due to long model loading time. The aim is to enable support for lightweight models that can run efficiently on a normal machine, while also providing flexibility for more complex setups (e.g., cloud-based inference for larger models).
Work Completed
- Reintroducing HuggingFaceChat Class: Refactored the original HuggingFaceChat class to create an optimized HuggingFaceChatTarget class that aligns with the current PromptTarget design standards.
- Test Suite for HuggingFaceChatTarget: Added a test suite covering the HuggingFaceChatTarget class structure.
- Jupytext Integration and Notebook Creation: Added a .py file that serves as both a Python script and a Jupyter notebook using Jupytext, demonstrating the HuggingFaceChatTarget usage.
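For reference, a small example of the Jupytext percent format such a dual-purpose .py file can use (the cell contents here are illustrative):

```python
# %% [markdown]
# # HuggingFaceChatTarget demo
# This cell renders as Markdown when the file is opened as a notebook.

# %%
# Each `# %%` marker starts a new notebook cell, yet the file remains a plain,
# runnable Python script.
print("Hello from a Jupytext-managed demo script")
```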
Next Steps
- Optimize Model Loading for Faster Execution
- Refactor and Improve Message Parsing
- Enhance Test Coverage and Mocking
- Future Support for Cloud-Based HuggingFace Models
Action Items
- Extend the HuggingFaceChatTarget class for dynamic HuggingFace model support.
Contributors and Feedback
This work is based on suggestions from @romanlutz, who outlined potential areas of improvement and future work. The actual code was reused and built upon from @rdheekonda and @rlundeen2's prior implementations.
Related Issue: #347.