
Add Quantization code. #107

Closed · wants to merge 19 commits

Conversation

@arnocandel (Member) commented May 2, 2023

Add quantization logic #88

@arnocandel force-pushed the quantization branch 7 times, most recently from 9a811b5 to f78c78d on May 2, 2023 at 22:16
@arnocandel (Member Author) commented May 3, 2023

Output is still garbage, but quantization is at least integrated:

generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --quant_model=h2ogpt-oig-oasst1-512-6.9b-8bit.pt

[screenshot of model output]
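For orientation, the core idea behind an 8-bit quantized checkpoint like the one loaded via --quant_model above can be sketched as symmetric absmax quantization: scale each weight tensor so its largest magnitude maps to 127, round to int8, and keep the scale for dequantization. This is a toy illustration only, not this PR's quantize.py, which works per-block and handles outliers differently.

```python
# Toy sketch of symmetric absmax int8 quantization (illustrative only; the
# real pipeline quantizes per-block with outlier handling).

def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] plus one scale per tensor."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the stored scale."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Rounding error is at most half a quantization step per weight.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

The 8-bit checkpoint then stores one byte per weight plus the per-tensor (or per-block) scales, instead of 16/32-bit floats.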

@arnocandel (Member Author) commented:
So we may need https://github.com/PanQiWei/AutoGPTQ.
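One reason GPTQ-style 4-bit storage halves memory again versus int8 is that two 4-bit codes fit in each byte. The packing can be sketched as below; this is a toy illustration only, and AutoGPTQ's actual kernels and its error-compensating quantization algorithm are far more involved.

```python
# Toy sketch of 4-bit weight packing: two codes (0..15) per byte.
# Not AutoGPTQ's format; just the storage idea.

def pack_4bit(codes):
    """Pack an even-length list of 4-bit codes into bytes (high nibble first)."""
    assert len(codes) % 2 == 0 and all(0 <= c < 16 for c in codes)
    return bytes((hi << 4) | lo for hi, lo in zip(codes[::2], codes[1::2]))

def unpack_4bit(data):
    """Recover the list of 4-bit codes from packed bytes."""
    out = []
    for b in data:
        out.extend((b >> 4, b & 0x0F))
    return out

codes = [3, 15, 0, 7]
packed = pack_4bit(codes)
assert len(packed) == 2            # half the storage of one byte per code
assert unpack_4bit(packed) == codes
```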

@arnocandel (Member Author) commented:
python quantize.py
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --quant_model=h2ogpt-oig-oasst1-512-6.9b-4bit

@arnocandel (Member Author) commented:
Two quantization approaches:

1. bitsandbytes (https://modal.com/docs/guide/ex/falcon_bitsandbytes):

   python generate.py --base_model=h2oai/h2ogpt-oasst1-falcon-40b --load_4bit=True

2. GPTQ (https://modal.com/docs/guide/ex/falcon_gptq): this PR (probably slower at inference).
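One tradeoff between the two approaches is precision: at a fixed value range, 4-bit uniform quantization has roughly 16x the step size of 8-bit, so its round-trip error is correspondingly larger. A minimal sketch (general illustration, not code from this PR):

```python
# Compare round-trip error of symmetric absmax quantization at different
# bit widths. Illustrative only; real 4-bit schemes (GPTQ, NF4) use
# grouping or non-uniform codes to claw back accuracy.

def roundtrip_error(weights, bits):
    """Max |dequantize(quantize(w)) - w| for symmetric uniform quantization."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / levels
    return max(abs(round(w / scale) * scale - w) for w in weights)

weights = [0.11, -0.97, 0.42, 0.73, -0.28]
err8 = roundtrip_error(weights, 8)
err4 = roundtrip_error(weights, 4)
assert err4 > err8                             # 4-bit is coarser
```

GPTQ's whole point is to compensate for that coarser grid by choosing codes that minimize layer output error rather than per-weight error.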
