This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Qbits woq ref impl for debug #1248

Merged
merged 5 commits into from
Feb 4, 2024

@zhewang1-intc zhewang1-intc commented Feb 2, 2024

Type of Change

feature or bug fix or documentation or others: feature
API changed or not: yes
Adds an interface for querying metadata of a packed weight, such as N, K, BLKSIZE, and G_IDX.
For usage, please refer to the packq unit test.

Description

detail description
JIRA ticket: https://jira.devtools.intel.com/browse/NLPTOOLKIU-1187

Expected Behavior & Potential Risk

the expected behavior triggered by this PR:
woq_linear dispatches to the reference implementation when the QBITS_DEBUG environment variable is set.
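
A minimal sketch of this kind of environment-variable dispatch, written in Python for illustration only. QBits is native code and this PR page does not show how it reads the variable, so everything here is an assumption: the helper names `linear_ref`, `linear_fast`, and `debug_enabled` are hypothetical, and so is the rule that any non-empty value of QBITS_DEBUG enables the reference path.

```python
import os


def linear_ref(x, w):
    """Naive reference matmul: x is (M, K), w is (K, N), as lists of lists."""
    rows, inner, cols = len(x), len(w), len(w[0])
    return [
        [sum(x[i][k] * w[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def linear_fast(x, w):
    """Stand-in for an optimized kernel (hypothetical): same math, different loop order."""
    w_t = list(zip(*w))  # transpose so each output column is a contiguous tuple
    return [[sum(a * b for a, b in zip(row, col)) for col in w_t] for row in x]


def debug_enabled(env=None):
    """Assumption: any non-empty QBITS_DEBUG value turns debug mode on."""
    env = os.environ if env is None else env
    return bool(env.get("QBITS_DEBUG"))


def woq_linear(x, w, env=None):
    """Dispatch to the reference path in debug mode, else to the fast path."""
    impl = linear_ref if debug_enabled(env) else linear_fast
    return impl(x, w)
```

Keeping the reference path behind a runtime switch lets the same model script be rerun unchanged, so any difference in output points at the optimized kernel rather than at the caller.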

How has this PR been tested?

how to reproduce the test (including hardware information): Intel Xeon 8480+ & Intel Core i9 10900

Dependency Change?

any library dependency introduced or removed: No

@a32543254 a32543254 left a comment

LGTM

changwangss commented Feb 4, 2024

Validated the PR; the generated output is the same with and without debug mode.
opt woq:

numactl -m 0 -C 0-55 python run_generation.py --model facebook/opt-125m --woq --benchmark --batch_size 1
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She liked to go to the movies, and she liked to go to the beach. She liked to go to the movies, and she liked to go to the']

export QBITS_DEBUG=1

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She liked to go to the movies, and she liked to go to the beach. She liked to go to the movies, and she liked to go to the']

GPTQ opt woq:

python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --gptq_pad_max_length 128 --gptq_use_max_length --gptq_block_size 16 --woq_weight_dtype "int4_clip" --output_dir "gptqq" --benchmark --batch_size 1
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. When she was a little girl, she liked to go to the movies. She liked to go to the movies. She liked to go to the movies. She']

export QBITS_DEBUG=1

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. When she was a little girl, she liked to go to the movies. She liked to go to the movies. She liked to go to the movies. She']
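
The with/without comparison above can also be scripted. A rough sketch, assuming only that the debug path is toggled by the QBITS_DEBUG environment variable as described in this PR; `run_with_debug`, `outputs_match`, and `first_line` are hypothetical helpers, not part of the repository. Because a benchmark run may print timing figures that differ between runs, the comparison takes an `extract` callable that selects the part of stdout to compare.

```python
import os
import subprocess


def run_with_debug(cmd, debug):
    """Run cmd (a list of arguments) with QBITS_DEBUG set or unset; return its stdout."""
    env = dict(os.environ)
    env.pop("QBITS_DEBUG", None)  # start from a clean state
    if debug:
        env["QBITS_DEBUG"] = "1"
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


def first_line(stdout):
    """Example extractor: keep only the first line of the output."""
    lines = stdout.splitlines()
    return lines[0] if lines else ""


def outputs_match(cmd, extract=lambda s: s):
    """True if the extracted output is identical with and without debug mode."""
    normal = extract(run_with_debug(cmd, debug=False))
    debug = extract(run_with_debug(cmd, debug=True))
    return normal == debug
```

For the runs above, `cmd` would be the `python run_generation.py ...` argument list, and `extract` would need to pick out the generated-text line; how that line is located in the script's output is not shown here and would have to be checked against the actual output.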

@VincyZhang VincyZhang merged commit 18d36ef into main Feb 4, 2024
15 checks passed
@VincyZhang VincyZhang deleted the qbits-enhancement branch February 4, 2024 03:38
VincyZhang pushed a commit to VincyZhang/intel-extension-for-transformers that referenced this pull request Feb 5, 2024
Co-authored-by: Lv, Liang1 <liang1.lv@intel.com>
5 participants