Add GPTJ Support #1

Merged
merged 4 commits into from
Aug 27, 2023

Conversation

jamesdborin

Thanks for all the work making this repo easier to extend.

Starting the process early with a gptj implementation.

Running this gives the following eval results:

4-bit:

| Task | Version | Metric | Value |
|---|---|---|---|
| wikitext | 1 | word_perplexity | 11.1044 |
| | | byte_perplexity | 1.5686 |
| | | bits_per_byte | 0.6495 |

fp16:

| Task | Version | Metric | Value |
|---|---|---|---|
| wikitext | 1 | word_perplexity | 10.8818 |
| | | byte_perplexity | 1.5627 |
| | | bits_per_byte | 0.6440 |

Let me know if there's anything you would like changed in terms of formatting.
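As a quick sanity check on the numbers above, here is a minimal sketch that assumes `bits_per_byte` is the base-2 logarithm of `byte_perplexity` (which matches the reported values to four decimals) and computes how much word perplexity rises from fp16 to 4-bit. The dictionary and function names are illustrative only, not part of this PR.

```python
import math

# Values copied from the eval tables in this comment.
reported = {
    "4-bit": {"word_perplexity": 11.1044, "byte_perplexity": 1.5686, "bits_per_byte": 0.6495},
    "fp16": {"word_perplexity": 10.8818, "byte_perplexity": 1.5627, "bits_per_byte": 0.6440},
}


def bits_per_byte(byte_perplexity: float) -> float:
    """Assumed relation: bits per byte is log2 of the byte-level perplexity."""
    return math.log2(byte_perplexity)


def relative_increase(quantized: float, baseline: float) -> float:
    """Fractional increase of the quantized metric over the baseline."""
    return (quantized - baseline) / baseline


for name, metrics in reported.items():
    recomputed = bits_per_byte(metrics["byte_perplexity"])
    print(f"{name}: recomputed bits_per_byte={recomputed:.4f}, reported={metrics['bits_per_byte']}")

ppl_increase = relative_increase(
    reported["4-bit"]["word_perplexity"], reported["fp16"]["word_perplexity"]
)
print(f"word perplexity increase from fp16 to 4-bit: {ppl_increase:.2%}")
```

The increase in word perplexity comes out at roughly 2%.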

@casper-hansen
Owner

Results look good to me. The only change needed is to set `max_new_tokens_key` to `n_positions`. This variable names the key we look up in the config shipped with the model so that we use the correct sequence length.
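To illustrate the idea, here is a minimal sketch of such a lookup. The helper `resolve_max_seq_len`, its fallback value, and the stand-in config objects are hypothetical and are not the repo's actual code; the only assumption taken from this thread is that GPT-J stores its sequence length under `n_positions`.

```python
from types import SimpleNamespace


def resolve_max_seq_len(config, max_new_tokens_key: str, default: int = 2048) -> int:
    """Read the sequence length from a model config under a per-model key.

    Falls back to `default` when the config has no attribute with that name.
    """
    return getattr(config, max_new_tokens_key, default)


# Stand-ins for model configs (hypothetical values for illustration).
gptj_like_config = SimpleNamespace(n_positions=2048)
other_config = SimpleNamespace(max_position_embeddings=4096)

print(resolve_max_seq_len(gptj_like_config, "n_positions"))
print(resolve_max_seq_len(other_config, "max_position_embeddings"))
# A wrong key silently falls back to the default, which is why the key matters.
print(resolve_max_seq_len(other_config, "n_positions"))
```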

@jamesdborin
Author

A note:

To get GPT-J to work I had to rename the input argument of `Catcher` in `models/base.py` from `inp` to `hidden_states`. This is because the forward method of GPTJBlocks receives `hidden_states` by keyword, whereas most Hugging Face models pass it by position. The rename shouldn't affect the other models: they pass the input by position, so the parameter name never mattered, which is why `inp` worked.

However, this might cause a conflict in the future if other models pass the input by keyword under a different name...
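One way to sidestep the naming conflict is a catcher that accepts the input either by position or under any of a set of known keyword names. The sketch below is plain Python without torch and is not the `Catcher` in `models/base.py`; the class layout, `StopForward`, and the `input_names` tuple are all hypothetical.

```python
class StopForward(Exception):
    """Raised to abort the forward pass once the first layer's input is captured."""


class Catcher:
    """Stands in for the first block and records whatever input it receives."""

    def __init__(self, input_names=("hidden_states", "inp")):
        self.input_names = input_names  # keyword names to try, in order
        self.captured = None
        self.extra_kwargs = {}

    def __call__(self, *args, **kwargs):
        if args:
            # Positional call: the first argument is the layer input.
            self.captured = args[0]
        else:
            # Keyword call: look for the input under any known name.
            for name in self.input_names:
                if name in kwargs:
                    self.captured = kwargs.pop(name)
                    break
            else:
                raise TypeError(f"no layer input found under any of {self.input_names}")
        self.extra_kwargs = kwargs  # e.g. attention mask, position ids
        raise StopForward


def capture(call_model):
    """Run `call_model(catcher)` and return the input the catcher recorded."""
    catcher = Catcher()
    try:
        call_model(catcher)
    except StopForward:
        pass
    return catcher.captured


# Positional style (most models) and keyword style (as described for GPT-J).
by_position = capture(lambda block: block("activations", attention_mask=None))
by_keyword = capture(lambda block: block(hidden_states="activations", attention_mask=None))
print(by_position, by_keyword)
```

A model that uses yet another keyword name would still need that name added to `input_names`, so this reduces the conflict rather than removing it.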

@casper-hansen casper-hansen merged commit 7fbe9bb into casper-hansen:main Aug 27, 2023
@casper-hansen
Owner

Thanks @jamesdborin for the PR. I simply renamed the inputs before merging. GPTJ support added, which is nice to see.

@QiyaoWei QiyaoWei mentioned this pull request Jan 23, 2024