
Fix typo in Llama docstrings #24020

Merged: 3 commits into huggingface:main on Jun 8, 2023

Conversation

@Kh4L (Contributor) commented Jun 5, 2023

What does this PR do?

Fix typos in docs.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

@amyeroberts (Collaborator) left a comment

Thanks for fixing!

Have you run the doc examples locally to confirm they still work and the generated responses are still the same (with the typo fixed)?

I don't think we can pass `inputs` in directly like this, as `generate` expects a tensor as its first argument and `inputs` is a dictionary.
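
For illustration, a minimal sketch of the point being made, using the OPT checkpoint that comes up later in this thread (the max_length value is just illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hey, are you conscious? Can you talk to me?", return_tensors="pt")

# inputs is a BatchEncoding (a dict subclass holding input_ids and attention_mask),
# not a tensor, so it cannot be passed positionally:
# model.generate(inputs)  # fails: generate expects a tensor of input ids first
out = model.generate(inputs.input_ids, max_length=30)  # pass the tensor explicitly
out = model.generate(**inputs, max_length=30)          # or unpack the dict as keyword arguments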

@Kh4L (Contributor, Author) commented Jun 6, 2023

@amyeroberts yes, I ran it locally, got the error, and fixed it.

Here are the types in my code:

type(tokenizer)
<class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>

type(inputs)
<class 'torch.Tensor'>
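
One way `inputs` can come back as a bare torch.Tensor rather than a BatchEncoding is calling tokenizer.encode(...) instead of tokenizer(...); a sketch of that difference follows (the local checkpoint path is hypothetical, and this is not confirmed to be what was run here):

from transformers import LlamaTokenizer

# hypothetical path to a 7B Llama checkpoint converted to the HF format
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")

# encode returns only the input_ids, as a tensor when return_tensors="pt"
ids = tokenizer.encode("Hey, are you conscious?", return_tensors="pt")
print(type(ids))  # <class 'torch.Tensor'>

# calling the tokenizer itself returns a BatchEncoding (input_ids + attention_mask)
enc = tokenizer("Hey, are you conscious?", return_tensors="pt")
print(type(enc))  # <class 'transformers.tokenization_utils_base.BatchEncoding'>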

@amyeroberts (Collaborator) commented

@Kh4L Out of interest - could you share the checkpoint being used? Could you also run this snippet with the checkpoint and share the output?

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)
prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
print(inputs)
print(type(inputs))
print(type(tokenizer))

The current changes need to be checked with a standard checkpoint for all of the models affected here. For instance, if I run the snippet with the OPT checkpoint from the example,

MODEL_CHECKPOINT = "facebook/opt-350m"

I get the following output:

{'input_ids': tensor([[    2, 13368,     6,    32,    47, 13316,   116,  2615,    47,  1067,
             7,   162,   116]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
<class 'transformers.tokenization_utils_base.BatchEncoding'>
<class 'transformers.models.gpt2.tokenization_gpt2_fast.GPT2TokenizerFast'>

@Kh4L (Contributor, Author) commented Jun 7, 2023

@amyeroberts The checkpoint is the 7B Llama converted to HF, and I get the same output!
Sorry for the confusion: I was using LlamaTokenizer and not AutoTokenizer in my code.

@amyeroberts (Collaborator) left a comment

When changing doc examples or tests, it's important to run them to make sure they still pass / the outputs are as expected.

Llama and Open Llama don't have default checkpoints, so they can't be checked, but the output for the OPT model does change with the spelling fix, so it needs to be updated.

Review comments on src/transformers/models/opt/modeling_opt.py and src/transformers/models/opt/modeling_tf_opt.py (outdated, resolved)
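
For reference, a minimal way to re-run the OPT doc example locally and compare the regenerated text against the docstring (a sketch using the facebook/opt-350m checkpoint from above; max_length is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding is deterministic, so the printed text can be checked
# against the expected output hard-coded in the docstring.
generate_ids = model.generate(inputs.input_ids, max_length=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])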
@Kh4L requested a review from amyeroberts on June 7, 2023 at 14:30
@amyeroberts (Collaborator) left a comment

Thanks for fixing these!

The style checks are currently failing - you'll need to run `make style` and push any changes made by the linter. Once the CI is green, we can merge :)

@Kh4L (Contributor, Author) commented Jun 7, 2023

Thanks for the detailed review!
I am a bit confused, as I can't see the latest commit Kh4L@62ea9f2 in this PR even though I pushed it to my branch https://github.com/Kh4L/pytorch-transformers/tree/fix_conscious_typo 🤔

@HuggingFaceDocBuilderDev commented Jun 8, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts (Collaborator) commented

@Kh4L GitHub PRs were down for part of yesterday - I think it was just that. I can see the commit now and all tests are passing :)

@amyeroberts amyeroberts merged commit 9322c24 into huggingface:main Jun 8, 2023
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* Fix typo in Llama docstrings

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Update

Signed-off-by: Serge Panev <spanev@nvidia.com>

* make style

Signed-off-by: Serge Panev <spanev@nvidia.com>

---------

Signed-off-by: Serge Panev <spanev@nvidia.com>