DO NOT MERGE: Fixes for Starcoder2 PR #29228

younesbelkada · 2024-02-23T02:19:43Z

No description provided.

HuggingFaceDocBuilderDev · 2024-02-23T02:42:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

LGTM! 🤗

ArthurZucker · 2024-02-23T03:35:37Z

README.md

@@ -491,6 +491,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
 1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
 1. **[StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm)** (from Stability AI) released with the paper [StableLM 3B 4E1T (Technical Report)](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
+1. **[Starcoder2](https://huggingface.co/docs/transformers/main/model_doc/starcoder2)** (from <FILL INSTITUTION>) released with the paper [<FILL PAPER TITLE>](<FILL ARKIV LINK>) by <FILL AUTHORS>.


docs/source/en/_toctree.yml

docs/source/en/model_doc/starcoder2.md

ArthurZucker · 2024-02-23T03:36:08Z

docs/source/en/model_doc/starcoder2.md

+The Starcoder2 model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
+<INSERT SHORT SUMMARY HERE>
+


TODO as well
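
The unfilled template placeholders flagged in these two comments (`<FILL ...>` in README.md and `<INSERT ... HERE>` in the model doc) are easy to miss before merging. A minimal sketch of a check for them follows. It assumes placeholders always take the form `<FILL ...>` or `<INSERT ...>`; the helper name `find_placeholders` is hypothetical and not part of the transformers repository.

```python
import re
from typing import List, Tuple

# Matches template placeholders such as <FILL INSTITUTION> or
# <INSERT PAPER NAME HERE>. The pattern is an assumption based on the
# placeholders visible in this PR's diff.
PLACEHOLDER_RE = re.compile(r"<(?:FILL|INSERT)\b[^<>]*>")


def find_placeholders(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, placeholder) pairs, using 1-based line numbers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running this over the contents of `docs/source/en/model_doc/starcoder2.md` and `README.md` would list each remaining placeholder with its line number.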

src/transformers/models/starcoder2/configuration_starcoder2.py

tests/models/starcoder2/test_modeling_starcoder2.py

ArthurZucker · 2024-02-23T03:39:30Z

src/transformers/models/starcoder2/modeling_starcoder2.py

+
+from ...activations import ACT2FN
+from ...cache_utils import Cache, DynamicCache
+from ...modeling_attn_mask_utils import _prepare_4d_causal_attention_mask, _prepare_4d_causal_attention_mask_for_sdpa


For a follow-up PR: we should not use these attention functions and should rely on Llama instead.

ArthurZucker · 2024-02-23T03:42:45Z

src/transformers/models/starcoder2/modeling_starcoder2.py

+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+        # retrieve input_ids and inputs_embeds
+        if input_ids is not None and inputs_embeds is not None:


TODO as well when propagating the changes for the static KV cache. cc @zucchini-nlp and @gante for your TODOs!
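
The hunk above is cut off at the input check. As a minimal sketch of the pattern it begins (not the actual Starcoder2 code), the snippet below shows how `input_ids` and `inputs_embeds` can be treated as mutually exclusive. The function name `resolve_inputs`, the error messages, and the use of nested lists in place of tensors are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def resolve_inputs(
    input_ids: Optional[Sequence[Sequence[int]]] = None,
    inputs_embeds: Optional[Sequence[Sequence[Sequence[float]]]] = None,
) -> Tuple[int, int]:
    """Return (batch_size, seq_length) from exactly one of the two inputs.

    Hypothetical helper: plain nested lists stand in for tensors.
    """
    # Passing both is ambiguous, so reject it up front.
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("Specify either input_ids or inputs_embeds, not both")
    if input_ids is not None:
        # Token ids are shaped (batch, seq).
        return len(input_ids), len(input_ids[0])
    if inputs_embeds is not None:
        # Embeddings are shaped (batch, seq, hidden).
        return len(inputs_embeds), len(inputs_embeds[0])
    # Passing neither leaves nothing to run the model on.
    raise ValueError("Specify one of input_ids or inputs_embeds")
```

Checking the "both given" case first keeps the error message specific; the remaining branches then only need to tell apart which single input was provided.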

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

jlamypoirier and others added 13 commits January 10, 2024 15:20

Copy model

81bcfbd

changes

4f2df8e

misc

5b88238

fixes

e0ec999

add embed and residual dropout (#30)

4983a75

Merge branch 'hf_main' into starcoder2

65f9c26

misc

7fac7d8

remove rms norm and gated MLP

5e5b567

remove copied mentions where its not a copy anymore

e5dde45

remove unused _shape

6522dc7

copied from mistral instead

a9b3946

fix copies

9bd16a3

Merge branch 'main' into starcoder-2-fix

5b18420

younesbelkada added 4 commits February 23, 2024 02:52

fix copies

933964b

add not doctested

e3ec0e0

fix

71c970d

fix copyright

a6e0992

ArthurZucker approved these changes Feb 23, 2024


younesbelkada and others added 8 commits February 23, 2024 04:43

Update docs/source/en/model_doc/starcoder2.md

1b666c8

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/models/starcoder2/configuration_starcoder2.py

8b22a7f

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/models/starcoder2/configuration_starcoder2.py

d71b22d

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

fix doc

6e34ee9

revert some changes

d9b887f

add fa2 tests

a5f6694

fix styling nit

f5498cc

fix

8a6e496

ArthurZucker mentioned this pull request Feb 23, 2024

Starcoder2 model - bis #29215

Merged

push dummy docs

22f5796

younesbelkada closed this Feb 29, 2024
