
[gemini] fix ci #5748

Merged
merged 206 commits into hpcaitech:feature/prefetch
May 23, 2024

Conversation

botbw
Contributor

@botbw commented May 23, 2024

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

CjhHa1 and others added 30 commits January 11, 2024 13:39
* add engine and scheduler

* add dirs

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
… (hpcaitech#5147)

* request handler

* add readme

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
… (hpcaitech#5159)

* [inference/nfc] remove outdated inference tests

* remove outdated kernel tests

* remove deprecated triton kernels

* remove imports from deprecated kernels
…)

* add infer_struct and infer_config

* update codes

* change InferConfig

* Add hf_model_config to the engine

* rm _get_hf_model_config

* update codes

* made adjustments per the reviewer's feedback

* update codes

* add ci test for config and struct
* [Inference] Add KVCache Manager

* function refactored

* add test for KVCache Manager

* add attr beam width

* Revise alloc func in CacheManager

* Fix docs and pytests

* add tp slicing for head number

* optimize shapes of tensors used as physical cache

* Apply using InferenceConfig on KVCacheManager

* rm duplicate config file

* Optimize cache allocation: use contiguous cache

* Fix config in pytest (and config)
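As context for the KVCache Manager commit above: it describes block-level allocation over a contiguous physical cache, with the KV head count sliced for tensor parallelism. Below is a minimal sketch of the block-allocation idea only; the class and method names (BlockCacheManager, alloc, free) are hypothetical and do not reflect ColossalAI's actual API.

```python
import torch


class BlockCacheManager:
    """Toy block-based KV cache: a pool of fixed-size blocks plus a
    per-sequence block table mapping logical positions to physical blocks."""

    def __init__(self, num_blocks, block_size, num_kv_heads, head_dim,
                 dtype=torch.float16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        # One contiguous physical cache each for K and V, indexed by block id.
        self.k_cache = torch.zeros(num_blocks, num_kv_heads, block_size,
                                   head_dim, dtype=dtype)
        self.v_cache = torch.zeros_like(self.k_cache)
        self.block_tables = {}  # seq_id -> list of block ids

    def alloc(self, seq_id, num_tokens):
        """Reserve enough blocks to hold num_tokens for one sequence."""
        needed = -(-num_tokens // self.block_size)  # ceiling division
        if needed > len(self.free_blocks):
            raise RuntimeError("KV cache out of blocks")
        blocks = [self.free_blocks.pop() for _ in range(needed)]
        self.block_tables.setdefault(seq_id, []).extend(blocks)
        return blocks

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```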
* unify the config setting

* fix test

* fix import

* fix test

* fix

* fix

* add logger

* revise log info

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
* add infer_struct and infer_config

* update codes

* change InferConfig

* Add hf_model_config to the engine

* rm _get_hf_model_config

* update codes

* made adjustments per the reviewer's feedback

* update codes

* add ci test for config and struct

* Add the logic of the inference engine

* update engine and test

* Recover cache_manager.py

* add logger

* fix conflict

* update codes

* update codes

* update model and tokenizer

* fix: add the logic for shardformer

* change kvcache_manager docstring

* add policy

* fix ci bug in test_kvcache_manager.py

* remove code related to tokenizer and move model_policy

* fix code style

* add ordered_set to requirements-infer.txt

* Delete extra empty lines

* add ordered_set to requirements-test.txt
* add logit processor and request handler

* add

* add

* add

* fix

* add search tokens and update func

* finish request handler

* add running list test

* fix test

* fix some bug

* add

* add

* fix bugs

* fix some bugs

* fix bug

* fix

* fix

* add copy function

* del useless attn

* fix request status

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
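The logit processor added in the commit above rewrites raw model logits before sampling. Here is a minimal sketch of two common processors, temperature scaling and top-k masking; the function names are illustrative and not ColossalAI's actual API.

```python
import torch


def apply_temperature(logits, temperature):
    # temperature < 1 sharpens the distribution, > 1 flattens it
    return logits / max(temperature, 1e-5)


def apply_top_k(logits, k):
    # mask every token outside the k highest-scoring ones
    if k <= 0 or k >= logits.size(-1):
        return logits
    kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
    return logits.masked_fill(logits < kth, float("-inf"))


def sample(logits, temperature=1.0, top_k=0):
    # logits: (batch, vocab_size)
    logits = apply_top_k(apply_temperature(logits, temperature), top_k)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```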
…ng (hpcaitech#5192)

* add context attn unpadded triton kernel

* test compatibility

* kv cache copy (testing)

* fix k/v cache copy

* fix kv cache copy and test

* fix boundary of block ptrs

* add support for GQA/MQA and testing

* fix import statement

---------

Co-authored-by: Round Heng <yuanhengzhao@Rounds-MacBook-Pro.local>
… (hpcaitech#5219)

* add attn

* add attention test

* fix attn forward

* fix decoding
…el (hpcaitech#5229)

* fix accuracy

* alignment in attention

* fix attention

* fix

* fix bugs

* fix bugs

* fix bugs
* fix bugs

* comment

* use more accurate atol

* fix
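On "use more accurate atol" above: kernel outputs are typically compared against a reference implementation with tolerances chosen per dtype, since fp16 accumulation error is far larger than fp32's. A sketch of that testing pattern follows; the tolerance values are assumptions, not the ones used in this PR.

```python
import torch

# looser tolerances for half precision, tighter for fp32 (assumed values)
DTYPE_TOL = {
    torch.float32: dict(atol=1e-5, rtol=1e-5),
    torch.float16: dict(atol=5e-3, rtol=5e-3),
}


def assert_close(out, ref):
    tol = DTYPE_TOL[out.dtype]
    diff = (out.float() - ref.float()).abs().max().item()
    assert torch.allclose(out.float(), ref.float(), **tol), f"max abs diff {diff:.3e}"
```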
… (hpcaitech#5249)

* add flash decoding unpad triton kernel

* rename flash decoding kernel

* add kernel testing (draft)

* revise pytest

* support kv group (GQA)

* (trivial) fix api and pytest

* (trivial) func renaming

* (trivial) func/file renaming

* refactor pytest for attention

* (trivial) format and consistent vars of context/decode attn

* (trivial) remove test redundancy
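On the "support kv group (GQA)" item above: in grouped-query attention, the query heads are split into groups that each share one K/V head. Below is a plain-PyTorch reference sketch of that grouping, not the Triton kernel itself; shapes and names are assumptions.

```python
import torch


def gqa_attention(q, k, v):
    # q: (batch, num_q_heads, q_len, head_dim)
    # k, v: (batch, num_kv_heads, kv_len, head_dim); num_q_heads % num_kv_heads == 0
    group = q.size(1) // k.size(1)
    # expand each KV head across its group of query heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-1, -2) / q.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v
```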
* add kv copy triton kernel during decoding stage

* add pytest and fix kernel

* fix test utilities

* revise kernel config

* add benchmark for kvcache copy
Courtesy-Xs and others added 24 commits May 14, 2024 14:35
* [fix] revise timeout value on example CI

* trivial
* Fix Llama3 Load error
* Omit Checkpoint IO Temporarily
* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
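"Change backend to free port" above refers to the standard trick of binding to port 0 so the OS assigns an unused port, which avoids collisions between concurrent CI runs. A minimal sketch using only the standard library:

```python
import socket


def find_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 asks the kernel to pick a free port
        return s.getsockname()[1]
```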
* [example] update inference example
* fix

* fix

* fix

* fix

* fix

* remove kernel install

* rebase

revert

fix

* fix

* fix
…)

* refactor and add

* config default values

* fix gen config passing

* fix rpc generation config
* [doc] update inference readme

* add contents

* trivial
…n) (hpcaitech#5702)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* add parallel cross entropy output for falcon model & fix some typos in bloom.py

* fix module name error, self.model -> self.transformers in bloom, falcon model

* Fix the overflow bug of distributed cross entropy loss function when training with fp16

* add dtype to parallel cross entropy loss function

* fix dtype-related typos and prettify loss.py

* fix grad dtype and update dtype mismatch error

* fix typo bugs
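The overflow fix described above concerns cross-entropy under fp16: exponentiating large logits in half precision overflows, so the reduction must be carried out in float32 and only the result cast back. A minimal single-device sketch of that idea follows; stable_cross_entropy is a hypothetical helper, not ColossalAI's distributed loss implementation.

```python
import torch
import torch.nn.functional as F


def stable_cross_entropy(logits, labels):
    # logits: (N, vocab), possibly fp16; labels: (N,) int64
    out_dtype = logits.dtype
    # log-softmax + NLL computed in fp32 to avoid fp16 overflow
    loss = F.cross_entropy(logits.float(), labels)
    return loss.to(out_dtype)
```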
[sync] Sync feature/colossal-infer with main
* temporary fix for CI

* timeout to 90
…hpcaitech/feature/colossal-infer

[Inference] Merge feature/colossal-infer
* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release
@botbw botbw requested a review from a team as a code owner May 23, 2024 14:14
@botbw botbw requested a review from Hz188 May 23, 2024 14:16
@botbw botbw enabled auto-merge (squash) May 23, 2024 14:25
@botbw botbw self-assigned this May 23, 2024
@botbw botbw merged commit d211820 into hpcaitech:feature/prefetch May 23, 2024
6 of 7 checks passed