[gemini] fix ci #5748

botbw · 2024-05-23T14:14:08Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct

* [Inference] Add KVCache Manager * function refactored * add test for KVCache Manager * add attr beam width * Revise alloc func in CacheManager * Fix docs and pytests * add tp slicing for head number * optimize shapes of tensors used as physical cache * Apply using InferenceConfig on KVCacheManager * rm duplicate config file * Optimize cache allocation: use contiguous cache * Fix config in pytest (and config)

* add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related to tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt

* add logit processor and request handler * add * add * add * fix * add search tokens and update func * finish request handler * add running list test * fix test * fix some bug * add * add * fix bugs * fix some bugs * fix bug * fix * fix * add copy fun * del useless attn * fix request status --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

…ng (hpcaitech#5192) * add context attn unpadded triton kernel * test compatibility * kv cache copy (testing) * fix k/v cache copy * fix kv cache copy and test * fix boundary of block ptrs * add support for GQA/MQA and testing * fix import statement --------- Co-authored-by: Round Heng <yuanhengzhao@Rounds-MacBook-Pro.local>

…tech#5249) * add flash decoding unpad triton kernel * rename flash decoding kernel * add kernel testing (draft) * revise pytest * support kv group (GQA) * (trivial) fix api and pytest * (trivial) func renaming * (trivial) func/file renaming * refactor pytest for attention * (trivial) format and consistent vars of context/decode attn * (trivial) remove test redundancy

…n) (hpcaitech#5702) * [pre-commit.ci] auto fixes from pre-commit.com hooks * add parallel cross entropy output for falcon model & fix some typos in bloom.py * fix module name error, self.model -> self.transformers in bloom, falcon model * Fix the overflow bug of distributed cross entropy loss function when training with fp16 * add dtype to parallel cross entropy loss function * fix dtype related typos and prettify the loss.py * fix grad dtype and update dtype mismatch error * fix typo bugs

CjhHa1 and others added 30 commits January 11, 2024 13:39

[Inference] First PR for rebuild colossal-infer (hpcaitech#5143)

4cf4682

* add engine and scheduler * add dirs --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

[Inference] Add readme (roadmap) and fulfill request handler (hpcaite…

56e75ee

…ch#5147) * request handler * add readme --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

[Inference/NFC] Clean outdated inference tests and deprecated kernels (…

2bb9224

…hpcaitech#5159) * [inference/nfc] remove outdated inference tests * remove outdated kernel tests * remove deprecated triton kernels * remove imports from deprecated kernels

[Inference]Update inference config and fix test (hpcaitech#5178)

93aeacc

* unify the config setting * fix test * fix import * fix test * fix * fix * add logger * revise log info --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

Add padding llama model

86853a3

Fixed a bug in the inference frame

62fd08e

fix bugs in request_handler

6296858

precision alignment

9489dc6

Fixed a writing error

4df8876

add context_attention_unpadded

02c1bf8

fix bugs in sampler

bbfebfb

Fixed a typo

b2eb9cd

fix beam_width

3ad1f3b

[Inference] Pytorch Attention func, pad&nopad input support (hpcaitec…

bfd9b1b

…h#5219) * add attn * add attention test * fix attn forward * fix decoding

fix bugs in attention.py and request_handler.py

47e53ea

adapted to pad_context_forward

fa4fbdb

[Hotfix] Fix accuracy and align attention method api with Triton kern…

e545a87

…el (hpcaitech#5229) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs

fix bugs related to processing padding mask

2a73e82

fix CI bugs

fab294c

rm torch.cuda.synchronize

10e3c9f

fix bugs in request_handler.py and engine.py

d40eb26

[Inference] Kernel: no pad rotary embedding (hpcaitech#5252)

fded91d

* fix bugs * comment * use more accurate atol * fix

[git] fixed rebased files

1ded7e8

[kernel] Add KV cache copy kernel during decoding (hpcaitech#5261)

fa85e02

* add kv copy triton kernel during decoding stage * add pytest and fix kernel * fix test utilities * revise kernel config * add benchmark for kvcache copy

Courtesy-Xs and others added 24 commits May 14, 2024 14:35

[Inference] Delete duplicated copy_vector (hpcaitech#5716)

121d7ad

[ci] Fix example tests (hpcaitech#5714)

5bbab15

* [fix] revise timeout value on example CI * trivial

[Fix] Llama3 Load/Omit CheckpointIO Temporarily (hpcaitech#5717)

74c4792

* Fix Llama3 Load error * Omit Checkpoint IO Temporarily

[Inference] Fix API server, test and example (hpcaitech#5712)

f47f2fb

* fix api server * fix generation config * fix api server * fix comments * fix infer hanging bug * resolve comments, change backend to free port

[Inference] Delete duplicated package (hpcaitech#5723)

a8d459f

[example] Update Inference Example (hpcaitech#5725)

8bcfe36

* [example] update inference example

[lazy] fix lazy cls init (hpcaitech#5720)

9d83c6d

* fix * fix * fix * fix * fix * remove kernel install * rebase revert fix * fix * fix

[Inference] Fix Inference Generation Config and Sampling (hpcaitech#5710

283c407

) * refactor and add * config default values * fix gen config passing * fix rpc generation config

[Fix/Inference] Add unsupported auto-policy error message (hpcaitech#…

bdf9a00

…5730) * [fix] auto policy error message * trivial

[doc] Update Inference Readme (hpcaitech#5736)

d8b1ea4

* [doc] update inference readme * add contents * trivial

[sync] Sync feature/colossal-infer with main

8633c15

Merge pull request hpcaitech#5737 from yuanheng-zhao/inference/sync/main

c06208e

[sync] Sync feature/colossal-infer with main

[bug] fix silly bug

89c3aee

[pre-commit.ci] auto fixes from pre-commit.com hooks

67f1fdf

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

08efb4c

for more information, see https://pre-commit.ci

[chore] add test for prefetch

3043d15

[pre-commit.ci] auto fixes from pre-commit.com hooks

ba21f26

for more information, see https://pre-commit.ci

[ci] Temporary fix for build on pr (hpcaitech#5741)

c2c8c9c

* temporary fix for CI * timeout to 90

[NFC] Fix code factors on inference triton kernels (hpcaitech#5743)

bd38fe6

[NFC] fix requirements (hpcaitech#5744)

498f42c

[Colossal-Inference] (v0.1.0) Merge pull request hpcaitech#5739 from …

df67476

…hpcaitech/feature/colossal-infer [Inference] Merge feature/colossal-infer

[inference] release (hpcaitech#5747)

4647ec2

* [inference] release * [inference] release * [inference] release * [inference] release * [inference] release * [inference] release * [inference] release

Merge remote-tracking branch 'origin/main' into prefetch

923b4fb

botbw requested a review from a team as a code owner May 23, 2024 14:14

botbw requested a review from Hz188 May 23, 2024 14:16

botbw enabled auto-merge (squash) May 23, 2024 14:25

botbw self-assigned this May 23, 2024

Hz188 approved these changes May 23, 2024

botbw merged commit d211820 into hpcaitech:feature/prefetch May 23, 2024
6 of 7 checks passed
