[Inference] Optimize and Refactor Inference Batching/Scheduling #5367

Merged
29 commits
0e9fc31
add kvcache manager funcs for batching
yuanheng-zhao Feb 5, 2024
9c9d199
(trivial) remove print
yuanheng-zhao Feb 5, 2024
1244546
add batch bucket for batching
yuanheng-zhao Feb 5, 2024
5ed7f1e
revise RunningList struct in handler
yuanheng-zhao Feb 6, 2024
b947131
add kvcache/batch funcs for compatibility
yuanheng-zhao Feb 7, 2024
3e411f3
use new batching methods
yuanheng-zhao Feb 7, 2024
4882943
fix indexing bugs
yuanheng-zhao Feb 7, 2024
de59c2a
(trivial) modify comments
yuanheng-zhao Feb 8, 2024
632d5df
revise abort logic
yuanheng-zhao Feb 8, 2024
ca7820f
use cpu seq lengths/block tables
yuanheng-zhao Feb 8, 2024
8ff3615
rm unused attr in Sequence
yuanheng-zhao Feb 8, 2024
734de9b
fix type conversion/default arg
yuanheng-zhao Feb 8, 2024
183d4cf
add and revise pytests
yuanheng-zhao Feb 8, 2024
0b512a3
revise pytests, rm unused tests
yuanheng-zhao Feb 8, 2024
2d7550f
rm unused statements
yuanheng-zhao Feb 8, 2024
b4d913a
fix pop finished indexing issue
yuanheng-zhao Feb 8, 2024
a5e74a5
trivial revise
yuanheng-zhao Feb 8, 2024
5a8a12b
fix: use index in batch when retrieving inputs/update seqs
yuanheng-zhao Feb 15, 2024
3494374
use dict instead of odict in batch struct
yuanheng-zhao Feb 15, 2024
a99f399
arg type hinting
yuanheng-zhao Feb 16, 2024
5323428
fix make compress
yuanheng-zhao Feb 16, 2024
7293e09
refine comments
yuanheng-zhao Feb 16, 2024
6df7714
fix: pop_n_seqs to pop the first n seqs
yuanheng-zhao Feb 16, 2024
0e70068
(trivial) type hints
yuanheng-zhao Feb 16, 2024
07a25d3
add check in request handler
yuanheng-zhao Feb 19, 2024
78cc43a
remove redundant conversion
yuanheng-zhao Feb 19, 2024
b3dcb18
fix test for request handler
yuanheng-zhao Feb 19, 2024
d2e156b
fix pop method in batch bucket
yuanheng-zhao Feb 19, 2024
e1ff72f
fix prefill adding
yuanheng-zhao Feb 19, 2024
rm unused statements
yuanheng-zhao committed Feb 8, 2024
commit 2d7550fc6883e1ce84d618a6993b6422ff99707a
6 changes: 0 additions & 6 deletions colossalai/inference/core/engine.py
@@ -254,12 +254,6 @@ def add_request(
             else:
                 prompt = prompts[i]

-            max_blocks_per_sequence = (
-                self.inference_config.max_input_len
-                + self.inference_config.max_output_len
-                + self.inference_config.block_size
-                - 1
-            ) // self.inference_config.block_size
             sequence = Sequence(
                 request_id,
                 prompt,
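
For reference, the statement removed in this commit is a ceiling division: it computes how many KV-cache blocks of size `block_size` are needed to hold `max_input_len + max_output_len` tokens. A minimal standalone sketch of that arithmetic follows; the free function `max_blocks_per_sequence` and its plain integer parameters are hypothetical stand-ins for the `self.inference_config` fields in the diff, not part of the ColossalAI API.

```python
import math


def max_blocks_per_sequence(max_input_len: int, max_output_len: int, block_size: int) -> int:
    """Hypothetical helper mirroring the removed expression.

    Adding (block_size - 1) before the floor division rounds the
    quotient up, so a partially filled last block still counts.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (max_input_len + max_output_len + block_size - 1) // block_size


if __name__ == "__main__":
    # The integer form agrees with math.ceil on the same inputs.
    for inp, out, bs in [(256, 256, 16), (10, 7, 16), (1, 0, 16)]:
        blocks = max_blocks_per_sequence(inp, out, bs)
        assert blocks == math.ceil((inp + out) / bs)
        print(inp, out, bs, blocks)
```

The integer form avoids floating-point rounding for large token counts, which is presumably why the original code used it instead of `math.ceil`.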