Batch inference for HF models called with an array of prompts #135

rlouf · 2023-06-06T16:01:08Z

The generation method in hf_transformers.py is decorated with outlines.vectorize. Thus, when we pass, say, a 2-dimensional array, we run the model twice, when it would be more efficient to flatten the input array and perform batch inference. We therefore need a mechanism that flattens the array before passing it to the decorated function.
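To make the inefficiency concrete, here is a minimal sketch that uses numpy.vectorize with a core signature as a stand-in for outlines.vectorize. It assumes the wrapped function consumes a 1-D batch of prompts; `generate` is a hypothetical placeholder for the model call that only records how often it is invoked. The real decorator may differ in its details.

```python
import numpy as np

calls = []


def generate(prompts):
    """Hypothetical stand-in for the model call; records every batch it gets."""
    calls.append(list(prompts))
    return np.array([p + " ..." for p in prompts], dtype=object)


# "(m)->(m)": the wrapped function maps a 1-D batch of prompts to a 1-D batch
# of completions; any leading dimensions are looped over in Python.
vectorized_generate = np.vectorize(generate, signature="(m)->(m)")

prompts = np.array([["a", "b"], ["c", "d"]], dtype=object)
out = vectorized_generate(prompts)
print(len(calls))  # 2: one model call per row
print(out.shape)   # (2, 2)
```

The outer dimension is iterated in Python, so each row becomes a separate model call even though all four prompts could have gone into a single batch.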

To implement the flattening, we can either add a new decorator (preferred) or change the behavior of outlines.vectorize.
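A minimal sketch of the decorator approach, under stated assumptions: `flatten_batch` and `generate` are hypothetical names, and the wrapped function is assumed to take a flat list of prompts and return exactly one string per prompt. This is not the fix that landed in #139, only an illustration of the flatten, batch, reshape idea.

```python
import functools

import numpy as np


def flatten_batch(fn):
    """Hypothetical decorator: flatten an array of prompts, call `fn` once
    on the resulting 1-D batch, then restore the original shape.

    Assumes `fn` takes a list of prompts and returns one string per prompt.
    """

    @functools.wraps(fn)
    def wrapper(prompts):
        prompts = np.asarray(prompts, dtype=object)
        flat = prompts.reshape(-1)  # row-major (C-order) flattening
        results = fn(list(flat))  # a single batched call
        # Reshape with the same C order so each output lands at the
        # position of the prompt that produced it.
        return np.array(results, dtype=object).reshape(prompts.shape)

    return wrapper


batches = []


@flatten_batch
def generate(prompts):
    # Stand-in for one batched model call (hypothetical).
    batches.append(prompts)
    return [p + " ..." for p in prompts]


out = generate([["a", "b"], ["c", "d"]])
print(len(batches))  # 1
print(out.shape)     # (2, 2)
```

Because `reshape(-1)` and the final `reshape` both use C order, the mapping from prompt to completion is preserved without tracking indices explicitly, and the model sees one batch of four prompts instead of two batches of two.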

rlouf · 2023-06-20T15:03:00Z

This is being fixed in #139

rlouf added the enhancement, transformers (linked to the `transformers` integration), and text labels on Jun 6, 2023

rlouf removed the text label on Dec 19, 2024
