Discussion: Architecture for Jan to support multiple Inference Engines #771
Comments
Nitro is still an intermediary server to …
@vuonghoainam Re supporting Claude/TGI, aren't there differences between their interfaces (vs. OAI)? Does this mean we will need separate inference engine extensions for each of them? I can also see how it would not be a huge amount of work, just some JavaScript glue.
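(For illustration only, a minimal sketch of what that glue might look like for TGI, assuming its `/generate` endpoint with an `{ inputs, parameters }` request body; the prompt flattening and function name here are hypothetical and would be model-specific in practice.)

```typescript
// Hypothetical TGI adapter sketch; the prompt template is an assumption.
interface ChatMessage {
  role: string;
  content: string;
}

async function tgiChatCompletion(
  baseUrl: string,
  messages: ChatMessage[]
): Promise<string> {
  // Flatten OpenAI-style messages into a single prompt string.
  const prompt =
    messages.map((m) => `${m.role}: ${m.content}`).join("\n") + "\nassistant:";

  const res = await fetch(`${baseUrl}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      inputs: prompt,
      parameters: { max_new_tokens: 512 },
    }),
  });

  // TGI returns { generated_text } for non-streaming /generate calls.
  const body = (await res.json()) as { generated_text: string };
  return body.generated_text;
}
```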
Yes, @dan-jan, it's correct that we'd add support for TGI and Claude like that. Some decisions:
@vuonghoainam This looks quite clear. I'll create a task in the epic to track documenting this in Specs.
Moving to #1271
Objective
Solution
I envision an architecture in Jan that has the following (a rough sketch of the contracts follows the list):

- Models Extension
  - Handles the `/models` API endpoint
- Inference Extension
  - Handles inference routes (`/chat/completions`, later `/audio/speech`)
  - Routes requests based on model metadata (`model.json`)
- Extension for each Inference Engine
  - Implements the `/chat/completions` endpoint
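Here is what those extension contracts could look like, sketched as TypeScript interfaces. All names here (`ChatCompletionRequest`, `InferenceEngineExtension`, the registry) are illustrative assumptions, not Jan's actual extension API:

```typescript
// Illustrative contracts only; not Jan's actual extension API.

// Subset of an OpenAI-compatible /chat/completions request body.
interface ChatCompletionRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream?: boolean;
}

// Each Inference Engine Extension implements this and registers itself.
interface InferenceEngineExtension {
  engine: string; // e.g. "nitro", "openai", "intel-bigdl"
  // Yields OpenAI-style response chunks, ready to relay over SSE.
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<string>;
}

// The Inference Extension keeps a registry keyed by engine name.
const engines = new Map<string, InferenceEngineExtension>();

function registerEngine(ext: InferenceEngineExtension): void {
  engines.set(ext.engine, ext);
}
```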
Example
File Tree
- `model.json` for `gpt4-32k-1603`
- `engine.json` example for Nitro

(The shapes of these files are sketched below.)
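The following shapes are assumptions for illustration; Jan's actual `model.json` / `engine.json` schemas may differ. They are sketched in TypeScript so the execution-path example further down can reuse them:

```typescript
// Assumed file shapes; Jan's real model.json / engine.json schemas may differ.
interface ModelJson {
  id: string; // e.g. "gpt4-32k-1603" or "llama2-70b-intel-bigdl"
  engine: string; // names the Inference Engine Extension that serves this model
  parameters?: Record<string, unknown>; // per-model inference settings
}

interface EngineJson {
  id: string; // e.g. "nitro"
  settings?: Record<string, unknown>; // engine-level settings, e.g. a local server port
}

// Illustrative model.json contents for the execution path below.
const exampleModel: ModelJson = {
  id: "llama2-70b-intel-bigdl",
  engine: "intel-bigdl",
};
```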
Execution Path

1. A `/chat/completions` request comes in for `llama2-70b-intel-bigdl`
2. Inference Extension loads the `model.json` for `llama2-70b-intel-bigdl` and sees the engine is `intel-bigdl`
3. Inference Extension routes the request to the `intel-bigdl` Inference Engine Extension
4. The `intel-bigdl` Inference Engine Extension takes in the `/chat/completions` request, runs inference, and returns the result through SSE (see the sketch after this list)
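A hedged sketch of that path, reusing the illustrative types and registry from the sketches above; `loadModelJson` is a hypothetical helper that reads the model's `model.json` from disk:

```typescript
// Sketch of the routing step; all names are illustrative, not Jan's API.
declare function loadModelJson(modelId: string): Promise<ModelJson>;

async function* handleChatCompletions(
  req: ChatCompletionRequest
): AsyncIterable<string> {
  // Steps 1-2: load model.json and read which engine serves this model.
  const model = await loadModelJson(req.model);

  // Step 3: route to the matching Inference Engine Extension.
  const engine = engines.get(model.engine);
  if (!engine) {
    throw new Error(`No Inference Engine Extension registered for "${model.engine}"`);
  }

  // Step 4: the engine runs inference; chunks are relayed to the client via SSE.
  yield* engine.chatCompletions(req);
}
```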