
Extensions for LLM #442

Closed
hannespreishuber opened this issue May 11, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@hannespreishuber

The idea: write a little app and put it on a Raspberry Pi to switch lights on or off.
I would start with phi-3 (which works as a .NET app) and extend it with Whisper.
I would need a feature like embeddings or function calling.
I had a look at Semantic Kernel, but it needs an HTTP REST endpoint, which makes no sense here.
Any advice?

@yufenglee
Member

I don't have experience running an LLM on a Raspberry Pi device. How does the performance look, for example when running a phi-3 model?

@natke
Contributor

natke commented May 21, 2024

@hannespreishuber Your app idea sounds interesting! Can you elaborate on the flow of the application a little more?

Have you already tried running phi-3 on a Raspberry Pi?

FYI, we are in the process of adding support for Whisper.

@baijumeswani added the question (Further information is requested) label on Jun 10, 2024
@baijumeswani
Collaborator

Closing the issue. Please re-open if this is still relevant.

kunal-vaishnavi added a commit that referenced this issue Dec 11, 2024
### Description

This PR adds support for outputting the last hidden state in addition to
the logits in ONNX models. Users can run their models with ONNX Runtime
GenAI and use the generator's `GetOutput` API to obtain the hidden
states.

C++:
```cpp
std::unique_ptr<OgaTensor> embeddings = generator->GetOutput("hidden_states");
```

C#:
```csharp
using var embeddings = generator.GetOutput("hidden_states");
```

Java:
```java
Tensor embeddings = generator.getOutput("hidden_states");
```

Python:
```python
embeddings = generator.get_output("hidden_states")
```
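
As an end-to-end illustration, a minimal Python sketch of the flow around `get_output` follows. It assumes the onnxruntime-genai Python package and a model exported with the hidden-states output enabled; call names such as `append_tokens` and `generate_next_token` can differ between releases, so treat this as a sketch rather than canonical usage.

```python
# Minimal sketch: fetch the last hidden state with ONNX Runtime GenAI (Python).
# Paths and prompt text are placeholders; API details may vary by release.
import onnxruntime_genai as og

model = og.Model("path/to/model")   # folder containing the ONNX model and genai_config.json
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
generator = og.Generator(model, params)

# Feed the prompt and run a forward pass so the model outputs are populated.
generator.append_tokens(tokenizer.encode("Turn the living room lights on"))
generator.generate_next_token()

# "hidden_states" is the output name added by this PR; get_output typically
# returns a numpy array of shape (batch, sequence_length, hidden_size).
hidden_states = generator.get_output("hidden_states")
print(hidden_states.shape)
```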

### Motivation and Context

In SLMs and LLMs, the last hidden state represents a model's embeddings
for a particular input before the language modeling head is applied.
Generating embeddings for a model is a popular task. These embeddings
can be used for many scenarios such as text classification, sequence
labeling, information retrieval using [retrieval-augmented generation
(RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation),
and more.
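
For the embedding use case specifically, a sentence vector can be obtained by pooling the hidden states over the sequence dimension and then compared with cosine similarity. The sketch below assumes the hidden states arrive as a numpy array of shape `(batch, sequence_length, hidden_size)`; the `embed` helper in the comments is hypothetical and stands in for the generator call shown above.

```python
# Sketch: turn the last hidden state into a sentence embedding via mean pooling.
# Assumes hidden_states has shape (batch, sequence_length, hidden_size).
import numpy as np

def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average over the sequence dimension: one vector per batch entry."""
    return hidden_states.mean(axis=1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage, where embed(text) wraps the generator call shown above
# and returns the "hidden_states" array for that text:
#   query_vec = mean_pool(embed("switch the lights off"))[0]
#   doc_vec   = mean_pool(embed("turn off the lamp"))[0]
#   print(cosine_similarity(query_vec, doc_vec))
```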

This PR helps address the following issues:
- microsoft/onnxruntime#20969
- #442
- #474
- #713