```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   26C    P8               9W / 300W |      3MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
Information
- Docker
- The CLI directly

Tasks
- An officially supported command
- My own modifications
Reproduction
1. Run the above startup script, with n-gram speculative decoding enabled.
2. Submit a query to the LLM.
3. Observe gibberish output. The gibberish often appears to include repetition from the prior context.
Example input (to be clear, this is the entire input - there is no system prompt or chat history):

```
<|start_header_id|>user<|end_header_id|>
Do machine learning models need data?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
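For reference, submitting this query (step 2 of the reproduction) can be sketched against TGI's `/generate` HTTP endpoint. The host, port, and `max_new_tokens` value below are assumptions - adjust them to match the startup script - and the line break in the prompt follows the Llama 3 chat template:

```python
import json
import urllib.request

# The raw chat-template string is sent as-is, matching the report's
# "entire input" (no system prompt, no chat history).
PROMPT = (
    "<|start_header_id|>user<|end_header_id|>\n"
    "Do machine learning models need data?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
)

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Request body for TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

if __name__ == "__main__":
    # URL is an assumption: point it at wherever TGI is listening.
    req = urllib.request.Request(
        "http://127.0.0.1:8080/generate",
        data=json.dumps(build_payload(PROMPT)).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())
```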
Response with no speculative decoding:
Yes, machine learning models need data to learn and make predictions. In fact, data is the fuel that powers machine learning models. Here's why:
1. **Training**: Machine learning models are trained on a dataset, which is a collection of examples or instances that the model learns from. The model uses this data to learn patterns, relationships, and decision boundaries that enable it to make predictions.
2. **Learning**: During training, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes in the training data. This process is called learning, and it's essential for the model to develop its predictive capabilities.
3. **Generalization**: Once a model is trained, it's tested on a separate dataset, called the validation set, to evaluate its performance. The model's ability to generalize to new, unseen data is critical for its effectiveness in real-world applications.
4. **Fine-tuning**: Even after a model is trained, it may need to be fine-tuned on additional data to adapt to changing conditions, such as new features or shifts in the underlying distribution of the data.
Types of data that machine learning models need:
1. **Labeled data**: This is data that has been annotated or labeled with the correct output or response. Labeled data is essential for supervised learning models, such as classification and regression models.
2. **Unlabeled data**: This is data that doesn't have any labels or annotations. Unlabeled data can be used for unsupervised learning models, such as clustering and dimensionality reduction.
3. **Feature data**: This is data that contains the features or attributes that the model uses to make predictions. Feature data can be numerical, categorical, or a combination of both.
4. **Contextual data**: This is data that provides context or additional information about the data, such as metadata or external knowledge.
In summary, machine learning models need data to learn, make predictions, and adapt to changing conditions. The type and quality of data can significantly impact the performance and effectiveness of a machine learning model.
Response with 2-gram speculation:
Yes, machine learning
Machine learning models model needs data to learn and make predictions. In fact, data is the fuel that powers machine learning
Machine learning
Machine learning.
I haven't tried higher than 2-gram with this minimal reproduction example, but with the more complex use case I was trying previously, I noticed that the responses got longer and more gibberish-like at 3-gram.
I would also note that I haven't tried without `eetq` quantisation, due to VRAM constraints.
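The repetition-of-context flavour of the gibberish is at least consistent with how n-gram speculation drafts tokens: candidate continuations are copied from earlier occurrences of the current n-gram in the sequence, so if verification mis-accepted drafts, the leaked tokens would look like replayed context. A minimal sketch of the drafting idea (not TGI's actual implementation):

```python
def ngram_draft(tokens, n=2, max_draft=5):
    """Propose draft tokens by matching the trailing n-gram earlier in the
    sequence and copying what followed it (a sketch of n-gram / prompt-lookup
    speculation, not TGI's code)."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Search backwards for the most recent earlier occurrence of the tail.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i : i + n] == tail:
            return tokens[i + n : i + n + max_draft]
    return []  # no match: nothing to speculate
```

If such drafts were emitted without being checked against the model, the output would consist of verbatim snippets of the prior context - much like the 2-gram responses shown above.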
Expected behavior
As I understand it, the n-gram speculative decoding output should not differ at all from the normal output.
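That expectation matches how speculative decoding is defined: a draft token is only kept when it agrees with what the target model would have produced, so with greedy sampling the output must be token-for-token identical to non-speculative decoding. A toy sketch of that invariant, with `model_next` standing in for the target model's greedy next-token choice (illustrative only, not TGI internals):

```python
def greedy_with_speculation(model_next, prompt, steps, draft_fn):
    """Greedy decoding with draft verification.

    Each draft token is kept only when it equals the token the target
    model would have emitted anyway, so the result is identical to plain
    greedy decoding regardless of draft quality.
    """
    tokens = list(prompt)
    while steps > 0:
        drafts = draft_fn(tokens) or [None]  # no draft: take one normal step
        for draft in drafts:
            if steps == 0:
                break
            expected = model_next(tokens)  # target model's greedy choice
            tokens.append(expected)
            steps -= 1
            if draft != expected:
                break  # reject the rest of this draft window
    return tokens
```

With a correct implementation, good drafts only change the speed, never the text; any textual divergence (as seen above) indicates a bug in drafting or verification.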
System Info
- text-generation-inference 3.1.0 (saw the same issue on 3.0.0)
- nvidia-smi (EC2 VM in AWS): output shown at the top of this report