magpie-ultra - a synthetic dataset for supervised fine-tuning using Llama 3.1 #870
Labels
code-generation
code generation models and tools like copilot and aider
dataset
public datasets and embeddings
embeddings
vector embeddings and related tools
finetuning
Tools for finetuning of LLMs e.g. SFT or RLHF
llm
Large Language Models
New-Label
Choose this option if the existing labels are insufficient to describe the content accurately
Dataset Card for magpie-ultra-v0.1
Dataset Summary
magpie-ultra is a synthetically generated dataset for supervised fine-tuning using the new Llama 3.1 405B-Instruct model, together with other Llama models like Llama-Guard-3-8B and Meta-Llama-3.1-8B-Instruct. The dataset contains challenging instructions and responses for a wide variety of tasks, such as Coding & debugging, Math, Data analysis, Creative writing, Advice seeking, or Brainstorming.
Explore the dataset in Argilla.
Magpie Pipeline
As the name of the dataset indicates, we used the Magpie recipe to generate the instruction-response pairs.
The main difference with respect to the original Magpie release is that we used the new Llama 3.1 family of models and generated substantially fewer instruction-response pairs for this first iteration: 50K vs. 1M rows. The Magpie pipeline can be summarised as follows:
1. Using meta-llama/Meta-Llama-3.1-405B-Instruct-FP8, we generated an instruction as described in the Magpie paper: we sent the pre-query template <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n to the model and, thanks to the autoregressive capabilities of the LLM and its fine-tuning on an SFT dataset, it generated a user instruction until it produced the <|eot_id|> token. We then sent the generated instruction to the LLM to get a response.
2. Using meta-llama/Meta-Llama-3.1-405B-Instruct, we generated another response for the generated instruction. We then scored the responses given by the instruct and base models with RLHFlow/ArmoRM-Llama3-8B-v0.1. If the instruct model's score minus the base model's score is positive, we can consider the instruct model's response to be of higher quality.
3. Using meta-llama/Meta-Llama-3.1-8B-Instruct, we assessed the quality and difficulty of the generated instructions and classified each of them into one or more of the following categories: Information seeking, Reasoning, Planning, Editing, Coding & Debugging, Math, Data analysis, Creative writing, Advice seeking, Brainstorming, or Others. To ensure that the model's outputs were valid JSON that we could easily parse, we used the structured output generation feature of distilabel.
4. Using meta-llama/Llama-Guard-3-8B, we classified the generated instruction-response pairs as "safe" or "unsafe", also providing the hazard category from the MLCommons AI Safety taxonomy.
5. Using Alibaba-NLP/gte-large-en-v1.5 and Faiss, we generated embeddings for all the instructions and computed each instruction's nearest neighbour to ensure instruction diversity in the final dataset.

The dataset was generated using a single 8xH100 machine.
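The pre-query step of the pipeline (letting the instruct model write a user instruction from the bare pre-query template, then feeding that instruction back for a response) can be sketched with plain string handling. This is a minimal, hypothetical sketch, not distilabel's implementation: the helper names header, extract_instruction and response_prompt are illustrative, it omits any system header that a Llama 3.1 chat template may insert, and it assumes the serving backend returns only the generated continuation.

```python
# Llama 3 special tokens used by the chat prompt format.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Return the header that opens a turn for the given role (hypothetical helper)."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


# Magpie pre-query template: the prompt stops exactly where a user message would begin,
# so the instruct model continues it by writing a plausible user instruction.
PRE_QUERY_TEMPLATE = BOS + header("user")


def extract_instruction(completion: str) -> str:
    """Keep only the text generated before the first end-of-turn token, if there is one."""
    return completion.split(EOT, 1)[0].strip()


def response_prompt(instruction: str) -> str:
    """Close the user turn with the extracted instruction and open an assistant turn."""
    return PRE_QUERY_TEMPLATE + instruction + EOT + header("assistant")
```

For example, a hypothetical completion such as "Write a haiku about rain.<|eot_id|>" yields the instruction "Write a haiku about rain.", and response_prompt wraps it back into a complete user turn followed by an open assistant header for the second generation call.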
Dataset columns
The examples have the following structure per configuration:
model_name_response_base
instruction
response
response_base
intent
knowledge
difficulty
model_name_difficulty
explanation
quality
model_name_quality
primary_tag
other_tags
model_name_classification
embedding
model_name_embeddings
score
score_base
distilabel_metadata
nn_indices
nn_scores
guard
safe
hazard_category
score_difference
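The nn_indices and nn_scores columns come from the nearest-neighbour search over instruction embeddings described in the pipeline. The sketch below swaps Faiss for a brute-force cosine-similarity search in pure Python so it stays self-contained; it assumes a single neighbour per row, at least two rows, and non-zero embedding vectors, so the exact shape of the published columns may differ. The threshold in the hypothetical near_duplicates helper is illustrative, not a value from the dataset card.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(embeddings: List[Sequence[float]]) -> Tuple[List[int], List[float]]:
    """For each embedding, return the index and similarity of its closest *other* embedding."""
    nn_indices: List[int] = []
    nn_scores: List[float] = []
    for i, query in enumerate(embeddings):
        best_j, best_score = -1, -math.inf
        for j, candidate in enumerate(embeddings):
            if i == j:
                continue  # skip the trivial self-match
            score = cosine(query, candidate)
            if score > best_score:
                best_j, best_score = j, score
        nn_indices.append(best_j)
        nn_scores.append(best_score)
    return nn_indices, nn_scores


def near_duplicates(nn_scores: List[float], threshold: float = 0.95) -> List[int]:
    """Indices of rows whose nearest neighbour is at least `threshold` similar (hypothetical cutoff)."""
    return [i for i, score in enumerate(nn_scores) if score >= threshold]
```

Brute force is quadratic in the number of rows, which is why an approximate or indexed search such as Faiss is the practical choice at the 50K-row scale.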
The instruction and response columns can be used for SFT. Depending on the value of score_difference, one can build a chosen/rejected pair for DPO: if score_difference is positive, select response as chosen and response_base as rejected; if it is negative, do the reverse.
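Building DPO pairs from score_difference can be sketched as follows. This assumes each row is a plain dict keyed by the column names listed above; the output keys prompt/chosen/rejected follow a common DPO convention (for example, TRL's) and are an assumption here, as is skipping rows where score_difference is exactly zero.

```python
from typing import Any, Dict, Optional


def to_dpo_pair(row: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Turn one magpie-ultra row into a chosen/rejected pair using score_difference."""
    diff = row["score_difference"]
    if diff > 0:
        # The instruct model's response scored higher.
        chosen, rejected = row["response"], row["response_base"]
    elif diff < 0:
        # The base model's response scored higher.
        chosen, rejected = row["response_base"], row["response"]
    else:
        return None  # a tie carries no preference signal (assumed policy)
    return {"prompt": row["instruction"], "chosen": chosen, "rejected": rejected}
```

Applied row by row, for example in a list comprehension that discards the None results, this turns the SFT data into a preference dataset.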
Limitations
Suggested labels
{'label-name': 'instruction-response', 'label-description': 'A dataset containing instruction-response pairs for various tasks generated using LLMs.', 'gh-repo': 'argilla/magpie-ultra-v0.1', 'confidence': 62.25}