Commit: Merge branch 'develop' into update-integration-argilla-2.0

Showing 36 changed files with 897 additions and 126 deletions.

139 changes: 139 additions & 0 deletions
...ser-guide/llmops-guide/finetuning-embeddings/evaluating-finetuned-embeddings.md
@@ -0,0 +1,139 @@

---
description: Evaluate finetuned embeddings and compare to original base embeddings.
---

Now that we've finetuned our embeddings, we can evaluate them and compare them to the base embeddings. We have all the data saved and versioned already, and we will reuse the same MatryoshkaLoss function for evaluation.

In code, our evaluation steps are easy to comprehend. Here, for example, is the base model evaluation step:

```python
from typing import Annotated, Dict

import torch
from datasets import DatasetDict
from sentence_transformers import SentenceTransformer
from zenml import log_model_metadata, step

# get_evaluator, EMBEDDINGS_MODEL_ID_BASELINE and
# EMBEDDINGS_MODEL_MATRYOSHKA_DIMS are defined elsewhere in the project


def evaluate_model(
    dataset: DatasetDict, model: SentenceTransformer
) -> Dict[str, float]:
    """Evaluate the given model on the dataset."""
    evaluator = get_evaluator(
        dataset=dataset,
        model=model,
    )
    return evaluator(model)


@step
def evaluate_base_model(
    dataset: DatasetDict,
) -> Annotated[Dict[str, float], "base_model_evaluation_results"]:
    """Evaluate the base model on the given dataset."""
    model = SentenceTransformer(
        EMBEDDINGS_MODEL_ID_BASELINE,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )

    results = evaluate_model(
        dataset=dataset,
        model=model,
    )

    # Convert numpy.float64 values to regular Python floats
    # (needed for serialization)
    base_model_eval = {
        f"dim_{dim}_cosine_ndcg@10": float(
            results[f"dim_{dim}_cosine_ndcg@10"]
        )
        for dim in EMBEDDINGS_MODEL_MATRYOSHKA_DIMS
    }

    log_model_metadata(
        metadata={"base_model_eval": base_model_eval},
    )

    return results
```

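For reference, a `get_evaluator` along these lines would produce metrics with exactly the `dim_{dim}_cosine_ndcg@10` keys used above. This is a sketch based on the common Sentence Transformers pattern for Matryoshka evaluation, assuming the dataset exposes `anchor` (question) and `positive` (context) columns; it is not necessarily the project's exact code:

```python
from datasets import DatasetDict
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    InformationRetrievalEvaluator,
    SequentialEvaluator,
)
from sentence_transformers.util import cos_sim


def get_evaluator(
    dataset: DatasetDict, model: SentenceTransformer
) -> SequentialEvaluator:
    """Chain one information-retrieval evaluator per Matryoshka dimension."""
    # model is unused here but kept for signature parity with the calling step
    corpus = {str(i): text for i, text in enumerate(dataset["test"]["positive"])}
    queries = {str(i): text for i, text in enumerate(dataset["test"]["anchor"])}
    # Each query's only relevant document is the passage it was generated from
    relevant_docs = {qid: {qid} for qid in queries}
    evaluators = [
        InformationRetrievalEvaluator(
            queries=queries,
            corpus=corpus,
            relevant_docs=relevant_docs,
            name=f"dim_{dim}",
            truncate_dim=dim,  # evaluate embeddings truncated to this size
            score_functions={"cosine": cos_sim},
        )
        for dim in EMBEDDINGS_MODEL_MATRYOSHKA_DIMS
    ]
    return SequentialEvaluator(evaluators)
```
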
We log the results for our core Matryoshka dimensions as model metadata to ZenML within our evaluation step. This will allow us to inspect these results from within [the Model Control Plane](https://docs.zenml.io/how-to/use-the-model-control-plane) (see below for more details). Our results come in the form of a dictionary of string keys and float values which will, like all step inputs and outputs, be versioned, tracked and saved in your artifact store.
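
You can also read those logged values back programmatically, not just in the dashboard. Something like the following should work, though the exact metadata accessors vary a little between ZenML versions and the model name here is a placeholder, so treat this as a sketch:

```python
from zenml.client import Client

# "finetune-embeddings" is a placeholder; use the name of the ZenML
# Model that your evaluation pipeline attaches its metadata to.
model_version = Client().get_model_version("finetune-embeddings", "latest")
print(model_version.run_metadata["base_model_eval"])
```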

## Visualizing results

It's possible to visualize results in a few different ways in ZenML, but one easy option is just to output your chart as a `PIL.Image` object. (See our [documentation on more ways to visualize your results](../../../how-to/visualize-artifacts/README.md).) The rest of the implementation of our `visualize_results` step is just simple `matplotlib` code that plots the base model evaluation against the finetuned model evaluation. We represent the results as percentage values and horizontally stack the two sets to make comparison a little easier.
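
For instance, a minimal version of such a step might look like this (the signature, labels, and figure styling are illustrative, not the project's exact code); the chart it produces is shown below:

```python
from io import BytesIO
from typing import Dict

import matplotlib.pyplot as plt
from PIL import Image
from zenml import step


@step
def visualize_results(
    base_results: Dict[str, float], finetuned_results: Dict[str, float]
) -> Image.Image:
    """Plot base vs. finetuned ndcg@10 scores as horizontally stacked bars."""
    labels = list(base_results.keys())
    y = list(range(len(labels)))
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh([i + 0.2 for i in y], [base_results[k] * 100 for k in labels],
            height=0.4, label="base")
    ax.barh([i - 0.2 for i in y], [finetuned_results[k] * 100 for k in labels],
            height=0.4, label="finetuned")
    ax.set_yticks(y)
    ax.set_yticklabels(labels)
    ax.set_xlabel("cosine ndcg@10 (%)")
    ax.legend()
    # Render the matplotlib figure into a PIL.Image, which ZenML can
    # materialize and display as an artifact visualization
    buf = BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf)
```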

![Visualizing finetuned embeddings evaluation results](../../../.gitbook/assets/finetuning-embeddings-visualization.png)

We can see that our finetuned embeddings have improved the recall of our retrieval system across all of the dimensions, but the results are still not amazing. In a production setting, we would likely want to focus on improving the data used for the embeddings training. In particular, we could consider stripping out some of the log output from the documentation, and perhaps omitting some pages which offer low signal for the retrieval task. This embeddings finetuning was run purely on the full set of synthetic data generated by `distilabel` and `gpt-4o`, so we wouldn't necessarily expect to see huge improvements out of the box, especially when the underlying data chunks are complex and contain multiple topics.

## Model Control Plane as unified interface

Once all our pipelines are finished running, the best place to inspect our results as well as the artifacts and models we generated is the Model Control Plane.

![Model Control Plane](../../../.gitbook/assets/mcp-embeddings.gif)

The interface is split into sections that correspond to:

- the artifacts generated by our steps
- the models generated by our steps
- the metadata logged by our steps
- (potentially) any deployments of models made, though we didn't use deployments in this guide so far
- any pipeline runs associated with this 'Model'

We can easily see which artifact or technical model versions are the latest, as well as compare the actual values of our evals or inspect the hardware or hyperparameters used for training.

This one-stop-shop interface is available on ZenML Pro, and you can learn more about it in the [Model Control Plane documentation](https://docs.zenml.io/how-to/use-the-model-control-plane).

## Next Steps

Now that we've finetuned our embeddings and evaluated them, once they're in good shape we could bring them into [the original RAG pipeline](../rag/basic-rag-inference-pipeline.md), regenerate a new series of embeddings for our data, and then rerun our RAG retrieval evaluations to see how they've improved in our hand-crafted and LLM-powered evaluations.

The next section will cover [LLM finetuning and deployment](../finetuning-llms/finetuning-llms.md) as the final part of our LLMOps guide. (This section is currently still a work in progress, but if you're eager to try out LLM finetuning with ZenML, you can use [our LoRA project](https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/README.md) to get started. We also have [a blogpost that takes you through all the steps you need to finetune Llama 3.1](https://www.zenml.io/blog/how-to-finetune-llama-3-1-with-zenml) using GCP's Vertex AI with ZenML, including one-click stack creation!)

To try out the two pipelines, please follow the instructions in [the project repository README](https://github.com/zenml-io/zenml-projects/blob/main/llm-complete-guide/README.md), and you can find the full code in that same directory.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

8 changes: 0 additions & 8 deletions
...finetuning-embeddings/finetuning-embeddings-for-better-retrieval-performance.md
This file was deleted.

102 changes: 102 additions & 0 deletions
...guide/finetuning-embeddings/finetuning-embeddings-with-sentence-transformers.md
@@ -0,0 +1,102 @@

---
description: Finetune embeddings with Sentence Transformers.
---

We now have a dataset that we can use to finetune our embeddings. You can [inspect the positive and negative examples](https://huggingface.co/datasets/zenml/rag_qa_embedding_questions_0_60_0_distilabel) on the Hugging Face datasets page, since our previous pipeline pushed the data there.
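
You can also pull the dataset down for a quick look locally, for example with the `datasets` library:

```python
from datasets import load_dataset

# The synthetic dataset our previous pipeline pushed to the Hugging Face Hub
dataset = load_dataset("zenml/rag_qa_embedding_questions_0_60_0_distilabel")
print(dataset)
```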

![Synthetic data generated with distilabel for embeddings finetuning](../../../.gitbook/assets/distilabel-synthetic-dataset-hf.png)

Our pipeline for finetuning the embeddings is relatively simple. We'll do the following:

- load our data either from Hugging Face or [from Argilla via the ZenML annotation integration](../../../component-guide/annotators/argilla.md)
- finetune our model using the [Sentence Transformers](https://www.sbert.net/) library
- evaluate the base and finetuned embeddings
- visualize the results of the evaluation

![Embeddings finetuning pipeline with Sentence Transformers and ZenML](../../../.gitbook/assets/rag-finetuning-embeddings-pipeline.png)

## Loading data

By default the pipeline will load the data from our Hugging Face dataset. If you've annotated your data in Argilla, you can load the data from there instead. You'll just need to pass an `--argilla` flag to the Python invocation when you're running the pipeline, like so:

```bash
python run.py --embeddings --argilla
```

This assumes that you've set up an Argilla annotator in your stack. The code checks for the annotator and downloads the data that was annotated in Argilla. Please see our [guide to using the Argilla integration with ZenML](../../../component-guide/annotators/argilla.md) for more details.

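In outline, the data-loading logic behaves something like the sketch below. The Argilla dataset name and the exact annotator calls are assumptions for illustration rather than the project's actual code:

```python
from datasets import load_dataset
from zenml.client import Client


def load_training_data(use_argilla: bool = False):
    """Choose between Argilla annotations and the Hugging Face dataset."""
    if use_argilla:
        # Requires an Argilla annotator registered in the active ZenML stack
        annotator = Client().active_stack.annotator
        if annotator is None:
            raise RuntimeError("No annotator found in the active ZenML stack.")
        # Hypothetical dataset name; use whatever you called it in Argilla
        return annotator.get_dataset(dataset_name="rag_qa_embeddings")
    # Default: the synthetic dataset published to the Hugging Face Hub
    return load_dataset("zenml/rag_qa_embedding_questions_0_60_0_distilabel")
```
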
## Finetuning with Sentence Transformers

The `finetune` step in the pipeline is responsible for finetuning the embeddings model using the Sentence Transformers library. Let's break down the key aspects of this step:

1. **Model Loading**: The code loads the base model (`EMBEDDINGS_MODEL_ID_BASELINE`) using the Sentence Transformers library. It utilizes the SDPA (scaled dot-product attention) implementation, which can use Flash Attention 2 kernels for efficient training.

2. **Loss Function**: The finetuning process employs a custom loss function called `MatryoshkaLoss`. This loss function is a wrapper around the `MultipleNegativesRankingLoss` provided by Sentence Transformers. The Matryoshka approach involves training the model with different embedding dimensions simultaneously. It allows the model to learn embeddings at various granularities, improving its performance across different embedding sizes.

3. **Dataset Preparation**: The training dataset is loaded from the provided `dataset` parameter. The code saves the training data to a temporary JSON file and then loads it using the Hugging Face `load_dataset` function.

4. **Evaluator**: An evaluator is created using the `get_evaluator` function. The evaluator is responsible for assessing the model's performance during training.

5. **Training Arguments**: The code sets up the training arguments using the `SentenceTransformerTrainingArguments` class. It specifies various hyperparameters such as the number of epochs, batch size, learning rate, optimizer, precision (TF32 and BF16), and evaluation strategy.

6. **Trainer**: The `SentenceTransformerTrainer` is initialized with the model, training arguments, training dataset, loss function, and evaluator. The trainer handles the training process. The `trainer.train()` method is called to start the finetuning process. The model is trained for the specified number of epochs using the provided hyperparameters.

7. **Model Saving**: After training, the finetuned model is pushed to the Hugging Face Hub using the `trainer.model.push_to_hub()` method. The model is saved with the specified ID (`EMBEDDINGS_MODEL_ID_FINE_TUNED`).

8. **Metadata Logging**: The code logs relevant metadata about the training process, including the training parameters, hardware information, and accelerator details.

9. **Model Rehydration**: To handle materialization errors, the code saves the trained model to a temporary file, loads it back into a new `SentenceTransformer` instance, and returns the rehydrated model (see the sketch after this list).

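That rehydration step might look roughly like this, a sketch under the assumption that a plain save/load round-trip is enough to detach training-time state (the helper name and layout are illustrative):

```python
import tempfile
from pathlib import Path

from sentence_transformers import SentenceTransformer


def rehydrate_model(trained_model: SentenceTransformer) -> SentenceTransformer:
    """Round-trip the model through disk so it can be materialized cleanly."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        model_path = Path(tmp_dir) / "finetuned-model"
        trained_model.save(str(model_path))
        # Loading from disk gives a fresh instance without training-time
        # state attached that could otherwise break serialization
        return SentenceTransformer(str(model_path))
```
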
(*Thanks and credit to Phil Schmid for [his tutorial on finetuning embeddings](https://www.philschmid.de/fine-tune-embedding-model-for-rag) with Sentence Transformers and a Matryoshka loss function. This project uses many ideas and some code from his implementation.*)

## Finetuning in code

Here's a simplified code snippet highlighting the key parts of the finetuning process:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Load the base model
model = SentenceTransformer(EMBEDDINGS_MODEL_ID_BASELINE)
# Define the loss function, training at several embedding dimensions at once
train_loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=EMBEDDINGS_MODEL_MATRYOSHKA_DIMS,
)
# Prepare the training dataset
train_dataset = load_dataset("json", data_files=train_dataset_path, split="train")
# Set up the training arguments
args = SentenceTransformerTrainingArguments(...)
# Create the trainer; keyword arguments avoid positional mix-ups
# (the full step also passes the evaluator created via get_evaluator)
trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=train_loss
)
# Start training
trainer.train()
# Push the finetuned model to the Hugging Face Hub
trainer.model.push_to_hub(EMBEDDINGS_MODEL_ID_FINE_TUNED)
```
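
The training arguments are elided above; a plausible configuration matching the hyperparameters mentioned earlier might look like the following. These values are illustrative assumptions, not the project's exact settings:

```python
args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-embeddings",  # illustrative values throughout
    num_train_epochs=4,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    optim="adamw_torch_fused",
    tf32=True,  # TF32 matmuls on supported NVIDIA GPUs
    bf16=True,  # bfloat16 mixed precision
    eval_strategy="epoch",  # `evaluation_strategy` on older transformers versions
)
```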

The finetuning process leverages the capabilities of the Sentence Transformers library to efficiently train the embeddings model. The Matryoshka approach allows for learning embeddings at different dimensions simultaneously, enhancing the model's performance across various embedding sizes.
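
One practical payoff of Matryoshka training is that at inference time you can load the model with a smaller dimension and trade a little accuracy for speed and storage. A quick sketch (the model ID is a placeholder):

```python
from sentence_transformers import SentenceTransformer

# Placeholder model ID; substitute the finetuned model you pushed to the Hub.
# truncate_dim keeps only the first 256 embedding dimensions, one of the
# granularities the Matryoshka loss trained for.
model = SentenceTransformer("your-username/finetuned-embeddings", truncate_dim=256)
embeddings = model.encode(["How do I register a ZenML stack?"])
print(embeddings.shape)  # (1, 256)
```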

Our model is now finetuned and saved in the Hugging Face Hub for easy access and reference in subsequent steps, and it is also versioned and tracked within ZenML for full observability. At this point the pipeline will evaluate the base and finetuned embeddings and visualize the results.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>