Commit cebbe04

feat: Updating docs to include model inference guidelines (feast-dev#4416)
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
1 parent 0baeeb5 commit cebbe04

5 files changed: +98 -3 lines changed

docs/SUMMARY.md

+1
@@ -23,6 +23,7 @@
   * [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md)
   * [Write Patterns](getting-started/architecture/write-patterns.md)
   * [Feature Transformation](getting-started/architecture/feature-transformation.md)
+  * [Feature Serving and Model Inference](getting-started/architecture/model-inference.md)
   * [Components](getting-started/components/README.md)
   * [Overview](getting-started/components/overview.md)
   * [Registry](getting-started/components/registry.md)

docs/getting-started/architecture/README.md

+4
@@ -19,3 +19,7 @@
   {% content-ref url="feature-transformation.md" %}
   [feature-transformation.md](feature-transformation.md)
   {% endcontent-ref %}
+
+  {% content-ref url="model-inference.md" %}
+  [model-inference.md](model-inference.md)
+  {% endcontent-ref %}

docs/getting-started/architecture/feature-transformation.md

+1
@@ -3,6 +3,7 @@
   A *feature transformation* is a function that takes some set of input data and
   returns some set of output data. Feature transformations can happen on either raw data or derived data.
 
+  ## Feature Transformation Engines
   Feature transformations can be executed by three types of "transformation engines":
 
   1. The Feast Feature Server

docs/getting-started/architecture/model-inference.md

+88

@@ -0,0 +1,88 @@

# Feature Serving and Model Inference

Production machine learning systems can choose from four approaches to serving machine learning predictions (the output
of model inference):
1. Online model inference with online features
2. Precomputed (batch) model predictions without online features
3. Online model inference with online features and cached predictions
4. Online model inference without features

*Note: online features can be sourced from batch, streaming, or request data sources.*

These four approaches have different tradeoffs but, in general, involve significant implementation differences.

## 1. Online Model Inference with Online Features
Online model inference with online features is a powerful approach to serving data-driven machine learning applications.
This requires a feature store to serve online features and a model server to serve model predictions (e.g., KServe).
This approach is particularly useful for applications where request-time data is required to run inference.
```python
features = store.get_online_features(
    features=[
        "user_data:click_through_rate",
        "user_data:number_of_clicks",
        "user_data:average_page_duration",
    ],
    entity_rows=[{"user_id": 1}],
)
model_predictions = model_server.predict(features)
```
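
For concreteness, the `model_server.predict` call above might wrap an HTTP request to the model server. A minimal sketch, assuming a hypothetical KServe-style endpoint (the URL, model name, and payload schema are assumptions, not Feast APIs):

```python
import requests

# Hypothetical KServe-style "v1" prediction endpoint; the URL, model name,
# and payload schema depend on your model server's deployment.
response = requests.post(
    "http://model-server.example.com/v1/models/user-model:predict",
    json={"instances": [features.to_dict()]},
)
model_predictions = response.json()["predictions"]
```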

## 2. Precomputed (Batch) Model Predictions without Online Features
Machine learning teams typically find serving precomputed model predictions to be the most straightforward approach to
implement. It simply treats the model predictions as a feature and serves them from the feature store using the standard
Feast SDK.

```python
model_predictions = store.get_online_features(
    features=[
        "user_data:model_predictions",
    ],
    entity_rows=[{"user_id": 1}],
)
```
42+
Notice that the model server is not involved in this approach. Instead, the model predictions are precomputed and
43+
materialized to the online store.
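
As a sketch of that write path, assuming a batch job has already scored all users and written a `model_predictions` column to the offline source backing the `user_data` feature view, loading the predictions into the online store is a standard materialization:

```python
from datetime import datetime

# Load the precomputed predictions (and any other features) from the
# offline store into the online store, up to the current time.
store.materialize_incremental(end_date=datetime.utcnow())
```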

While this approach can deliver quick impact for different business use cases, it suffers from stale data and can only
serve users/entities that were available at the time of the batch computation. In some cases, this tradeoff may be
tolerable.

## 3. Online Model Inference with Online Features and Cached Predictions
This is the most sophisticated approach: inference is optimized for low latency by caching predictions and running
model inference when data producers write features to the online store. It is particularly useful for applications
where features come from multiple data sources, the model is computationally expensive to run, or latency is a
significant constraint.

```python
import pandas as pd

# Client reads: fetch the cached prediction alongside fresh features
features = store.get_online_features(
    features=[
        "user_data:click_through_rate",
        "user_data:number_of_clicks",
        "user_data:average_page_duration",
        "user_data:model_predictions",
    ],
    entity_rows=[{"user_id": 1}],
)
# On a cache miss, run inference and write the prediction back to the online store
if features.to_dict()["model_predictions"][0] is None:
    model_predictions = model_server.predict(features)
    store.write_to_online_store(feature_view_name="user_data", df=pd.DataFrame(model_predictions))
```
Note that in this case a separate call to `write_to_online_store` is required when the underlying data changes and the
predictions change along with it.

```python
# Client writes from the data producer
user_data = request.POST.get("user_data")
model_predictions = model_server.predict(user_data)  # assume this includes `user_data` in the DataFrame
store.write_to_online_store(feature_view_name="user_data", df=pd.DataFrame(model_predictions))
```
While this requires an additional write for every data producer, this approach results in the lowest latency for
model inference.
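
To illustrate the write payload, `write_to_online_store` expects a DataFrame containing the entity key, an event timestamp, and the feature columns of the target feature view. A minimal sketch, with column names assumed from the examples above:

```python
import pandas as pd
from datetime import datetime

# Hypothetical payload shape for the `user_data` feature view: entity key,
# feature value(s), and the event timestamp of the write.
df = pd.DataFrame(
    {
        "user_id": [1],
        "model_predictions": [0.42],
        "event_timestamp": [datetime.utcnow()],
    }
)
store.write_to_online_store(feature_view_name="user_data", df=df)
```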

## 4. Online Model Inference without Features
This approach does not require Feast. The model server can directly serve predictions without any features. This is
common with Large Language Models (LLMs) and other models that do not require features to make predictions.

Note that generative models using Retrieval Augmented Generation (RAG) *do* require features, where the
[document embeddings](../../reference/alpha-vector-database.md) are treated as features (which Feast supports);
this falls under "Online Model Inference with Online Features".
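
A minimal sketch of that retrieval step with Feast's alpha vector database support, assuming a `docs` feature view with an `embedding` feature and a hypothetical `embed` function for the query:

```python
# Hypothetical embedding of the user's query
query_embedding = embed("What is a feature store?")

# Retrieve the top-k most similar document embeddings from the online store
documents = store.retrieve_online_documents(
    feature="docs:embedding",
    query=query_embedding,
    top_k=5,
).to_dict()
```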

docs/getting-started/architecture/overview.md

+4-3
@@ -8,9 +8,10 @@ Feast's architecture is designed to be flexible and scalable. It is composed of
   online store.
   This allows Feast to serve features in real-time with low latency.
 
-  * Feast supports On Demand and Streaming Transformations for [feature computation](feature-transformation.md) and
-    will support Batch transformations in the future. For Streaming and Batch, Feast requires a separate Feature Transformation
-    Engine (in the batch case, this is typically your Offline Store). We are exploring adding a default streaming engine to Feast.
+  * Feast supports [feature transformation](feature-transformation.md) for On Demand and Streaming data sources and
+    will support Batch transformations in the future. For Streaming and Batch data sources, Feast requires a separate
+    [Feature Transformation Engine](feature-transformation.md#feature-transformation-engines) (in the batch case, this is
+    typically your Offline Store). We are exploring adding a default streaming engine to Feast.
 
   * Domain expertise is recommended when integrating a data source with Feast to understand the [tradeoffs from different
     write patterns](write-patterns.md) for your application
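
For reference, an On Demand transformation might look like the following sketch (the `user_data` feature view and the request schema are assumptions for illustration):

```python
import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# Request-time input supplied by the caller at inference time
page_visit = RequestSource(
    name="page_visit",
    schema=[Field(name="page_duration", dtype=Float64)],
)

# Combines request data with the (assumed) `user_data` feature view
@on_demand_feature_view(
    sources=[user_data, page_visit],
    schema=[Field(name="duration_vs_average", dtype=Float64)],
)
def duration_vs_average(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["duration_vs_average"] = inputs["page_duration"] / inputs["average_page_duration"]
    return df
```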
