Commit 02ac46e

kevjumba, hemidactylus
authored and committed
Cassandra online store
* Refactor file-editing to a shared utils module
* Use f-strings in the CassandraOnlineStoreCreator
* Specify version 2 in serializing to make the entity key
* Remove unnecessary empty comment lines
* Rename proj to columns in _read_rows_by_entity_key
* Introduce Cassandra-specific pytest targets
* Adapt roadmaps and docs to cover/index Cassandra online store
* Add license notes to code files

Signed-off-by: Stefano Lottini <stefano.lottini@datastax.com>
1 parent 63d541d commit 02ac46e

33 files changed (+1431, -57 lines changed)

CONTRIBUTING.md

+1
Original file line number | Diff line number | Diff line change
@@ -312,6 +312,7 @@ The services with containerized replacements currently implemented are:
312312
- Trino
313313
- HBase
314314
- Postgres
315+
- Cassandra
315316

316317
You can run `make test-python-integration-container` to run tests against the containerized versions of dependencies.
317318

Makefile

+29
@@ -156,6 +156,35 @@ test-python-universal-postgres:
156156
not test_universal_types" \
157157
sdk/python/tests
158158

159+
test-python-universal-cassandra:
160+
PYTHONPATH='.' \
161+
FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.cassandra_repo_configuration \
162+
FEAST_USAGE=False \
163+
IS_TEST=True \
164+
python -m pytest -x --integration \
165+
sdk/python/tests
166+
167+
test-python-universal-cassandra-no-cloud-providers:
168+
PYTHONPATH='.' \
169+
FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.cassandra_repo_configuration \
170+
FEAST_USAGE=False \
171+
IS_TEST=True \
172+
python -m pytest -x --integration \
173+
-k "not test_lambda_materialization_consistency and \
174+
not test_apply_entity_integration and \
175+
not test_apply_feature_view_integration and \
176+
not test_apply_entity_integration and \
177+
not test_apply_feature_view_integration and \
178+
not test_apply_data_source_integration and \
179+
not test_nullable_online_store " \
180+
sdk/python/tests
181+
182+
test-python-universal-cassandra-minimal:
183+
FEAST_USAGE=False \
184+
IS_TEST=True \
185+
FEAST_LOCAL_ONLINE_CONTAINER=True \
186+
python -m pytest -n0 --integration -k cassandra sdk/python/tests
187+
159188
test-python-universal:
160189
FEAST_USAGE=False IS_TEST=True python -m pytest -n 8 --integration sdk/python/tests
161190

README.md

+1-1
@@ -177,7 +177,7 @@ The list below contains the functionality that contributors are planning to deve
177177
* [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
178178
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
179179
* [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
180-
* [x] [Cassandra / AstraDB](https://github.com/datastaxdevs/feast-cassandra-online-store)
180+
* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
181181
* [ ] Bigtable (in progress)
182182
* **Feature Engineering**
183183
* [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))

docs/reference/online-stores/README.md

+4
@@ -25,3 +25,7 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
2525
{% content-ref url="postgres.md" %}
2626
[postgres.md](postgres.md)
2727
{% endcontent-ref %}
28+
29+
{% content-ref url="cassandra.md" %}
30+
[cassandra.md](cassandra.md)
31+
{% endcontent-ref %}
+61
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# Cassandra / Astra DB online store
2+
3+
## Description
4+
5+
The Cassandra / Astra DB online store provides support for materializing feature values into an Apache Cassandra / Astra DB database for online features.
6+
7+
* The whole project is contained within a Cassandra keyspace
8+
* Each feature view is mapped one-to-one to a specific Cassandra table
9+
* This implementation inherits all strengths of Cassandra such as high availability, fault-tolerance, and data distribution
10+
11+
An easy way to get started is the command `feast init REPO_NAME -t cassandra`.
12+
13+
### Example (Cassandra)
14+
15+
{% code title="feature_store.yaml" %}
16+
```yaml
17+
project: my_feature_repo
18+
registry: data/registry.db
19+
provider: local
20+
online_store:
21+
type: cassandra
22+
hosts:
23+
- 192.168.1.1
24+
- 192.168.1.2
25+
- 192.168.1.3
26+
keyspace: KeyspaceName
27+
port: 9042 # optional
28+
username: user # optional
29+
password: secret # optional
30+
protocol_version: 5 # optional
31+
load_balancing: # optional
32+
local_dc: 'datacenter1' # optional
33+
load_balancing_policy: 'TokenAwarePolicy(DCAwareRoundRobinPolicy)' # optional
34+
```
35+
{% endcode %}
36+
37+
### Example (Astra DB)
38+
39+
{% code title="feature_store.yaml" %}
40+
```yaml
41+
project: my_feature_repo
42+
registry: data/registry.db
43+
provider: local
44+
online_store:
45+
type: cassandra
46+
secure_bundle_path: /path/to/secure/bundle.zip
47+
keyspace: KeyspaceName
48+
username: Client_ID
49+
password: Client_Secret
50+
protocol_version: 4 # optional
51+
load_balancing: # optional
52+
local_dc: 'eu-central-1' # optional
53+
load_balancing_policy: 'TokenAwarePolicy(DCAwareRoundRobinPolicy)' # optional
54+
55+
```
56+
{% endcode %}
57+
58+
For a full explanation of the configuration options, see the file
59+
`sdk/python/feast/infra/online_stores/contrib/cassandra_online_store/README.md`.
60+
61+
Storage specifications can be found at `docs/specs/online_store_format.md`.
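
The two examples above differ mainly in how the cluster is located: a list of `hosts` for a self-managed Cassandra cluster, or a `secure_bundle_path` for Astra DB. The following is a minimal sketch of telling the two apart, assuming the `online_store` settings have already been parsed into a plain dictionary. The helper name `connection_kind` and the rule that exactly one of the two keys must be present are illustrative assumptions, not something stated on this page.

```python
from typing import Any, Dict


def connection_kind(online_store: Dict[str, Any]) -> str:
    """Classify an `online_store` mapping as 'cassandra' or 'astra'.

    Hypothetical helper: the "exactly one of hosts / secure_bundle_path"
    rule is an assumption made for this sketch.
    """
    has_hosts = bool(online_store.get("hosts"))
    has_bundle = bool(online_store.get("secure_bundle_path"))
    if has_hosts == has_bundle:
        raise ValueError("set exactly one of 'hosts' or 'secure_bundle_path'")
    return "cassandra" if has_hosts else "astra"


# Values copied from the two YAML examples above.
cassandra_cfg = {
    "type": "cassandra",
    "hosts": ["192.168.1.1", "192.168.1.2", "192.168.1.3"],
    "keyspace": "KeyspaceName",
    "port": 9042,
}
astra_cfg = {
    "type": "cassandra",
    "secure_bundle_path": "/path/to/secure/bundle.zip",
    "keyspace": "KeyspaceName",
    "username": "Client_ID",
    "password": "Client_Secret",
}

print(connection_kind(cassandra_cfg))  # cassandra
print(connection_kind(astra_cfg))      # astra
```

Both configurations keep `type: cassandra`; only the way the cluster is addressed changes.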

docs/roadmap.md

+1-1
@@ -35,7 +35,7 @@ The list below contains the functionality that contributors are planning to deve
3535
* [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
3636
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
3737
* [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
38-
* [x] [Cassandra / AstraDB](https://github.com/datastaxdevs/feast-cassandra-online-store)
38+
* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
3939
* [ ] Bigtable (in progress)
4040
* **Feature Engineering**
4141
* [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))

docs/specs/online_store_format.md

+80
@@ -92,6 +92,86 @@ Other types of entity keys are not supported in this version of the specificatio
9292

9393
![Datastore Online Example](datastore_online_example.png)
9494

95+
## Cassandra/Astra DB Online Store Format
96+
97+
### Overview
98+
99+
Apache Cassandra™ is a table-oriented NoSQL distributed database. Astra DB is a managed database-as-a-service
100+
built on Cassandra, and is treated as equivalent to Cassandra in what follows.
101+
102+
In Cassandra, tables are grouped in _keyspaces_ (groups of related tables). Each table is composed of
103+
_rows_, each containing data for a given set of _columns_. Moreover, rows are grouped in _partitions_ according
104+
to a _partition key_ (a portion of the uniqueness-defining _primary key_ set of columns), so that all rows
105+
with the same values for the partition key are guaranteed to be stored on the same Cassandra nodes, next to each other,
106+
which allows them to be retrieved quickly.
107+
108+
This architecture makes Cassandra a good fit for an online feature store in Feast.
109+
110+
### Cassandra Online Store Format
111+
112+
Each project (denoted by its name, called "feature store name" elsewhere) may contain an
113+
arbitrary number of `FeatureView`s: each of these corresponds to a specific table, and
114+
all tables for a project are to be contained in a single keyspace. The keyspace should
115+
have been created beforehand by the Feast user and is to be specified in the feature store
116+
configuration `yaml`.
117+
118+
The table for a project `project` and feature view `FeatureView` will have the name
119+
`project_FeatureView` (e.g. `feature_repo_driver_hourly_stats`).
120+
121+
All tables have the same structure. Cassandra is schemaful and the columns are strongly typed.
122+
In the following table schema (which also serves as a Chebotko diagram) the Python
123+
and Cassandra data types are both specified:
124+
125+
|Table: |`<project>`_`<FeatureView>` | | _(Python type)_ |
126+
|---------------|-----------------------------|--|----------------------|
127+
|`entity_key` |`TEXT` |K | `str` |
128+
|`feature_name` |`TEXT` |C↑| `str` |
129+
|`value` |`BLOB` | | `bytes` |
130+
|`event_ts` |`TIMESTAMP` | | `datetime.datetime` |
131+
|`created_ts` |`TIMESTAMP` | | `datetime.datetime` |
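
The naming rule and the schema table above can be sketched in code as follows. This is a minimal illustration, assuming that `K` marks the partition key and `C↑` an ascending clustering column. The function names and the exact CQL text are assumptions of this sketch, not the statement the implementation necessarily issues.

```python
def table_name(project: str, feature_view: str) -> str:
    """Return the `<project>_<FeatureView>` table name described above."""
    return f"{project}_{feature_view}"


def create_table_cql(keyspace: str, project: str, feature_view: str) -> str:
    """Build a CREATE TABLE statement mirroring the schema table.

    `entity_key` is the partition key (K) and `feature_name` is a
    clustering column in ascending order. Hypothetical helper.
    """
    # Double quotes keep mixed-case identifiers such as "KeyspaceName" intact.
    fq_table = f'"{keyspace}"."{table_name(project, feature_view)}"'
    return (
        f"CREATE TABLE IF NOT EXISTS {fq_table} ("
        "entity_key TEXT, "
        "feature_name TEXT, "
        "value BLOB, "
        "event_ts TIMESTAMP, "
        "created_ts TIMESTAMP, "
        "PRIMARY KEY ((entity_key), feature_name)"
        ") WITH CLUSTERING ORDER BY (feature_name ASC);"
    )


print(table_name("feature_repo", "driver_hourly_stats"))
print(create_table_cql("KeyspaceName", "feature_repo", "driver_hourly_stats"))
```

With `entity_key` alone as the partition key, all features of one entity in a feature view land in the same partition.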
132+
133+
Each row in the table represents a single value for a feature in a feature view,
134+
and is thus associated with a specific entity. The choice of partitioning ensures that,
135+
within a given feature view (i.e. a single table), for a given entity any number
136+
of features can be retrieved with a single, best-practice-respecting query
137+
(which is what happens in the `online_read` method implementation).
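
A single-partition read of this kind can be sketched as below. The helper name, the default column list, and the exact CQL text are assumptions for illustration; the `?` is a bind marker to be filled with the hex-encoded entity key when the statement is executed.

```python
from typing import Optional, Sequence


def select_features_cql(
    keyspace: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
) -> str:
    """Build a query that reads every feature row of one entity.

    Restricting only on the partition key (`entity_key`) keeps the read
    inside a single partition. Hypothetical helper.
    """
    cols = ", ".join(columns) if columns else "feature_name, value, event_ts"
    return f'SELECT {cols} FROM "{keyspace}"."{table}" WHERE entity_key = ?;'


print(select_features_cql("KeyspaceName", "feature_repo_driver_hourly_stats"))
```

Each returned row carries one `feature_name`, so the caller can regroup the rows into a feature dictionary for the entity.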
138+
139+
140+
The `entity_key` column is computed as `serialize_entity_key(entityKey).hex()`,
141+
where `entityKey` is of type `feast.protos.feast.types.EntityKey_pb2.EntityKey`.
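
To make the hex string concrete, the sketch below rebuilds the `entity_key` of the example row given later in this document (`driver_id = 1002`) using only the standard library. The byte layout is inferred from that example and is an assumption: it is not Feast's own `serialize_entity_key`, whose output may differ depending on the serialization version in use. The names `KEY_NAME_TAG`, `VALUE_TAG` and `sketch_entity_key_hex` are hypothetical.

```python
import struct

KEY_NAME_TAG = 2  # 4-byte tag seen before the join-key name in the example
VALUE_TAG = 4     # 4-byte tag seen before the join-key value in the example


def sketch_entity_key_hex(join_key: str, value: int) -> str:
    """Reproduce the example layout: tag, key name, tag, length, value."""
    key_part = struct.pack("<I", KEY_NAME_TAG) + join_key.encode("utf8")
    # The example stores the integer as 4 little-endian bytes.
    value_bytes = struct.pack("<l", value)
    value_part = (
        struct.pack("<I", VALUE_TAG)
        + struct.pack("<I", len(value_bytes))
        + value_bytes
    )
    return (key_part + value_part).hex()


print(sketch_entity_key_hex("driver_id", 1002))
# 020000006472697665725f69640400000004000000ea030000
```

Storing the key as hex text keeps the partition key a plain `TEXT` column, which makes it easy to inspect by eye.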
142+
143+
The value of `feature_name` is the plain-text name of the feature as defined
144+
in the corresponding `FeatureView`.
145+
146+
For `value`, the bytes from `[protoValue].SerializeToString()`
147+
are used, where `protoValue` is of type `feast.protos.feast.types.Value_pb2.Value`.
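
The `value` bytes of the example row given later in this document can be reproduced by hand. The sketch below assumes that the leading byte `0x35` is the protobuf tag for field number 6 with wire type 5 (a 32-bit value), followed by the float in little-endian order. It mimics the wire format with `struct` rather than calling the generated `Value` class, and the names are hypothetical.

```python
import struct

FLOAT_VAL_TAG = (6 << 3) | 5  # field 6, wire type 5 (32-bit) == 0x35


def sketch_float_value_bytes(x: float) -> bytes:
    """Encode a float the way the example `value` cell is laid out."""
    return bytes([FLOAT_VAL_TAG]) + struct.pack("<f", x)


encoded = sketch_float_value_bytes(0.9273980259895325)
print(encoded.hex())  # 35f5696d3f

# Decoding the trailing four bytes recovers the 32-bit float.
decoded = struct.unpack("<f", encoded[1:])[0]
print(decoded)
```

The five bytes are the same as the `b'5\xf5im?'` literal quoted in the example table.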
148+
149+
Column `event_ts` stores the timestamp the feature value refers to, as passed
150+
to the store method. Conversely, column `created_ts`, meant to store the write
151+
time for the entry, is being deprecated and is never written by this
152+
online-store implementation. Thanks to the internal storage mechanism of Cassandra,
153+
this does not incur a noticeable performance penalty (hence, for the time being,
154+
the column can be maintained in the schema).
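
One way to leave `created_ts` unwritten is to omit the column from the write statement altogether, rather than binding an explicit null (which Cassandra would store as a tombstone). A minimal sketch follows; the helper name and the exact CQL text are assumptions for illustration.

```python
def insert_feature_cql(keyspace: str, table: str) -> str:
    """Build an INSERT that writes one feature value for one entity.

    `created_ts` is deliberately left out of the column list, so the
    cell is simply absent instead of being set to null. Hypothetical helper.
    """
    return (
        f'INSERT INTO "{keyspace}"."{table}" '
        "(entity_key, feature_name, value, event_ts) "
        "VALUES (?, ?, ?, ?);"
    )


print(insert_feature_cql("KeyspaceName", "feature_repo_driver_hourly_stats"))
```

A later read of `created_ts` for such a row returns `null`, as in the example entry in this document.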
155+
156+
### Example entry
157+
158+
For a project `feature_repo` and feature view named `driver_hourly_stats`,
159+
a typical row in table `feature_repo_driver_hourly_stats` might look like:
160+
161+
|Column |content | notes |
162+
|---------------|-----------------------------------------------------|-------------------------------------------------------------------|
163+
|`entity_key` |`020000006472697665725f69640400000004000000ea030000` | from `"driver_id = 1002"` |
164+
|`feature_name` |`conv_rate` | |
165+
|`value` |`0x35f5696d3f` | from `float_val: 0.9273980259895325`, i.e. `(b'5\xf5im?').hex()` |
166+
|`event_ts` |`2022-07-07 09:00:00.000000+0000` | from `datetime.datetime(2022, 7, 7, 9, 0)` |
167+
|`created_ts` |`null` | not explicitly written to avoid unnecessary tombstones |
168+
169+
### Known Issues
170+
171+
If a `FeatureView` ever gets _re-defined_ in a schema-breaking way, the implementation is not able to rearrange the
172+
schema of the underlying table accordingly (it can neither drop all the data nor, even less so, preserve it).
173+
Such redefinitions should be avoided, since they can cause data-retrieval issues throughout Feast.
174+
95175
# Appendix
96176

97177
##### Appendix A. Value proto format.

sdk/python/docs/index.rst

+7
@@ -287,6 +287,13 @@ HBase Online Store
287287
:members:
288288
:noindex:
289289

290+
Cassandra Online Store
291+
-----------------------
292+
293+
.. automodule:: feast.infra.online_stores.contrib.cassandra_online_store.cassandra_online_store
294+
:members:
295+
:noindex:
296+
290297

291298
Batch Materialization Engine
292299
============================

sdk/python/docs/source/feast.infra.offline_stores.contrib.rst

+14-6
@@ -14,18 +14,26 @@ Subpackages
1414
Submodules
1515
----------
1616

17-
feast.infra.offline\_stores.contrib.contrib\_repo\_configuration module
18-
-----------------------------------------------------------------------
17+
feast.infra.offline\_stores.contrib.postgres\_repo\_configuration module
18+
------------------------------------------------------------------------
1919

20-
.. automodule:: feast.infra.offline_stores.contrib.contrib_repo_configuration
20+
.. automodule:: feast.infra.offline_stores.contrib.postgres_repo_configuration
2121
:members:
2222
:undoc-members:
2323
:show-inheritance:
2424

25-
feast.infra.offline\_stores.contrib.postgres\_repo\_configuration module
26-
------------------------------------------------------------------------
25+
feast.infra.offline\_stores.contrib.spark\_repo\_configuration module
26+
---------------------------------------------------------------------
2727

28-
.. automodule:: feast.infra.offline_stores.contrib.postgres_repo_configuration
28+
.. automodule:: feast.infra.offline_stores.contrib.spark_repo_configuration
29+
:members:
30+
:undoc-members:
31+
:show-inheritance:
32+
33+
feast.infra.offline\_stores.contrib.trino\_repo\_configuration module
34+
---------------------------------------------------------------------
35+
36+
.. automodule:: feast.infra.offline_stores.contrib.trino_repo_configuration
2937
:members:
3038
:undoc-members:
3139
:show-inheritance:
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
feast.infra.online\_stores.contrib.cassandra\_online\_store package
2+
===================================================================
3+
4+
Submodules
5+
----------
6+
7+
feast.infra.online\_stores.contrib.cassandra\_online\_store.cassandra\_online\_store module
8+
-------------------------------------------------------------------------------------------
9+
10+
.. automodule:: feast.infra.online_stores.contrib.cassandra_online_store.cassandra_online_store
11+
:members:
12+
:undoc-members:
13+
:show-inheritance:
14+
15+
Module contents
16+
---------------
17+
18+
.. automodule:: feast.infra.online_stores.contrib.cassandra_online_store
19+
:members:
20+
:undoc-members:
21+
:show-inheritance:

sdk/python/docs/source/feast.infra.online_stores.contrib.rst

+1
@@ -7,6 +7,7 @@ Subpackages
77
.. toctree::
88
:maxdepth: 4
99

10+
feast.infra.online_stores.contrib.cassandra_online_store
1011
feast.infra.online_stores.contrib.hbase_online_store
1112

1213
Submodules

sdk/python/docs/source/feast.rst

+8
@@ -169,6 +169,14 @@ feast.field module
169169
:undoc-members:
170170
:show-inheritance:
171171

172+
feast.file\_utils module
173+
------------------------
174+
175+
.. automodule:: feast.file_utils
176+
:members:
177+
:undoc-members:
178+
:show-inheritance:
179+
172180
feast.flags\_helper module
173181
--------------------------
174182

sdk/python/docs/source/index.rst

+7
@@ -287,6 +287,13 @@ HBase Online Store
287287
:members:
288288
:noindex:
289289

290+
Cassandra Online Store
291+
-----------------------
292+
293+
.. automodule:: feast.infra.online_stores.contrib.cassandra_online_store.cassandra_online_store
294+
:members:
295+
:noindex:
296+
290297

291298
Batch Materialization Engine
292299
============================

sdk/python/feast/cli.py

+1-1
@@ -590,7 +590,7 @@ def materialize_incremental_command(ctx: click.Context, end_ts: str, views: List
590590
"--template",
591591
"-t",
592592
type=click.Choice(
593-
["local", "gcp", "aws", "snowflake", "spark", "postgres", "hbase"],
593+
["local", "gcp", "aws", "snowflake", "spark", "postgres", "hbase", "cassandra"],
594594
case_sensitive=False,
595595
),
596596
help="Specify a template for the created project",
