
Commit 1ab97f4

JohnLemmonMedely authored and tqtensor committed
fix: Bugfix for grabbing historical data from Snowflake with array type features. (feast-dev#3964)
Bugfix for grabbing historical data from Snowflake with array type features that are null for an entity. Update docs to reflect array support in Snowflake.

Signed-off-by: john.lemmon <john.lemmon@medely.com>
1 parent 8331762 commit 1ab97f4

4 files changed (+40 -12 lines)

docs/reference/data-sources/overview.md (+10 -10)
```diff
@@ -19,13 +19,13 @@ Details for each specific data source can be found [here](README.md).
 Below is a matrix indicating which data sources support which types.
 
 | | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
-| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| `bytes` | yes | yes | yes | yes | yes | yes | yes |
-| `string` | yes | yes | yes | yes | yes | yes | yes |
-| `int32` | yes | yes | yes | yes | yes | yes | yes |
-| `int64` | yes | yes | yes | yes | yes | yes | yes |
-| `float32` | yes | yes | yes | yes | yes | yes | yes |
-| `float64` | yes | yes | yes | yes | yes | yes | yes |
-| `bool` | yes | yes | yes | yes | yes | yes | yes |
-| `timestamp` | yes | yes | yes | yes | yes | yes | yes |
-| array types | yes | yes | no | no | yes | yes | no |
+| :-------------------------------- | :-- | :-- |:----------| :-- | :-- | :-- | :-- |
+| `bytes` | yes | yes | yes | yes | yes | yes | yes |
+| `string` | yes | yes | yes | yes | yes | yes | yes |
+| `int32` | yes | yes | yes | yes | yes | yes | yes |
+| `int64` | yes | yes | yes | yes | yes | yes | yes |
+| `float32` | yes | yes | yes | yes | yes | yes | yes |
+| `float64` | yes | yes | yes | yes | yes | yes | yes |
+| `bool` | yes | yes | yes | yes | yes | yes | yes |
+| `timestamp` | yes | yes | yes | yes | yes | yes | yes |
+| array types | yes | yes | yes | no | yes | yes | no |
```

docs/reference/data-sources/snowflake.md (+1 -1)
```diff
@@ -46,5 +46,5 @@ The full set of configuration options is available [here](https://rtd.feast.dev/
 
 ## Supported Types
 
-Snowflake data sources support all eight primitive types, but currently do not support array types.
+Snowflake data sources support all eight primitive types. Array types are also supported but not with type inference.
 For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
```

sdk/python/feast/infra/offline_stores/snowflake.py (+3 -1)
```diff
@@ -463,7 +463,9 @@ def _to_df_internal(self, timeout: Optional[int] = None) -> pd.DataFrame:
                     Array(Float32),
                     Array(Bool),
                 ]:
-                    df[feature.name] = [json.loads(x) for x in df[feature.name]]
+                    df[feature.name] = [
+                        json.loads(x) if x else None for x in df[feature.name]
+                    ]
 
         return df
 
```
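
The effect of the change to the list comprehension in `_to_df_internal` can be reproduced outside Feast. The sketch below is a minimal illustration, assuming (as the new unit test does) that `fetch_pandas_all()` hands back array columns as JSON strings and a NULL array for an entity as `None`. A plain list stands in for `df[feature.name]`, and the helpers `decode_before` and `decode_after` are hypothetical names used only for this comparison; they are not part of Feast.

```python
import json
from typing import Any, List, Optional

# Hypothetical stand-in for df[feature.name]: array values serialized as JSON
# text, with None for an entity whose array feature is NULL (assumption).
raw_column: List[Optional[str]] = ['["1", "2", "3"]', None, "[]"]


def decode_before(values: List[Optional[str]]) -> List[Any]:
    # Pre-fix comprehension: json.loads(None) raises TypeError.
    return [json.loads(x) for x in values]


def decode_after(values: List[Optional[str]]) -> List[Any]:
    # Post-fix comprehension: falsy values (None or "") become None, while
    # "[]" is a non-empty string and still decodes to an empty list.
    return [json.loads(x) if x else None for x in values]


try:
    decode_before(raw_column)
except TypeError as exc:
    print("before:", exc)

print("after:", decode_after(raw_column))
```

Because the guard is a truthiness check rather than `x is not None`, an empty string is also mapped to `None`; the three inputs above mirror the valid, null, and empty cases used in the new `test_snowflake_to_df_internal`.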

sdk/python/tests/unit/infra/offline_stores/test_snowflake.py (+26)
```diff
@@ -1,14 +1,18 @@
 import re
 from unittest.mock import ANY, MagicMock, patch
 
+import pandas as pd
 import pytest
+from pytest_mock import MockFixture
 
+from feast import FeatureView, Field, FileSource
 from feast.infra.offline_stores.snowflake import (
     SnowflakeOfflineStoreConfig,
     SnowflakeRetrievalJob,
 )
 from feast.infra.online_stores.sqlite import SqliteOnlineStoreConfig
 from feast.repo_config import RepoConfig
+from feast.types import Array, String
 
 
 @pytest.fixture(params=["s3", "s3gov"])
@@ -55,3 +59,25 @@ def test_to_remote_storage(retrieval_job):
         mock_get_file_names_from_copy.assert_called_once_with(ANY, ANY)
         native_path = mock_get_file_names_from_copy.call_args[0][1]
         assert re.match("^s3://.*", native_path), "path should be s3://*"
+
+
+def test_snowflake_to_df_internal(
+    retrieval_job: SnowflakeRetrievalJob, mocker: MockFixture
+):
+    mock_execute = mocker.patch(
+        "feast.infra.offline_stores.snowflake.execute_snowflake_statement"
+    )
+    mock_execute.return_value.fetch_pandas_all.return_value = pd.DataFrame.from_dict(
+        {"feature1": ['["1", "2", "3"]', None, "[]"]}  # For Valid, Null, and Empty
+    )
+
+    feature_view = FeatureView(
+        name="my-feature-view",
+        entities=[],
+        schema=[
+            Field(name="feature1", dtype=Array(String)),
+        ],
+        source=FileSource(path="dummy.path"),  # Dummy value
+    )
+    retrieval_job._feature_views = [feature_view]
+    retrieval_job._to_df_internal()
```
