
Error in FileOfflineStore.get_historical_features.<locals>.evaluate_historical_retrieval() #3069

Closed
franciscojavierarceo opened this issue Aug 11, 2022 · 3 comments · Fixed by #3088

Comments

@franciscojavierarceo
Member

Expected Behavior

.to_df() should return a dataframe.

Current Behavior

When trying to execute .to_df() after executing .to_df(validation_reference=<some-validation-reference/>), the self.evaluation_function().compute() call in the _to_df_internal() method inside the FileRetrievalJob class fails.

While the Data Quality monitoring feature that is implemented within validation_reference is still in alpha, a plain .to_df() call should not be affected by it.

Steps to reproduce

I've provided a minimally reproducible example in this notebook

Specifications

  • Version: 0.23
  • Platform: Python3.8
  • Subsystem:

Possible Solution

Following the stack trace, it appears that there's an issue with the created date.

@felixwang9817
Collaborator

hey @franciscojavierarceo, thanks for reporting this and adding a notebook! I was able to repro the bug very easily

I think the issue is that your saved dataset is being stored in data/driver_stats.parquet, which is the location of the file source - hence the file source is being overwritten (and in particular, the data at that location no longer has a created column), and so once you try to run the historical retrieval job again, it fails since the created column has essentially been deleted from its perspective

is there a particular reason you're trying to save the dataset to data/driver_stats.parquet?
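
The failure mode described above can be illustrated in miniature. This is only a sketch using the standard library and a CSV file, not Feast code: the helper names, the `.csv` file, and every column other than `created` are assumptions made for illustration. It shows that once a result without a `created` column is written over the source file, a later read that expects that column fails.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, fieldnames, rows):
    """Write rows to a CSV file, replacing whatever was there before."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_created_column(path):
    """Read the 'created' column; raises KeyError if the column is gone."""
    with open(path, newline="") as f:
        return [row["created"] for row in csv.DictReader(f)]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the file source.
        source = Path(tmp) / "driver_stats.csv"
        write_rows(
            source,
            ["driver_id", "event_timestamp", "created", "conv_rate"],
            [{"driver_id": "1001", "event_timestamp": "2022-08-10",
              "created": "2022-08-11", "conv_rate": "0.5"}],
        )
        first_read = read_created_column(source)  # works on the original source

        # Saving a result to the same path overwrites the source, and the
        # saved result has no 'created' column.
        write_rows(
            source,
            ["driver_id", "event_timestamp", "conv_rate"],
            [{"driver_id": "1001", "event_timestamp": "2022-08-10",
              "conv_rate": "0.5"}],
        )
        try:
            read_created_column(source)
        except KeyError as exc:
            return first_read, f"missing column: {exc.args[0]}"
        return first_read, "no error"
```

Calling `demo()` returns the values from the first read together with a message naming the column that went missing after the overwrite.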

@achals
Member

achals commented Aug 11, 2022

I think the issue is that your saved dataset is being stored in data/driver_stats.parquet, which is the location of the file source - hence the file source is being overwritten (and in particular, the data at that location no longer has a created column),

Should we prevent overwriting any existing files?

@felixwang9817
Collaborator

@achals yup I think that's the correct solution here
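
A minimal sketch of what such a guard might look like, assuming the check happens just before the saved dataset is written. The function name `ensure_new_path` is hypothetical and this is not the change that was actually merged for this issue; it only illustrates the idea of refusing to write to a path that already exists.

```python
from pathlib import Path


def ensure_new_path(path):
    """Return the path as a Path object, refusing paths that already exist.

    Hypothetical helper: a caller would run this before persisting a saved
    dataset, so an existing file (such as the file source) is never replaced.
    """
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite existing file: {target}")
    return target
```

With a check like this, pointing the saved dataset at data/driver_stats.parquet would fail early with a clear error instead of silently replacing the file source.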
