-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add __eq__, __hash__ to SparkSource for correct comparison #4028
Merged
Merged
Changes from 3 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
2f37e07
feat: Enable Arrow-based columnar data transfers
ElliotNguyen68 8eb0c59
Merge branch 'feast-dev:master' into master
ElliotNguyen68 c1c9990
fix: Add __eq__, __hash__ to SparkSource for comparision
ElliotNguyen68 add7f3a
chore: simplify the logic
ElliotNguyen68 6cba191
Merge branch 'feast-dev:master' into master
ElliotNguyen68 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nitpicking :) Won't this look better as a single expression? something like
return table == table and query == query ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they have priority, I see that there is a kind of hierarchy in the way we get the data out of sparksource in get_table_query_string function, if not table provided then read query if not read path. So I assume that there is a case that like this
sparksourceA(table = 'tableA', query=None) and after that it change to sparksourceA(table = 'tableA', query='select some thing from sometable'), how should we compare in this case. I still have some concern about the logic of providing table, query and path when init sparksource, isn't it we just need 1 of those to be provided, and 2 remained will be null, should we have some contraint on this? If yes so we can do some things like return table == table and query == query ... @tokoko @HaoXuAI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are good points, but I think we should still simply check for absolute equality, so priority doesn't really matter, if there is a change in
query
or if user changesquery
toNone
and switches to usingtable
, this method should returnFalse
. In some cases, the change might not actually have an impact i guess, but it's still a change. So a single expression should be fine.As for the question about 3 competing parameters (table, query and path), some sort of a constraint is probably a good idea, but imho the whole idea of using
SparkSource
for everything was a really bad decision in the first place. If the user wants to register a parquet folder as a data source for example, i believe the way to do that should be by providingFileSource
object instead of aSparkSource
one... To achieve that, we need to teachSparkOfflineStore
how to readFileSource
and deprecatepath
parameter fromSparkSource
... but that's a discussion for another day and a little out of scope here :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I think we can by pass the order, thanks @tokoko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @HaoXuAI , I think we can merge the pr .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @HaoXuAI , can you help me with the pr ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. thanks for the work 👍