-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add leftsemi merge #57979
ENH: Add leftsemi merge #57979
Conversation
khiter_t k | ||
Int64Vector locs = Int64Vector() | ||
Int64Vector self_locs = Int64Vector() | ||
Int64VectorData *l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic nit but would be better to not use single character variable names, especially those that conflict with debugger aliases. The cython debugger is hard enough to use as is :-)
Int64Vector self_locs = Int64Vector() | ||
Int64VectorData *l | ||
Int64VectorData *sl | ||
# mask not implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we raise NotImplementedError here for now?
|
||
for i in range(n): | ||
val = values[i] | ||
hash(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the point of this just to raise an error if an object is not hashable?
val = values[i] | ||
hash(val) | ||
|
||
k = kh_get_pymap(self.table, <PyObject*>val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cast required here? I assume it would know we are already working with PyObjects given the declaration of values
@@ -1683,7 +1681,7 @@ def get_join_indexers( | |||
left_keys: list[ArrayLike], | |||
right_keys: list[ArrayLike], | |||
sort: bool = False, | |||
how: JoinHow = "inner", | |||
how: JoinHow + Literal["leftsemi"] = "inner", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why we don't just add leftsemi
to the JoinHow
definition?
tm.assert_frame_equal(result, expected.head(1)) | ||
|
||
|
||
def test_leftsemi_invalid(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for raising NotImplementederror when passing a mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerned about whether this argument will apply to DataFrame.join()
.
@@ -326,6 +326,11 @@ | |||
join; sort keys lexicographically. | |||
* inner: use intersection of keys from both frames, similar to a SQL inner | |||
join; preserve the order of the left keys. | |||
* leftsemi: Filter for rows in the left that have a match on the right; | |||
preserve the order of the left keys. Doesn't support `left_index`, `right_index`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preserve the order of the left keys. Doesn't support `left_index`, `right_index`, | |
preserve the order of the left keys. Does not support `left_index`, `right_index`, |
@@ -447,7 +447,7 @@ def closed(self) -> bool: | |||
AnyAll = Literal["any", "all"] | |||
|
|||
# merge | |||
MergeHow = Literal["left", "right", "inner", "outer", "cross"] | |||
MergeHow = Literal["left", "right", "inner", "outer", "cross", "leftsemi"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MergeHow
is also used as the type for how
in DataFrame.join()
.
So if DataFrame.join()
will not support the "leftsemi"
operation, then you'll need to split this type into a JoinHow
and MergeHow
, with the latter dependent on the former.
If DataFrame.join()
will support that operation, then you'll have to update the docs for join
and add a test for it.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Closing to clear the queue, but feel free to update and reopen anytime |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I want to add anti joins as a follow up, which can be incorporated into the new class