Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Remove unused code for streaming position deletes #11175

Merged
merged 3 commits into from
Sep 26, 2024

Conversation

wypoon
Copy link
Contributor

@wypoon wypoon commented Sep 20, 2024

Follow up to #9117.
Prior to that PR, there were two code paths in DeleteFilter for applying position deletes: if the number of position deletes were below a certain number, we use the PositionDeleteIndex bitmap and otherwise we use a streaming filter. That PR makes it so that we always use the first code path, but did not remove the now unused code for the second code path.
This PR is to remove most of the unused code. It includes removing tests which were only added to test that second code path as well as bugs found in that code path (#8132). The streamingFilter and streamingMarker public methods in Deletes are not removed but are deprecated; they can be removed in the version after 1.7. However, the implementation of the methods is replaced by the much simpler implementation using the PositionDeleteIndex bitmap (first code path mentioned above) and the backing classes for the streaming implementation is removed.

@github-actions github-actions bot added the core label Sep 20, 2024
@wypoon wypoon changed the title [DRAFT] Remove unused code for streaming position deletes Core: Remove unused code for streaming position deletes Sep 20, 2024
@wypoon
Copy link
Contributor Author

wypoon commented Sep 20, 2024

@amogh-jahagirdar @szehon-ho @aokolnychyi please review.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wypoon but I think we should deprecate these APIs and remove at a future point (2.0) instead of removing outright.

core/src/main/java/org/apache/iceberg/deletes/Deletes.java Outdated Show resolved Hide resolved
Implement them the same way that the non-streaming versions are implemented
so we can remove the classes that are no longer needed.
@wypoon
Copy link
Contributor Author

wypoon commented Sep 20, 2024

@amogh-jahagirdar I have restored the removed public static methods in Deletes and deprecated them instead. However, I have replaced their implementations with the equivalent of using the non-streaming code path.
See

// if there are fewer deletes than a reasonable number to keep in memory, use a set
if (posDeletes.stream().mapToLong(DeleteFile::recordCount).sum() < setFilterThreshold) {
PositionDeleteIndex positionIndex = Deletes.toPositionIndex(filePath, deletes);
Predicate<T> isDeleted = record -> positionIndex.isDeleted(pos(record));
return createDeleteIterable(records, isDeleted);
}
return hasIsDeletedColumn
? Deletes.streamingMarker(
records, this::pos, Deletes.deletePositions(filePath, deletes), this::markRowDeleted)
: Deletes.streamingFilter(
records, this::pos, Deletes.deletePositions(filePath, deletes), counter);
}

and
private CloseableIterable<T> createDeleteIterable(
CloseableIterable<T> records, Predicate<T> isDeleted) {
return hasIsDeletedColumn
? Deletes.markDeleted(records, isDeleted, this::markRowDeleted)
: Deletes.filterDeleted(records, isDeleted, counter);
}

for reference.
The same change is in #11177 except there the tests using Deletes.streamingFilter and Deletes.streamingMarker have not been removed, so you can see that the tests pass (for myself, I ran the tests locally). Since the objective is to remove code, I did not want to restore the tests here. #11177 is not intended for merging.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wypoon! I'll wait a bit in case @nastra had anything else, and then i'll merge.

@@ -110,155 +106,6 @@ public void testPositionMerging() {
.containsExactly(0L, 3L, 3L, 9L, 16L, 19L, 19L, 22L, 22L, 56L, 63L, 70L, 91L);
}

@Test
public void testPositionStreamRowFilter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer not removing the tests until the deprecated method is actually removed from the code, but I think in this case it's fine since it's not a typical public API and rather a utility.

core/src/main/java/org/apache/iceberg/deletes/Deletes.java Outdated Show resolved Hide resolved
@wypoon
Copy link
Contributor Author

wypoon commented Sep 25, 2024

The Flink CI issues are unrelated to this change. All tests passed prior to my last commit, and the only change in the last commit was updating javadoc comments.

@amogh-jahagirdar
Copy link
Contributor

Thanks @wypoon , merging!

@amogh-jahagirdar amogh-jahagirdar merged commit b1d38b3 into apache:main Sep 26, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants