Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate cudf::make_strings_column accepting typed offsets #14461

Merged
merged 11 commits into from
Dec 5, 2023

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Nov 21, 2023

Description

Deprecates the following factory functions:

std::unique_ptr<column> make_strings_column(
  cudf::device_span<char const> strings,
  cudf::device_span<size_type const> offsets,
  cudf::device_span<bitmask_type const> null_mask,
  size_type null_count,
  rmm::cuda_stream_view stream,
  rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> make_strings_column(size_type num_strings,
  rmm::device_uvector<size_type>&& offsets,
  rmm::device_uvector<char>&& chars,
  rmm::device_buffer&& null_mask,
  size_type null_count);

The first one is not needed, rarely used, and inefficient. The only callers found can use other, more efficient functions which do not require copying the device data. The 2nd one is only used in 8/150 places and can be easily recoded in place to use type-erased factory function instead.
Removing these APIs helps reduce the number of functions that require type-specific offsets to make it easier to support offsets of larger types in the future.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function breaking Breaking change deprecation labels Nov 21, 2023
@davidwendt davidwendt self-assigned this Nov 21, 2023
@github-actions github-actions bot added the Java Affects Java cuDF API. label Nov 22, 2023
@davidwendt davidwendt changed the title Deprecate cudf::make_strings_column accepting device-span offsets, chars Deprecate cudf::make_strings_column accepting typed offsets Nov 22, 2023
@davidwendt davidwendt marked this pull request as ready for review November 22, 2023 21:23
@davidwendt davidwendt requested review from a team as code owners November 22, 2023 21:23
@davidwendt davidwendt marked this pull request as draft November 22, 2023 21:23
@davidwendt davidwendt marked this pull request as ready for review November 30, 2023 15:47
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some follow-up questions

template <> // CUDF_TYPE_MAPPING(char,INT8) causes duplicate id_to_type_impl definition
constexpr inline type_id type_to_id<char>()
{
return type_id::INT8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any need to have actual CHAR data type in cudf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Because we would just remove it with PR #14202

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking outside of the context of strings columns. Just a char column as such.

Copy link
Contributor Author

@davidwendt davidwendt Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot think of a reason to have one outside of strings.
I fear it would create issues in how the column data is interpretted/encoded.

cpp/include/cudf/column/column_factories.hpp Show resolved Hide resolved
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 1, 2023
@davidwendt
Copy link
Contributor Author

@GregoryKimball asked me to cc @etseidl on this PR in case you are using these APIs in any current pending work.
The deprecated APIs will likely be removed in release 24.04.

@etseidl
Copy link
Contributor

etseidl commented Dec 1, 2023

The deprecated APIs will likely be removed in release 24.04.

Thanks for the heads up...I believe I looked at this PR earlier and didn't think there'd be any problems...I'll do a closer look in a bit.

Edit: I only seem to use

std::unique_ptr<column> make_strings_column(size_type num_strings,
                                            std::unique_ptr<column> offsets_column,
                                            std::unique_ptr<column> chars_column,
                                            size_type null_count,
                                            rmm::device_buffer&& null_mask);

so I'm good.

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 434df44 into rapidsai:branch-24.02 Dec 5, 2023
67 checks passed
@davidwendt davidwendt deleted the remove-strings-factory branch December 5, 2023 12:05
rapids-bot bot pushed a commit that referenced this pull request Jan 4, 2024
A call to a deprecated `cudf::make_strings_column` factory function was introduced in #14664.
This call had been previously fixed in #14461 but was lost when the source file was moved in #14664.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14695
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants