Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of policies and guidelines for libcudf usage. #11853

Merged
merged 6 commits into from
Oct 18, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Oct 3, 2022

Description

This PR adds a section to the developer documentation about various libcudf design decisions that affect users. These policies are important for us to document and communicate consistently. I am not sure what the best place for this information is, but I think the developer docs are a good place to start since until we address #11481 we don't have a great way to publish any non-API user-facing libcudf documentation. I've created this draft PR to solicit feedback from other libcudf devs about other policies that we should be documenting in a similar manner. Once everyone is happy with the contents, I would suggest that we merge this into the dev docs for now and then revisit a better place once we've tackled #11481.

Partly addresses #5505, #1781.

Resolves #4511.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team doc Documentation libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 3, 2022
@vyasr vyasr self-assigned this Oct 3, 2022
@vyasr vyasr added libcudf Affects libcudf (C++/CUDA) code. and removed libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function labels Oct 3, 2022
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small clarification request.

cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved

Users are therefore responsible for passing valid data into such APIs.
_Note that this policy does not mean that libcudf performs no validation whatsoever_.
libcudf APIs should still perform any validation that does not require introspection, such as checking whether input column/table sizes or dtypes are appropriate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extends to output sizes and results as well.

In general there is no validation that an operation will produce a column that fits in the 2GB limit. There are a few minor exceptions where the validation does not require an extra kernel launch.

Also, operations that may produce overflow (integer or floating-point arithmetic) are also not checked. Overflow produces undefined results. For example, integer overflow to negative values is not guaranteed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent here was not to provide exhaustive lists of things we will and will not check, but rather to provide some examples of cases to illustrate the difference between introspective and non-introspective operations. I am wary of trying to maintain a complete list that will inevitably go out of date the day after it is published. Do you think it's worth trying to keep a larger list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I would like to keep a list of things that keep being brought up (like these two here) so that I can point to this document to say this is by design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this does not need to be exhaustive, but it would be nice if this policy guide can serve as a mini-FAQ of "why is libcudf designed this way?" I think that the concrete examples @davidwendt mentioned help illustrate the philosophy of the library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some, and I've reworded this section to reflect that we want to keep a list but make no promises that it's exhaustive.

cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
@vyasr vyasr marked this pull request as ready for review October 17, 2022 21:06
@vyasr vyasr requested a review from a team as a code owner October 17, 2022 21:06
@vyasr vyasr added this to the Enable streams milestone Oct 17, 2022
Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see us writing down some things that have been tribal knowledge for a while now.

Co-authored-by: Jake Hemstad <jhemstad@nvidia.com>
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments. Thanks!

cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@vyasr
Copy link
Contributor Author

vyasr commented Oct 18, 2022

@davidwendt if you are satisfied with how I've addressed your comments feel free to merge this once you approve.

@davidwendt
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5d57159 into rapidsai:branch-22.12 Oct 18, 2022
@vyasr vyasr deleted the docs/libcudf_policies branch October 18, 2022 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team doc Documentation libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] Discuss and document expected stream synchronization behavior of libcudf functions
5 participants