Implement iloc-getitem using parse-don't-validate approach #13534

Merged: 25 commits, Jul 14, 2023

Commits on Jun 23, 2023

  1. Add iloc-getitem benchmarks

    wence- committed Jun 23, 2023
    93c1d21
  2. 64b093e
  3. 6b649bd
  4. Implement iloc-getitem using parse-don't-validate approach

    To simplify the low-level implementation of iloc-based getitem on both
    Series and DataFrames, change the dispatching approach to parse the
    user-provided "unstructured" key into structured data (a tagged
    union using an enum + tuple). At the libcudf level, there are four
    styles of indexing we can do:
    
    1. index by slice
    2. index by mask
    3. index by map
    4. index by scalar
    
    iloc keys are parsed into information that tags them by type and
    normalises the key to an appropriate column or other low-level object.
    
    This centralises the business logic for index parsing in a
    single place, and ensures that downstream consumers of the validated
    and normalised indexer don't need to inspect it again to determine
    what to do. Note that we treat index by scalar as a composition of
    index by map with get_element (this simplifies the logic when
    extracting a single row of a dataframe, which we want to keep on
    device), but the scalar "type tag" allows us to determine this
    unambiguously without reinspecting the key.
    
    The major benefits will come when updating loc-based getitem (where
    the parsing rules are more complicated, but eventually turn into one
    of the above four cases). In this latter case, we will no longer
    attempt to turn a loc-based key into a "user-facing" key for iloc, but
    rather will call directly into the pre-parsed interface.
    
    That said, this change already improves performance, since the key
    is only inspected once.
    
    - Closes rapidsai#13013
    - Closes rapidsai#13267
    - Closes rapidsai#13515
    wence- committed Jun 23, 2023
    5094f49
  5. 8ad58a8
  6. Use _gather for scalar indexing

    Can't use libcudf.copying.gather since we need to do some
    post-processing on categorical and struct columns. Staying in the
    Series API gets us that for free.
    wence- committed Jun 23, 2023
    b43d93a
  7. Introduce GatherMap and BooleanMask

    Also use dataclasses as poor man's ADTs rather than a tuple with a
    tag field.
    
    Some renaming.
    wence- committed Jun 23, 2023
    a479a34
  8. Minor simplifications

    wence- committed Jun 23, 2023
    5e4af4a
  9. bc44c3f
  10. 503f4ae
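
The parse-don't-validate idea from the "Implement iloc-getitem" and "Introduce GatherMap and BooleanMask" commits above can be sketched in plain Python. This is a minimal illustration, not the cudf implementation: it uses Python lists in place of device columns, and the names SliceIndexer, MaskIndexer, MapIndexer, ScalarIndexer, parse_iloc_key and take are hypothetical. The key is inspected once, turned into one of four tagged dataclasses, and the consumer dispatches on the tag alone.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical tag types, one per low-level indexing style.
@dataclass(frozen=True)
class SliceIndexer:
    key: range  # concrete row positions; start/stop/step already resolved


@dataclass(frozen=True)
class MaskIndexer:
    key: List[bool]  # length already checked against the row count


@dataclass(frozen=True)
class MapIndexer:
    key: List[int]  # negative indices wrapped, bounds already checked


@dataclass(frozen=True)
class ScalarIndexer:
    key: List[int]  # a one-element gather map


Indexer = Union[SliceIndexer, MaskIndexer, MapIndexer, ScalarIndexer]


def _wrap(i: int, nrows: int) -> int:
    """Bounds-check a single position and wrap negative values."""
    if not -nrows <= i < nrows:
        raise IndexError(f"index {i} out of bounds for length {nrows}")
    return i + nrows if i < 0 else i


def parse_iloc_key(key, nrows: int) -> Indexer:
    """Inspect the unstructured key once and return a tagged, normalised form."""
    if isinstance(key, slice):
        return SliceIndexer(range(*key.indices(nrows)))
    if isinstance(key, bool):
        raise TypeError("a lone boolean is not a valid positional key")
    if isinstance(key, int):
        return ScalarIndexer([_wrap(key, nrows)])
    if isinstance(key, list):
        if key and all(isinstance(k, bool) for k in key):
            if len(key) != nrows:
                raise IndexError("boolean mask has the wrong length")
            return MaskIndexer(list(key))
        if all(isinstance(k, int) and not isinstance(k, bool) for k in key):
            return MapIndexer([_wrap(k, nrows) for k in key])
    raise TypeError(f"unsupported iloc key: {key!r}")


def take(rows: list, indexer: Indexer):
    """Consume a parsed indexer; only the tag is examined, never the raw key."""
    if isinstance(indexer, SliceIndexer):
        return [rows[i] for i in indexer.key]
    if isinstance(indexer, MaskIndexer):
        return [row for row, keep in zip(rows, indexer.key) if keep]
    gathered = [rows[i] for i in indexer.key]
    if isinstance(indexer, ScalarIndexer):
        # Scalar indexing is a gather followed by extracting the element.
        return gathered[0]
    return gathered
```

For example, take(rows, parse_iloc_key(-1, len(rows))) returns the last row. Treating the scalar case as a one-element gather followed by element extraction mirrors the composition described in the commit message, while the separate tag tells the consumer which result shape to produce.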

Commits on Jun 29, 2023

  1. 19637fa
  2. dbf56b8

Commits on Jun 30, 2023

  1. ad1b21a

Commits on Jul 11, 2023

  1. Refactor GatherMap and BooleanMask construction

    Rather than having free functions to construct the witness types, the
    default constructor validates correctness, and a classmethod
    from_column_unchecked allows one to build a witness type asserting
    correctness by fiat.
    wence- committed Jul 11, 2023
    b763ebb
  2. Remove walrus

    wence- committed Jul 11, 2023
    12e66fc
  3. Adapt benchmark

    wence- committed Jul 11, 2023
    b92638d
  4. Minor docstring fixes

    wence- committed Jul 11, 2023
    99d3da1
  5. b046539
  6. Clarify scope of pytest.raises

    wence- committed Jul 11, 2023
    1ace86a
  7. Numpydoc formatting

    wence- committed Jul 11, 2023
    e547372
  8. Simplify clamping to range

    wence- committed Jul 11, 2023
    892ee14
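
The witness-type pattern from the "Refactor GatherMap and BooleanMask construction" commit above can be sketched as follows. This is an illustration under stated assumptions rather than the cudf code: columns are plain Python lists, the constructor signatures (including the nrows argument) and the gather helper are hypothetical, and only the names GatherMap, BooleanMask and from_column_unchecked come from the commit messages. The default constructor validates its input, while the classmethod skips validation and simply asserts that the caller has already guaranteed correctness.

```python
from typing import List


class GatherMap:
    """Witness that `column` holds in-bounds, non-negative row positions."""

    def __init__(self, column: List[int], nrows: int):
        wrapped = []
        for i in column:
            if isinstance(i, bool) or not isinstance(i, int):
                raise TypeError("gather map must contain integers")
            if not -nrows <= i < nrows:
                raise IndexError(f"index {i} out of bounds for length {nrows}")
            wrapped.append(i + nrows if i < 0 else i)
        self.column = wrapped
        self.nrows = nrows

    @classmethod
    def from_column_unchecked(cls, column: List[int], nrows: int) -> "GatherMap":
        # No checks: the caller asserts that `column` is already valid.
        self = cls.__new__(cls)
        self.column = column
        self.nrows = nrows
        return self


class BooleanMask:
    """Witness that `column` is a boolean list with exactly `nrows` entries."""

    def __init__(self, column: List[bool], nrows: int):
        if not all(isinstance(b, bool) for b in column):
            raise TypeError("mask must contain booleans")
        if len(column) != nrows:
            raise IndexError("mask has the wrong length")
        self.column = column

    @classmethod
    def from_column_unchecked(cls, column: List[bool]) -> "BooleanMask":
        # No checks: the caller asserts that `column` is already valid.
        self = cls.__new__(cls)
        self.column = column
        return self


def gather(rows: list, gather_map: GatherMap) -> list:
    """Hypothetical consumer: trusts the witness and skips bounds checks."""
    return [rows[i] for i in gather_map.column]
```

The unchecked path is useful when indices are produced internally (for instance by another operation known to yield in-bounds positions), so a second validation pass would be wasted work. The cost is that a wrong assertion is no longer caught at construction time.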

Commits on Jul 12, 2023

  1. 803fbc0
  2. A few more small fixes

    wence- committed Jul 12, 2023
    762eb1c
  3. 943c58e

Commits on Jul 13, 2023

  1. dffdc4e