[CODE REVIEW ONLY] Host tree algorithms #12

Open: wants to merge 40 commits into base `enh-json_code_reorg1`

@karthikeyann (owner) commented on Sep 19, 2024

DO NOT MERGE.
For review purpose only.

@karthikeyann changed the title from "Host tree algorithms" to "[CODE REVIEW ONLY] Host tree algorithms" on Sep 19, 2024
galipremsagar and others added 13 commits September 19, 2024 17:06
… for `cudf.pandas` (rapidsai#16739)

This PR adds GPU and CPU usage reporting to the `cudf.pandas` pytest suite. The generated metrics can be viewed on the existing pandas pytest summary page:
https://github.com/rapidsai/cudf/actions/runs/10886370333/attempts/1#summary-30220192117

![Screenshot 2024-09-16 at 2 39 07 PM](https://github.com/user-attachments/assets/6d31c7d2-8a27-4f02-bf9d-c1b40ad1d756)


Note: I'm aware of cases where both GPU and CPU usage show 0%. This has several causes, which I'm addressing in a follow-up PR.
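The commit message does not say how usage is measured. As a rough illustration only, the CPU usage of a single call can be approximated as process CPU time divided by wall-clock time. The helper `run_with_cpu_usage` is hypothetical and is not part of `cudf.pandas`. GPU usage requires a device-level query and is not covered here.

```python
import time
from typing import Any, Callable, Tuple


def run_with_cpu_usage(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Hypothetical helper: run fn once and return (result, cpu_percent)."""
    wall_start = time.perf_counter()   # monotonic wall clock
    cpu_start = time.process_time()    # CPU time of this process (user + system)
    result = fn(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    if wall_elapsed <= 0:
        # Too fast to time meaningfully: report 0% rather than divide by zero.
        return result, 0.0
    # Can exceed 100% when several threads run in parallel.
    return result, 100.0 * cpu_elapsed / wall_elapsed


# A CPU-bound call versus a call that mostly waits.
total, busy_pct = run_with_cpu_usage(sum, range(1_000_000))
_, idle_pct = run_with_cpu_usage(time.sleep, 0.05)
print(f"busy: {busy_pct:.0f}%  idle: {idle_pct:.0f}%")
```

A call that mostly waits (here, `time.sleep`) consumes almost no CPU time, so its ratio stays near zero even though wall-clock time elapses.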

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#16739

This PR fixes several documentation issues uncovered while working on rapidsai#16619. There are no actual code changes.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Mark Harris (https://github.com/harrism)

URL: rapidsai#16822

`MultiIndex._poplevel`, which backs `MultiIndex.droplevel`, works by dropping a given level in place. There are 2 places where `._poplevel` is called, and both make a shallow copy of the data first, presumably to work around the side effects of this in-place behavior.

This PR removes the `MultiIndex._poplevel` implementation and instead implements level dropping by returning a new object.
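The difference between mutating in place and returning a new object can be sketched with a toy, immutable index. `ToyMultiIndex` is hypothetical and is not cuDF's `MultiIndex`. For example, the real method also collapses to a flat `Index` when only one level remains, and this sketch skips that behavior. The point it shows is that callers no longer need a defensive shallow copy, because the original object is never modified.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyMultiIndex:
    """Hypothetical stand-in: one tuple of values per level, plus level names."""
    levels: Tuple[Tuple[object, ...], ...]
    names: Tuple[str, ...]

    def droplevel(self, level: int = 0) -> "ToyMultiIndex":
        n = len(self.levels)
        if not -n <= level < n:
            raise IndexError(f"level {level} out of range for {n} levels")
        level %= n  # resolve negative positions the way Python sequences do
        keep = [i for i in range(n) if i != level]
        # Build a new object; self is left untouched (frozen dataclass).
        return ToyMultiIndex(
            levels=tuple(self.levels[i] for i in keep),
            names=tuple(self.names[i] for i in keep),
        )


mi = ToyMultiIndex(levels=(("a", "a", "b"), (1, 2, 1)), names=("key", "sub"))
dropped = mi.droplevel(0)
print(dropped.names, mi.names)  # the original still has both levels
```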

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16767

rapidsai#16787)

The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with a segmentation fault. To reproduce:

```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
 
The root cause is that (1) some `thread_local` objects on the main thread in `libcudf` and (2) some `static` objects in `kvikio` are destroyed at program termination, after NVbench has already called `cudaDeviceReset()`. These objects should simply be leaked, because their destructors make CUDA calls during program termination, which is undefined behavior (UB) in CUDA.

This small PR is the cuDF side of the fix. The kvikio side is in rapidsai/kvikio#462.

closes rapidsai#13229

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16787

…ilable (rapidsai#16652)

In some places, a public object such as `DataFrame` or `Index` accesses a `ColumnAccessor` attribute even though the same value is available through an attribute on a shared base class such as `Frame`.

To reduce direct access to the `ColumnAccessor`, this PR replaces usages of `._data.attribute` with the corresponding `Frame`-specific attribute.
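The pattern can be sketched with hypothetical toy classes. `ToyColumnAccessor`, `ToyFrame`, `ToyDataFrame`, and the `_column_names`/`_columns` properties here are illustrative assumptions, not cuDF's actual definitions. The shared base class is the only place that reaches into the accessor, and subclasses go through the base-class attributes.

```python
from typing import Dict, List, Tuple


class ToyColumnAccessor:
    """Hypothetical stand-in for an internal column container."""

    def __init__(self, data: Dict[str, List[object]]):
        self._data = dict(data)

    @property
    def names(self) -> Tuple[str, ...]:
        return tuple(self._data)

    @property
    def columns(self) -> Tuple[List[object], ...]:
        return tuple(self._data.values())


class ToyFrame:
    """Shared base class: the single place that touches the accessor."""

    def __init__(self, data: Dict[str, List[object]]):
        self._data = ToyColumnAccessor(data)

    @property
    def _column_names(self) -> Tuple[str, ...]:
        return self._data.names

    @property
    def _columns(self) -> Tuple[List[object], ...]:
        return self._data.columns


class ToyDataFrame(ToyFrame):
    def shape(self) -> Tuple[int, int]:
        # Before: len(self._data.names) and self._data.columns[0].
        # After: use the Frame-level attributes instead of reaching into _data.
        n_rows = len(self._columns[0]) if self._columns else 0
        return n_rows, len(self._column_names)


df = ToyDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(df.shape())
```

If the accessor's internals change later, only `ToyFrame` needs updating, not every subclass.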

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16652

This PR switches the pytest traceback style to `native` instead of pytest's prettified traceback. The prettified traceback takes longer to generate and also prints the source code of the file where the error occurs. That extra output is not worth the time it costs.

With the default pytest traceback:
<img width="1063" alt="Screenshot 2024-09-19 at 2 34 57 PM" src="https://github.com/user-attachments/assets/9658dd5a-eeb9-4ded-8c77-21b71c74d0a5">
<img width="1073" alt="Screenshot 2024-09-19 at 2 35 07 PM" src="https://github.com/user-attachments/assets/b8500e8a-9d7d-4c0d-8b9a-b2546a0741ee">
<img width="1065" alt="Screenshot 2024-09-19 at 2 35 20 PM" src="https://github.com/user-attachments/assets/a7c2925d-f94d-4b74-97a5-e3d2a0ebf36c">

With `native` traceback:
<img width="713" alt="Screenshot 2024-09-19 at 2 34 04 PM" src="https://github.com/user-attachments/assets/e540bc4b-c351-4815-b2dd-dfe4bb491ecb">

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: rapidsai#16851