Only check dask.DataFrame dtypes of columns actually used #1236
Fixes #1235.
In our dask DataFrame workflows we use a predicted `dtype` for the return value, and previously we calculated one that suited all columns of the DataFrame. This fix restricts the calculation to only the columns that are actually used.

In terms of implementation, the columns used have already been identified in the `compile_components` function, so we just need to return them to all callers; the dask workflow now uses only those columns.

I have been deliberately conservative here. With up-to-date dependent packages the predicted `dtype` doesn't matter at all: I can put anything in here and datashader works as expected. But given that this code does some potentially risky things with dask internals, I do not want to change it any more than necessary.
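To illustrate the idea (not datashader's actual code; `predict_dtype` and `used_columns` are hypothetical names), a minimal sketch of predicting a result dtype from only the columns a computation touches, so that unrelated columns cannot skew the prediction:

```python
import numpy as np
import pandas as pd

def predict_dtype(df, used_columns):
    # Combine the dtypes of only the columns actually used,
    # instead of all columns of the DataFrame.
    return np.result_type(*(df[c].dtype for c in used_columns))

df = pd.DataFrame({
    "x": np.array([0.0, 1.0], dtype="float64"),
    "y": np.array([2.0, 3.0], dtype="float64"),
    "label": ["a", "b"],  # object column, irrelevant to the aggregation
})

# Considering every column would pull in the object dtype;
# restricting to the used columns keeps the prediction numeric.
print(predict_dtype(df, ["x", "y"]))
```

In the dask case such a predicted dtype serves as the `meta` for the lazy graph, which is why restricting it to the columns identified by `compile_components` is sufficient.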