
Refactoring towards GPU support #794

Merged
merged 27 commits into master from pre_gpu_refactor on Oct 11, 2019
Conversation


@jonmmease jonmmease commented Oct 3, 2019

Overview

This PR lays the foundation for adding GPU support to Datashader. It contains the refactoring portion of #793, without the CUDA-specific code. After this PR is reviewed and merged, I'll open another PR that adds the CUDA support on top of it.

Fix

While refactoring, I found the cause of #792, which is fixed in 30a2ae2. I think we could cherry pick this commit into a separate PR if folks would rather it be separated from this PR.

Changes

  • Move the data library specific modules (pandas, dask, xarray) under a new data_libraries package. When cudf support is added, a new module will be added here. (258ceca )
  • Replace the former Glyph._compute_x_bounds and Glyph._compute_y_bounds methods with a single Glyph._compute_bounds method. The old methods assumed the input was a numpy array. The new method checks if it is a numpy array or a pandas series. When cudf is added, this method will also check if the input is a cudf.Series object. (40c3b61 )
  • Refactor reduction methods to have a separate "append" method for each of the following cases: no reduction column is used (e.g. ds.count()), an integer reduction column is used (e.g. ds.count('int_column')), and a float reduction column is used (e.g. ds.count('float_column')). This is needed because numba's cuda mode does not seem to support default arguments, and it doesn't allow the use of isnan on integer columns. (7b31140)
  • Extract the use of np.interp in preparation for swapping in a different implementation for cupy.ndarray objects. (63ec2fd)
  • Explicitly cast the result of np.nonzero to an integer, as the cupy equivalent returns a scalar array instead. (cc3ff21)
  • Various cleanup of the _interpolate function, which is called by shade (7a10a49)
    • Unify the int/float/bool logic paths.
    • Add a masked_clip_2d function that clips an array of either float or int values. There will be a cupy-specific implementation of this later.
    • Copy input data once and perform the offset in place.
    • Construct the uint32 view once at the end of the function rather than in branches.
    • Don't interpolate r/g/b unnecessarily when a single color is passed as cmap.
  • Isolate pandas specific code in dshape_from_pandas_helper in preparation for supporting cudf: cca1b87
  • Add internal _sum_zero reduction to compute sum from an array of zeros. The standard sum reduction starts with an array of nans and the append operation has a different logic branch depending on whether a pixel starts as nan. This approach is not compatible with the cuda atomic operations. For cuda, sum will be implemented by combining _sum_zero with any. (cf7f5c8)
  • Refactor points, line, and area rendering to extract the inner loop as a cuda compatible function. (b57e0a9)
  • Expand the bounds and vt tuples in the signatures of numba functions as numba.cuda.jit doesn't seem to support tuples. (b57e0a9, 7fe52db).
  • Convert tuples of columns passed to numba functions into 2D arrays, as numba.cuda.jit doesn't seem to support tuples. (b57e0a9)
  • Convert from numpy.log and numpy.isnan to math.log and math.isnan for use in numba functions because numba.cuda.jit doesn't support the numpy versions. (e073961)
  • Parameterize relevant test functions over data frame and array types. This will make it easier to add tests for new data frame (cudf) and array (cupy) types. (5e1d180)
  • Add custom test equality assertion functions for xarray and pandas. These will provide a central place to handle numpy-like (e.g. cupy) and pandas-like (e.g. cudf) objects. (45a979d)
  • Convert the _bases and _temps properties of Reduction objects into methods (_build_bases, _build_temps). This way we can pass configuration options to them (like whether or not to operate in cuda mode). (517fc97)
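To illustrate the kind of dispatch described for the unified Glyph._compute_bounds method, here is a minimal hypothetical sketch (the function name and structure are illustrative, not Datashader's actual implementation; a cudf.Series branch would be added alongside these checks):

```python
import numpy as np
import pandas as pd

def compute_bounds(values):
    """Return (min, max) of a numpy array or pandas Series, ignoring NaNs."""
    if isinstance(values, pd.Series):
        # Pandas path: extract the underlying numpy array and fall through.
        values = values.to_numpy()
    if isinstance(values, np.ndarray):
        return float(np.nanmin(values)), float(np.nanmax(values))
    raise TypeError("unsupported input type: %r" % type(values))
```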
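The per-dtype "append" split for count can be sketched roughly as follows (hypothetical function names; the point is that integer columns never hold NaN, so the isnan check numba's cuda mode rejects for integers is simply omitted on that path):

```python
import math
import numpy as np

def append_no_field(agg, y, x):
    # ds.count(): no reduction column, just increment the pixel.
    agg[y, x] += 1

def append_int(agg, y, x, value):
    # Integer columns can never be NaN, so no check is needed.
    agg[y, x] += 1

def append_float(agg, y, x, value):
    # Float columns must skip NaN values.
    if not math.isnan(value):
        agg[y, x] += 1
```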
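A pure-Python sketch of what a masked_clip_2d function might look like (assumed signature; the numba/cupy implementations would compile this same loop structure):

```python
import numpy as np

def masked_clip_2d(data, mask, lower, upper):
    # Clip data in place to [lower, upper], but only where mask is True;
    # works for either int or float 2D arrays.
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            if mask[i, j]:
                if data[i, j] < lower:
                    data[i, j] = lower
                elif data[i, j] > upper:
                    data[i, j] = upper
```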
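The tuple-expansion change can be illustrated with a toy before/after pair (hypothetical functions, not from the PR): since numba.cuda.jit doesn't seem to support tuple arguments, a packed bounds tuple is flattened into scalar parameters.

```python
def inside_tuple(x, y, bounds):
    # Original form: bounds packed in a tuple (problematic under cuda).
    xmin, xmax, ymin, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax

def inside_expanded(x, y, xmin, xmax, ymin, ymax):
    # cuda-friendly form: the tuple expanded into scalar arguments.
    return xmin <= x <= xmax and ymin <= y <= ymax
```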
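The _sum_zero-plus-any idea can be sketched as a finalize step (hypothetical helper, assuming a zero-initialized running sum and a boolean "any pixel touched" mask): combining the two reproduces the NaN-initialized sum without the per-pixel NaN branch that conflicts with cuda atomics.

```python
import numpy as np

def finalize_sum(sum_zero, touched):
    # Pixels never touched by the aggregation become NaN, matching the
    # behavior of the standard NaN-initialized sum reduction.
    out = sum_zero.astype(np.float64)
    out[~touched] = np.nan
    return out
```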

@jonmmease jonmmease added the wip label Oct 10, 2019
@jonmmease

I think I've figured out a way to support line/area with axis=0 with cuda, so I'm going to do a bit more line/area refactoring on this PR.


@jbednar jbednar left a comment


As of 517fc97, I've reviewed all the commits in this PR, and as far as I can tell they are all mergeable. So, once you're done with any further commits, I should be able to review those quickly and merge the whole PR. Thanks for all this!

@jonmmease

Ok, I did a bit more refactoring of lines in d7f3650 and area in b5dbf1d.

The goal was to make it possible to support parallelization with axis=0. This was not as straightforward as axis=1 because the prior logic for calculating breaks due to nans and out-of-bounds segments was sequential in nature.

I also managed to eliminate a fair amount of code duplication by having the draw_segment and draw_trapezoid functions accept values in data, rather than pixel, coordinates. These functions now perform their own skipping/clipping logic, so this logic is no longer duplicated in every line/area rendering function.
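The structure described above can be sketched roughly as follows (a hypothetical, much-simplified draw_segment; Datashader's real clipping and rasterization are considerably more involved). The drawer takes data coordinates and does its own skip checks, so callers no longer repeat them:

```python
import math

def draw_segment(x0, y0, x1, y1, xmin, xmax, ymin, ymax, drawn):
    # Skip segments with missing (NaN) endpoints.
    if math.isnan(x0) or math.isnan(y0) or math.isnan(x1) or math.isnan(y1):
        return
    # Skip segments entirely outside the viewport bounds.
    if max(x0, x1) < xmin or min(x0, x1) > xmax:
        return
    if max(y0, y1) < ymin or min(y0, y1) > ymax:
        return
    # A real implementation would clip and rasterize here; this sketch
    # just records the segments that survive the skip logic.
    drawn.append((x0, y0, x1, y1))
```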

@jonmmease jonmmease removed the wip label Oct 11, 2019

@jbednar jbednar left a comment


The latest commits also look good, thanks. I have only one concern before merging: the changes to line and area handling look extensive enough that they could have performance implications. Have you run a basic performance comparison before and after this PR, to make sure nothing got terribly slower? As long as you've made some basic check like that, I'm happy to merge now.

@jonmmease

Thanks, I've done several benchmarks and I'm seeing equivalent performance (within ±5%). For example, for 10 million points with count aggregation:

  • points: 65.2 ms (before) -> 64.3 ms (after)
  • line: 256 ms (before) -> 268 ms (after)
  • area: 1800 ms (before) -> 1750 ms (after)


jbednar commented Oct 11, 2019

Looks great, thanks!!

@jbednar jbednar merged commit 13d651e into master Oct 11, 2019
@maximlt maximlt deleted the pre_gpu_refactor branch December 25, 2021 17:21