Conda release setup #66

Merged
merged 8 commits into from
Nov 18, 2019
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -55,7 +55,7 @@ Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTIN

### Adding support for new viz libraries

cuXfilter.py acts like a connector library and it is easy to add support for new libraries. The cuxfilter/charts/core directory has all the core chart classes which can be inherited and used to implement a few (viz related) functions and support dashboarding in cuXfilter directly.
cuxfilter.py acts like a connector library and it is easy to add support for new libraries. The cuxfilter/charts/core directory has all the core chart classes which can be inherited and used to implement a few (viz related) functions and support dashboarding in cuxfilter directly.

You can see the examples to implement viz libraries in the bokeh and cudatashader directories.

40 changes: 20 additions & 20 deletions README.md
@@ -1,29 +1,29 @@
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp; cuXfilter
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp; cuxfilter

cuXfilter ( ku-cross-filter ) is a [RAPIDS](https://github.com/rapidsai) framework to connect web visualizations to GPU accelerated crossfiltering. Inspired by the javascript version of the [original]( https://github.com/crossfilter/crossfilter), it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via [cuDF](https://github.com/rapidsai/cudf).
cuxfilter ( ku-cross-filter ) is a [RAPIDS](https://github.com/rapidsai) framework to connect web visualizations to GPU accelerated crossfiltering. Inspired by the javascript version of the [original]( https://github.com/crossfilter/crossfilter), it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via [cuDF](https://github.com/rapidsai/cudf).

## RAPIDS Viz
cuXfilter is one of the core projects of the “RAPIDS viz” team. Taking the axiom that “a slider is worth a thousand queries” from @lmeyerov to heart, we want to enable fast exploratory data analytics through an easier-to-use pythonic notebook interface.
cuxfilter is one of the core projects of the “RAPIDS viz” team. Taking the axiom that “a slider is worth a thousand queries” from @lmeyerov to heart, we want to enable fast exploratory data analytics through an easier-to-use pythonic notebook interface.

As there are many fantastic visualization libraries available for the web, our general principle is not to create our own viz library, but to enhance others with faster acceleration, larger datasets, and better dev UX. **Basically, we want to take the headache out of interconnecting multiple charts to a GPU backend, so you can get to visually exploring data faster.**

By the way, cuXfilter is best used to interact with large (1 million+) tabular datasets. GPUs are fast, but accessing that speedup requires some architecture overhead that isn’t worthwhile for small datasets.
By the way, cuxfilter is best used to interact with large (1 million+) tabular datasets. GPUs are fast, but accessing that speedup requires some architecture overhead that isn’t worthwhile for small datasets.

For more detailed requirements, see below.

## cuXfilter.py Architecture
## cuxfilter.py Architecture

The Python version of cuXfilter leverages Jupyter Notebook and Bokeh server to greatly reduce backend complexity. Currently we are focusing development efforts on the Python version instead of the older JavaScript version.
The Python version of cuxfilter leverages Jupyter Notebook and Bokeh server to greatly reduce backend complexity. Currently we are focusing development efforts on the Python version instead of the older JavaScript version.

<img src="https://github.com/rapidsai/cuxfilter/blob/master/docs/_images/RAPIDS%20Viz%20EcoSystem%20v2.png" />

### What is cuDataTiles?

cuXfilter.py implements cuDataTiles, a GPU accelerated version of data tiles based on the work of [Falcon](https://github.com/uwdata/falcon). When starting to interact with specific charts in a cuXfilter dashboard, values for the other charts are precomputed to allow for fast slider scrubbing without having to recalculate values.
cuxfilter.py implements cuDataTiles, a GPU accelerated version of data tiles based on the work of [Falcon](https://github.com/uwdata/falcon). When starting to interact with specific charts in a cuxfilter dashboard, values for the other charts are precomputed to allow for fast slider scrubbing without having to recalculate values.

### Open Source Projects

cuXfilter wouldn’t be possible without using these great open source projects:
cuxfilter wouldn’t be possible without using these great open source projects:

- [Bokeh](https://bokeh.pydata.org/en/latest/)
- [DataShader](http://datashader.org/)
@@ -32,22 +32,22 @@ cuXfilter wouldn’t be possible without using these great open source projects:
- [Jupyter](https://jupyter.org/about)


### Where is the original cuXfilter and Mortgage Viz Demo?
### Where is the original cuxfilter and Mortgage Viz Demo?

The original version (0.2) of cuXfilter, most known for the backend powering the Mortgage Viz Demo, has been moved into the [`GTC-2018-mortgage-visualization branch`](https://github.com/rapidsai/cuxfilter/tree/GTC-2018-mortgage-visualization). As it has a much more complicated backend and javascript API, we’ve decided to focus more on the streamlined notebook focused version in the `/python` folder.
The original version (0.2) of cuxfilter, most known for the backend powering the Mortgage Viz Demo, has been moved into the [`GTC-2018-mortgage-visualization branch`](https://github.com/rapidsai/cuxfilter/tree/GTC-2018-mortgage-visualization). As it has a much more complicated backend and javascript API, we’ve decided to focus more on the streamlined notebook focused version in the `/python` folder.

## Usage

```python
import cuXfilter
from cuXfilter import charts
import cuxfilter
from cuxfilter import charts

#update data_dir if you have downloaded datasets elsewhere
DATA_DIR = './data'
from cuXfilter.sampledata import datasets_check
from cuxfilter.sampledata import datasets_check
datasets_check('auto_accidents', base_dir=DATA_DIR)

cux_df = cuXfilter.DataFrame.from_arrow('./data/auto_accidents.arrow')
cux_df = cuxfilter.DataFrame.from_arrow('./data/auto_accidents.arrow')
cux_df.data['ST_CASE'] = cux_df.data['ST_CASE'].astype('float64')

label_map = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Friday', 7: 'Saturday', 9: 'Unknown'}
@@ -61,7 +61,7 @@ chart3 = charts.bokeh.bar('DAY_WEEK', x_label_map=label_map)
chart4 = charts.bokeh.bar('MONTH')

#declare dashboard
d = cux_df.dashboard([chart1, chart2, chart3, chart4], layout=cuXfilter.layouts.feature_and_double_base,theme = cuXfilter.themes.light, title='Auto Accident Dataset')
d = cux_df.dashboard([chart1, chart2, chart3, chart4], layout=cuxfilter.layouts.feature_and_double_base,theme = cuxfilter.themes.light, title='Auto Accident Dataset')

#preview the dashboard inside the notebook(non-interactive) with layout
await d.preview()
@@ -83,7 +83,7 @@ Troubleshooting help can be found [here](https://rapidsai.github.io/cuxfilter/in

## Installation

> You need to have RAPIDS (cudf) installed for cuXfilter to work
> You need to have RAPIDS (cudf) installed for cuxfilter to work


### 1. If installing within the rapidsai Docker container, follow these instructions
@@ -182,11 +182,11 @@ The notebooks inside `python/notebooks` already have a check function which veri
While in the directory you want the datasets to be saved, execute the following

```bash
#go to the environment where cuXfilter is installed. Skip if in a docker container
#go to the environment where cuxfilter is installed. Skip if in a docker container
source activate test_env

#download and extract the datasets
python -c "from cuXfilter.sampledata import datasets_check; datasets_check(base_dir='./')"
python -c "from cuxfilter.sampledata import datasets_check; datasets_check(base_dir='./')"
```

Individual links:
@@ -217,12 +217,12 @@ Our plan is to **add support in the future** for the following libraries:

## Contributing Developers Guide

cuXfilter.py acts like a connector library and it is easy to add support for new libraries. The `python/cuxfilter/charts/core` directory has all the core chart classes which can be inherited and used to implement a few (viz related) functions and support dashboarding in cuXfilter directly.
cuxfilter.py acts like a connector library and it is easy to add support for new libraries. The `python/cuxfilter/charts/core` directory has all the core chart classes which can be inherited and used to implement a few (viz related) functions and support dashboarding in cuxfilter directly.

You can see the examples to implement viz libraries in the bokeh and cudatashader directories. Let us know if you would like to add a chart by opening a feature request issue or submitting a PR.

For more details, check out the [contributing guide](./CONTRIBUTING.md).

## Future Work
cuXfilter development is in its early stages and ongoing. See what we are planning next on the [projects page](https://github.com/rapidsai/cuxfilter/projects).
cuxfilter development is in its early stages and ongoing. See what we are planning next on the [projects page](https://github.com/rapidsai/cuxfilter/projects).

130 changes: 130 additions & 0 deletions build.sh
@@ -0,0 +1,130 @@
#!/bin/bash

# Copyright (c) 2019, NVIDIA CORPORATION.

# cuxfilter build script

# This script is used to build the component(s) in this repo from
# source, and can be called with various options to customize the
# build as needed (see the help output for details)

# Abort script on first error
set -e

NUMARGS=$#
ARGS=$*

# NOTE: ensure all dir changes are relative to the location of this
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean cuxfilter cudatashader -v -g -n --allgpuarch -h"
HELP="$0 [clean] [cuxfilter] [cudatashader] [-v] [-g] [-n] [--allgpuarch] [-h]
   clean        - remove all existing build artifacts and configuration (start
                  over)
   cuxfilter    - build the cuxfilter library only
   cudatashader - build the cudatashader library only
   -v           - verbose build mode
   -g           - build for debug
   -n           - no install step
   --allgpuarch - build for all supported GPU architectures
   -h           - print this text
"
CUXFILTER_BUILD_DIR=${REPODIR}/python/cuxfilter/build
CUDATASHADER_BUILD_DIR=${REPODIR}/../cuDataShader/build
BUILD_DIRS="${CUXFILTER_BUILD_DIR} ${CUDATASHADER_BUILD_DIR}"

# Set defaults for vars modified by flags to this script
VERBOSE=""
BUILD_TYPE=Release
INSTALL_TARGET=install
BENCHMARKS=OFF
BUILD_ALL_GPU_ARCH=0

# Set defaults for vars that may not have been defined externally
# FIXME: if INSTALL_PREFIX is not set, check PREFIX, then check
# CONDA_PREFIX, but there is no fallback from there!
INSTALL_PREFIX=${INSTALL_PREFIX:=${PREFIX:=${CONDA_PREFIX}}}
PARALLEL_LEVEL=${PARALLEL_LEVEL:=""}

function hasArg {
    (( ${NUMARGS} != 0 )) && (echo " ${ARGS} " | grep -q " $1 ")
}

if hasArg -h; then
    echo "${HELP}"
    exit 0
fi

# Check for valid usage
if (( ${NUMARGS} != 0 )); then
    for a in ${ARGS}; do
        if ! (echo " ${VALIDARGS} " | grep -q " ${a} "); then
            echo "Invalid option: ${a}"
            exit 1
        fi
    done
fi

# Process flags
if hasArg -v; then
    VERBOSE=1
fi
if hasArg -g; then
    BUILD_TYPE=Debug
fi
if hasArg -n; then
    INSTALL_TARGET=""
fi
if hasArg --allgpuarch; then
    BUILD_ALL_GPU_ARCH=1
fi
if hasArg benchmarks; then
    BENCHMARKS="ON"
fi

# If clean given, run it prior to any other steps
if hasArg clean; then
    # If the dirs to clean are mounted dirs in a container, the
    # contents should be removed but the mounted dirs will remain.
    # The find removes all contents but leaves the dirs, the rmdir
    # attempts to remove the dirs but can fail safely.
    for bd in ${BUILD_DIRS}; do
        if [ -d ${bd} ]; then
            find ${bd} -mindepth 1 -delete
            rmdir ${bd} || true
        fi
    done
fi

if (( ${BUILD_ALL_GPU_ARCH} == 0 )); then
    GPU_ARCH="-DGPU_ARCHS="
    echo "Building for the architecture of the GPU in the system..."
else
    GPU_ARCH="-DGPU_ARCHS=ALL"
    echo "Building for *ALL* supported GPU architectures..."
fi

################################################################################

# Build and install the cuxfilter Python package
if (( ${NUMARGS} == 0 )) || hasArg cuxfilter; then

    cd ${REPODIR}/python
    if [[ ${INSTALL_TARGET} != "" ]]; then
        python setup.py build_ext --inplace
        python setup.py install --single-version-externally-managed --record=record.txt
    else
        python setup.py build_ext --inplace --library-dir=${LIBCUDF_BUILD_DIR}
    fi
fi

# Build and install the cudatashader Python package
if (( ${NUMARGS} == 0 )) || hasArg cudatashader; then

    cd ${REPODIR}/../
    git clone https://github.com/rapidsai/cuDataShader.git
    cd ${REPODIR}/../cuDataShader
    python setup.py install --single-version-externally-managed --record=record.txt
Comment on lines +127 to +128
Contributor

I would cd back to $REPODIR after this installation. I'm also unsure how this will act in a conda build environment, but we can leave it as is and see.

    cd ${REPODIR}
fi
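
The review comment above asks for the working directory to be restored after the cuDataShader install. A minimal sketch of one alternative to a trailing `cd ${REPODIR}` is to run the directory-dependent step in a subshell, so the caller's working directory never changes. `run_in_dir` is a hypothetical helper, not part of this PR's build.sh; the demonstration uses a temporary directory and `touch` as stand-ins for the real clone and `python setup.py install`.

```shell
#!/bin/bash
set -e

# Hypothetical helper (not in build.sh): run a command from inside a
# directory. The parentheses start a subshell, so the `cd` only affects the
# subshell and the caller stays where it was, even if the command fails.
run_in_dir() {
    local dir="$1"
    shift
    ( cd "${dir}" && "$@" )
}

# Demonstration: a temporary directory stands in for the cloned repo and
# `touch` stands in for the install command.
START_DIR=$(pwd)
WORK_DIR=$(mktemp -d)
run_in_dir "${WORK_DIR}" touch marker

if [ -f "${WORK_DIR}/marker" ]; then MARKER_CREATED=yes; else MARKER_CREATED=no; fi
END_DIR=$(pwd)
rm -rf "${WORK_DIR}"

echo "marker created: ${MARKER_CREATED}"
if [ "${START_DIR}" = "${END_DIR}" ]; then
    echo "working directory unchanged"
fi
```

Whether this behaves any differently from an explicit `cd` inside a conda build environment is not something the source settles; the subshell only guarantees that later steps in the same script start from the original directory.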
37 changes: 37 additions & 0 deletions ci/checks/changelog.sh
@@ -0,0 +1,37 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
##############################
# cuxfilter CHANGELOG Tester #
##############################

# Checkout master for comparison
git checkout --quiet master

# Switch back to tip of PR branch
git checkout --quiet current-pr-branch

# Ignore errors during searching
set +e

# Get list of modified files between master and PR branch
CHANGELOG=`git diff --name-only master...current-pr-branch | grep CHANGELOG.md`
# Check if CHANGELOG has PR ID
PRNUM=`cat CHANGELOG.md | grep "$PR_ID"`
RETVAL=0

# Return status of check result
if [ "$CHANGELOG" != "" -a "$PRNUM" != "" ] ; then
    echo -e "\n\n>>>> PASSED: CHANGELOG.md has been updated with current PR information.\n\nPlease ensure the update meets the following criteria.\n"
else
    echo -e "\n\n>>>> FAILED: CHANGELOG.md has not been updated!\n\nPlease add a line describing this PR to CHANGELOG.md in the repository root directory. The line should meet the following criteria.\n"
    RETVAL=1
fi

cat << EOF
It should be placed under the section for the appropriate release.
It should be placed under "New Features", "Improvements", or "Bug Fixes" as appropriate.
It should be formatted as '- PR #<PR number> <Concise human-readable description of the PR's new feature, improvement, or bug fix>'
Example format for #491 '- PR #491 Add CI test script to check for updates to CHANGELOG.md in PRs'
EOF

exit $RETVAL
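
The script above searches CHANGELOG.md for `$PR_ID` anywhere in the file, while the criteria it prints ask for a line of the form `- PR #<PR number> <description>`. As a rough sketch of how that format could be checked more strictly, the hypothetical helper `has_pr_entry` below (an assumption for illustration, not part of this PR) matches only lines that start with that prefix, and is exercised on a throwaway file that reuses the script's own #491 example line.

```shell
#!/bin/bash

# Hypothetical helper (not part of changelog.sh): succeed only if the file
# contains a line of the form '- PR #<number> <description>'.
has_pr_entry() {
    local file="$1"
    local pr_number="$2"
    # The space after the number keeps '#49' from matching '#491'.
    grep -Eq "^- PR #${pr_number} [^[:space:]]" "${file}"
}

# Demonstration on a temporary changelog.
SAMPLE=$(mktemp)
printf '%s\n' \
    '## New Features' \
    '- PR #491 Add CI test script to check for updates to CHANGELOG.md in PRs' \
    > "${SAMPLE}"

if has_pr_entry "${SAMPLE}" 491; then FOUND_491=yes; else FOUND_491=no; fi
if has_pr_entry "${SAMPLE}" 49; then FOUND_49=yes; else FOUND_49=no; fi
rm -f "${SAMPLE}"

echo "PR 491 listed: ${FOUND_491}; PR 49 listed: ${FOUND_49}"
```

Anchoring the pattern at the start of the line avoids a pass caused by the PR number appearing inside another entry's description or as a prefix of a longer number.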
39 changes: 39 additions & 0 deletions ci/checks/style.sh
@@ -0,0 +1,39 @@
# Ignore errors and set path
set +e
PATH=/conda/bin:$PATH
LC_ALL=C.UTF-8
LANG=C.UTF-8

# Activate common conda env
source activate gdf

# Run black and get results/return code
BLACK=`black --check python`
BLACK_RETVAL=$?

# Run flake8 and get results/return code
FLAKE=`flake8 python`
FLAKE_RETVAL=$?

if [ "$BLACK_RETVAL" != "0" ]; then
    echo -e "\n\n>>>> FAILED: black style check; begin output\n\n"
    echo -e "$BLACK"
    echo -e "\n\n>>>> FAILED: black style check; end output\n\n"
else
    echo -e "\n\n>>>> PASSED: black style check\n\n"
fi

if [ "$FLAKE_RETVAL" != "0" ]; then
    echo -e "\n\n>>>> FAILED: flake8 style check; begin output\n\n"
    echo -e "$FLAKE"
    echo -e "\n\n>>>> FAILED: flake8 style check; end output\n\n"
else
    echo -e "\n\n>>>> PASSED: flake8 style check\n\n"
fi

RETVALS=($BLACK_RETVAL $FLAKE_RETVAL)
IFS=$'\n'
RETVAL=`echo "${RETVALS[*]}" | sort -nr | head -n1`

exit $RETVAL

63 changes: 63 additions & 0 deletions ci/cpu/build.sh
@@ -0,0 +1,63 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
###########################################
# cuxfilter CPU conda build script for CI #
###########################################
set -e

# Logger function for build status output
function logger() {
    echo -e "\n>>>> $@\n"
}

# Set path and build parallel level
export PATH=/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=4

# Set home to the job's workspace
export HOME=$WORKSPACE

# Switch to project root; also root of repo checkout
cd $WORKSPACE

# Get latest tag and number of commits since tag
export GIT_DESCRIBE_TAG=`git describe --abbrev=0 --tags`
export GIT_DESCRIBE_NUMBER=`git rev-list ${GIT_DESCRIBE_TAG}..HEAD --count`

# If nightly build, append current YYMMDD to version
if [[ "$BUILD_MODE" = "branch" && "$SOURCE_BRANCH" = branch-* ]] ; then
    export VERSION_SUFFIX=`date +%y%m%d`
fi

################################################################################
# SETUP - Check environment
################################################################################

logger "Get env..."
env

logger "Activate conda env..."
source activate gdf

logger "Check versions..."
python --version
gcc --version
g++ --version
conda list

# FIX Added to deal with Anaconda SSL verification issues during conda builds
conda config --set ssl_verify False

################################################################################
# BUILD - Conda package builds
################################################################################

logger "Build conda pkg for cuxfilter..."
source ci/cpu/cuxfilter/build_cuxfilter.sh

################################################################################
# UPLOAD - Conda packages
################################################################################

logger "Upload conda pkgs..."
source ci/cpu/upload_anaconda.sh