If you are interested in contributing to cuML, your contributions will fall into three categories:
- You want to report a bug, feature request, or documentation issue
  - File an issue describing what you encountered or what you want to see changed.
  - The RAPIDS team will evaluate the issues and triage them, scheduling them for a release. If you believe the issue needs priority attention, comment on the issue to notify the team.
- You want to propose a new feature and implement it
  - Post about your intended feature, and we shall discuss the design and implementation.
  - Once we agree that the plan looks good, go ahead and implement it, using the code contributions guide below.
- You want to implement a feature or bug-fix for an outstanding issue
  - Follow the code contributions guide below.
  - If you need more context on a particular issue, please ask and we shall provide it.
Code contributions:
1. Follow the guide at the bottom of this page for Setting Up Your Build Environment.
2. Find an issue to work on. The best way is to look for issues with the "good first issue" or "help wanted" labels.
3. Comment on the issue saying you are going to work on it.
4. Get familiar with the developer guide relevant to you:
   - For C++ developers, it is available in DEVELOPER_GUIDE.md.
5. Code! Make sure to update unit tests!
6. When done, create your pull request.
7. Verify that CI passes all status checks. Fix any failures.
8. Wait for other developers to review your code, and update it as needed.
9. Once reviewed and approved, a RAPIDS developer will merge your pull request.
Remember, if you are unsure about anything, don't hesitate to comment on issues and ask for clarifications!
Once you have gotten your feet wet and are more comfortable with the code, you can look at the prioritized issues of our next release in our project boards.
Pro Tip: Always look at the release board with the highest number for issues to work on. This is where RAPIDS developers also focus their efforts.
Look at the unassigned issues, and find an issue you are comfortable with contributing to. Start with Step 3 from above, commenting on the issue to let others know you are working on it. If you have any questions related to the implementation of the issue, ask them in the issue instead of the PR.
Setting Up Your Build Environment:
To install cuML from source, ensure the following dependencies are met:
- cuDF (>= 0.5.1)
- zlib (provided by zlib1g-dev on Ubuntu 16.04)
- cmake (>= 3.12.4)
- CUDA (>= 9.2)
- Cython (>= 0.29)
- gcc (>= 5.4.0)
- BLAS: any BLAS compatible with CMake's FindBLAS
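Before building, it can help to confirm the installed versions against the minimums listed above. The script below is a minimal sketch, not part of cuML: it assumes GNU `sort -V` for version comparison, that `cmake`, `gcc`, `nvcc` and `python` are on your PATH, and that cuDF exposes `cudf.__version__`. The helper names `version_ge` and `check_min` are hypothetical. zlib and BLAS are left out because neither has a single standard version command.

```shell
#!/bin/sh
# Hypothetical pre-build check (not shipped with cuML).

# version_ge FOUND REQUIRED: succeeds if FOUND >= REQUIRED.
# The smaller of the two sorts first, so FOUND >= REQUIRED exactly when
# REQUIRED is the first line after a version-aware sort.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: prints one status line per dependency.
check_min() {
    if [ -z "$2" ]; then
        echo "$1: not found (need >= $3)"
    elif version_ge "$2" "$3"; then
        echo "$1: $2 (ok, need >= $3)"
    else
        echo "$1: $2 (too old, need >= $3)"
    fi
}

# First line of `cmake --version` is "cmake version X.Y.Z".
check_min cmake "$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')" 3.12.4
# On GCC 7+ this may print only the major version (e.g. 9), which still
# compares correctly against 5.4.0.
check_min gcc "$(gcc -dumpversion 2>/dev/null)" 5.4.0
# nvcc prints a line like "Cuda compilation tools, release 10.0, V10.0.130".
check_min CUDA "$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\),.*/\1/p')" 9.2
check_min Cython "$(python -c 'import Cython; print(Cython.__version__)' 2>/dev/null)" 0.29
# Drop any "+local" suffix before comparing.
check_min cuDF "$(python -c 'import cudf; print(cudf.__version__)' 2>/dev/null | cut -d+ -f1)" 0.5.1
```

Each line reports "ok", "too old" or "not found"; a "not found" may also just mean the tool is installed but not on your PATH.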
Once dependencies are present, follow the steps below:
- Clone the repository.
$ git clone --recurse-submodules https://github.com/rapidsai/cuml.git
- Build and install libcuml (the C++/CUDA library containing the cuML algorithms), starting from the repository root folder:
$ cd cuML
$ mkdir build
$ cd build
$ export CUDA_BIN_PATH=$CUDA_HOME # optional: only needed if the CUDA binaries are not on your PATH (e.g. CUDA_HOME=/path/to/cuda/)
$ cmake ..
If using a conda environment (recommended currently), then cmake can be configured appropriately via:
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
Note: Depending on the version of cmake and the CMAKE_INSTALL_PREFIX used, cmake may print the following warning:
    Cannot generate a safe runtime search path for target ml_test because files
    in some directories may conflict with libraries in implicit directories:
If this warning is displayed, the build should still run successfully. We are currently working to resolve this open issue. You can silence the warning by adding -DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib to your cmake command.
The configuration script will print the BLAS found on the search path. If the version found does not match the version intended, add the flag -DBLAS_LIBRARIES=/path/to/blas.so to the cmake command to force your own version.
- Build libcuml:
$ make -j
$ make install
To run tests (optional):
$ ./ml_test
If you want a list of the available tests:
$ ./ml_test --gtest_list_tests
- Build the cuml Python package:
$ cd ../../python
$ python setup.py build_ext --inplace
To run Python tests (optional):
$ py.test -v
If you want a list of the available tests:
$ py.test cuML/test --collect-only
- Finally, install the Python package to your Python path:
$ python setup.py install
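The optional cmake flags from the configure step (install prefix for a conda environment, forced BLAS library) can be collected in one place. This is a minimal sketch, not part of cuML: `cuml_cmake_args` is a hypothetical helper, and it only emits the flags documented in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of cuML): prints the optional cmake flags
# from the configure step, one per line.
#   $1 (optional): full path to the BLAS shared library to force.
cuml_cmake_args() {
    # Install into the active conda environment, if there is one.
    if [ -n "${CONDA_PREFIX:-}" ]; then
        printf '%s\n' "-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX"
    fi
    # Override the BLAS that FindBLAS picked up on the search path.
    if [ -n "${1:-}" ]; then
        printf '%s\n' "-DBLAS_LIBRARIES=$1"
    fi
}

# Show the configure command this would produce (run it from cuML/build).
echo "cmake .. $(cuml_cmake_args | tr '\n' ' ')"
```

To configure with these flags, run `cmake .. $(cuml_cmake_args /path/to/blas.so)` from the build directory; this relies on word splitting, so it assumes paths without spaces. -DCMAKE_IGNORE_PATH is deliberately left out because it changes which directories cmake searches, so add it by hand only if you want to silence the runtime search path warning.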
cuML's core folder structure:
- cuML: C++/CUDA machine learning algorithms. See list of algorithms
- python: Python bindings for the above algorithms, including interfaces for cuDF. These bindings connect the data to C++/CUDA based cuML and ml-prims libraries without leaving GPU memory.
- ml-prims: Low-level, header-only library of machine learning primitives used in the cuML algorithms. Includes:
  - Linear Algebra
  - Statistics
  - Basic Matrix Operations
  - Distance Functions
  - Random Number Generation
The external folder contains submodules that this project in turn depends on. Appropriate location flags for these are populated automatically in the main CMakeLists.txt file.
Current external submodules are:
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md