diff --git a/CHANGELOG.md b/CHANGELOG.md index 689a214751f..d934273d0a7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,80 @@ +# cugraph 24.10.00 (9 Oct 2024) + +## 🚨 Breaking Changes + +- Add Additional Check for `NetworkX` Release Candidate Versions ([#4613](https://github.com/rapidsai/cugraph/pull/4613)) [@nv-rliu](https://github.com/nv-rliu) +- Updates to `cugraph.hypergraph` (Duplicate Col Labels Bug) ([#4610](https://github.com/rapidsai/cugraph/pull/4610)) [@nv-rliu](https://github.com/nv-rliu) +- Heterogeneous renumbering implementation ([#4602](https://github.com/rapidsai/cugraph/pull/4602)) [@seunghwak](https://github.com/seunghwak) +- [FEA] DGL Examples ([#4583](https://github.com/rapidsai/cugraph/pull/4583)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) + +## 🐛 Bug Fixes + +- Updates docs to describe nx-cugraph based on latest updates for 24.10 ([#4694](https://github.com/rapidsai/cugraph/pull/4694)) [@nv-rliu](https://github.com/nv-rliu) +- Constrain versions of PyTorch and CI artifacts in CI Runs, upgrade to dgl 2.4 ([#4690](https://github.com/rapidsai/cugraph/pull/4690)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) +- Drops duplicate edges in non-MultiGraph PLC `SGGraph` instances ([#4658](https://github.com/rapidsai/cugraph/pull/4658)) [@rlratzel](https://github.com/rlratzel) +- Install mg test executables ([#4656](https://github.com/rapidsai/cugraph/pull/4656)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Fix build strings in nx-cugraph ([#4639](https://github.com/rapidsai/cugraph/pull/4639)) [@bdice](https://github.com/bdice) +- Set CUDA_STATIC_MATH_LIBRARIES in Python builds ([#4612](https://github.com/rapidsai/cugraph/pull/4612)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Updates to `cugraph.hypergraph` (Duplicate Col Labels Bug) ([#4610](https://github.com/rapidsai/cugraph/pull/4610)) [@nv-rliu](https://github.com/nv-rliu) +- Biased sampling primitive bug fix 
([#4607](https://github.com/rapidsai/cugraph/pull/4607)) [@seunghwak](https://github.com/seunghwak) +- Fix `test_property_graph_mg` Usage of Util Function ([#4600](https://github.com/rapidsai/cugraph/pull/4600)) [@nv-rliu](https://github.com/nv-rliu) +- Re-configure benchmarking devices & add markers to `bench_cugraph_uniform_neighbor_sample` ([#4561](https://github.com/rapidsai/cugraph/pull/4561)) [@nv-rliu](https://github.com/nv-rliu) + +## 📖 Documentation + +- Implementing some of the VDR feedback ([#4674](https://github.com/rapidsai/cugraph/pull/4674)) [@acostadon](https://github.com/acostadon) +- Add `nx-cugraph` Docs Pages ([#4669](https://github.com/rapidsai/cugraph/pull/4669)) [@nv-rliu](https://github.com/nv-rliu) +- Recommending `miniforge` for conda install ([#4650](https://github.com/rapidsai/cugraph/pull/4650)) [@mmccarty](https://github.com/mmccarty) + +## 🚀 New Features + +- Add `nx-cugraph` introduction notebook to repo ([#4677](https://github.com/rapidsai/cugraph/pull/4677)) [@nv-rliu](https://github.com/nv-rliu) +- Support Negative Sampling in pylibcugraph and cuGraph-PyG ([#4660](https://github.com/rapidsai/cugraph/pull/4660)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) +- Heterogeneous renumbering implementation ([#4602](https://github.com/rapidsai/cugraph/pull/4602)) [@seunghwak](https://github.com/seunghwak) +- [FEA] Biased Sampling in cuGraph-DGL ([#4595](https://github.com/rapidsai/cugraph/pull/4595)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) +- [FEA] Biased Sampling in cuGraph-PyG ([#4586](https://github.com/rapidsai/cugraph/pull/4586)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) +- [FEA] DGL Examples ([#4583](https://github.com/rapidsai/cugraph/pull/4583)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) + +## 🛠️ Improvements + +- `nx-cugraph`: add `NX_CUGRAPH_AUTOCONFIG=True` env var to enable full zero-code change ([#4685](https://github.com/rapidsai/cugraph/pull/4685)) 
[@eriknw](https://github.com/eriknw) +- Fix `cit-patents` Dataset for `nx-cugraph` Benchmark ([#4666](https://github.com/rapidsai/cugraph/pull/4666)) [@nv-rliu](https://github.com/nv-rliu) +- Update update-version.sh to use packaging lib ([#4664](https://github.com/rapidsai/cugraph/pull/4664)) [@AyodeAwe](https://github.com/AyodeAwe) +- Swtch traceback to `--native` in `cugraph` ([#4663](https://github.com/rapidsai/cugraph/pull/4663)) [@galipremsagar](https://github.com/galipremsagar) +- bump NCCL floor to 2.18.1.1, include nccl.h where it's needed ([#4661](https://github.com/rapidsai/cugraph/pull/4661)) [@jameslamb](https://github.com/jameslamb) +- Use CI workflow branch 'branch-24.10' again ([#4654](https://github.com/rapidsai/cugraph/pull/4654)) [@jameslamb](https://github.com/jameslamb) +- Update flake8 to 7.1.1. ([#4652](https://github.com/rapidsai/cugraph/pull/4652)) [@bdice](https://github.com/bdice) +- reduce pip verbosity in wheel builds ([#4651](https://github.com/rapidsai/cugraph/pull/4651)) [@jameslamb](https://github.com/jameslamb) +- Refactor the python function symmetrizing the edgelist ([#4649](https://github.com/rapidsai/cugraph/pull/4649)) [@jnke2016](https://github.com/jnke2016) +- Add `--cpu-only` or `--gpu-only` Arguments to `nx-cugraph` Benchmark ([#4648](https://github.com/rapidsai/cugraph/pull/4648)) [@nv-rliu](https://github.com/nv-rliu) +- Add support for Python 3.12 ([#4647](https://github.com/rapidsai/cugraph/pull/4647)) [@jameslamb](https://github.com/jameslamb) +- Biased Random Walks and Node2Vec implementation ([#4645](https://github.com/rapidsai/cugraph/pull/4645)) [@ChuckHastings](https://github.com/ChuckHastings) +- update a few more Python references to Python 3.10 ([#4637](https://github.com/rapidsai/cugraph/pull/4637)) [@jameslamb](https://github.com/jameslamb) +- Negative Sampling test needs whole GPU ([#4636](https://github.com/rapidsai/cugraph/pull/4636)) [@ChuckHastings](https://github.com/ChuckHastings) +- Update 
rapidsai/pre-commit-hooks ([#4633](https://github.com/rapidsai/cugraph/pull/4633)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Update examples to build with latest changes to cugraph ([#4632](https://github.com/rapidsai/cugraph/pull/4632)) [@ChuckHastings](https://github.com/ChuckHastings) +- Remove Warnings and Timeout from `bench_cugraph_uniform_neighbor_sample.py` ([#4631](https://github.com/rapidsai/cugraph/pull/4631)) [@nv-rliu](https://github.com/nv-rliu) +- Update edge triangle count to call a non detail primitive ([#4630](https://github.com/rapidsai/cugraph/pull/4630)) [@jnke2016](https://github.com/jnke2016) +- nx-cugraph: Updates nxcg.Graph classes for API-compatibility with NetworkX Graph classes, needed for zero code change graph generators ([#4629](https://github.com/rapidsai/cugraph/pull/4629)) [@eriknw](https://github.com/eriknw) +- Drop Python 3.9 support ([#4625](https://github.com/rapidsai/cugraph/pull/4625)) [@jameslamb](https://github.com/jameslamb) +- Download fewer datasets for C/C++ unit tests ([#4624](https://github.com/rapidsai/cugraph/pull/4624)) [@ChuckHastings](https://github.com/ChuckHastings) +- Use CUDA math wheels ([#4621](https://github.com/rapidsai/cugraph/pull/4621)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Fix ListColumn constructor argument ([#4620](https://github.com/rapidsai/cugraph/pull/4620)) [@mroeschke](https://github.com/mroeschke) +- Use CategoricalColumn instead of build_categorical_column ([#4618](https://github.com/rapidsai/cugraph/pull/4618)) [@mroeschke](https://github.com/mroeschke) +- Add `nx-cugraph` Benchmarking Scripts ([#4616](https://github.com/rapidsai/cugraph/pull/4616)) [@nv-rliu](https://github.com/nv-rliu) +- Remove NumPy <2 pin ([#4615](https://github.com/rapidsai/cugraph/pull/4615)) [@seberg](https://github.com/seberg) +- Add Additional Check for `NetworkX` Release Candidate Versions ([#4613](https://github.com/rapidsai/cugraph/pull/4613)) 
[@nv-rliu](https://github.com/nv-rliu) +- Remove a bunch of legacy code that's no longer used ([#4609](https://github.com/rapidsai/cugraph/pull/4609)) [@ChuckHastings](https://github.com/ChuckHastings) +- Update pre-commit hooks ([#4605](https://github.com/rapidsai/cugraph/pull/4605)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Improve update-version.sh ([#4599](https://github.com/rapidsai/cugraph/pull/4599)) [@bdice](https://github.com/bdice) +- Use tool.scikit-build.cmake.version, set scikit-build-core minimum-version ([#4597](https://github.com/rapidsai/cugraph/pull/4597)) [@jameslamb](https://github.com/jameslamb) +- Migrate get_sampling_index function from cugraph-ops to cugraph ([#4594](https://github.com/rapidsai/cugraph/pull/4594)) [@ChuckHastings](https://github.com/ChuckHastings) +- Merge branch-24.08 into branch-24.10 ([#4565](https://github.com/rapidsai/cugraph/pull/4565)) [@jameslamb](https://github.com/jameslamb) +- Fix ucx-py version, use UCX 1.17.0 in pip devcontainers ([#4562](https://github.com/rapidsai/cugraph/pull/4562)) [@bdice](https://github.com/bdice) +- Use stream_allocator_adaptor constructor instead of factory. ([#4557](https://github.com/rapidsai/cugraph/pull/4557)) [@bdice](https://github.com/bdice) +- Add an Explanatory Error Message for uint Types ([#4556](https://github.com/rapidsai/cugraph/pull/4556)) [@alexbarghi-nv](https://github.com/alexbarghi-nv) +- Define and Implement C++ API for negative sampling ([#4523](https://github.com/rapidsai/cugraph/pull/4523)) [@ChuckHastings](https://github.com/ChuckHastings) + # cugraph 24.08.00 (7 Aug 2024) ## 🚨 Breaking Changes diff --git a/ci/build_docs.sh b/ci/build_docs.sh index 55235c6ebb9..01c573c96ca 100755 --- a/ci/build_docs.sh +++ b/ci/build_docs.sh @@ -6,6 +6,10 @@ set -euo pipefail rapids-logger "Create test conda environment" . 
/opt/conda/etc/profile.d/conda.sh +export RAPIDS_VERSION="$(rapids-version)" +export RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)" +export RAPIDS_VERSION_NUMBER="$RAPIDS_VERSION_MAJOR_MINOR" + rapids-dependency-file-generator \ --output conda \ --file-key docs \ @@ -22,35 +26,31 @@ PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python) if [[ "${RAPIDS_CUDA_VERSION}" == "11.8.0" ]]; then CONDA_CUDA_VERSION="11.8" - DGL_CHANNEL="dglteam/label/cu118" + DGL_CHANNEL="dglteam/label/th23_cu118" else CONDA_CUDA_VERSION="12.1" - DGL_CHANNEL="dglteam/label/cu121" + DGL_CHANNEL="dglteam/label/th23_cu121" fi rapids-mamba-retry install \ --channel "${CPP_CHANNEL}" \ --channel "${PYTHON_CHANNEL}" \ --channel conda-forge \ - --channel pyg \ --channel nvidia \ --channel "${DGL_CHANNEL}" \ - libcugraph \ - pylibcugraph \ - cugraph \ - cugraph-pyg \ - cugraph-dgl \ - cugraph-service-server \ - cugraph-service-client \ - libcugraph_etl \ - pylibcugraphops \ - pylibwholegraph \ - pytorch \ + "libcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-pyg=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-dgl=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-service-server=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-service-client=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "libcugraph_etl=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraphops=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibwholegraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pytorch>=2.3,<2.4" \ "cuda-version=${CONDA_CUDA_VERSION}" -export RAPIDS_VERSION="$(rapids-version)" -export RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)" -export RAPIDS_VERSION_NUMBER="$RAPIDS_VERSION_MAJOR_MINOR" export RAPIDS_DOCS_DIR="$(mktemp -d)" for PROJECT in libcugraphops libwholegraph; do diff --git a/ci/build_python.sh b/ci/build_python.sh index 1ebc38b058b..c94cc2a0fce 100755 --- a/ci/build_python.sh +++ b/ci/build_python.sh @@ -61,7 +61,6 
@@ if [[ ${RAPIDS_CUDA_MAJOR} == "11" ]]; then --no-test \ --channel "${CPP_CHANNEL}" \ --channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \ - --channel pyg \ --channel pytorch \ --channel pytorch-nightly \ conda/recipes/cugraph-pyg @@ -71,7 +70,7 @@ if [[ ${RAPIDS_CUDA_MAJOR} == "11" ]]; then --no-test \ --channel "${CPP_CHANNEL}" \ --channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \ - --channel dglteam \ + --channel dglteam/label/th23_cu118 \ --channel pytorch \ --channel pytorch-nightly \ conda/recipes/cugraph-dgl diff --git a/ci/test_cpp.sh b/ci/test_cpp.sh index 6c14870164e..fb9ab1f5e4e 100755 --- a/ci/test_cpp.sh +++ b/ci/test_cpp.sh @@ -8,6 +8,8 @@ cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../ . /opt/conda/etc/profile.d/conda.sh +RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)" + rapids-logger "Generate C++ testing dependencies" rapids-dependency-file-generator \ --output conda \ @@ -30,7 +32,9 @@ rapids-print-env rapids-mamba-retry install \ --channel "${CPP_CHANNEL}" \ - libcugraph libcugraph_etl libcugraph-tests + "libcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "libcugraph_etl=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "libcugraph-tests=${RAPIDS_VERSION_MAJOR_MINOR}.*" rapids-logger "Check GPU usage" nvidia-smi diff --git a/ci/test_notebooks.sh b/ci/test_notebooks.sh index 31ec56074f0..b22671b48dc 100755 --- a/ci/test_notebooks.sh +++ b/ci/test_notebooks.sh @@ -5,6 +5,8 @@ set -Eeuo pipefail . 
/opt/conda/etc/profile.d/conda.sh +RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)" + rapids-logger "Generate notebook testing dependencies" rapids-dependency-file-generator \ --output conda \ @@ -27,7 +29,9 @@ PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python) rapids-mamba-retry install \ --channel "${CPP_CHANNEL}" \ --channel "${PYTHON_CHANNEL}" \ - libcugraph pylibcugraph cugraph + "libcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" NBTEST="$(realpath "$(dirname "$0")/utils/nbtest.sh")" NOTEBOOK_LIST="$(realpath "$(dirname "$0")/notebook_list.py")" diff --git a/ci/test_python.sh b/ci/test_python.sh index f21a06cf061..29b4c7be190 100755 --- a/ci/test_python.sh +++ b/ci/test_python.sh @@ -8,6 +8,8 @@ cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../ . /opt/conda/etc/profile.d/conda.sh +RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)" + rapids-logger "Generate Python testing dependencies" rapids-dependency-file-generator \ --output conda \ @@ -34,12 +36,12 @@ rapids-print-env rapids-mamba-retry install \ --channel "${CPP_CHANNEL}" \ --channel "${PYTHON_CHANNEL}" \ - libcugraph \ - pylibcugraph \ - cugraph \ - nx-cugraph \ - cugraph-service-server \ - cugraph-service-client + "libcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "nx-cugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-service-server=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-service-client=${RAPIDS_VERSION_MAJOR_MINOR}.*" rapids-logger "Check GPU usage" nvidia-smi @@ -151,14 +153,13 @@ if [[ "${RAPIDS_CUDA_VERSION}" == "11.8.0" ]]; then --channel "${CPP_CHANNEL}" \ --channel "${PYTHON_CHANNEL}" \ --channel conda-forge \ - --channel dglteam/label/cu118 \ + --channel dglteam/label/th23_cu118 \ --channel nvidia \ - libcugraph \ - pylibcugraph \ - pylibcugraphops \ - cugraph \ - cugraph-dgl \ 
- 'dgl>=1.1.0.cu*,<=2.0.0.cu*' \ + "libcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pylibcugraphops=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "cugraph-dgl=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ 'pytorch>=2.3,<2.4' \ 'cuda-version=11.8' @@ -208,16 +209,10 @@ if [[ "${RAPIDS_CUDA_VERSION}" == "11.8.0" ]]; then rapids-mamba-retry install \ --channel "${CPP_CHANNEL}" \ --channel "${PYTHON_CHANNEL}" \ - --channel pyg \ - "cugraph-pyg" \ + "cugraph-pyg=${RAPIDS_VERSION_MAJOR_MINOR}.*" \ + "pytorch>=2.3,<2.4" \ "ogb" - pip install \ - pyg_lib \ - torch_scatter \ - torch_sparse \ - -f ${PYG_URL} - rapids-print-env rapids-logger "pytest cugraph_pyg (single GPU)" @@ -253,7 +248,7 @@ if [[ "${RAPIDS_CUDA_VERSION}" == "11.8.0" ]]; then --channel "${PYTHON_CHANNEL}" \ --channel conda-forge \ --channel nvidia \ - cugraph-equivariant + "cugraph-equivariant=${RAPIDS_VERSION_MAJOR_MINOR}.*" pip install e3nn==0.5.1 rapids-print-env diff --git a/ci/test_wheel_cugraph-dgl.sh b/ci/test_wheel_cugraph-dgl.sh index 9b79cb17fe4..688c58026bd 100755 --- a/ci/test_wheel_cugraph-dgl.sh +++ b/ci/test_wheel_cugraph-dgl.sh @@ -30,10 +30,10 @@ else PYTORCH_CUDA_VER=$PKG_CUDA_VER fi PYTORCH_URL="https://download.pytorch.org/whl/cu${PYTORCH_CUDA_VER}" -DGL_URL="https://data.dgl.ai/wheels/cu${PYTORCH_CUDA_VER}/repo.html" +DGL_URL="https://data.dgl.ai/wheels/torch-2.3/cu${PYTORCH_CUDA_VER}/repo.html" rapids-logger "Installing PyTorch and DGL" rapids-retry python -m pip install torch==2.3.0 --index-url ${PYTORCH_URL} -rapids-retry python -m pip install dgl==2.0.0 --find-links ${DGL_URL} +rapids-retry python -m pip install dgl==2.4.0 --find-links ${DGL_URL} python -m pytest python/cugraph-dgl/tests diff --git a/conda/environments/all_cuda-118_arch-x86_64.yaml b/conda/environments/all_cuda-118_arch-x86_64.yaml index a23c2395646..ec3f61d383f 100644 --- a/conda/environments/all_cuda-118_arch-x86_64.yaml +++ 
b/conda/environments/all_cuda-118_arch-x86_64.yaml @@ -4,8 +4,7 @@ channels: - rapidsai - rapidsai-nightly - dask/label/dev -- pyg -- dglteam/label/cu118 +- dglteam/label/th23_cu118 - conda-forge - nvidia dependencies: diff --git a/conda/environments/all_cuda-125_arch-x86_64.yaml b/conda/environments/all_cuda-125_arch-x86_64.yaml index eca10584304..ff42bbbc365 100644 --- a/conda/environments/all_cuda-125_arch-x86_64.yaml +++ b/conda/environments/all_cuda-125_arch-x86_64.yaml @@ -4,8 +4,7 @@ channels: - rapidsai - rapidsai-nightly - dask/label/dev -- pyg -- dglteam/label/cu118 +- dglteam/label/th23_cu118 - conda-forge - nvidia dependencies: diff --git a/conda/recipes/cugraph-dgl/meta.yaml b/conda/recipes/cugraph-dgl/meta.yaml index c80ca6890a8..0383fc8adf8 100644 --- a/conda/recipes/cugraph-dgl/meta.yaml +++ b/conda/recipes/cugraph-dgl/meta.yaml @@ -25,7 +25,7 @@ requirements: - setuptools>=61.0.0 run: - cugraph ={{ version }} - - dgl >=1.1.0.cu* + - dgl >=2.4.0.th23.cu* - numba >=0.57 - numpy >=1.23,<3.0a0 - pylibcugraphops ={{ minor_version }} diff --git a/conda/recipes/cugraph-pyg/meta.yaml b/conda/recipes/cugraph-pyg/meta.yaml index 38d4a3d7d15..7d3e503e23a 100644 --- a/conda/recipes/cugraph-pyg/meta.yaml +++ b/conda/recipes/cugraph-pyg/meta.yaml @@ -36,7 +36,7 @@ requirements: - cugraph ={{ version }} - pylibcugraphops ={{ minor_version }} - tensordict >=0.1.2 - - pyg >=2.5,<2.6 + - pytorch_geometric >=2.5,<2.6 tests: imports: diff --git a/cpp/include/cugraph_c/graph.h b/cpp/include/cugraph_c/graph.h index 00fce0493a3..d812b503778 100644 --- a/cpp/include/cugraph_c/graph.h +++ b/cpp/include/cugraph_c/graph.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2023, NVIDIA CORPORATION. + * Copyright (c) 2021-2024, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
@@ -105,6 +105,8 @@ cugraph_error_code_t cugraph_sg_graph_create( weights, * or take the maximum weight), the caller should remove specific edges themselves and not rely * on this flag. + * @param [in] symmetrize If true, symmetrize the edgelist. The symmetrization of edges + * with edge_ids and/or edge_type_ids is currently not supported. * @param [in] do_expensive_check If true, do expensive checks to validate the input data * is consistent with software assumptions. If false bypass these checks. * @param [out] graph A pointer to the graph object @@ -126,6 +128,7 @@ cugraph_error_code_t cugraph_graph_create_sg( bool_t renumber, bool_t drop_self_loops, bool_t drop_multi_edges, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error); @@ -150,6 +153,8 @@ cugraph_error_code_t cugraph_graph_create_sg( * If false, do not renumber. Renumbering enables some significant optimizations within * the graph primitives library, so it is strongly encouraged. Renumbering is required if * the vertices are not sequential integer values from 0 to num_vertices. + * @param [in] symmetrize If true, symmetrize the edgelist. The symmetrization of edges + * with edge_ids and/or edge_type_ids is currently not supported. * @param [in] do_expensive_check If true, do expensive checks to validate the input data * is consistent with software assumptions. If false bypass these checks. * @param [out] graph A pointer to the graph object @@ -168,6 +173,7 @@ cugraph_error_code_t cugraph_sg_graph_create_from_csr( const cugraph_type_erased_device_array_view_t* edge_type_ids, bool_t store_transposed, bool_t renumber, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error); @@ -190,6 +196,8 @@ cugraph_error_code_t cugraph_sg_graph_create_from_csr( * If false, do not renumber. Renumbering enables some significant optimizations within * the graph primitives library, so it is strongly encouraged. 
Renumbering is required if * the vertices are not sequential integer values from 0 to num_vertices. + * @param [in] symmetrize If true, symmetrize the edgelist. The symmetrization of edges + * with edge_ids and/or edge_type_ids is currently not supported. * @param [in] do_expensive_check If true, do expensive checks to validate the input data * is consistent with software assumptions. If false bypass these checks. * @param [out] graph A pointer to the graph object @@ -208,6 +216,7 @@ cugraph_error_code_t cugraph_graph_create_sg_from_csr( const cugraph_type_erased_device_array_view_t* edge_type_ids, bool_t store_transposed, bool_t renumber, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error); @@ -289,6 +298,8 @@ cugraph_error_code_t cugraph_mg_graph_create( * Note that setting this flag will arbitrarily select one instance of a multi edge to be the * edge that survives. If the edges have properties that should be honored (e.g. sum the * weights, or take the maximum weight), the caller should do that on not rely on this flag. + * @param [in] symmetrize If true, symmetrize the edgelist. The symmetrization of edges + * with edge_ids and/or edge_type_ids is currently not supported. * @param [in] do_expensive_check If true, do expensive checks to validate the input data * is consistent with software assumptions. If false bypass these checks. 
* @param [out] graph A pointer to the graph object @@ -309,6 +320,7 @@ cugraph_error_code_t cugraph_graph_create_mg( size_t num_arrays, bool_t drop_self_loops, bool_t drop_multi_edges, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error); diff --git a/cpp/src/c_api/graph_mg.cpp b/cpp/src/c_api/graph_mg.cpp index cc4acd31743..fc8014a5dd8 100644 --- a/cpp/src/c_api/graph_mg.cpp +++ b/cpp/src/c_api/graph_mg.cpp @@ -71,6 +71,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { bool_t renumber_; bool_t drop_self_loops_; bool_t drop_multi_edges_; + bool_t symmetrize_; bool_t do_expensive_check_; cugraph::c_api::cugraph_graph_t* result_{}; @@ -91,6 +92,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { bool_t renumber, bool_t drop_self_loops, bool_t drop_multi_edges, + bool_t symmetrize, bool_t do_expensive_check) : abstract_functor(), properties_(properties), @@ -109,6 +111,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { renumber_(renumber), drop_self_loops_(drop_self_loops), drop_multi_edges_(drop_multi_edges), + symmetrize_(symmetrize), do_expensive_check_(do_expensive_check) { } @@ -224,6 +227,22 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { : false); } + if (symmetrize_) { + if (edgelist_edge_ids || edgelist_edge_types) { + // Currently doesn't support the symmetrization of edgelist with edge_ids and edge_types + unsupported(); + } + + // Symmetrize the edgelist + std::tie(edgelist_srcs, edgelist_dsts, edgelist_weights) = + cugraph::symmetrize_edgelist( + handle_, + std::move(edgelist_srcs), + std::move(edgelist_dsts), + std::move(edgelist_weights), + false); + } + std::tie(*graph, new_edge_weights, new_edge_ids, new_edge_types, new_number_map) = cugraph::create_graph_from_edgelisttype_; } + if (symmetrize == TRUE) { + CAPI_EXPECTS((properties->is_symmetric == TRUE), + CUGRAPH_INVALID_INPUT, + "Invalid input 
arguments: The graph property must be symmetric if 'symmetrize' " + "is set to True.", + *error); + } + CAPI_EXPECTS(p_src[i]->type_ == vertex_type, CUGRAPH_INVALID_INPUT, "Invalid input arguments: all vertex types must match", @@ -488,6 +516,7 @@ extern "C" cugraph_error_code_t cugraph_graph_create_mg( bool_t::TRUE, drop_self_loops, drop_multi_edges, + symmetrize, do_expensive_check); try { @@ -534,6 +563,7 @@ extern "C" cugraph_error_code_t cugraph_mg_graph_create( 1, FALSE, FALSE, + FALSE, do_expensive_check, graph, error); diff --git a/cpp/src/c_api/graph_sg.cpp b/cpp/src/c_api/graph_sg.cpp index ff71471a8d0..f6ea8e4142e 100644 --- a/cpp/src/c_api/graph_sg.cpp +++ b/cpp/src/c_api/graph_sg.cpp @@ -43,6 +43,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { bool_t renumber_; bool_t drop_self_loops_; bool_t drop_multi_edges_; + bool_t symmetrize_; bool_t do_expensive_check_; cugraph_data_type_id_t edge_type_; cugraph::c_api::cugraph_graph_t* result_{}; @@ -58,6 +59,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { bool_t renumber, bool_t drop_self_loops, bool_t drop_multi_edges, + bool_t symmetrize, bool_t do_expensive_check, cugraph_data_type_id_t edge_type) : abstract_functor(), @@ -72,6 +74,7 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { renumber_(renumber), drop_self_loops_(drop_self_loops), drop_multi_edges_(drop_multi_edges), + symmetrize_(symmetrize), do_expensive_check_(do_expensive_check), edge_type_(edge_type) { @@ -207,6 +210,22 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { : false); } + if (symmetrize_) { + if (edgelist_edge_ids || edgelist_edge_types) { + // Currently doesn't support the symmetrization with edge_ids and edge_types + unsupported(); + } + + // Symmetrize the edgelist + std::tie(edgelist_srcs, edgelist_dsts, edgelist_weights) = + cugraph::symmetrize_edgelist( + handle_, + std::move(edgelist_srcs), + std::move(edgelist_dsts), + 
std::move(edgelist_weights), + false); + } + std::tie(*graph, new_edge_weights, new_edge_ids, new_edge_types, new_number_map) = cugraph::create_graph_from_edgelist, edge_type_id_t>(handle_); + if (symmetrize_) { + if (edgelist_edge_ids || edgelist_edge_types) { + // Currently doesn't support the symmetrization with edge_ids and edge_types + unsupported(); + } + + // Symmetrize the edgelist + std::tie(edgelist_srcs, edgelist_dsts, edgelist_weights) = + cugraph::symmetrize_edgelist( + handle_, + std::move(edgelist_srcs), + std::move(edgelist_dsts), + std::move(edgelist_weights), + false); + } + std::tie(*graph, new_edge_weights, new_edge_ids, new_edge_types, new_number_map) = cugraph::create_graph_from_edgelist(edge_type_ids); + if (symmetrize == TRUE) { + CAPI_EXPECTS((properties->is_symmetric == TRUE), + CUGRAPH_INVALID_INPUT, + "Invalid input arguments: The graph property must be symmetric if 'symmetrize' is " + "set to True.", + *error); + } + CAPI_EXPECTS(p_src->size_ == p_dst->size_, CUGRAPH_INVALID_INPUT, "Invalid input arguments: src size != dst size.", @@ -606,6 +653,7 @@ extern "C" cugraph_error_code_t cugraph_graph_create_sg( renumber, drop_self_loops, drop_multi_edges, + symmetrize, do_expensive_check, edge_type); @@ -658,6 +706,7 @@ extern "C" cugraph_error_code_t cugraph_sg_graph_create( renumber, FALSE, FALSE, + FALSE, do_expensive_check, graph, error); @@ -673,6 +722,7 @@ cugraph_error_code_t cugraph_graph_create_sg_from_csr( const cugraph_type_erased_device_array_view_t* edge_type_ids, bool_t store_transposed, bool_t renumber, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error) @@ -707,6 +757,14 @@ cugraph_error_code_t cugraph_graph_create_sg_from_csr( weight_type = cugraph_data_type_id_t::FLOAT32; } + if (symmetrize == TRUE) { + CAPI_EXPECTS((properties->is_symmetric == TRUE), + CUGRAPH_INVALID_INPUT, + "Invalid input arguments: The graph property must be symmetric if 'symmetrize' is " + "set to True.", 
+ *error); + } + CAPI_EXPECTS( (edge_type_ids == nullptr && edge_ids == nullptr) || (edge_type_ids != nullptr && edge_ids != nullptr), @@ -735,6 +793,7 @@ cugraph_error_code_t cugraph_graph_create_sg_from_csr( p_edge_ids, p_edge_type_ids, renumber, + FALSE, // symmetrize do_expensive_check); try { @@ -770,6 +829,7 @@ cugraph_error_code_t cugraph_sg_graph_create_from_csr( const cugraph_type_erased_device_array_view_t* edge_type_ids, bool_t store_transposed, bool_t renumber, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error) @@ -783,6 +843,7 @@ cugraph_error_code_t cugraph_sg_graph_create_from_csr( edge_type_ids, store_transposed, renumber, + symmetrize, do_expensive_check, graph, error); diff --git a/cpp/tests/c_api/create_graph_test.c b/cpp/tests/c_api/create_graph_test.c index 41b8691e79c..104787e4c7b 100644 --- a/cpp/tests/c_api/create_graph_test.c +++ b/cpp/tests/c_api/create_graph_test.c @@ -104,6 +104,7 @@ int test_create_sg_graph_simple() FALSE, FALSE, FALSE, + FALSE, &graph, &ret_error); TEST_ASSERT(test_ret_value, ret_code == CUGRAPH_SUCCESS, "graph creation failed."); @@ -213,6 +214,7 @@ int test_create_sg_graph_csr() FALSE, FALSE, FALSE, + FALSE, &graph, &ret_error); TEST_ASSERT(test_ret_value, ret_code == CUGRAPH_SUCCESS, "graph creation failed."); @@ -408,6 +410,7 @@ int test_create_sg_graph_symmetric_error() FALSE, FALSE, FALSE, + FALSE, TRUE, &graph, &ret_error); @@ -526,6 +529,7 @@ int test_create_sg_graph_with_isolated_vertices() FALSE, FALSE, FALSE, + FALSE, &graph, &ret_error); TEST_ASSERT(test_ret_value, ret_code == CUGRAPH_SUCCESS, "graph creation failed."); @@ -675,6 +679,7 @@ int test_create_sg_graph_csr_with_isolated() FALSE, FALSE, FALSE, + FALSE, &graph, &ret_error); TEST_ASSERT(test_ret_value, ret_code == CUGRAPH_SUCCESS, "graph creation failed."); @@ -840,6 +845,7 @@ int test_create_sg_graph_with_isolated_vertices_multi_input() TRUE, TRUE, FALSE, + FALSE, &graph, &ret_error); 
TEST_ASSERT(test_ret_value, ret_code == CUGRAPH_SUCCESS, "graph creation failed."); diff --git a/cpp/tests/c_api/mg_create_graph_test.c b/cpp/tests/c_api/mg_create_graph_test.c index dd817881325..12579f26d06 100644 --- a/cpp/tests/c_api/mg_create_graph_test.c +++ b/cpp/tests/c_api/mg_create_graph_test.c @@ -109,6 +109,7 @@ int test_create_mg_graph_simple(const cugraph_resource_handle_t* handle) 1, FALSE, FALSE, + FALSE, TRUE, &graph, &ret_error); @@ -251,6 +252,7 @@ int test_create_mg_graph_multiple_edge_lists(const cugraph_resource_handle_t* ha num_local_arrays, FALSE, FALSE, + FALSE, TRUE, &graph, &ret_error); @@ -446,6 +448,7 @@ int test_create_mg_graph_multiple_edge_lists_multi_edge(const cugraph_resource_h num_local_arrays, TRUE, TRUE, + FALSE, TRUE, &graph, &ret_error); diff --git a/dependencies.yaml b/dependencies.yaml index 640adf8099f..a4143ff90c9 100644 --- a/dependencies.yaml +++ b/dependencies.yaml @@ -323,8 +323,7 @@ channels: - rapidsai - rapidsai-nightly - dask/label/dev - - pyg - - dglteam/label/cu118 + - dglteam/label/th23_cu118 - conda-forge - nvidia dependencies: @@ -700,7 +699,7 @@ dependencies: - &pytorch_conda pytorch>=2.3,<2.4.0a0 - pytorch-cuda==11.8 - &tensordict tensordict>=0.1.2 - - dgl>=1.1.0.cu* + - dgl>=2.4.0.cu* cugraph_pyg_dev: common: - output_types: [conda] @@ -709,7 +708,7 @@ dependencies: - *pytorch_conda - pytorch-cuda==11.8 - *tensordict - - pyg>=2.5,<2.6 + - pytorch_geometric>=2.5,<2.6 depends_on_pytorch: common: diff --git a/docs/cugraph/source/_static/bc_benchmark.png b/docs/cugraph/source/_static/bc_benchmark.png new file mode 100644 index 00000000000..9e385c97e99 Binary files /dev/null and b/docs/cugraph/source/_static/bc_benchmark.png differ diff --git a/docs/cugraph/source/_static/colab.png b/docs/cugraph/source/_static/colab.png new file mode 100644 index 00000000000..c4c3f5b46e1 Binary files /dev/null and b/docs/cugraph/source/_static/colab.png differ diff --git a/docs/cugraph/source/_static/nxcg-execution-diagram.jpg 
b/docs/cugraph/source/_static/nxcg-execution-diagram.jpg new file mode 100644 index 00000000000..48136289af9 Binary files /dev/null and b/docs/cugraph/source/_static/nxcg-execution-diagram.jpg differ diff --git a/docs/cugraph/source/basics/cugraph_cascading.md b/docs/cugraph/source/basics/cugraph_cascading.md deleted file mode 100644 index bad3d7fa6a8..00000000000 --- a/docs/cugraph/source/basics/cugraph_cascading.md +++ /dev/null @@ -1,53 +0,0 @@ - -# Method Cascading and cuGraph - -BLUF: cuGraph does not support method cascading - -[Method Cascading](https://en.wikipedia.org/wiki/Method_cascading) is a popular, and useful, functional programming concept and is a great way to make code more readable. Python supports method cascading ... _for the most part_. There are a number of Python built-in classes that do not support cascading. - -An example, from cuDF, is a sequence of method calls for loading data and then finding the largest values from a subset of the data (yes there are other ways this could be done): - -``` -gdf = cudf.from_pandas(df).query('val > 200').nlargest(3, 'val') -``` - -cuGraph does not support method cascading for two main reasons: (1) the object-oriented nature of the Graph data object leverages in-place methods, and (2) the fact that algorithms operate on graphs rather than graphs running algorithms. - -## Graph Data Objects -cuGraph follows an object-oriented design for the Graph objects. Users create a Graph and can then add data to the object, but every add method call returns `None`. - -_Why Inplace methods?_
-cuGraph focuses on big graph problems with tens of millions to trillions of edges (gigabytes to terabytes of data). At that scale, creating a copy of the data becomes memory inefficient. - -_Why not return `self` rather than `None`?_
-It would be simple to modify the methods to return `self` rather than `None`; however, it opens the methods to misinterpretation. Consider the following code: - -``` -# cascade flow - makes sense -G = cugraph.Graph().from_cudf_edgelist(df) - -# non-cascaded code can be confusing -G = cugraph.Graph() -G2 = G.from_cudf_edgelist(df) -G3 = G.from_cudf_edgelist(df2) -``` -The confusion with the non-cascade code is that G, G2, and G3 are all the same object with the same data. Users could be confused since it is not obvious that changing G3 would also change both G2 and G. To prevent confusion, cuGraph has opted to not return `self`. - -_Why not add a flag "return_self" to the methods?_
-``` -# cascade flow - makes sense -G = cugraph.Graph().from_cudf_edgelist(df, return_self=True) -``` -The fact that a developer would explicitly add a "return_self" flag to the method indicates that the developer is aware that the method returns None. It is just as easy for the developer to use a non-cascading workflow. - -### Algorithms -Algorithms operate on graph objects. -``` -cugraph.pagerank(G) and not G.pagerank() -``` -This pattern allows cuGraph to maintain a particular object-oriented model, where Graph objects simply maintain graph data, and algorithm functions operate independently on Graph objects. While this model has benefits that simplify the overall design and its usability in the majority of use cases, it does mean that the developer cannot cascade graph creation into an algorithm call. - -``` -# will not work -G = cugraph.Graph().from_cudf_edgelist(df).pagerank() -``` diff --git a/docs/cugraph/source/basics/cugraph_intro.md b/docs/cugraph/source/basics/index.md similarity index 99% rename from docs/cugraph/source/basics/cugraph_intro.md rename to docs/cugraph/source/basics/index.md index 7ad2825604a..36aad5166bc 100644 --- a/docs/cugraph/source/basics/cugraph_intro.md +++ b/docs/cugraph/source/basics/index.md @@ -1,5 +1,5 @@ - # cuGraph Introduction + The Data Scientist has a collection of techniques within their proverbial toolbox. Data engineering, statistical analysis, and machine learning are among the most commonly known. However, there @@ -20,8 +20,8 @@ into the RAPIDS data science ecosystem and allows the data scientist to easily call graph algorithms using data stored in a GPU DataFrame, NetworkX Graphs, or even CuPy or SciPy sparse Matrix. - ## Vision + The vision of RAPIDS cuGraph is to ___make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks___. 
This is a goal that many of us on the cuGraph team have been @@ -48,7 +48,6 @@ high-speed ETL, statistics, and machine learning. To make things even better, RAPIDS and DASK allows cuGraph to scale to multiple GPUs to support multi-billion edge graphs. - ## Terminology cuGraph is a collection of GPU accelerated graph algorithms and graph utility diff --git a/docs/cugraph/source/basics/index.rst b/docs/cugraph/source/basics/index.rst deleted file mode 100644 index 7bba301b657..00000000000 --- a/docs/cugraph/source/basics/index.rst +++ /dev/null @@ -1,11 +0,0 @@ -====== -Basics -====== - - -.. toctree:: - :maxdepth: 2 - - cugraph_intro - nx_transition - cugraph_cascading diff --git a/docs/cugraph/source/basics/nx_transition.rst b/docs/cugraph/source/basics/nx_transition.rst deleted file mode 100644 index 9da2fe9b49e..00000000000 --- a/docs/cugraph/source/basics/nx_transition.rst +++ /dev/null @@ -1,181 +0,0 @@ -************************************** -NetworkX by calling cuGraph Algorithms -************************************** - - -*Note: this is a work in progress and will be updated and changed as we better flesh out -compatibility issues* - -Latest Update -############# - -Last Update: March 7th, 2024 -Release: 24.04 - -**CuGraph is now a registered backend for networkX. This is described in the following blog: -`Accelerating NetworkX on NVIDIA GPUs for High Performance Graph Analytics -`_ - - -Easy Path – Use NetworkX Graph Objects, Accelerated Algorithms -############################################################## - -Rather than updating all of your existing code, simply update the calls to -graph algorithms by replacing the module name. This allows all the complicated -ETL code to be unchanged while still seeing significant performance -improvements. Again this will be deprecated since networkX dispatching to nx_cugraph -has many advantages. - - -.. image:: ../images/Nx_Cg_1.png - :width: 600 - -It is that easy.
All algorithms in cuGraph support a NetworkX graph object as -input and match the NetworkX API list of arguments. - -Currently, cuGraph accepts both NetworkX Graph and DiGraph objects. We will be -adding support for Bipartite graph and Multigraph over the next few releases. - -Differences in Algorithms -########################## - -Since cuGraph currently does not support attribute rich graphs, those -algorithms that return simple scores (centrality, clustering, etc.) best match -the NetworkX process. Algorithms that return a subgraph will do so without -any additional attributes on the nodes or edges. - -Algorithms that exactly match -***************************** - -+-------------------------------+------------------------+ -| Algorithm | Differences | -+===============================+========================+ -| Core Number | None | -+-------------------------------+------------------------+ -| HITS | None | -+-------------------------------+------------------------+ -| PageRank | None | -+-------------------------------+------------------------+ -| Personal PageRank | None | -+-------------------------------+------------------------+ -| Strongly Connected Components | None | -+-------------------------------+------------------------+ -| Weakly Connected Components | None | -+-------------------------------+------------------------+ - -| - - - -Algorithms that do not copy over additional attributes -************************************************************************ - -+-------------------------------+-------------------------------------+ -| Algorithm | Differences | -+===============================+=====================================+ -| K-Truss | Does not copy over attributes | -+-------------------------------+-------------------------------------+ -| K-Core | Does not copy over attributes | -+-------------------------------+-------------------------------------+ -| Subgraph Extraction | Does not copy over attributes | 
-+-------------------------------+-------------------------------------+ - -| - - -Algorithms not in NetworkX -************************** - -+--------------------------------------+----------------------------+ -| Algorithm | Differences | -+======================================+============================+ -| Ensemble Clustering for Graphs (ECG) | Currently not in NetworkX | -+--------------------------------------+----------------------------+ -| Force Atlas 2 | Currently not in NetworkX | -+--------------------------------------+----------------------------+ -| Leiden | Currently not in NetworkX | -+--------------------------------------+----------------------------+ -| Louvain | Currently not in NetworkX | -+--------------------------------------+----------------------------+ -| Overlap coefficient | Currently not in NetworkX | -+--------------------------------------+----------------------------+ -| Spectral Clustering | Currently not in NetworkX | -+--------------------------------------+----------------------------+ - -| - - -Algorithm where not all arguments are supported -*********************************************** - -+----------------------------+-------------------------------------------------+ -| Algorithm | Differences | -+============================+=================================================+ -|Betweenness Centrality | weight is currently not supported – ignored | -| | endpoints is currently not supported – ignored | -+----------------------------+-------------------------------------------------+ -|Edge Betweenness Centrality | weight is currently not supported – ignored | -+----------------------------+-------------------------------------------------+ -| Katz Centrality | beta is currently not supported – ignored | -| | max_iter defaults to 100 versus 1000 | -+----------------------------+-------------------------------------------------+ - -| - -Algorithms where the results are different -****************************************** - 
- -For example, the NetworkX traversal algorithms typically return a generator -rather than a dictionary. - - -+----------------------------+-------------------------------------------------+ -| Algorithm | Differences | -+============================+=================================================+ -| Triangle Counting | this algorithm simply returns the total number | -| | of triangle and not the number per vertex | -| | (on roadmap to update) | -+----------------------------+-------------------------------------------------+ -| Jaccard coefficient | Currently we only do a 1-hop computation rather | -| | than an all-pairs. Fix is on roadmap | -+----------------------------+-------------------------------------------------+ -| Breadth First Search (BFS) | Returns a Pandas DataFrame with: | -| | [vertex][distance][predecessor] | -+----------------------------+-------------------------------------------------+ -| Single Source | Returns a Pandas DataFrame with: | -| Shortest Path (SSSP) | [vertex][distance][predecessor] | -+----------------------------+-------------------------------------------------+ - -| - -Graph Building -############## - -The biggest difference between NetworkX and cuGraph is with how Graph objects -are built. NetworkX, for the most part, stores graph data in a dictionary. -That structure allows easy insertion of new records. Consider the following -code for building a NetworkX Graph:: - - # Read the node data - df = pd.read_csv( data_file) - - # Construct graph from edge list. - G = nx.DiGraph() - - for row in df.iterrows(): - G.add_edge( - row[1]["1"], row[1]["2"], count=row[1]["3"] - ) - - -The code block is perfectly fine for NetworkX. However, the process of iterating over the dataframe and adding one node at a time is problematic for GPUs and something that we try and avoid. cuGraph stores data in columns (i.e. arrays). Resizing an array requires allocating a new array one element larger, copying the data, and adding the new value. 
That is not very efficient. - -If your code follows the above model of inserting one element at a time, then we suggest either rewriting that code or using it as is within NetworkX and just accelerating the algorithms with cuGraph. - -Now, if your code bulk loads the data from Pandas, then RAPIDS can accelerate that process by orders of magnitude. - -.. image:: ../images/Nx_Cg_2.png - :width: 600 - -The above cuGraph code will create a cuGraph.Graph object and not a NetworkX.Graph object. diff --git a/docs/cugraph/source/graph_support/DGL_support.md b/docs/cugraph/source/graph_support/DGL_support.md index ba9a28e3170..7d32a9efe37 100644 --- a/docs/cugraph/source/graph_support/DGL_support.md +++ b/docs/cugraph/source/graph_support/DGL_support.md @@ -8,9 +8,12 @@ Install and update cugraph-dgl and the required dependencies using the command: -``` -conda install mamba -n base -c conda-forge -mamba install cugraph-dgl -c rapidsai-nightly -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam +```shell +# CUDA 11 +conda install -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam/label/th23_cu118 cugraph-dgl + +# CUDA 12 +conda install -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam/label/th23_cu121 cugraph-dgl ``` ## Build from Source diff --git a/docs/cugraph/source/index.rst b/docs/cugraph/source/index.rst index 9ea9e4d65cf..0db1860b2b9 100644 --- a/docs/cugraph/source/index.rst +++ b/docs/cugraph/source/index.rst @@ -1,49 +1,87 @@ RAPIDS Graph documentation ========================== + .. image:: images/cugraph_logo_2.png :width: 600 -*Making graph analytics fast and easy regardless of scale* - - -..
list-table:: RAPIDS Graph covers a range of graph libraries and packages, that includes: - :widths: 25 25 25 - :header-rows: 1 - - * - Core - - GNN - - Extension - * - :abbr:`cugraph (Python wrapper with lots of convenience functions)` - - :abbr:`cugraph-ops (GNN aggregators and operators)` - - :abbr:`cugraph-service (Graph-as-a-service provides both Client and Server packages)` - * - :abbr:`pylibcugraph (light-weight Python wrapper with no guard rails)` - - :abbr:`cugraph-dgl (Accelerated extensions for use with the DGL framework)` - - - * - :abbr:`libcugraph (C++ API)` - - :abbr:`cugraph-pyg (Accelerated extensions for use with the PyG framework)` - - - * - :abbr:`libcugraph_etl (C++ renumbering function for strings)` - - :abbr:`wholegraph (Shared memory-based GPU-accelerated GNN training)` - - -.. -| ~~~~~~~~~~~~ Introduction ~~~~~~~~~~~~ cuGraph is a library of graph algorithms that seamlessly integrates into the RAPIDS data science ecosystem and allows the data scientist to easily call -graph algorithms using data stored in GPU DataFrames, NetworkX Graphs, or -even CuPy or SciPy sparse Matrices. +graph algorithms using data stored in cuDF/Pandas DataFrames or CuPy/SciPy +sparse matrices. + +--------------------------- +cuGraph Using NetworkX Code +--------------------------- + +cuGraph is now available as a NetworkX backend using `nx-cugraph `_. +Our major integration effort with NetworkX offers NetworkX users a **zero code change** option to accelerate +their existing NetworkX code using an NVIDIA GPU and cuGraph. + +Check out `zero code change accelerated NetworkX `_. If you would like to continue using standard cuGraph, then continue down below. 
+ +---------------------------- +Getting started with cuGraph +---------------------------- + +Required hardware/software for cuGraph and `RAPIDS `_ + * NVIDIA GPU, Volta architecture or later, with `compute capability 7.0+ `_ + * CUDA 11.2-11.8, 12.0-12.5 + * Python version 3.10, 3.11, or 3.12 + +++++++++++++ +Installation +++++++++++++ + +Please see the latest `RAPIDS System Requirements documentation `_. + +This includes several ways to set up cuGraph + +* From Unix + + * `Conda `_ + * `Docker `_ + * `pip `_ + + +**Note: Windows use of RAPIDS depends on prior installation of** `WSL2 `_. + +* From Windows + + * `Conda `_ + * `Docker `_ + * `pip `_ + + Cugraph API Example + + .. code-block:: python + + import cugraph + import cudf + + # Create an instance of the popular Zachary Karate Club graph + from cugraph.datasets import karate + G = karate.get_graph() + + # Call cugraph.degree_centrality + vertex_bc = cugraph.degree_centrality(G) + + There are several resources containing cuGraph examples, the cuGraph `notebook repository `_ has many examples of loading graph data and running algorithms in Jupyter notebooks. + The cuGraph `test code `_ contains script examples of setting up and calling cuGraph algorithms. + + A simple example of `testing the degree centrality algorithm `_ is a good place to start. There are also `multi-GPU examples `_ with larger data sets as well. -Note: We are redoing all of our documents, please be patient as we update -the docs and links +---- -| +~~~~~~~~~~~~~~~~~ +Table of Contents +~~~~~~~~~~~~~~~~~ .. 
toctree:: :maxdepth: 2 - :caption: Contents: basics/index nx_cugraph/index @@ -54,8 +92,9 @@ the docs and links references/index api_docs/index +~~~~~~~~~~~~~~~~~~ Indices and tables -================== +~~~~~~~~~~~~~~~~~~ * :ref:`genindex` * :ref:`search` diff --git a/docs/cugraph/source/nx_cugraph/benchmarks.md b/docs/cugraph/source/nx_cugraph/benchmarks.md new file mode 100644 index 00000000000..45085c133a9 --- /dev/null +++ b/docs/cugraph/source/nx_cugraph/benchmarks.md @@ -0,0 +1,26 @@ +# Benchmarks + +## NetworkX vs. nx-cugraph +We ran several commonly used graph algorithms on both `networkx` and `nx-cugraph`. Here are the results + + +
+ +![bench-image](../_static/bc_benchmark.png) + +
Results from running this Benchmark
+
+ +## Reproducing Benchmarks + +Below are the steps to reproduce the results on your own. + +1. Clone the latest + +2. Follow the instructions to build and activate an environment + +4. Install the latest `nx-cugraph` by following the [Installation Guide](installation.md) + +5. Follow the instructions written in the README [here](https://github.com/rapidsai/cugraph/blob/HEAD/benchmarks/nx-cugraph/pytest-based) diff --git a/docs/cugraph/source/nx_cugraph/how-it-works.md b/docs/cugraph/source/nx_cugraph/how-it-works.md new file mode 100644 index 00000000000..5696688d1b5 --- /dev/null +++ b/docs/cugraph/source/nx_cugraph/how-it-works.md @@ -0,0 +1,113 @@ +# How it Works + +NetworkX has the ability to **dispatch function calls to separately-installed third-party backends**. + +NetworkX backends let users experience improved performance and/or additional functionality without changing their NetworkX Python code. Examples include backends that provide algorithm acceleration using GPUs, parallel processing, graph database integration, and more. + +While NetworkX is a pure-Python implementation, backends may be written to use other libraries and even specialized hardware. `nx-cugraph` is a NetworkX backend that uses RAPIDS cuGraph and NVIDIA GPUs to significantly improve NetworkX performance. + +![nxcg-execution-flow](../_static/nxcg-execution-diagram.jpg) + +## Enabling nx-cugraph + +It is recommended to use `networkx>=3.4` for optimal zero code change performance, but `nx-cugraph` will also work with `networkx 3.0+`. + +NetworkX will use `nx-cugraph` as the backend if any of the following are used: + +### `NX_CUGRAPH_AUTOCONFIG` environment variable. + +The `NX_CUGRAPH_AUTOCONFIG` environment variable can be used to configure NetworkX for full zero code change acceleration using `nx-cugraph`. 
If a NetworkX function is called that `nx-cugraph` supports, NetworkX will redirect the function call to `nx-cugraph` automatically, or fall back to either another backend if enabled or the default NetworkX implementation. See the [NetworkX documentation on backends](https://networkx.org/documentation/stable/reference/backends.html) for configuring NetworkX manually. + +``` +bash> NX_CUGRAPH_AUTOCONFIG=True python my_networkx_script.py +``` + +### `backend=` keyword argument + +To explicitly specify a particular backend for an API, use the `backend=` +keyword argument. This argument takes precedence over the +`NX_CUGRAPH_AUTOCONFIG` environment variable. This requires anyone +running code that uses the `backend=` keyword argument to have the specified +backend installed. + +Example: +```python +nx.betweenness_centrality(cit_patents_graph, k=k, backend="cugraph") +``` + +### Type-based dispatching + +NetworkX also supports automatically dispatching to backends associated with +specific graph types. Like the `backend=` keyword argument example above, this +requires the user to write code for a specific backend, and therefore requires +the backend to be installed, but has the advantage of ensuring a particular +behavior without the potential for runtime conversions. + +To use type-based dispatching with `nx-cugraph`, the user must import the backend +directly in their code to access the utilities provided to create a Graph +instance specifically for the `nx-cugraph` backend. + +Example: +```python +import networkx as nx +import nx_cugraph as nxcg + +G = nx.Graph() + +# populate the graph +# ... + +nxcg_G = nxcg.from_networkx(G) # conversion happens once here +nx.betweenness_centrality(nxcg_G, k=1000) # nxcg Graph type causes cugraph backend + # to be used, no conversion necessary +``` + +## Command Line Example + +--- + +Create `bc_demo.ipy` and paste the code below. 
+ +```python +import pandas as pd +import networkx as nx + +url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv" +df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32") +G = nx.from_pandas_edgelist(df, source="src", target="dst") + +%time result = nx.betweenness_centrality(G, k=10) +``` +Run the command: +``` +user@machine:/# ipython bc_demo.ipy + +CPU times: user 7min 36s, sys: 5.22 s, total: 7min 41s +Wall time: 7min 41s +``` + +You will observe a run time of approximately 7 minutes...more or less depending on your CPU. + +Run the command again, this time specifying cugraph as the NetworkX backend. +```bash +user@machine:/# NX_CUGRAPH_AUTOCONFIG=True ipython bc_demo.ipy + +CPU times: user 4.14 s, sys: 1.13 s, total: 5.27 s +Wall time: 5.32 s +``` +This run will be much faster, typically around 5 seconds depending on your GPU. + +
+ +*Note, the examples above were run using the following specs*: + +    *NetworkX 3.4*
+    *nx-cugraph 24.10*
+    *CPU: Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz 45GB RAM*
+    *GPU: NVIDIA Quadro RTX 8000 80GB RAM*
+ +
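For a rough sense of scale, the two wall-clock times reported above can be turned into a speedup factor. The sketch below is plain arithmetic on the numbers already shown (7 min 41 s versus 5.32 s), not a new measurement, and the variable names are illustrative only.

```python
# Wall-clock times copied from the two runs shown above.
cpu_wall_seconds = 7 * 60 + 41  # "Wall time: 7min 41s" (default NetworkX)
gpu_wall_seconds = 5.32         # "Wall time: 5.32 s" (NX_CUGRAPH_AUTOCONFIG=True)

# Ratio of CPU time to GPU time for this one workload.
speedup = cpu_wall_seconds / gpu_wall_seconds
print(f"Approximate speedup: {speedup:.0f}x")
```

On these figures the ratio comes out between 86x and 87x. Results on other hardware, graphs, or algorithms will likely differ.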
+ +--- + +The latest list of algorithms supported by `nx-cugraph` can be found in [GitHub](https://github.com/rapidsai/cugraph/blob/HEAD/python/nx-cugraph/README.md#algorithms), or in the [Supported Algorithms Section](supported-algorithms.md). diff --git a/docs/cugraph/source/nx_cugraph/index.rst b/docs/cugraph/source/nx_cugraph/index.rst index ef6f51601ab..730958a5b73 100644 --- a/docs/cugraph/source/nx_cugraph/index.rst +++ b/docs/cugraph/source/nx_cugraph/index.rst @@ -1,9 +1,66 @@ -=============================== -nxCugraph as a NetworkX Backend -=============================== +nx-cugraph +----------- +``nx-cugraph`` is a NetworkX backend that provides **GPU acceleration** to many popular NetworkX algorithms. + +By simply `installing and enabling nx-cugraph `_, users can see significant speedup on workflows where performance is hindered by the default NetworkX implementation. + +Users can have GPU-based, large-scale performance **without** changing their familiar and easy-to-use NetworkX code. + +.. centered:: Timed result from running the following code snippet (called ``demo.ipy``, showing NetworkX with vs. without ``nx-cugraph``) + +.. code-block:: python + + import pandas as pd + import networkx as nx + + url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv" + df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32") + G = nx.from_pandas_edgelist(df, source="src", target="dst") + + %time result = nx.betweenness_centrality(G, k=10) + + +:: + + user@machine:/# ipython demo.ipy + CPU times: user 7min 36s, sys: 5.22 s, total: 7min 41s + Wall time: 7min 41s + + +:: + + user@machine:/# NX_CUGRAPH_AUTOCONFIG=True ipython demo.ipy + CPU times: user 4.14 s, sys: 1.13 s, total: 5.27 s + Wall time: 5.32 s + + +.. figure:: ../_static/colab.png + :width: 200px + :target: https://nvda.ws/4drM4re + + Try it on Google Colab! 
+ + ++--------------------------------------------------------------------------------------------------------+ +| **Zero Code Change Acceleration** | +| | +| Just set the environment variable ``NX_CUGRAPH_AUTOCONFIG=True`` to enable ``nx-cugraph`` in NetworkX. | ++--------------------------------------------------------------------------------------------------------+ +| **Run the same code on CPU or GPU** | +| | +| Nothing changes, not even your `import` statements, when going from CPU to GPU. | ++--------------------------------------------------------------------------------------------------------+ + + +``nx-cugraph`` is now Generally Available (GA) as part of the ``RAPIDS`` package. See `RAPIDS +Quick Start `_ to get up-and-running with ``nx-cugraph``. .. toctree:: - :maxdepth: 2 + :maxdepth: 1 + :caption: Contents: - nx_cugraph.md + how-it-works + installation + supported-algorithms + benchmarks diff --git a/docs/cugraph/source/nx_cugraph/installation.md b/docs/cugraph/source/nx_cugraph/installation.md new file mode 100644 index 00000000000..a816801d001 --- /dev/null +++ b/docs/cugraph/source/nx_cugraph/installation.md @@ -0,0 +1,50 @@ +# Installing nx-cugraph + +This guide describes how to install ``nx-cugraph`` and use it in your workflows. + + +## System Requirements + +`nx-cugraph` requires the following: + + - **Volta architecture or later NVIDIA GPU, with [compute capability](https://developer.nvidia.com/cuda-gpus) 7.0+** + - **[CUDA](https://docs.nvidia.com/cuda/index.html) 11.2, 11.4, 11.5, 11.8, 12.0, 12.2, or 12.5** + - **Python >= 3.10** + - **[NetworkX](https://networkx.org/documentation/stable/install.html#) >= 3.0 (version 3.4 or higher recommended)** + +More details about system requirements can be found in the [RAPIDS System Requirements Documentation](https://docs.rapids.ai/install#system-req). + +## Installing Packages + +Read the [RAPIDS Quick Start Guide](https://docs.rapids.ai/install) to learn more about installing all RAPIDS libraries. 
+ +`nx-cugraph` can be installed using conda or pip. It is included in the RAPIDS metapackage, or can be installed separately. + +### Conda +**Nightly version** +```bash +conda install -c rapidsai-nightly -c conda-forge -c nvidia nx-cugraph +``` + +**Stable version** +```bash +conda install -c rapidsai -c conda-forge -c nvidia nx-cugraph +``` + +### pip +**Nightly version** +```bash +pip install nx-cugraph-cu11 --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple +``` + +**Stable version** +```bash +pip install nx-cugraph-cu11 --extra-index-url https://pypi.nvidia.com +``` + +
+ +**Note:** + - The `pip install` examples above are for CUDA 11. To install for CUDA 12, replace `-cu11` with `-cu12` + +
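After installing, a quick sanity check can confirm that the packages are importable before running any workflow. The sketch below uses only the Python standard library. The helper name `check_packages` is hypothetical, and `nx_cugraph` is assumed to be the import name, based on the `import nx_cugraph as nxcg` usage elsewhere in these docs.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report whether NetworkX and the nx-cugraph backend are visible to this interpreter.
status = check_packages(["networkx", "nx_cugraph"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

This only checks that the modules can be located. It does not verify that a compatible GPU or CUDA version is present.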
diff --git a/docs/cugraph/source/nx_cugraph/nx_cugraph.md b/docs/cugraph/source/nx_cugraph/nx_cugraph.md index 75a30b0be5c..900362a6e2b 100644 --- a/docs/cugraph/source/nx_cugraph/nx_cugraph.md +++ b/docs/cugraph/source/nx_cugraph/nx_cugraph.md @@ -1,18 +1,10 @@ ### nx_cugraph -nx-cugraph is a [NetworkX -backend]() that provides GPU acceleration to many popular NetworkX algorithms. - -By simply [installing and enabling nx-cugraph](), users can see significant speedup on workflows where performance is hindered by the default NetworkX implementation. With nx-cugraph, users can have GPU-based, large-scale performance without changing their familiar and easy-to-use NetworkX code. - -Let's look at some examples of algorithm speedups comparing NetworkX with and without GPU acceleration using nx-cugraph. - -Each chart has three measurements. -* NX - default NetworkX, no GPU acceleration -* nx-cugraph - GPU-accelerated NetworkX using nx-cugraph. This involves an internal conversion/transfer of graph data from CPU to GPU memory -* nx-cugraph (preconvert) - GPU-accelerated NetworkX using nx-cugraph with the graph data pre-converted/transferred to GPU +`nx-cugraph` is a [NetworkX backend]() that accelerates many popular NetworkX functions using cuGraph and NVIDIA GPUs. +Users simply [install and enable nx-cugraph](installation.md) to experience GPU speedups. +Let's look at some examples of algorithm speedups comparing CPU-based NetworkX to dispatched versions run on GPU with nx-cugraph. ![Ancestors](../images/ancestors.png) ![BFS Tree](../images/bfs_tree.png) @@ -22,46 +14,3 @@ Each chart has three measurements. ![Pagerank](../images/pagerank.png) ![Single Source Shortest Path](../images/sssp.png) ![Weakly Connected Components](../images/wcc.png) - -### Command line example -Open bc_demo.ipy and paste the code below.
- -``` -import pandas as pd -import networkx as nx - -url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv" -df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32") -G = nx.from_pandas_edgelist(df, source="src", target="dst") - -%time result = nx.betweenness_centrality(G, k=10) -``` -Run the command: -``` -user@machine:/# ipython bc_demo.ipy -``` - -You will observe a run time of approximately 7 minutes...more or less depending on your cpu. - -Run the command again, this time specifying cugraph as the NetworkX backend. -``` -user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy -``` -This run will be much faster, typically around 20 seconds depending on your GPU. -``` -user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy -``` -There is also an option to cache the graph conversion to GPU. This can dramatically improve performance when running multiple algorithms on the same graph. -``` -NETWORKX_BACKEND_PRIORITY=cugraph NETWORKX_CACHE_CONVERTED_GRAPHS=True ipython bc_demo.ipy -``` - -When running Python interactively, the cugraph backend can be specified as an argument in the algorithm call. - -For example: -``` -nx.betweenness_centrality(cit_patents_graph, k=k, backend="cugraph") -``` - - -The latest list of algorithms supported by nx-cugraph can be found [here](https://github.com/rapidsai/cugraph/blob/main/python/nx-cugraph/README.md#algorithms). diff --git a/docs/cugraph/source/nx_cugraph/supported-algorithms.rst b/docs/cugraph/source/nx_cugraph/supported-algorithms.rst new file mode 100644 index 00000000000..8f57c02b240 --- /dev/null +++ b/docs/cugraph/source/nx_cugraph/supported-algorithms.rst @@ -0,0 +1,355 @@ +Supported Algorithms +===================== + +The nx-cugraph backend to NetworkX connects +`pylibcugraph `_ (cuGraph's low-level Python +interface to its CUDA-based graph analytics library) and +`CuPy `_ (a GPU-accelerated array library) to NetworkX's +familiar and easy-to-use API. 
+ +Below is the list of algorithms that are currently supported in nx-cugraph. + + +Algorithms +---------- + ++-----------------------------+ +| **Centrality** | ++=============================+ +| betweenness_centrality | ++-----------------------------+ +| edge_betweenness_centrality | ++-----------------------------+ +| degree_centrality | ++-----------------------------+ +| in_degree_centrality | ++-----------------------------+ +| out_degree_centrality | ++-----------------------------+ +| eigenvector_centrality | ++-----------------------------+ +| katz_centrality | ++-----------------------------+ + ++---------------------+ +| **Cluster** | ++=====================+ +| average_clustering | ++---------------------+ +| clustering | ++---------------------+ +| transitivity | ++---------------------+ +| triangles | ++---------------------+ + ++--------------------------+ +| **Community** | ++==========================+ +| louvain_communities | ++--------------------------+ + ++--------------------------+ +| **Bipartite** | ++==========================+ +| complete_bipartite_graph | ++--------------------------+ + ++------------------------------------+ +| **Components** | ++====================================+ +| connected_components | ++------------------------------------+ +| is_connected | ++------------------------------------+ +| node_connected_component | ++------------------------------------+ +| number_connected_components | ++------------------------------------+ +| weakly_connected | ++------------------------------------+ +| is_weakly_connected | ++------------------------------------+ +| number_weakly_connected_components | ++------------------------------------+ +| weakly_connected_components | ++------------------------------------+ + ++-------------+ +| **Core** | ++=============+ +| core_number | ++-------------+ +| k_truss | ++-------------+ + ++-------------+ +| **DAG** | ++=============+ +| ancestors | ++-------------+ +| descendants | 
++-------------+ + ++--------------------+ +| **Isolate** | ++====================+ +| is_isolate | ++--------------------+ +| isolates | ++--------------------+ +| number_of_isolates | ++--------------------+ + ++-------------------+ +| **Link analysis** | ++===================+ +| hits | ++-------------------+ +| pagerank | ++-------------------+ + ++----------------+ +| **Operators** | ++================+ +| complement | ++----------------+ +| reverse | ++----------------+ + ++----------------------+ +| **Reciprocity** | ++======================+ +| overall_reciprocity | ++----------------------+ +| reciprocity | ++----------------------+ + ++---------------------------------------+ +| **Shortest Paths** | ++=======================================+ +| has_path | ++---------------------------------------+ +| shortest_path | ++---------------------------------------+ +| shortest_path_length | ++---------------------------------------+ +| all_pairs_shortest_path | ++---------------------------------------+ +| all_pairs_shortest_path_length | ++---------------------------------------+ +| bidirectional_shortest_path | ++---------------------------------------+ +| single_source_shortest_path | ++---------------------------------------+ +| single_source_shortest_path_length | ++---------------------------------------+ +| single_target_shortest_path | ++---------------------------------------+ +| single_target_shortest_path_length | ++---------------------------------------+ +| all_pairs_bellman_ford_path | ++---------------------------------------+ +| all_pairs_bellman_ford_path_length | ++---------------------------------------+ +| all_pairs_dijkstra | ++---------------------------------------+ +| all_pairs_dijkstra_path | ++---------------------------------------+ +| all_pairs_dijkstra_path_length | ++---------------------------------------+ +| bellman_ford_path | ++---------------------------------------+ +| bellman_ford_path_length | 
++---------------------------------------+ +| dijkstra_path | ++---------------------------------------+ +| dijkstra_path_length | ++---------------------------------------+ +| single_source_bellman_ford | ++---------------------------------------+ +| single_source_bellman_ford_path | ++---------------------------------------+ +| single_source_bellman_ford_path_length| ++---------------------------------------+ +| single_source_dijkstra | ++---------------------------------------+ +| single_source_dijkstra_path | ++---------------------------------------+ +| single_source_dijkstra_path_length | ++---------------------------------------+ + ++---------------------------+ +| **Traversal** | ++===========================+ +| bfs_edges | ++---------------------------+ +| bfs_layers | ++---------------------------+ +| bfs_predecessors | ++---------------------------+ +| bfs_successors | ++---------------------------+ +| bfs_tree | ++---------------------------+ +| descendants_at_distance | ++---------------------------+ +| generic_bfs_edges | ++---------------------------+ + ++---------------------+ +| **Tree** | ++=====================+ +| is_arborescence | ++---------------------+ +| is_branching | ++---------------------+ +| is_forest | ++---------------------+ +| is_tree | ++---------------------+ + + +Utilities +------- + ++-------------------------+ +| **Classes** | ++=========================+ +| is_negatively_weighted | ++-------------------------+ + ++----------------------+ +| **Convert** | ++======================+ +| from_dict_of_lists | ++----------------------+ +| to_dict_of_lists | ++----------------------+ + ++--------------------------+ +| **Convert Matrix** | ++==========================+ +| from_pandas_edgelist | ++--------------------------+ +| from_scipy_sparse_array | ++--------------------------+ + ++-----------------------------------+ +| **Relabel** | ++===================================+ +| convert_node_labels_to_integers | 
++-----------------------------------+ +| relabel_nodes | ++-----------------------------------+ + +Generators +------------ + ++-------------------------------+ +| **Classic** | ++===============================+ +| barbell_graph | ++-------------------------------+ +| circular_ladder_graph | ++-------------------------------+ +| complete_graph | ++-------------------------------+ +| complete_multipartite_graph | ++-------------------------------+ +| cycle_graph | ++-------------------------------+ +| empty_graph | ++-------------------------------+ +| ladder_graph | ++-------------------------------+ +| lollipop_graph | ++-------------------------------+ +| null_graph | ++-------------------------------+ +| path_graph | ++-------------------------------+ +| star_graph | ++-------------------------------+ +| tadpole_graph | ++-------------------------------+ +| trivial_graph | ++-------------------------------+ +| turan_graph | ++-------------------------------+ +| wheel_graph | ++-------------------------------+ + ++-----------------+ +| **Classic** | ++=================+ +| caveman_graph | ++-----------------+ + ++------------+ +| **Ego** | ++============+ +| ego_graph | ++------------+ + ++------------------------------+ +| **small** | ++==============================+ +| bull_graph | ++------------------------------+ +| chvatal_graph | ++------------------------------+ +| cubical_graph | ++------------------------------+ +| desargues_graph | ++------------------------------+ +| diamond_graph | ++------------------------------+ +| dodecahedral_graph | ++------------------------------+ +| frucht_graph | ++------------------------------+ +| heawood_graph | ++------------------------------+ +| house_graph | ++------------------------------+ +| house_x_graph | ++------------------------------+ +| icosahedral_graph | ++------------------------------+ +| krackhardt_kite_graph | ++------------------------------+ +| moebius_kantor_graph | 
++------------------------------+ +| octahedral_graph | ++------------------------------+ +| pappus_graph | ++------------------------------+ +| petersen_graph | ++------------------------------+ +| sedgewick_maze_graph | ++------------------------------+ +| tetrahedral_graph | ++------------------------------+ +| truncated_cube_graph | ++------------------------------+ +| truncated_tetrahedron_graph | ++------------------------------+ +| tutte_graph | ++------------------------------+ + ++-------------------------------+ +| **Social** | ++===============================+ +| davis_southern_women_graph | ++-------------------------------+ +| florentine_families_graph | ++-------------------------------+ +| karate_club_graph | ++-------------------------------+ +| les_miserables_graph | ++-------------------------------+ + + +To request nx-cugraph backend support for a NetworkX API that is not listed +above, visit the `cuGraph GitHub repo `_. diff --git a/docs/cugraph/source/wholegraph/installation/container.md b/docs/cugraph/source/wholegraph/installation/container.md index 3a2c627c56a..6aac53cf88f 100644 --- a/docs/cugraph/source/wholegraph/installation/container.md +++ b/docs/cugraph/source/wholegraph/installation/container.md @@ -24,6 +24,7 @@ RUN pip3 install Cython setuputils3 scikit-build nanobind pytest-forked pytest To run GNN applications, you may also need cuGraphOps, DGL and/or PyG libraries to run the GNN layers. 
You may refer to [DGL](https://www.dgl.ai/pages/start.html) or [PyG](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html) For example, to install DGL, you may need to add: + ```dockerfile -RUN pip3 install dgl -f https://data.dgl.ai/wheels/cu118/repo.html +RUN pip3 install dgl -f https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html ``` diff --git a/notebooks/demo/accelerating_networkx.ipynb b/notebooks/demo/accelerating_networkx.ipynb new file mode 100644 index 00000000000..1a6c6cfb3f6 --- /dev/null +++ b/notebooks/demo/accelerating_networkx.ipynb @@ -0,0 +1,614 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "R2cpVp2WdOsp" + }, + "source": [ + "# NetworkX - Easy Graph Analytics\n", + "\n", + "NetworkX is the most popular library for graph analytics available in Python, or quite possibly any language. To illustrate this, NetworkX was downloaded more than 71 million times in September of 2024 alone, which is roughly 71 times more than the next most popular graph analytics library! [*](https://en.wikipedia.org/wiki/NetworkX) NetworkX has earned this popularity from its very easy-to-use API, the wealth of documentation and examples available, the large (and friendly) community behind it, and its easy installation which requires nothing more than Python.\n", + "\n", + "However, NetworkX users are familiar with the tradeoff that comes with those benefits. The pure-Python implementation often results in poor performance when graph data starts to reach larger scales, limiting the usefulness of the library for many real-world problems.\n", + "\n", + "# Accelerated NetworkX - Easy (and fast!) Graph Analytics\n", + "\n", + "To address the performance problem, NetworkX 3.0 introduced a mechanism to dispatch algorithm calls to alternate implementations. The NetworkX Python API remains the same but NetworkX will use more capable algorithm implementations provided by one or more backends. 
This approach means users don't have to give up NetworkX -or even change their code- in order to take advantage of GPU performance." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xkg10FrNThrK" + }, + "source": [ + "# Let's Get the Environment Setup\n", + "This notebook will demonstrate NetworkX both with and without GPU acceleration provided by the `nx-cugraph` backend.\n", + "\n", + "`nx-cugraph` is available as a package installable using `pip`, `conda`, and [from source](https://github.com/rapidsai/nx-cugraph). Before importing `networkx`, let's install `nx-cugraph` so it can be registered as an available backend by NetworkX when needed. We'll use `pip` to install.\n", + "\n", + "NOTES:\n", + "* `nx-cugraph` requires a compatible NVIDIA GPU, NVIDIA CUDA and associated drivers, and a supported OS. Details about these and other installation prerequisites can be seen [here](https://docs.rapids.ai/install#system-req).\n", + "* The `nx-cugraph` package is currently hosted by NVIDIA and therefore the `--extra-index-url` option must be used.\n", + "* `nx-cugraph` is supported on specific 11.x and 12.x CUDA versions, and the major version number must be known in order to install the correct build (this is determined automatically when using `conda`).\n", + "\n", + "To find the CUDA major version on your system, run the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NMFwzc1I95BS" + }, + "outputs": [], + "source": [ + "!nvcc --version" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i91Yj-yZ-nGS" + }, + "source": [ + "From the above output we can see we're using CUDA 12.x so we'll be installing `nx-cugraph-cu12`. If we were using CUDA 11.x, the package name would be `nx-cugraph-cu11`. 
We'll also be adding `https://pypi.nvidia.com` as an `--extra-index-url`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mYYN9EpnWphu" + }, + "outputs": [], + "source": [ + "!pip install nx-cugraph-cu12 --extra-index-url=https://pypi.nvidia.com" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0h1K-7tI_AZH" + }, + "source": [ + "Of course, we'll also be using `networkx`, which is already provided in the Colab environment. This notebook will be using features added in version 3.3, so we'll import it here to verify we have a compatible version." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YTV0ZTME2tV6" + }, + "outputs": [], + "source": [ + "import networkx as nx\n", + "nx.__version__" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UiZKOa3WC7be" + }, + "source": [ + "# Let's Start with Something Simple\n", + "\n", + "To begin, we'll compare NetworkX results without a backend to results of the same algorithm using the `nx-cugraph` backend on a small graph. `nx.karate_club_graph()` returns an instance of the famous example graph consisting of 34 nodes and 78 edges from Zachary's paper, described [here](https://en.wikipedia.org/wiki/Zachary%27s_karate_club)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3atL3tI0frYm" + }, + "source": [ + "## Betweenness Centrality\n", + "[Betweenness Centrality](https://en.wikipedia.org/wiki/Betweenness_centrality) is a graph algorithm that computes a centrality score for each node (`v`) based on how many of the shortest paths between pairs of nodes in the graph pass through `v`. A higher centrality score represents a node that \"connects\" other nodes in a network more than that of a node with a lower score.\n", + "\n", + "First, let's create a NetworkX Graph instance of the Karate Club graph and inspect it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JSw7EZ46-kRu" + }, + "outputs": [], + "source": [ + "G = nx.karate_club_graph()\n", + "G.number_of_nodes(), G.number_of_edges()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_-E17u2gKgbC" + }, + "source": [ + "Next, let's run betweenness centrality and save the results. Because the Karate Club graph is so small, this should not take long." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qjxXXKJhKQ4s" + }, + "outputs": [], + "source": [ + "%%time\n", + "nx_bc_results = nx.betweenness_centrality(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ClrR3z9XMfLr" + }, + "source": [ + "Now, let's run the same algorithm on the same data using the `nx-cugraph` backend.\n", + "\n", + "There are several ways to instruct NetworkX to use a particular backend instead of the default implementation. Here, we will use the `config` API, which was added in NetworkX version 3.3.\n", + "\n", + "The following two lines set the backend to \"cugraph\" and enable graph conversion caching.\n", + "\n", + "Some notes:\n", + "* The standard convention for NetworkX backends is to name the package with a `nx-` prefix to denote that these are packages intended to be used with NetworkX, but the `nx-` prefix is not included when referring to them in NetworkX API calls. Here, `nx-cugraph` is the name of the backend package, and `\"cugraph\"` is the name NetworkX will use to refer to it.\n", + "* NetworkX can use multiple backends! `nx.config.backend_priority` is a list that can contain several backends, ordered based on priority. If a backend in the list cannot run a particular algorithm (either because it isn't supported in the backend, the algorithm doesn't support a particular option, or some other reason), NetworkX will try the next backend in the list. 
If no specified backend is able to run the algorithm, NetworkX will fall back to the default implementation.\n", + "* Many backends have their own data structures for representing an input graph, often optimized for that backend's implementation. Prior to running a backend algorithm, NetworkX will have the backend convert the standard NetworkX Graph instance to the backend-specific type. This conversion can be expensive, and rather than repeat it as part of each algorithm call, NetworkX can cache the conversion so it can be skipped on future calls if the graph doesn't change. This caching can save significant time and improve overall performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oFHwNqqsNsqS" + }, + "outputs": [], + "source": [ + "nx.config.backend_priority=[\"cugraph\"] # NETWORKX_BACKEND_PRIORITY=cugraph\n", + "nx.config.cache_converted_graphs=True # NETWORKX_CACHE_CONVERTED_GRAPHS=True" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HrUeWRRQRzFP" + }, + "outputs": [], + "source": [ + "%%time\n", + "nxcg_bc_results = nx.betweenness_centrality(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z1hxut3GTj5A" + }, + "source": [ + "You may have noticed that using the `nx-cugraph` backend resulted in a slightly slower execution time. This is not surprising when working with a graph this small, since the overhead of converting the graph for the first time and launching the algorithm kernel on the GPU is actually significantly more than the computation time itself. We'll see later that this overhead is negligible when compared to the time saved when running on a GPU for larger graphs.\n", + "\n", + "Since we've enabled graph conversion caching, we can see that if we re-run the same call the execution time is noticeably shorter." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7a0XvpUOr9Ju" + }, + "outputs": [], + "source": [ + "%%time\n", + "nxcg_bc_results = nx.betweenness_centrality(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppjE5J5RscOe" + }, + "source": [ + "Notice the warning above about using the cache. This will only be raised **once** per graph instance (it can also be easily disabled), but its purpose is to point out that the cache should not be used if the Graph object will have its attribute dictionary modified directly. In this case and many others, we won't be modifying the dictionaries directly. Instead, we will use APIs such as `nx.set_node_attributes` which properly clear the cache, so it's safe for us to use the cache. Because of that, we'll disable the warning so we don't see it on other graphs in this session." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Namb5JLvwS-q" + }, + "outputs": [], + "source": [ + "import warnings\n", + "warnings.filterwarnings(\"ignore\", message=\"Using cached graph for 'cugraph' backend\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BzGAphcILFsT" + }, + "source": [ + "Smaller graphs are also easy to visualize with NetworkX's plotting utilities. The flexibility of NetworkX's `Graph` instances makes it trivial to add the betweenness centrality scores back to the graph object as node attributes. This will allow us to use those values for the visualization.\n", + "\n", + "In this case, we'll create new attributes for each node called \"nx_bc\" for the default NetworkX results, and \"nxcg_bc\" for the nx-cugraph results. We'll use those values to assign the color for each node and plot two graphs side-by-side. This will make it easy to visually validate that the nodes with the higher centrality scores for both implementations match and do indeed appear to be more \"central\" to other nodes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1coV6ZfcUoqI" + }, + "outputs": [], + "source": [ + "nx.set_node_attributes(G, nx_bc_results, \"nx_bc\")\n", + "nx.set_node_attributes(G, nxcg_bc_results, \"nxcg_bc\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Sba2iYJgLoN2" + }, + "outputs": [], + "source": [ + "# Configure plot size and layout/position for each node\n", + "import matplotlib.pyplot as plt\n", + "plt.rcParams['figure.figsize'] = [12, 8]\n", + "pos = nx.spring_layout(G)\n", + "\n", + "# Assign colors for each set of betweenness centrality results\n", + "nx_colors = [G.nodes[n][\"nx_bc\"] for n in G.nodes()]\n", + "nxcg_colors = [G.nodes[n][\"nxcg_bc\"] for n in G.nodes()]\n", + "\n", + "# Plot the graph and color each node corresponding to NetworkX betweenness centrality values\n", + "plt.subplot(1, 2, 1)\n", + "nx.draw(G, pos=pos, with_labels=True, node_color=nx_colors)\n", + "\n", + "# Plot the graph and color each node corresponding to nx-cugraph betweenness centrality values\n", + "plt.subplot(1, 2, 2)\n", + "nx.draw(G, pos=pos, with_labels=True, node_color=nxcg_colors)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dJXH4Zn5VNSg" + }, + "source": [ + "As we can see, the same two nodes (`0` and `33`) are the two most central in both graphs, followed by `2`, `31`, and `32`.\n", + "\n", + "## PageRank\n", + "Another popular algorithm is [PageRank](https://en.wikipedia.org/wiki/PageRank). PageRank also assigns scores to each node, but these scores are based on analyzing links to each node to determine relative \"importance\" within the graph.\n", + "\n", + "Let's update the config to use the default NetworkX implementation and run `nx.pagerank`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9CdYNk62E1v_" + }, + "outputs": [], + "source": [ + "nx.config.backend_priority=[]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Jo39YxVmYolq" + }, + "outputs": [], + "source": [ + "%%time\n", + "nx_pr_results = nx.pagerank(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sV6dM8ToZDiC" + }, + "source": [ + "We could set `nx.config.backend_priority` again to list `\"cugraph\"` as the backend, but let's instead show how the `backend` kwarg can be used to override the priority list and force a specific backend to be used." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oMSvQVGKY0rn" + }, + "outputs": [], + "source": [ + "%%time\n", + "nxcg_pr_results = nx.pagerank(G, backend=\"cugraph\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZGux_8xFZneI" + }, + "source": [ + "In this example, instead of plotting the graph to show that the results are identical, we can compare them directly using the saved values from both runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RcmtdFy4Zw7p" + }, + "outputs": [], + "source": [ + "sorted(nx_pr_results) == sorted(nxcg_pr_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mefjUEAnZ4pq" + }, + "source": [ + "# Working with Bigger Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yLY-yl6PuNYo" + }, + "source": [ + "Now we'll look at a larger dataset from https://snap.stanford.edu/data/cit-Patents.html which contains citations across different U.S. patents granted from January 1, 1963 to December 30, 1999. The dataset represents 16.5M citations (edges) between 3.77M patents (nodes).\n", + "\n", + "This will demonstrate that data of this size starts to push the limits of the default pure-Python NetworkX implementation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lyYF0LbtFwjh" + }, + "outputs": [], + "source": [ + "# The locale encoding may have been modified from the plots above, reset here to run shell commands\n", + "import locale\n", + "locale.getpreferredencoding = lambda: \"UTF-8\"\n", + "!wget https://data.rapids.ai/cugraph/datasets/cit-Patents.csv # Skip if cit-Patents.csv already exists.\n", + "# !wget https://snap.stanford.edu/data/cit-Patents.txt.gz # Skip if cit-Patents.txt.gz already exists." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kjGINYphQSQ2" + }, + "outputs": [], + "source": [ + "%load_ext cudf.pandas\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iV4DieGZOalc" + }, + "outputs": [], + "source": [ + "%%time\n", + "df = pd.read_csv(\"cit-Patents.csv\",\n", + " sep=\" \",\n", + " names=[\"src\", \"dst\"],\n", + " dtype=\"int32\",\n", + ")\n", + "# df = pd.read_csv(\"cit-Patents.txt.gz\",\n", + "# compression=\"gzip\",\n", + "# skiprows=4,\n", + "# sep=\"\\t\",\n", + "# names=[\"src\", \"dst\"],\n", + "# dtype=\"int32\",\n", + "# )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PREA67u4eKat" + }, + "outputs": [], + "source": [ + "%%time\n", + "G = nx.from_pandas_edgelist(df, source=\"src\", target=\"dst\")\n", + "G.number_of_nodes(), G.number_of_edges()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NcsUxBqpu4zY" + }, + "source": [ + "By default, `nx.betweenness_centrality` will perform an all-pairs shortest path analysis when determining the centrality scores for each node. However, due to the much larger size of this graph, determining the shortest path for all pairs of nodes in the graph is not feasible. 
Instead, we'll use the parameter `k` to limit the number of shortest path computations used for determining the centrality scores, at the expense of accuracy. As we'll see when using a dataset this size with `nx.betweenness_centrality`, we have to limit `k` to `1` which is not practical but is sufficient here for demonstration purposes (since anything larger than `1` will result in many minutes of execution time)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gNDWbj3kAk3j" + }, + "outputs": [], + "source": [ + "%%time\n", + "bc_results = nx.betweenness_centrality(G, k=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NB8xmxMd1PlX" + }, + "source": [ + "Now we'll configure NetworkX to use the `nx-cugraph` backend (again, using the name convention that drops the package name's `nx-` prefix) and run the same call. Because this is a Graph that `nx-cugraph` hasn't seen before, the runtime will include the time to convert and cache a GPU-based graph." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xUYNG1xhvbWc" + }, + "outputs": [], + "source": [ + "nx.config.backend_priority = [\"cugraph\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cmK8ZuQGvfPo" + }, + "outputs": [], + "source": [ + "%%time\n", + "bc_results = nx.betweenness_centrality(G, k=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vdHb1YXP15TZ" + }, + "source": [ + "Let's run betweenness centrality again, now with a more useful number of samples by setting `k=100`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fKjIrzL-vrGS" + }, + "outputs": [], + "source": [ + "%%time\n", + "bc_results = nx.betweenness_centrality(G, k=100)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QeMcrAX2HZSM" + }, + "source": [ + "Let's also run pagerank on the same dataset to compare." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gR8ID6ekHgHt" + }, + "outputs": [], + "source": [ + "nx.config.backend_priority = []" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rTFuvX5wb_c1" + }, + "outputs": [], + "source": [ + "%%time\n", + "nx_pr_results = nx.pagerank(G)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8sJx9aeJV9hv" + }, + "outputs": [], + "source": [ + "%%time\n", + "nxcg_pr_results = nx.pagerank(G, backend=\"cugraph\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wGOVQ6ZyY4Ih" + }, + "outputs": [], + "source": [ + "sorted(nx_pr_results) == sorted(nxcg_pr_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k2DfAaZaDIBj" + }, + "source": [ + "---\n", + "\n", + "Information on the U.S. Patent Citation Network dataset used in this notebook is as follows:\n", + "
Authors: Jure Leskovec and Andrej Krevl\n", + "
Title: SNAP Datasets, Stanford Large Network Dataset Collection\n", + "
URL: http://snap.stanford.edu/data\n", + "
Date: June 2014\n", + "
\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/python/cugraph-dgl/README.md b/python/cugraph-dgl/README.md index ac4cb2f6253..013d4fe5e2e 100644 --- a/python/cugraph-dgl/README.md +++ b/python/cugraph-dgl/README.md @@ -8,9 +8,12 @@ Install and update cugraph-dgl and the required dependencies using the command: -``` -conda install mamba -n base -c conda-forge -mamba install cugraph-dgl -c rapidsai-nightly -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam +```shell +# CUDA 11 +conda install -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam/label/th23_cu118 cugraph-dgl + +# CUDA 12 +conda install -c rapidsai -c pytorch -c conda-forge -c nvidia -c dglteam/label/th23_cu121 cugraph-dgl ``` ## Build from Source diff --git a/python/cugraph-dgl/conda/cugraph_dgl_dev_cuda-118.yaml b/python/cugraph-dgl/conda/cugraph_dgl_dev_cuda-118.yaml index 42cbcab5008..174012b8f8c 100644 --- a/python/cugraph-dgl/conda/cugraph_dgl_dev_cuda-118.yaml +++ b/python/cugraph-dgl/conda/cugraph_dgl_dev_cuda-118.yaml @@ -4,13 +4,12 @@ channels: - rapidsai - rapidsai-nightly - dask/label/dev -- pyg -- dglteam/label/cu118 +- dglteam/label/th23_cu118 - conda-forge - nvidia dependencies: - cugraph==24.12.*,>=0.0.0a0 -- dgl>=1.1.0.cu* +- dgl>=2.4.0.cu* - pandas - pre-commit - pylibcugraphops==24.12.*,>=0.0.0a0 diff --git a/python/cugraph-pyg/conda/cugraph_pyg_dev_cuda-118.yaml b/python/cugraph-pyg/conda/cugraph_pyg_dev_cuda-118.yaml index 39b1ab21edb..4778ff0eaf6 100644 --- a/python/cugraph-pyg/conda/cugraph_pyg_dev_cuda-118.yaml +++ 
b/python/cugraph-pyg/conda/cugraph_pyg_dev_cuda-118.yaml @@ -4,15 +4,13 @@ channels: - rapidsai - rapidsai-nightly - dask/label/dev -- pyg -- dglteam/label/cu118 +- dglteam/label/th23_cu118 - conda-forge - nvidia dependencies: - cugraph==24.12.*,>=0.0.0a0 - pandas - pre-commit -- pyg>=2.5,<2.6 - pylibcugraphops==24.12.*,>=0.0.0a0 - pytest - pytest-benchmark @@ -20,6 +18,7 @@ dependencies: - pytest-xdist - pytorch-cuda==11.8 - pytorch>=2.3,<2.4.0a0 +- pytorch_geometric>=2.5,<2.6 - scipy - tensordict>=0.1.2 name: cugraph_pyg_dev_cuda-118 diff --git a/python/cugraph/cugraph/structure/graph_classes.py b/python/cugraph/cugraph/structure/graph_classes.py index e90c0576f55..84234f7e904 100644 --- a/python/cugraph/cugraph/structure/graph_classes.py +++ b/python/cugraph/cugraph/structure/graph_classes.py @@ -116,6 +116,7 @@ def from_cudf_edgelist( renumber=True, store_transposed=False, legacy_renum_only=False, + symmetrize=None, ): """ Initialize a graph from the edge list. It is an error to call this @@ -174,6 +175,15 @@ def from_cudf_edgelist( This parameter is deprecated and will be removed. + symmetrize: bool, optional (default=None) + If True, symmetrize the edge list for an undirected graph. Setting + this flag to True for a directed graph returns an error. The default + behavior symmetrizes the edges if the graph is undirected. This flag + cannot be set to True if the edgelist contains edge IDs or edge Types. + If the incoming edgelist is intended for an undirected graph and it is + known to be symmetric, this flag can be set to False to skip the + symmetrization step for better performance. 
+ Examples -------- >>> df = cudf.read_csv(datasets_path / 'karate.csv', delimiter=' ', @@ -201,6 +211,7 @@ def from_cudf_edgelist( renumber=renumber, store_transposed=store_transposed, legacy_renum_only=legacy_renum_only, + symmetrize=symmetrize, ) def from_cudf_adjlist( @@ -210,6 +221,7 @@ def from_cudf_adjlist( value_col=None, renumber=True, store_transposed=False, + symmetrize=None, ): """ Initialize a graph from the adjacency list. It is an error to call this @@ -247,6 +259,14 @@ def from_cudf_adjlist( store_transposed : bool, optional (default=False) If True, stores the transpose of the adjacency matrix. Required for certain algorithms. + symmetrize: bool, optional (default=None) + If True, symmetrize the edge list for an undirected graph. Setting + this flag to True for a directed graph returns an error. The default + behavior symmetrizes the edges if the graph is undirected. This flag + cannot be set to True if the edgelist contains edge IDs or edge Types. + If the incoming edgelist is intended for an undirected graph and it is + known to be symmetric, this flag can be set to False to skip the + symmetrization step for better performance. 
Examples -------- @@ -268,7 +288,12 @@ def from_cudf_adjlist( raise RuntimeError("Graph is already initialized") elif self._Impl.edgelist is not None or self._Impl.adjlist is not None: raise RuntimeError("Graph already has values") - self._Impl._simpleGraphImpl__from_adjlist(offset_col, index_col, value_col) + self._Impl._simpleGraphImpl__from_adjlist( + offset_col=offset_col, + index_col=index_col, + value_col=value_col, + symmetrize=symmetrize, + ) def from_dask_cudf_edgelist( self, diff --git a/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py b/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py index 7f3f7e83e59..83dad234287 100644 --- a/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py +++ b/python/cugraph/cugraph/structure/graph_implementation/simpleDistributedGraph.py @@ -34,7 +34,6 @@ ) from cugraph.structure.number_map import NumberMap -from cugraph.structure.symmetrize import symmetrize from cugraph.dask.common.part_utils import ( persist_dask_df_equal_parts_per_worker, ) @@ -98,6 +97,7 @@ def _make_plc_graph( edge_id_type, edge_type_id, drop_multi_edges, + symmetrize, ): weights = None edge_ids = None @@ -151,6 +151,7 @@ def _make_plc_graph( else ([cudf.Series(dtype=edge_type_id)] if edge_type_id else None), num_arrays=num_arrays, store_transposed=store_transposed, + symmetrize=symmetrize, do_expensive_check=False, drop_multi_edges=drop_multi_edges, ) @@ -172,6 +173,7 @@ def __from_edgelist( renumber=True, store_transposed=False, legacy_renum_only=False, + symmetrize=None, ): if not isinstance(input_ddf, dask_cudf.DataFrame): raise TypeError("input should be a dask_cudf dataFrame") @@ -184,6 +186,35 @@ def __from_edgelist( ].dtype not in [np.int32, np.int64]: raise ValueError("set renumber to True for non integer columns ids") + if self.properties.directed and symmetrize: + raise ValueError( + "The edgelist can only be symmetrized for undirected graphs." 
+ ) + + if self.properties.directed: + if symmetrize: + raise ValueError( + "The edgelist can only be symmetrized for undirected graphs." + ) + else: + if symmetrize or symmetrize is None: + unsupported = False + if edge_id is not None or edge_type is not None: + unsupported = True + if isinstance(edge_attr, list): + if len(edge_attr) > 1: + unsupported = True + if unsupported: + raise ValueError( + "Edge list containing Edge Ids or Types can't be symmetrized. " + "If the edges are already symmetric, set the 'symmetrize' " + "flag to False" + ) + + if symmetrize is None: + # default behavior + symmetrize = not self.properties.directed + s_col = source d_col = destination if not isinstance(s_col, list): @@ -266,27 +297,11 @@ def __from_edgelist( ddf_columns += value_col_names input_ddf = input_ddf[ddf_columns] - if len(value_col_names) == 0: - source_col, dest_col = symmetrize( - input_ddf, - source, - destination, - multi=True, # Deprecated parameter - symmetrize=not self.properties.directed, - ) - value_col = None - else: - source_col, dest_col, value_col = symmetrize( - input_ddf, - source, - destination, - value_col_names, - multi=True, # Deprecated parameter - symmetrize=not self.properties.directed, - ) - # Create a dask_cudf dataframe from the cudf series # or dataframe objects obtained from symmetrization + source_col = input_ddf[source] + dest_col = input_ddf[destination] + value_col = input_ddf[value_col_names] if isinstance(source_col, dask_cudf.Series): frames = [ source_col.to_frame(name=source), @@ -370,6 +385,7 @@ def __from_edgelist( self.edge_id_type, self.edge_type_id_type, not self.properties.multi_edge, + not self.properties.directed, ) for w, edata in persisted_keys_d.items() } diff --git a/python/cugraph/cugraph/structure/graph_implementation/simpleGraph.py b/python/cugraph/cugraph/structure/graph_implementation/simpleGraph.py index bc5cca67c2e..858b114ebdc 100644 --- a/python/cugraph/cugraph/structure/graph_implementation/simpleGraph.py +++ 
b/python/cugraph/cugraph/structure/graph_implementation/simpleGraph.py @@ -13,7 +13,7 @@ from cugraph.structure import graph_primtypes_wrapper from cugraph.structure.replicate_edgelist import replicate_cudf_dataframe -from cugraph.structure.symmetrize import symmetrize +from cugraph.structure.symmetrize import symmetrize as symmetrize_df from cugraph.structure.number_map import NumberMap import cugraph.dask.common.mg_utils as mg_utils import cudf @@ -134,6 +134,7 @@ def __from_edgelist( renumber=True, legacy_renum_only=False, store_transposed=False, + symmetrize=None, ): if legacy_renum_only: warning_msg = ( @@ -143,6 +144,35 @@ def __from_edgelist( warning_msg, ) + if self.properties.directed and symmetrize: + raise ValueError( + "The edgelist can only be symmetrized for undirected graphs." + ) + + if self.properties.directed: + if symmetrize: + raise ValueError( + "The edgelist can only be symmetrized for undirected graphs." + ) + else: + if symmetrize or symmetrize is None: + unsupported = False + if edge_id is not None or edge_type is not None: + unsupported = True + if isinstance(edge_attr, list): + if len(edge_attr) > 1: + unsupported = True + if unsupported: + raise ValueError( + "Edge list containing Edge Ids or Types can't be symmetrized. " + "If the edges are already symmetric, set the 'symmetrize' " + "flag to False" + ) + + if symmetrize is None: + # default behavior + symmetrize = not self.properties.directed + # Verify column names present in input DataFrame s_col = source d_col = destination @@ -264,45 +294,27 @@ def __from_edgelist( ) raise ValueError("set renumber to True for non integer columns ids") - # The dataframe will be symmetrized iff the graph is undirected - # otherwise the inital dataframe will be returned. 
Duplicated edges - # will be dropped unless the graph is a MultiGraph(Not Implemented yet) - # TODO: Update Symmetrize to work on Graph and/or DataFrame + # The dataframe will be symmetrized iff the graph is undirected with the + # symmetrize flag set to None or True otherwise, the inital dataframe will + # be returned. If set to False, the API will assume that the edges are already + # symmetric. Duplicated edges will be dropped unless the graph is a + # MultiGraph(Not Implemented yet) + if edge_attr is not None: - source_col, dest_col, value_col = symmetrize( - elist, - source, - destination, - edge_attr, - multi=self.properties.multi_edge, # Deprecated parameter - symmetrize=not self.properties.directed, - ) + value_col = { + self.edgeWeightCol: elist[weight] if weight in edge_attr else None, + self.edgeIdCol: elist[edge_id] if edge_id in edge_attr else None, + self.edgeTypeCol: elist[edge_type] if edge_type in edge_attr else None, + } - if isinstance(value_col, cudf.DataFrame): - value_dict = {} - for i in value_col.columns: - value_dict[i] = value_col[i] - value_col = value_dict else: value_col = None - source_col, dest_col = symmetrize( - elist, - source, - destination, - multi=self.properties.multi_edge, # Deprecated parameter - symmetrize=not self.properties.directed, - ) - - if isinstance(value_col, dict): - value_col = { - self.edgeWeightCol: value_col[weight] if weight in value_col else None, - self.edgeIdCol: value_col[edge_id] if edge_id in value_col else None, - self.edgeTypeCol: value_col[edge_type] - if edge_type in value_col - else None, - } - self.edgelist = simpleGraphImpl.EdgeList(source_col, dest_col, value_col) + # FIXME: if the user calls self.edgelist.edgelist_df after creating a + # symmetric graph, return the symmetric edgelist? 
+ self.edgelist = simpleGraphImpl.EdgeList( + elist[source], elist[destination], value_col + ) if self.batch_enabled: self._replicate_edgelist() @@ -312,6 +324,7 @@ def __from_edgelist( store_transposed=store_transposed, renumber=renumber, drop_multi_edges=not self.properties.multi_edge, + symmetrize=symmetrize, ) def to_pandas_edgelist( @@ -549,13 +562,23 @@ def __from_adjlist( value_col=None, renumber=True, store_transposed=False, + symmetrize=None, ): self.adjlist = simpleGraphImpl.AdjList(offset_col, index_col, value_col) + + if self.properties.directed and symmetrize: + raise ValueError("The edges can only be symmetrized for undirected graphs.") + if value_col is not None: self.properties.weighted = True self._make_plc_graph( - value_col=value_col, store_transposed=store_transposed, renumber=renumber + value_col=value_col, + store_transposed=store_transposed, + renumber=renumber, + symmetrize=not self.properties.directed + if symmetrize is None + else symmetrize, ) if self.batch_enabled: @@ -1146,6 +1169,7 @@ def _make_plc_graph( store_transposed: bool = False, renumber: bool = True, drop_multi_edges: bool = False, + symmetrize: bool = False, ): """ Parameters @@ -1164,6 +1188,8 @@ def _make_plc_graph( int32 or int64 type. drop_multi_edges: bool (default=False) Whether to drop multi edges + symmetrize: bool (default=False) + Whether to symmetrize """ if value_col is None: @@ -1228,6 +1254,7 @@ def _make_plc_graph( do_expensive_check=True, input_array_format=input_array_format, drop_multi_edges=drop_multi_edges, + symmetrize=symmetrize, ) def to_directed(self, DiG, store_transposed=False): @@ -1253,12 +1280,18 @@ def to_directed(self, DiG, store_transposed=False): DiG._make_plc_graph(value_col, store_transposed) def to_undirected(self, G, store_transposed=False): + """ Return an undirected copy of the graph. Note: This will discard any edge ids or edge types but will preserve edge weights if present. 
""" + # FIXME: Update this function to not call the deprecated + # symmetrize function. + # 1) Import the C++ function that symmetrize a graph + # 2) decompress the edgelist to update 'simpleGraphImpl.EdgeList' + # Doesn't work for edgelists with edge_ids and edge_types. G.properties.renumbered = self.properties.renumbered G.renumber_map = self.renumber_map if self.properties.directed is False: @@ -1268,14 +1301,14 @@ def to_undirected(self, G, store_transposed=False): else: df = self.edgelist.edgelist_df if self.edgelist.weights: - source_col, dest_col, value_col = symmetrize( + source_col, dest_col, value_col = symmetrize_df( df, simpleGraphImpl.srcCol, simpleGraphImpl.dstCol, simpleGraphImpl.edgeWeightCol, ) else: - source_col, dest_col = symmetrize( + source_col, dest_col = symmetrize_df( df, simpleGraphImpl.srcCol, simpleGraphImpl.dstCol ) value_col = None @@ -1310,6 +1343,28 @@ def has_edge(self, u, v): v = tmp["id"][1] df = self.edgelist.edgelist_df + + if self.edgelist.weights: + # FIXME: Update this function to not call the deprecated + # symmetrize function. + source_col, dest_col, value_col = symmetrize_df( + df, + simpleGraphImpl.srcCol, + simpleGraphImpl.dstCol, + simpleGraphImpl.edgeWeightCol, + symmetrize=not self.properties.directed, + ) + else: + source_col, dest_col = symmetrize_df( + df, + simpleGraphImpl.srcCol, + simpleGraphImpl.dstCol, + symmetrize=not self.properties.directed, + ) + value_col = None + + self.edgelist = simpleGraphImpl.EdgeList(source_col, dest_col, value_col) + return ( (df[simpleGraphImpl.srcCol] == u) & (df[simpleGraphImpl.dstCol] == v) ).any() diff --git a/python/cugraph/cugraph/structure/graph_primtypes.pxd b/python/cugraph/cugraph/structure/graph_primtypes.pxd index eaf552195da..f547db5c463 100644 --- a/python/cugraph/cugraph/structure/graph_primtypes.pxd +++ b/python/cugraph/cugraph/structure/graph_primtypes.pxd @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. 
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -21,7 +21,7 @@ from libcpp.memory cimport unique_ptr from libcpp.utility cimport pair from libcpp.vector cimport vector from pylibraft.common.handle cimport * -from rmm._lib.device_buffer cimport device_buffer +from rmm.librmm.device_buffer cimport device_buffer cdef extern from "cugraph/legacy/graph.hpp" namespace "cugraph::legacy": diff --git a/python/cugraph/cugraph/structure/graph_primtypes.pyx b/python/cugraph/cugraph/structure/graph_primtypes.pyx index 10f3871e157..063790a33a4 100644 --- a/python/cugraph/cugraph/structure/graph_primtypes.pyx +++ b/python/cugraph/cugraph/structure/graph_primtypes.pyx @@ -1,4 +1,4 @@ -# Copyright (c) 2020-2023, NVIDIA CORPORATION. +# Copyright (c) 2020-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at @@ -20,7 +20,7 @@ import numpy as np from libc.stdint cimport uintptr_t from libcpp.utility cimport move -from rmm._lib.device_buffer cimport DeviceBuffer +from rmm.pylibrmm.device_buffer cimport DeviceBuffer from cudf.core.buffer import as_buffer import cudf diff --git a/python/cugraph/cugraph/structure/graph_utilities.pxd b/python/cugraph/cugraph/structure/graph_utilities.pxd index 39e2cdbbff5..5612990c452 100644 --- a/python/cugraph/cugraph/structure/graph_utilities.pxd +++ b/python/cugraph/cugraph/structure/graph_utilities.pxd @@ -21,7 +21,7 @@ from libcpp.memory cimport unique_ptr from libcpp.utility cimport pair from libcpp.vector cimport vector -from rmm._lib.device_buffer cimport device_buffer +from rmm.librmm.device_buffer cimport device_buffer from pylibraft.common.handle cimport handle_t diff --git a/python/cugraph/cugraph/structure/hypergraph.py b/python/cugraph/cugraph/structure/hypergraph.py index bdc98333da0..55e6bbcca3d 100644 --- a/python/cugraph/cugraph/structure/hypergraph.py +++ b/python/cugraph/cugraph/structure/hypergraph.py @@ -37,6 +37,7 @@ import cudf import numpy as np from cugraph.structure.graph_classes import Graph +from cugraph.structure.symmetrize import symmetrize def hypergraph( @@ -277,6 +278,32 @@ def hypergraph( renumber=True, ) + df = cudf.DataFrame() + + # Need to refactor this code as it uses the + # deprecated symmetrize call. 
+ if "weights" in graph.edgelist.edgelist_df: + source_col, dest_col, value_col = symmetrize( + graph.edgelist.edgelist_df, + "src", + "dst", + "weights", + symmetrize=not graph.is_directed(), + ) + + df["src"] = source_col + df["dst"] = dest_col + df["weights"] = value_col + else: + source_col, dest_col = symmetrize( + graph.edgelist.edgelist_df, "src", "dst", symmetrize=not graph.is_directed() + ) + + df["src"] = source_col + df["dst"] = dest_col + + graph.edgelist.edgelist_df = df + return { "nodes": nodes, "edges": edges, diff --git a/python/cugraph/cugraph/structure/property_graph.py b/python/cugraph/cugraph/structure/property_graph.py index 53c1bf778c7..5f55a15888a 100644 --- a/python/cugraph/cugraph/structure/property_graph.py +++ b/python/cugraph/cugraph/structure/property_graph.py @@ -15,6 +15,7 @@ import numpy as np import cugraph +from cugraph.structure.symmetrize import symmetrize from cugraph.utilities.utils import ( import_optional, MissingModule, @@ -2005,6 +2006,33 @@ def edge_props_to_graph( else: G.from_pandas_edgelist(edge_prop_df.reset_index(), **create_args) + # FIXME: Property_graph does not fully leverage the PLC API yet. + # It still relies on the edges being symmetrized by the deprecated + # symmetrize function. + + # Symmetrize the internal representation of the edgelists + + if edge_attr is not None: + source_col, dest_col, value_col = symmetrize( + G.edgelist.edgelist_df, + "src", + "dst", + "weights", + symmetrize=not G.is_directed(), + ) + else: + source_col, dest_col = symmetrize( + G.edgelist.edgelist_df, "src", "dst", symmetrize=not G.is_directed() + ) + + renumbered_edge_prop_df = cudf.DataFrame() + renumbered_edge_prop_df["src"] = source_col + renumbered_edge_prop_df["dst"] = dest_col + if edge_attr: + renumbered_edge_prop_df["weights"] = value_col + + G.edgelist.edgelist_df = renumbered_edge_prop_df + if add_edge_data: # Set the edge_data on the resulting Graph to a DataFrame # containing the edges and the edge ID for each. 
Edge IDs are diff --git a/python/cugraph/cugraph/structure/symmetrize.py b/python/cugraph/cugraph/structure/symmetrize.py index 3e46d81b6ff..b59661b1cd4 100644 --- a/python/cugraph/cugraph/structure/symmetrize.py +++ b/python/cugraph/cugraph/structure/symmetrize.py @@ -257,6 +257,11 @@ def symmetrize( >>> df['values'] = cudf.Series(M['2']) >>> src, dst, val = symmetrize(df, 'sources', 'destinations', 'values', multi=True) """ + warnings.warn( + "This method is deprecated and will no longer be supported. The symmetrization " + "of the edges are only supported by setting the 'symmetrize' flag to 'True'", + FutureWarning, + ) # FIXME: Redundant check that should be done at the graph creation if "edge_id" in input_df.columns and symmetrize: diff --git a/python/cugraph/cugraph/tests/sampling/test_random_walks_mg.py b/python/cugraph/cugraph/tests/sampling/test_random_walks_mg.py index 2db3c6f5907..34eeb2902f8 100644 --- a/python/cugraph/cugraph/tests/sampling/test_random_walks_mg.py +++ b/python/cugraph/cugraph/tests/sampling/test_random_walks_mg.py @@ -19,8 +19,10 @@ import cugraph import dask_cudf import cugraph.dask as dcg +import cudf from cugraph.testing import SMALL_DATASETS from cugraph.datasets import karate_asymmetric +from cugraph.structure.symmetrize import symmetrize from pylibcugraph.testing.utils import gen_fixture_params_product @@ -205,4 +207,15 @@ def input_graph(request): def test_dask_mg_random_walks(dask_client, input_graph): path_data, seeds, max_depth = calc_random_walks(input_graph) df_G = input_graph.input_df.compute().reset_index(drop=True) - check_random_walks(input_graph, path_data, seeds, max_depth, df_G) + + # FIXME: leverages the deprecated symmetrize call + source_col, dest_col, value_col = symmetrize( + df_G, "src", "dst", "value", symmetrize=not input_graph.is_directed() + ) + + df = cudf.DataFrame() + df["src"] = source_col + df["dst"] = dest_col + df["value"] = value_col + + check_random_walks(input_graph, path_data, seeds, max_depth, 
df) diff --git a/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample.py b/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample.py index 304ead6fea9..ad0dbe77f7d 100644 --- a/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample.py +++ b/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample.py @@ -21,6 +21,7 @@ from cugraph import uniform_neighbor_sample from cugraph.testing import UNDIRECTED_DATASETS from cugraph.datasets import email_Eu_core, small_tree +from cugraph.structure.symmetrize import symmetrize from pylibcugraph.testing.utils import gen_fixture_params_product @@ -148,6 +149,15 @@ def test_uniform_neighbor_sample_simple(input_combo): # should be 'None' if the datasets was never renumbered input_df = G.edgelist.edgelist_df + # FIXME: Uses the deprecated implementation of symmetrize. + source_col, dest_col = symmetrize( + input_df, "src", "dst", symmetrize=not G.is_directed() + ) + + input_df = cudf.DataFrame() + input_df["src"] = source_col + input_df["dst"] = dest_col + result_nbr = uniform_neighbor_sample( G, input_combo["start_list"], @@ -235,6 +245,19 @@ def test_uniform_neighbor_sample_tree(directed): G = cugraph.Graph(directed=directed) G.from_cudf_edgelist(df, "src", "dst", "value") + # FIXME: Uses the deprecated implementation of symmetrize. + source_col, dest_col, value_col = symmetrize( + G.edgelist.edgelist_df, "src", "dst", "weights", symmetrize=not G.is_directed() + ) + + # Retrieve the input dataframe. + # input_df != df if 'directed = False' because df will be symmetrized + # internally. + input_df = cudf.DataFrame() + input_df["src"] = source_col + input_df["dst"] = dest_col + input_df["value"] = value_col + # # Make sure the old C++ renumbering was skipped because: # 1) Pylibcugraph already does renumbering @@ -245,11 +268,6 @@ def test_uniform_neighbor_sample_tree(directed): assert G.renumbered is False - # Retrieve the input dataframe. 
- # input_df != df if 'directed = False' because df will be symmetrized - # internally. - input_df = G.edgelist.edgelist_df - # TODO: Incomplete, include more testing for tree graph as well as # for larger graphs start_list = cudf.Series([0, 0], dtype="int32") diff --git a/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample_mg.py b/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample_mg.py index c65535f98a2..4a85b49a66e 100644 --- a/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample_mg.py +++ b/python/cugraph/cugraph/tests/sampling/test_uniform_neighbor_sample_mg.py @@ -27,6 +27,7 @@ from cugraph.dask import uniform_neighbor_sample from cugraph.dask.common.mg_utils import is_single_gpu from cugraph.structure.symmetrize import _memory_efficient_drop_duplicates +from cugraph.structure.symmetrize import symmetrize_ddf from cugraph.datasets import email_Eu_core, small_tree from pylibcugraph.testing.utils import gen_fixture_params_product @@ -144,6 +145,10 @@ def test_mg_uniform_neighbor_sample_simple(dask_client, input_combo): input_df, vertex_col_name, len(workers) ) + input_df = symmetrize_ddf( + input_df, src_name="src", dst_name="dst", symmetrize=not dg.is_directed() + ) + result_nbr = uniform_neighbor_sample( dg, input_combo["start_list"], @@ -247,6 +252,11 @@ def test_mg_uniform_neighbor_sample_tree(dask_client, directed): # input_df != ddf if 'directed = False' because ddf will be symmetrized # internally. 
input_df = G.input_df + + input_df = symmetrize_ddf( + input_df, src_name="src", dst_name="dst", symmetrize=not G.is_directed() + ) + join = result_nbr.merge( input_df, left_on=[*result_nbr.columns[:2]], right_on=[*input_df.columns[:2]] ) diff --git a/python/cugraph/cugraph/tests/structure/test_graph.py b/python/cugraph/cugraph/tests/structure/test_graph.py index c0524fcfe77..48a0b257b12 100644 --- a/python/cugraph/cugraph/tests/structure/test_graph.py +++ b/python/cugraph/cugraph/tests/structure/test_graph.py @@ -25,6 +25,7 @@ from cugraph.testing import utils from cudf.testing import assert_series_equal from cudf.testing.testing import assert_frame_equal +from cugraph.structure.symmetrize import symmetrize # MG import dask_cudf @@ -534,6 +535,18 @@ def test_to_directed(graph_file): # cugraph add_edge_list G = cugraph.Graph() G.from_cudf_edgelist(cu_M, source="0", destination="1") + + # FIXME: Uses the deprecated implementation of symmetrize. + source_col, dest_col = symmetrize( + G.edgelist.edgelist_df, "src", "dst", symmetrize=not G.is_directed() + ) + + input_df = cudf.DataFrame() + input_df["src"] = source_col + input_df["dst"] = dest_col + + G.edgelist.edgelist_df = input_df + Gnx = nx.from_pandas_edgelist(M, source="0", target="1", create_using=nx.Graph()) DiG = G.to_directed() diff --git a/python/cugraph/cugraph/tests/structure/test_multigraph.py b/python/cugraph/cugraph/tests/structure/test_multigraph.py index a9ea617fdb8..e245894b479 100644 --- a/python/cugraph/cugraph/tests/structure/test_multigraph.py +++ b/python/cugraph/cugraph/tests/structure/test_multigraph.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020-2023, NVIDIA CORPORATION. +# Copyright (c) 2020-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at @@ -76,7 +76,7 @@ def test_Graph_from_MultiGraph(graph_file): G = cugraph.Graph(GM) Gnx = nx.Graph(GnxM) - assert Gnx.number_of_edges() == G.number_of_edges() + assert Gnx.number_of_edges() == G.number_of_edges(directed_edges=True) GdM = graph_file.get_graph(create_using=cugraph.MultiGraph(directed=True)) GnxdM = nx.from_pandas_edgelist( nxM, diff --git a/python/cugraph/cugraph/tests/utils/test_dataset.py b/python/cugraph/cugraph/tests/utils/test_dataset.py index a52b99dabfe..3873cd1c3e4 100644 --- a/python/cugraph/cugraph/tests/utils/test_dataset.py +++ b/python/cugraph/cugraph/tests/utils/test_dataset.py @@ -26,6 +26,7 @@ from cugraph.dask.common.mg_utils import is_single_gpu from cugraph.datasets import karate from cugraph.structure import Graph +from cugraph.structure.symmetrize import symmetrize from cugraph.testing import ( RAPIDS_DATASET_ROOT_DIR_PATH, ALL_DATASETS, @@ -379,6 +380,29 @@ def test_node_and_edge_count(dataset): download=True, create_using=Graph(directed=dataset_is_directed) ) + df = cudf.DataFrame() + if "weights" in G.edgelist.edgelist_df: + source_col, dest_col, value_col = symmetrize( + G.edgelist.edgelist_df, + "src", + "dst", + "weights", + symmetrize=not G.is_directed(), + ) + + df["src"] = source_col + df["dst"] = dest_col + df["weights"] = value_col + else: + source_col, dest_col = symmetrize( + G.edgelist.edgelist_df, "src", "dst", symmetrize=not G.is_directed() + ) + + df["src"] = source_col + df["dst"] = dest_col + + G.edgelist.edgelist_df = df + assert G.number_of_nodes() == dataset.metadata["number_of_nodes"] assert G.number_of_edges() == dataset.metadata["number_of_edges"] diff --git a/python/cugraph/cugraph/utilities/nx_factory.py b/python/cugraph/cugraph/utilities/nx_factory.py index d07d17978d7..794fb33a7a1 100644 --- a/python/cugraph/cugraph/utilities/nx_factory.py +++ b/python/cugraph/cugraph/utilities/nx_factory.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020-2023, NVIDIA CORPORATION. 
+# Copyright (c) 2020-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -24,6 +24,8 @@ from cudf import from_pandas from cudf.api.types import is_integer_dtype +from cugraph.structure.symmetrize import symmetrize + # nx will be a MissingModule instance if NetworkX is not installed (any # attribute access on a MissingModule instance results in a RuntimeError). nx = import_optional("networkx") @@ -129,6 +131,17 @@ def convert_from_nx( if is_weighted is False: _gdf = convert_unweighted_to_gdf(nxG, vertex_type) + # FIXME: The legacy algorithms do not support the PLC graph + # hence, the symmetrization cannot be performed at the graph + # creation. Use the deprecated 'symmetrize' function for now. + source_col, dest_col = symmetrize( + _gdf, "src", "dst", symmetrize=not G.is_directed() + ) + + _gdf = cudf.DataFrame() + + _gdf["src"] = source_col + _gdf["dst"] = dest_col G.from_cudf_edgelist( _gdf, source="src", @@ -140,6 +153,18 @@ def convert_from_nx( else: if weight is None: _gdf = convert_weighted_unnamed_to_gdf(nxG, vertex_type) + # FIXME: The legacy algorithms do not support the PLC graph + # hence, the symmetrization cannot be performed at the graph + # creation. Use the deprecated 'symmetrize' function for now. + source_col, dest_col, value_col = symmetrize( + _gdf, "src", "target", "weight", symmetrize=not G.is_directed() + ) + + _gdf = cudf.DataFrame() + + _gdf["src"] = source_col + _gdf["target"] = dest_col + _gdf["weight"] = value_col G.from_cudf_edgelist( _gdf, source="source", @@ -148,8 +173,22 @@ def convert_from_nx( renumber=do_renumber, store_transposed=store_transposed, ) + else: _gdf = convert_weighted_named_to_gdf(nxG, weight, vertex_type) + # FIXME: The legacy algorithms do not support the PLC graph + # hence, the symmetrization cannot be performed at the graph + # creation. 
Use the deprecated 'symmetrize' function for now. + source_col, dest_col, value_col = symmetrize( + _gdf, "src", "dst", "weight", symmetrize=not G.is_directed() + ) + + _gdf = cudf.DataFrame() + + _gdf["src"] = source_col + _gdf["dst"] = dest_col + _gdf["weight"] = value_col + G.from_cudf_edgelist( _gdf, source="src", diff --git a/python/cugraph/pytest.ini b/python/cugraph/pytest.ini index bca148538d9..2f01a0cc51b 100644 --- a/python/cugraph/pytest.ini +++ b/python/cugraph/pytest.ini @@ -71,3 +71,4 @@ filterwarnings = ignore:This function is deprecated. Batched support for multiple vertices:DeprecationWarning # Called via dask. Not obviously addressable in cugraph. ignore:The behavior of array concatenation with empty entries is deprecated:FutureWarning + ignore:This method is deprecated and will no longer be supported. The symmetrization:FutureWarning diff --git a/python/nx-cugraph/_nx_cugraph/__init__.py b/python/nx-cugraph/_nx_cugraph/__init__.py index 428d266dd2e..fc0bea47180 100644 --- a/python/nx-cugraph/_nx_cugraph/__init__.py +++ b/python/nx-cugraph/_nx_cugraph/__init__.py @@ -36,7 +36,7 @@ "backend_name": "cugraph", "project": "nx-cugraph", "package": "nx_cugraph", - "url": f"https://github.com/rapidsai/cugraph/tree/branch-{_version_major:0>2}.{_version_minor:0>2}/python/nx-cugraph", + "url": f"https://rapids.ai/nx-cugraph", "short_summary": "GPU-accelerated backend.", # "description": "TODO", "functions": { @@ -301,6 +301,45 @@ def get_info(): .lower() == "true", } + + # Enable zero-code change usage with a simple environment variable + # by setting or updating other NETWORKX environment variables. 
+ if os.environ.get("NX_CUGRAPH_AUTOCONFIG", "").strip().lower() == "true": + from itertools import chain + + def update_env_var(varname): + """Add "cugraph" to a list of backend names environment variable.""" + if varname not in os.environ: + os.environ[varname] = "cugraph" + return + string = os.environ[varname] + vals = [ + stripped for x in string.strip().split(",") if (stripped := x.strip()) + ] + if "cugraph" not in vals: + # Should we append or prepend? Let's be first! + os.environ[varname] = ",".join(chain(["cugraph"], vals)) + + # Automatically convert NetworkX Graphs to nx-cugraph for algorithms + if (varname := "NETWORKX_BACKEND_PRIORITY_ALGOS") in os.environ: + # "*_ALGOS" is given priority in NetworkX >=3.4 + update_env_var(varname) + # But update this too to "just work" if users mix env vars and nx versions + os.environ["NETWORKX_BACKEND_PRIORITY"] = os.environ[varname] + else: + update_env_var("NETWORKX_BACKEND_PRIORITY") + # And for older NetworkX versions + update_env_var("NETWORKX_AUTOMATIC_BACKENDS") # For NetworkX 3.2 + update_env_var("NETWORKX_GRAPH_CONVERT") # For NetworkX 3.0 and 3.1 + # Automatically create nx-cugraph Graph from graph generators + update_env_var("NETWORKX_BACKEND_PRIORITY_GENERATORS") + # Run default NetworkX implementation (in >=3.4) if not implemented by nx-cugraph + if (varname := "NETWORKX_FALLBACK_TO_NX") not in os.environ: + os.environ[varname] = "true" + # Cache graph conversions (default is False in NetworkX 3.2 + if (varname := "NETWORKX_CACHE_CONVERTED_GRAPHS") not in os.environ: + os.environ[varname] = "true" + return d diff --git a/python/pylibcugraph/pylibcugraph/_cugraph_c/graph.pxd b/python/pylibcugraph/pylibcugraph/_cugraph_c/graph.pxd index 4247bcc1b2a..497607860bd 100644 --- a/python/pylibcugraph/pylibcugraph/_cugraph_c/graph.pxd +++ b/python/pylibcugraph/pylibcugraph/_cugraph_c/graph.pxd @@ -67,6 +67,7 @@ cdef extern from "cugraph_c/graph.h": bool_t renumber, bool_t drop_self_loops, bool_t 
drop_multi_edges, + bool_t symmetrize, bool_t check, cugraph_graph_t** graph, cugraph_error_t** error) @@ -117,6 +118,7 @@ cdef extern from "cugraph_c/graph.h": const cugraph_type_erased_device_array_view_t* edge_type_ids, bool_t store_transposed, bool_t renumber, + bool_t symmetrize, bool_t check, cugraph_graph_t** graph, cugraph_error_t** error @@ -173,6 +175,7 @@ cdef extern from "cugraph_c/graph.h": size_t num_arrays, bool_t drop_self_loops, bool_t drop_multi_edges, + bool_t symmetrize, bool_t do_expensive_check, cugraph_graph_t** graph, cugraph_error_t** error) diff --git a/python/pylibcugraph/pylibcugraph/graphs.pyx b/python/pylibcugraph/pylibcugraph/graphs.pyx index def47390ce5..6eda0a83d3e 100644 --- a/python/pylibcugraph/pylibcugraph/graphs.pyx +++ b/python/pylibcugraph/pylibcugraph/graphs.pyx @@ -123,9 +123,17 @@ cdef class SGGraph(_GPUGraph): drop_self_loops : bool, optional (default='False') If true, drop any self loops that exist in the provided edge list. + Not supported for CSR graph. + drop_multi_edges: bool, optional (default='False') If true, drop any multi edges that exist in the provided edge list + Not supported for CSR graph. 
+ + symmetrize: bool, optional (default='False') + If true, symmetrize the edge list + + Examples --------- >>> import pylibcugraph, cupy, numpy @@ -155,7 +163,8 @@ cdef class SGGraph(_GPUGraph): input_array_format="COO", vertices_array=None, drop_self_loops=False, - drop_multi_edges=False): + drop_multi_edges=False, + symmetrize=False): # FIXME: add tests for these if not(isinstance(store_transposed, (int, bool))): @@ -217,6 +226,7 @@ cdef class SGGraph(_GPUGraph): renumber, drop_self_loops, drop_multi_edges, + symmetrize, do_expensive_check, &(self.c_graph_ptr), &error_ptr) @@ -234,6 +244,7 @@ cdef class SGGraph(_GPUGraph): edge_type_view_ptr, store_transposed, renumber, + symmetrize, # drop_self_loops, #FIXME: Not supported yet # drop_multi_edges, #FIXME: Not supported yet do_expensive_check, @@ -325,6 +336,10 @@ cdef class MGGraph(_GPUGraph): drop_multi_edges: bool, optional (default='False') If true, drop any multi edges that exist in the provided edge list + + symmetrize: bool, optional (default='False') + If true, symmetrize the edge list + """ def __cinit__(self, ResourceHandle resource_handle, @@ -339,7 +354,8 @@ cdef class MGGraph(_GPUGraph): vertices_array=None, size_t num_arrays=1, # default value to not break users drop_self_loops=False, - drop_multi_edges=False): + drop_multi_edges=False, + symmetrize=False): if not(isinstance(store_transposed, (int, bool))): raise TypeError("expected int or bool for store_transposed, got " @@ -465,6 +481,7 @@ cdef class MGGraph(_GPUGraph): num_arrays, drop_self_loops, drop_multi_edges, + symmetrize, do_expensive_check, &(self.c_graph_ptr), &error_ptr) diff --git a/readme_pages/pylibcugraph.md b/readme_pages/pylibcugraph.md index 3bb552141e9..fcb5a624931 100644 --- a/readme_pages/pylibcugraph.md +++ b/readme_pages/pylibcugraph.md @@ -4,7 +4,7 @@


-CuGraph pylibcugraph
+cuGraph pylibcugraph

Part of [RAPIDS](https://rapids.ai) cuGraph, pylibcugraph is a wrapper around the cuGraph C API. It is aimed more at integrators instead of algorithm writers or end users like Data Scientists. Most of the cuGraph python API uses pylibcugraph to efficiently run algorithms by removing much of the overhead of the python-centric implementation, relying more on cython instead. Pylibcugraph is intended for applications that require a tighter integration with cuGraph at the Python layer with fewer dependencies.
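The `symmetrize` flag that this change threads from `Graph.from_cudf_edgelist` down to the pylibcugraph `SGGraph` and `MGGraph` constructors is tri-state (`None`, `True`, `False`). The sketch below restates, in plain Python, the validation and default resolution that the diff adds to `__from_edgelist`. It is illustrative only: `resolve_symmetrize` is a hypothetical helper name that does not exist in cugraph, and the function assumes the graph's directedness is passed in as a plain `bool` rather than read from `self.properties.directed`.

```python
def resolve_symmetrize(directed, symmetrize=None, edge_id=None,
                       edge_type=None, edge_attr=None):
    """Hypothetical helper mirroring the checks added to __from_edgelist.

    Returns the effective boolean passed on to graph construction.
    """
    if directed:
        # Symmetrizing only makes sense for undirected graphs.
        if symmetrize:
            raise ValueError(
                "The edgelist can only be symmetrized for undirected graphs."
            )
    elif symmetrize or symmetrize is None:
        # Undirected graph that will be symmetrized: edge IDs, edge types,
        # or more than one edge attribute column are not supported.
        unsupported = edge_id is not None or edge_type is not None
        if isinstance(edge_attr, list) and len(edge_attr) > 1:
            unsupported = True
        if unsupported:
            raise ValueError(
                "Edge list containing Edge Ids or Types can't be symmetrized. "
                "If the edges are already symmetric, set the 'symmetrize' "
                "flag to False"
            )

    if symmetrize is None:
        # Default behavior: symmetrize only when the graph is undirected.
        symmetrize = not directed
    return symmetrize


if __name__ == "__main__":
    print(resolve_symmetrize(directed=False))                    # default, undirected
    print(resolve_symmetrize(directed=True))                     # default, directed
    print(resolve_symmetrize(directed=False, symmetrize=False,   # caller vouches the
                             edge_id="eid"))                     # edges are symmetric
```

Under these assumptions, passing `symmetrize=False` for an undirected graph is the only way to keep edge IDs or edge types, which matches the docstring's note that the flag can be set to `False` when the incoming edge list is already known to be symmetric.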