[FEA] number_of_vertices from edge list #241

Closed
afender opened this issue Apr 23, 2019 · 0 comments · Fixed by #439
Labels: feature request (New feature or request)

afender commented Apr 23, 2019

The current implementation computes the adj_list (if it is not already present) just to return the number of vertices. We should instead compute and store max(max(src), max(dst)) when only the edge list is available.
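
A minimal sketch of the proposal, using cudf on the Python side (the function name is illustrative, not the actual cuGraph API). With 0-based vertex IDs, the vertex count is one more than the largest ID found in either column of the edge list:

```
import cudf

def num_vertices_from_edge_list(src: cudf.Series, dst: cudf.Series) -> int:
    # Largest vertex ID seen in either endpoint column; with 0-based IDs
    # the vertex count is that maximum plus one.
    return int(max(src.max(), dst.max())) + 1

# Example: edge list {0->1, 1->2, 2->5} has vertices 0..5, i.e. 6 vertices.
edges = cudf.DataFrame({"src": [0, 1, 2], "dst": [1, 2, 5]})
print(num_vertices_from_edge_list(edges["src"], edges["dst"]))  # 6
```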

@afender added the "feature request" and "? - Needs Triage" labels on Apr 23, 2019
@afender added this to the 0.9.0 milestone on Apr 23, 2019
@BradReesWork removed the "? - Needs Triage" label on May 6, 2019
@BradReesWork removed this from the 0.9.0 milestone on Jun 17, 2019
ChuckHastings added a commit to ChuckHastings/cugraph that referenced this issue Jul 31, 2019
 * Add new function to compute the number of vertices from the COO arrays
 * Add a new data member to the graph structure to store the number of vertices
 * Update the python binding for getting the number of vertices to use new data/function
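
The bullets above describe caching the count on the graph structure so it is computed at most once from the COO arrays. A rough sketch of that pattern (illustrative class, not the actual cuGraph implementation):

```
class EdgeListGraph:
    """Sketch: lazily compute and cache the vertex count from COO arrays."""

    def __init__(self, src, dst):
        self._src, self._dst = src, dst
        self._num_vertices = None  # new data member, filled on first request

    def number_of_vertices(self):
        if self._num_vertices is None:
            # No adjacency list needed: derive the count from the edge list.
            self._num_vertices = int(max(self._src.max(), self._dst.max())) + 1
        return self._num_vertices
```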
ChuckHastings pushed a commit to ChuckHastings/cugraph that referenced this issue Jun 14, 2021
The barrier-synchronous communication pattern of the RAFT comms lets senders and receivers know ahead of time when a message needs to be initiated. Because of this, we only need to place the sender's rank in 32 bits of the tag for the receiver, and we only use UCX endpoints for sending messages, while the receiver uses the `ucx_worker` to receive. This is a little different from the fully asynchronous pattern of ucx-py, where an endpoint is created in the connection listener and a reference to it is held so that messages can be sent asynchronously later; that endpoint's life cycle is also expected to be managed by the user.
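A minimal sketch of the tag layout described above (the helper names are illustrative, not the actual RAFT comms symbols): the sender's rank occupies the low 32 bits of a 64-bit tag, so the receiver posting on the `ucx_worker` can match messages by sender without holding a dedicated endpoint.

```
# Illustrative only: pack and unpack a sender rank in the low 32 bits of a 64-bit tag.
def make_tag(base_tag: int, sender_rank: int) -> int:
    assert 0 <= base_tag < 2**32 and 0 <= sender_rank < 2**32
    return (base_tag << 32) | sender_rank

def sender_rank_from_tag(tag: int) -> int:
    return tag & 0xFFFFFFFF
```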

We are still not entirely sure why this additional endpoint causes issues under some circumstances and not others: one configuration might never encounter an issue, while another fails every single time (with a timeout, lockup, or explicit error).

@pentschev and I tested this change on his configuration with UCX 1.11 and the latest dask/distributed, and it appears to fix the hang. I have also verified that it runs successfully on UCX 1.9. In my tests, I ran an MNMG nearest neighbors workload on 50k rows (a rough sketch of that test follows the client snippets below). Below are the configuration options we used:

UCX 1.9
```
export DASK_UCX__CUDA_COPY=True
export DASK_UCX__TCP=True
export DASK_UCX__NVLINK=True
export DASK_UCX__INFINIBAND=True
export DASK_UCX__RDMACM=False
export DASK_RMM__POOL_SIZE=0.5GB
export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s"
export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s"
export DASK_DISTRIBUTED__WORKER__MEMORY__Terminate="False"

export DASK_UCX__REUSE_ENDPOINTS=True
export UCXPY_IFNAME="ib0"
export UCX_NET_DEVICES=all
export UCX_MAX_RNDV_RAILS=1  # <-- must be set in the client env too!
export DASK_LOGGING__DISTRIBUTED="DEBUG"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

SCHEDULER_FILE=${SHARED_DIR}/dask-scheduler.json

SCHEDULER_ARGS="--protocol ucx  --port 8792
                --interface ib0
                --scheduler-file $SCHEDULER_FILE"

WORKER_ARGS="--enable-tcp-over-ucx
             --enable-nvlink 
             --enable-infiniband
             --rmm-pool-size=1G 
             --net-devices="ib0"
             --local-directory /tmp/$LOGNAME 
             --scheduler-file $SCHEDULER_FILE"
```

UCX 1.11
```
export DASK_UCX__CUDA_COPY=True
export DASK_UCX__TCP=True
export DASK_UCX__NVLINK=True
export DASK_UCX__INFINIBAND=True
export DASK_UCX__RDMACM=True
export DASK_RMM__POOL_SIZE=0.5GB
export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s"
export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s"
export DASK_DISTRIBUTED__WORKER__MEMORY__Terminate="False"

export DASK_UCX__REUSE_ENDPOINTS=True
export UCXPY_IFNAME="ib0"
export UCX_MAX_RNDV_RAILS=1  # <-- must be set in the client env too!
export DASK_LOGGING__DISTRIBUTED="DEBUG"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

SCHEDULER_FILE=${SHARED_DIR}/dask-scheduler.json

SCHEDULER_ARGS="--protocol ucx  --port 8792
                --interface ib0
                --scheduler-file $SCHEDULER_FILE"

WORKER_ARGS="--enable-tcp-over-ucx
             --enable-nvlink 
             --enable-infiniband
             --enable-rdmacm
             --rmm-pool-size=1G 
             --local-directory /tmp/$LOGNAME 
             --scheduler-file $SCHEDULER_FILE"
```


And for the client:

UCX 1.9
```
from dask_cuda.initialize import initialize

initialize(enable_tcp_over_ucx=True,
           enable_nvlink=True,
           enable_infiniband=True,
           enable_rdmacm=False,
          )
```

UCX 1.11
```
from dask_cuda.initialize import initialize

initialize(enable_tcp_over_ucx=True,
           enable_nvlink=True,
           enable_infiniband=True,
           enable_rdmacm=True,
          )
```
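
For reference, a rough sketch of the kind of test described above; the scheduler-file path, feature count, and the cuML dask NearestNeighbors usage are assumptions rather than the exact test script:

```
import numpy as np
import cudf
import dask_cudf
from dask.distributed import Client
from cuml.dask.neighbors import NearestNeighbors

client = Client(scheduler_file="/shared/dask-scheduler.json")  # assumed path

# ~50k random rows, partitioned across the workers
df = cudf.DataFrame({f"f{i}": np.random.rand(50_000).astype("float32")
                     for i in range(16)})
X = dask_cudf.from_cudf(df, npartitions=8)

nn = NearestNeighbors(n_neighbors=5, client=client)
nn.fit(X)
distances, indices = nn.kneighbors(X)
```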

Also tagging @rlratzel

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai/raft#241