Use new NVTX module #406

Merged: 2 commits, Oct 7, 2020

pentschev
Member

The NVTX module is being removed from cuDF in rapidsai/cudf#6413, so we should use the new standalone nvtx package available from conda-forge.
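For readers unfamiliar with the standalone package, here is a minimal sketch of how code might switch to it while still running where it is not installed. It assumes the package exposes `nvtx.annotate`, usable as both a decorator and a context manager; the no-op fallback, the `double_values` function, and the "example" domain are hypothetical and only for illustration, not necessarily what this PR changes in dask_cuda/utils.py.

```python
from contextlib import contextmanager

try:
    # Standalone NVTX package (assumed to be installed from conda-forge).
    from nvtx import annotate as nvtx_annotate
except ImportError:
    # Hypothetical fallback: if nvtx is not installed, annotating does nothing.
    @contextmanager
    def nvtx_annotate(message=None, color="blue", domain=None):
        yield


# Used as a decorator: the whole function call becomes one NVTX range.
@nvtx_annotate("double_values", color="green", domain="example")
def double_values(values):
    return [2 * v for v in values]


# Used as a context manager: the enclosed block becomes one NVTX range.
with nvtx_annotate("outer_region", color="blue", domain="example"):
    result = double_values([1, 2, 3])
```

With the fallback in place, the annotated code behaves the same whether or not a profiler or the nvtx package is present; only the emitted ranges differ.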

@jakirkham
Member

Adding to RAPIDS integration packages with PR ( rapidsai/integration#141 ).

@quasiben
Member

quasiben commented Oct 5, 2020

rerun tests

@quasiben
Member

quasiben commented Oct 5, 2020

Rerunning tests due to errors in CI like the following:

08:20:13 numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

@pentschev
Member Author

We're seeing errors in test_explicit_comms, such as:

dask_cuda/tests/test_explicit_comms.py::test_dataframe_merge[tcp-pandas-1] Coverage.py warning: --include is ignored because --source is set (include-ignored)
/opt/conda/envs/rapids/lib/python3.8/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing
Coverage.py warning: --include is ignored because --source is set (include-ignored)
Coverage.py warning: --include is ignored because --source is set (include-ignored)
Coverage.py warning: --include is ignored because --source is set (include-ignored)
distributed.worker - WARNING -  Compute Failed
Function:  _run_coroutine_on_worker
args:      (b'I7J\xfc2\x0bA\x0f\xaa1\x07#\xe6G\x93\xc2', <function _dataframe_merge at 0x7f4beb4afca0>, ({0}, [{0: 2}, {0: 1}], [[('from_pandas-c8d33e0e44258c474725a2f75e204e9f', 1), ('from_pandas-c8d33e0e44258c474725a2f75e204e9f', 0)], [('from_pandas-deeca6f237c3f5d251a0eab58252e733', 0)]], ['key'], ['key']))
kwargs:    {}
Exception: AttributeError("'Series' object has no attribute 'merge'")

Process SpawnProcess-3:
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/rapids/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/tests/test_explicit_comms.py", line 92, in _test_dataframe_merge
    ddf3 = dataframe_merge(ddf1, ddf2, on="key").set_index("key")
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/dataframe_merge.py", line 175, in dataframe_merge
    return comms.default_comms().dataframe_operation(
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/comms.py", line 240, in dataframe_operation
    return utils.dataframes_to_dask_dataframe(ret)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/utils.py", line 48, in dataframes_to_dask_dataframe
    meta = c.submit(get_meta, dfs[0]).result()
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/client.py", line 225, in result
    raise exc.with_traceback(tb)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/comms.py", line 56, in _run_coroutine_on_worker
    return executor.submit(_run).result()
  File "/opt/conda/envs/rapids/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/opt/conda/envs/rapids/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/opt/conda/envs/rapids/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/comms.py", line 53, in _run
    return future.result()
  File "/opt/conda/envs/rapids/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/opt/conda/envs/rapids/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/var/lib/jenkins/workspace/rapidsai/gpuci-a100/dask-cuda/prb/dask-cuda-gpu-build_2/dask_cuda/explicit_comms/dataframe_merge.py", line 92, in _dataframe_merge
    return df1.merge(df2, left_on=left_on, right_on=right_on)
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pandas/core/generic.py", line 5136, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'merge'

TBH, I don't know what the source is; it could be either serialization or some change in cuDF's or Dask's attributes.

@jakirkham @rjzamora @madsbk do any of you know what could be causing these errors? It fails for both TCP and UCX, so I don't think it's specific to the communication protocol either.

@pentschev pentschev mentioned this pull request Oct 5, 2020
@jakirkham
Member

I'm able to reproduce the error locally as well. It's not surprising that Series.merge does not exist; one would want DataFrame.merge. What is more interesting is that a Series shows up here at all, and that I don't know how to explain.
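To make the distinction concrete, here is a small pandas-only sketch. It shows that `DataFrame.merge` works while calling `merge` on a Series raises the same `AttributeError` as in the traceback. The sample data and the `ensure_dataframe` guard are hypothetical, and selecting a single column is just one common way to end up with a Series; it is not a claim about what happened in the failing test.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [0, 1, 2], "a": [10, 11, 12]})
df2 = pd.DataFrame({"key": [1, 2, 3], "b": [21, 22, 23]})

# DataFrame.merge exists and performs an inner join on "key" by default.
merged = df1.merge(df2, left_on=["key"], right_on=["key"])

# Selecting a single column label returns a Series, which has no merge().
series = df1["key"]
try:
    series.merge(df2, left_on=["key"], right_on=["key"])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)


def ensure_dataframe(obj):
    """Hypothetical guard: promote a Series to a one-column DataFrame."""
    if isinstance(obj, pd.Series):
        return obj.to_frame()
    return obj


# After promotion, the merge call works again.
recovered = ensure_dataframe(series).merge(df2, on="key")
```

A guard like this would only mask the symptom; the open question in the thread is why a Series reaches `_dataframe_merge` in the first place.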

@jakirkham
Member

Have filed as issue ( #407 ).

@jakirkham
Member

rerun tests

@jakirkham
Member

jakirkham commented Oct 6, 2020

Test failures on CI appear to be due to issue ( dask/distributed#4153 ).

@jakirkham
Member

rerun tests

@codecov-io

codecov-io commented Oct 7, 2020

Codecov Report

Merging #406 into branch-0.16 will not change coverage.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##           branch-0.16     #406   +/-   ##
============================================
  Coverage        56.86%   56.86%           
============================================
  Files               19       19           
  Lines             1442     1442           
============================================
  Hits               820      820           
  Misses             622      622           
| Impacted Files | Coverage Δ |
|---|---|
| dask_cuda/utils.py | 87.31% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93458a4...2d344ea.

@jakirkham
Member

Needs PR ( #408 )

@jakirkham jakirkham merged commit 04dcbb6 into rapidsai:branch-0.16 Oct 7, 2020
@pentschev pentschev deleted the update-nvml-module branch October 29, 2020 22:33
4 participants