
[REVIEW] Update CTK to CUDA 11.0 #9

Merged
merged 40 commits on Aug 25, 2020

Conversation


@mike-wendt commented Jul 30, 2020

In addition, add the NVIDIA EULA and update the About section to reflect the contents of this package.

This follows the initial work in #7 for the CUDA 11 RC.

@jjhelmus

@mike-wendt Can you rebase now that #6 has been merged?

Also, adding __cuda >=11.0 as a run requirement will prevent installation on incompatible systems, including CentOS 6.

Address merge-conflicts and update changes to work for CUDA 11

* upstream-master:
  Add override flag
  fix nonembedded extract
  fix for embedded image
  add ppc64le support

# Conflicts:
#	build.py

jjhelmus commented Aug 7, 2020

I'm getting errors similar to FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpnydooixm/cuda-toolkit/lib64' when I build this on linux. Should the recipe be looking for files in .../lib rather than .../lib64?

The recent merge seems to have broken the process that was working. CUDA 11 appears to prefer '--toolkit' over the '--extract' flag used for CUDA 10.2 and earlier.
@mike-wendt (Author)

I'm getting errors similar to FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpnydooixm/cuda-toolkit/lib64' when I build this on linux. Should the recipe be looking for files in .../lib rather than .../lib64?

I think fixing the merge in my latest commit resolves this, as I reverted to using --toolkit, which I had working successfully before. That said, I'm unable to build due to conda/conda-build#4000.

@jjhelmus @jakirkham do you have any suggestions? Right now I've been removing it for testing, but I know it is needed in the final released package.

Attempting to use the changes from ppc64le merge before changing them
Will probably need to add a ppc64le override to 'lib64'
This does not work either and causes failures during solve. Need to remove to unblock our CI.
@mike-wendt (Author)

mike-wendt commented Aug 20, 2020

These versions do not have any run constraints:

CUDA 11 GA - https://anaconda.org/nvidia/cudatoolkit/files?version=11.0.194
CUDA 11 Update 1 - https://anaconda.org/nvidia/cudatoolkit/files?version=11.0.221

Adding __cuda >=11.0 broke all of our solves in gpuCI, and trying __glibc >=2.17 also broke solves that should succeed. I understand the need for a constraint to ensure CentOS 6 users do not pick up this package, but there has to be a way to keep them from getting the package without breaking existing solves. I'm open to suggestions, but at this point neither option works for us.

Error message using __glibc>=2.17:

  - feature:/linux-64::__glibc==2.23=0
  - feature:|@/linux-64::__glibc==2.23=0
  - cudatoolkit=11.0 -> __glibc[version='>=2.17']

Your installed version is: 2.23

Error using __cuda>=11.0:

  - cudatoolkit=11.0 -> __cuda[version='>=11.0']

Your installed version is: not available

Both of these were from Docker builds running on a CPU-only node, but building CUDA images. As I mentioned in a commit, we need something that works for CPU-only environments as well, given that we build all of our conda packages on CPU-only nodes with CUDA images.

cc @kkraus14 @jakirkham
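One escape hatch conda provides for exactly this bind is the virtual-package override. A minimal sketch for a CPU-only build node, assuming conda >= 4.8; the recipe path is illustrative, not from this PR:

```shell
# On a CPU-only node conda reports the __cuda virtual package as
# "not available", so a __cuda >=11.0 dependency or constraint cannot solve.
# Exporting CONDA_OVERRIDE_CUDA tells the solver to assume a driver version:
export CONDA_OVERRIDE_CUDA="11.0"
echo "solver will assume __cuda==${CONDA_OVERRIDE_CUDA}"

# conda build recipe/    # illustrative invocation; solves without a GPU driver
```

This is the same CONDA_OVERRIDE_CUDA mechanism jjhelmus reports working with conda 4.8.4 later in this thread.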

@jakirkham

What does conda info say?

@mike-wendt (Author)

What does conda info say?

As far as what? I've replaced the packages, so I don't have any to test without rebuilding them.

@jakirkham

Am trying to understand more about the machine where this conflict is showing up. conda info would help with that.

@mike-wendt (Author)

mike-wendt commented Aug 21, 2020

Am trying to understand more about the machine where this conflict is showing up. conda info would help with that.

It was running on an Ubuntu 18.04 AWS node doing a docker build. The error comes from inside the docker build on any image; this is just one example.

Any of the failed CUDA 11 builds for this job have the same __glibc error

@mike-wendt (Author)

This is a failing job for the __cuda constraint and the full matrix

@mike-wendt (Author)

@jjhelmus this is ready for review, along with input on the above constraint issues. Thanks

@kkraus14

@jjhelmus would it be possible for you to review this again in the near future? We're trying to push out the RAPIDS 0.15 release and this update is needed to build CUDA 11 enabled conda packages in conda-forge for things like CuPy.

Is there anything we can do on our end to help reduce the maintenance burden on you?


jjhelmus commented Aug 25, 2020

@kkraus14 @mike-wendt This looks good outside of the question on how to constrain the package. I've been able to replicate the build on our machines for linux-64 and am trying a build on our linux-ppc64le machine as well.

I need to do some more testing around the __cuda and __glibc virtual packages; they are not behaving as I expect. Would it be reasonable to include run_constrained: __cuda >=11.0 here for the time being and, if necessary, hotfix this with something different later? The existing cudatoolkit packages use run_constrained to enforce the driver requirement.

@jjhelmus

Was able to confirm this builds fine on linux-ppc64le as well. The proposed change to match the other cudatoolkit packages is:

$ git diff 
diff --git a/meta.yaml b/meta.yaml
index 1f5361f..6ad4d6c 100644
--- a/meta.yaml
+++ b/meta.yaml
@@ -32,8 +32,8 @@ requirements:
     - tqdm
     # for run_exports
     - {{ compiler('cxx') }}
-  #run:
-  #  - __glibc >=2.17 # [linux]
+  run_constrained:
+    - __cuda >=11.0

I have linux packages built with this change that I plan on uploading to defaults tonight/early tomorrow morning unless there are concerns.

@jjhelmus

mike-wendt#4

@mike-wendt (Author)

@jjhelmus this is not in master currently. How did the ppc64le 10.2 package get published with the constraint shown on the web, while the tarball and its included meta.yaml do not have it?

add run_constrained requirement on __cuda
Co-authored-by: jakirkham <jakirkham@gmail.com>
@jakirkham

So I thought we were already hotfixing cudatoolkit packages in PR ( AnacondaRecipes/repodata-hotfixes#81 ). Does that already do what we want or should we be doing something different?

@jjhelmus

@jjhelmus this is not in master currently. How did the ppc64le 10.2 package get published with the constraint shown on the web, while the tarball and its included meta.yaml do not have it?

The constraint gets added via a patch when the packages are indexed. Details are in AnacondaRecipes/repodata-hotfixes#81

@jjhelmus

So I thought we were already hotfixing cudatoolkit packages in PR ( AnacondaRecipes/repodata-hotfixes#81 ). Does that already do what we want or should we be doing something different?

We are, but ideally the packages would have the constraint included rather than patched in.

@mike-wendt (Author)

So I thought we were already hotfixing cudatoolkit packages in PR ( AnacondaRecipes/repodata-hotfixes#81 ). Does that already do what we want or should we be doing something different?

This is my point: why is this change necessary here when it is obviously being added elsewhere? Now I know where.

@jakirkham

Ok, that sounds fine.

On the building point, could you please check whether this ( #9 (comment) ) works, Mike?

@mike-wendt (Author)

@jakirkham I have to rebuild this package and then try to build images, which is an hour or more of work. Given we're in the middle of a release, I don't have that time to troubleshoot this at the moment. If you're both happy with this, then I would say merge and publish.

I still have packages without the constraint, so we won't be impacted, but my suspicion is that this is a larger issue. From my view, an image that is FROM nvidia/cuda and has Miniconda installed should work and not fail with this constraint. That being said, it looks like I have a workaround, so I'm good.


jjhelmus commented Aug 25, 2020

With conda 4.8.4 I'm able to build packages from either of these two recipes if CONDA_OVERRIDE_CUDA=11.0 is set in the shell prior to calling conda build:

constrained:

package:
  name: test
  version: 1.0.0
requirements:
  run_constrained:
    - __cuda >=11.0
test:
  commands:
    - echo "Hi"

run:

package:
  name: test
  version: 2.0.0
requirements:
  run:
    - __cuda >=11.0
test:
  commands:
    - echo "Hi"

test-1.0.0 (the constrained version) is not installable with conda 4.8.4 on a system without the CUDA 11 driver, but it can be installed with earlier versions of conda.

test-2.0.0 is not installable without the CUDA 11 driver with either conda 4.8.4 or earlier versions.
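To make that difference concrete, here is a hedged sketch of checking both forms on a machine with no NVIDIA driver; the conda invocations are left as comments because they assume the test packages above have been built into a local channel:

```shell
# Assuming conda >= 4.8 and a driver-less machine, where conda reports the
# __cuda virtual package as "not available":
#   conda install --dry-run test=1.0.0   # run_constrained: fails on 4.8.4
#   conda install --dry-run test=2.0.0   # run: fails on 4.8.4 and earlier
# Overriding the virtual package satisfies either form without a real driver:
export CONDA_OVERRIDE_CUDA="11.0"
echo "override in place: solver assumes __cuda==${CONDA_OVERRIDE_CUDA}"
#   conda install --dry-run test=2.0.0   # solves with the override set
```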

@jakirkham

No worries @mike-wendt. Just trying to make sure you have a path forward 🙂

@jjhelmus

Merging. linux-64 and linux-ppc64le packages should be available on defaults tonight; win-64 will need to wait until the end of the week.

@jjhelmus jjhelmus merged commit 3310110 into AnacondaRecipes:master Aug 25, 2020