Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocblas: add spack test support #27436

Merged
merged 3 commits into from
Feb 28, 2022
Merged

rocblas: add spack test support #27436

merged 3 commits into from
Feb 28, 2022

Conversation

cgmb
Copy link
Contributor

@cgmb cgmb commented Nov 13, 2021

This change adds support for building the rocblas-test client and teaches spack how to run the fast subset of tests. I also tried to narrow down the cmake, python and rocm dependencies a bit. It takes ages to build Tensile, so I've only tested on 4.3.1 thus far.

rocBLAS also supports using blis as a reference library. That would make the tests significantly faster, but the CMake for using blas is more or less the same for rocblas, hipblas (#27074), rocsolver (#26919) and hipsolver, so I figured I'd start with that.

To give it a spin, try:

spack install --verbose --test=root rocblas@4.3.1

@haampie @srekolam @arjun-raj-kuppala

@cgmb
Copy link
Contributor Author

cgmb commented Nov 13, 2021

It seems that most ROCm math libraries need patches to build the tests properly. Patching every version back to 3.5 seems like unnecessary complexity. I'm just going to try to get 4.3 working, and anything that happens to work before then is a bonus.

I need to reformat the PR, but I'm having a bit of trouble with spack style. It wants to use my system Python, then tries to install py-setuptools, which fails because it doesn't know which compiler the system python was built with.

==> Running mypy checks
[+] /usr (external python-3.6-jt555nuzd7sa2loui3rexa3vjauhu5dj)
==> Installing py-setuptools-58.2.0-h2su7qatjvwtks7drk6zn6axwgbfx7jo
==> No binary for py-setuptools-58.2.0-h2su7qatjvwtks7drk6zn6axwgbfx7jo found: installing from source
Traceback (most recent call last):
  File "<string>", line 3, in <module>
ModuleNotFoundError: No module named 'distutils.sysconfig'
==> Error: KeyError: 'CC'

/home/cgmb/spack/var/spack/repos/builtin/packages/python/package.py:973, in setup_dependent_build_environment:
        970        for compile_var, link_var in [('CC', 'LDSHARED'),
        971                                      ('CXX', 'LDCXXSHARED')]:
        972            # First, we get the values from the sysconfigdata:
  >>    973            config_compile = self.config_vars[compile_var]
        974            config_link = self.config_vars[link_var]
        975
        976            # The dependent environment will have the compilation command set to


@haampie
Copy link
Member

haampie commented Nov 14, 2021

@cgmb what do you think, should we drop older versions of rocm packages at some point? I don't think we're ever going to be able to support them properly unless AMD keeps CI running on machines with old and new drivers...

Dropping a couple of old versions would probably keep things simpler.

@cgmb
Copy link
Contributor Author

cgmb commented Nov 15, 2021

@cgmb what do you think, should we drop older versions of rocm packages at some point?

It's not like I have any authority on this matter, but if it were up to me personally, I would drop anything prior to 4.1.0. That's when target-id was introduced, and it's kind of a dividing line for compiler compatibility.

Spack is currently specifying a 1:1 relationship between the llvm-amdgpu version and the ROCm math library versions, but that's really more of a recommendation than a hard requirement. It depends on what architectures you're targeting, but I think generally you can use llvm-amdgpu@4.1.0: to build any math lib (possibly with a few exceptions).

I suppose that highlights a missing feature in Spack: an understanding of GPU architectures. It should be fine for llvm-andgpu@4.1.0 to build rocBLAS@4.5.0, but that version of LLVM doesn't have support for gfx90a. Spack includes logic for handling that sort of problem with CPU architectures. For example, Spack picks up that I have a Zen 2 processor and my system compiler is gcc@7.5.0, so it chooses %gcc@7.5.0 arch=linux-ubuntu18.04-zen.

Of course, that then underscores that we're still hacking HIP language support in rather than doing it properly. Spack still thinks that rocBLAS is being compiled as %gcc@7.5.0 arch=linux-ubuntu18.04-zen and doesn't recognize that we're actually using the llvm-amdgpu package to provide our compiler. Realistically, it will be a while before HIP compilers are ubiquitous, so ideally we would have proper support in Spack for using a toolchain that was built from source.

Additionally, AMD libraries should start using CMake's HIP language support (from CMake 3.21.3+), and Spack should add explicit support for setting HIPCXX in compilers.yaml (like CC, CXX and FC). The GPU architectures could then be set via CMAKE_HIP_ARCHITECTURES.

That's a rather large chunk of work. I suppose I should maybe see if I can find out if that's already been planned, or if there's someone I can talk to in order to make it happen. We shouldn't be hacking things together forever.

...

That was kind of a tangent, wasn't it? Back to your point. I would probably drop versions before 4.1.0. It takes a couple hours to do a basic build and test of rocBLAS. Doing that three or four times is feasible, but ten is a lot!

@cgmb
Copy link
Contributor Author

cgmb commented Nov 15, 2021

For what it's worth, I'm on spack develop (b16bfe4) and I can't get rocblas@4.0.0 to install anyway.

cgmb@localhost:~/spack$ spack install rocblas@4.0.0 
...
==> rocblas: Executing phase: 'cmake'
==> Error: TypeError: can only concatenate tuple (not "str") to tuple

/home/cgmb/spack/var/spack/repos/builtin/packages/rocblas/package.py:90, in get_gpulist_for_tensile_support:
         87        arch = self.spec.variants['tensile_architecture'].value
         88        if arch[0] == 'all':
         89            if self.spec.satisfies('@:4.0.0'):
  >>     90                arch_value = self.tensile_architecture[1:2] + 'gfx906,gfx908'
         91            elif self.spec.satisfies('@4.1.0:4.2.0'):
         92                arch_value = self.tensile_architecture[1:4]
         93            elif self.spec.satisfies('@4.3.0:'):

If I specify rocblas@4.0.0 tensile_architecture=gfx906:xnack- it fails with:

2 errors found in build log:
     220                                [--embed-library-key EMBEDLIBRARYKEY]
     221                                [--version VERSION] [--generate-manifest-and-exit]
     222                                [--library-format {yaml,msgpack}]
     223                                [--generate-sources-and-exit] [--jobs CPUTHREADS]
     224                                [--verbose PRINTLEVEL]
     225                                LogicPath OutputPath {OCL,HIP,HSA}
  >> 226    TensileCreateLibrary: error: argument --architecture: invalid choice: 'gfx906:xnack-' (choose from 'all', 'gfx000', 'gfx803', 'gfx900', 'gfx906', 'gfx908')
  >> 227    CMake Error at /tmp/cgmb/spack-stage/spack-stage-rocblas-4.0.0-r6sd4aliqvmujfvx3gxu2gynmylgw76p/spack-build-r6sd4al/virtualenv/lib/python3.8/site-packages/Tensile/Source/TensileCreateLibrary
            .cmake:115 (message):
     228      Error generating kernels
     229    Call Stack (most recent call first):
     230      library/src/CMakeLists.txt:55 (TensileCreateLibraryCmake)
     231    
     232    
     233    -- Configuring incomplete, errors occurred!

If I specify rocblas@4.0.0 tensile_architecture=gfx906, it fails with:

cgmb@localhost:~/spack$ spack install rocblas@4.0.0 tensile_architecture=gfx906       
==> Error: invalid values for variant "tensile_architecture" in package "rocblas": ['gfx906']

@srekolam
Copy link
Contributor

i will look into fixing the error with rocblas@4.0.0 in my next PR

@cgmb
Copy link
Contributor Author

cgmb commented Feb 6, 2022

I've rebased against develop and tested 4.5.2, 4.3.1, 4.2.0 and 4.1.0. Please take another look.

@cgmb cgmb requested a review from haampie February 10, 2022 17:47
@cgmb
Copy link
Contributor Author

cgmb commented Feb 10, 2022

@haampie, @srekolam, and @arjun-raj-kuppala could you please review?

@cgmb
Copy link
Contributor Author

cgmb commented Feb 13, 2022

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Feb 13, 2022

I've started that pipeline for you!

@cgmb
Copy link
Contributor Author

cgmb commented Feb 14, 2022

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Feb 14, 2022

I've started that pipeline for you!

@cgmb
Copy link
Contributor Author

cgmb commented Feb 14, 2022

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Feb 14, 2022

I've started that pipeline for you!

@alalazo
Copy link
Member

alalazo commented Feb 14, 2022

@spackbot run pipeline

@spackbot-app
Copy link

spackbot-app bot commented Feb 14, 2022

I've started that pipeline for you!

@cgmb
Copy link
Contributor Author

cgmb commented Feb 14, 2022

@haampie, all tests have passed. Can we get this merge train started?

@cgmb
Copy link
Contributor Author

cgmb commented Feb 23, 2022

@haampie?

@cgmb
Copy link
Contributor Author

cgmb commented Feb 28, 2022

@haampie, is there something you're waiting for?

@haampie haampie merged commit 2bd0168 into spack:develop Feb 28, 2022
@cgmb
Copy link
Contributor Author

cgmb commented Feb 28, 2022

🎉

@cgmb cgmb deleted the rocblas-test branch February 28, 2022 23:27
@haampie
Copy link
Member

haampie commented Mar 1, 2022

Sorry for the delay, too many github notifications on a daily basis 😬

bvanessen pushed a commit to bvanessen/spack that referenced this pull request Mar 2, 2022
* rocblas: add spack test support

* rocblas: limit spack test support to @4.2.0:

* rocblas: simplify define
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants