
netcdf: forward 'mpi' to 'hdf5' #4672

Closed
wants to merge 2 commits
Conversation

@alalazo (Member) commented Jul 4, 2017

Currently, when we ask Spack to build `netcdf~mpi`, what we get back after concretization is:

$ spack spec netcdf~mpi
Input spec
--------------------------------
netcdf~mpi

Normalized
--------------------------------
netcdf~mpi
    ^hdf5
        ^zlib@1.2.5:
    ^m4

Concretized
--------------------------------
netcdf@4.4.1.1%gcc@4.8~cdmremote~dap~hdf4 maxdims=1024 maxvars=8192 ~mpi~parallel-netcdf+shared arch=linux-ubuntu14-x86_64 
    ^hdf5@1.10.1%gcc@4.8+cxx~debug+fortran+mpi+pic+shared~szip~threadsafe arch=linux-ubuntu14-x86_64 
        ^openmpi@2.1.1%gcc@4.8~cuda fabrics= ~java schedulers= ~sqlite3~thread_multiple+vt arch=linux-ubuntu14-x86_64 
            ^hwloc@1.11.7%gcc@4.8~cuda+libxml2+pci arch=linux-ubuntu14-x86_64 
                ^libpciaccess@0.13.5%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^libtool@2.4.6%gcc@4.8 arch=linux-ubuntu14-x86_64 
                        ^m4@1.4.18%gcc@4.8+sigsegv arch=linux-ubuntu14-x86_64 
                            ^libsigsegv@2.11%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^pkg-config@0.29.2%gcc@4.8+internal_glib arch=linux-ubuntu14-x86_64 
                    ^util-macros@1.19.1%gcc@4.8 arch=linux-ubuntu14-x86_64 
                ^libxml2@2.9.4%gcc@4.8~python arch=linux-ubuntu14-x86_64 
                    ^xz@5.2.3%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^zlib@1.2.11%gcc@4.8+pic+shared arch=linux-ubuntu14-x86_64 

that is, a netcdf that depends on MPI through hdf5. This PR prevents that particular combination by forwarding the `mpi` variant to hdf5. It also adds a conflict check on `netcdf+parallel-netcdf~mpi`.
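Based on the PR description, the package-level change would look roughly like the following in Spack's package DSL. This is a sketch, not the exact diff: the base class, variant declaration, and defaults shown here are illustrative, and the real netcdf package declares many more versions and variants.

```python
class Netcdf(AutotoolsPackage):
    # Illustrative variant declaration; the real package may differ.
    variant('mpi', default=True, description='Enable parallel I/O via MPI')

    # Forward the variant: netcdf+mpi requires hdf5+mpi, and netcdf~mpi
    # requires hdf5~mpi, instead of always taking hdf5's own default.
    depends_on('hdf5+mpi', when='+mpi')
    depends_on('hdf5~mpi', when='~mpi')

    # parallel-netcdf requires MPI, so +parallel-netcdf~mpi is invalid.
    conflicts('+parallel-netcdf', when='~mpi')
```

The `depends_on(..., when=...)` pair is the standard Spack idiom for making a dependency's variant track the parent's, and `conflicts` rules out the contradictory combination at concretization time.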

…cdf'

Currently when we ask Spack to build `netcdf~mpi` what we get back after
concretization is `netcdf~mpi ^hdf5+mpi`. This commit prevents this
particular combination from being possible as it forwards the variant to hdf5.

It also adds a conflict check on `netcdf+parallel-netcdf~mpi`.
@alalazo added the bug and ready labels on Jul 4, 2017
@alalazo requested a review from @adamjstewart on Jul 4, 2017
@adamjstewart (Member)

I've tried this before and it didn't work very well, hence the workaround. Can't remember the exact issue, but it was due to a concretization bug. I believe it was caused by differences in the default MPI variant for each package.

@alalazo (Member, Author) commented Jul 4, 2017

@adamjstewart Maybe it was due to bugs that have been solved in the meantime? I tested it manually with spack spec and different variations, for instance:

$ spack spec netcdf+mpi ^hdf5~mpi
Input spec
--------------------------------
netcdf+mpi
    ^hdf5~mpi

Normalized
--------------------------------
==> Error: Invalid spec: 'hdf5~mpi'. Package hdf5 requires variant +mpi, but spec asked for ~mpi

$ spack spec netcdf+mpi ^hdf5+mpi ^mpich
Input spec
--------------------------------
netcdf+mpi
    ^hdf5+mpi
    ^mpich

Normalized
--------------------------------
netcdf+mpi
    ^hdf5+mpi
        ^mpich
        ^zlib@1.2.5:
    ^m4

Concretized
--------------------------------
netcdf@4.4.1.1%gcc@4.8~cdmremote~dap~hdf4 maxdims=1024 maxvars=8192 +mpi~parallel-netcdf+shared arch=linux-ubuntu14-x86_64 
    ^hdf5@1.10.1%gcc@4.8+cxx~debug+fortran+mpi+pic+shared~szip~threadsafe arch=linux-ubuntu14-x86_64 
        ^mpich@3.2%gcc@4.8+hydra+pmi+romio~verbs arch=linux-ubuntu14-x86_64 
        ^zlib@1.2.11%gcc@4.8+pic+shared arch=linux-ubuntu14-x86_64 
    ^m4@1.4.18%gcc@4.8+sigsegv arch=linux-ubuntu14-x86_64 
        ^libsigsegv@2.11%gcc@4.8 arch=linux-ubuntu14-x86_64 

$ spack spec netcdf~mpi ^hdf5+mpi ^mpich
Input spec
--------------------------------
netcdf~mpi
    ^hdf5+mpi
    ^mpich

Normalized
--------------------------------
==> Error: Invalid spec: 'hdf5+mpi'. Package hdf5 requires variant ~mpi, but spec asked for +mpi

but I couldn't make it fail.

@alalazo (Member, Author) commented Jul 7, 2017

@adamjstewart What I get with the last commit with respect to the cases in #1553:

$ spack spec netcdf ^hdf5@1.8.16
Input spec
--------------------------------
netcdf
    ^hdf5@1.8.16

Normalized
--------------------------------
netcdf
    ^hdf5@1.8.16
        ^zlib@1.2.5:
    ^m4

Concretized
--------------------------------
netcdf@4.4.1.1%gcc@4.8~cdmremote~dap~hdf4 maxdims=1024 maxvars=8192 +mpi~parallel-netcdf+shared arch=linux-ubuntu14-x86_64 
    ^hdf5@1.8.16%gcc@4.8+cxx~debug+fortran+mpi+pic+shared~szip~threadsafe arch=linux-ubuntu14-x86_64 
        ^openmpi@2.1.1%gcc@4.8~cuda fabrics= ~java schedulers= ~sqlite3~thread_multiple+vt arch=linux-ubuntu14-x86_64 
            ^hwloc@1.11.7%gcc@4.8~cuda+libxml2+pci arch=linux-ubuntu14-x86_64 
                ^libpciaccess@0.13.5%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^libtool@2.4.6%gcc@4.8 arch=linux-ubuntu14-x86_64 
                        ^m4@1.4.18%gcc@4.8+sigsegv arch=linux-ubuntu14-x86_64 
                            ^libsigsegv@2.11%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^pkg-config@0.29.2%gcc@4.8+internal_glib arch=linux-ubuntu14-x86_64 
                    ^util-macros@1.19.1%gcc@4.8 arch=linux-ubuntu14-x86_64 
                ^libxml2@2.9.4%gcc@4.8~python arch=linux-ubuntu14-x86_64 
                    ^xz@5.2.3%gcc@4.8 arch=linux-ubuntu14-x86_64 
                    ^zlib@1.2.11%gcc@4.8+pic+shared arch=linux-ubuntu14-x86_64 

$ spack spec netcdf+mpi ^mvapich2
Input spec
--------------------------------
netcdf+mpi
    ^mvapich2

Normalized
--------------------------------
netcdf+mpi
    ^hdf5+mpi
        ^mvapich2
            ^bison
                ^m4@1.4.6:
            ^libpciaccess
                ^libtool
                ^pkg-config@0.9.0:
                ^util-macros
        ^zlib@1.2.5:

Concretized
--------------------------------
netcdf@4.4.1.1%gcc@4.8~cdmremote~dap~hdf4 maxdims=1024 maxvars=8192 +mpi~parallel-netcdf+shared arch=linux-ubuntu14-x86_64 
    ^hdf5@1.10.1%gcc@4.8+cxx~debug+fortran+mpi+pic+shared~szip~threadsafe arch=linux-ubuntu14-x86_64 
        ^mvapich2@2.2%gcc@4.8 ch3_rank_bits=32 ~debug fabrics=psm process_managers= threads=multiple arch=linux-ubuntu14-x86_64 
            ^bison@3.0.4%gcc@4.8 arch=linux-ubuntu14-x86_64 
                ^m4@1.4.18%gcc@4.8+sigsegv arch=linux-ubuntu14-x86_64 
                    ^libsigsegv@2.11%gcc@4.8 arch=linux-ubuntu14-x86_64 
            ^libpciaccess@0.13.5%gcc@4.8 arch=linux-ubuntu14-x86_64 
                ^libtool@2.4.6%gcc@4.8 arch=linux-ubuntu14-x86_64 
                ^pkg-config@0.29.2%gcc@4.8+internal_glib arch=linux-ubuntu14-x86_64 
                ^util-macros@1.19.1%gcc@4.8 arch=linux-ubuntu14-x86_64 
        ^zlib@1.2.11%gcc@4.8+pic+shared arch=linux-ubuntu14-x86_64 

@adamjstewart (Member)

@alalazo This is identical to my original solution in #1553: 4d12c54

However, I ended up backing out that change in 28537ae because I discovered a problem. With the following packages.yaml:

packages:
  hdf5:
    variants: ~mpi

spack spec netcdf crashes. Also, if the default for +mpi doesn't match in any two packages that depend on each other, spack spec crashes for forwarded variants. This is the bug I was referring to, which hasn't yet been fixed. Variant forwarding has always been a nightmare, and I've seen zero work on the new concretizer to solve it.

@alalazo (Member, Author) commented Jul 7, 2017

@adamjstewart Hmm, what do you propose to do? The current state seems really broken (asking for netcdf~mpi and ending up with MPI through hdf5). I could put in a `conflicts('hdf5+mpi', when='~mpi')`, but that may be more annoying than helpful.

@adamjstewart (Member)

The conflict is basically the same solution I came up with in 28537ae, except that was before conflicts existed. I'm fine with conflicts, but it isn't a whole lot better than the error message you get from variant forwarding. We really do need the ability to add custom error messages to conflicts directives...

I'm also fine with variant forwarding, but I don't want to be the person answering all of the questions as to why it doesn't work. Variant forwarding is a better long-term solution, and @citibeth and I have proposed a couple means of doing it automatically. But again, I've seen no progress on that front in the last 2 years and I'm starting to lose hope that we'll ever see this magical promised concretizer 🙁

@alalazo (Member, Author) commented Jul 7, 2017

> I'm also fine with variant forwarding, but I don't want to be the person answering all of the questions as to why it doesn't work. Variant forwarding is a better long-term solution, and @citibeth and I have proposed a couple means of doing it automatically. But again, I've seen no progress on that front in the last 2 years and I'm starting to lose hope that we'll ever see this magical promised concretizer

I implemented a form of variant forwarding a year and a half ago in #391, but ended up closing it because it was not well received. If I remember correctly, the main issue was the 'local' nature of that implementation, where a package could forward variants only to its direct dependencies. Honestly, I would still advocate for that 😄

@citibeth (Member) commented Jul 7, 2017

> Currently when we ask Spack to build `netcdf~mpi` what we get back after concretization is:

This is not a problem. You asked for a working NetCDF without MPI capability, and that's what you got. If you want to

Putting in unnecessary forwarding constraints can break concretization in unpredictable ways. Suppose I have a package A that uses non-parallel NetCDF and also parallel HDF5. It would be quite natural to write in package.py:

depends_on('hdf5+mpi')
depends_on('netcdf~mpi')

This should concretize, but it won't with this PR.

If the problem is extra stuff in your DAG that you don't want... well, I do recommend looking at what Spack generates before building it, and removing any "surprises." I don't know of any other way; I don't think we can automagically eliminate "surprises" by adding ad-hoc variant forwarding in random locations. (Or more to the point, we will replace one set of solvable surprises with another set of surprises that are less tractable.)

@citibeth (Member) left a review comment

I recommend against merging this.

@citibeth (Member) commented Jul 7, 2017 via email

@alalazo (Member, Author) commented Jul 8, 2017

> A that uses non-parallel NetCDF and also parallel HDF5

Is this made up, or do you have in mind some real use-case?

@citibeth (Member)

> Is this made up, or do you have in mind some real use-case?

Suppose you're building a "Linux distro"; by which I mean, a set of interrelated packages in which there is exactly one installation of each package. This can be done by creating a top-level bundle package and listing each item you want in that package. This is the kind of thing we want to support in Spack Environments.

It's easy to imagine that we might want to include A->netcdf~mpi and B->hdf5+mpi in the same software distribution.

@citibeth (Member)

I experienced a new wrinkle on this one. Try: spack install gdal+netcdf. I got:

/tmp/rpfische/spack-stage/spack-stage-uw7q2n/gdal-2.1.2/libgdal.so: undefined reference to `ompi_mpi_cxx_op_intercept'
/tmp/rpfische/spack-stage/spack-stage-uw7q2n/gdal-2.1.2/libgdal.so: undefined reference to `MPI::Datatype::Free()'
/tmp/rpfische/spack-stage/spack-stage-uw7q2n/gdal-2.1.2/libgdal.so: undefined reference to `MPI::Comm::Comm()'
/tmp/rpfische/spack-stage/spack-stage-uw7q2n/gdal-2.1.2/libgdal.so: undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
make[1]: *** [gdalinfo] Error 1
make[1]: Leaving directory `/tmp/rpfische/spack-stage/spack-stage-uw7q2n/gdal-2.1.2/apps'
make: *** [apps-target] Error 2

Funny, since GDAL doesn't use MPI (directly). But it does use:

gdal -> netcdf -> hdf5 -> mpi

Apparently, header files included from HDF5 (via NetCDF) result in an OpenMPI dependency in the compiled GDAL code. The upshot, as far as I can tell, is: If A->...->mpi, then A might require the MPI compiler wrappers.

I have no idea what the right way is to approach this. For now, I'm building:

gdal+netcdf^netcdf~mpi^hdf5~mpi

I just want this to work, and I don't need to link to GDAL. But if I DID need to link to GDAL, I'd be up a creek now (because GDAL doesn't compile with MPI-enabled HDF5, and my stuff NEEDS MPI-enabled HDF5).

@alalazo (Member, Author) commented Jul 20, 2017

> It's easy to imagine that we might want to include A->netcdf~mpi and B->hdf5+mpi in the same software distribution.

That doesn't mean we want to have them in the same DAG for a single application/library (that's what a conflict expresses). If in an environment one application uses netcdf~mpi and another uses hdf5+mpi, then it's fine, because the two things will be concretized separately.

@citibeth (Member) commented Jul 20, 2017 via email

@adamjstewart (Member)

I guess the only way to address @citibeth's concern is to answer the following questions:

  1. Is it possible to build netcdf+mpi ^hdf5~mpi?
  2. Is it possible to build netcdf~mpi ^hdf5+mpi?

If the answer to both of these questions is "no", then @alalazo's PR does the correct thing. If the answer to either of these questions is "yes", then @citibeth is correct in that we shouldn't force variant forwarding if it isn't required. Sound good?

@alalazo (Member, Author) commented Aug 5, 2017

@adamjstewart I think there's also another question: do you expect to see mpi in the DAG of foo~mpi?

@adamjstewart (Member)

@alalazo I guess, when I run spack spec foo ~mpi, I expect to see ~mpi for every dependency, not just the top-level dependency. But I also want to be able to override that: spack spec foo ~mpi ^bar+mpi. We need a better concretizer that knows to disable MPI for all dependencies by default. Adding explicit depends_on constraints prevents users from overriding this.

@alalazo (Member, Author) commented Aug 5, 2017

@adamjstewart In my opinion the problem is semantic: if I am allowed to do something like foo ~mpi ^bar+mpi then I argue that in foo the variant should not be called mpi, but something else. This just to conform to the principle of least astonishment.

@citibeth (Member) commented Aug 6, 2017 via email

@becker33 (Member) commented Aug 7, 2017

@alalazo: A quick tangent on #391.

When #391 was opened @tgamblin and I were both of the opinion that forwarding should be an attribute of a variant, rather than an attribute of a dependency. That was the key issue at play.

I can't speak for Todd, but I am no longer certain of that opinion. I see potential problems with both styles. If we were to go with something like #391, I see the following challenges.

  1. In #391 it was impossible to forward a variant "across" a dependency that does not support that variant. For example, if foo depends on bar, which depends on baz, and foo and baz both support an mpi variant, it would be nice to be able to forward +mpi or ~mpi from foo to baz. In #391, this would throw an error when bar tried to interpret +mpi as a variant.

There are potential solutions to this, such as searching upwards beyond parents for forwarded variants that could apply. The behavior could become counterintuitive if we're not careful.

  2. We will potentially want to forward lots of variants along a lot of different dependencies. We might need some sort of syntactic sugar to keep package files readable. There are a lot of ways we could think about addressing this, but I'm not sure what the solutions would look like.

With that said, I'm not opposed to reopening #391 and trying to address these problems. I'm not sure we'll come up with a satisfactory solution, but I'm not opposed to that approach anymore.

@adamjstewart (Member)

> We might need some sort of syntactic sugar to make package files stay readable.

I definitely agree with this. This becomes quite cumbersome:

depends_on('hdf5')
depends_on('hdf5+mpi', when='+mpi')
depends_on('hdf5~mpi', when='~mpi')

It also makes it impossible to build netcdf~mpi ^hdf5+mpi. It would be much nicer if the above logic was implied by:

depends_on('hdf5')

That is, by default, netcdf+mpi adds a dependency on hdf5+mpi, while netcdf~mpi adds a dependency on hdf5~mpi. Then, if you want to explicitly override it, you can use netcdf~mpi ^hdf5+mpi. Is there any reason we can't design the concretizer around dependency forwarding? Right now it feels like we're designing packages around bugs in the concretizer...
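The default-forward-with-override rule proposed above can be sketched as a toy resolution function. This is an assumption about the proposed semantics, not Spack's actual concretizer, and the function name is made up for illustration:

```python
# Toy model of the proposed semantics (hypothetical helper, not a Spack API):
# a dependency inherits the parent's value for a shared variant unless the
# command-line spec pins the dependency's value explicitly.
def resolve_variant(parent_value, explicit_override=None):
    """Return the dependency's value for a shared boolean variant."""
    if explicit_override is not None:
        return explicit_override  # e.g. 'netcdf~mpi ^hdf5+mpi'
    return parent_value           # forwarded from the parent by default

# netcdf~mpi: hdf5 inherits ~mpi
assert resolve_variant(False) is False
# netcdf+mpi: hdf5 inherits +mpi
assert resolve_variant(True) is True
# netcdf~mpi ^hdf5+mpi: the explicit spec wins
assert resolve_variant(False, explicit_override=True) is True
```

The key design point is that forwarding becomes a soft default rather than a hard constraint, so the user-supplied spec can still break the tie.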

@alalazo (Member, Author) commented Aug 11, 2017

@becker33 For #391 the solutions I would have adopted to point 1 are either:

  • add the mpi variant to the middle layers (if it makes sense), and forward along the chain
  • make the dependency to which I want to forward the variant a direct dependency (i.e. add an edge to the DAG)

The rationale is that a package should only know about its direct dependencies, and should be able to forward a variant (or an expression involving a variant, maybe?) with a fine level of control, only to the dependencies it cares about.

For point 2 I agree completely. I wonder if with some meta-programming we could enable something like:

class Foo(Package):
    ...
    # Define variants 'mpi' and 'debug' here
    ...
    # Forward them to multiple dependencies
    with forwarded_variants('mpi', 'debug'):
        depends_on('a')
        depends_on('b')

to group similar requests together.
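A minimal stand-alone sketch of how such a `forwarded_variants` context manager could behave. Everything here is hypothetical: the names come from the proposal above, the list of `(spec, when)` pairs stands in for Spack's directive machinery, and real Spack directives are class-level and work differently.

```python
# Hypothetical sketch only: simulates a forwarded_variants context manager
# that expands each depends_on call into conditional dependencies.
from contextlib import contextmanager

deps = []  # collected (spec, when) pairs, standing in for Spack's directives

def depends_on(spec, when=None):
    deps.append((spec, when))

@contextmanager
def forwarded_variants(*variants):
    # Temporarily swap depends_on for a version that also records the
    # +v/~v conditional edges for every forwarded variant.
    global depends_on
    plain = depends_on
    def forwarding(spec, when=None):
        plain(spec, when)
        for v in variants:
            plain(f'{spec}+{v}', when=f'+{v}')
            plain(f'{spec}~{v}', when=f'~{v}')
    depends_on = forwarding
    try:
        yield
    finally:
        depends_on = plain

with forwarded_variants('mpi'):
    depends_on('a')

# depends_on('a') expanded into the unconditional edge plus both
# conditional forwarded edges.
assert deps == [('a', None), ('a+mpi', '+mpi'), ('a~mpi', '~mpi')]
```

This expansion mirrors the three-line `depends_on` idiom discussed earlier in the thread, just generated mechanically instead of written by hand.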

@adamjstewart With #391 this:

depends_on('hdf5')
depends_on('hdf5+mpi', when='+mpi')
depends_on('hdf5~mpi', when='~mpi')

would have become:

depends_on('hdf5', forward='mpi')

@alalazo (Member, Author) commented Aug 11, 2017

@becker33 I forgot: I would be keen on reopening #391 if we find a way to solve the two issues you mentioned, and if @tgamblin agrees. I was also wondering how that would play with the ongoing refactoring of the concretizer.

@adamjstewart (Member)

I still vote for forwarding all variants to all dependencies by default, with the ability to override them in the spec if you need to. We can handle the work once in the concretizer instead of in every single package file.

@alalazo (Member, Author) commented Aug 11, 2017

@adamjstewart At a first glance, with what you propose I see no clear way to use an expression involving a variant selectively. Something like:

class Foo(Package):
    variant('openmp', ...)
    ...
    depends_on('lapack', forward=negate('openmp'))
    ...

to mean that I want to forward to lapack the opposite of openmp in this package - if I activate openmp here I deactivate it in the dependency, and vice-versa.

@adamjstewart (Member)

@alalazo I didn't think about that case, but perhaps:

depends_on('foo')
depends_on('foo+mpi', when='~mpi')
depends_on('foo~mpi', when='+mpi')

would work in the rare case where this inversion is needed? (I have never looked at the concretizer, so I have no idea how much work any of this would be.)

@citibeth (Member) commented Aug 11, 2017 via email

@adamjstewart (Member)

> Really... what is wrong with setting all: +mpi? Would that not solve the problems in this thread?

packages.yaml is great if you only need to build a single configuration of every package, but many people (myself included) need to build an entire stack of software with and without MPI (with 3 different MPIs, 4 different compilers, etc.).

> As an aside, I think that commonly-used variants should default the same way for all packages that use them.

There are pros and cons of this. The only reason I enforced all packages using +mpi and ~X is because our concretization was so broken. It would be nice if it could be up to the package to define which options are most commonly used. But it's also nice not to have to worry about what the default is, although you wouldn't know there was an mpi variant unless you ran spack info, which shows the default.

@citibeth (Member) commented Aug 11, 2017 via email

@alalazo closed this Nov 7, 2017
@alalazo deleted the fixes/netcdf branch Nov 7, 2017