Faster database loading, faster in-memory hashing #1535

Merged
tgamblin merged 5 commits into develop from bugfix/faster-install-db-gh1521 on Sep 1, 2016

Conversation

tgamblin
Member

@tgamblin tgamblin commented Aug 16, 2016

Fixes #1521: slow CLI with many installations.

This reduces a 40-second spack find invocation to 1 second on my machine, and it should scale much better to large numbers of specs than the previous implementation. I believe the bottleneck is now reading the YAML file (which should remain pretty fast), not processing the specs.

Steps taken:

  • Implemented more aggressive hash caching so that we don't re-hash specs repeatedly when we look things up in the database.
  • Implemented better load logic in the database:
    • database is now a proper Merkle DAG -- fewer redundant specs.
    • construction algorithm is much faster and makes three passes instead of redundant descent into dependencies.
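The three-pass construction described above can be sketched roughly as follows. This is a minimal illustration, not the code in database.py: the record layout (a `name` plus a list of dependency hashes), the `Node` class, and the `load_dag` function are hypothetical, and the split into passes (create nodes, link dependencies, mark concrete) is one plausible reading of the description.

```python
class Node:
    """Hypothetical stand-in for a spec; not Spack's Spec class."""

    def __init__(self, name):
        self.name = name
        self.deps = {}        # dependency name -> Node
        self.concrete = False


def load_dag(records):
    """Build a shared DAG from {hash: {"name": ..., "deps": [hash, ...]}}."""
    # Pass 1: create one node per hash, with no dependencies attached.
    nodes = {h: Node(rec["name"]) for h, rec in records.items()}

    # Pass 2: link dependencies by hash lookup, so each dependency
    # object is shared rather than rebuilt for every dependent.
    for h, rec in records.items():
        for dep_hash in rec["deps"]:
            dep = nodes[dep_hash]
            nodes[h].deps[dep.name] = dep

    # Pass 3: mark every node concrete once the graph is complete.
    for node in nodes.values():
        node.concrete = True

    return nodes


records = {
    "aaa": {"name": "zlib", "deps": []},
    "bbb": {"name": "libpng", "deps": ["aaa"]},
    "ccc": {"name": "cairo", "deps": ["aaa", "bbb"]},
}
dag = load_dag(records)
```

In this sketch each record and each dependency edge is visited a fixed number of times, and a dependency such as zlib exists as a single object no matter how many packages point at it.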

@adamjstewart @sknigh @davydden @alalazo @citibeth care to try this out?

@tgamblin tgamblin changed the title Bugfix/faster install db gh1521 Faster database loading, faster in-memory hashing Aug 16, 2016
@citibeth
Member

This sounds like great stuff. Not sure when I'll be able to try it out.
But definitely... on my next Spack update, I'll merge in this branch and
give it a whirl!

On Tue, Aug 16, 2016 at 4:22 PM, Todd Gamblin notifications@github.com
wrote:

Commit Summary

  • Specs now cache result of "fast" in-memory hash.
  • Faster database loading.
  • Add option to copy only certain deptypes to Spec.copy()
  • Update tests to reflect new in-memory hashing vs. coarser dag_hash.

File Changes

  • M lib/spack/spack/database.py (54)
  • M lib/spack/spack/spec.py (70)
  • M lib/spack/spack/test/directory_layout.py (25)


@sknigh
Contributor

sknigh commented Aug 16, 2016

==> 251 installed packages.
-- linux-centos7-x86_64 / gcc@4.8.5 -----------------------------
R@3.3.0         flex@2.6.0          jdk@8u92-linux-x64    libsigsegv@2.10         openblas@0.2.18          r-R6@2.1.2          r-dbi@0.4-1          r-git2r@0.15.0      r-labeling@0.3        r-mime@0.5          r-png@0.1-7            r-rodbc@1.3-13          r-thdata@1.0-7          readline@6.3
autoconf@2.69   fontconfig@2.11.1   jemalloc@4.1.0        libtiff@4.0.3           openmpi@2.0.0            r-abind@1.4-3       r-devtools@1.11.1    r-glmnet@2.0-5      r-lattice@0.20-33     r-minqa@1.2.4       r-praise@1.0.0         r-roxygen2@5.0.1        r-threejs@0.2.2         ruby@2.2.0
automake@1.15   freetype@2.5.3      jpeg@9b               libtool@2.4.6           openmpi@2.0.0            r-assertthat@0.1    r-diagrammer@0.8.4   r-googlevis@0.6.0   r-lazyeval@0.2.0      r-multcomp@1.4-6    r-proto@0.3-10         r-rpostgresql@0.4-1     r-tibble@1.1            sqlite@3.8.5
binutils@2.25   gcc@6.1.0           lcms@2.6              libxau@1.0.8            openssl@1.0.2h           r-base64enc@0.1-3   r-dichromat@2.0-0    r-gridbase@0.4-7    r-leaflet@1.0.1       r-munsell@0.4.3     r-quantmod@0.4-5       r-rsqlite@1.0.0         r-tidyr@0.5.1           superlu-dist@4.3
bison@3.0.4     gettext@0.19.8.1    libaio@0.3.110-1      libxcb@1.11.1           pango@1.40.1             r-bh@1.60.0-2       r-digest@0.6.9       r-gridextra@2.2.1   r-lme4@1.1-12         r-mvtnorm@1.0-5     r-quantreg@5.26        r-rstan@2.10.1          r-ttr@0.23-1            swig@3.0.10
boost@1.61.0    git@2.8.1           libcerf@1.3           libxml2@2.9.2           parallel@20160422        r-boot@1.3-18       r-doparallel@1.0.10  r-gtable@0.2.0      r-lmtest@0.9-34       r-ncdf4@1.15        r-randomforest@4.6-12  r-rstudioapi@0.6        r-vcd@1.4-1             tar@1.29
boost@1.61.0    glib@2.42.1         libdwarf@20160507     llvm@3.8.0              parmetis@4.0.3           r-brew@1.0-6        r-dplyr@0.5.0        r-gtools@3.5.0      r-lubridate@1.5.6     r-networkd3@0.2.12  r-raster@2.5-8         r-sandwich@2.3-4        r-visnetwork@1.0.1      tcl@8.6.5
bzip2@1.0.6     gmp@6.1.1           libedit@3.1           lua@5.3.2               pcre@8.38                r-car@2.1-2         r-dt@0.1             r-htmltools@0.3.5   r-magic@1.5-6         r-nlme@3.1-128      r-rcolorbrewer@1.1-2   r-scales@0.4.0          r-whisker@0.3-2         tk@8.6.5
cairo@1.14.0    go@1.6.2            libelf@0.8.13         m4@1.4.17               perl@5.24.0              r-caret@6.0-70      r-dygraphs@0.9       r-htmlwidgets@0.6   r-magrittr@1.5        r-nloptr@1.0.4      r-rcpp@0.12.6          r-shiny@0.13.2          r-withr@1.0.1           tmux@2.2
cmake@2.8.10.2  go-bootstrap@1.4.2  libevent@2.0.21       mariadb@10.1.14         petsc@3.6.4              r-chron@2.3-47      r-e1071@1.6-7        r-httpuv@1.3.3      r-mapproj@1.2-4       r-nmf@0.20.6        r-rcppeigen@0.3.2.8.1  r-sp@1.2-3              r-xlconnect@0.2-12      unixodbc@2.3.4
cmake@3.0.2     harfbuzz@0.9.37     libffi@3.2.1          metis@5.1.0             pixman@0.32.6            r-class@7.3-14      r-filehash@2.3       r-httr@1.1.0        r-maps@3.1.1          r-nnet@7.3-12       r-registry@0.3         r-sparsem@1.7           r-xlconnectjars@0.2-12  valgrind@3.11.0
cmake@3.5.2     hdf5@1.10.0-patch1  libhio@1.3.0.1        mpc@1.0.3               pkg-config@0.29.1        r-cluster@2.0.4     r-foreach@1.4.3      r-igraph@1.0.1      r-maptools@0.8-39     r-np@0.60-2         r-reshape2@1.4.1       r-stanheaders@2.10.0-2  r-xlsx@0.5.7            xcb-proto@1.11
curl@7.50.1     hdf5@1.10.0-patch1  libjson-c@0.11        mpfr@3.1.4              postgresql@9.5.3         r-codetools@0.2-14  r-foreign@0.8-66     r-influencer@0.1.0  r-markdown@0.7.7      r-openssl@0.9.4     r-rgooglemaps@1.2.0.7  r-stringi@1.1.1         r-xlsxjars@0.6.1        xorg-util-macros@1.19.0
datamash@1.1.0  hwloc@1.11.3        libmng@2.0.2          mumps@5.0.1             protobuf@2.5.0           r-colorspace@1.2-6  r-gdata@2.17.0       r-inline@0.3.14     r-mass@7.3-45         r-packrat@0.4.7-1   r-rjava@0.9-8          r-stringr@1.0.0         r-xml@3.98-1            xproto@7.0.29
dbus@1.11.2     hwloc@1.11.3        libpciaccess@0.13.4   ncurses@6.0             python@2.7.11            r-crayon@1.3.2      r-geosphere@1.5-5    r-irlba@2.0.0       r-matrix@1.2-6        r-pbkrtest@0.4-6    r-rjson@0.2.15         r-survey@3.30-3         r-xtable@1.8-2          xz@5.2.2
expat@2.1.0     hypre@2.10.1        libpciaccess@0.13.4   netcdf@4.4.1            python@2.7.12            r-cubature@1.1-2    r-ggmap@2.6.1        r-iterators@1.0.8   r-matrixmodels@0.4-1  r-pkgmaker@0.22     r-rjsonio@1.3-0        r-survival@2.39-5       r-xts@0.9-7             zlib@1.2.8
fftw@3.3.4      icu@54.1            libpng@1.6.16         netlib-scalapack@2.0.2  python@3.5.2             r-curl@1.0          r-ggplot2@2.1.0      r-jpeg@0.1-8        r-memoise@1.0.0       r-plotrix@3.6-3     r-rmysql@0.10.9        r-tarifx@1.0.6          r-yaml@2.1.13           zoltan@3.83
flex@2.6.0      isl@0.14            libpthread-stubs@0.3  netlib-scalapack@2.0.2  r-BiocGenerics@bioc-3.3  r-datatable@1.9.6   r-ggvis@0.4.2        r-jsonlite@1.0      r-mgcv@1.8-13         r-plyr@1.8.4        r-rngtools@1.2.4       r-testthat@1.0.2        r-zoo@1.7-13

real    0m1.952s
user    0m1.769s
sys     0m0.181s

This will be really helpful.

@tgamblin tgamblin force-pushed the bugfix/faster-install-db-gh1521 branch from 735f5cb to 7a4ed4c Compare August 16, 2016 20:32
@tgamblin
Member Author

I updated with flake8-clean commits.

@sknigh: Glad this works for you! I would be curious to see the top few lines of spack -p find to see where it's spending time.
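A profile ordered by internal time can be produced for any Python code with the standard library's cProfile and pstats modules. The sketch below is generic: `parse_records` is a made-up workload, and nothing here is taken from how spack -p is implemented.

```python
import cProfile
import io
import pstats


def parse_records(n):
    """Hypothetical workload standing in for loading a database."""
    return [str(i).zfill(8) for i in range(n)]


profiler = cProfile.Profile()
profiler.enable()
parse_records(50000)
profiler.disable()

# Sort by time spent inside each function itself ("tottime"),
# and print only the top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by `tottime` rather than `cumtime` highlights the functions doing the work themselves instead of the callers that merely wait on them.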

@sknigh
Contributor

sknigh commented Aug 16, 2016

[vagrant@cent7 spack]$ time spack -p find
==> 251 installed packages.
-- linux-centos7-x86_64 / gcc@4.8.5 -----------------------------
R@3.3.0         flex@2.6.0          jdk@8u92-linux-x64    libsigsegv@2.10         openblas@0.2.18          r-R6@2.1.2          r-dbi@0.4-1          r-git2r@0.15.0      r-labeling@0.3        r-mime@0.5          r-png@0.1-7            r-rodbc@1.3-13          r-thdata@1.0-7          readline@6.3
autoconf@2.69   fontconfig@2.11.1   jemalloc@4.1.0        libtiff@4.0.3           openmpi@2.0.0            r-abind@1.4-3       r-devtools@1.11.1    r-glmnet@2.0-5      r-lattice@0.20-33     r-minqa@1.2.4       r-praise@1.0.0         r-roxygen2@5.0.1        r-threejs@0.2.2         ruby@2.2.0
automake@1.15   freetype@2.5.3      jpeg@9b               libtool@2.4.6           openmpi@2.0.0            r-assertthat@0.1    r-diagrammer@0.8.4   r-googlevis@0.6.0   r-lazyeval@0.2.0      r-multcomp@1.4-6    r-proto@0.3-10         r-rpostgresql@0.4-1     r-tibble@1.1            sqlite@3.8.5
binutils@2.25   gcc@6.1.0           lcms@2.6              libxau@1.0.8            openssl@1.0.2h           r-base64enc@0.1-3   r-dichromat@2.0-0    r-gridbase@0.4-7    r-leaflet@1.0.1       r-munsell@0.4.3     r-quantmod@0.4-5       r-rsqlite@1.0.0         r-tidyr@0.5.1           superlu-dist@4.3
bison@3.0.4     gettext@0.19.8.1    libaio@0.3.110-1      libxcb@1.11.1           pango@1.40.1             r-bh@1.60.0-2       r-digest@0.6.9       r-gridextra@2.2.1   r-lme4@1.1-12         r-mvtnorm@1.0-5     r-quantreg@5.26        r-rstan@2.10.1          r-ttr@0.23-1            swig@3.0.10
boost@1.61.0    git@2.8.1           libcerf@1.3           libxml2@2.9.2           parallel@20160422        r-boot@1.3-18       r-doparallel@1.0.10  r-gtable@0.2.0      r-lmtest@0.9-34       r-ncdf4@1.15        r-randomforest@4.6-12  r-rstudioapi@0.6        r-vcd@1.4-1             tar@1.29
boost@1.61.0    glib@2.42.1         libdwarf@20160507     llvm@3.8.0              parmetis@4.0.3           r-brew@1.0-6        r-dplyr@0.5.0        r-gtools@3.5.0      r-lubridate@1.5.6     r-networkd3@0.2.12  r-raster@2.5-8         r-sandwich@2.3-4        r-visnetwork@1.0.1      tcl@8.6.5
bzip2@1.0.6     gmp@6.1.1           libedit@3.1           lua@5.3.2               pcre@8.38                r-car@2.1-2         r-dt@0.1             r-htmltools@0.3.5   r-magic@1.5-6         r-nlme@3.1-128      r-rcolorbrewer@1.1-2   r-scales@0.4.0          r-whisker@0.3-2         tk@8.6.5
cairo@1.14.0    go@1.6.2            libelf@0.8.13         m4@1.4.17               perl@5.24.0              r-caret@6.0-70      r-dygraphs@0.9       r-htmlwidgets@0.6   r-magrittr@1.5        r-nloptr@1.0.4      r-rcpp@0.12.6          r-shiny@0.13.2          r-withr@1.0.1           tmux@2.2
cmake@2.8.10.2  go-bootstrap@1.4.2  libevent@2.0.21       mariadb@10.1.14         petsc@3.6.4              r-chron@2.3-47      r-e1071@1.6-7        r-httpuv@1.3.3      r-mapproj@1.2-4       r-nmf@0.20.6        r-rcppeigen@0.3.2.8.1  r-sp@1.2-3              r-xlconnect@0.2-12      unixodbc@2.3.4
cmake@3.0.2     harfbuzz@0.9.37     libffi@3.2.1          metis@5.1.0             pixman@0.32.6            r-class@7.3-14      r-filehash@2.3       r-httr@1.1.0        r-maps@3.1.1          r-nnet@7.3-12       r-registry@0.3         r-sparsem@1.7           r-xlconnectjars@0.2-12  valgrind@3.11.0
cmake@3.5.2     hdf5@1.10.0-patch1  libhio@1.3.0.1        mpc@1.0.3               pkg-config@0.29.1        r-cluster@2.0.4     r-foreach@1.4.3      r-igraph@1.0.1      r-maptools@0.8-39     r-np@0.60-2         r-reshape2@1.4.1       r-stanheaders@2.10.0-2  r-xlsx@0.5.7            xcb-proto@1.11
curl@7.50.1     hdf5@1.10.0-patch1  libjson-c@0.11        mpfr@3.1.4              postgresql@9.5.3         r-codetools@0.2-14  r-foreign@0.8-66     r-influencer@0.1.0  r-markdown@0.7.7      r-openssl@0.9.4     r-rgooglemaps@1.2.0.7  r-stringi@1.1.1         r-xlsxjars@0.6.1        xorg-util-macros@1.19.0
datamash@1.1.0  hwloc@1.11.3        libmng@2.0.2          mumps@5.0.1             protobuf@2.5.0           r-colorspace@1.2-6  r-gdata@2.17.0       r-inline@0.3.14     r-mass@7.3-45         r-packrat@0.4.7-1   r-rjava@0.9-8          r-stringr@1.0.0         r-xml@3.98-1            xproto@7.0.29
dbus@1.11.2     hwloc@1.11.3        libpciaccess@0.13.4   ncurses@6.0             python@2.7.11            r-crayon@1.3.2      r-geosphere@1.5-5    r-irlba@2.0.0       r-matrix@1.2-6        r-pbkrtest@0.4-6    r-rjson@0.2.15         r-survey@3.30-3         r-xtable@1.8-2          xz@5.2.2
expat@2.1.0     hypre@2.10.1        libpciaccess@0.13.4   netcdf@4.4.1            python@2.7.12            r-cubature@1.1-2    r-ggmap@2.6.1        r-iterators@1.0.8   r-matrixmodels@0.4-1  r-pkgmaker@0.22     r-rjsonio@1.3-0        r-survival@2.39-5       r-xts@0.9-7             zlib@1.2.8
fftw@3.3.4      icu@54.1            libpng@1.6.16         netlib-scalapack@2.0.2  python@3.5.2             r-curl@1.0          r-ggplot2@2.1.0      r-jpeg@0.1-8        r-memoise@1.0.0       r-plotrix@3.6-3     r-rmysql@0.10.9        r-tarifx@1.0.6          r-yaml@2.1.13           zoltan@3.83
flex@2.6.0      isl@0.14            libpthread-stubs@0.3  netlib-scalapack@2.0.2  r-BiocGenerics@bioc-3.3  r-datatable@1.9.6   r-ggvis@0.4.2        r-jsonlite@1.0      r-mgcv@1.8-13         r-plyr@1.8.4        r-rngtools@1.2.4       r-testthat@1.0.2        r-zoo@1.7-13
         3637244 function calls (3570515 primitive calls) in 2.539 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   154237    0.203    0.000    0.221    0.000 reader.py:98(forward)
   220370    0.167    0.000    0.333    0.000 scanner.py:142(need_more_tokens)
   135772    0.138    0.000    1.532    0.000 scanner.py:113(check_token)
    12682    0.130    0.000    0.513    0.000 scanner.py:1276(scan_plain)
   487770    0.120    0.000    0.124    0.000 reader.py:86(peek)
   228319    0.106    0.000    0.132    0.000 scanner.py:276(stale_possible_simple_keys)
    61294    0.087    0.000    0.087    0.000 error.py:6(__init__)
    17145    0.085    0.000    0.429    0.000 parser.py:273(parse_node)
    24903    0.079    0.000    1.114    0.000 scanner.py:153(fetch_more_tokens)
    12682    0.078    0.000    0.195    0.000 scanner.py:1323(scan_plain_spaces)
    24903    0.073    0.000    0.209    0.000 scanner.py:753(scan_to_next_token)
304113/304112    0.064    0.000    0.064    0.000 {isinstance}
    54633    0.056    0.000    1.949    0.000 parser.py:94(check_event)
    61294    0.054    0.000    0.141    0.000 reader.py:113(get_mark)
  17145/2    0.053    0.000    2.145    1.073 composer.py:63(compose_node)
   203416    0.051    0.000    0.051    0.000 scanner.py:261(next_possible_simple_key)
29144/5828    0.050    0.000    0.105    0.000 spec.py:771(traverse_with_deptype)
     7976    0.043    0.000    0.137    0.000 scanner.py:546(fetch_value)
    37525    0.034    0.000    0.102    0.000 scanner.py:132(get_token)
    17145    0.030    0.000    0.066    0.000 constructor.py:55(construct_object)
    12720    0.029    0.000    0.083    0.000 composer.py:88(compose_scalar_node)
    39275    0.029    0.000    0.034    0.000 reader.py:93(prefix)
   2322/2    0.028    0.000    2.145    1.073 composer.py:117(compose_mapping_node)
    17145    0.028    0.000    0.042    0.000 resolver.py:140(resolve)
    10298    0.027    0.000    0.751    0.000 parser.py:427(parse_block_mapping_key)
    14227    0.027    0.000    0.054    0.000 scanner.py:292(save_possible_simple_key)
    34069    0.027    0.000    0.054    0.000 scanner.py:1431(scan_line_break)
   228319    0.025    0.000    0.025    0.000 {method 'keys' of 'dict' objects}

 ...

real    0m2.896s
user    0m2.562s
sys     0m0.329s

@adamjstewart
Member

adamjstewart commented Aug 16, 2016

I'll try to test this out later. @sknigh Another thing worth testing is how long spack activate takes on particularly complicated Python packages. The most complicated Python package that I've had to install was py-sncosmo. With around 500 other packages in Spack, spack activate py-sncosmo took me over 2 minutes back in the day. spack deactivate py-sncosmo was around the same.

@tgamblin
Member Author

@sknigh: thanks! Looks like at this point, it's spending most of the time in YAML routines. We could optimize that later if needed by having Spack install the C yaml parser and load it when it runs, but I hope this will do for now.

@sknigh
Contributor

sknigh commented Aug 16, 2016

@adamjstewart I haven't been using activate, nor have I been using python modules with spack. I can see that load and unload are performing much better.

@sknigh
Contributor

sknigh commented Aug 16, 2016

@adamjstewart I tried py-nose. Activate looks faster too

[vagrant@cent7 spack]$ time spack activate py-nose
==> Activated extension py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64-yrqxj2i for python@2.7.12~tk~ucs4%gcc@4.8.5

real    0m22.185s
user    0m21.306s
sys     0m0.871s

[vagrant@cent7 spack]$ time spack deactivate py-nose
==> Deactivated extension py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64-yrqxj2i for python@2.7.12~tk~ucs4%gcc@4.8.5

real    0m22.219s
user    0m21.484s
sys     0m0.729s

[vagrant@cent7 spack]$ git checkout develop
Switched to branch 'develop'
[vagrant@cent7 spack]$ time spack activate py-nose
==> Activated extension py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64-yrqxj2i for python@2.7.12~tk~ucs4%gcc@4.8.5

real    1m53.157s
user    1m41.255s
sys     0m11.810s

@adamjstewart
Member

Now that's an improvement!

@sknigh
Contributor

sknigh commented Aug 16, 2016

Not sure what to make of this. Perhaps it should be its own issue?

[vagrant@cent7 spack]$ spack reindex
[vagrant@cent7 spack]$ spack install py-sncosmo
==> Installing py-sncosmo
==> Installing py-scipy
==> python is already installed in /home/vagrant/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/python-2.7.12-za2y6a2hi4h364ohxob4jfuxt3f3cn4l
==> Error: Specs py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64^bzip2@1.0.6%gcc@4.8.5 arch=linux-centos7-x86_64^ncurses@6.0%gcc@4.8.5 arch=linux-centos7-x86_64^openssl@1.0.2h%gcc@4.8.5 arch=linux-centos7-x86_64^py-setuptools@20.7.0%gcc@4.8.5 arch=linux-centos7-x86_64^python@2.7.12%gcc@4.8.5~tk~ucs4 arch=linux-centos7-x86_64^readline@6.3%gcc@4.8.5 arch=linux-centos7-x86_64^sqlite@3.8.5%gcc@4.8.5 arch=linux-centos7-x86_64^zlib@1.2.8%gcc@4.8.5 arch=linux-centos7-x86_64 
and py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64^bzip2@1.0.6%gcc@4.8.5 arch=linux-centos7-x86_64^ncurses@6.0%gcc@4.8.5 arch=linux-centos7-x86_64^openssl@1.0.2h%gcc@4.8.5 arch=linux-centos7-x86_64^python@2.7.12%gcc@4.8.5~tk~ucs4 arch=linux-centos7-x86_64^readline@6.3%gcc@4.8.5 arch=linux-centos7-x86_64^sqlite@3.8.5%gcc@4.8.5 arch=linux-centos7-x86_64^zlib@1.2.8%gcc@4.8.5 arch=linux-centos7-x86_64 have the same SHA-1 prefix!

@adamjstewart
Member

Yeah, I see issues like that all the time.

@adamjstewart
Member

I'm wondering if explicitly installed packages are being hashed differently than packages that are installed as dependencies?

@adamjstewart
Member

To get around it for now, uninstall py-nose and install py-sncosmo.

@tgamblin
Member Author

I'm wondering if explicitly installed packages are being hashed differently than packages that are installed as dependencies?

@adamjstewart: they shouldn't be -- that's only recorded in the DB, not in spec.yaml.

@tgamblin
Member Author

Not sure what to make of this. Perhaps it should be its own issue?

@sknigh: any chance you can get into a state that reproduces the error above, and attach the result of this:

cd <spack root>
tar czf specs.tar.gz opt/spack/*/*/*/.spack/spec.yaml

Then I can try to reproduce this.

@sknigh
Contributor

sknigh commented Aug 16, 2016

[vagrant@cent7 spack]$ spack find py-scipy
==> 1 installed packages.
-- linux-centos7-x86_64 / gcc@4.8.5 -----------------------------
py-scipy@0.17.0
[vagrant@cent7 spack]$ spack reindex
[vagrant@cent7 spack]$ spack install py-sncosmo
==> Installing py-sncosmo
==> Error: Specs py-scipy@0.17.0%gcc@4.8.5 arch=linux-centos7-x86_64^bzip2@1.0.6%gcc@4.8.5 arch=linux-centos7-x86_64^ncurses@6.0%gcc@4.8.5 arch=linux-centos7-x86_64^openblas@0.2.18%gcc@4.8.5+fpic~openmp+shared arch=linux-centos7-x86_64^openssl@1.0.2h%gcc@4.8.5 arch=linux-centos7-x86_64^py-nose@1.3.7%gcc@4.8.5 arch=linux-centos7-x86_64^py-numpy@1.11.0%gcc@4.8.5+blas+lapack arch=linux-centos7-x86_64^py-setuptools@20.7.0%gcc@4.8.5 arch=linux-centos7-x86_64^python@2.7.12%gcc@4.8.5~tk~ucs4 arch=linux-centos7-x86_64^readline@6.3%gcc@4.8.5 arch=linux-centos7-x86_64^sqlite@3.8.5%gcc@4.8.5 arch=linux-centos7-x86_64^zlib@1.2.8%gcc@4.8.5 arch=linux-centos7-x86_64 and py-scipy@0.17.0%gcc@4.8.5 arch=linux-centos7-x86_64^bzip2@1.0.6%gcc@4.8.5 arch=linux-centos7-x86_64^ncurses@6.0%gcc@4.8.5 arch=linux-centos7-x86_64^openblas@0.2.18%gcc@4.8.5+fpic~openmp+shared arch=linux-centos7-x86_64^openssl@1.0.2h%gcc@4.8.5 arch=linux-centos7-x86_64^py-numpy@1.11.0%gcc@4.8.5+blas+lapack arch=linux-centos7-x86_64^python@2.7.12%gcc@4.8.5~tk~ucs4 arch=linux-centos7-x86_64^readline@6.3%gcc@4.8.5 arch=linux-centos7-x86_64^sqlite@3.8.5%gcc@4.8.5 arch=linux-centos7-x86_64^zlib@1.2.8%gcc@4.8.5 arch=linux-centos7-x86_64 have the same SHA-1 prefix!

specs.tar.gz

@adamjstewart
Member

Is it a build dep problem?

@sknigh
Contributor

sknigh commented Aug 17, 2016

Simpler example that only uses about 10 packages.
spack install glib
spack install R
-> R prints SHA-1 error when it tries to resolve glib and quits.

@adamjstewart I don't know how spack generates hashes. One possibility is that the input string is getting truncated down to the package name, causing a collision. It should be virtually impossible for two concrete specs to collide.

spack-SHA-collision.out.zip
specs.tar.gz
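One way such a collision could arise, sketched purely under assumptions: if a hash is computed only over link and run dependencies, two specs that differ in a single dependency (as the two py-nose specs in the first error differ only in py-setuptools) would hash identically whenever that dependency is build-only. The spec representation, the `spec_hash` helper, and the 7-character prefix below are hypothetical and are not Spack's implementation.

```python
import hashlib


def spec_hash(name, deps, deptypes=("build", "link", "run"), length=7):
    """Hash a toy spec: `deps` maps dependency name -> dependency type.

    Only dependencies whose type is in `deptypes` contribute to the hash.
    """
    kept = sorted(d for d, t in deps.items() if t in deptypes)
    text = name + "^" + "^".join(kept)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


with_build_dep = {"python": "run", "py-setuptools": "build"}
without_build_dep = {"python": "run"}

# Hashing every dependency type sees two different input strings ...
full_a = spec_hash("py-nose", with_build_dep)
full_b = spec_hash("py-nose", without_build_dep)

# ... but ignoring build dependencies feeds the same string to SHA-1.
coarse_a = spec_hash("py-nose", with_build_dep, deptypes=("link", "run"))
coarse_b = spec_hash("py-nose", without_build_dep, deptypes=("link", "run"))

print(full_a == full_b, coarse_a == coarse_b)
```

Under that assumption the collision is not a weakness of SHA-1 or of prefix truncation: the two specs simply produce the same hash input.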

@tgamblin tgamblin changed the title Faster database loading, faster in-memory hashing [WIP] Faster database loading, faster in-memory hashing Aug 18, 2016
- Hash causes major slowdown for reading/setting up large DBs
- New version caches hash for concrete specs, which includes all specs in
  the install DB

- use a 3-pass algorithm to load the installed package DAG.
- avoid redundant hashing/comparing on load.

- can now pass these to Spec.copy() and Spec._dup():
  - deps=True
  - deps=False
  - deps=(list of deptypes)
- Makes it easy to filter out only part of a spec.

- Spack currently not hashing build deps (to allow more reuse of packages
  and less frequent re-installing)
- Fast in-memory hash should still hash *all* deptypes, and installed
  specs will only reflect link and run deps.
- We'll revert this when we can concretize more liberally based on what
  is already installed.

- Transaction logic had gotten complicated -- DB would not reindex when
  corrupt, rather the error would be reported (ugh).
- DB will now print the error and force a rebuild when errors are
  detected reading the old database.
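The `deps` argument forms and the hash caching mentioned in the commit notes above can be illustrated with a toy model. Everything below is a sketch under assumptions: `ToySpec`, its `full_hash` method, and the way dependencies are stored are invented for illustration and do not reproduce Spack's `Spec.copy()` or `Spec._dup()`.

```python
import hashlib


class ToySpec:
    """Hypothetical miniature spec; not Spack's Spec class."""

    def __init__(self, name, deps=None, concrete=False):
        self.name = name
        self.deps = dict(deps or {})   # name -> (deptype, ToySpec)
        self.concrete = concrete
        self._hash = None              # cache slot for the in-memory hash

    def copy(self, deps=True):
        """Copy this spec. `deps` may be True, False, or deptypes to keep."""
        if deps is True:
            kept_types = None          # None means keep every deptype
        elif deps is False:
            kept_types = set()
        else:
            kept_types = set(deps)
        kept = {
            name: (deptype, dep.copy(deps))
            for name, (deptype, dep) in self.deps.items()
            if kept_types is None or deptype in kept_types
        }
        return ToySpec(self.name, kept, self.concrete)

    def full_hash(self):
        """Hash over all deptypes; cached only once the spec is concrete."""
        if self._hash is not None:
            return self._hash
        parts = [self.name]
        for name in sorted(self.deps):
            deptype, dep = self.deps[name]
            parts.append("%s:%s:%s" % (name, deptype, dep.full_hash()))
        digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
        if self.concrete:
            self._hash = digest        # safe: concrete specs do not change
        return digest


zlib = ToySpec("zlib", concrete=True)
cmake = ToySpec("cmake", concrete=True)
app = ToySpec(
    "app",
    {"zlib": ("link", zlib), "cmake": ("build", cmake)},
    concrete=True,
)

runtime_only = app.copy(deps=("link", "run"))
print(sorted(runtime_only.deps))
print(app.full_hash() == runtime_only.full_hash())
```

In this toy model the filtered copy drops the build dependency, so its all-deptypes hash differs from the original's, which mirrors the distinction the notes draw between the in-memory hash and what installed specs record.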
@tgamblin tgamblin force-pushed the bugfix/faster-install-db-gh1521 branch from 7a4ed4c to 69b6815 Compare September 1, 2016 18:41
@tgamblin
Member Author

tgamblin commented Sep 1, 2016

@sknigh: I can't replicate this issue at the moment, but I think it is unrelated to this PR. I'm going to merge this and continue trying to figure out how to replicate that one. Do you only see that issue using this PR, or do you see it without it too?

@tgamblin tgamblin merged commit f5bc0cb into develop Sep 1, 2016
@tgamblin tgamblin changed the title [WIP] Faster database loading, faster in-memory hashing Faster database loading, faster in-memory hashing Sep 1, 2016
@tgamblin tgamblin deleted the bugfix/faster-install-db-gh1521 branch October 11, 2016 20:02