{math}[foss/2019b,fosscuda/2019b] Keras v2.3.1, TensorFlow v2.1.0, libgpuarray v0.7.6, Theano v1.0.4, R-keras v2.2.5.0 w/ Python 3.7.4 #9538

Merged: 32 commits into easybuilders:develop, Mar 1, 2020

edmondac
Contributor

@edmondac edmondac commented Dec 18, 2019

(created using eb --new-pr)

This depends on #9519 being merged first

…uarray-0.7.6-fosscuda-2019b.eb, Theano-1.0.4-fosscuda-2019b.eb, R-keras-2.2.5.0-fosscuda-2019b-Python-3.7.4-R-3.6.2.eb
@edmondac edmondac changed the title from "{math}[fosscuda/2019b] Keras v2.3.1, libgpuarray v0.7.6, Theano v1.0.4, ... w/ Python 3.7.4" to "WIP: {math}[fosscuda/2019b] Keras v2.3.1, libgpuarray v0.7.6, Theano v1.0.4, ... w/ Python 3.7.4" Dec 18, 2019
@edmondac
Contributor Author

That patch is to make Bazel compile on my Ubuntu 19.10 machine. It comes from https://github.com/clearlinux-pkgs/bazel/blob/adefd9046582cb52f39579033132e6265ef6ddb0/rename-gettid-functions.patch

@easybuilders easybuilders deleted 5 comments from boegelbot Jan 11, 2020
@terjekv
Collaborator

terjekv commented Feb 18, 2020

Test report by @terjekv
SUCCESS
Build succeeded for 10 out of 10 (8 easyconfigs in this PR)
ninhursaga.uio.no - Linux RHEL 8.1, Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/2faeb8881f5a76da5642a0d65ab7aea3 for a full test report.

@edmondac
Contributor Author

Test report by @bear-rsg
SUCCESS
Build succeeded for 9 out of 9 (8 easyconfigs in this PR)
bear-pg0212u15a.bear.cluster - Linux centos linux 7.7.1908, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, Python 2.7.5
See https://gist.github.com/b6f8e6181b7d47323825530f3d34e6f0 for a full test report.

@boegel
Member

boegel commented Feb 20, 2020

Test report by @boegel
FAILED
Build succeeded for 9 out of 11 (8 easyconfigs in this PR)
node3170.skitty.os - Linux centos linux 7.7.1908, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/e25f0dc36f832b6b6af19edd74efb278 for a full test report.

@terjekv
Collaborator

terjekv commented Feb 20, 2020

Test report by @boegel
FAILED
Build succeeded for 9 out of 11 (8 easyconfigs in this PR)
node3170.skitty.os - Linux centos linux 7.7.1908, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/e25f0dc36f832b6b6af19edd74efb278 for a full test report.

Checksum failure for R packages? Well I never... Shocking!

@Flamefire
Contributor

Can we have an --add-alternate-checksum option to EB? Until then: @edmondac Can you just add that checksum and double-check with deleted archives?
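
To make the request concrete: accepting an alternate checksum means a downloaded archive passes verification if its digest matches any one of several known-good values. This is useful when an upstream archive has been regenerated. The following is a minimal standalone sketch of that idea using only Python's `hashlib`. The helper names `sha256_of` and `matches_any_checksum` are hypothetical, and this is not EasyBuild's own verification code.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large source tarballs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_any_checksum(path, accepted):
    """Return True if the file's SHA256 equals any checksum in `accepted`.

    `accepted` is an iterable of hex digests (hypothetical input format);
    comparison is case-insensitive.
    """
    return sha256_of(path) in {checksum.lower() for checksum in accepted}
```

Re-running such a check after deleting the cached archives, as suggested above, confirms that a freshly downloaded file also matches one of the accepted values.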

@boegel
Member

boegel commented Feb 20, 2020

@edmondac See fixed checksum in #9879

@edmondac
Contributor Author

@boegel that should do it, I think

@Flamefire
Contributor

Flamefire commented Feb 21, 2020

Running test on our system now.
Edit: Restarted due to FS failure

@edmondac
Contributor Author

Test report by @bear-rsg
SUCCESS
Build succeeded for 11 out of 11 (8 easyconfigs in this PR)
bear-pg0212u15a.bear.cluster - Linux centos linux 7.7.1908, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, Python 2.7.5
See https://gist.github.com/10654d29e0daa0fc88eabfaaaad31e04 for a full test report.

@Flamefire
Contributor

Test report by @Flamefire
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in this PR)
taurusi6325.taurus.hrsk.tu-dresden.de - Linux RHEL 7.7, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, Python 2.7.5
See https://gist.github.com/b75473ee2418bec25dc0812ecc6f655d for a full test report.

@edmondac
Contributor Author

OK @boegel we think we're happy with this :-)

@terjekv
Collaborator

terjekv commented Feb 26, 2020

Test report by @Flamefire
FAILED
Build succeeded for 7 out of 10 (8 easyconfigs in this PR)
taurusml30 - Linux RHEL 7.6, 8335-GTX, Python 2.7.5
See https://gist.github.com/fa43829e112ba485821ce5bda243d52e for a full test report.

Hm, an issue with Java/1.8 on power9?

== 2020-02-26 18:59:06,386 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:124 in __init__): Module command 'module load Bazel/0.29.1-GCCcore-8.3.0' failed with exit code 1; stderr: Lmod hat den folgenden Fehler erkannt: Modul konnte nicht geladen werden: ""
     /sw/modules/ml/devel/Bazel/0.29.1-GCCcore-8.3.0.lua: /usr/share/lmod/lmod/libexec/MasterControl.lua:888: attempt to compare number with nil

Executing this command requires loading "Java/1.8" which failed while processing the following module(s):

    Module fullname             Module Filename
    ---------------             ---------------
    Bazel/0.29.1-GCCcore-8.3.0  /sw/modules/ml/devel/Bazel/0.29.1-GCCcore-8.3.0.lua
While processing the following module(s):
    Module fullname             Module Filename
    ---------------             ---------------
    Bazel/0.29.1-GCCcore-8.3.0  /sw/modules/ml/devel/Bazel/0.29.1-GCCcore-8.3.0.lua

; stdout: 
false

@Flamefire
Contributor

This seems to be an Lmod 7 (7.8.9) issue where it can't find the proper module alias. Retrying with the latest Lmod.

@terjekv
Collaborator

terjekv commented Feb 27, 2020

Oh, yeah, right. That was brought up on Slack, I think. Hopefully it's an easy Lmod upgrade away from working.

@Flamefire
Contributor

Turns out to be a real Lmod bug: TACC/Lmod#436

I have a work-around and it's now starting to build TensorFlow. Going to take a while...

@Flamefire
Contributor

Test report by @Flamefire
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in this PR)
taurusml13 - Linux RHEL 7.6, 8335-GTX, Python 2.7.5
See https://gist.github.com/6eb4e41b1b52aa648189fd405ca27a15 for a full test report.

@Flamefire
Contributor

OK, with the patches for R and the like (see my PRs), this works on Power9. So 👍 from me!

@terjekv
Collaborator

terjekv commented Feb 28, 2020

Test report by @terjekv
SUCCESS
Build succeeded for 12 out of 12 (8 easyconfigs in this PR)
ninhursaga.uio.no - Linux RHEL 8.1, Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/1c662b21205fa30e3eabb5b262012246 for a full test report.

@easybuilders easybuilders deleted a comment from boegelbot Mar 1, 2020
@boegel
Member

boegel commented Mar 1, 2020

Test report by @boegel
SUCCESS
Build succeeded for 10 out of 10 (8 easyconfigs in this PR)
node3306.joltik.os - Linux centos linux 7.7.1908, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, Python 3.6.8
See https://gist.github.com/bba1c53c820776a56561ae232c20040d for a full test report.

@easybuilders easybuilders deleted a comment from boegelbot Mar 1, 2020
@boegel boegel modified the milestones: 4.x, next release (4.1.2?) Mar 1, 2020
Member

@boegel boegel left a comment

lgtm

@boegel boegel changed the title from "{math}[fosscuda/2019b] Keras v2.3.1, libgpuarray v0.7.6, Theano v1.0.4, ... w/ Python 3.7.4" to "{math}[foss/2019b,fosscuda/2019b] Keras v2.3.1, TensorFlow v2.1.0, libgpuarray v0.7.6, Theano v1.0.4, R-keras v2.2.5.0 w/ Python 3.7.4" Mar 1, 2020
@boegel
Member

boegel commented Mar 1, 2020

Going in, thanks @edmondac!

@boegel boegel merged commit 9fa5606 into easybuilders:develop Mar 1, 2020
@edmondac edmondac deleted the 20191218094135_new_pr_Keras231 branch June 2, 2020 15:02