This repository has been archived by the owner on Aug 23, 2018. It is now read-only.

Compilation fails when run on a server with 122GB of RAM #164

Closed
andreausu opened this issue May 29, 2017 · 8 comments · May be fixed by #179

Comments

@andreausu

andreausu commented May 29, 2017

Hello,

we're running into a weird issue on our CI pipeline when we use an AWS server with 122GB of RAM (i3.4xlarge):

./node_modules/.bin/elm-make src/Main.elm 
[                                                  ] - 0 / 2Stack space overflow: current size 99136 bytes.
Use `+RTS -Ksize -RTS' to increase it.
elm-make: thread blocked indefinitely in an MVar operation

Using an i3.2xlarge instance instead, which "only" has 61 GB of RAM, works just fine.

The build runs inside a Docker container, so we are 100% sure that the software and environment are identical between the two nodes. The underlying OS is identical as well, since we spin the nodes up from the same AMI in a fully automated manner.

We've also tried limiting the available RAM via docker-compose configuration to no avail.

Do you have any idea of what's going on or how to debug this further?

Elm version: elm-make 0.18 (Elm Platform 0.18.0)
Base docker image: elixir:1.4.2 (Debian Jessie)

Best,
Andrea

@andys8

andys8 commented Sep 15, 2017

I'm experiencing the same issue on CI with Jenkins running on a Kubernetes cluster on AWS infrastructure. I can't say for now which EC2 instance type is used.

@alienscience

We have also hit this issue when running elm-make from Kubernetes on servers with 120GB RAM:

Stack space overflow: current size 99136 bytes.
Use `+RTS -Ksize -RTS' to increase it.
elm-make: thread blocked indefinitely in an MVar operation

This much memory is not available to elm-make; Kubernetes limits the build to 4GB. We have noticed that when Kubernetes runs the build on a smaller node (62GB or less), the build succeeds without a stack overflow.

@andys8

andys8 commented Oct 2, 2017

Our workaround for now is to build our own version of elm-make from source with the GHC flag -rtsopts. This enables Haskell runtime (RTS) flags at run time: -N, -M and -K can be used to adjust CPU and memory.
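
For anyone trying the same route, here is a minimal sketch of how such flags would be passed to a rebuilt binary. It assumes elm-make was linked with -rtsopts (and with the threaded runtime, which -N requires). The helper name rts_args and the example values (2 capabilities, 4g heap, 1g stack) are illustrative only, not settings reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: compose a GHC RTS argument string.
#   $1 = number of capabilities (-N)
#   $2 = maximum heap size      (-M)
#   $3 = maximum stack size     (-K)
rts_args() {
    printf '+RTS -N%s -M%s -K%s -RTS' "$1" "$2" "$3"
}

# Example invocation (assumption: the binary accepts RTS options):
#   ./node_modules/.bin/elm-make src/Main.elm $(rts_args 2 4g 1g)
```

With a stock binary that was not linked with -rtsopts, the runtime is expected to reject these options rather than apply them, so the wrapper only helps after the rebuild described above.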

@evancz
Contributor

evancz commented Mar 7, 2018

I think this is the same as elm/compiler#1473 and is related to various oddities in Haskell (e.g. multi-threaded GC and CPU miscounting).

Anyway, there is tons of advice in that other issue, and it is becoming clearer how to work around the Haskell oddities in our binaries.

evancz closed this as completed Mar 7, 2018

@andys8

andys8 commented Mar 7, 2018

@evancz There is no known workaround for the memory issue other than recompiling the Elm compiler. The CPU issue could be solved in the same way, by enabling rtsopts, but it isn't the same issue.

The way I understand it, the merged changes to node-elm-compiler (rtfeldman/node-elm-compiler#65) pass flags to the compiler, but these only work when rtsopts is enabled, which is not currently the case.

It would be a big improvement and a simple solution to enable the flag by default. Otherwise teams have to recompile the compiler, host it and make it available in CI builds. It is hard to promote Elm if adoption starts with a quirky CI setup like this. I would appreciate the change a lot, and it would make the CPU configuration easier, too.

@zwilias
Member

zwilias commented Mar 7, 2018

That is probably what will happen.

The gist is that optimal (and in extreme cases like this, workable) RTS settings depend on specifics of the hardware. Potentially, a binary could "self configure" based on this information. If that is not possible in a reasonable timespan, providing sane defaults and the option to override without recompiling the binaries sounds like a good alternative.

As mentioned in this comment, we're looking into improving the situation 👍
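
The "sane defaults plus override" idea can be sketched as a small wrapper function. This is only an illustration under stated assumptions: ELM_RTS_OPTS is an invented variable name that elm-make does not read, the cap of 4 capabilities is an arbitrary placeholder, and the caller is expected to supply the detected core count (for example from nproc).

```shell
#!/bin/sh
# Hypothetical sketch: choose RTS options with a user override.
#   $1 = detected core count (e.g. the output of `nproc`)
# ELM_RTS_OPTS is an invented name, not an option of any real tool.
choose_rts_opts() {
    cores="$1"
    if [ -n "${ELM_RTS_OPTS:-}" ]; then
        # An explicit override always wins over the computed default.
        printf '%s' "$ELM_RTS_OPTS"
        return
    fi
    # Assumed default: never use more than 4 capabilities.
    if [ "$cores" -gt 4 ]; then
        cores=4
    fi
    printf '%s' "-N$cores"
}
```

On a 64-core machine with no override, the function yields -N4; setting ELM_RTS_OPTS to something like "-N1 -K512m" replaces the computed default entirely. The result would still have to be wrapped in +RTS ... -RTS and passed to a binary that accepts RTS options.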

@andys8

andys8 commented Mar 7, 2018

Thanks for the update regarding the current state.
