Che IDE locks up #9477

Closed
vseager opened this issue Apr 18, 2018 · 30 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code.

Comments

@vseager

vseager commented Apr 18, 2018

Every now and then, the IDE completely locks up and I can't do anything for around 1-2 minutes, and then it will start working again. I have not been able to pinpoint any particular tasks which cause it to lock up, but it can be something as simple as typing in the terminal.

We are running on Che 6.4.0 and developing a Magento 2 website which has a very large codebase.

@davidwindell
Contributor

davidwindell commented Apr 18, 2018

This is causing a lot of trouble for our team and people are losing faith in Che as a platform. Is someone able to take this on and get to the bottom of the browser crashing issues? We are happy to share workspaces if that helps.

@vparfonov vparfonov self-assigned this Apr 18, 2018
@vparfonov vparfonov added kind/bug Outline of a bug - must adhere to the bug report template. team/ide labels Apr 18, 2018
@vparfonov
Contributor

We will take a look ASAP; however, since there is no reliable way to reproduce it, it may take a while to fix.

@davidwindell
Contributor

Thank you, we're happy to help where we can. We have ~5 team members using Che IDE remotely. Throughout the day, the browser will completely lock up and refuse to respond for a good minute or longer. The server doesn't seem to be suffering any issues. Our projects, and sometimes the files we have open, are often quite large (Magento 2, for example, and lots of minified JS files), although we're not sure whether this is related. It seems to have got worse since ~6.1/6.2.

@vparfonov
Contributor

@vseager and @davidwindell
We tried working with the magento2 sources and got java.lang.OutOfMemoryError: Java heap space.
The problem possibly comes from JGit when it tries to parse a big file in the magento2/.git directory.

[Screenshot: 2018-04-20 at 15.53.43]

[Image: profile]

We did not see the browser freezing, but the problem may be related in some way; try checking your JAVA_OPTS in the running workspace.
We will continue working on this issue to diagnose the browser freezing.

@skabashnyuk
Contributor

You can set it with CHE_WORKSPACE_WSAGENT__JAVA__OPTIONS, either individually in the workspace environment or in che.env.

@davidwindell
Contributor

Thanks, we'll give that a shot and update in the next few days.

@vparfonov Whilst we are testing, could you please try opening a large minified file (wget https://code.jquery.com/jquery-2.2.4.min.js)? That causes the exact browser freeze the team experiences regularly.
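Because a single minified file such as jquery-2.2.4.min.js is enough to trigger the freeze, it may help to find similar files in a project before someone opens them. The sketch below assumes, without confirmation from this thread, that very long lines are what the editor struggles with; the class name LongLineFinder, the .js filter and the 5000-character threshold are all hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical diagnostic: finds files whose longest line exceeds a threshold. */
final class LongLineFinder {

    /** Longest line length; ISO-8859-1 never fails to decode, so this is a byte count. */
    static int longestLine(Path file) throws IOException {
        int max = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                max = Math.max(max, line.length());
            }
        }
        return max;
    }

    /** Regular files under root ending in extension whose longest line is > maxLineLength. */
    static List<Path> find(Path root, String extension, int maxLineLength) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)
                        && p.getFileName().toString().endsWith(extension)
                        && longestLine(p) > maxLineLength) {
                    hits.add(p);
                }
            }
        }
        hits.sort(null); // natural Path order, for stable output
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : find(root, ".js", 5000)) { // 5000 chars: arbitrary threshold
            System.out.println(p + " (longest line: " + longestLine(p) + " chars)");
        }
    }
}
```

For example, running java LongLineFinder /projects/m2test would list the .js files under that project whose longest line exceeds the threshold.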

@vparfonov
Contributor

Yep, reproduced on my side too. I think it's an old issue, related to the one you created earlier (#5189) and to a bug in Orion: https://bugs.eclipse.org/bugs/show_bug.cgi?id=354435

@davidwindell
Contributor

Can you confirm how we would increase the heap size with CHE_WORKSPACE_WSAGENT__JAVA__OPTIONS please?

@skabashnyuk
Contributor

@davidwindell do you want to do that individually for a specific workspace, or for all workspaces in one shot?

@davidwindell
Contributor

@skabashnyuk all workspaces, as they all run payloads of the same size.

@gazarenkov gazarenkov added the severity/P1 Has a major impact to usage or development of the system. label Apr 24, 2018
@skabashnyuk
Contributor

skabashnyuk commented Apr 25, 2018

@davidwindell add this to che.env (or set it as an environment variable of che-master):

CHE_WORKSPACE_WSAGENT__JAVA__OPTIONS=-XX:MaxRAM=1G -XX:MaxRAMFraction=1 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Dsun.zip.disableMemoryMapping=true -Xms50m -Dfile.encoding=UTF8 -Djava.security.egd=file:/dev/./urandom

Adjust MaxRAM to your needs.
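To confirm that the options actually reached the ws-agent JVM, a minimal sketch like the following could be run with the same flags. It assumes JDK 8 ergonomics, where an explicit -XX:MaxRAM divided by -XX:MaxRAMFraction gives the default maximum heap when -Xmx is not set; the class and method names are hypothetical, and Runtime.maxMemory() may report slightly less than that figure because of how the collector accounts for survivor space.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: shows the flags and heap limit the running JVM actually received. */
final class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** JDK 8 ergonomics (sketch): default max heap is roughly MaxRAM / MaxRAMFraction. */
    static long expectedMaxHeapBytes(long maxRamBytes, int maxRamFraction) {
        if (maxRamFraction <= 0) {
            throw new IllegalArgumentException("MaxRAMFraction must be positive");
        }
        return maxRamBytes / maxRamFraction;
    }

    static String toMiB(long bytes) {
        return (bytes / MIB) + " MiB";
    }

    public static void main(String[] args) {
        // Flags this JVM was started with, e.g. -XX:MaxRAM=1G -XX:MaxRAMFraction=1.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        // Upper bound the heap may grow to.
        System.out.println("Runtime.maxMemory(): " + toMiB(Runtime.getRuntime().maxMemory()));

        // What MaxRAM=1G with MaxRAMFraction=1 should roughly yield.
        System.out.println("Expected for MaxRAM=1G, fraction 1: "
                + toMiB(expectedMaxHeapBytes(1024L * MIB, 1)));
    }
}
```

For example, java -XX:MaxRAM=1G -XX:MaxRAMFraction=1 -XX:+UseParallelGC HeapCheck should echo the flags back and report a maximum heap close to 1024 MiB. On JDK 10 and later, MaxRAMFraction is deprecated in favour of MaxRAMPercentage.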

JGit is the cause of the high memory usage. It caches git objects, and you have quite a large pack file in your git repository.

[Screenshot: 2018-04-20 16.03.39]

[ 9:23:11]sj:pack[2.2-develop]#: ls -lah
total 984104
drwxr-xr-x  6 sj  staff   192B 25 Apr 09:22 .
drwxr-xr-x  4 sj  staff   128B 20 Apr 15:46 ..
-r--r--r--  1 sj  staff    66K 25 Apr 09:22 pack-745bdf4c71661782796e5923187f25494e70a2f4.idx
-r--r--r--  1 sj  staff   822K 25 Apr 09:22 pack-745bdf4c71661782796e5923187f25494e70a2f4.pack
-r--r--r--  1 sj  staff    48M 20 Apr 16:00 pack-f25adcf86efc9bfa97f35b8340e6829924801314.idx
-r--r--r--  1 sj  staff   429M 20 Apr 16:00 pack-f25adcf86efc9bfa97f35b8340e6829924801314.pack

In other words, JGit is trying to load 429M into memory, the GC aggressively tries to clean it up, and this repeats over and over again.
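To check whether a workspace has pack files of this size, a small sketch like the one below lists the *.pack files in .git/objects/pack at or above a threshold, largest first. The class name and the 100 MiB cut-off are hypothetical; it only reports file sizes and says nothing about how much of a pack JGit actually caches.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic: reports git pack files above a size threshold. */
final class PackSizeReport {

    private static final long MIB = 1024L * 1024L;

    /** *.pack files in packDir whose size is >= thresholdBytes, largest first. */
    static List<Path> largePacks(Path packDir, long thresholdBytes) throws IOException {
        Map<Path, Long> sizes = new HashMap<>();
        try (DirectoryStream<Path> packs = Files.newDirectoryStream(packDir, "*.pack")) {
            for (Path pack : packs) {
                long size = Files.size(pack);
                if (size >= thresholdBytes) {
                    sizes.put(pack, size);
                }
            }
        }
        List<Path> result = new ArrayList<>(sizes.keySet());
        result.sort((a, b) -> Long.compare(sizes.get(b), sizes.get(a)));
        return result;
    }

    /** Sum of the sizes of all *.pack files in packDir (.idx files are not counted). */
    static long totalPackBytes(Path packDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> packs = Files.newDirectoryStream(packDir, "*.pack")) {
            for (Path pack : packs) {
                total += Files.size(pack);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path packDir = Paths.get(args.length > 0 ? args[0] : ".git/objects/pack");
        System.out.println("All packs: " + totalPackBytes(packDir) / MIB + " MiB");
        for (Path pack : largePacks(packDir, 100 * MIB)) { // 100 MiB: arbitrary cut-off
            System.out.println(pack.getFileName() + ": " + Files.size(pack) / MIB + " MiB");
        }
    }
}
```

With that cut-off, the 429M pack in the listing above would be reported.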

@davidfestal
Contributor

@skabashnyuk your last comment (#9477 (comment)) didn't mention the right person, afaik. You were replying to @davidwindell, I assume.

@skabashnyuk
Contributor

@davidfestal thx. Fixed

@davidwindell
Contributor

Thanks, we'll give that a go. For reference however, we don't checkout Magento 2 itself from git (we just have the files in the working tree with a bunch of git ignores).

This is a typical project:

user@5c2938c72af5:/projects/m2test/.git/objects/pack$ ls -lah
total 110M
drwxr-xr-x   2 user root  4.0K Apr 25 10:21 .
drwxr-xr-x 203 user root  4.0K Apr 18 09:58 ..
-r--r--r--   1 user root  5.2K Apr 18 08:30 pack-4510e1839511d60fca507fa08c65620cbf7fe143.idx
-r--r--r--   1 user root  517K Apr 18 08:30 pack-4510e1839511d60fca507fa08c65620cbf7fe143.pack
-r--r--r--   1 user root   27K Dec 13 14:17 pack-4d13ab52c9df036166b67912f3ad8c377061095b.idx
-r--r--r--   1 user root  1.5M Dec 13 14:17 pack-4d13ab52c9df036166b67912f3ad8c377061095b.pack
-r--r--r--   1 user root  108K Nov 14 11:04 pack-619893c1c3a17c2b51bc1db2684f92665277e1cb.idx
-r--r--r--   1 user root  8.7M Apr 12 14:10 pack-619893c1c3a17c2b51bc1db2684f92665277e1cb.pack
-r--r--r--   1 user root  2.3M Feb 13 16:22 pack-ad28cf56e26c5aa545c0cc96eb650bd5487b204c.idx
-r--r--r--   1 user root   78M Feb 13 16:22 pack-ad28cf56e26c5aa545c0cc96eb650bd5487b204c.pack
-r--r--r--   1 user root   14K Apr 13 08:45 pack-b17fadfd3673ce7f935ff2174c4eac5f5b726b50.idx
-r--r--r--   1 user root 1003K Apr 13 08:45 pack-b17fadfd3673ce7f935ff2174c4eac5f5b726b50.pack
-r--r--r--   1 user root   78K Apr 11 16:13 pack-c9d309fa9cd7739f536203ebe75deaa235f47537.idx
-r--r--r--   1 user root  3.8M Apr 11 16:13 pack-c9d309fa9cd7739f536203ebe75deaa235f47537.pack
-r--r--r--   1 user root  189K Mar 26 14:56 pack-cb3ea2fba5d4122bcbcd068e567d45d4bb0e93f9.idx
-r--r--r--   1 user root  8.3M Mar 26 14:56 pack-cb3ea2fba5d4122bcbcd068e567d45d4bb0e93f9.pack
-r--r--r--   1 user root  5.0K Mar 28 13:30 pack-cd8eb3ff5d095e8aa9549fdb8de100f0a37e5a95.idx
-r--r--r--   1 user root  905K Mar 28 13:30 pack-cd8eb3ff5d095e8aa9549fdb8de100f0a37e5a95.pack
-r--r--r--   1 user root   11K Mar 27 15:44 pack-d85e8709dc81c9d142a5505f64faff9b543046f5.idx
-r--r--r--   1 user root  2.8M Mar 27 15:44 pack-d85e8709dc81c9d142a5505f64faff9b543046f5.pack
-r--r--r--   1 user root   24K Apr 25 10:21 pack-e6c384b5d36aa1f6bc1d5fa6619e2c26f4f94151.idx
-r--r--r--   1 user root  2.5M Apr 25 10:21 pack-e6c384b5d36aa1f6bc1d5fa6619e2c26f4f94151.pack

@davidwindell
Contributor

Ok, so, now we get:

2018-04-25 11:46:03,279[ted-scheduler-6]  [ERROR] [o.e.c.a.w.s.i.FileTreeWalker 184]    - Error while walking file tree
	at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.eclipse.che.api.watcher.server.impl.FileTreeWalker.walk(FileTreeWalker.java:138)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at org.apache.lucene.index.IndexWriter.deleteDocuments(IndexWriter.java:1671)
	at org.eclipse.che.api.search.server.impl.LuceneSearcher.delete(LuceneSearcher.java:431)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:910)
	at org.eclipse.che.api.search.server.impl.LuceneSearcher.delete(LuceneSearcher.java:431)

And it can't even load the project tree at all?
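The trace points at the file watcher's FileTreeWalker and the Lucene-based search index. Since the Magento files live in the working tree rather than in git, one rough way to gauge how much those components may have to walk is to count the files and bytes in the project. This is a sketch with a hypothetical class name; it skips .git but does not apply .gitignore rules, and whether Che's watcher and indexer honour those rules is not established in this thread.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical diagnostic: counts files and bytes in a working tree, skipping .git. */
final class TreeStats {
    long files;
    long bytes;

    static TreeStats of(Path root) throws IOException {
        TreeStats stats = new TreeStats();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName();
                boolean isGitDir = name != null && name.toString().equals(".git");
                return isGitDir ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    stats.files++;
                    stats.bytes += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip unreadable entries instead of aborting
            }
        });
        return stats;
    }

    public static void main(String[] args) throws IOException {
        TreeStats s = of(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(s.files + " files, " + s.bytes / (1024 * 1024) + " MiB (excluding .git)");
    }
}
```

Symbolic links are not followed, since walkFileTree is called without FOLLOW_LINKS.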

@skabashnyuk
Contributor

skabashnyuk commented Apr 25, 2018

Can you show your top and ps aux output?
Which version of Che are you running? Make sure you have this fix: #8112

@davidwindell
Contributor

This is Che 6.4.0 so has the fix above.

top - 12:06:40 up 7 days,  1:42,  0 users,  load average: 5.02, 6.24, 5.47
Tasks:  17 total,   1 running,  16 sleeping,   0 stopped,   0 zombie
%Cpu(s): 42.6 us, 28.2 sy,  0.0 ni, 29.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 31399556 total,   765572 free,  6468100 used, 24165884 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 23434628 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  217 user      20   0 4825400 1.098g  17008 S  40.5  3.7  21:49.43 java
    1 user      20   0    4508    804    724 S   0.0  0.0   0:00.03 sh
   12 root      20   0   47008   3688   3260 S   0.0  0.0   0:00.00 sudo
   16 root      20   0   65520   5672   4972 S   0.0  0.0   0:00.00 sshd
   34 user      20   0    6044    696    620 S   0.0  0.0   0:00.00 tail
   35 user      20   0    4508    716    636 S   0.0  0.0   0:00.02 sh
   41 user      20   0   12288   5628   3452 S   0.0  0.0   0:00.78 bootstrapper
   46 user      20   0    4516   1636   1512 S   0.0  0.0   0:00.00 sh
  102 user      20   0   10444   5844   4168 S   0.0  0.0   0:00.20 che-exec-agent
  107 user      20   0    4516   1616   1496 S   0.0  0.0   0:00.00 sh
  155 user      20   0   12788   5680   3972 S   0.0  0.0   0:00.84 che-websocket-t
  161 user      20   0    4516   1640   1508 S   0.0  0.0   0:00.00 sh
  322 user      20   0   21108   5104   3408 S   0.0  0.0   0:00.05 bash
  579 user      20   0   21100   4580   3124 S   0.0  0.0   0:00.01 bash
  597 user      20   0   21100   4592   3132 S   0.0  0.0   0:00.02 bash
  610 user      20   0   21100   4864   3180 S   0.0  0.0   0:00.02 bash
  620 user      20   0   40376   3608   3156 R   0.0  0.0   0:00.02 top
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user         1  0.0  0.0   4508   804 ?        Ss   11:22   0:00 /bin/sh -c /home/user/edge.sh && tail -f /dev/null
root        12  0.0  0.0  47008  3688 ?        S    11:22   0:00 sudo /usr/sbin/sshd -D
root        16  0.0  0.0  65520  5672 ?        S    11:22   0:00 /usr/sbin/sshd -D
user        34  0.0  0.0   6044   696 ?        S    11:22   0:00 tail -f /dev/null
user        35  0.0  0.0   4508   716 ?        Ss   11:22   0:00 /bin/sh -c /tmp/bootstrapper/bootstrapper -machine-name dev-machine -runtime-id workspacemwofms1amjkityit:default
user        41  0.0  0.0  12288  5628 ?        Sl   11:22   0:00 /tmp/bootstrapper/bootstrapper -machine-name dev-machine -runtime-id workspacemwofms1amjkityit:default:d0c3e1a5-d
user        46  0.0  0.0   4516  1636 ?        Ss   11:22   0:00 /bin/sh -c # # Copyright (c) 2012-2018 Red Hat, Inc. # All rights reserved. This program and the accompanying mat
user       102  0.0  0.0  10444  5844 ?        Sl   11:22   0:00 /home/user/che/exec-agent/che-exec-agent -addr :4412 -cmd /bin/bash -logs-dir /home/user/che/exec-agent/logs
user       107  0.0  0.0   4516  1616 ?        Ss   11:22   0:00 /bin/sh -c # # Copyright (c) 2012-2018 Red Hat, Inc. # All rights reserved. This program and the accompanying mat
user       155  0.0  0.0  12788  5892 ?        Sl   11:22   0:00 /home/user/che/terminal/che-websocket-terminal -addr :4411 -cmd /bin/bash
user       161  0.0  0.0   4516  1640 ?        Ss   11:22   0:00 /bin/sh -c # # Copyright (c) 2012-2018 Red Hat, Inc. # All rights reserved. This program and the accompanying mat
user       217 48.7  3.6 4825400 1151252 ?     Sl   11:22  21:55 /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -Djava.util.logging.config.file=/home/user/che/ws-agent/conf/loggi
user       322  0.0  0.0  21108  5104 pts/0    Ss+  11:22   0:00 /bin/bash
user       579  0.0  0.0  21100  4580 pts/1    Ss+  11:45   0:00 /bin/bash
user       597  0.0  0.0  21100  4592 pts/2    Ss+  11:46   0:00 /bin/bash
user       610  0.0  0.0  21100  4864 pts/3    Ss   12:06   0:00 /bin/bash
user       621  0.0  0.0  36084  3260 pts/3    R+   12:07   0:00 ps aux

@skabashnyuk
Contributor

The ps aux output is truncated. Can you show me the full argument line for the java processes? What happens if you set -XX:MaxRAM=2G?
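On Linux, the untruncated argument list of a process can be read from /proc/<pid>/cmdline, where the arguments are separated by NUL bytes. A minimal sketch with the hypothetical class name FullCmdline, which prints one argument per line (for example, java FullCmdline 217 for the ws-agent JVM in the top output above):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: prints the full, untruncated argument list of a Linux process. */
final class FullCmdline {

    /** Splits the contents of /proc/PID/cmdline, where each argument ends with a NUL byte. */
    static List<String> parse(byte[] raw) {
        List<String> args = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == 0) {
                args.add(new String(raw, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < raw.length) { // tolerate a missing trailing NUL
            args.add(new String(raw, start, raw.length - start, StandardCharsets.UTF_8));
        }
        return args;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // e.g. 217 for the ws-agent JVM
        byte[] raw = Files.readAllBytes(Paths.get("/proc", pid, "cmdline"));
        for (String arg : parse(raw)) {
            System.out.println(arg);
        }
    }
}
```

Printing one argument per line keeps long -XX and -D options readable, which is exactly what ps aux cuts off.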

@skabashnyuk
Contributor

Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:910)
at org.eclipse.che.api.search.server.impl.LuceneSearcher.delete(LuceneSearcher.java:431)

BTW, what were you doing at that moment?

@vparfonov vparfonov added severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code. and removed severity/P1 Has a major impact to usage or development of the system. labels May 11, 2018
@artaleks9
Contributor

Still reproducible in version 6.5.0.

@vparfonov
Contributor

@artaleks9 Please check this again

@garagatyi

Has it really been a blocker for a month and a half?

@vparfonov
Contributor

We keep it as a blocker to mark it as a serious problem, but for now we can't fix it.

@garagatyi

But shouldn't we downgrade it to a P1 severity?
FYI @eivantsov @slemeur

@gazarenkov
Contributor

@garagatyi yes, it is a legitimate blocker (see the description).

Just for my understanding: is there some real difference for you? :)

@garagatyi

I just spotted this issue during my work, and I think that issues which are not being worked on spoil the idea of the blocker label, which is meant to signal that someone should be working on the issue.
At least that is my understanding of the label.
And yes, there is a difference for me, since I am trying to understand the processes in the project and keep my understanding in sync with what is actually going on.

@gazarenkov
Contributor

@garagatyi
Thanks for caring about the project, I really appreciate it!

I hope the team will do its best to solve this problem as soon as possible (I believe you can imagine it is not that easy).
About the process: everyone is free to have their own opinion about severity, but I'd let the people who own the issue decide.
(As it happens, I vote the same way, since I can see Che being used for day-to-day development here, and I always consider such cases the highest priority and such issues the most critical, even if the application in general is not "blocked".)

@slemeur
Contributor

slemeur commented Jun 27, 2018

I agree, I would like to see this blocker fixed as soon as possible. But for that, shouldn't we start by taking it into our sprints so we can dig into the problem?

It was taken into one of our recent sprints, but it does not seem we made any progress on it. Do we have an explanation of what's happening and "why we can't fix it"? cc @vparfonov

@jakgsl

jakgsl commented Oct 11, 2018

We are also having semi-regular full lock-ups of the IDE that require closing and reopening the tab.

@che-bot
Contributor

che-bot commented Sep 7, 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2019
@che-bot che-bot closed this as completed Sep 17, 2019