Implement membarrier #267
Comments
Yes please!! Having issues running C# ASP.NET Core containers on gVisor because of this:
Any workarounds known? |
As far as I know, coreclr does not require membarrier. It attempts to use it, but has a fallback if it is unsupported: https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L141-L143 Are you certain that membarrier is the problem? These log messages do not necessarily indicate that they are the cause of a failure. Perhaps you can open a new issue with additional information about the failure. |
Sure, I'll open a new issue. Thanks for the swift response! |
Great idea, I have created a new issue here: #1036 Thank you! |
EDIT: My issue was unrelated in the end - Please feel free to ignore my comment |
Getting this issue when using ffmpeg on GCP Cloud Run. |
ffmpeg does not require membarrier in any configuration that I am aware of, so I suspect that warning is simply misleading and failures are due to some other reason. It simply tries to use membarrier (emitting this warning), then falls back to another mechanism. Please provide additional logs if you think this really is related to membarrier. |
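The try-then-fall-back pattern described above can be sketched roughly as follows, in Python via ctypes. This is an illustrative sketch only, not code from ffmpeg or coreclr: the helper names `query_membarrier` and `pick_strategy` are hypothetical, and the syscall-number table covers only 64-bit x86 and ARM.

```python
import ctypes
import platform

# membarrier(2) syscall numbers; only these 64-bit architectures are covered
# in this sketch (an assumption, extend as needed).
_SYS_MEMBARRIER = {"x86_64": 324, "aarch64": 283}
MEMBARRIER_CMD_QUERY = 0


def query_membarrier():
    """Return the supported-command bitmask, or None if membarrier is unavailable."""
    nr = _SYS_MEMBARRIER.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if platform.system() != "Linux" or nr is None or not is_64bit:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(nr), ctypes.c_int(MEMBARRIER_CMD_QUERY), ctypes.c_uint(0)
    )
    if ret < 0:
        # An unimplemented syscall typically fails with ENOSYS; any failure
        # is treated here as "not available".
        return None
    return ret


def pick_strategy(mask):
    """Use membarrier only if the kernel advertised at least one command."""
    return "membarrier" if mask else "fallback"


if __name__ == "__main__":
    print(pick_strategy(query_membarrier()))
```

A sandbox that does not implement the syscall may log a warning at the probe step, even though the program then carries on with its fallback path.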
Updates #267 PiperOrigin-RevId: 278402684
I see the following in the logs when I deploy a simple ASP.NET Core 3.0 app on Cloud Run:
It doesn't seem to cause issues but it makes you think that something is wrong. |
Having the same issue with a NodeJS App. |
Same issue trying to run NodeJS too. Any workarounds or fixes? UPDATE |
Hi, can someone advise on how to trace which Node.js module is at fault?
|
@andresamayadiaz @kscerbiakas @whollacsek Are all of you encountering the issue when using Cloud Run? What exactly are the issues you encounter? Again, it's likely not the lack of membarrier causing the issue but rather something else (or a combination of things). If we can get enough info on the issues folks are encountering we could start narrowing down the issue better. |
Yes, I am encountering the issue on Cloud Run with a Next.js-based app. I get the membarrier message followed by a signal 11 error message (crash). On a previous version of my app there was only the membarrier message and the app wasn't crashing, so I didn't pay attention to it, but now it's crashing right at container startup. When trying to access the Cloud Run instance I get either a 500 or a 503 HTTP code from Google Frontend. My guess is that the crash led to a TCP termination between Google Frontend and the Cloud Run instance.
If you can guide me through how to debug this, I might be able to give you more context.
|
@whollacsek Are you using an Alpine-based Node.js image? If so, I suspect you are hitting nodejs/docker-node#1158. While that doesn't seem to be a gVisor issue (as it occurs on vanilla AWS VMs), I've been trying to track down the issue. Unfortunately, I haven't been able to reproduce crashes with any of my test apps. Do you (or anyone else experiencing segfaults (signal 11)) have an image you can share that reproduces this issue? Alternatively, any strace logs you can provide would be very helpful.
- Ideally, use runsc with debugging enabled (https://gvisor.dev/docs/user_guide/debugging/) and provide the .boot log file.
- If you can't reproduce with runsc locally, you can install strace (https://cloud.google.com/run/docs/troubleshooting/tracing-system-calls) in your image and run with strace on Cloud Run. I think on Alpine it would be apk update && apk add strace to install strace.
Some workarounds that may resolve this issue:
- Pin to the previous version of the Node base image. Assuming you are on node:10-alpine, pin to node:10.16.1-alpine.
- Try upgrading to Node 13. In my investigation thus far, Node 13 seems to have several changes that will make it behave better with musl libc (used by Alpine). It's possible that Node 13 won't encounter issues.
|
Thanks for all the suggestions. I'm indeed using node:10-alpine. I'll try them and report back to you tomorrow.
|
@prattmic I have tried switching images to node:10.17.0-alpine3.9 and node:10-stretch, and it seems to be working fine. Fingers crossed. Today I didn't receive any membarrier messages whatsoever. Thank you! But one more question: how can I improve performance? I get "Rate exceeded" too often. Any tips? |
Switching to node:10-slim does the trick, no more membarrier or sig 11 messages.
|
Any chance either of you could provide strace logs from a crashing instance? nodejs/docker-node#1158 still has no repro or additional information, so anything more we can get will help resolve the underlying issue. |
I'm facing same issue on |
I get this error too when using NodeJS with https://github.com/lovell/sharp. I can't get my Cloud Run service up at all. Reverted to an earlier version without this node module and now things are running again. Edit: It turned out the issue was caused by the node-alpine base image. |
I see that many people are pointing fingers at Alpine. Maybe musl libc, which Alpine uses, has some specific use of membarrier? |
Another user had issues and saw membarrier not supported log messages on Cloud Run. They are using Python which I think should fall back but their base Docker image is alpine. Maybe another case of alpine being the cause? |
Alpine Linux uses musl libc, which does invoke both |
MEMBARRIER_CMD_PRIVATE_EXPEDITED is not implemented by interrupting all threads in the thread group because the actual implementation on Linux interrupts all running threads sharing the MM, which is sufficiently different that doing the wrong thing would risk silently corrupting application memory. Updates #267 PiperOrigin-RevId: 333018403
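To see which commands a given kernel or sandbox advertises, a caller can decode the bitmask returned by MEMBARRIER_CMD_QUERY. The sketch below is illustrative: the bit values are those listed in the membarrier(2) man page, the helper name `decode_membarrier_mask` is hypothetical, and the table covers only a subset of commands.

```python
# Command bits as listed in membarrier(2) / linux/membarrier.h (subset).
MEMBARRIER_CMDS = {
    1 << 0: "MEMBARRIER_CMD_GLOBAL",
    1 << 1: "MEMBARRIER_CMD_GLOBAL_EXPEDITED",
    1 << 2: "MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED",
    1 << 3: "MEMBARRIER_CMD_PRIVATE_EXPEDITED",
    1 << 4: "MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED",
}


def decode_membarrier_mask(mask):
    """Return the names of the commands whose bits are set in a QUERY result."""
    return [name for bit, name in sorted(MEMBARRIER_CMDS.items()) if mask & bit]


if __name__ == "__main__":
    # Example mask with only the private expedited pair set.
    for name in decode_membarrier_mask((1 << 3) | (1 << 4)):
        print(name)
```

An application that needs MEMBARRIER_CMD_PRIVATE_EXPEDITED would check for that name in the decoded list before relying on it, and otherwise take its fallback path.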
I vaguely remember that it may not have worked with older versions, but I can't recall for sure. In this case anyway, the Python program was apparently not taking connections, and clients were timing out rather than crashing, which leads me to think it's likely a red herring. |
Updates #267 PiperOrigin-RevId: 333018403
Updates #267 PiperOrigin-RevId: 335713923
http://man7.org/linux/man-pages/man2/membarrier.2.html
Note: Very few applications have a hard requirement for membarrier. If you encounter a warning about unimplemented membarrier, the application most likely attempted to use membarrier, triggering the warning, and then fell back to another mechanism.