Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

progress: add log limits and clipping #1754

Merged
merged 4 commits into from
Nov 6, 2020

Conversation

tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Oct 26, 2020

fixes #1539

If the build produces a lot of logs it can slow down the build (especially on remote client) and reach grpc message size limit. Add some reasonable limits per process: 1MB max output and 100KB/s. If the process reaches this limit the progress output will be clipped. The limits are configurable with env if anyone wants to abuse the logic.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Same defaults already used on buildkitd side

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@hinshun
Copy link
Collaborator

hinshun commented Oct 26, 2020

When output is clipped, will the full logs be retrievable anywhere?

@tonistiigi
Copy link
Member Author

@hinshun Not atm. This is mostly for cases where you would do dd if=/dev/urandom of=/dev/stdout or when running a program that prints out tty codes in a loop without checking that tty is not available. I don't think regular use-cases will hit this.

Could keep the logs somewhere else but managing how their storage is managed will become a very complicated problem that almost nobody will ever need. If there is a special case the env overrides can be used.

@tonistiigi tonistiigi added this to the v0.8.0 milestone Nov 2, 2020
@tiborvass
Copy link
Collaborator

@tonistiigi shouldn't we add a client-side warning in the progress, in addition to the last log line? Could be a followup

@tonistiigi
Copy link
Member Author

Not sure what you mean. There is no special "warning" logic in progress. The note about logs being clipped is shown in client side.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@maver1ck
Copy link

Can we revert this fix ? So many people complaining about this in the internet.

@hexagon-chunt-ricb
Copy link

hexagon-chunt-ricb commented Apr 26, 2022

@tonistiigi This anti-feature is wasting tens of thousands of developer man-hours across the globe. Nonchalantly cutting off the logs after just 1MB and making the options to change it extremely obscure and difficult to find (depending on OS, toolchains, etc.) is a surefire way to cause grief. There is a reason why the Linux shell doesn't put such limits in place - neither should virtualization tools that build on top of it.

@wolfheros
Copy link

what a wast time design, never thought log gonna be a problem for building an image.

@tonistiigi
Copy link
Member Author

@hexagon-chunt-ricb Builds should provide a consistent and fast experience. A build hitting these issues that seemed to work on your fast local machine(although being slower than it is supposed to be) can run out of memory on a different machine, clip out the whole builder output(not per step) when run on CI, and take orders of magnitude longer when run against a remote BuildKit instance. This feature was implemented based on actual user reports that got an unrelated error or unexpectedly slow build experience because they never realized their builds were producing a lot of output that they never had any use for.

If you receive this message, you should not instantly think, "how do I increase the limits" but "what fixes should I apply to my Dockerfile so that it doesn't cause this issue". If you need to access over 2MB of log files per step, add a tee to the command, then you can do anything you want with these files, eg. export them with -o. If you don't care about the experience of other people/infra reusing your dockerfile, you can add your own limits.

The issues about "I can't understand where my process failed because the log was clipped" were fixed a couple of releases ago. Defaults were also increased.

@hexagon-chunt-ricb
Copy link

If you receive this message, you should not instantly think, "how do I increase the limits" but "what fixes should I apply to my Dockerfile so that it doesn't cause this issue".

With all due respect... Have you never set up or built something that had more than 1MB of build logs? Linux kernel, large C++ projects, etc.? Frankly, your reply is baffling to me. All I want is for my log to be emitted out of docker properly, without it being cut off and without any workarounds. If anything, the people who experience infrastructure slowdowns due to large logs are the ones who should fix their dockerfiles and commands, not me.

@tonistiigi
Copy link
Member Author

@hexagon-chunt-ricb I'm not saying that you shouldn't be using programs if they generate huge amounts of output but handle these cases differently so they will not cause problems(wherever they run) are separate from cases that are reached accidentally. BuildKit doesn't just exec your process. Among everything else it implements a client-server API and task deduplication and synchronization. For example, if your parallel builds share build steps or become equal after a cache checksum calculation these steps only run once with multiple clients seeing the identical logs, including the historic logs. This means that logs need to be buffered. Without limits, a misbehaving builder will just run out of memory that is a security vector above else. If someone implements some smart on-disk buffering from certain size then this part is not so critical. The other points above though remain and misbehaving build will eventually fill your whole disk instead then.

Linux kernel

Yeah, I do. https://github.com/linuxkit/linuxkit/tree/master/kernel Dockerfile should be the most popular one. https://gist.github.com/tonistiigi/0f2724c8dbafd04645ea52b1b2e410c3 are the logs. It's a big file but no clipping afaics. The biggest step that is the most of this output generates ~190KB of logs, so less than 10% of the default limit.

@namedgraph
Copy link

@tonistiigi "we need this feature!" -- said nobody ever...

@Zrnik
Copy link

Zrnik commented Sep 17, 2022

Because instead of seeing whats wrong in the log, fix it and then go with my life i spend hours looking for this bullshit...

@tonistiigi
Copy link
Member Author

What's your practical example where you hit this but needed to take hours to change the command so logs show up (or change the limits if you prefer that). You say "what's wrong and fix it" that would imply that you are debugging an error. But that case was fixed and released a long time ago in #2577 . If your process ends with non-zero exit code you will always get the last logs, no matter what clipping conditions were hit(we will read the logs out of a buffer if needed).

@jonl-percsolutions-com
Copy link

What's your practical example where you hit this but needed to take hours to change the command so logs show up (or change the limits if you prefer that). You say "what's wrong and fix it" that would imply that you are debugging an error. But that case was fixed and released a long time ago in #2577 . If your process ends with non-zero exit code you will always get the last logs, no matter what clipping conditions were hit(we will read the logs out of a buffer if needed).

@tonistiigi,

Here is your practical example:

We use jenkins with docker build nodes for running builds. We received a ticket for support from an engineers build which is receiving the following output:

#33 1496.9 [output clipped, log limit 1MiB reached]
#33 1967.1 ERROR: [Synth 8-2715] syntax error near <<< [/home/jenkins/gateware-repo/src/pnt_top/pnt_top.sv:160]
#33 1967.2 ERROR: [Common 17-69] Command failed: Synthesis failed - please see the console or run log file for details
#33 DONE 1978.8s

According to the engineer, this is not enough information to solve the build issues he is having. Both the engineer and I had to search to find a solution for the clipped logs, but still, it's not clear how to unclip the logs as the environment variables are not mentioned or documented anywhere, without having to dig through comments and code.

Still, 2 years after implementation and many complaints the feature is not properly documented, and it appears that people are met with arrogance and condescension instead of explanation.

The least that could be done is, in the error, specify the fix [output clipped, log limit 1MiB reached, set BUILDKIT_STEP_LOG_MAX_SIZE to override. See documentation @]

How much longer would that have taken?

@tonistiigi
Copy link
Member Author

@jonl-percsolutions-com The issue you are describing has been fixed and is linked from the comment you are quoting. I don't think you are using the latest version of buildkit if you get that output.

Your suggestion to make the error message more helpful seems like a good idea.

@hexagon-chunt-ricb
Copy link

The issue you are describing has been fixed

Fyi this issue is not fixed on the latest Docker Desktop installation for Windows, which is Docker Desktop 4.12.0 (85629). I still frequently run into it and have to work around it. I don't know what version of buildkit they're bundling or if the bug is really fixed, but I still get it despite all claims to the contrary. As I said before, this anti-feature will continue to waste developer hours until it is removed.

@crazy-max
Copy link
Member

crazy-max commented Oct 17, 2022

The issue you are describing has been fixed

Fyi this issue is not fixed on the latest Docker Desktop installation for Windows, which is Docker Desktop 4.12.0 (85629). I still frequently run into it and have to work around it. I don't know what version of buildkit they're bundling or if the bug is really fixed, but I still get it despite all claims to the contrary. As I said before, this anti-feature will continue to waste developer hours until it is removed.

BuildKit backed in the latest Docker Engine (20.10.19) is still using BuildKit v0.8: https://github.com/moby/moby/blob/c964641a0d76733c071d606c72541b263da84a3e/vendor.conf#L36

I think we should backport #1934 and #2577 to v0.8 branch. WDYT @tonistiigi @jedevc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

grpc: received message larger than max (4564294 vs. 4194304)