Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid launching too many goroutines during caching layer #157

Merged
merged 2 commits into from
Oct 2, 2020

Conversation

ktock
Copy link
Member

@ktock ktock commented Oct 2, 2020

#156

Currently reader.Cache() creates goroutines for each file contained in the
layer. This can lead to too many goroutines and possibly consume too much
resource.

This commit fixes it by limiting the number of goroutines using semaphore.

cc: @strigazi

return gr.cacheWithReader(root, r, filter, cacheOpts.cacheOpts...)
eg, egCtx := errgroup.WithContext(context.Background())
eg.Go(func() error {
return gr.cacheWithReader(egCtx, eg, semaphore.NewWeighted(4),
Copy link
Member

@AkihiroSuda AkihiroSuda Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

semaphore.NewWeighted(4)

Should this correspond to GOMAXPROCS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, is it really allowed to recursively call eg.Go from eg.Go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing this.

Should this correspond to GOMAXPROCS?

Fixed it on the latest push.

Curious, is it really allowed to recursively call eg.Go from eg.Go?

According to golang/go#27837 (comment), this seems an allowed pattern.

@strigazi
Copy link
Contributor

strigazi commented Oct 2, 2020

Thanks for the promt reply and super fast proposal!

The situation is better with this patch.
good:

  • goroutines stay low stay stable after a rpull of ghcr.io/strigazi/ciadm:v0.3.9-esgz-bash-version (less than 200, most of them for fuse functions).
  • I see CPU at 100% on 1 of the 16 cores (before it was 100% on all cores i.e. >1500% on top).

bad :(

  • After some time, the snapshotter craches with
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc027f10330 stack=[0xc027f10000, 0xc047f10000]
fatal error: stack overflow

@ktock
Copy link
Member Author

ktock commented Oct 2, 2020

@strigazi Thanks for trying this patch!

After some time, the snapshotter craches

What's the stack trace showing?

This might be related to the recursive eg.Go call. (But this seems allowed pattern according to golang/go#27837 (comment))
Another possibility is that the recursive call of gr.cacheWithReader is too deep because of the deep directory hierarchy in that layer.

@strigazi
Copy link
Contributor

strigazi commented Oct 2, 2020

@strigazi Thanks for trying this patch!

After some time, the snapshotter craches

What's the stack trace showing?

This might be related to the recursive eg.Go call. (But this seems allowed pattern according to golang/go#27837 (comment))
Another possibility is that the recursive call of gr.cacheWithReader is too deep because of the deep directory hierarchy in that layer.

https://gist.githubusercontent.com/strigazi/56c0857f77c0a6153bd3602e31dd6646/raw/cfa06adcfec5ed5c2a5f135327e4109d0cea65dc/containerd-stargz-grpc-001.txt

ktock added 2 commits October 2, 2020 22:15
Currently `reader.Cache()` creates goroutines for each file contained in the
layer. This can lead to too many goroutines and possibly consume too much
resource.

This commit fixes it by limiting the number of goroutines using semaphore.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
If a tar layer contains "./" at root (which points to root itself), this will
cause infinite recursive call during caching layer and lead to panic (`goroutine
stack exceeds`). This commit fixes it.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
@ktock
Copy link
Member Author

ktock commented Oct 2, 2020

@strigazi Thanks for providing the log. That panic was because of an infinite recursive call of gr.cacheWithReader caused by ./ entry on the root which points to the root directory itself. I updated the patch to avoid this issue.

$ cat ./img/f560fdf781d7ff7e85bf15288fcb9941ed3a7f7434e024f62dd488b1deb452ee/layer.tar | tar -xO stargz.index.json | jq '.entries[].name' | head -n 10
".prefetch.landmark"
"./"
"./root/"
"./root/.tcshrc"
"./root/anaconda-ks.cfg"
"./root/original-ks.cfg"
"./root/.bash_profile"
"./root/.cshrc"
"./root/.bash_logout"
"./root/.bashrc"

@strigazi
Copy link
Contributor

strigazi commented Oct 2, 2020

Thanks @ktock , it looks good now!

Low number of goroutines for the images that I tried, CPU utilization is normal.

@AkihiroSuda AkihiroSuda merged commit 1a64941 into containerd:master Oct 2, 2020
@ktock ktock deleted the limit-gr branch October 7, 2020 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants