Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrent map writes in lake/journal.Store #5279

Merged
merged 1 commit into from
Sep 16, 2024

Conversation

nwt
Copy link
Member

@nwt nwt commented Sep 16, 2024

journal.Store contains a zson.UnmarshalZNGContext that can be used unsynchronized from multiple goroutines, causing a panic on concurrent map writes. Fix by removing the UnmarshalZNGContext from Store and instead allowing each goroutine to allocate its own in Store.load.

Fixes #5261.

journal.Store contains a zson.UnmarshalZNGContext that can be used
unsynchronized from multiple goroutines, causing a panic on concurrent
map writes.  Fix by removing the UnmarshalZNGContext from Store and
instead allowing each goroutine to allocate its own in Store.load.
@nwt nwt requested a review from a team September 16, 2024 18:08
@philrz
Copy link
Contributor

philrz commented Sep 16, 2024

I've confirmed that this branch seems to address the crash shown in #5261. While the repro script shown above still causes a crash on my Macbook on the first try when run on the tip of main (commit e100767 at the moment), with this branch at commit 88a9125 I can run it many times in a row without problems.

$ zed -version
Version: v1.17.0-61-g88a9125b

$ ./repro.sh 
pool created: foo5 2mANuIPAxTgVcwJaQx10oqoySXa
pool created: foo2 2mANuHnP25bigww7RAHkkbg45uj
pool created: foo1 2mANuGufNnk1QKCfdjyw8zRVZPS
pool created: foo7 2mANuLYAXBYKbNtv0JXj9wJBrWP
pool created: foo3 2mANuFpNrUwb05hbC6hernT02hI
pool created: foo6 2mANuFAZAAYEZXNTuBIu0SIjOuz
pool created: foo4 2mANuGOtAScZL9uDTtEOHpC7Yhv
pool created: foo8 2mANuKeFyIXIutsKabQf69Aj6O4
pool 2mANuGOtAScZL9uDTtEOHpC7Yhv renamed from foo4 to bar4
pool 2mANuHnP25bigww7RAHkkbg45uj renamed from foo2 to bar2
pool 2mANuFAZAAYEZXNTuBIu0SIjOuz renamed from foo6 to bar6
pool 2mANuIPAxTgVcwJaQx10oqoySXa renamed from foo5 to bar5
pool 2mANuKeFyIXIutsKabQf69Aj6O4 renamed from foo8 to bar8
pool 2mANuFpNrUwb05hbC6hernT02hI renamed from foo3 to bar3
pool 2mANuGufNnk1QKCfdjyw8zRVZPS renamed from foo1 to bar1
pool 2mANuLYAXBYKbNtv0JXj9wJBrWP renamed from foo7 to bar7
pool deleted: bar1
pool deleted: bar2
pool deleted: bar3
pool deleted: bar4
pool deleted: bar5
pool deleted: bar6
pool deleted: bar7
pool deleted: bar8

$ ./repro.sh 
pool created: foo4 2mANuZA1uUIkBQhpLrdh1fTALBO
pool created: foo3 2mANuZBrnl0Rf5VyAJue82BlJ1R
pool created: foo1 2mANuVtmDtCUtP1KMrnKjCXIPNU
pool created: foo6 2mANubHIwkfwCwVO1PbHD6XMvj0
pool created: foo5 2mANuUV0qYiDiBBBsKyLtzn8V6t
pool created: foo2 2mANuUyTiNe9AhvjXQQrms7ijhh
pool created: foo8 2mANuXXXkQyz5Oj1JgU9xiAQ2cz
pool created: foo7 2mANuYPYmDz4xDrZJkNa7npT0P9
pool 2mANuZA1uUIkBQhpLrdh1fTALBO renamed from foo4 to bar4
pool 2mANuUV0qYiDiBBBsKyLtzn8V6t renamed from foo5 to bar5
pool 2mANuYPYmDz4xDrZJkNa7npT0P9 renamed from foo7 to bar7
pool 2mANuUyTiNe9AhvjXQQrms7ijhh renamed from foo2 to bar2
pool 2mANuZBrnl0Rf5VyAJue82BlJ1R renamed from foo3 to bar3
pool 2mANuXXXkQyz5Oj1JgU9xiAQ2cz renamed from foo8 to bar8
pool 2mANuVtmDtCUtP1KMrnKjCXIPNU renamed from foo1 to bar1
pool 2mANubHIwkfwCwVO1PbHD6XMvj0 renamed from foo6 to bar6
pool deleted: bar1
pool deleted: bar2
pool deleted: bar3
pool deleted: bar4
pool deleted: bar5
pool deleted: bar6
pool deleted: bar7
pool deleted: bar8

...

@nwt nwt merged commit 53331c6 into main Sep 16, 2024
3 checks passed
@nwt nwt deleted the fix-concurrent-map-writes-in-journal.Store.load branch September 16, 2024 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zed lake service crash during concurrent pool renames
3 participants