Running silkworm commit b520fba with pre-downloaded snapshots on a 16 GB (about 13.5 GB free) Debian or Ubuntu VM with these options:
--prune=htrc
--snapshots.no_downloader # snapshots are pre-downloaded
--sentry.remote.addr=127.0.0.1:9091 # to disable sentry
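For reference, the combined invocation was essentially (binary name as reported by the OOM killer below):
silkworm --prune=htrc --snapshots.no_downloader --sentry.remote.addr=127.0.0.1:9091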
This results in a forced kill by the OOM killer:
kernel: Out of memory: Killed process 45258 (silkworm) total-vm:3843295636kB, anon-rss:15170316kB, file-rss:3068kB, shmem-rss:0kB, UID:1000 pgtables:71252kB oom_score_adj:0
This happened in 5 out of 5 tries, after about 33 minutes in the execution stage, at around block 4.5M (e.g. at blocks 4479463, 4483121, 4477025).
Lowering the execution batch size with this option sometimes helps:
--batchsize=128MB
but it still crashes most of the time (4 out of 5 tries, at blocks 2431423, 2433164, 2426881, 2433253).
Possible solutions (from easy to hard):
1. Update the README to recommend 32 GB, with 16 GB as the minimum together with the --batchsize=128MB prescription.
2. Update the default --batchsize dynamically based on available RAM (see the sketch after this list).
3. Investigate what causes the crash and why (a leak? a spike?) and propose solutions.
4. Replace --batchsize with something more meaningful to the user (e.g. total max RAM for execution).
5. Preallocate the required memory for execution on startup.
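A minimal sketch of the dynamic-default idea (option 2), assuming a Linux host; the 1/8 fraction and the 128 MiB / 512 MiB bounds are illustrative placeholders, not silkworm defaults:

```cpp
#include <unistd.h>
#include <algorithm>
#include <cstdint>

// Derive a default batch size from the physical memory currently available.
// On Linux, _SC_AVPHYS_PAGES reports free pages; using only a fraction of
// that keeps headroom for Buffer growth and the rest of the process.
std::uint64_t default_batch_size() {
    const auto page_size = static_cast<std::uint64_t>(sysconf(_SC_PAGE_SIZE));
    const auto avail_pages = static_cast<std::uint64_t>(sysconf(_SC_AVPHYS_PAGES));
    const std::uint64_t avail_bytes = page_size * avail_pages;

    constexpr std::uint64_t kMin = 128ull << 20;    // 128 MiB floor (placeholder)
    constexpr std::uint64_t kMax = 512ull << 20;    // 512 MiB ceiling (placeholder)
    return std::clamp(avail_bytes / 8, kMin, kMax); // use ~1/8 of free RAM (placeholder)
}
```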
battlmonstr changed the title from "Execution OOM crash in 16 Gb at 4.5M" to "Execution OOM crash in 16 Gb at 2-5M" on Jun 3, 2024.
--batchsize=128MB is not taken into account as it should be in standalone silkworm. The execution stage uses a heuristic formula based on block.header.gas_used, which results in a bad RAM estimate. The C API uses a different estimation method (current_batch_state_size()). This is mentioned in "execution: improve stage Execution according to C API execute functions" #2078.
Given --batchsize=128MB, the execution stage actually consumes at least 1.3 GB (including 1 GB for Buffer::accounts_ and 220 MB for Buffer::storage_). It then crashes because it needs even more RAM to continue execution. The crash correlates with Buffer::accounts_ growing (in rehash_and_grow_if_necessary()) from 1.9 GB to 3.8 GB.
flat_hash_map has a built-in internal policy (in rehash_and_grow_if_necessary()) to grow 2x when its size reaches 25/32 (~78%) of capacity (the current documentation mentions 7/8 = 87.5%, but that might refer to a newer Abseil version). IntraBlockState::objects_.size() can be used to predict whether the capacity will need to grow before a block is committed into the Buffer state, which would avoid the OOM.
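A minimal sketch of such a pre-commit check (Buffer, IntraBlockState::objects_ and the 25/32 threshold come from the analysis above; would_rehash_on_commit() itself is a hypothetical helper, not an existing silkworm function):

```cpp
#include <cstddef>

// Hypothetical helper: returns true if merging `pending_objects` new entries
// (IntraBlockState::objects_.size()) into a flat_hash_map that currently holds
// `size` entries in `capacity` slots would cross the 25/32 load-factor
// threshold and force rehash_and_grow_if_necessary() to double the table.
inline bool would_rehash_on_commit(std::size_t size,
                                   std::size_t capacity,
                                   std::size_t pending_objects) {
    const std::size_t max_load = capacity * 25 / 32;  // growth threshold observed in Abseil
    return size + pending_objects > max_load;
}
```

Running this once per block, before the touched objects are merged into Buffer::accounts_, would let the stage flush to the database instead of letting the map double from 1.9 GB to 3.8 GB mid-batch.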
The current_batch_state_size() calculation can be simplified using a formula from here.
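For illustration, one way such a formula could look, assuming Abseil's flat layout of one inline slot plus one control byte per bucket (the constants should be verified against the Abseil version actually in use):

```cpp
#include <cstddef>

// Approximate heap footprint of an absl::flat_hash_map<K, V>:
// each of the capacity() buckets holds the key/value pair inline
// plus one control byte. Small fixed overheads are ignored.
template <typename Map>
std::size_t approx_flat_hash_map_memory(const Map& map) {
    using Slot = typename Map::value_type;       // std::pair<const K, V>
    return map.capacity() * (sizeof(Slot) + 1);  // inline slots + control bytes
}
```

Summing such estimates over Buffer::accounts_ and Buffer::storage_ would track the real allocation more closely than a per-entry heuristic.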