Hour reaper fail, database size grow too large #3728

Closed
ghost opened this issue Jun 29, 2021 · 3 comments · Fixed by #3777


ghost commented Jun 29, 2021

horizon config

stellar-horizon serve  --db-url postgres://horizon:password@localhost/horizon --captive-core-config-append-path /data2/xlm/stellar-captive-core-stub.toml --network-passphrase "Public Global Stellar Network ; September 2015" --ingest=true --per-hour-rate-limit 999999999 --history-retention-count 30000 --stellar-core-binary-path /usr/bin/stellar-core  --history-archive-urls https://history.stellar.org/prd/core-live/core_live_001
CATCHUP_RECENT=30000
time="2021-06-29T07:23:31.337+08:00" level=info msg="reaper: clearing" new_elder=36068890 pid=19878
time="2021-06-29T07:23:41.343+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T08:23:42.337+08:00" level=info msg="reaper: clearing" new_elder=36069553 pid=19878
time="2021-06-29T08:23:52.347+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T09:23:53.348+08:00" level=info msg="reaper: clearing" new_elder=36070216 pid=19878
time="2021-06-29T09:24:03.575+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T10:24:04.337+08:00" level=info msg="reaper: clearing" new_elder=36070878 pid=19878
time="2021-06-29T10:24:14.423+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T11:24:15.337+08:00" level=info msg="reaper: clearing" new_elder=36071543 pid=19878
time="2021-06-29T11:24:25.457+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T12:24:26.337+08:00" level=info msg="reaper: clearing" new_elder=36072206 pid=19878
time="2021-06-29T12:24:36.535+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T13:24:37.337+08:00" level=info msg="reaper: clearing" new_elder=36072870 pid=19878
time="2021-06-29T13:24:47.555+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
time="2021-06-29T14:24:48.336+08:00" level=info msg="reaper: clearing" new_elder=36073532 pid=19878
time="2021-06-29T14:24:58.340+08:00" level=error msg="reaper failed: Error clearing history_operations: canceling statement due to user request" pid=19878
ghost added the bug label Jun 29, 2021
ghost changed the title from "Hour reaper fail,database grow too large" to "Hour reaper fail, database size grow too large" Jun 29, 2021
Author

ghost commented Jun 29, 2021

Running stellar-horizon db reap manually succeeds, but it is too slow; the run below took about 34 minutes:

INFO[2021-06-29T15:09:44.356+08:00] reaper: clearing                              new_elder=36074024 pid=27711
INFO[2021-06-29T15:43:26.376+08:00] reaper succeeded                              new_elder=36074024 pid=27711


leevlad commented Jun 30, 2021

I am also seeing this exact behavior on all of my Stellar Horizon deployments.

stellar-horizon[3330]: time="2021-06-30T06:49:06.535Z" level=info msg="reaper: clearing" new_elder=36041350 pid=3330
stellar-horizon[3330]: time="2021-06-30T06:49:16.537Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330
stellar-horizon[3330]: time="2021-06-30T07:49:17.536Z" level=info msg="reaper: clearing" new_elder=36042008 pid=3330
stellar-horizon[3330]: time="2021-06-30T07:49:27.543Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330
stellar-horizon[3330]: time="2021-06-30T08:49:27.547Z" level=info msg="reaper: clearing" new_elder=36042669 pid=3330
stellar-horizon[3330]: time="2021-06-30T08:49:37.536Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330

If you look at my logs as well as logs from the user above, you see that the reaper times out after 10 seconds.

I looked around a bit, and I believe this bug was introduced in v2.3.0 here:
2348575

Hard-coding a timeout of 10 seconds makes everyone's automatic reaper time out on every run. The only exception may be those running really powerful machines where the reaper can finish within 10 seconds, and even that is not ideal: missing a single reaper tick makes the next one less likely to succeed because it has a larger data set to reap, eventually cascading into a 100% reaper failure rate. This effectively disables the HISTORY_RETENTION_COUNT configuration and causes the Horizon DB size to grow indefinitely.

Perhaps it would be better to not use a shared context for all tickers here:

func (a *App) Tick(ctx context.Context) error {
	var wg sync.WaitGroup
	log.Debug("ticking app")
	// update ledger state, operation fee state, and stellar-core info in parallel
	wg.Add(3)
	go func() { a.UpdateLedgerState(ctx); wg.Done() }()
	go func() { a.UpdateFeeStatsState(ctx); wg.Done() }()
	go func() { a.UpdateStellarCoreInfo(ctx); wg.Done() }()
	wg.Wait()
	wg.Add(2)
	go func() { a.reaper.Tick(ctx); wg.Done() }()
	go func() { a.submitter.Tick(ctx); wg.Done() }()
	wg.Wait()
	log.Debug("finished ticking app")
	return ctx.Err()
}

And instead use a separate context for the reaper ticker, with a timeout higher than 10 seconds that can also be configured via CLI/env var params?

Author

ghost commented Jul 1, 2021

Sorry, I did not find that the same issue already exists. This issue should be closed.

This issue was closed.