Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stellar Horizon: select failed: sql: Scan error on column index 21, name \"ledger_close_time\": unsupported Scan #3751

Closed
leevlad opened this issue Jul 8, 2021 · 8 comments
Assignees
Labels

Comments

@leevlad
Copy link

leevlad commented Jul 8, 2021

What version are you using?

horizon_build_info{goversion="go1.16.5",version="2.5.2-739936fd2f12260b7d2e99e240b6e3bdda16a562"} 1

What did you do?

Requested /accounts/<public_address>/payments?limit=200&order=desc&join=transactions from Stellar Horizon via the HTTP API.

What did you expect to see?

A 200 response or a 404 response.

What did you see instead?

curl -i localhost:80/accounts/<redacted>/payments?limit=200&order=desc&join=transactions
HTTP/1.1 500 Internal Server Error
Server: nginx/1.14.0 (Ubuntu)
Date: Thu, 08 Jul 2021 21:32:12 GMT
Content-Type: application/problem+json; charset=utf-8
Content-Length: 354
Connection: keep-alive
Cache-Control: no-cache, no-store, max-age=0
Vary: Origin
X-Ratelimit-Limit: 101
X-Ratelimit-Remaining: 100
X-Ratelimit-Reset: 1

{
  "type": "https://stellar.org/horizon-errors/server_error",
  "title": "Internal Server Error",
  "status": 500,
  "detail": "An error occurred while processing this request.  This is usually due to a bug within the server software.  Trying this request again may succeed if the bug is transient. Otherwise, please contact the system administrator."
}

Logs

The following is in the Stellar Horizon logs from this request:

time="2021-07-08T21:34:03.923Z" level=error msg="select failed: sql: Scan error on column index 21, name \"ledger_close_time\": unsupported Scan, storing driver.Value type <nil> into type *time.Time" pid=9663 req=<redacted>/EiyHFDKXU8-000020 stack="[main.go:43 session.go:350 session.go:318 metrics.go:374 transaction.go:43 operation.go:306 operation.go:124 handler.go:199 handler.go:270 middleware.go:259 server.go:2069 chain.go:31 mux.go:425 server.go:2069 http.go:73 server.go:2069 cors.go:190 server.go:2069 compress.go:190 server.go:2069 middleware.go:228 server.go:2069 middleware.go:111 server.go:2069 middleware.go:84 server.go:2069 xff_middleware.go:51 server.go:2069 middleware.go:48 server.go:2069 request_id.go:76 server.go:2069]"

time="2021-07-08T21:34:03.923Z" level=info msg="Finished request" app_name=undefined app_version=undefined bytes=354 client_name=undefined client_version=undefined duration=0.122165316 host=localhost ip=127.0.0.1 ip_port="127.0.0.1:42786" method=GET path="/accounts/<public_address>/payments?limit=200&order=desc&join=transactions" pid=9663 referer=undefined req=<undefined>/EiyHFDKXU8-000020 route="/accounts/{account_id}/payments" status=500 streaming=false x_forwarder_for=

Stellar Horizon Config

# /etc/default/stellar-horizon

NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"
DATABASE_URL="dbname=horizon user=stellar host=/var/run/postgresql"
STELLAR_CORE_DATABASE_URL="dbname=stellar user=stellar host=/var/run/postgresql"
STELLAR_CORE_URL="http://127.0.0.1:11626"
FRIENDBOT_SECRET=
PORT=8000
ADMIN_PORT=8001
SENTRY_DSN=
LOGGLY_TOKEN=
PER_HOUR_RATE_LIMIT=9999999
INGEST=true
HISTORY_RETENTION_COUNT=100000
APPLY_MIGRATIONS=true
HISTORY_ARCHIVE_URLS="https://history.stellar.org/prd/core-live/core_live_001,https://history.stellar.org/prd/core-live/core_live_002,https://history.stellar.org/prd/core-live/core_live_003"
ENABLE_CAPTIVE_CORE_INGESTION=false
LOG_FILE=/var/log/stellar/stellar-horizon.log

Additional Context

I am running several nodes with Stellar-Core and Stellar-Horizon on each of them, all running on identical machines with identical software environments (Ubuntu 18.04). Starting yesterday I began seeing one of my nodes throwing 500 errors when fetching account payments via the API. This same call returns 200-level success response on the other nodes.

This issue has not resolved itself after 2 days.

@leevlad leevlad added the bug label Jul 8, 2021
@leevlad
Copy link
Author

leevlad commented Jul 8, 2021

I noticed that I was on a slightly outdated version of stellar-core: v17.1.0. After upgrading to v17.2.0 the issue persists.

@bartekn bartekn self-assigned this Jul 9, 2021
@bartekn
Copy link
Contributor

bartekn commented Jul 9, 2021

Thanks for the report. I think your DB got corrupted somehow. This can only happen when there's no row with ledger data in history_ledgers corresponding to the row in history_transactions. This shouldn't happen normally because even if Horizon reaps old ledgers it removes all data associated with a given ledger. I'm inspecting the code now and I noticed that we don't do that in a DB transaction so it could happen if Horizon was stopped in a middle of the reap process. Could you provide some logs with reaper keyword?

@leevlad
Copy link
Author

leevlad commented Jul 9, 2021

Thanks for looking into it @bartekn.

I think your analysis might be correct because in fact my reaper process never succeeds. I have reported this here: #3728 (comment)

To mirror the logs from the comment mentioned above, here is my reaper log from a few days ago:

stellar-horizon[3330]: time="2021-06-30T06:49:06.535Z" level=info msg="reaper: clearing" new_elder=36041350 pid=3330
stellar-horizon[3330]: time="2021-06-30T06:49:16.537Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330
stellar-horizon[3330]: time="2021-06-30T07:49:17.536Z" level=info msg="reaper: clearing" new_elder=36042008 pid=3330
stellar-horizon[3330]: time="2021-06-30T07:49:27.543Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330
stellar-horizon[3330]: time="2021-06-30T08:49:27.547Z" level=info msg="reaper: clearing" new_elder=36042669 pid=3330
stellar-horizon[3330]: time="2021-06-30T08:49:37.536Z" level=error msg="reaper failed: Error clearing history_effects: canceling statement due to user request" pid=3330

@leevlad
Copy link
Author

leevlad commented Jul 9, 2021

Presumably disabling the reaper, or at least allowing it to succeed within 10s will help mitigate this issue for the time being. Is there a flag we can set to do that? Or should I just increase --history-retention-count to something very large?

@bartekn
Copy link
Contributor

bartekn commented Jul 9, 2021

@leevlad to mitigate this I think you can remove HISTORY_RETENTION_COUNT and then run reaper process (stellar-horizon db reap) in a cron job every hour. We should have a fix for this in the next or the following release.

@ire-and-curses
Copy link
Contributor

Great bug report! Thanks for this @leevlad

@bartekn
Copy link
Contributor

bartekn commented Jul 27, 2021

This is fixed by #3777.

@bartekn bartekn closed this as completed Jul 27, 2021
@gituser
Copy link

gituser commented Aug 5, 2021

I've got same issue.

I ended up nuking whole horizon DB as reap process was taking forever and starting from scratch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants