add index fail for "too many sst files are ingesting" #44137

Closed
seiya-annie opened this issue May 24, 2023 · 1 comment · Fixed by #44140
Assignees: tangenta
Labels: affects-7.1, feature/developing (the related feature is in development), severity/major, sig/sql-infra (SIG: SQL Infra), type/bug (the issue is confirmed as a bug)

Comments

@seiya-annie

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Add an index.
2. Flash back the cluster to a point in time before the add-index operation.
3. During the flashback, inject a TiDB failure and TiKV-PD latency.

2. What did you expect to see? (Required)

The index is added successfully.

3. What did you see instead? (Required)

Adding the index fails:
[2023/05/23 23:11:34.790 +00:00] [WARN] [index.go:1020] ["[ddl] run add index job failed, convert job to rollback"] [job="ID:291, Type:add index, State:running, SchemaState:write reorganization, SchemaID:89, TableID:109, RowCount:10000000, ArgLen:6, start time: 2023-05-23 22:29:45.107 +0000 UTC, Err:[Lightning:KV:ServerIsBusy]too many sst files are ingesting, ErrCount:1, SnapshotVersion:441681441038270473, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]

4. What is your TiDB version? (Required)

7.1.0

@seiya-annie added the type/bug label on May 24, 2023
@tangenta self-assigned this on May 24, 2023
@ti-chi-bot added the may-affects-5.1, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.1, and may-affects-6.5 labels on May 24, 2023
@tangenta (Contributor)

Normally, TiDB should retry even if "too many sst files are ingesting" is reported. However, there is a panic during the retry:

[2023/05/23 23:11:34.790 +00:00] [ERROR] [misc.go:116] ["panic in the recoverable goroutine"] [label=ddl] [funcInfo=onCreateIndex] [r="\"invalid memory address or nil pointer dereference\""] [stack="github.com/pingcap/tidb/util.Recover\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/misc.go:120\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(*OpenedEngine).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/backend.go:270\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:83\ngithub.com/pingcap/tidb/ddl/ingest.(*litBackendCtx).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/backend.go:163\ngithub.com/pingcap/tidb/ddl/ingest.(*CheckpointManager).Sync\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/checkpoint.go:219\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).close\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:334\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(*OpenedEngine).LocalWriter\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/backend.go:275\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).newWriterContext\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:198\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).CreateWriter\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:173\ngithub.com/pingcap/tidb/ddl.newAddIndexIngestWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1616\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).createWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:404\ngithub.com/pingcap/tidb/resourcemanager/pool/workerpool.(*WorkerPool[...]).runAWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/resourcemanager/pool/workerpool/workerpool.go:107\ngithub.com/pingcap/tidb/resourcemanager/pool/workerpool.NewWorkerPool[...]\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/resourcemanager/pool/workerpool/workerpool.go:91\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).setupWorkers\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:311\ngithub.com/pingcap/tidb/ddl.(*ddlCtx).writePhysicalTableRecord\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling.go:710\ngithub.com/pingcap/tidb/ddl.(*worker).addPhysicalTableIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1788\ngithub.com/pingcap/tidb/ddl.(*worker).addTableIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1836\ngithub.com/pingcap/tidb/ddl.runReorgJobAndHandleErr.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1004\ngithub.com/pingcap/tidb/ddl.(*worker).runReorgJob.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/reorg.go:253"]
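
For context, the expected handling can be sketched as follows (the helper below is hypothetical, not TiDB's actual retry code): TiKV reports `ServerIsBusy` when too many SST files are being ingested at once, and the DDL reorg loop should treat that as retryable backpressure rather than a reason to roll the job back.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isRetryableIngestErr is a hypothetical helper sketching the intended
// classification: "too many sst files are ingesting" is backpressure
// from TiKV's import service, so the reorg loop should back off and
// retry instead of converting the DDL job to rollback.
func isRetryableIngestErr(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "ServerIsBusy") ||
		strings.Contains(msg, "too many sst files are ingesting")
}

func main() {
	err := errors.New("[Lightning:KV:ServerIsBusy]too many sst files are ingesting")
	if isRetryableIngestErr(err) {
		fmt.Println("retryable: back off and re-run the backfill")
	} else {
		fmt.Println("not retryable: convert the job to rollback")
	}
}
```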

The panic happens because the engine is closed but is never removed from the backend context:

[2023/05/23 22:30:10.854 +00:00] [INFO] [engine.go:120] ["[ddl-ingest] flush all writer and get closed engine"] ["job ID"=291] ["index ID"=3]

As a result, the engine is not re-opened on the next retry:

https://github.com/pingcap/tidb/blob/ad0957fbc1a16d6ffdbfb75410f76c7ea508d62c/ddl/ingest/engine_mgr.go#LL46C3-L46C3
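
The failure mode can be reduced to a small self-contained sketch (all types and names below are hypothetical stand-ins, not the actual TiDB code): the engine manager caches one engine per index ID, and because a closed engine is never deleted from the cache, the retry gets the stale entry back instead of opening a fresh one, then hits the nil pointer when it uses the closed handle.

```go
package main

import "fmt"

// openedEngine stands in for lightning's backend.OpenedEngine.
type openedEngine struct{ indexID int64 }

// engineInfo wraps an opened engine, loosely modeled on ddl/ingest.
type engineInfo struct{ opened *openedEngine }

// Close flushes and closes the engine; the handle becomes unusable.
func (e *engineInfo) Close() { e.opened = nil }

// Flush assumes the engine is still open, mirroring the assumption in
// (*OpenedEngine).Flush from the stack trace above.
func (e *engineInfo) Flush() {
	fmt.Println("flushing engine for index", e.opened.indexID) // nil deref if closed
}

// engineManager caches one engineInfo per index ID across retries.
type engineManager struct{ engines map[int64]*engineInfo }

// Register returns the cached engine on a hit and only opens a new one
// on a miss. The bug: a closed engine left in the cache is handed back
// to the retry as if it were still open.
func (m *engineManager) Register(indexID int64) *engineInfo {
	if ei, ok := m.engines[indexID]; ok {
		return ei // stale, closed engine on retry
	}
	ei := &engineInfo{opened: &openedEngine{indexID: indexID}}
	m.engines[indexID] = ei
	return ei
}

func main() {
	m := &engineManager{engines: map[int64]*engineInfo{}}

	first := m.Register(3)
	first.Close() // attempt fails: engine closed but NOT removed from the cache

	retry := m.Register(3) // retry gets the stale, closed engine back
	retry.Flush()          // panics: invalid memory address or nil pointer dereference
}
```

A fix in the spirit of the description above would delete the cache entry when the engine is closed (e.g. a `delete(m.engines, indexID)` in the cleanup path of this sketch) so that the next retry opens a fresh engine; #44140 is linked as the actual fix.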

@tangenta added the feature/developing label and removed the may-affects-5.1, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.1, and may-affects-6.5 labels on May 24, 2023
@ti-chi-bot closed this as completed in adbcb4e on May 24, 2023
@jebter added the sig/sql-infra label on Jul 3, 2023