add index fail for "too many sst files are ingesting" #44137

Closed
seiya-annie opened this issue May 24, 2023 · 1 comment · Fixed by #44140
Assignees: tangenta
Labels: affects-7.1, feature/developing (the related feature is in development), severity/major, sig/sql-infra (SIG: SQL Infra), type/bug (the issue is confirmed as a bug)

Comments

@seiya-annie

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Add an index.
2. Flash back the cluster to a point in time before the add-index operation.
3. During the flashback, inject a TiDB failure and TiKV-PD latency.

2. What did you expect to see? (Required)

The index is added successfully.

3. What did you see instead? (Required)

Adding the index fails:
[2023/05/23 23:11:34.790 +00:00] [WARN] [index.go:1020] ["[ddl] run add index job failed, convert job to rollback"] [job="ID:291, Type:add index, State:running, SchemaState:write reorganization, SchemaID:89, TableID:109, RowCount:10000000, ArgLen:6, start time: 2023-05-23 22:29:45.107 +0000 UTC, Err:[Lightning:KV:ServerIsBusy]too many sst files are ingesting, ErrCount:1, SnapshotVersion:441681441038270473, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]

4. What is your TiDB version? (Required)

7.1.0

@seiya-annie added the type/bug label on May 24, 2023
@tangenta self-assigned this on May 24, 2023
@ti-chi-bot added the may-affects-5.1, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.1, and may-affects-6.5 labels on May 24, 2023
@tangenta (Contributor)

Normally, TiDB should retry even if "too many sst files are ingesting" is reported. However, there is a panic during the retry:

[2023/05/23 23:11:34.790 +00:00] [ERROR] [misc.go:116] ["panic in the recoverable goroutine"] [label=ddl] [funcInfo=onCreateIndex] [r="\"invalid memory address or nil pointer dereference\""] [stack="github.com/pingcap/tidb/util.Recover\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/misc.go:120\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(*OpenedEngine).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/backend.go:270\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:83\ngithub.com/pingcap/tidb/ddl/ingest.(*litBackendCtx).Flush\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/backend.go:163\ngithub.com/pingcap/tidb/ddl/ingest.(*CheckpointManager).Sync\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/checkpoint.go:219\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).close\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:334\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(*OpenedEngine).LocalWriter\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/backend.go:275\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).newWriterContext\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:198\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).CreateWriter\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:173\ngithub.com/pingcap/tidb/ddl.newAddIndexIngestWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1616\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).createWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:404\ngithub.com/pingcap/tidb/resourcemanager/pool/workerpool.(*WorkerPool[...]).runAWorker\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/resourcemanager/pool/workerpool/workerpool.go:107\ngithub.com/pingcap/tidb/resourcemanager/pool/workerpool.NewWorkerPool[...]\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/resourcemanager/pool/workerpool/workerpool.go:91\ngithub.com/pingcap/tidb/ddl.(*ingestBackfillScheduler).setupWorkers\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling_scheduler.go:311\ngithub.com/pingcap/tidb/ddl.(*ddlCtx).writePhysicalTableRecord\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/backfilling.go:710\ngithub.com/pingcap/tidb/ddl.(*worker).addPhysicalTableIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1788\ngithub.com/pingcap/tidb/ddl.(*worker).addTableIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1836\ngithub.com/pingcap/tidb/ddl.runReorgJobAndHandleErr.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:1004\ngithub.com/pingcap/tidb/ddl.(*worker).runReorgJob.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/reorg.go:253"]
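
For context, the expected handling can be sketched as follows (the helper below is hypothetical, not TiDB's actual retry code): TiKV reports `ServerIsBusy` when too many SST files are being ingested at once, and the DDL reorg loop should treat that as retryable backpressure rather than a reason to roll the job back.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isRetryableIngestErr is a hypothetical helper sketching the intended
// classification: "too many sst files are ingesting" is backpressure
// from TiKV's import service, so the reorg loop should back off and
// retry instead of converting the DDL job to rollback.
func isRetryableIngestErr(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "ServerIsBusy") ||
		strings.Contains(msg, "too many sst files are ingesting")
}

func main() {
	err := errors.New("[Lightning:KV:ServerIsBusy]too many sst files are ingesting")
	if isRetryableIngestErr(err) {
		fmt.Println("retryable: back off and re-run the backfill")
	} else {
		fmt.Println("not retryable: convert the job to rollback")
	}
}
```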

The panic happens because the engine is closed but is never removed from the backend context:

[2023/05/23 22:30:10.854 +00:00] [INFO] [engine.go:120] ["[ddl-ingest] flush all writer and get closed engine"] ["job ID"=291] ["index ID"=3]

As a result, the engine is not re-opened on the next retry:

https://github.com/pingcap/tidb/blob/ad0957fbc1a16d6ffdbfb75410f76c7ea508d62c/ddl/ingest/engine_mgr.go#LL46C3-L46C3
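
The failure mode can be reduced to a small self-contained sketch (all types and names below are hypothetical stand-ins, not the actual TiDB code): the engine manager caches one engine per index ID, and because a closed engine is never deleted from the cache, the retry gets the stale entry back instead of opening a fresh one, then hits the nil pointer when it uses the closed handle.

```go
package main

import "fmt"

// openedEngine stands in for lightning's backend.OpenedEngine.
type openedEngine struct{ indexID int64 }

// engineInfo wraps an opened engine, loosely modeled on ddl/ingest.
type engineInfo struct{ opened *openedEngine }

// Close flushes and closes the engine; the handle becomes unusable.
func (e *engineInfo) Close() { e.opened = nil }

// Flush assumes the engine is still open, mirroring the assumption in
// (*OpenedEngine).Flush from the stack trace above.
func (e *engineInfo) Flush() {
	fmt.Println("flushing engine for index", e.opened.indexID) // nil deref if closed
}

// engineManager caches one engineInfo per index ID across retries.
type engineManager struct{ engines map[int64]*engineInfo }

// Register returns the cached engine on a hit and only opens a new one
// on a miss. The bug: a closed engine left in the cache is handed back
// to the retry as if it were still open.
func (m *engineManager) Register(indexID int64) *engineInfo {
	if ei, ok := m.engines[indexID]; ok {
		return ei // stale, closed engine on retry
	}
	ei := &engineInfo{opened: &openedEngine{indexID: indexID}}
	m.engines[indexID] = ei
	return ei
}

func main() {
	m := &engineManager{engines: map[int64]*engineInfo{}}

	first := m.Register(3)
	first.Close() // attempt fails: engine closed but NOT removed from the cache

	retry := m.Register(3) // retry gets the stale, closed engine back
	retry.Flush()          // panics: invalid memory address or nil pointer dereference
}
```

A fix in the spirit of the description above would delete the cache entry when the engine is closed (e.g. a `delete(m.engines, indexID)` in the cleanup path of this sketch) so that the next retry opens a fresh engine; #44140 is linked as the actual fix.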

@tangenta added the feature/developing label and removed the may-affects-5.1, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.1, and may-affects-6.5 labels on May 24, 2023
@ti-chi-bot closed this as completed in adbcb4e on May 24, 2023
@jebter added the sig/sql-infra label on Jul 3, 2023