Deadlock when dropping measurement and writing #8713

Closed
jwilder opened this issue Aug 16, 2017 · 0 comments · Fixed by #8714

jwilder commented Aug 16, 2017

Bug report

Version: 1.3.3/master

A deadlock can occur when points are written to a measurement while that same measurement is being dropped.

The deadlock is caused by the Engine lock and the MeasurementFields lock being acquired in opposite orders. The first goroutine (snapshotting the cache) acquires an RLock on the Engine and later needs an RLock on MeasurementFields to refresh the index. A second goroutine, running a delete, acquires a Lock on MeasurementFields after the first goroutine has taken the Engine RLock. It then tries to take a Lock on the Engine to disable compactions, which blocks because the first goroutine still holds its RLock. When the first goroutine then tries to acquire the RLock on MeasurementFields, it blocks behind the second goroutine's Lock, and the two goroutines deadlock.

1 @ 0x102e3aa 0x102e48e 0x103ea41 0x103e644 0x106ee19 0x1431eb1 0x15cf7fd 0x15d2d3b 0x161dba8 0x15e655d 0x15d1de6 0x15d19da 0x15d2306 0x1612c11 0x105a931
#	0x103e643	sync.runtime_Semacquire+0x33														/usr/local/go/src/runtime/sema.go:47
#	0x106ee18	sync.(*RWMutex).RLock+0x48														/usr/local/go/src/sync/rwmutex.go:43
#	0x1431eb0	github.com/influxdata/influxdb/tsdb.(*MeasurementFieldSet).CreateFieldsIfNotExists+0x30							/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard.go:1320
#	0x15cf7fc	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).addToIndexFromKey+0xfc							/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:817
#	0x15d2d3a	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).onFileStoreReplace+0x3da							/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1287
#	0x161dba7	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).(github.com/influxdata/influxdb/tsdb/engine/tsm1.onFileStoreReplace)-fm+0x47	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:182
#	0x15e655c	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Replace+0x148c								/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:554
#	0x15d1de5	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).writeSnapshotAndCommit+0x255							/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1162
#	0x15d19d9	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WriteSnapshot+0x2f9								/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1127
#	0x15d2305	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactCache+0x295								/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1191
#	0x1612c10	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableSnapshotCompactions.func1+0x60						/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:299

1 @ 0x102e3aa 0x102e48e 0x103ea41 0x103e644 0x106ef3e 0x15cab7e 0x15d0e45 0x15d0997 0x15d1306 0x1613bf5 0x14321d9 0x15d11ac 0x142a68d 0x143e654 0x143e788 0x105a931
#	0x103e643	sync.runtime_Semacquire+0x33								/usr/local/go/src/runtime/sema.go:47
#	0x106ef3d	sync.(*RWMutex).Lock+0x6d								/usr/local/go/src/sync/rwmutex.go:91
#	0x15cab7d	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableLevelCompactions+0x3d	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:228
#	0x15d0e44	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRange+0x484	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1018
#	0x15d0996	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeries+0x66		/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:930
#	0x15d1305	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteMeasurement+0xd5	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1045
#	0x1613bf4	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteMeasurement.func1+0x44	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1030
#	0x14321d8	github.com/influxdata/influxdb/tsdb.(*MeasurementFieldSet).DeleteWithLock+0x78		/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard.go:1350
#	0x15d11ab	github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteMeasurement+0x13b	/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1031
#	0x142a68c	github.com/influxdata/influxdb/tsdb.(*Shard).DeleteMeasurement+0x8c			/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard.go:504
#	0x143e653	github.com/influxdata/influxdb/tsdb.(*Store).DeleteMeasurement.func1+0xe3		/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/store.go:581
#	0x143e787	github.com/influxdata/influxdb/tsdb.(*Store).walkShards.func1+0x47			/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/store.go:630

jwilder self-assigned this Aug 16, 2017
jwilder added a commit that referenced this issue Aug 16, 2017
The OnReplace func ends up trying to acquire locks on MeasurementFields.  When
it's called via snapshotting, this can deadlock because the snapshotting goroutine
also holds an RLock on the engine.  If a delete measurement call runs at the
right time, it will lock the MeasurementFields and try to acquire a lock on the engine
to disable compactions.  This creates a deadlock.

To fix this, the OnReplace callback is moved to a function param so that only Replace
calls made as part of a compaction invoke it, as opposed to both snapshotting and compactions.

Fixes #8713
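The fix described in the commit message can be sketched as follows. All names here (`fileStore`, `replace`, `engine`, `writeSnapshot`, `compact`) are hypothetical simplifications rather than the actual tsm1 API, and although `onFileStoreReplace` echoes the stack trace, its body is invented for illustration. The point is only that the replace callback becomes a per-call argument: the snapshot path passes `nil`, so it never reaches the MeasurementFields lock while holding the Engine RLock, and the compaction path still refreshes the index.

```go
package main

import (
	"fmt"
	"sync"
)

// fileStore is a hypothetical, heavily simplified stand-in for tsm1.FileStore.
type fileStore struct {
	mu    sync.RWMutex
	files []string
}

// replace swaps oldFiles for newFiles. The callback is a parameter rather
// than a field set once on the store, so each caller decides whether it runs.
// A nil callback is skipped.
func (f *fileStore) replace(oldFiles, newFiles []string, onReplace func(newFiles []string)) {
	f.mu.Lock()
	remove := make(map[string]bool, len(oldFiles))
	for _, name := range oldFiles {
		remove[name] = true
	}
	var kept []string
	for _, name := range f.files {
		if !remove[name] {
			kept = append(kept, name)
		}
	}
	f.files = append(kept, newFiles...)
	f.mu.Unlock()

	if onReplace != nil {
		onReplace(newFiles)
	}
}

// engine is a hypothetical stand-in; indexed counts files whose keys would
// have been re-added to the index (the step that needs MeasurementFields).
type engine struct {
	fs      *fileStore
	indexed int
}

// writeSnapshot passes no callback, so it never touches the index.
func (e *engine) writeSnapshot(newFiles []string) {
	e.fs.replace(nil, newFiles, nil)
}

// compact passes the index-refresh callback.
func (e *engine) compact(oldFiles, newFiles []string) {
	e.fs.replace(oldFiles, newFiles, e.onFileStoreReplace)
}

func (e *engine) onFileStoreReplace(newFiles []string) {
	e.indexed += len(newFiles)
}

func main() {
	e := &engine{fs: &fileStore{}}
	e.writeSnapshot([]string{"snap-1.tsm"})
	e.compact([]string{"snap-1.tsm"}, []string{"compact-1.tsm"})
	fmt.Println(e.fs.files, e.indexed) // prints [compact-1.tsm] 1
}
```

Passing the callback per call keeps the decision at the call site, where it is visible which locks the caller already holds.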
ghost removed the ready label Aug 16, 2017
jwilder added a commit that referenced this issue Aug 16, 2017