Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in WALEntry.Encode and Values.Deduplicate #8095

Merged
merged 2 commits into from
Mar 6, 2017
Merged

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Mar 3, 2017

Under high query load, a race exists in the cache and the WAL. Since
writes currently hit the cache first, they are availble for query before
they hit the WAL. If the WAL is writing and accessign the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in. Another way to fix this might be to have the writes go
to the wal before the cache. I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.

WARNING: DATA RACE
Write at 0x00c420224010 by goroutine 33:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.Values.Deduplicate()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.gen.go:83 +0x2ae
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*entry).deduplicate()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:124 +0xda
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:472 +0x647
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildFloatCursor()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1678 +0xf0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildCursor()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1664 +0x3e2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefSeriesIterator()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1518 +0x37df
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetGroupIterators()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1491 +0x2a3
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1446 +0x195

Previous read at 0x00c420224010 by goroutine 15:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*WriteWALEntry).Encode()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/wal.go:583 +0x7da
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*WAL).writeToLog()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/wal.go:328 +0x194
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*WAL).WritePoints()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/wal.go:239 +0x90
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:713 +0xcf2
  github.com/influxdata/influxdb/tsdb.(*Shard).WritePoints()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard.go:417 +0x2cc
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrentQuery.func2()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard_test.go:550 +0x955

Goroutine 33 (running) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1447 +0x59c
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefIterator.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1359 +0x122
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefIterator()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1400 +0x306
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).CreateIterator()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1265 +0x6d2
  github.com/influxdata/influxdb/tsdb.(*Shard).CreateIterator()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard.go:697 +0x111
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrentQuery.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard_test.go:498 +0xab2

Goroutine 15 (running) created at:
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrentQuery()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/shard_test.go:579 +0x8fb
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:610 +0xc9
Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)
Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

@jwilder jwilder added this to the 1.2.1 milestone Mar 3, 2017
@jwilder jwilder requested a review from e-dard March 3, 2017 22:45
@jwilder jwilder changed the title Fix race in WALEntry.Encode and Value.Deduplicate Fix race in WALEntry.Encode and Values.Deduplicate Mar 3, 2017
jwilder added 2 commits March 6, 2017 09:38
Under high query load, a race exists in the cache and the WAL.  Since
writes currently hit the cache first, they are availble for query before
they hit the WAL.  If the WAL is writing and accessign the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in.  Another way to fix this might be to have the writes go
to the wal before the cache.  I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
There is a race where the field type can be deleted while a new type
is written and during a query.  When this happens, an iterator for
the new type is created but old data make still exist in the cache
for TSM files causing a panic.
@jwilder jwilder merged commit 80f9a29 into 1.2 Mar 6, 2017
@jwilder jwilder deleted the jw-cache-race2 branch March 6, 2017 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants