Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Influxd: Compaction-related panic on HEAD-2016/04/26 #6595

Closed
rw opened this issue May 10, 2016 · 13 comments
Closed

Influxd: Compaction-related panic on HEAD-2016/04/26 #6595

rw opened this issue May 10, 2016 · 13 comments
Assignees
Milestone

Comments

@rw
Copy link
Contributor

rw commented May 10, 2016

During a compaction run (with the compaction intervals in the config file set to 5s), influxd panic'ed. cc @toddboom @jwilder

$ ./influxd run --config=influxdb.conf

 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

[run] 2016/05/10 16:13:04 InfluxDB starting, version 0.10.0, branch master, commit 662ead753b56dacee72f67d093d920729f73a3da
[run] 2016/05/10 16:13:04 Go version go1.6.2, GOMAXPROCS set to 8
[run] 2016/05/10 16:13:04 Using configuration at: influxdb.conf
[store]2016/05/10 16:13:04 Using data dir: /mnt/nvme/influxdb_data/db/data
[tsm1wal] 2016/05/10 16:13:04 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/10 16:13:04 tsm1 WAL writing to /mnt/nvme/influxdb_data/db/wal/_internal/monitor/1
[tsm1wal] 2016/05/10 16:13:04 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/10 16:13:04 tsm1 WAL writing to /mnt/nvme/influxdb_data/db/wal/benchmark_db/default/2
[cacheloader] 2016/05/10 16:13:04 reading file /mnt/nvme/influxdb_data/db/wal/_internal/monitor/1/_00001.wal, size 55128
[filestore] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000001.tsm (#4) opened in 5.758746ms
[filestore] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000072-000000003.tsm (#3) opened in 5.034206ms
[filestore] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000068-000000003.tsm (#2) opened in 5.738256ms
[filestore] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000064-000000004.tsm (#1) opened in 5.708621ms
[filestore] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000048-000000005.tsm (#0) opened in 7.144912ms
[cacheloader] 2016/05/10 16:13:04 reading file /mnt/nvme/influxdb_data/db/wal/benchmark_db/default/2/_00343.wal, size 10871632
[cacheloader] 2016/05/10 16:13:04 reading file /mnt/nvme/influxdb_data/db/wal/_internal/monitor/1/_00002.wal, size 0
[tsm1] 2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/_internal/monitor/1 database index loaded in 2.680021ms
[store]2016/05/10 16:13:04 /mnt/nvme/influxdb_data/db/data/_internal/monitor/1 opened in 20.046772ms
[cacheloader] 2016/05/10 16:13:04 reading file /mnt/nvme/influxdb_data/db/wal/benchmark_db/default/2/_00344.wal, size 722946
[cacheloader] 2016/05/10 16:13:04 reading file /mnt/nvme/influxdb_data/db/wal/benchmark_db/default/2/_00345.wal, size 0
[tsm1] 2016/05/10 16:13:05 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2 database index loaded in 527.878343ms
[store]2016/05/10 16:13:05 /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2 opened in 959.814821ms
[subscriber] 2016/05/10 16:13:05 opened service
[monitor] 2016/05/10 16:13:05 Starting monitor system
[monitor] 2016/05/10 16:13:05 'build' registered for diagnostics monitoring
[monitor] 2016/05/10 16:13:05 'runtime' registered for diagnostics monitoring
[monitor] 2016/05/10 16:13:05 'network' registered for diagnostics monitoring
[monitor] 2016/05/10 16:13:05 'system' registered for diagnostics monitoring
[cluster] 2016/05/10 16:13:05 Starting cluster service
[shard-precreation] 2016/05/10 16:13:05 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2016/05/10 16:13:05 Starting snapshot service
[copier] 2016/05/10 16:13:05 Starting copier service
[admin] 2016/05/10 16:13:05 Starting admin service
[monitor] 2016/05/10 16:13:05 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
[admin] 2016/05/10 16:13:05 Listening on HTTP: [::]:8083
[continuous_querier] 2016/05/10 16:13:05 Starting continuous query service
[httpd]2016/05/10 16:13:05 Starting HTTP service
[httpd]2016/05/10 16:13:05 Authentication enabled: false
[httpd]2016/05/10 16:13:05 Listening on HTTP: [::]:8086
[retention] 2016/05/10 16:13:05 Starting retention policy enforcement service with check interval of 30m0s
[run] 2016/05/10 16:13:05 Listening for signals
2016/05/10 16:13:05 Sending anonymous usage statistics to m.influxdb.com
[tsm1] 2016/05/10 16:13:09 beginning full compaction of group 0, 5 TSM files
[tsm1] 2016/05/10 16:13:09 compacting full group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000048-000000005.tsm (#0)
[tsm1] 2016/05/10 16:13:09 compacting full group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000064-000000004.tsm (#1)
[tsm1] 2016/05/10 16:13:09 compacting full group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000068-000000003.tsm (#2)
[tsm1] 2016/05/10 16:13:09 compacting full group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000072-000000003.tsm (#3)
[tsm1] 2016/05/10 16:13:09 compacting full group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000001.tsm (#4)
[tsm1] 2016/05/10 16:13:10 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:10 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:10 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000075-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:11 compacted level 1 group (0) into /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000075-000000002.tsm.tmp (#0)
[tsm1] 2016/05/10 16:13:11 compacted level 1 group 0 of 2 files into 1 files in 61.043111ms
[tsm1] 2016/05/10 16:13:21 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:21 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000001-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:21 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:21 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:21 compacting full group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000001-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:21 compacting full group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:21 compacted level 1 group (0) into /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm.tmp (#0)
[tsm1] 2016/05/10 16:13:21 compacted level 1 group 0 of 2 files into 1 files in 2.402858ms
[tsm1] 2016/05/10 16:13:21 error replacing new TSM files: rename /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm.tmp /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm: no such file or directory
[tsm1] 2016/05/10 16:13:32 compacted full group (0) into /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000002.tsm.tmp (#0)
[tsm1] 2016/05/10 16:13:32 compacted full 5 files into 1 files in 22.734636333s
[tsm1] 2016/05/10 16:13:32 beginning level 2 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:32 compacting level 2 group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000002.tsm (#0)
[tsm1] 2016/05/10 16:13:32 compacting level 2 group (0) /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000075-000000002.tsm (#1)
[tsm1] 2016/05/10 16:13:41 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:41 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000003-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:41 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000004-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:41 compacted level 1 group (0) into /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000004-000000002.tsm.tmp (#0)
[tsm1] 2016/05/10 16:13:41 compacted level 1 group 0 of 2 files into 1 files in 5.079504ms
[tsm1] 2016/05/10 16:13:42 beginning level 2 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:42 compacting level 2 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm (#0)
[tsm1] 2016/05/10 16:13:42 compacting level 2 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000004-000000002.tsm (#1)
panic: runtime error: slice bounds out of range

goroutine 98 [running]:
panic(0xb4a800, 0xc82000e090)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*indirectIndex).UnmarshalBinary(0xc823764080, 0x7fb7df834a15, 0x2b90, 0x2b98, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:671 +0x515
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*mmapAccessor).init(0xc824c3c140, 0x0, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:981 +0x569
github.com/influxdata/influxdb/tsdb/engine/tsm1.NewTSMReaderWithOptions(0x0, 0x0, 0xc820960010, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:157 +0x2df
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0xc8201cc090, 0xc820080701, 0xc824b40220, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:459 +0x306
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFast(0xc8201cc090, 0xc824b40220, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:486 +0x71
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMLevel.func1(0xc820074400, 0xc82011b600, 0x2, 0x1, 0x0, 0xc824b40220, 0x2, 0x2)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:577 +0x5fd
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMLevel
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:603 +0x265

@rw
Copy link
Contributor Author

rw commented May 10, 2016

Also the database is corrupted. When restarting influxd this happens:

$ ./influxd run --config=influxdb.conf

 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

[run] 2016/05/10 16:20:43 InfluxDB starting, version 0.10.0, branch master, commit 662ead753b56dacee72f67d093d920729f73a3da
[run] 2016/05/10 16:20:43 Go version go1.6.2, GOMAXPROCS set to 8
[run] 2016/05/10 16:20:43 Using configuration at: influxdb.conf
[store]2016/05/10 16:20:43 Using data dir: /mnt/nvme/influxdb_data/db/data
[tsm1wal] 2016/05/10 16:20:43 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/10 16:20:43 tsm1 WAL writing to /mnt/nvme/influxdb_data/db/wal/_internal/monitor/1
[tsm1wal] 2016/05/10 16:20:43 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/10 16:20:43 tsm1 WAL writing to /mnt/nvme/influxdb_data/db/wal/benchmark_db/default/2
[filestore] 2016/05/10 16:20:43 /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000004-000000002.tsm (#1) opened in 109.476µs
panic: runtime error: slice bounds out of range

goroutine 21 [running]:
panic(0xb4a800, 0xc82000e090)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*indirectIndex).UnmarshalBinary(0xc8201f2000, 0x7f49f5913a15, 0x2b90, 0x2b98, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:671 +0x515
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*mmapAccessor).init(0xc8201e0050, 0x0, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:981 +0x569
github.com/influxdata/influxdb/tsdb/engine/tsm1.NewTSMReaderWithOptions(0x0, 0x0, 0xc82007e080, 0x0, 0x0, 0x0)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:157 +0x2df
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Open.func1(0xc82009c380, 0xc8200725a0, 0x0, 0xc82007e080)
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:325 +0x9b
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Open
        /Users/todd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:335 +0x749

@jwilder
Copy link
Contributor

jwilder commented May 11, 2016

@rw Which config setting was set to 5s? Was it cache-snapshot-write-cold-duration or compact-full-write-cold-duration? If it was the latter and you were still writing to the shard, you will see errors like the one below.

[tsm1] 2016/05/10 16:13:21 error replacing new TSM files: rename /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm.tmp /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm: no such file or directory

From the logs just above that:

[tsm1] 2016/05/10 16:13:21 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:21 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000001-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:21 compacting level 1 group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:21 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/05/10 16:13:21 compacting full group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000001-000000001.tsm (#0)
[tsm1] 2016/05/10 16:13:21 compacting full group (0) /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000001.tsm (#1)
[tsm1] 2016/05/10 16:13:21 compacted level 1 group (0) into /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm.tmp (#0)
[tsm1] 2016/05/10 16:13:21 compacted level 1 group 0 of 2 files into 1 files in 2.402858ms
[tsm1] 2016/05/10 16:13:21 error replacing new TSM files: rename /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm.tmp /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/000000002-000000002.tsm: no such file or directory
[tsm1] 2016/05/10 16:13:32 compacted full group (0) into /mnt/nvme/influxdb_data/db/data/benchmark_db/default/2/000000073-000000002.tsm.tmp (#0)

That looks like a full compaction ran right about the same time as a level 1 compaction did and they both tried to compact the same files into the same destination file causing the panic. I've seen this before, but have not seen it create a TSM file that failed to open.

@jwilder jwilder added this to the 1.0.0 milestone May 11, 2016
@toddboom
Copy link
Contributor

@jwilder I'll let Robert confirm, but I'm pretty sure this was after writing to the server had stopped.

@jwilder
Copy link
Contributor

jwilder commented May 11, 2016

@toddboom Looks like it's the _internal DB that had the issue. That will write every 10s unless storage is disabled.

@rw
Copy link
Contributor Author

rw commented May 11, 2016

@jwilder I set both of those parameters actually :-) To 10s IIRC. Possibly a race condition with the _internal writes that happen every 10s?

@rw
Copy link
Contributor Author

rw commented May 11, 2016

@jwilder Also, no writes happened during those runs of the database.

@jwilder
Copy link
Contributor

jwilder commented May 11, 2016

@rw You can probably move /mnt/nvme/influxdb_data/db/data/_internal/monitor/1/ out of way and the DB should start up again.

@julienmathevet
Copy link

julienmathevet commented May 13, 2016

I don't know if it's related. I have also compactor issue, and influxdb won't restart:

[run] 2016/05/13 08:32:20 InfluxDB starting, version 0.13.0~rc2, branch master, commit 590b3f9fc76904c0035ef1e9731ad1c27903ef76
[run] 2016/05/13 08:32:20 Go version go1.6.2, GOMAXPROCS set to 24
[run] 2016/05/13 08:32:20 Using configuration at: /etc/influxdb.toml
[store] 2016/05/13 08:32:20 Using data dir: /data/data
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL writing to /data/wal/_internal/monitor/1
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL writing to /data/wal/_internal/monitor/3
[cacheloader] 2016/05/13 08:32:20 reading file /data/wal/_internal/monitor/3/_00001.wal, size 1700879
[filestore] 2016/05/13 08:32:20 /data/data/_internal/monitor/1/000000001-000000001.tsm (#0) opened in 122.131µs
[cacheloader] 2016/05/13 08:32:20 reading file /data/wal/_internal/monitor/1/_00004.wal, size 0
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/05/13 08:32:20 tsm1 WAL writing to /data/wal/results/default/2
[shard] 2016/05/13 08:32:20 /data/data/_internal/monitor/1 database index loaded in 4.051656ms
[store] 2016/05/13 08:32:20 /data/data/_internal/monitor/1 opened in 6.408679ms
[filestore] 2016/05/13 08:32:20 /data/data/results/default/2/000000032-000000004.tsm (#1) opened in 27.552789ms
[filestore] 2016/05/13 08:32:20 /data/data/results/default/2/000000016-000000004.tsm (#0) opened in 29.453147ms
[cacheloader] 2016/05/13 08:32:20 reading file /data/wal/results/default/2/_00065.wal, size 56193
[cacheloader] 2016/05/13 08:32:20 reading file /data/wal/results/default/2/_00066.wal, size 0
[tsm1] 2016/05/13 08:32:20 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/05/13 08:32:20 compacting full group (0) /data/data/results/default/2/000000016-000000004.tsm (#0)
[tsm1] 2016/05/13 08:32:20 compacting full group (0) /data/data/results/default/2/000000032-000000004.tsm (#1)
[cacheloader] 2016/05/13 08:32:20 reading file /data/wal/_internal/monitor/3/_00002.wal, size 0
[shard] 2016/05/13 08:32:20 /data/data/_internal/monitor/3 database index loaded in 862.128µs
[store] 2016/05/13 08:32:20 /data/data/_internal/monitor/3 opened in 182.801127ms
panic: interface conversion: tsm1.Value is *tsm1.StringValue, not *tsm1.FloatValue

goroutine 35 [running]:
panic(0xb68180, 0xc821cd7ec0)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/influxdb/tsdb/engine/tsm1.encodeFloatBlock(0x0, 0x0, 0x0, 0xc8223d2500, 0x3e8, 0x450, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:367 +0x8cd
github.com/influxdata/influxdb/tsdb/engine/tsm1.Values.Encode(0xc8223d2500, 0x3e8, 0x450, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:193 +0x408
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).chunk(0xc8200e2300, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:974 +0xe3
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).combine(0xc8200e2300, 0xc82003a300, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:965 +0xd0a
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).merge(0xc8200e2300)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:845 +0x13f
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).Next(0xc8200e2300, 0xc800437cff)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:813 +0xbe5
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0xc82019b530, 0xc8206e2100, 0x38, 0x7f2a2fbfe798, 0xc8200e2300, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:560 +0x53b
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0xc82019b530, 0x20, 0x5, 0x7f2a2fbfe798, 0xc8200e2300, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:507 +0x3a1
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0xc82019b530, 0xc82000db00, 0xc8208b41c0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:473 +0x4fe
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0xc82019b530, 0xc8208b41c0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:478 +0x71
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull.func1(0xc820218e70, 0xc82011fad0, 0x0, 0xc8208b41c0, 0x2, 0x2)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:754 +0x4c6
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:772 +0x27e

@daviesalex
Copy link
Contributor

daviesalex commented May 16, 2016

(replying to comment in # #6595)

FWIW, we got this moving from 0.12.1-1 to 0.14.0~n201605160800. Further up (per @jwilder) we have this in the log:

[tsm1] 2016/05/16 16:38:25 beginning full compaction of group 0, 2 TSM files

Full log at https://gist.github.com/daviesalex/36f3fe30a433dd6a27b2f60cf1598e7f

Panic for us:

[retention] 2016/05/16 16:40:58 Starting retention policy enforcement service with check interval of 10m0s
[udp] 2016/05/16 16:40:58 Started listening on UDP: :8089
[udp] 2016/05/16 16:40:58 Started listening on UDP: :8090
[run] 2016/05/16 16:40:58 Listening for signals
panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.IntegerValue

goroutine 1852 [running]:
panic(0xb76bc0, 0xc936def740)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/influxdb/tsdb/engine/tsm1.encodeIntegerBlock(0x0, 0x0, 0x0, 0xc95ec7c000, 0x2ff, 0x400, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:771 +0x72b
github.com/influxdata/influxdb/tsdb/engine/tsm1.Values.Encode(0xc95ec7c000, 0x2ff, 0x400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:195 +0x22c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).chunk(0xc8ca12ef00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:992 +0x4af
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).combine(0xc8ca12ef00, 0xc986cd9400, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:965 +0xd0a
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).merge(0xc8ca12ef00)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:845 +0x13f
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).Next(0xc8ca12ef00, 0xc83e5d0531)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:813 +0xbe5
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0xc822aa8ab0, 0xc926539630, 0x4a, 0x7fe690183820, 0xc8ca12ef00, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:560 +0x53b
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0xc822aa8ab0, 0x212, 0x5, 0x7fe690183820, 0xc8ca12ef00, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:507 +0x3a1
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0xc822aa8ab0, 0xc8222e2500, 0xc9a352dfc0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:473 +0x4fe
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0xc822aa8ab0, 0xc9a352dfc0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:478 +0x71
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull.func1(0xc966b63f60, 0xc820124580, 0x0, 0xc9a352dfc0, 0x2, 0x2)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:760 +0x4c6
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:778 +0x27e

I am testing a downgrade from 0.14.0~n201605160800 to 0.13 stable. Thats starting now, will report back...

@jwilder
Copy link
Contributor

jwilder commented May 16, 2016

@daviesalex Would you be able to send me those two files?

[tsm1] 2016/05/16 16:38:25 compacting full group (0) /data/influxdb-data/metrics/tg_udp/default/154/000000512-000000005.tsm (#0)
[tsm1] 2016/05/16 16:38:25 compacting full group (0) /data/influxdb-data/metrics/tg_udp/default/154/000000530-000000004.tsm (#1)

@daviesalex
Copy link
Contributor

Interestingly, those files do not exist:

[root@carf-metrics-influx01 ~]# ls -l /data/influxdb-data/metrics/tg_udp/default/154/
total 16072324
-rw-r--r-- 1 influxdb influxdb 2148847001 May 15 22:29 000000080-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  143393769 May 15 22:29 000000080-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148845114 May 16 01:34 000000160-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  146913016 May 16 01:34 000000160-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148849200 May 16 04:39 000000240-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  157682907 May 16 04:39 000000240-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148853218 May 16 07:44 000000320-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  160693208 May 16 07:44 000000320-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148793242 May 16 10:48 000000400-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  225583528 May 16 10:49 000000400-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148772801 May 16 13:53 000000480-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  245320640 May 16 13:54 000000480-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 1529095578 May 16 17:14 000000530-000000005.tsm
-rw-r--r-- 1 influxdb influxdb  586820540 May 16 16:23 000000536-000000003.tsm
-rw-r--r-- 1 influxdb influxdb  369536724 May 16 16:39 000000540-000000002.tsm
[root@carf-metrics-influx01 ~]#
[root@carf-metrics-influx01 ~]#
[root@carf-metrics-influx01 ~]# ls -l /data/influxdb-data/metrics/tg_udp/default/154/000000512-000000005.tsm
ls: cannot access /data/influxdb-data/metrics/tg_udp/default/154/000000512-000000005.tsm: No such file or directory
[root@carf-metrics-influx01 ~]# ls -l /data/influxdb-data/metrics/tg_udp/default/154/
total 16072324
-rw-r--r-- 1 influxdb influxdb 2148847001 May 15 22:29 000000080-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  143393769 May 15 22:29 000000080-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148845114 May 16 01:34 000000160-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  146913016 May 16 01:34 000000160-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148849200 May 16 04:39 000000240-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  157682907 May 16 04:39 000000240-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148853218 May 16 07:44 000000320-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  160693208 May 16 07:44 000000320-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148793242 May 16 10:48 000000400-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  225583528 May 16 10:49 000000400-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 2148772801 May 16 13:53 000000480-000000006.tsm
-rw-r--r-- 1 influxdb influxdb  245320640 May 16 13:54 000000480-000000007.tsm
-rw-r--r-- 1 influxdb influxdb 1529095578 May 16 17:14 000000530-000000005.tsm
-rw-r--r-- 1 influxdb influxdb  586820540 May 16 16:23 000000536-000000003.tsm
-rw-r--r-- 1 influxdb influxdb  369536724 May 16 16:39 000000540-000000002.tsm
[root@carf-metrics-influx01 ~]#

@daviesalex
Copy link
Contributor

Reverting back to 0.13 stable starts, no panic, and shows us all our data.

This is a nasty bug in the nightly - do you want a third ticket?

@jwilder
Copy link
Contributor

jwilder commented May 17, 2016

@easyrasta @daviesalex I created a new issue for the interface conversion panic: #6652

The latest nightly reverts the commit that introduced that panic.

@jwilder jwilder self-assigned this May 19, 2016
@jwilder jwilder modified the milestones: 1.0.0 beta, 1.0.0 May 26, 2016
@jwilder jwilder removed their assignment Jun 27, 2016
@jwilder jwilder self-assigned this Jul 15, 2016
jwilder added a commit that referenced this issue Jul 22, 2016
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen as it's planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are current being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
jwilder added a commit that referenced this issue Jul 26, 2016
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen as it's planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are current being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants