[0.9.5-nightly] panic: runtime error: index out of range when compiled for ARMv5 #4218
Comments
@tfalencar - are there data underneath this system? Are you trying to upgrade? Any ideas @benbjohnson ? |
Thank you for your attention @otoolep. No data or upgrade. This is a 'clean/new' setup. Please check the output of the tests command that I've added (at: https://gist.github.com/tfalencar/a9c326a39359b91dce90); I think this should help track down the issue. Oh, and to cross compile Go 1.5 I used the following command, in case one does not have access to a Go 1.5 binary on the ARM platform: |
A bit more info. Interestingly, running 'go test -run=TestDatabase . -v' results:
file /root/gocodez/bin/influxd: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
So I went to see what the dependencies are:
readelf -d /root/gocodez/bin/influxd
Then I checked what the ARM Linux system has:
find / -name 'libpthread*'
find / -name 'libc.so*'
file /lib/arm-linux-gnueabi/libc.so.6
file /lib/arm-linux-gnueabi/libc-2.13.so
Quite a 'shot in the dark', but seeing the libc compiled for 2.6, I'm wondering if it's related to this Go bug: golang/go#5466
However, the ARM kernel here is quite recent:
uname -v
Does anyone have any tips on how I could debug the InfluxDB tests using gdb? |
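For anyone who wants to repeat the `file` / `readelf -d` check above without those tools on the target, here is a minimal sketch using Go's standard `debug/elf` package. The helper name `describeELF` and the `elfSummary` type are made up for this example; it only reports the ELF class, the machine type, and the shared libraries the binary asks for.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// elfSummary holds the few header fields that matter when checking
// whether a binary matches the target platform. (Hypothetical type.)
type elfSummary struct {
	Class   string   // e.g. ELFCLASS32
	Machine string   // e.g. EM_ARM
	Libs    []string // DT_NEEDED entries; empty for a static binary
}

// describeELF opens the file at path and extracts the summary fields.
func describeELF(path string) (elfSummary, error) {
	f, err := elf.Open(path)
	if err != nil {
		return elfSummary{}, err
	}
	defer f.Close()

	libs, err := f.ImportedLibraries()
	if err != nil {
		return elfSummary{}, err
	}
	return elfSummary{
		Class:   f.Class.String(),
		Machine: f.Machine.String(),
		Libs:    libs,
	}, nil
}

func main() {
	// Inspect the path given on the command line, or this program itself.
	path := os.Args[0]
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	s, err := describeELF(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "not readable as ELF:", err)
		return
	}
	fmt.Printf("class=%s machine=%s libs=%v\n", s.Class, s.Machine, s.Libs)
}
```

Running it against /root/gocodez/bin/influxd should list the same libraries that `readelf -d` reports as NEEDED.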
@toddboom any insights from your own ARM builds? @rossmcdonald anything jump out at you? |
As far as I can tell (by looking in the stack trace etc) this is originating from a memory issue on BoltDB for ARM. |
BoltDB uses "unsafe" package in multiple locations and I believe this is what is causing the problems. I will create another issue in BoltDB project and reference it here. It would be great if anybody else could let us know of a successful build to help track the regression ? See my comment here: boltdb/bolt#327 |
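To illustrate why code built on "unsafe" can behave differently across platforms: a pointer cast reinterprets bytes in place and depends on how the CPU handles alignment and word size, while explicit decoding does not. x86 tolerates unaligned loads, whereas older ARM cores may fault or return unexpected data. The sketch below shows only the portable alternative. The `header` type and its 12-byte layout are hypothetical and are not BoltDB's actual page format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// header is a made-up on-disk record used only for illustration.
type header struct {
	ID    uint64
	Flags uint16
	Count uint16
}

// headerSize is the encoded size of header: 8 + 2 + 2 bytes.
const headerSize = 12

// decodeHeader reads a header field by field from buf. Unlike an
// unsafe pointer cast, this does not depend on the alignment of buf
// or on the byte order of the host CPU.
func decodeHeader(buf []byte) (header, error) {
	if len(buf) < headerSize {
		return header{}, fmt.Errorf("short buffer: %d bytes, need %d", len(buf), headerSize)
	}
	return header{
		ID:    binary.LittleEndian.Uint64(buf[0:8]),
		Flags: binary.LittleEndian.Uint16(buf[8:10]),
		Count: binary.LittleEndian.Uint16(buf[10:12]),
	}, nil
}

func main() {
	// Start one byte into the slice to show that an odd offset is fine.
	raw := []byte{0xff, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0}
	h, err := decodeHeader(raw[1:])
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

This costs a copy per read, which is presumably why a memory-mapped store prefers direct casts; the point is only that the cast is the platform-sensitive part.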
I did a git checkout v0.9.2, and compiled again with go build ./...; go install ./..., ran the newly built influxd run, and the same problem occurs (same stack trace). |
@tfalencar Thanks for keeping us updated. If the issue is with boltdb, then I'm not sure if any of the current 0.9.x releases will work. On the bright side, we are in the process of rewriting the storage backend with a custom engine which should land in master sometime in the next couple of weeks. This should avoid the boltdb dependency altogether, which should hopefully resolve the issue you're seeing. |
thanks for the feedback @rossmcdonald. I'm looking forward to the new release. If there's any other info or way I could help just let me know. |
I also got this error, not on ARM, but on x86. The issue occurred while trying to import a bunch of data from influx 0.8 into a database that already existed (had been writing to it for a while as part of the recommended 0.8 -> 0.9 upgrade process). It appeared to be specific to a certain range of data — excluding that range from the import allowed the import to continue. My suspicion is that a db shard managed to become corrupted somehow, and boltdb handled it badly. Blowing away the database and re-importing the same data did not cause the issue to recur. |
@tfalencar The new storage engine release has been committed to master. It's disabled by default, but can be enabled following the instructions here: https://influxdb.com/docs/v0.9/introduction/tsm_installation.html It would be great to see if you are still hitting the same issue without BoltDB getting in the way. @nornagon Thanks for letting us know. It sounds like BoltDB has some instability problems, which is one of the reasons for the new storage engine. |
@rossmcdonald I have just compiled influxdb on ARM following the instructions here: https://github.com/influxdb/influxdb/blob/master/CONTRIBUTING.md After changing the config file, in the [data] section to engine = "tsm1" I still get the very same error upon first startup, i.e. it seems BoltDB is still in use. Any ideas how to further force tsm1? Here is the config and the output: |
Did you blow away any existing databases? BoltDB is also used as storage by the distributed consensus system.
|
@otoolep It is a fresh install, influxdb's directory does not even exist before running. |
OK, the trace seen above is in the storage for distributed consensus. Changing the engine will have no effect on that issue. @benbjohnson may have a better idea if this can be resolved. |
Sending to @benbjohnson for comment. |
Is it possible to at least disable this distributed consensus feature? |
Disabling the code that manages consensus is not possible. |
Not without very significant work. |
@a-bali The consensus feature can't currently be disabled. re: the original panic, it's trying to access page |
Hello, I have been experiencing this issue with the same scenario on 0.9.3 and 0.9.5, with both engines bz1 and tsm1.
The series contain approx. 4.5 million datapoints.
Thanks |
I just wanted to mention that after checking tsdb issues, I upgraded to 0.9.6 and the issue is now gone. |
Thanks for the update, @aglagla |
@tfalencar is this issue still present for you with 0.9.6? |
Hello @beckettsean. Thanks for following up on this. We ended up switching to a non-ARM platform for InfluxDB, and at this time I don't have the hardware. But if you believe the problem is solved, please feel free to close the ticket and I'll report again in case it presents any problem in my future tests. Thanks again! |
This is just to confirm that I have the same issue with the current version - see log here. |
AFAIK, current version is 0.9.6.1 and fixed a similar issue on AMD64 Rgds, Alexis
|
Closing since this code no longer exists sinc |
@jwilder I'm not sure if it's relevant any more if you've removed code relating to this issue, however I believe I got to the bottom of the odd ARMv5 behaviour in boltdb - boltdb/bolt#578 - thought you might like to know anyway! :) |
Hello,
Please bear with me as I have never programmed in Go, so maybe the solution for this is really simple.
I've compiled influxDB from sources twice, once for x86_64, and another time for armv5, both based on the following latest commit:
commit 8f4b354 Date: Tue Sep 22 21:52:19 2015 -0700
While I had no problems with x86_64, I'm getting a 'panic: runtime error: index out of range' when trying to start the service (more details below) on the armv5 platform.
Here is the output of x86-64 tests: https://gist.github.com/tfalencar/8bf760702c120511f288
Here is the output of armv5 tests: https://gist.github.com/tfalencar/a9c326a39359b91dce90
Comparing the two outputs, we see tests failing at different points on armv5.
Details of the environment which succeeds:
~/gocodez/bin$ uname -a
Linux thiago-ThinkPad-T520 3.19.0-28-generic #30-Ubuntu SMP Mon Aug 31 15:52:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
~/go$ go version
go version go1.5 linux/amd64
~/gocodez/bin/influxd version
InfluxDB v0.9 (git: unknown unknown)
Details of the environment which fails:
~/gocodez/bin$ uname -a
8 Thu Jun 25 15:31:05 CEST 2015 armv5tejl GNU/Linux
go version go1.5 linux/arm
~/gocodez/bin/influxd version
InfluxDB v0.9 (git: unknown unknown)
The same commit, compiled for armv5 (using go 1.5 armv5, with host and target the same during InfluxDB compilation), throws an error (complete output at bottom) when attempting to start the database service, even though there were no errors during compilation.
The path I took for compiling is similar but not identical to the one described here:
https://www.kuerbis.org/2015/03/influxdb-0-9-auf-dem-raspberry-pi-installieren/
Unlike the article, I cross compiled go1.5 myself (using bootstrap go1.4). The other difference is that for now I'm using the root user to run the service. Lastly, the source code I used is a bit newer (the article used 0.9.0-rc16). I also followed the compilation instructions in InfluxDB's CONTRIBUTING file.
The complete output of the error when attempting to ./influxdb start is below (also in this gist: https://gist.github.com/tfalencar/686060b464c8ebc7514b)
Any ideas what could be causing this?
2015/09/24 11:47:10 InfluxDB starting, version 0.9, branch unknown, commit unknown
2015/09/24 11:47:10 Go version go1.5, GOMAXPROCS set to 1
2015/09/24 11:47:10 no configuration provided, using default settings
[metastore] 2015/09/24 11:47:10 Using data dir: /root/.influxdb/meta
panic: runtime error: index out of range
goroutine 1 [running]:
github.com/boltdb/bolt.(*Bucket).pageNode(0x10cfde80, 0x73676f00, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/boltdb/bolt/bucket.go:693 +0x2f8
github.com/boltdb/bolt.(*Cursor).Last(0x10c3f6ac, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/boltdb/bolt/cursor.go:51 +0xb4
github.com/hashicorp/raft-boltdb.(*BoltStore).LastIndex(0x10cfe600, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/hashicorp/raft-boltdb/bolt_store.go:108 +0x178
github.com/hashicorp/raft.NewRaft(0x10c10a80, 0xb637bea0, 0x10c76180, 0xb637bec0, 0x10cfe600, 0xb637bef0, 0x10cfe600, 0xb637bf18, 0x10cfe730, 0xb637bdf8, ...)
/root/gocodez/src/github.com/hashicorp/raft/raft.go:181 +0x148
github.com/influxdb/influxdb/meta.(*localRaft).open(0x10d04180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/state.go:170 +0xfd8
github.com/influxdb/influxdb/meta.(*Store).openRaft(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:418 +0x48
github.com/influxdb/influxdb/meta.(*Store).Open.func1(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:217 +0x1e8
github.com/influxdb/influxdb/meta.(*Store).Open(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:196 +0x228
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1(0x10c4eea0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:350 +0x94c
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open(0x10c4eea0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:309 +0x28
github.com/influxdb/influxdb/cmd/influxd/run.(*Command).Run(0x10c42dc0, 0x10c0a108, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:114 +0xa58
main.(*Main).Run(0x10c3ff80, 0x10c0a108, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/main.go:81 +0x65c
main.main()
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/main.go:42 +0x32c
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/home/thiago/goarm/src/runtime/asm_arm.s:1036 +0x4
goroutine 5 [syscall]:
os/signal.loop()
/home/thiago/goarm/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.1
/home/thiago/goarm/src/os/signal/signal_unix.go:28 +0x30
goroutine 8 [IO wait]:
net.runtime_pollWait(0xb637bbf0, 0x72, 0x10c0a0b0)
/home/thiago/goarm/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0x10c435b8, 0x72, 0x0, 0x0)
/root/go_arm/src/net/fd_poll_runtime.go:73 +0x34
net.(*pollDesc).WaitRead(0x10c435b8, 0x0, 0x0)
/root/go_arm/src/net/fd_poll_runtime.go:78 +0x30
net.(*netFD).accept(0x10c43580, 0x0, 0xb637bc88, 0x10cfe5b0)
/root/go_arm/src/net/fd_unix.go:408 +0x21c
net.(*TCPListener).AcceptTCP(0x10c1f170, 0x10c44570, 0x0, 0x0)
/root/go_arm/src/net/tcpsock_posix.go:254 +0x4c
net.(*TCPListener).Accept(0x10c1f170, 0x0, 0x0, 0x0, 0x0)
/root/go_arm/src/net/tcpsock_posix.go:264 +0x34
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0x10d040c0, 0xb637bd18, 0x10c1f170, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/tcp/mux.go:48 +0x30
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:347 +0x93c
goroutine 9 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0x10c1f178, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x5c
github.com/influxdb/influxdb/meta.(*raftLayer).Accept(0x10cfda00, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:2059 +0x54
github.com/hashicorp/raft.(*NetworkTransport).listen(0x10c147d0)
/root/gocodez/src/github.com/hashicorp/raft/net_transport.go:346 +0x50
created by github.com/hashicorp/raft.NewNetworkTransport
/root/gocodez/src/github.com/hashicorp/raft/net_transport.go:138 +0x274
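One detail in the trace may be worth a closer look: the second value in the `pageNode` frame, 0x73676f00. The sketch below only decodes that word, assuming it is the low 32 bits of the page id argument as printed by the 32-bit ARM runtime (an assumption about the frame layout, not something the trace states). It prints the value in decimal and as little-endian bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lowWord is the second value printed in the pageNode frame of the
// trace. Treating it as the low half of the page id is an assumption.
const lowWord uint32 = 0x73676f00

// wordBytes returns w as it would be laid out in little-endian memory.
func wordBytes(w uint32) [4]byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], w)
	return b
}

func main() {
	b := wordBytes(lowWord)
	fmt.Printf("decimal: %d\n", lowWord)
	fmt.Printf("bytes:   % x\n", b[:])
	fmt.Printf("text:    %q\n", b[:])
}
```

The decimal value is far larger than any page number a freshly created database could have, and three of the four bytes are printable ASCII. That might suggest the value was read from the wrong offset rather than being a real page id, but this is a guess and not a diagnosis.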