[0.9.5-nightly] panic: runtime error: index out of range when compiled for ARMv5 #4218

Closed
tfalencar opened this issue Sep 24, 2015 · 29 comments

@tfalencar

Hello,

Please bear with me, as I have never programmed in Go, so the solution may turn out to be really simple.

I've compiled InfluxDB from source twice, once for x86_64 and once for armv5, both from the following commit (the latest at the time):

commit 8f4b354 Date: Tue Sep 22 21:52:19 2015 -0700

While I had no problems with x86_64, I'm getting a 'panic: runtime error: index out of range' when trying to start the service (more details below) on the armv5 platform.

Here is the output of x86-64 tests: https://gist.github.com/tfalencar/8bf760702c120511f288
Here is the output of armv5 tests: https://gist.github.com/tfalencar/a9c326a39359b91dce90

Comparing the two outputs, the tests fail at different points on armv5.

Details of the environment which succeeds:


~/gocodez/bin$ uname -a
Linux thiago-ThinkPad-T520 3.19.0-28-generic #30-Ubuntu SMP Mon Aug 31 15:52:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
~/go$ go version
go version go1.5 linux/amd64
~/gocodez/bin/influxd version
InfluxDB v0.9 (git: unknown unknown)


Details of the environment which fails:


~/gocodez/bin$ uname -a
8 Thu Jun 25 15:31:05 CEST 2015 armv5tejl GNU/Linux
go version go1.5 linux/arm
~/gocodez/bin/influxd version
InfluxDB v0.9 (git: unknown unknown)


The same commit, compiled for armv5 (using Go 1.5 for armv5, with host and target the same during the InfluxDB compilation), throws an error when attempting to start the database service (complete output at the bottom), even though compilation itself reported no errors.

The path I took for compiling is similar but not identical to the one described here:
https://www.kuerbis.org/2015/03/influxdb-0-9-auf-dem-raspberry-pi-installieren/

Unlike the article, I cross-compiled Go 1.5 myself (bootstrapping with Go 1.4). The other difference is that, for now, I'm running the service as the root user. Lastly, the source code I used is a bit newer (the article used 0.9.0-rc16). I also followed the build instructions in InfluxDB's CONTRIBUTING file.

The complete output of the error when attempting to ./influxdb start is below (also in this gist: https://gist.github.com/tfalencar/686060b464c8ebc7514b)

Any ideas what could be causing this?


2015/09/24 11:47:10 InfluxDB starting, version 0.9, branch unknown, commit unknown
2015/09/24 11:47:10 Go version go1.5, GOMAXPROCS set to 1
2015/09/24 11:47:10 no configuration provided, using default settings
[metastore] 2015/09/24 11:47:10 Using data dir: /root/.influxdb/meta
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/boltdb/bolt.(*Bucket).pageNode(0x10cfde80, 0x73676f00, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/boltdb/bolt/bucket.go:693 +0x2f8
github.com/boltdb/bolt.(*Cursor).Last(0x10c3f6ac, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/boltdb/bolt/cursor.go:51 +0xb4
github.com/hashicorp/raft-boltdb.(*BoltStore).LastIndex(0x10cfe600, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/hashicorp/raft-boltdb/bolt_store.go:108 +0x178
github.com/hashicorp/raft.NewRaft(0x10c10a80, 0xb637bea0, 0x10c76180, 0xb637bec0, 0x10cfe600, 0xb637bef0, 0x10cfe600, 0xb637bf18, 0x10cfe730, 0xb637bdf8, ...)
/root/gocodez/src/github.com/hashicorp/raft/raft.go:181 +0x148
github.com/influxdb/influxdb/meta.(*localRaft).open(0x10d04180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/state.go:170 +0xfd8
github.com/influxdb/influxdb/meta.(*Store).openRaft(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:418 +0x48
github.com/influxdb/influxdb/meta.(*Store).Open.func1(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:217 +0x1e8
github.com/influxdb/influxdb/meta.(*Store).Open(0x10c76180, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:196 +0x228
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1(0x10c4eea0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:350 +0x94c
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open(0x10c4eea0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:309 +0x28
github.com/influxdb/influxdb/cmd/influxd/run.(*Command).Run(0x10c42dc0, 0x10c0a108, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:114 +0xa58
main.(*Main).Run(0x10c3ff80, 0x10c0a108, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/main.go:81 +0x65c
main.main()
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/main.go:42 +0x32c

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/home/thiago/goarm/src/runtime/asm_arm.s:1036 +0x4

goroutine 5 [syscall]:
os/signal.loop()
/home/thiago/goarm/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.1
/home/thiago/goarm/src/os/signal/signal_unix.go:28 +0x30

goroutine 8 [IO wait]:
net.runtime_pollWait(0xb637bbf0, 0x72, 0x10c0a0b0)
/home/thiago/goarm/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0x10c435b8, 0x72, 0x0, 0x0)
/root/go_arm/src/net/fd_poll_runtime.go:73 +0x34
net.(*pollDesc).WaitRead(0x10c435b8, 0x0, 0x0)
/root/go_arm/src/net/fd_poll_runtime.go:78 +0x30
net.(*netFD).accept(0x10c43580, 0x0, 0xb637bc88, 0x10cfe5b0)
/root/go_arm/src/net/fd_unix.go:408 +0x21c
net.(*TCPListener).AcceptTCP(0x10c1f170, 0x10c44570, 0x0, 0x0)
/root/go_arm/src/net/tcpsock_posix.go:254 +0x4c
net.(*TCPListener).Accept(0x10c1f170, 0x0, 0x0, 0x0, 0x0)
/root/go_arm/src/net/tcpsock_posix.go:264 +0x34
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0x10d040c0, 0xb637bd18, 0x10c1f170, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/tcp/mux.go:48 +0x30
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
/root/gocodez/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:347 +0x93c

goroutine 9 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0x10c1f178, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x5c
github.com/influxdb/influxdb/meta.(*raftLayer).Accept(0x10cfda00, 0x0, 0x0, 0x0, 0x0)
/root/gocodez/src/github.com/influxdb/influxdb/meta/store.go:2059 +0x54
github.com/hashicorp/raft.(*NetworkTransport).listen(0x10c147d0)
/root/gocodez/src/github.com/hashicorp/raft/net_transport.go:346 +0x50
created by github.com/hashicorp/raft.NewNetworkTransport
/root/gocodez/src/github.com/hashicorp/raft/net_transport.go:138 +0x274
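
For readers new to Go, the message at the top of the trace is the runtime's bounds check firing: some slice or array was indexed with a value outside its length. The sketch below is not BoltDB's code. It uses a hypothetical `lookup` helper and a made-up four-entry page table, and it borrows the value 0x73676f00 from the `pageNode` frame only to show how an implausibly large index produces this kind of panic.

```go
package main

import "fmt"

// lookup is a hypothetical helper: it returns pages[id] and converts the
// runtime's bounds-check panic into an error so the message can be inspected.
func lookup(pages [][]byte, id int) (page []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return pages[id], nil
}

func main() {
	// Assumption for illustration: a store that holds only four pages.
	pages := make([][]byte, 4)

	// The second argument in the pageNode frame of the trace.
	_, err := lookup(pages, 0x73676f00)
	fmt.Println(err)
}
```

The exact wording of the recovered message differs between Go releases, but it contains "index out of range" in each case.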

@otoolep
Contributor

otoolep commented Sep 24, 2015

@tfalencar - are there data underneath this system? Are you trying to upgrade?

Any ideas @benbjohnson ?

@tfalencar
Author

Thank you for your attention @otoolep. No data or upgrade. This is a 'clean/new' setup. Please check output of the tests command that I've added (at: https://gist.github.com/tfalencar/a9c326a39359b91dce90), I think this should help track down the issue.

Oh, and to cross-compile Go 1.5 I used the following command, in case a Go 1.5 binary is not available for the ARM platform:
GOROOT_BOOTSTRAP=$HOME/go1.4 GOOS=linux GOARCH=arm GOARM=5 ./all.bash

@tfalencar tfalencar changed the title panic: runtime error: index out of range (v0.9 compiled for ARM) panic: runtime error: index out of range (v0.9.5 compiled for ARMv5) Sep 25, 2015
@tfalencar
Author

A bit more info:

Interestingly, running 'go test -run=TestDatabase . -v' results in:
PASS
ok github.com/influxdb/influxdb 0.800s

file /root/gocodez/bin/influxd: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), not stripped

so I went to see what are the dependencies:

readelf -d /root/gocodez/bin/influxd
[...]
0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libc.so.6]

then checking what the arm linux has:

find / -name 'libpthread*'
/lib/arm-linux-gnueabi/libpthread.so.0
/lib/arm-linux-gnueabi/libpthread-2.13.so
/usr/lib/arm-linux-gnueabi/libpthread.so
/usr/lib/arm-linux-gnueabi/libpthread.a
/usr/lib/arm-linux-gnueabi/libpthread_nonshared.a

find / -name 'libc.so*'
/lib/arm-linux-gnueabi/libc.so.6
/usr/lib/arm-linux-gnueabi/libc.so

file /lib/arm-linux-gnueabi/libc.so.6
/lib/arm-linux-gnueabi/libc.so.6: symbolic link to `libc-2.13.so'

file /lib/arm-linux-gnueabi/libc-2.13.so
/lib/arm-linux-gnueabi/libc-2.13.so: ELF 32-bit LSB shared object, ARM, version 1 (SYSV), dynamically linked (uses shared libs), BuildID[sha1]=0x7bff39dd847218e07f750d9dee8d32ce1c18b1ae, for GNU/Linux 2.6.26, stripped

Quite a 'shot in the dark', but seeing that libc was built for GNU/Linux 2.6, I'm wondering if it's related to this Go bug: golang/go#5466

However, the arm kernel here is quite recent: uname -v
#8 Thu Jun 25 15:31:05 CEST 2015
uname -r
4.0.0-axo-G02.2

Does anyone have any tips on how I could debug the InfluxDB tests using gdb?

@beckettsean
Contributor

@toddboom any insights from your own ARM builds?

@rossmcdonald anything jump out at you?

@beckettsean beckettsean changed the title panic: runtime error: index out of range (v0.9.5 compiled for ARMv5) [0.9.5-nightly] panic: runtime error: index out of range when compiled for ARMv5 Sep 25, 2015
@tfalencar
Author

As far as I can tell (from the stack trace, etc.), this originates from a memory issue in BoltDB on ARM.

@tfalencar
Author

BoltDB uses the "unsafe" package in multiple places, and I believe this is what is causing the problems. I will create another issue in the BoltDB project and reference it here. It would be great if anybody else could report a successful build, to help track down the regression.

See my comment here: boltdb/bolt#327

@tfalencar
Author

I did a git checkout v0.9.2, and compiled again with go build ./...; go install ./..., ran the newly built influxd run, and the same problem occurs (same stack trace).
Also tried reverting to an earlier version of boltdb (commit e929eba364), same results.

@rossmcdonald
Contributor

@tfalencar Thanks for keeping us updated. If the issue is with boltdb, then I'm not sure if any of the current 0.9.x releases will work. On the bright side, we are in the process of rewriting the storage backend with a custom engine which should land in master sometime in the next couple of weeks. This should avoid the boltdb dependency altogether, which should hopefully resolve the issue you're seeing.

@tfalencar
Author

Thanks for the feedback, @rossmcdonald. I'm looking forward to the new release. If there's any other info I can provide or any other way I can help, just let me know.

@rossmcdonald rossmcdonald self-assigned this Sep 30, 2015
@nornagon

nornagon commented Oct 2, 2015

I also got this error, not on ARM, but on x86. The issue occurred while trying to import a bunch of data from influx 0.8 into a database that already existed (had been writing to it for a while as part of the recommended 0.8 -> 0.9 upgrade process).

It appeared to be specific to a certain range of data — excluding that range from the import allowed the import to continue. My suspicion is that a db shard managed to become corrupted somehow, and boltdb handled it badly. Blowing away the database and re-importing the same data did not cause the issue to recur.

@rossmcdonald
Contributor

@tfalencar The new storage engine release has been committed to master. It's disabled by default, but can be enabled following the instructions here:

https://influxdb.com/docs/v0.9/introduction/tsm_installation.html

It would be great to see if you are still hitting the same issue without BoltDB getting in the way.

@nornagon Thanks for letting us know. It sounds like BoltDB has some instability problems, which is one of the reasons for the new storage engine.

@a-bali

a-bali commented Oct 11, 2015

@rossmcdonald I have just compiled influxdb on ARM following the instructions here:

https://github.com/influxdb/influxdb/blob/master/CONTRIBUTING.md

After setting engine = "tsm1" in the [data] section of the config file, I still get the very same error on first startup, i.e. it seems BoltDB is still in use. Any ideas how to further force tsm1?

Here is the config and the output:
https://gist.github.com/a-bali/f4a73b4c49e415fbe263

@otoolep
Contributor

otoolep commented Oct 11, 2015

Did you blow away any existing databases?

BoltDB is also used as storage by the distributed consensus system.


@a-bali

a-bali commented Oct 11, 2015

@otoolep It is a fresh install, influxdb's directory does not even exist before running.

@otoolep
Contributor

otoolep commented Oct 11, 2015

OK, the trace seen above is in the storage for distributed consensus. Changing the engine will have no effect on that issue.

@benbjohnson may have a better idea if this can be resolved.

@otoolep
Contributor

otoolep commented Oct 11, 2015

Sending to @benbjohnson for comment.

@a-bali

a-bali commented Oct 16, 2015

Is it possible to at least disable this distributed consensus feature?

@otoolep
Contributor

otoolep commented Oct 16, 2015

Disabling the code that manages consensus is not possible.

@otoolep
Contributor

otoolep commented Oct 16, 2015

Not without very significant work.

@benbjohnson
Contributor

@a-bali The consensus feature can't currently be disabled.

re: the original panic, it's trying to access page 0x73676f00, which would indicate a database size of ~8TB (assuming a 4KB page size). I'll take a look at boltdb/bolt#327 on my Raspberry Pi and see if I can find the underlying issue.
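
The size estimate can be checked with a few lines of arithmetic. The sketch below assumes the 4 KB page size mentioned in the comment and uses a made-up helper, `pageOffset`, that multiplies a page id by the page size. For page id 0x73676f00 the result is a byte offset on the order of 7.9 × 10^12, i.e. terabytes, which is far more than a fresh install could plausibly contain.

```go
package main

import "fmt"

// pageSize is the 4 KB page size assumed in the comment above.
const pageSize uint64 = 4096

// pageOffset returns the byte offset at which the given page would start.
func pageOffset(pgid uint64) uint64 {
	return pgid * pageSize
}

func main() {
	const pgid uint64 = 0x73676f00 // page id taken from the stack trace
	off := pageOffset(pgid)
	fmt.Printf("page id %d starts at byte %d (about %.2f TB)\n",
		pgid, off, float64(off)/1e12)
}
```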

@jwilder jwilder added the panic label Nov 5, 2015
@aglagla

aglagla commented Jan 7, 2016

Hello,

I have been experiencing this issue with the same scenario on 0.9.3 and 0.9.5, with both the bz1 and tsm1 engines.
I get this "index out of range" error (and a crash) every time I run a query such as:

select field1 / field2 from series where tag=...

The series contains approx. 4.5 million data points.
I wonder if this is related to the fact that I run the query shortly after loading the data, while InfluxDB is still compacting data and writing to indexes.
Extra info:

name: build
-----------
Branch  Build Time                Commit                                    Version
0.9.5   2015-11-25T23:01:58+0000  9eab56311373ee6f788ae5dfc87e2240038f0eb4  0.9.5.1

name: runtime
-------------
GOARCH  GOMAXPROCS  GOOS   version
amd64   8           linux  go1.4.2

Thanks

@aglagla

aglagla commented Jan 7, 2016

I just wanted to mention that after checking tsdb issues, I upgraded to 0.9.6 and the issue is now gone.
Thanks, and sorry for the noise.

@beckettsean
Contributor

Thanks for the update, @aglagla

@beckettsean
Contributor

@tfalencar is this issue still present for you with 0.9.6?

@tfalencar
Author

Hello @beckettsean. Thanks for following up on this. We ended up switching to a non-ARM platform for InfluxDB, and at this time I don't have the hardware. If you believe the problem is solved, please feel free to close the ticket, and I'll report back if it causes any problems in my future tests. Thanks again!

@a-bali

a-bali commented Jan 19, 2016

This is just to confirm that I have the same issue with the current version; see the log here: https://gist.github.com/7d8ff589bf23e6d89c10

@aglagla

aglagla commented Jan 19, 2016

AFAIK, the current version is 0.9.6.1, which fixed a similar issue on AMD64.

Rgds,

Alexis
On 19 Jan 2016 20:28, "a-bali" notifications@github.com wrote:

This is just to confirm that I have the same issue with the current
version - see log here https://gist.github.com/7d8ff589bf23e6d89c10.


Reply to this email directly or view it on GitHub
#4218 (comment)
.

@jwilder
Contributor

jwilder commented May 6, 2016

Closing since this code no longer exists as of 0.12.

@jwilder jwilder closed this as completed May 6, 2016
@lorenzo-stoakes

@jwilder I'm not sure if it's relevant any more if you've removed code relating to this issue, however I believe I got to the bottom of the odd ARMv5 behaviour in boltdb - boltdb/bolt#578 - thought you might like to know anyway! :)
