Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: 64bit unaligned in arm32 #577

Closed
nekohasekai opened this issue Oct 14, 2023 · 20 comments
Closed

panic: 64bit unaligned in arm32 #577

nekohasekai opened this issue Oct 14, 2023 · 20 comments
Labels

Comments

@nekohasekai
Copy link

Version: f3bb364

fatal error: sync: unlock of unlocked mutex

goroutine 11 [running]:
sync.fatal({0xb6e53f9e, 0x1e})
        $GOROOT/src/runtime/panic.go:1061 +0x24
sync.(*Mutex).unlockSlow(0x8e0a7034, 0xffffffff)
        $GOROOT/src/sync/mutex.go:229 +0x38
sync.(*Mutex).Unlock(0x8e0a7034)
        $GOROOT/src/sync/mutex.go:223 +0x44
go.etcd.io/bbolt.(*Tx).close(0x8e145cc0)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:314 +0x1e4
go.etcd.io/bbolt.(*Tx).rollback(0x8e145cc0)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:299 +0x13c
go.etcd.io/bbolt.(*DB).Update.func1()
        pkg/mod/go.etcd.io/bbolt@v0.0.0/db.go:858 +0x28
panic({0xb7d08ea8, 0xb7e7c630})
        $GOROOT/src/runtime/panic.go:914 +0x26c
runtime/internal/atomic.panicUnaligned()
        $GOROOT/src/runtime/internal/atomic/unaligned.go:8 +0x2c
runtime/internal/atomic.Xadd64(0x8e0a6fa4, 0x1)
        $GOROOT/src/runtime/internal/atomic/atomic_arm.s:258 +0x14
go.etcd.io/bbolt.(*TxStats).IncPageCount(...)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:683
go.etcd.io/bbolt.(*TxStats).add(0x8e0a6fa4, 0x8e145cf0)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:642 +0xec
go.etcd.io/bbolt.(*Tx).close(0x8e145cc0)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:322 +0x25c
go.etcd.io/bbolt.(*Tx).Commit(0x8e145cc0)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/tx.go:230 +0x450
go.etcd.io/bbolt.(*DB).Update(0x8e0a6f20, 0x8e04cf90)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/db.go:873 +0x100
go.etcd.io/bbolt.(*batch).run(0x8e150e20)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/db.go:981 +0x110
sync.(*Once).doSlow(0x8e150e28, 0x8e04cfe4)
        $GOROOT/src/sync/once.go:74 +0xc0
sync.(*Once).Do(0x8e150e28, 0x8e04cfe4)
        $GOROOT/src/sync/once.go:65 +0x40
go.etcd.io/bbolt.(*batch).trigger(...)
        pkg/mod/go.etcd.io/bbolt@v0.0.0/db.go:963
created by time.goFunc
        $GOROOT/src/time/sleep.go:176 +0x30

Related code:

bbolt/tx.go

Lines 26 to 43 in ef06562

type Tx struct {
writable bool
managed bool
db *DB
meta *common.Meta
root Bucket
pages map[common.Pgid]*common.Page
stats TxStats
commitHandlers []func()
// WriteFlag specifies the flag for write-related methods like WriteTo().
// Tx opens the database file with the specified flag to copy the data.
//
// By default, the flag is unset, which works well for mostly in-memory
// workloads. For databases that are much larger than available RAM,
// set the flag to syscall.O_DIRECT to avoid trashing the page cache.
WriteFlag int
}

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2023

It should a known issue of golang. FYI. https://pkg.go.dev/sync/atomic#pkg-note-BUG

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2023

@nekohasekai the arch is arm32, what's the OS?

@nekohasekai
Copy link
Author

@ahrtr Nexus 7 2013 Android 7.1.2

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2023

@ahrtr Nexus 7 2013 Android 7.1.2

thx for the feedback. It seems a known issue.

On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core.

@nekohasekai
Copy link
Author

Do you have any plans to make it 64bit aligned? This problem does not exist in the stable version of bbolt, and there are a large number of users on these 32-bit devices.

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2023

Do you have any plans to make it 64bit aligned? This problem does not exist in the stable version of bbolt, and there are a large number of users on these 32-bit devices.

  • Can you reproduce this issue in v1.3.7?
  • I don't have such device at hand. Would you mind to find out starting from which commit on the master branch you see this issue?

@nekohasekai
Copy link
Author

It does not exist in 1.3.7.

Here is a simple way to reproduce this issue on Linux amd64 using qemu-user-static:

Code:

// cmd/bug/main.go

package main

import (
	"os"

	"go.etcd.io/bbolt"
)

func main() {
	db, _ := bbolt.Open("test.db", 0600, &bbolt.Options{})
	defer db.Close()
	defer os.RemoveAll("test.db")
	db.Batch(func(tx *bbolt.Tx) error {
		tx.CreateBucketIfNotExists([]byte("test"))
		return nil
	})
}
GOOS=linux GOARCH=arm go build ./cmd/bug
qemu-arm ./bug

@ahrtr
Copy link
Member

ahrtr commented Oct 14, 2023

It seems that it's related to #373

@nekohasekai
Copy link
Author

It is introduced in 8ebdaf8

@ahrtr
Copy link
Member

ahrtr commented Oct 20, 2023

Confirmed that it has nothing to do with #516. Instead, it's caused by #373.

After I partially reverted the change in #373 as below, then couldn't reproduce this issue anymore.

$ git diff
diff --git a/tx.go b/tx.go
index 8b86030..905960a 100644
--- a/tx.go
+++ b/tx.go
@@ -639,18 +639,18 @@ type TxStats struct {
 }
 
 func (s *TxStats) add(other *TxStats) {
-       s.IncPageCount(other.GetPageCount())
-       s.IncPageAlloc(other.GetPageAlloc())
-       s.IncCursorCount(other.GetCursorCount())
-       s.IncNodeCount(other.GetNodeCount())
-       s.IncNodeDeref(other.GetNodeDeref())
-       s.IncRebalance(other.GetRebalance())
-       s.IncRebalanceTime(other.GetRebalanceTime())
-       s.IncSplit(other.GetSplit())
-       s.IncSpill(other.GetSpill())
-       s.IncSpillTime(other.GetSpillTime())
-       s.IncWrite(other.GetWrite())
-       s.IncWriteTime(other.GetWriteTime())
+       s.PageCount += other.PageCount
+       s.PageAlloc += other.PageAlloc
+       s.CursorCount += other.CursorCount
+       s.NodeCount += other.NodeCount
+       s.NodeDeref += other.NodeDeref
+       s.Rebalance += other.Rebalance
+       s.RebalanceTime += other.RebalanceTime
+       s.Split += other.Split
+       s.Spill += other.Spill
+       s.SpillTime += other.SpillTime
+       s.Write += other.Write
+       s.WriteTime += other.WriteTime
 }

@ahrtr
Copy link
Member

ahrtr commented Oct 20, 2023

@nekohasekai could you try #582?

@nekohasekai
Copy link
Author

Confirm fixed.

@ahrtr
Copy link
Member

ahrtr commented Oct 21, 2023

It is introduced in 8ebdaf8

Eventually I understood the reason why the issue couldn't be reproduced even it still contains the change in #373. Because one field "filesz int" was removed in the commit 8ebdaf8, and somehow caused the first field PageCount in isn't 64bit aligned anymore.

Adding "filesz int" back can fix this issue, but we should think about a more reliable solution.

@nekohasekai
Copy link
Author

You can either migrate atomic calls to int64 variables to atomic.Int64 structs (requires a higher version of Golang, in older versions you can use the inefficient atomic.Value), or change the embedding method of the entire structure from struct to pointer.

@ahrtr
Copy link
Member

ahrtr commented Oct 21, 2023

Using atomic.Int64 can resolve this issue, but it will also makes the TxStats uncopyable (Note atomic.Int64 has a _ noCopy field). It also requires updating some other public methods, e.g. (*TxStats) Sub. It may have impact on user applications as well.

The #582 seems to be the simplest and safest solution. The only concern is it removes some public methods (see below), which were introduced in #373, and has already included in v1.3.7. They were added not long ago (about 9 months ago), and it might be fine. We need to add a breaking change item in the changelog though.

bbolt/tx.go

Lines 677 to 794 in 233156a

func (s *TxStats) GetPageCount() int64 {
return atomic.LoadInt64(&s.PageCount)
}
// IncPageCount increases PageCount atomically and returns the new value.
func (s *TxStats) IncPageCount(delta int64) int64 {
return atomic.AddInt64(&s.PageCount, delta)
}
// GetPageAlloc returns PageAlloc atomically.
func (s *TxStats) GetPageAlloc() int64 {
return atomic.LoadInt64(&s.PageAlloc)
}
// IncPageAlloc increases PageAlloc atomically and returns the new value.
func (s *TxStats) IncPageAlloc(delta int64) int64 {
return atomic.AddInt64(&s.PageAlloc, delta)
}
// GetCursorCount returns CursorCount atomically.
func (s *TxStats) GetCursorCount() int64 {
return atomic.LoadInt64(&s.CursorCount)
}
// IncCursorCount increases CursorCount atomically and return the new value.
func (s *TxStats) IncCursorCount(delta int64) int64 {
return atomic.AddInt64(&s.CursorCount, delta)
}
// GetNodeCount returns NodeCount atomically.
func (s *TxStats) GetNodeCount() int64 {
return atomic.LoadInt64(&s.NodeCount)
}
// IncNodeCount increases NodeCount atomically and returns the new value.
func (s *TxStats) IncNodeCount(delta int64) int64 {
return atomic.AddInt64(&s.NodeCount, delta)
}
// GetNodeDeref returns NodeDeref atomically.
func (s *TxStats) GetNodeDeref() int64 {
return atomic.LoadInt64(&s.NodeDeref)
}
// IncNodeDeref increases NodeDeref atomically and returns the new value.
func (s *TxStats) IncNodeDeref(delta int64) int64 {
return atomic.AddInt64(&s.NodeDeref, delta)
}
// GetRebalance returns Rebalance atomically.
func (s *TxStats) GetRebalance() int64 {
return atomic.LoadInt64(&s.Rebalance)
}
// IncRebalance increases Rebalance atomically and returns the new value.
func (s *TxStats) IncRebalance(delta int64) int64 {
return atomic.AddInt64(&s.Rebalance, delta)
}
// GetRebalanceTime returns RebalanceTime atomically.
func (s *TxStats) GetRebalanceTime() time.Duration {
return atomicLoadDuration(&s.RebalanceTime)
}
// IncRebalanceTime increases RebalanceTime atomically and returns the new value.
func (s *TxStats) IncRebalanceTime(delta time.Duration) time.Duration {
return atomicAddDuration(&s.RebalanceTime, delta)
}
// GetSplit returns Split atomically.
func (s *TxStats) GetSplit() int64 {
return atomic.LoadInt64(&s.Split)
}
// IncSplit increases Split atomically and returns the new value.
func (s *TxStats) IncSplit(delta int64) int64 {
return atomic.AddInt64(&s.Split, delta)
}
// GetSpill returns Spill atomically.
func (s *TxStats) GetSpill() int64 {
return atomic.LoadInt64(&s.Spill)
}
// IncSpill increases Spill atomically and returns the new value.
func (s *TxStats) IncSpill(delta int64) int64 {
return atomic.AddInt64(&s.Spill, delta)
}
// GetSpillTime returns SpillTime atomically.
func (s *TxStats) GetSpillTime() time.Duration {
return atomicLoadDuration(&s.SpillTime)
}
// IncSpillTime increases SpillTime atomically and returns the new value.
func (s *TxStats) IncSpillTime(delta time.Duration) time.Duration {
return atomicAddDuration(&s.SpillTime, delta)
}
// GetWrite returns Write atomically.
func (s *TxStats) GetWrite() int64 {
return atomic.LoadInt64(&s.Write)
}
// IncWrite increases Write atomically and returns the new value.
func (s *TxStats) IncWrite(delta int64) int64 {
return atomic.AddInt64(&s.Write, delta)
}
// GetWriteTime returns WriteTime atomically.
func (s *TxStats) GetWriteTime() time.Duration {
return atomicLoadDuration(&s.WriteTime)
}
// IncWriteTime increases WriteTime atomically and returns the new value.
func (s *TxStats) IncWriteTime(delta time.Duration) time.Duration {
return atomicAddDuration(&s.WriteTime, delta)
}

@ahrtr ahrtr added this to the v1.4.0 milestone Oct 21, 2023
@ahrtr ahrtr added the type/bug label Oct 21, 2023
@ahrtr ahrtr removed this from the v1.4.0 milestone Oct 21, 2023
@tjungblu
Copy link
Contributor

tjungblu commented Oct 23, 2023

It is introduced in 8ebdaf8

@ahrtr we could also re-arrange the struct a bit for those things to be aligned again. It's certainly safer than to worry about which part of the stats need to take a lock or not.

The variables that need to be aligned for this can be declared at the top of the struct, so it's not even a big change.

@ahrtr
Copy link
Member

ahrtr commented Oct 23, 2023

The variables that need to be aligned for this can be declared at the top of the struct, so it's not even a big change.

This is a good point. Refer to the second solution #584

@tjungblu
Copy link
Contributor

yep, good one and less risky :) Thanks!

@ahrtr
Copy link
Member

ahrtr commented Oct 23, 2023

thx @tjungblu

Verified that #584 fixed this issue using qemu-user-static. @nekohasekai would you mind to verify the PR in your environment?

@ahrtr
Copy link
Member

ahrtr commented Oct 27, 2023

Resolved in v1.3.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

3 participants