recover from filestorage panic on corrupted DB #36840

Open
newly12 opened this issue Dec 16, 2024 · 3 comments

@newly12
Contributor

newly12 commented Dec 16, 2024

Component(s)

extension/storage/filestorage

Is your feature request related to a problem? Please describe.

We use the filelog receiver to collect logs and the file_storage extension to persist read state. We run a large number of collectors, and occasionally a collector crashes while opening the DB. We are not sure why or how the DB got into this state. In the meantime, should we add a recover to prevent the collector from crashing because of this?

(masked receiver name)

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb5f2a0]

goroutine 95 [running]:
go.etcd.io/bbolt.(*Bucket).Sequence(0x7eff026ea057?)
	go.etcd.io/bbolt@v1.3.11/bucket.go:346
go.etcd.io/bbolt.Compact.walk.func3.1({0x7eff026ea057, 0x15, 0x15}, 0x0)
	go.etcd.io/bbolt@v1.3.11/compact.go:94 +0x3c
go.etcd.io/bbolt.(*Tx).ForEach.func1({0x7eff026ea057, 0x15, 0x15}, {0x0?, 0x0?, 0x3fe0000000000000?})
	go.etcd.io/bbolt@v1.3.11/tx.go:131 +0x62
go.etcd.io/bbolt.(*Bucket).ForEach(0xffffffffffffffff?, 0xc001d1cd58)
	go.etcd.io/bbolt@v1.3.11/bucket.go:397 +0x90
go.etcd.io/bbolt.(*Tx).ForEach(0xc000c75f28?, 0xc001d1cda0?)
	go.etcd.io/bbolt@v1.3.11/tx.go:130 +0x45
go.etcd.io/bbolt.Compact.walk.func3(0x0?)
	go.etcd.io/bbolt@v1.3.11/compact.go:93 +0x33
go.etcd.io/bbolt.(*DB).View(0x3b9aca00?, 0xc001d1cea8)
	go.etcd.io/bbolt@v1.3.11/db.go:917 +0x72
go.etcd.io/bbolt.walk(...)
	go.etcd.io/bbolt@v1.3.11/compact.go:92
go.etcd.io/bbolt.Compact(0xc000954248, 0xc000c75d48, 0x10000)
	go.etcd.io/bbolt@v1.3.11/compact.go:21 +0x14c
github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage.(*fileStorageClient).Compact(0xc000886d70, {0xc0000479b0?, 0xc001d1d3e8?}, 0x3b9aca00, 0x10000)
	github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage@v0.112.0/client.go:198 +0x425
github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage.(*localFileStorage).GetClient(0xc0009e8a80, {0x7c8e00?, 0xc00094cc80?}, 0x0?, {{{0xc000a9b920?, 0x82dc76e33?}}, {0xc000a9b928?, 0x0?}}, {0x0, 0x0})
	github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage@v0.112.0/extension.go:82 +0x405
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.GetStorageClient({0x1b722b0, 0x29c2000}, {0x1b5c3c0?, 0xc0009f7160?}, 0xc001b3de20, {{{0xc000a9b920?, 0x0?}}, {0xc000a9b928?, 0x0?}})
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.112.1-0.20241030220000-2e60343e3f02/adapter/storage.go:29 +0x173
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).setStorageClient(0xc001c87420, {0x1b722b0?, 0x29c2000?}, {0x1b5c3c0?, 0xc0009f7160?})
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.112.1-0.20241030220000-2e60343e3f02/adapter/storage.go:34 +0x56
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).Start(0xc001c87420, {0x1b722b0, 0x29c2000}, {0x1b5c3c0, 0xc0009f7160})
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.112.1-0.20241030220000-2e60343e3f02/adapter/receiver.go:49 +0xaf

Describe the solution you'd like

Recover from the panic on a DB error instead of letting the collector crash.

Describe alternatives you've considered

No response

Additional context

No response

newly12 added the enhancement and needs triage labels on Dec 16, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana
Contributor

@newly12 I agree that we should have a way to recover from this.
I'm keen to reproduce this. Is it possible to share the corrupted DB file?
If it contains confidential data, can you share some hints for reproducing it? I can give that a shot.

VihasMakwana removed the needs triage label on Dec 17, 2024
@newly12
Contributor Author

newly12 commented Dec 18, 2024

The error I posted in the description is from the compaction phase. I also found another example that occurs when opening the DB; please check receiver_filelog_.tgz. The file is essentially empty and contains no sensitive data.

$ bbolt check receiver_filelog_
panic: freepages: failed to get all reachable pages (page 2: invalid type: freelist (stack: [2]))

goroutine 6 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
	/Users/xxx/go/pkg/mod/go.etcd.io/bbolt@v1.3.8/db.go:1202 +0x8c
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 1
	/Users/xxx/go/pkg/mod/go.etcd.io/bbolt@v1.3.8/db.go:1200 +0x13c

$ strings receiver_filelog_
default
