Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Node stops syncing after to VM cache error #604

Closed
fabio-nukui opened this issue Nov 12, 2021 · 14 comments
Closed

[BUG] Node stops syncing after to VM cache error #604

fabio-nukui opened this issue Nov 12, 2021 · 14 comments
Labels
bug Something isn't working

Comments

@fabio-nukui
Copy link

Describe the bug

Same bug as reported in #599, using v0.5.11. Node has to be restarted multiple times a day

Snippets of the logs:

5:16AM INF executed block height=5263003 module=state num_invalid_txs=0 num_valid_txs=7
5:16AM INF commit synced commit=436F6D6D697449447B5B323530203233352034322034342031363920313337203139322031343920313237203939203435203238203636203830203139332031332031353120393120313533203132
3720313339203231322031363920313331203139322031353720313633203231372032382031323720313034203135325D3A3530344539427D
5:16AM INF committed state app_hash=FAEB2A2CA989C0957F632D1C4250C10D975B997F8BD4A983C09DA3D91C7F6898 height=5263003 module=state num_txs=7
5:16AM INF indexed block height=5263003 module=txindex
thread '<unnamed>' panicked at 'assertion failed: prev.start > max', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-2.0.0/src/trap/frame_info.rs:232:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:256:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:256:43

...

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
5:16AM INF Timed out dur=4663.671183 height=5263005 module=consensus round=0 step=1
5:16AM INF received proposal module=consensus proposal={"Type":32,"block_id":{"hash":"945C10781FB8CBECB36AF34D2BAD7A126765656C2677E69B7C76272C47AA0EC9","parts":{"hash":"1FE1788A28D650297EB9B
28BA05A706A1D7FCC6EBF9054B2323623482E9AF505","total":3}},"height":5263005,"pol_round":-1,"round":0,"signature":"eqKIrKAYLdNN1/Sjs5SrG93Tqp+qxY+Sh74shjjuEzZrhcOqJaaWFtwTuURbmXcOB7lbTt0JQjoRVl
rvsJfrDw==","timestamp":"2021-11-12T05:16:54.265983122Z"}
5:16AM INF received complete proposal block hash=945C10781FB8CBECB36AF34D2BAD7A126765656C2677E69B7C76272C47AA0EC9 height=5263005 module=consensus
5:16AM ERR prevote step: ProposalBlock is invalid err="wrong Block.Header.AppHash.  Expected 23247F4D38DB13BD41389EFD52931612B7100716F60274AA4C8BB73421D66E4B, got DE07F8EE13585675D6931DC14EB
486F78EA0B09EF2C9F11FAA07BF725710EEDB" height=5263005 module=consensus round=0
5:16AM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: wrong Block.Header.AppHash.  Expected 23247F4D38DB13BD41389EFD52931612B7100716F60274AA4C8BB73421D66E4
B, got DE07F8EE13585675D6931DC14EB486F78EA0B09EF2C9F11FAA07BF725710EEDB" module=consensus stack="goroutine 69788 [running]:\nruntime/debug.Stack(0xc05f93f370, 0x2280f00, 0xc116802140)\n\trun
time/debug/stack.go:24 +0x9f\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2(0xc000c83880, 0x2a782a8)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.g
o:726 +0x5b\npanic(0x2280f00, 0xc116802140)\n\truntime/panic.go:965 +0x1b9\ngithub.com/tendermint/tendermint/consensus.(*State).enterPrecommit(0xc000c83880, 0x504e9d, 0x0)\n\tgithub.com/tend
ermint/tendermint@v0.34.13/consensus/state.go:1390 +0x1c4c\ngithub.com/tendermint/tendermint/consensus.(*State).addVote(0xc000c83880, 0xc1150e25a0, 0xc12243b0b0, 0x28, 0x2a7ab90, 0xc05f93fc0
8, 0x11accd9)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:2104 +0x131e\ngithub.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc000c83880, 0xc1150e25a0, 0xc1224
3b0b0, 0x28, 0xc114f97500, 0xc1150d2480, 0xc05b9c11c929a34d)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:1930 +0x56\ngithub.com/tendermint/tendermint/consensus.(*State).h
andleMsg(0xc000c83880, 0x2ca7b20, 0xc1150d43d8, 0xc12243b0b0, 0x28)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:838 +0x96d\ngithub.com/tendermint/tendermint/consensus.(*S
tate).receiveRoutine(0xc000c83880, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:762 +0x3f2\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\t
github.com/tendermint/tendermint@v0.34.13/consensus/state.go:378 +0x8c5\n"

To Reproduce

Run terrad with heavy query loads (my application mostly decodes mempool transactions and queries contracts).

Context & versions

v0.5.11 Linux release build, running inside docker image based on ubuntu:focal-20211006

terrad version --long:

name: terra
server_name: terrad
version: 0.5.11
commit: 19b75bfd60f7623bf4ef8919ab0e52d73ad99643
build_tags: netgo muslc,
go: go version go1.16.5 linux/amd64
build_deps:
- filippo.io/edwards25519@v1.0.0-beta.2
- github.com/99designs/keyring@v1.1.6 => github.com/cosmos/keyring@v1.1.7-0.20210622111912-ef00f8ac3d76
- github.com/ChainSafe/go-schnorrkel@v0.0.0-20200405005733-88cbf1b4c40d
- github.com/CosmWasm/wasmvm@v0.16.2
- github.com/Workiva/go-datastructures@v1.0.52
- github.com/armon/go-metrics@v0.3.9
- github.com/beorn7/perks@v1.0.1
- github.com/bgentry/speakeasy@v0.1.0
- github.com/btcsuite/btcd@v0.22.0-beta
- github.com/cespare/xxhash/v2@v2.1.1
- github.com/coinbase/rosetta-sdk-go@v0.6.10
- github.com/confio/ics23/go@v0.6.6
- github.com/cosmos/cosmos-sdk@v0.44.2
- github.com/cosmos/go-bip39@v1.0.0
- github.com/cosmos/iavl@v0.17.1
- github.com/cosmos/ibc-go@v1.1.0
- github.com/davecgh/go-spew@v1.1.1
- github.com/desertbit/timer@v0.0.0-20180107155436-c41aec40b27f
- github.com/dvsekhvalnov/jose2go@v0.0.0-20200901110807-248326c1351b
- github.com/enigmampc/btcutil@v1.0.3-0.20200723161021-e2fb6adb2a25
- github.com/felixge/httpsnoop@v1.0.1
- github.com/fsnotify/fsnotify@v1.4.9
- github.com/go-kit/kit@v0.10.0
- github.com/go-logfmt/logfmt@v0.5.0
- github.com/godbus/dbus@v0.0.0-20190726142602-4481cbc300e2
- github.com/gogo/gateway@v1.1.0
- github.com/gogo/protobuf@v1.3.3 => github.com/regen-network/protobuf@v1.3.3-alpha.regen.1
- github.com/golang/protobuf@v1.5.2
- github.com/golang/snappy@v0.0.3-0.20201103224600-674baa8c7fc3
- github.com/google/btree@v1.0.0
- github.com/google/orderedcode@v0.0.1
- github.com/gorilla/handlers@v1.5.1
- github.com/gorilla/mux@v1.8.0
- github.com/gorilla/websocket@v1.4.2
- github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0
- github.com/grpc-ecosystem/grpc-gateway@v1.16.0
- github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c
- github.com/gtank/merlin@v0.1.1
- github.com/gtank/ristretto255@v0.1.2
- github.com/hashicorp/go-immutable-radix@v1.0.0
- github.com/hashicorp/golang-lru@v0.5.4
- github.com/hashicorp/hcl@v1.0.0
- github.com/hdevalence/ed25519consensus@v0.0.0-20210204194344-59a8610d2b87
- github.com/improbable-eng/grpc-web@v0.14.1
- github.com/klauspost/compress@v1.11.7
- github.com/libp2p/go-buffer-pool@v0.0.2
- github.com/magiconair/properties@v1.8.5
- github.com/mattn/go-isatty@v0.0.14
- github.com/matttproud/golang_protobuf_extensions@v1.0.1
- github.com/mimoo/StrobeGo@v0.0.0-20181016162300-f8f6d4d2b643
- github.com/minio/highwayhash@v1.0.1
- github.com/mitchellh/mapstructure@v1.4.1
- github.com/mtibben/percent@v0.2.1
- github.com/pelletier/go-toml@v1.9.3
- github.com/pkg/errors@v0.9.1
- github.com/pmezard/go-difflib@v1.0.0
- github.com/prometheus/client_golang@v1.11.0
- github.com/prometheus/client_model@v0.2.0
- github.com/prometheus/common@v0.29.0
- github.com/prometheus/procfs@v0.6.0
- github.com/rakyll/statik@v0.1.7
- github.com/rcrowley/go-metrics@v0.0.0-20200313005456-10cdbea86bc0
- github.com/regen-network/cosmos-proto@v0.3.1
- github.com/rs/cors@v1.7.0
- github.com/rs/zerolog@v1.23.0
- github.com/spf13/afero@v1.6.0
- github.com/spf13/cast@v1.3.1
- github.com/spf13/cobra@v1.2.1
- github.com/spf13/jwalterweatherman@v1.1.0
- github.com/spf13/pflag@v1.0.5
- github.com/spf13/viper@v1.8.1
- github.com/stretchr/testify@v1.7.0
- github.com/subosito/gotenv@v1.2.0
- github.com/syndtr/goleveldb@v1.0.1-0.20200815110645-5c35d600f0ca
- github.com/tendermint/btcd@v0.1.1
- github.com/tendermint/crypto@v0.0.0-20191022145703-50d29ede1e15
- github.com/tendermint/go-amino@v0.16.0
- github.com/tendermint/tendermint@v0.34.13
- github.com/tendermint/tm-db@v0.6.4
- golang.org/x/crypto@v0.0.0-20210513164829-c07d793c2f9a
- golang.org/x/net@v0.0.0-20210903162142-ad29c8ab022f
- golang.org/x/sys@v0.0.0-20210903071746-97244b99971b
- golang.org/x/term@v0.0.0-20201126162022-7de9c90e9dd1
- golang.org/x/text@v0.3.6
- google.golang.org/genproto@v0.0.0-20210602131652-f16073e35f0c
- google.golang.org/grpc@v1.40.0 => google.golang.org/grpc@v1.33.2
- google.golang.org/protobuf@v1.27.1
- gopkg.in/ini.v1@v1.62.0
- gopkg.in/yaml.v2@v2.4.0
- gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307b
- nhooyr.io/websocket@v1.8.6
cosmos_sdk_version: v0.44.2
@fabio-nukui fabio-nukui added the bug Something isn't working label Nov 12, 2021
@hanjukim
Copy link
Contributor

Update Core to v0.5.11

@fabio-nukui
Copy link
Author

@hanjukim As in the report, this error is still happening here with v0.5.11. I am using the pre-compiled release at https://github.com/terra-money/core/releases/download/v0.5.11/terra_0.5.11_Linux_x86_64.tar.gz

@hanjukim
Copy link
Contributor

Thank you for your finding. Did you also send broadcast txs to the node as well?

@fabio-nukui
Copy link
Author

Yes, I broadcasted txs with the node as well

@ivanfong01
Copy link

ivanfong01 commented Nov 12, 2021

I did following actions with my node:

  • intensive queries (I think 100+ per second)
  • broadcast of txs (also intensive, but maybe 20-30+ per sec with most of them failed and did not actually broadcast to other nodes)
  • inspection of mempool (web request 10 per sec)

@yun-yeo
Copy link
Contributor

yun-yeo commented Nov 12, 2021

Seems v0.16.2 is fix for other problem(some vector managing) in libwasmvm, it is still problem.

@yun-yeo
Copy link
Contributor

yun-yeo commented Nov 12, 2021

@ivanfong01 @fabio-nukui

For temporal fix, you can bump Tendermint version to latest. It will make your node to handle wasm query in single thread.

Like https://github.com/terra-money/core/pull/594

@fabio-nukui
Copy link
Author

@YunSuk-Yeo Thanks for the tip!

I will test this branch and see if there are stability / performance issues.

@fabio-nukui
Copy link
Author

The new cosmos-sdk and tendermint versions seem to solve this issue.

I am running a few nodes with v0.5.11 and a few with the new versions. The v0.5.11 nodes froze ~5 times each today, the new nodes didn't freeze a single time.

I also didn't see significant performance differences in my applications.

@yun-yeo
Copy link
Contributor

yun-yeo commented Nov 16, 2021

ok, seems this problem only happens in multi thread query env. Thanks for confirmation.

@faddat
Copy link
Contributor

faddat commented Dec 6, 2021

I think this took down our relayer yesteraday.

@faddat
Copy link
Contributor

faddat commented Dec 7, 2021

@faddat faddat mentioned this issue Dec 8, 2021
4 tasks
@hanjukim
Copy link
Contributor

@YunSuk-Yeo found a way to reproduce the problem.
wasmerio/wasmer#2434 (comment)

@yun-yeo
Copy link
Contributor

yun-yeo commented Feb 21, 2022

@yun-yeo yun-yeo closed this as completed Feb 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants