
ETCD wal: max entry size limit exceeded #2553

Closed
4 tasks done
roquie opened this issue Dec 24, 2022 · 3 comments
Labels
bug Something isn't working Stale

Comments

@roquie

roquie commented Dec 24, 2022

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Version

1.25.4

Sysinfo

`k0s sysinfo`
Machine ID: "e0ef25ef8af3f38d9ceab96ec23467ae28184c848892451a758d27ccd4019018" (from machine) (pass)
Total memory: 1.9 GiB (pass)
Disk space available for /var/lib/k0s: 9.4 GiB (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.4.0-109-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  Executable in path: modprobe: /usr/sbin/modprobe (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 1 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (pass)
    cgroup controller "freezer": available (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

After a VM restart and a disk size increase, etcd fails with the following error (before the restart, k0s had failed because the disk ran out of space):

etcd.go:204\",\"msg\":\"discovery failed\",\"error\":\"wal: max entry size limit exceeded, recBytes: 955, fileSize(64000000) - offset(63999464) - padBytes(5) = entryLimit(531)
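The arithmetic in this message is self-consistent: 64000000 - 63999464 = 536 bytes left in the WAL file, minus 5 padding bytes leaves an entryLimit of 531, while the record header declares 955 bytes. 64000000 matches etcd's default WAL segment size (64 * 1000 * 1000 bytes). The Go sketch below is a simplified re-creation of the check this message implies (a record's declared length must fit in what remains of the file). The names `entryLimit`, `checkRecord` and `errMaxEntryLimitExceeded` are hypothetical; this is not etcd's source code.

```go
package main

import (
	"errors"
	"fmt"
)

// errMaxEntryLimitExceeded is a hypothetical local stand-in that mirrors the
// wording of the etcd message; it is not etcd's exported error value.
var errMaxEntryLimitExceeded = errors.New("wal: max entry size limit exceeded")

// entryLimit returns how many bytes are still available for a record that
// starts at offset: whatever is left in the file minus the padding bytes.
func entryLimit(fileSize, offset, padBytes int64) int64 {
	return fileSize - offset - padBytes
}

// checkRecord returns an error when the record's declared length (recBytes)
// does not fit in the remaining space of the file.
func checkRecord(recBytes, fileSize, offset, padBytes int64) error {
	limit := entryLimit(fileSize, offset, padBytes)
	if recBytes > limit {
		return fmt.Errorf("%w, recBytes: %d, fileSize(%d) - offset(%d) - padBytes(%d) = entryLimit(%d)",
			errMaxEntryLimitExceeded, recBytes, fileSize, offset, padBytes, limit)
	}
	return nil
}

func main() {
	// Values copied from the log line in this report.
	err := checkRecord(955, 64000000, 63999464, 5)
	fmt.Println(err)
}
```

Running it prints the same text as the etcd log line, which suggests the WAL record at that offset claims more bytes than the file actually contains.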

Steps to reproduce

  1. Install a cluster.
  2. Wait until the disk is full, then shut down the VM and increase its disk size.
  3. Start the VM; etcd fails with the error above.

Expected behavior

A working cluster controller.

Actual behavior

No response

Screenshots and logs

Dec 24 08:51:36 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:36" level=info msg="{\"level\":\"info\",\"ts\":\"2022-12-24T08:51:36.932Z\",\"caller\":\"etcdserver/backend.go:81\",\"msg\":\"opened backend db\",\"path\":\"/var/lib/k0s/etcd/member/snap/db\",\"took\":\"62.300357ms\"}" component=etcd
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:37" level=info msg="{\"level\":\"info\",\"ts\":\"2022-12-24T08:51:37.233Z\",\"caller\":\"embed/etcd.go:371\",\"msg\":\"closing etcd server\",\"name\":\"cl1lr3g2d1m8nm5mep8g-eleq\",\"data-dir\":\"/var/lib/k0s/etcd\",\"advertise-peer-urls\":[\"https://10.200.0.19:2380\"],\"advertise-client-urls\":[\"https://127.0.0.1:2379\"]}" component=etcd
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:37" level=info msg="{\"level\":\"info\",\"ts\":\"2022-12-24T08:51:37.234Z\",\"caller\":\"embed/etcd.go:373\",\"msg\":\"closed etcd server\",\"name\":\"cl1lr3g2d1m8nm5mep8g-eleq\",\"data-dir\":\"/var/lib/k0s/etcd\",\"advertise-peer-urls\":[\"https://10.200.0.19:2380\"],\"advertise-client-urls\":[\"https://127.0.0.1:2379\"]}" component=etcd
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:37" level=info msg="{\"level\":\"fatal\",\"ts\":\"2022-12-24T08:51:37.234Z\",\"caller\":\"etcdmain/etcd.go:204\",\"msg\":\"discovery failed\",\"error\":\"wal: max entry size limit exceeded, recBytes: 955, fileSize(64000000) - offset(63999464) - padBytes(5) = entryLimit(531)\",\"stacktrace\":\"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\\n\\t/etcd/server/etcdmain/etcd.go:204\\ngo.etcd.io/etcd/server/v3/etcdmain.Main\\n\\t/etcd/server/etcdmain/main.go:40\\nmain.main\\n\\t/etcd/server/main.go:32\\nruntime.main\\n\\t/usr/local/>
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:37" level=warning msg="exit status 1" component=etcd
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: time="2022-12-24 08:51:37" level=info msg="respawning in 5s" component=etcd
Dec 24 08:51:37 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: {"level":"warn","ts":"2022-12-24T08:51:37.836Z","logger":"etcd-client","caller":"v3@v3.5.4/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000d68000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Dec 24 08:51:38 cl1lr3g2d1m8nm5mep8g-eleq k0s[2455]: {"level":"warn","ts":"2022-12-24T08:51:38.837Z","logger":"etcd-client","caller":"v3@v3.5.4/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b6e1c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}

Additional context

etcd-io/etcd#14025

@roquie roquie added the bug Something isn't working label Dec 24, 2022
@roquie
Author

roquie commented Dec 27, 2022

After restoring from a backup, the cluster worked again for some time (3 days). Now the same error has occurred again.

@juanluisvaladas
Contributor

Hi @roquie,
If I understand correctly:
1- The filesystem filled up
2- You rebooted the VM
3- You increased the VM disk size
4- You tried to start etcd again and it failed.

My understanding is that the logs in your report are from AFTER the restart.

Do you have logs from BEFORE the restart? My guess is that the raft log got corrupted because an fsync failed while the disk was full, but I can't be sure.
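To illustrate the kind of corruption this hypothesis describes, here is a minimal Go sketch, assuming a simplified length-prefixed record format (an 8-byte little-endian length header followed by the payload). It deliberately omits etcd's CRCs, frame padding and segment preallocation, and `appendRecord` / `readRecords` are hypothetical helpers. If the disk fills up after a record's header is written but before its whole payload reaches the file, a later reader finds a header whose declared length exceeds the bytes left in the file, which is the same shape of failure as the `recBytes ... entryLimit` error in this report.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendRecord appends an 8-byte little-endian length header and the payload.
// Hypothetical framing for illustration only, not etcd's on-disk format.
func appendRecord(dst, payload []byte) []byte {
	var hdr [8]byte
	binary.LittleEndian.PutUint64(hdr[:], uint64(len(payload)))
	dst = append(dst, hdr[:]...)
	return append(dst, payload...)
}

// readRecords decodes records until the end of data. It returns the number of
// complete records and an error if a header declares more bytes than remain.
func readRecords(data []byte) (int, error) {
	off, n := 0, 0
	for off < len(data) {
		if len(data)-off < 8 {
			return n, fmt.Errorf("truncated header at offset %d", off)
		}
		recBytes := int64(binary.LittleEndian.Uint64(data[off : off+8]))
		remaining := int64(len(data) - off - 8)
		// A negative value means the header is garbage; an oversized one means
		// the payload never fully reached the file (a torn write).
		if recBytes < 0 || recBytes > remaining {
			return n, fmt.Errorf("record at offset %d declares %d bytes but only %d remain", off, recBytes, remaining)
		}
		off += 8 + int(recBytes)
		n++
	}
	return n, nil
}

func main() {
	var data []byte
	data = appendRecord(data, []byte("entry-1"))
	data = appendRecord(data, bytes.Repeat([]byte("x"), 955))

	n, err := readRecords(data)
	fmt.Println(n, err) // 2 <nil>

	// Simulate the disk filling up mid-write: the last 500 payload bytes of
	// the second record never reach the file.
	torn := data[:len(data)-500]
	n, err = readRecords(torn)
	fmt.Println(n, err) // 1 record at offset 15 declares 955 bytes but only 455 remain
}
```

Because a reader cannot tell a torn tail apart from other corruption by the header alone, it has to stop at the first record that does not fit, which is consistent with etcd refusing to start rather than skipping the entry.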

@github-actions
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Jan 27, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 4, 2023