Failed to initialize wal on Azure File Storage #6984

Closed
alanchchen opened this issue Dec 12, 2016 · 8 comments · Fixed by #8286

Comments

@alanchchen

I got an initialization error when using etcd with Azure File Storage.
etcd was deployed in my Kubernetes cluster with a persistent volume mounted as its data directory.

The container logs from Kubernetes dashboard:

2016-12-12T03:45:25.460712973Z 2016-12-12 03:45:25.460620 W | flags: unrecognized environment variable ETCD_PORT_2379_TCP=tcp://10.0.84.255:2379
2016-12-12T03:45:25.460744973Z 2016-12-12 03:45:25.460688 W | flags: unrecognized environment variable ETCD_PORT_2379_TCP_ADDR=10.0.84.255
2016-12-12T03:45:25.460750173Z 2016-12-12 03:45:25.460714 W | flags: unrecognized environment variable ETCD_SERVICE_PORT=2379
2016-12-12T03:45:25.460753273Z 2016-12-12 03:45:25.460724 W | flags: unrecognized environment variable ETCD_SERVICE_HOST=10.0.84.255
2016-12-12T03:45:25.460756173Z 2016-12-12 03:45:25.460732 W | flags: unrecognized environment variable ETCD_PORT_2379_TCP_PORT=2379
2016-12-12T03:45:25.460761973Z 2016-12-12 03:45:25.460741 W | flags: unrecognized environment variable ETCD_PORT=tcp://10.0.84.255:2379
2016-12-12T03:45:25.460768773Z 2016-12-12 03:45:25.460759 W | flags: unrecognized environment variable ETCD_PORT_2379_TCP_PROTO=tcp
2016-12-12T03:45:25.460796173Z 2016-12-12 03:45:25.460769 W | flags: unrecognized environment variable ETCD_SERVICE_PORT_ETCD_CLIENT=2379
2016-12-12T03:45:25.460820373Z 2016-12-12 03:45:25.460801 I | etcdmain: etcd Version: 3.0.15
2016-12-12T03:45:25.460826973Z 2016-12-12 03:45:25.460816 I | etcdmain: Git SHA: fc00305
2016-12-12T03:45:25.460861773Z 2016-12-12 03:45:25.460824 I | etcdmain: Go Version: go1.6.3
2016-12-12T03:45:25.460884272Z 2016-12-12 03:45:25.460853 I | etcdmain: Go OS/Arch: linux/amd64
2016-12-12T03:45:25.460942272Z 2016-12-12 03:45:25.460905 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-12-12T03:45:25.471510026Z 2016-12-12 03:45:25.471446 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-12-12T03:45:25.473523217Z 2016-12-12 03:45:25.473482 I | etcdmain: listening for peers on http://0.0.0.0:2380
2016-12-12T03:45:25.473606616Z 2016-12-12 03:45:25.473566 I | etcdmain: listening for client requests on 0.0.0.0:2379
2016-12-12T03:45:25.739378943Z 2016-12-12 03:45:25.739272 I | etcdserver: name = etcd
2016-12-12T03:45:25.739404243Z 2016-12-12 03:45:25.739294 I | etcdserver: data dir = /mnt/data
2016-12-12T03:45:25.739408543Z 2016-12-12 03:45:25.739301 I | etcdserver: member dir = /mnt/data/member
2016-12-12T03:45:25.739411643Z 2016-12-12 03:45:25.739305 I | etcdserver: heartbeat = 100ms
2016-12-12T03:45:25.739414643Z 2016-12-12 03:45:25.739308 I | etcdserver: election = 1000ms
2016-12-12T03:45:25.739417543Z 2016-12-12 03:45:25.739311 I | etcdserver: snapshot count = 10000
2016-12-12T03:45:25.739420643Z 2016-12-12 03:45:25.739319 I | etcdserver: advertise client URLs = http://127.0.0.1:2379
2016-12-12T03:45:25.739437943Z 2016-12-12 03:45:25.739323 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2016-12-12T03:45:25.739442243Z 2016-12-12 03:45:25.739376 I | etcdserver: initial cluster = etcd=http://localhost:2380
2016-12-12T03:45:25.981901373Z 2016-12-12 03:45:25.981795 C | etcdserver: create wal error: rename /mnt/data/member/wal.tmp /mnt/data/member/wal: permission denied

I don't understand why etcd got a permission denied error, since the directories under /mnt/data are created by etcd itself.

@gyuho
Contributor

gyuho commented Dec 12, 2016

It depends on your Linux user setup. The process that runs etcd must be able to read and write the data directory; etcd has no control over that.

@xiang90
Contributor

xiang90 commented Dec 12, 2016

I don't understand why etcd got a permission denied error, since the directories under /mnt/data are created by etcd itself.

The directory was not created by this run, so some permissions may have changed since then. See this log line:

2016-12-12T03:45:25.471510026Z 2016-12-12 03:45:25.471446 N | etcdmain: the server is already initialized as member before, starting as etcd member...

Please check the file system permissions. If you can reproduce this on a clean setup, reopen the issue. Thanks!

xiang90 closed this as completed Dec 12, 2016
@heyitsanthony
Contributor

@alanchchen What filesystem is this? What does df -T /mnt/data return?

@xiang90
Contributor

xiang90 commented Dec 12, 2016

I noticed that Azure has its own network file system service, so I am reopening this since it might be related.

xiang90 reopened this Dec 12, 2016
@alanchchen
Author

alanchchen commented Dec 13, 2016

@heyitsanthony
The container crashes immediately after starting, so I'm not sure how to execute df -T /mnt/data.
Do you have any idea?
I don't know the filesystem; perhaps it is some sort of NFS.
Here is my Kubernetes persistent volume YAML:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-pv001
  labels:
    storage: etcd
    type: azureFile
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  azureFile:
    secretName: alan
    shareName: etcd-pv001
    readOnly: false

@xiang90
I did run etcd from a fresh start.
I think the container logs I posted did not come from the container that created /mnt/data/member, since etcd restarted repeatedly.
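
When df cannot be run inside the crashing container, the filesystem type that df -T would report can also be read from /proc/mounts by any process that sees the same mount (for example a debug container attached to the same volume). The following is a minimal sketch in Go; fsTypeFor is a hypothetical helper written for illustration, and it ignores the octal escaping (such as \040 for spaces) that /proc/mounts applies to mount points.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// fsTypeFor returns the filesystem type of the mount that contains path,
// given text in /proc/mounts format: "device mountpoint fstype options ...".
// The longest matching mount point wins; on a tie the later entry wins,
// because later mounts shadow earlier ones. It returns "" if nothing matches.
// Hypothetical helper for illustration, not part of etcd.
func fsTypeFor(mounts, path string) string {
	best, fstype := "", ""
	for _, line := range strings.Split(mounts, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue // blank or malformed line
		}
		mp := f[1]
		// Match on whole path components so /mnt/data does not cover /mnt/database.
		covers := mp == "/" || path == mp || strings.HasPrefix(path, mp+"/")
		if covers && len(mp) >= len(best) {
			best, fstype = mp, f[2]
		}
	}
	return fstype
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	// /mnt/data is the data directory from the logs above.
	fmt.Println(fsTypeFor(string(data), "/mnt/data"))
}
```

A type such as cifs would indicate an SMB-backed mount rather than NFS.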

@heyitsanthony
Contributor

@alanchchen OK, according to the docs, etcd is writing to an SMB volume. Given past trouble with both NFS and Windows, I wouldn't be surprised if etcd is broken on SMB. Probably worth fixing.

@alanchchen
Author

@heyitsanthony
I see. Thank you.

heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue Jul 19, 2017
Detecting windows at compile time isn't enough since etcd might be
on linux but the fs is backed by windows.

Fixes: etcd-io#8178
Fixes: etcd-io#6984
visheshnp pushed a commit to visheshnp/etcd that referenced this issue Aug 3, 2017
Detecting windows at compile time isn't enough since etcd might be
on linux but the fs is backed by windows.

Fixes: etcd-io#8178
Fixes: etcd-io#6984
@djeeg

djeeg commented Mar 21, 2018

It now gets past the original error message:

etcdserver: create wal error: rename /mnt/data/member/wal.tmp /mnt/data/member/wal: permission denied

However, it bails out with the next error:

etcdserver: create wal error: sync /etcd-data/member: invalid argument

docker run -ti --rm --network=netfront --mount='type=volume,volume-driver=cloudstor:azure,source=etcd_332,destination=/etcd-data' quay.io/coreos/etcd:v3.3.2 /usr/local/bin/etcd --data-dir=/etcd-data --listen-client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://etcd:2379 --initial-cluster-state=new

2018-03-21 20:18:03.130721 I | etcdmain: etcd Version: 3.3.2
2018-03-21 20:18:03.130799 I | etcdmain: Git SHA: c9d46ab37
2018-03-21 20:18:03.130816 I | etcdmain: Go Version: go1.9.4
2018-03-21 20:18:03.130832 I | etcdmain: Go OS/Arch: linux/amd64
2018-03-21 20:18:03.130891 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-03-21 20:18:03.144840 I | embed: listening for peers on http://localhost:2380
2018-03-21 20:18:03.144955 I | embed: listening for client requests on 0.0.0.0:2379
2018-03-21 20:18:03.379183 I | etcdserver: name = default
2018-03-21 20:18:03.379235 I | etcdserver: data dir = /etcd-data
2018-03-21 20:18:03.379262 I | etcdserver: member dir = /etcd-data/member
2018-03-21 20:18:03.379313 I | etcdserver: heartbeat = 100ms
2018-03-21 20:18:03.379337 I | etcdserver: election = 1000ms
2018-03-21 20:18:03.379380 I | etcdserver: snapshot count = 100000
2018-03-21 20:18:03.379660 I | etcdserver: advertise client URLs = http://etcd:2379
2018-03-21 20:18:03.379682 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-03-21 20:18:03.379703 I | etcdserver: initial cluster = default=http://localhost:2380
2018-03-21 20:18:03.594174 I | wal: releasing file lock to rename "/etcd-data/member/wal.tmp" to "/etcd-data/member/wal"
2018-03-21 20:18:03.733434 C | etcdserver: create wal error: sync /etcd-data/member: invalid argument


v3.1.12 and v3.2.12 still fail with the first error message.
