
v3.3.0-rc.0 restore from same snapshot, cannot add a new member #9109

Closed
gyuho opened this issue Jan 7, 2018 · 27 comments

gyuho commented Jan 7, 2018

From #9094 (comment)

  1. save snapshot
  2. restore 3-node cluster from same snapshot
  3. add a new member (fails)

We don't have any test coverage around this...

Likely related to #9096.

/cc @lyddragon

@gyuho gyuho self-assigned this Jan 7, 2018
@gyuho gyuho added the type/bug label Jan 7, 2018

OPSTime commented Jan 7, 2018

How do I do a normal restore of an etcd cluster?


gyuho commented Jan 8, 2018

@lyddragon Can't reproduce. What was the error message from

sudo -u etcd ETCDCTL_API=3 /usr/local/etcd/etcdctl snapshot restore /tmp/s.db --name node4 --initial-cluster node1=http://192.168.0.81:2380,node2=http://192.168.0.82:2380,node3=http://192.168.0.83:2380,node4=http://192.168.0.80:2380 --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt --initial-advertise-peer-urls=http://192.168.0.80:2380 --data-dir /data/etcd/intranet-test.data

start failed


OPSTime commented Jan 8, 2018

node4 started successfully, but restarting it fails:

Jan 08 21:11:55 node4 etcd[1285]: started streaming with peer c4cf1eeedaaed948 (writer)
Jan 08 21:11:55 node4 etcd[1285]: started streaming with peer c4cf1eeedaaed948 (writer)
Jan 08 21:11:55 node4 etcd[1285]: started streaming with peer c4cf1eeedaaed948 (stream MsgApp v2 reader)
Jan 08 21:11:55 node4 etcd[1285]: started streaming with peer c4cf1eeedaaed948 (stream Message reader)
Jan 08 21:11:55 node4 etcd[1285]: set the initial cluster version to 3.0
Jan 08 21:11:55 node4 etcd[1285]: enabled capabilities for version 3.0
Jan 08 21:11:55 node4 etcd[1285]: updated the cluster version from 3.0 to 3.3
Jan 08 21:11:55 node4 etcd[1285]: enabled capabilities for version 3.3
Jan 08 21:11:55 node4 etcd[1285]: error updating attributes of unknown member 60a45cddcb77254e
Jan 08 21:11:55 node4 bash[1285]: panic: error updating attributes of unknown member 60a45cddcb77254e
Jan 08 21:11:55 node4 bash[1285]: goroutine 166 [running]:
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc42000a100, 0x100fd8e, 0x2e, 0xc420401028, 0x1, 0x1)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16d
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/membership.(*RaftCluster).UpdateAttributes(0xc420058180, 0x60a45cddcb77254e, 0xc4202741e8, 0x5, 0xc4202736c0, 0x
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/membership/cluster.go:334 +0x2ea
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*applierV2store).Put(0xc42000ab00, 0xc420011080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/apply_v2.go:82 +0x931
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyV2Request(0xc420090000, 0xc420011080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/apply_v2.go:114 +0x1ed
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyEntryNormal(0xc420090000, 0xc4204016f8)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:1352 +0x5f5
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).apply(0xc420090000, 0xc4200ac048, 0x8, 0x8, 0xc420276a80, 0x44c358, 0x5ead3d4a6150e5, 0x20cfb2fa)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:1299 +0x37e
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyEntries(0xc420090000, 0xc420276a80, 0xc4201fa0c0)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:936 +0xc7
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyAll(0xc420090000, 0xc420276a80, 0xc4201fa0c0)
Jan 08 21:11:55 node4 bash[1285]: /home/gyuho/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:798 +0xb3
Jan 08 21:11:55 node4 bash[1285]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).run.func6(0x16006e0, 0xc420272dc0)
Jan 08 21:11:55 node4 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 08 21:11:55 node4 systemd[1]: Failed to start Etcd Server.
Jan 08 21:11:55 node4 systemd[1]: Unit etcd.service entered failed state.
Jan 08 21:11:55 node4 systemd[1]: etcd.service failed.


OPSTime commented Jan 8, 2018

I used the backup snapshot to build the cluster, but data sync to the new member has a problem.

1. Get keys:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl get 0 'z'
k1
1
k2
2
k3
3
k4
4

2. Back up on node1, and scp s.db to node2 and node3:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl snapshot save /tmp/s.db
$ scp /tmp/s.db node2:/tmp
$ scp /tmp/s.db node3:/tmp

3. On node1, node2, and node3:

$ systemctl stop etcd
$ rm -rf /data/etcd/*

4. On node1, node2, and node3, restore and start the etcd service:

node1:
$ sudo -u etcd ETCDCTL_API=3 /usr/local/etcd/etcdctl snapshot restore /tmp/s.db --name node1 --initial-cluster node1=http://192.168.0.81:2380,node2=http://192.168.0.82:2380,node3=http://192.168.0.83:2380 --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt --initial-advertise-peer-urls=http://192.168.0.81:2380 --data-dir /data/etcd/intranet-test.data
$ systemctl start etcd

node2:
$ sudo -u etcd ETCDCTL_API=3 /usr/local/etcd/etcdctl snapshot restore /tmp/s.db --name node2 --initial-cluster node1=http://192.168.0.81:2380,node2=http://192.168.0.82:2380,node3=http://192.168.0.83:2380 --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt --initial-advertise-peer-urls=http://192.168.0.82:2380 --data-dir /data/etcd/intranet-test.data
$ systemctl start etcd

node3:
$ sudo -u etcd ETCDCTL_API=3 /usr/local/etcd/etcdctl snapshot restore /tmp/s.db --name node3 --initial-cluster node1=http://192.168.0.81:2380,node2=http://192.168.0.82:2380,node3=http://192.168.0.83:2380 --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt --initial-advertise-peer-urls=http://192.168.0.83:2380 --data-dir /data/etcd/intranet-test.data
$ systemctl start etcd

5. Add member node4:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl member add node4 --peer-urls='http://192.168.0.80:2380'

6. Start node4:

$ cat /usr/local/etcd/conf/etcd.conf
......
initial-cluster-state: existing
......
$ systemctl start etcd

7. On node4, get keys; nothing is returned:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl get 0 'z'

8. On node1, get keys and add a new key:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl get 0 'z'
k1
1
k2
2
k3
3
k4
4

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl put k5 5

9. On node4, get keys; k1-k4 are missing:

$ ETCDCTL_API=3 /usr/local/etcd/etcdctl get 0 'z'
k5
5

@gyuho gyuho removed the type/bug label Jan 8, 2018

gyuho commented Jan 8, 2018

@lyddragon What is your node4 config after running member add on node4?


OPSTime commented Jan 8, 2018

name: "node4"
data-dir: /data/etcd/intranet-test.data
wal-dir: /data/etcd/intranet-test.wal.data
snapshot-count: 10000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 8589934592
listen-peer-urls: http://192.168.0.80:2380
listen-client-urls: http://192.168.0.80:2379,http://127.0.0.1:2379
max-snapshots: 5
max-wals: 5
cors:
initial-advertise-peer-urls: http://192.168.0.80:2380
advertise-client-urls: http://192.168.0.80:2379
discovery:
discovery-fallback: "proxy"
discovery-proxy:
discovery-srv:
initial-cluster: node1=http://192.168.0.81:2380,node2=http://192.168.0.82:2380,node3=http://192.168.0.83:2380,node4=http://192.168.0.80:2380
initial-cluster-token: "wpbch1bi7yebkdWWfoemlqxyjbwrqt"
initial-cluster-state: existing
strict-reconfig-check: false
enable-v2: false
enable-pprof: true
proxy: "off"
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
ca-file:
cert-file:
key-file:
client-cert-auth: false
trusted-ca-file:
auto-tls: false
peer-transport-security:
ca-file:
cert-file:
key-file:
peer-client-cert-auth: false
trusted-ca-file:
auto-tls: false
debug: false
log-package-levels: etcdmain=DEBUG,etcdserver=DEBUG
log-output: default
force-new-cluster: false


gyuho commented Jan 8, 2018

Did you delete the data dir before starting? Did you restore the snapshot on node4?


OPSTime commented Jan 9, 2018

node4 is a new member.


gyuho commented Jan 9, 2018

Did you delete the data dir before starting node4?
And can you reproduce it locally?
It works fine on my side...


OPSTime commented Jan 9, 2018

Yes, I deleted the data dir first.


OPSTime commented Jan 9, 2018

I have tried multiple times.


gyuho commented Jan 9, 2018

Is this exactly what you are doing?

Try this yourself.
It works as expected.

If not, there's something wrong.

##################################
ETCD_VER=v3.3.0-rc.0

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/coreos/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version

/tmp/etcd-download-test/etcd
##################################

##################################
rm -rf /tmp/etcd-data-1.new
/tmp/etcd-download-test/etcd \
  --name s1 \
  --data-dir /tmp/etcd-data-1.new \
  --listen-client-urls http://localhost:12379 \
  --advertise-client-urls http://localhost:12379 \
  --listen-peer-urls http://localhost:12380 \
  --initial-advertise-peer-urls http://localhost:12380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state new

rm -rf /tmp/etcd-data-2.new
/tmp/etcd-download-test/etcd \
  --name s2 \
  --data-dir /tmp/etcd-data-2.new \
  --listen-client-urls http://localhost:22379 \
  --advertise-client-urls http://localhost:22379 \
  --listen-peer-urls http://localhost:22380 \
  --initial-advertise-peer-urls http://localhost:22380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state new

rm -rf /tmp/etcd-data-3.new
/tmp/etcd-download-test/etcd \
  --name s3 \
  --data-dir /tmp/etcd-data-3.new \
  --listen-client-urls http://localhost:32379 \
  --advertise-client-urls http://localhost:32379 \
  --listen-peer-urls http://localhost:32380 \
  --initial-advertise-peer-urls http://localhost:32380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state new
##################################

##################################
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo1 bar1

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo2 bar2

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo3 bar3

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  get "" --from-key
##################################

##################################
rm -f /tmp/a.db

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379 \
  snapshot save /tmp/a.db

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot status /tmp/a.db --write-out=table
##################################

##################################
rm -rf /tmp/etcd-data-1.new
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/etcd-data-1.new \
  --name s1 \
  --initial-advertise-peer-urls http://localhost:12380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn

rm -rf /tmp/etcd-data-2.new
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/etcd-data-2.new \
  --name s2 \
  --initial-advertise-peer-urls http://localhost:22380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn

rm -rf /tmp/etcd-data-3.new
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/etcd-data-3.new \
  --name s3 \
  --initial-advertise-peer-urls http://localhost:32380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn
##################################

##################################
/tmp/etcd-download-test/etcd \
  --name s1 \
  --data-dir /tmp/etcd-data-1.new \
  --listen-client-urls http://localhost:12379 \
  --advertise-client-urls http://localhost:12379 \
  --listen-peer-urls http://localhost:12380 \
  --initial-advertise-peer-urls http://localhost:12380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state existing

/tmp/etcd-download-test/etcd \
  --name s2 \
  --data-dir /tmp/etcd-data-2.new \
  --listen-client-urls http://localhost:22379 \
  --advertise-client-urls http://localhost:22379 \
  --listen-peer-urls http://localhost:22380 \
  --initial-advertise-peer-urls http://localhost:22380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state existing

/tmp/etcd-download-test/etcd \
  --name s3 \
  --data-dir /tmp/etcd-data-3.new \
  --listen-client-urls http://localhost:32379 \
  --advertise-client-urls http://localhost:32379 \
  --listen-peer-urls http://localhost:32380 \
  --initial-advertise-peer-urls http://localhost:32380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token tkn \
  --initial-cluster-state existing
##################################

##################################
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint health

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  member list --write-out=table

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint status --write-out=table
##################################

##################################
# member add
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  member add s4 --peer-urls=http://localhost:42380

rm -rf /tmp/etcd-data-4.new
/tmp/etcd-download-test/etcd \
  --name s4 \
  --data-dir /tmp/etcd-data-4.new \
  --listen-client-urls http://localhost:42379 \
  --advertise-client-urls http://localhost:42379 \
  --listen-peer-urls http://localhost:42380 \
  --initial-advertise-peer-urls http://localhost:42380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380,s4=http://localhost:42380 \
  --initial-cluster-token tkn \
  --initial-cluster-state existing
##################################

##################################
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:42379 \
  put foo4 bar4

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379 \
  get "" --from-key

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:22379 \
  get "" --from-key

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:32379 \
  get "" --from-key

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:42379 \
  get "" --from-key
##################################

##################################
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  endpoint health

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  member list --write-out=table

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  endpoint status --write-out=table
##################################


OPSTime commented Jan 9, 2018

My cluster has 3 members, and the snapshot is taken from a 3-member cluster!


gyuho commented Jan 9, 2018

do snapshot from 3 member cluster!!!

what do you mean?


OPSTime commented Jan 9, 2018

Follow my steps.


OPSTime commented Jan 9, 2018

From the very beginning, my cluster has 3 members; your cluster has only one member.


OPSTime commented Jan 9, 2018

1. set up 3-node cluster (A, B, C)
2. get snapshot of A
3. shut down A, B, C
4. remove A, B, C db directory
5. restore A, B, C snapshot
6. restart A, B, C with same configuration
7. add member D
8. start member D
9. get keys on member D


gyuho commented Jan 9, 2018

Updated #9109 (comment). Still works. If not, I must still be missing something.

Regarding the snapshot at the beginning, as long as the snapshot from A is recent (contains all keys), it won't make any difference.

Please provide a locally reproducible script as above, or try the script above yourself.


OPSTime commented Jan 9, 2018

The problem is associated with wal-dir.

my configuration:

name: "nodename"
data-dir: /tmp/nodename.data
wal-dir: /tmp/nodename.wal.data
......

but 'etcdctl snapshot restore' has no --wal-dir option

test configuration file: e.conf

name: "nodename"
data-dir: /tmp/nodename.data
wal-dir: /tmp/nodename.wal.data
listen-peer-urls: http://localhost:member_number2380
listen-client-urls: http://localhost:member_number2379
initial-advertise-peer-urls: http://localhost:member_number2380
advertise-client-urls: http://localhost:member_number2379
initial-cluster: s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380
initial-cluster-token: "wpbch1bi7yebkdWWfoemlqxyjbwrqt"
initial-cluster-state: new

test script

##################################
ETCD_VER=v3.3.0-rc.0

# choose either URL

GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/coreos/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version

#/tmp/etcd-download-test/etcd
##################################

##################################
pkill etcd
sleep 10

echo --- build new cluster ---
rm -rf /tmp/s1.data
rm -rf /tmp/s1.wal.data
\cp e.conf /tmp/s1.conf
sed -i 's/nodename/s1/g' /tmp/s1.conf
sed -i 's/member_number/1/g' /tmp/s1.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s1.conf &>/dev/null &

rm -rf /tmp/s2.data
rm -rf /tmp/s2.wal.data
\cp e.conf /tmp/s2.conf
sed -i 's/nodename/s2/g' /tmp/s2.conf
sed -i 's/member_number/2/g' /tmp/s2.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s2.conf &>/dev/null &

rm -rf /tmp/s3.data
rm -rf /tmp/s3.wal.data
\cp e.conf /tmp/s3.conf
sed -i 's/nodename/s3/g' /tmp/s3.conf
sed -i 's/member_number/3/g' /tmp/s3.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s3.conf &>/dev/null &
##################################

##################################
echo --- put keys ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo1 bar1

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo2 bar2

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  put foo3 bar3

echo --- get keys ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  get "" --from-key
##################################

##################################
rm -f /tmp/a.db

echo --- save snapshot ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379 \
  snapshot save /tmp/a.db

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot status /tmp/a.db --write-out=table

echo --- kill etcd ---
sleep 1
pkill etcd
sleep 10
echo --- check etcd process ---
ps aux |grep etcd|grep -v grep

##################################

##################################
echo --- use snapshot build new cluster ---
rm -rf /tmp/s1.data
rm -rf /tmp/s1.wal.data
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/s1.data \
  --name s1 \
  --initial-advertise-peer-urls http://localhost:12380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt

rm -rf /tmp/s2.data
rm -rf /tmp/s2.wal.data
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/s2.data \
  --name s2 \
  --initial-advertise-peer-urls http://localhost:22380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt

rm -rf /tmp/s3.data
rm -rf /tmp/s3.wal.data
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  snapshot restore /tmp/a.db \
  --data-dir /tmp/s3.data \
  --name s3 \
  --initial-advertise-peer-urls http://localhost:32380 \
  --initial-cluster s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380 \
  --initial-cluster-token wpbch1bi7yebkdWWfoemlqxyjbwrqt

##################################

##################################
echo --- start new cluster ---
#sed -i 's/initial-cluster-state: new/initial-cluster-state: existing/' /tmp/s1.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s1.conf &>/tmp/s1.out &

#sed -i 's/initial-cluster-state: new/initial-cluster-state: existing/' /tmp/s2.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s2.conf &>/tmp/s2.out &

#sed -i 's/initial-cluster-state: new/initial-cluster-state: existing/' /tmp/s3.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s3.conf &>/tmp/s3.out &

##################################

##################################
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint health

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  member list --write-out=table

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint status --write-out=table
##################################

##################################
echo --- add member s4 ---

# member add

sleep 5
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  member add s4 --peer-urls=http://localhost:42380

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint health

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  member list --write-out=table

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379 \
  endpoint status --write-out=table

rm -rf /tmp/s4.data
rm -rf /tmp/s4.wal.data
\cp e.conf /tmp/s4.conf
sed -i 's/nodename/s4/g' /tmp/s4.conf
sed -i 's/member_number/4/g' /tmp/s4.conf
sed -i 's/initial-cluster-state: new/initial-cluster-state: existing/' /tmp/s4.conf
sed -i 's#initial-cluster: s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380#initial-cluster: s1=http://localhost:12380,s2=http://localhost:22380,s3=http://localhost:32380,s4=http://localhost:42380#' /tmp/s4.conf
/tmp/etcd-download-test/etcd --config-file /tmp/s4.conf &>/tmp/s4.out &

##################################

##################################
echo --- put key on s4 ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:42379 \
  put foo4 bar4

echo --- get keys on s1 ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379 \
  get "" --from-key

echo --- get keys on s2 ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:22379 \
  get "" --from-key

echo --- get keys on s3 ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:32379 \
  get "" --from-key

echo --- get keys on s4 ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:42379 \
  get "" --from-key
##################################

##################################
echo --- check new cluster ---
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  endpoint health

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  member list --write-out=table

ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl \
  --endpoints=http://localhost:12379,http://localhost:22379,http://localhost:32379,http://localhost:42379 \
  endpoint status --write-out=table
##################################


gyuho commented Jan 9, 2018

Now I see what went wrong. You are configuring a separate WAL directory, /data/etcd/intranet-test.wal.data, while snapshot restore outputs everything under --data-dir. Please change your --wal-dir to filepath.Join(DATA_DIR, "member", "wal").
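In shell terms, the suggested change is just a path join. This is only a sketch; etcd is not invoked here, and DATA_DIR is the data dir used earlier in this thread:

```shell
# After `snapshot restore`, the WAL is written under <data-dir>/member/wal.
# Shell equivalent of filepath.Join(DATA_DIR, "member", "wal"):
DATA_DIR=/data/etcd/intranet-test.data
WAL_DIR="${DATA_DIR}/member/wal"
echo "wal-dir: ${WAL_DIR}"
```

Pointing --wal-dir at that path (or leaving --wal-dir unset) keeps the service reading the WAL the restore actually produced.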

@gyuho gyuho closed this as completed Jan 9, 2018

OPSTime commented Jan 9, 2018

--wal-dir
Path to the dedicated wal directory. If this flag is set, etcd will write the WAL files to the walDir rather than the dataDir. This allows a dedicated disk to be used, and helps avoid io competition between logging and other IO operations.
default: ""
env variable: ETCD_WAL_DIR

It allows pointing to a different directory.


gyuho commented Jan 9, 2018

The problem is that the snapshot restore command outputs everything under the --data-dir flag. And as you said:

but 'etcdctl snapshot restore' has no --wal-dir option

So I assume your systemd config points to the wrong WAL dir.
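As a quick sanity check, the configured path can be compared against the restored layout. This is a sketch, not an etcd command; the two paths are the ones reported in this thread:

```shell
# Compare the wal-dir named in the service config against the layout that
# `snapshot restore` actually writes; on a mismatch, etcd reads its WAL from
# a different place than where the restore wrote it.
DATA_DIR=/data/etcd/intranet-test.data
CONFIGURED_WAL_DIR=/data/etcd/intranet-test.wal.data
if [ "${CONFIGURED_WAL_DIR}" != "${DATA_DIR}/member/wal" ]; then
  echo "wal-dir does not match the restored layout"
fi
```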


OPSTime commented Jan 9, 2018

'etcdctl snapshot restore' needs a --wal-dir option, or perhaps a --config-file option.
The --initial-cluster-state option should be abandoned.


gyuho commented Jan 9, 2018

@lyddragon Can you open a new issue explaining your use case? Thanks!


OPSTime commented Jan 9, 2018

To make it easier to maintain


gyuho commented Jan 9, 2018

@lyddragon I submitted a PR to add snapshot restore --wal-dir flag. Will backport to v3.3.


OPSTime commented Jan 10, 2018

Adding --config-file may be more convenient.
