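Preface
If the cluster's power supply is unstable, or nodes crash every now and then, make sure you keep backups: etcd snapshot files are easily damaged under those conditions. I only understood how much backups matter after resetting the cluster many times.
This post covers: basic etcd operations knowledge; a backup and disaster recovery demo for an etcd cluster running as static Pods; writing a scheduled backup job; and a backup and disaster recovery demo for a binary-deployed etcd cluster.
Where my understanding falls short, corrections are welcome.
"I wanted only to try to live in accord with the promptings which came from my true self. Why was that so very difficult?" ------ Hermann Hesse, Demian
etcd overview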
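etcd is an open-source project started by the CoreOS team in June 2013. Its goal is to build a highly available distributed key-value database.
etcd uses the Raft protocol internally as its consensus algorithm and is implemented in Go.
Fully replicated: every node in the cluster has access to the complete data store.
Highly available: etcd can be used to avoid single points of failure caused by hardware or network problems.
Consistent: every read returns the latest write across multiple hosts.
Simple: it includes a well-defined, user-facing API (gRPC).
Secure: it implements automatic TLS with optional client certificate authentication.
Fast: benchmarked at 10,000 writes per second.
Reliable: it uses the Raft algorithm to provide a strongly consistent, highly available storage service.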
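Basics of operating an etcd cluster
Clients read and write on port 2379; members replicate data to each other on port 2380.
An etcd cluster is a distributed system that uses the Raft protocol to keep the state of all nodes consistent.
A member is in one of three states: Leader, Follower, or Candidate.
When the cluster is initialized every node starts as a Follower and keeps in sync with the other nodes through heartbeats.
Data is read through Followers and written through the Leader.
When a Follower receives no heartbeat from the Leader within a certain time, it changes its role to Candidate and starts a leader election.
Configure an etcd cluster with an odd number of nodes rather than an even number; 3, 5, or 7 nodes per cluster are the recommended sizes.
Use etcd's built-in backup/restore tooling to back up data from the source deployment and restore it into a new deployment. The data directory must be cleaned before restoring.
snap under the data directory stores snapshot data: snapshots of etcd's state, which etcd takes to keep the number of WAL files from growing too large.
wal under the data directory stores the write-ahead log, whose main purpose is to record the complete history of data changes. In etcd, every modification must be written to the WAL before it is committed.
An etcd cluster should probably not exceed seven nodes, because write performance suffers; running five nodes is recommended. A 5-member etcd cluster can tolerate two member failures, and a 3-member cluster can tolerate one.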
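The failure-tolerance numbers above follow from Raft's majority rule. Here is a minimal sketch of that arithmetic; the function names quorum and fault_tolerance are my own, not part of any etcd tool.

```shell
#!/usr/bin/env bash
# Majority arithmetic for a cluster of N voting members.
#   quorum          = floor(N / 2) + 1   members that must agree
#   fault_tolerance = N - quorum         members that may fail

quorum() {
    local members=$1
    echo $(( members / 2 + 1 ))
}

fault_tolerance() {
    local members=$1
    echo $(( members - (members / 2 + 1) ))
}

# Print the values for the cluster sizes discussed in the text.
for n in 3 4 5 6 7; do
    printf '%d members: quorum=%d, tolerates %d failure(s)\n' \
        "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Running it shows why odd sizes are preferred: a 4-member cluster tolerates one failure, the same as a 3-member cluster, while needing a larger quorum.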
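Common configuration parameters
ETCD_NAME: the node name; defaults to default.
ETCD_DATA_DIR: the path where the service stores its runtime data.
ETCD_LISTEN_PEER_URLS: the addresses to listen on for peer traffic, for example http://ip:2380; separate multiple values with commas. Every node must be able to reach them, so do not use localhost.
ETCD_LISTEN_CLIENT_URLS: the addresses to listen on for client traffic.
ETCD_ADVERTISE_CLIENT_URLS: the client addresses this node advertises; this value is announced to the other nodes in the cluster.
ETCD_INITIAL_ADVERTISE_PEER_URLS: the peer addresses this node advertises; this value is announced to the other nodes in the cluster.
ETCD_INITIAL_CLUSTER: all nodes in the cluster, as comma-separated name=peer-URL pairs.
ETCD_INITIAL_CLUSTER_STATE: new when creating a cluster; existing when joining a cluster that already exists.
ETCD_INITIAL_CLUSTER_TOKEN: the cluster ID; when there are several clusters, each cluster's ID must be unique.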
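The ETCD_INITIAL_CLUSTER value is the one that is easiest to get wrong by hand. As a rough sketch, it can be assembled from name and IP pairs like this; build_initial_cluster is a hypothetical helper, and the peer port 2380 is the default mentioned earlier.

```shell
#!/usr/bin/env bash
# Build an ETCD_INITIAL_CLUSTER value from "name ip" pairs.
# Usage: build_initial_cluster <scheme> <name1> <ip1> [<name2> <ip2> ...]
build_initial_cluster() {
    local scheme=$1
    shift
    local result="" name ip
    while [ "$#" -ge 2 ]; do
        name=$1
        ip=$2
        shift 2
        # Add a comma only when something is already in the result.
        result="${result}${result:+,}${name}=${scheme}://${ip}:2380"
    done
    echo "$result"
}

# Example with the binary cluster used later in this post.
build_initial_cluster http \
    etcd-100 192.168.26.100 \
    etcd-101 192.168.26.101 \
    etcd-102 192.168.26.102
```

The same string is what --initial-cluster expects on the command line, so the helper can feed both a config file and a restore command.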
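Backup and restore for a static Pod cluster
Single-node etcd backup and restore
If etcd is deployed as a single node, you can take a physical backup: simply back up the data directory. To restore, copy the backed-up etcd data directory into the directory etcd is configured to use. Once the restore is done, put the etcd.yaml file in /etc/kubernetes/manifests back in its original state.
You can also back up from a snapshot.
Backup command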
┌──[root@vms81.liruilongs.github.io]-[/backup_20230127]
└─$ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
snapshot save snap-$(date +%Y%m%d%H%M).db
Snapshot saved at snap-202301272133.db
Restore command
┌──[root@vms81.liruilongs.github.io]-[/backup_20230127]
└─$ETCDCTL_API=3 etcdctl snapshot restore ./snap-202301272133.db \
--name vms81.liruilongs.github.io \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://192.168.26.81:2380 \
--initial-cluster=vms81.liruilongs.github.io=https://192.168.26.81:2380 \
--data-dir=/var/lib/etcd
2023-01-27 21:40:01.193420 I | mvcc: restore compact to 484325
2023-01-27 21:40:01.199682 I | etcdserver/membership: added member cbf506fa2d16c7 [https://192.168.26.81:2380] to cluster 46c9df5da345274b
The parameter values to use can be read from the etcd static Pod's YAML definition:
┌──[root@vms81.liruilongs.github.io]-[/var/lib/etcd/member]
└─$kubectl describe pods etcd-vms81.liruilongs.github.io | grep -e "--"
--advertise-client-urls=https://192.168.26.81:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://192.168.26.81:2380
--initial-cluster=vms81.liruilongs.github.io=https://192.168.26.81:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.81:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://192.168.26.81:2380
--name=vms81.liruilongs.github.io
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
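When the API server is down, kubectl cannot answer, but the same flags are still in the manifest file on disk. The sketch below reads one flag from that file. It assumes the kubeadm default path /etc/kubernetes/manifests/etcd.yaml and that each flag is written as "- --flag=value" on its own line; etcd_flag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of a single etcd flag from a static Pod manifest.
# Usage: etcd_flag <flag-name-without-dashes> [manifest-path]
etcd_flag() {
    local flag=$1
    # Assumed default location of the kubeadm-generated manifest.
    local manifest=${2:-/etc/kubernetes/manifests/etcd.yaml}
    # Match lines such as "    - --data-dir=/var/lib/etcd" and print the value.
    sed -n "s/^[[:space:]]*-[[:space:]]*--${flag}=\(.*\)\$/\1/p" "$manifest"
}

# Example usage on a control-plane node (not executed here):
#   etcd_flag data-dir
#   etcd_flag initial-advertise-peer-urls
```

Because the pattern requires "=" right after the flag name, asking for initial-cluster does not accidentally match initial-cluster-state.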
Cluster etcd backup and restore
Cluster node status
┌──[root@vms100.liruilongs.github.io]-[~/ansible/helm]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
|        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
|  ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
| 11486647d7f3a17b | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
| e00e3877df8f76f4 | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
Version and leader information:
┌──[root@vms100.liruilongs.github.io]-[~/ansible/kubescape]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --cluster -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |     false |       100 |    3152364 |
| https://192.168.26.102:2379 | 11486647d7f3a17b |   3.5.4 |   36 MB |     false |       100 |    3152364 |
| https://192.168.26.101:2379 | e00e3877df8f76f4 |   3.5.4 |   36 MB |      true |       100 |    3152364 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
In a cluster, the backup can be taken from a single node: as mentioned earlier, an etcd cluster is fully replicated, so one node's snapshot is enough.
┌──[root@vms100.liruilongs.github.io]-[~]
└─$yum -y install etcd
If the etcdctl tool is missing, install etcd or copy the binary over from somewhere else. Here we install it and then copy etcdctl to the other cluster nodes.
Backup
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ENDPOINT=https://127.0.0.1:2379
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save snapshot.db
Snapshot saved at snapshot.db
Check the snapshot hash
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 46aa26ed |   217504 |       2711 |      27 MB |
+----------+----------+------------+------------+
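The hash above is computed by etcdctl itself. Since the snapshot later gets copied to other nodes, it can also help to keep an ordinary file checksum next to it so each copy can be checked after transfer. This is my own addition, not something etcd requires: a minimal sketch using sha256sum from GNU coreutils, with hypothetical function names.

```shell
#!/usr/bin/env bash
# Record a SHA-256 checksum next to a snapshot file, and verify it later.
# This is a plain file checksum, unrelated to the HASH column etcdctl prints.

record_checksum() {
    local snapshot=$1
    # Refuse to record a checksum for a missing or empty file.
    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi
    sha256sum "$snapshot" > "$snapshot.sha256"
}

verify_checksum() {
    local snapshot=$1
    # The .sha256 file stores the path as it was given to record_checksum,
    # so verify with the snapshot at that same path.
    sha256sum --check --quiet "$snapshot.sha256"
}

# Example usage (not executed here):
#   record_checksum /root/snapshot.db
#   verify_checksum /root/snapshot.db
```

Copying both the snapshot and its .sha256 file to the same absolute path on each node lets verify_checksum run there unchanged.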
Restore
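The etcd cluster here uses the stacked topology: etcd runs as a static Pod on every control-plane node.
Always take a backup first. Before restoring, back up the existing data files and then clear them, and make sure etcd and kube-apiserver have been stopped. Start by collecting the required parameters: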
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl describe pod etcd-vms100.liruilongs.github.io -n kube-system | grep -e "--"
--advertise-client-urls=https://192.168.26.100:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--experimental-initial-corrupt-check=true
--experimental-watch-progress-notify-interval=5s
--initial-advertise-peer-urls=https://192.168.26.100:2380
--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://192.168.26.100:2380
--name=vms100.liruilongs.github.io
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
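For the restore, stop the kube-apiserver and etcd static Pods on all master nodes. The kubelet scans the /etc/kubernetes/manifests directory every 20s to detect changes to static Pods, so moving the YAML files out of it is enough to stop them.
Here Ansible runs the command on all master nodes.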
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /etc/kubernetes/manifests/etcd.yaml /tmp/" -i host.yaml
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/" -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
Confirm that the static Pod YAML files have been moved
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "ls /etc/kubernetes/manifests/" -i host.yaml
192.168.26.102 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.100 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.101 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
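Moving manifests out of the directory and back again is the whole stop/start mechanism for static Pods, so it can be wrapped in two small functions to run on each control-plane node. This is only a sketch: park_manifests and unpark_manifests are hypothetical names, and MANIFEST_DIR and PARK_DIR default to the paths used in this walkthrough.

```shell
#!/usr/bin/env bash
# Park static Pod manifests outside the kubelet's watch directory (stops the
# Pods) and move them back later (starts the Pods again).
MANIFEST_DIR=${MANIFEST_DIR:-/etc/kubernetes/manifests}
PARK_DIR=${PARK_DIR:-/tmp}

park_manifests() {
    local name
    for name in "$@"; do
        # Skip files that are not there, so the function can be re-run safely.
        if [ -f "$MANIFEST_DIR/$name" ]; then
            mv "$MANIFEST_DIR/$name" "$PARK_DIR/$name"
        fi
    done
}

unpark_manifests() {
    local name
    for name in "$@"; do
        if [ -f "$PARK_DIR/$name" ]; then
            mv "$PARK_DIR/$name" "$MANIFEST_DIR/$name"
        fi
    done
}

# Example usage on a control-plane node (not executed here):
#   park_manifests etcd.yaml kube-apiserver.yaml
#   ... clear the data directory and restore the snapshot ...
#   unpark_manifests etcd.yaml kube-apiserver.yaml
```

Skipping missing files makes both functions safe to repeat, which matters if the restore has to be attempted more than once.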
Clear the etcd data directory on all cluster nodes
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "rm -rf /var/lib/etcd/" -i host.yaml
[WARNING]: Consider using the file module with state=absent rather than running 'rm'. If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
Copy the snapshot backup file to all cluster nodes
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m copy -a "src=snap-202302070000.db dest=/root/" -i host.yaml
Restore on vms100.liruilongs.github.io
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db \
--name vms100.liruilongs.github.io \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--endpoints=https://127.0.0.1:2379 \
--initial-advertise-peer-urls=https://192.168.26.100:2380 \
--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380 \
--data-dir=/var/lib/etcd
2023-02-08 12:50:27.598250 I | mvcc: restore compact to 2837993
2023-02-08 12:50:27.609440 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:50:27.609480 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:50:27.609487 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7
Restore on vms101.liruilongs.github.io
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ssh 192.168.26.101
Last login: Wed Feb 8 12:48:31 2023 from 192.168.26.100
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db --name vms101.liruilongs.github.io --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt --endpoints=https://127.0.0.1:2379 --initial-advertise-peer-urls=https://192.168.26.101:2380 --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380 --data-dir=/var/lib/etcd
2023-02-08 12:52:21.976748 I | mvcc: restore compact to 2837993
2023-02-08 12:52:21.991588 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:52:21.991622 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:52:21.991629 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7
Restore on vms102.liruilongs.github.io
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ssh 192.168.26.102
Last login: Wed Feb 8 12:48:31 2023 from 192.168.26.100
┌──[root@vms102.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db --name vms102.liruilongs.github.io --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt --endpoints=https://127.0.0.1:2379 --initial-advertise-peer-urls=https://192.168.26.102:2380 --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380 --data-dir=/var/lib/etcd
2023-02-08 12:53:32.338663 I | mvcc: restore compact to 2837993
2023-02-08 12:53:32.354619 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:53:32.354782 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:53:32.354790 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7
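The three restore commands differ only in --name and --initial-advertise-peer-urls, while --initial-cluster must be identical on every node. A small dry-run sketch makes that pattern explicit by printing the command for each member instead of typing it three times. restore_command is a hypothetical helper, and as far as I know the restore only reads the local snapshot file, so the TLS and endpoint flags are left out here.

```shell
#!/usr/bin/env bash
# Print (not run) the snapshot restore command for each member.
# Member names and IPs are the ones from this walkthrough.
INITIAL_CLUSTER="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380"

restore_command() {
    local snapshot=$1 name=$2 ip=$3
    printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name %s --initial-advertise-peer-urls=https://%s:2380 --initial-cluster=%s --data-dir=/var/lib/etcd\n' \
        "$snapshot" "$name" "$ip" "$INITIAL_CLUSTER"
}

for ip in 192.168.26.100 192.168.26.101 192.168.26.102; do
    # Node names follow the vmsNNN pattern, where NNN is the last IP octet.
    restore_command snap-202302070000.db "vms${ip##*.}.liruilongs.github.io" "$ip"
done
```

Printing first and running second is deliberate: a wrong --name or peer URL produces a member that cannot rejoin, and that is much easier to spot in plain text than after the fact.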
After the restore completes, move the etcd and kube-apiserver static Pod manifests back
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/" -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml" -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
Confirm that the move succeeded.
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "ls /etc/kubernetes/manifests/" -i host.yaml
192.168.26.100 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.101 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.102 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
Check the etcd cluster information from any node. The restore succeeded. (kubectl may still be refused at first, presumably because kube-apiserver has not finished starting.)
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$kubectl get pods
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --cluster -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |     false |         2 |        146 |
| https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   37 MB |      true |         2 |        146 |
| https://192.168.26.102:2379 | b8cb9f66c2e63b91 |   3.5.4 |   37 MB |     false |         2 |        146 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
Problems encountered
If a node reports the error below, or not all members join the cluster (for example only two of them are added), repeat the steps above.
panic: tocommit(258) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
Handling the problem
┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --cluster -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |      true |         2 |      85951 |
| https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   37 MB |     false |         2 |      85951 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
Writing the scheduled backup job
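The scheduled backup is implemented with a systemd.service unit and a systemd.timer unit, which run the etcd_back.sh backup script on a schedule and are enabled at boot.
It is simple enough that there is not much to explain.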
┌──[root@vms81.liruilongs.github.io]-[~/back]
└─$systemctl cat etcd-backup
# /usr/lib/systemd/system/etcd-backup.service
[Unit]
Description=ETCD backup
After=network-online.target

[Service]
Type=oneshot
Environment=ETCDCTL_API=3
ExecStart=/usr/bin/bash /usr/lib/systemd/system/etcd_back.sh

[Install]
WantedBy=multi-user.target
Run once a day at midnight
┌──[root@vms81.liruilongs.github.io]-[~/back]
└─$systemctl cat etcd-backup.timer
# /usr/lib/systemd/system/etcd-backup.timer
[Unit]
Description=Back up ETCD once a day

[Timer]
OnBootSec=3s
OnCalendar=*-*-* 00:00:00
Unit=etcd-backup.service

[Install]
WantedBy=multi-user.target
Backup script
┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$cat etcd_back.sh
#!/bin/bash

#File : etcd_back.sh
#Time : 2023/01/27 23:00:27
#Author : Li Ruilong
#Version : 1.0
#Desc : ETCD backup
#Contact : 1224965096@qq.com

# Create the backup directory if it does not exist yet
if [ ! -d /root/back/ ];then
    mkdir -p /root/back/
fi
# Timestamp used in the snapshot file name, e.g. 202301272300
STR_DATE=$(date +%Y%m%d%H%M)

# Save a snapshot from the local etcd member
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
snapshot save /root/back/snap-${STR_DATE}.db

# Print hash, revision, key count and size so they end up in the journal
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /root/back/snap-${STR_DATE}.db

# Make the snapshot read-only
sudo chmod o-w,u-w,g-w /root/back/snap-${STR_DATE}.db
Deploying the backup service and timer
┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$cat deply.sh
#!/bin/bash

#File : deply.sh
#Time : 2023/01/27 23:00:27
#Author : Li Ruilong
#Version : 1.0
#Desc : ETCD backup deployment
#Contact : 1224965096@qq.com

# Install the unit files and the backup script, then enable and start them
cp ./* /usr/lib/systemd/system/
systemctl enable etcd-backup.timer --now
systemctl enable etcd-backup.service --now
ls /root/back/
Viewing the logs
┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$journalctl -u etcd-backup.service -o cat
...................
Starting ETCD backup...
Snapshot saved at /root/back/snap-202301290120.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 74323316 |   640319 |       2250 |      27 MB |
+----------+----------+------------+------------+
Started ETCD backup.
Starting ETCD backup...
Snapshot saved at /root/back/snap-202301290120.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| e75a16bf |   640325 |       2255 |      27 MB |
+----------+----------+------------+------------+
Started ETCD backup.
Starting ETCD backup...
Snapshot saved at /root/back/snap-202301290121.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| eb5e9e86 |   640388 |       2318 |      27 MB |
+----------+----------+------------+------------+
Started ETCD backup.
Starting ETCD backup...
Snapshot saved at /root/back/snap-202301290121.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 30a91bb6 |   640402 |       2333 |      27 MB |
+----------+----------+------------+------------+
Started ETCD backup.
Backup and restore for a binary-deployed cluster
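Backup and restore for a binary-deployed cluster is basically the same as for the static Pod deployment.
The difference is that the restore below first restores two nodes to form a cluster, and then the third node joins that cluster.
Current cluster information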
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "etcdctl member list"
192.168.26.101 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true
192.168.26.102 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true
192.168.26.100 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true
Prepare test data
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.100 -a "etcdctl put name liruilong"
192.168.26.100 | CHANGED | rc=0 >>
OK
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl get name"
192.168.26.102 | CHANGED | rc=0 >>
name
liruilong
192.168.26.100 | CHANGED | rc=0 >>
name
liruilong
192.168.26.101 | CHANGED | rc=0 >>
name
liruilong
Take an etcd snapshot on any one host
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "etcdctl snapshot save snap20211010.db"
192.168.26.101 | CHANGED | rc=0 >>
Snapshot saved at snap20211010.db
This snapshot contains the data just written, name=liruilong. Next, copy the snapshot file to all nodes.
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "scp /root/snap20211010.db root@192.168.26.100:/root/"
192.168.26.101 | CHANGED | rc=0 >>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "scp /root/snap20211010.db root@192.168.26.102:/root/"
192.168.26.101 | CHANGED | rc=0 >>
Delete the test data on all nodes
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl del name"
192.168.26.101 | CHANGED | rc=0 >>
1
192.168.26.102 | CHANGED | rc=0 >>
0
192.168.26.100 | CHANGED | rc=0 >>
0
Stop etcd on all nodes and delete everything under /var/lib/etcd/
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "systemctl stop etcd"
192.168.26.100 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.101 | CHANGED | rc=0 >>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "rm -rf /var/lib/etcd/*"
[WARNING]: Consider using the file module with state=absent rather than running 'rm'. If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
192.168.26.101 | CHANGED | rc=0 >>
On all nodes, set the owner and group of the snapshot file to etcd
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "chown etcd.etcd /root/snap20211010.db"
[WARNING]: Consider using the file module with owner rather than running 'chown'. If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.100 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
192.168.26.101 | CHANGED | rc=0 >>
Start restoring the data on nodes 100 and 101
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.100 -m script -a "./snapshot_restore.sh"
192.168.26.100 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.26.100 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.26.100 closed."
    ],
    "stdout": "2021-10-10 12:14:30.726021 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792\r\n2021-10-10 12:14:30.726234 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792\r\n",
    "stdout_lines": [
        "2021-10-10 12:14:30.726021 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792",
        "2021-10-10 12:14:30.726234 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792"
    ]
}
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat -n ./snapshot_restore.sh
     1  #!/bin/bash
     2
     3  # Restore the snapshot on each node
     4
     5  etcdctl snapshot restore /root/snap20211010.db \
     6  --name etcd-100 \
     7  --initial-advertise-peer-urls=http://192.168.26.100:2380 \
     8  --initial-cluster=etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380 \
     9  --data-dir=/var/lib/etcd/cluster.etcd
    10
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$sed '6,7s/100/101/g' ./snapshot_restore.sh
#!/bin/bash

# Restore the snapshot on each node

etcdctl snapshot restore /root/snap20211010.db \
--name etcd-101 \
--initial-advertise-peer-urls=http://192.168.26.101:2380 \
--initial-cluster=etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380 \
--data-dir=/var/lib/etcd/cluster.etcd

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$sed -i '6,7s/100/101/g' ./snapshot_restore.sh
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat ./snapshot_restore.sh
#!/bin/bash

# Restore the snapshot on each node

etcdctl snapshot restore /root/snap20211010.db \
--name etcd-101 \
--initial-advertise-peer-urls=http://192.168.26.101:2380 \
--initial-cluster=etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380 \
--data-dir=/var/lib/etcd/cluster.etcd

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -m script -a "./snapshot_restore.sh"
192.168.26.101 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.26.101 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.26.101 closed."
    ],
    "stdout": "2021-10-10 12:20:26.032754 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792\r\n2021-10-10 12:20:26.032930 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792\r\n",
    "stdout_lines": [
        "2021-10-10 12:20:26.032754 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792",
        "2021-10-10 12:20:26.032930 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792"
    ]
}
On all nodes, change the owner and group of /var/lib/etcd and everything in it to etcd:etcd, then start etcd on each node
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "chown -R etcd.etcd /var/lib/etcd/"
[WARNING]: Consider using the file module with owner rather than running 'chown'. If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.100 | CHANGED | rc=0 >>
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.102 | CHANGED | rc=0 >>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "systemctl start etcd"
192.168.26.102 | FAILED | rc=1 >>
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.non-zero return code
192.168.26.101 | CHANGED | rc=0 >>
192.168.26.100 | CHANGED | rc=0 >>
Node 102 fails to start at this point because it was not restored and is not yet a member of the new cluster.
Add the remaining node, 102, to the cluster
# etcdctl member add etcd_name --peer-urls="https://peerURLs"
[root@vms100 cluster.etcd]# etcdctl member add etcd-102 --peer-urls="http://192.168.26.102:2380"
Member fbd8a96cbf1c004d added to cluster af623437f584d792

ETCD_NAME="etcd-102"
ETCD_INITIAL_CLUSTER="etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380,etcd-102=http://192.168.26.102:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.26.102:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
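The four values printed by member add have to end up in the joining node's configuration file. This post copies over a prepared etcd.conf; as an alternative, here is a rough sketch that edits the file in place. set_etcd_conf is a hypothetical helper, it relies on GNU sed -i, and it assumes values without "|" or "&" characters.

```shell
#!/usr/bin/env bash
# Set KEY="value" in an etcd environment file: replace an existing line
# (commented out or not) or append a new one.
# Usage: set_etcd_conf <file> <KEY> <value>
set_etcd_conf() {
    local file=$1 key=$2 value=$3
    if grep -q "^#\{0,1\}${key}=" "$file"; then
        # "|" is the sed delimiter because the values contain "/" (URLs).
        sed -i "s|^#\{0,1\}${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example usage on node 102 (not executed here; the path is the one the
# copy step in this post writes to):
#   conf=/etc/etcd/etcd.conf
#   set_etcd_conf "$conf" ETCD_NAME etcd-102
#   set_etcd_conf "$conf" ETCD_INITIAL_ADVERTISE_PEER_URLS http://192.168.26.102:2380
#   set_etcd_conf "$conf" ETCD_INITIAL_CLUSTER_STATE existing
```

The value that matters most is ETCD_INITIAL_CLUSTER_STATE: with new instead of existing, the node would try to bootstrap its own cluster rather than join the restored one.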
Verify the restore result
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.102 -m copy -a "src=./etcd.conf dest=/etc/etcd/etcd.conf force=yes"
192.168.26.102 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "checksum": "2d8fa163150e32da563f5e591134b38cc356d237",
    "dest": "/etc/etcd/etcd.conf",
    "gid": 0,
    "group": "root",
    "mode": "0644",
    "owner": "root",
    "path": "/etc/etcd/etcd.conf",
    "size": 574,
    "state": "file",
    "uid": 0
}
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.102 -m shell -a "systemctl enable etcd --now"
192.168.26.102 | CHANGED | rc=0 >>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "etcdctl member list"
192.168.26.101 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
192.168.26.100 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
192.168.26.102 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl get name"
192.168.26.102 | CHANGED | rc=0 >>
name
liruilong
192.168.26.101 | CHANGED | rc=0 >>
name
liruilong
192.168.26.100 | CHANGED | rc=0 >>
name
liruilong
References
Copyright of the content behind the reference links in this post belongs to the original authors; please let me know of any infringement.
https://etcd.io/docs/v3.5/faq/
https://etcd.io/docs/v3.6/op-guide/recovery/#restoring-a-cluster
https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/configure-upgrade-etcd/
https://docs.vmware.com/en/VMware-Application-Catalog/services/tutorials/GUID-backup-restore-data-etcd-kubernetes-index.html
https://github.com/etcd-io/etcd/issues/13509
© 2018-2023 liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)