Preface
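kubelet is an important component of a Kubernetes cluster. It runs on every node and manages the containers and Pods on that node. It communicates with the control plane through the API Server to keep the containers on the node consistent with the desired state.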
The main functions of kubelet and how it works:
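1. Pod management: kubelet manages the Pods on its node. Based on the Pod definitions (PodSpec) it obtains from the API Server, it creates, starts, stops, and deletes containers.
2. Container lifecycle management: kubelet monitors the running state of containers and restarts failed containers when required. It also handles volume mounts, network setup, and similar operations for containers.
3. Resource management: kubelet monitors resource usage on the node so that the node's capacity is not exceeded. It works with the QoS class derived from each Pod's resource requests and limits and adjusts Pod resource allocation accordingly.
4. Health checks and probes: kubelet runs periodic health checks and probes to monitor container health. If a container is unhealthy, kubelet acts on it, for example by restarting it when a liveness probe fails, or by marking the Pod not ready when a readiness probe fails so that it is removed from Service endpoints.
5. Event and status reporting: kubelet reports events and status information from the node to the cluster's API Server so that the control plane can track the state of every node and Pod.
6. Communication with the control plane: kubelet talks to the API Server to fetch Pod definitions and report status. Instructions to start or stop containers reach it as changes to Pod objects in the API Server, not as direct calls from kube-controller-manager.
7. Container lifecycle hooks: kubelet runs user-defined hooks at points in a container's lifecycle, namely PostStart (right after the container is created) and PreStop (before the container is terminated).
8. Resource requests and limits: kubelet monitors and limits container resource usage according to the Pod's resource requests and limits.
9. Image management: kubelet (through the container runtime) pulls images from the specified image registry so that containers can be started.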
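As the list shows, kubelet is a critical service. When a Kubernetes cluster comes under heavy load, for example during a cascading ("avalanche") failure, kubelet is often one of the first services to crash, and once it has crashed someone usually has to restart it by hand. That is hardly automated, and with many clusters to manage it quickly becomes a disaster.
So we can put an important service like this under a process-supervision daemon: when a catastrophic event occurs, supervisor forcibly brings kubelet back up, sparing us the trouble of restarting critical services manually.
The rest of this post briefly describes how to put the critical kubelet service under the supervisor daemon.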
1.
Installing the supervisor daemon
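Offline installation package for supervisor:
Link: https://pan.baidu.com/s/1PWispap5zo0asvGS6qIY0w?pwd=kkey  Extraction code: kkey
###Note: set this package up as a local repository and it can be installed with yum: yum install supervisor -y
If you would rather not use the offline package, see my other blog post for online installation: 【精选】Linux之奇怪的知识---supervisor超级守护进程的意义和使用方法_systemctl restart supervisord-CSDN博客 (roughly: "Linux oddities: what the supervisor daemon is for and how to use it")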
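The offline route in the note above depends on yum seeing the unpacked package as a local repository. A minimal sketch of that step, assuming the archive has been unpacked to an absolute path that already contains yum metadata (a `repodata/` folder; if it does not, `createrepo` would have to be run on it first). The function name `write_local_repo` and the repo id `supervisor-local` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: point yum at the unpacked offline package as a local repo.
# Assumes repo_dir is an absolute path that already contains repodata/
# (created with createrepo); the repo id "supervisor-local" is made up.
write_local_repo() {
  local repo_dir=$1
  local out_dir=${2:-/etc/yum.repos.d}   # yum reads *.repo files from here
  local repo_id=supervisor-local

  if [ ! -d "$repo_dir/repodata" ]; then
    echo "no repodata/ in $repo_dir; run createrepo on it first" >&2
    return 1
  fi

  # baseurl=file:// + absolute path gives file:///..., a local-directory repo
  cat > "$out_dir/$repo_id.repo" <<EOF
[$repo_id]
name=Local supervisor packages
baseurl=file://$repo_dir
enabled=1
gpgcheck=0
EOF
}
```

For example, `write_local_repo /opt/supervisor-offline` (a hypothetical path) followed by `yum install supervisor -y`. `gpgcheck=0` is chosen only because a locally built repo usually has no signing key configured.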
Once supervisor is installed, enable and start the service; it will be configured later:
[root@node4 ~]# systemctl enable supervisord
Created symlink from /etc/systemd/system/multi-user.target.wants/supervisord.service to /usr/lib/systemd/system/supervisord.service.
[root@node4 ~]# systemctl start supervisord
[root@node4 ~]# systemctl status supervisord
● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2023-11-05 18:59:44 CST; 7min ago
 Main PID: 62698 (supervisord)
   CGroup: /system.slice/supervisord.service
           └─62698 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf

Nov 05 18:59:44 node4 systemd[1]: Starting Process Monitoring and Control Daemon...
Nov 05 18:59:44 node4 systemd[1]: Started Process Monitoring and Control Daemon.
2.
The kubelet service
Taking the kubelet service on a worker node as an example, first look at the command that starts it:
[root@node4 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2023-11-05 19:12:49 CST; 53s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 9815 (kubelet)
    Tasks: 15
   Memory: 41.6M
   CGroup: /system.slice/kubelet.service
           └─9815 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4

Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787510 9815 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-vqrrg\" (UniqueName: \"kubernetes.io/projected/9dc7319e-2d19-482d-935a-f069ae991c64-kube-api-access-vqrrg\") pod \"kube-proxy-649mn\" (UID: \"9dc7319e-2d19-482d-935a-f069ae991c64\") " pod="kube-system/kube-proxy-649mn"
Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787524 9815 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"var-run-calico\" (UniqueName: \"kubernetes.io/host-path/34b2d437-1345-4f5e-a931-7185f56fdda7-var-run-calico\") pod \"calico-node-5ztjk\" (UID: \"34b2d437-1345-4f5e-a931-7185f56fdda7\") " pod="kube-system/calico-node-5ztjk"
Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787537 9815 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"var-lib-calico\" (UniqueName: \"kubernetes.io/host-path/34b2d437-1345-4f5e-a931-7185f56fdda7-var-lib-calico\") pod \"calico-node-5ztjk\" (UID: \"34b2d437-1345-4f5e-a931-7185f56fdda7\") " pod="kube-system/calico-node-5ztjk"
Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787551 9815 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-local-net-dir\" (UniqueName: \"kubernetes.io/host-path/34b2d437-1345-4f5e-a931-7185f56fdda7-host-local-net-dir\") pod \"calico-node-5ztjk\" (UID: \"34b2d437-1345-4f5e-a931-7185f56fdda7\") " pod="kube-system/calico-node-5ztjk"
Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787565 9815 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/a199e406-8b57-4d77-890d-4b1f0c0a1868-xtables-lock\") pod \"nodelocaldns-ndlbw\" (UID: \"a199e406-8b57-4d77-890d-4b1f0c0a1868\") " pod="kube-system/nodelocaldns-ndlbw"
Nov 05 19:12:50 node4 kubelet[9815]: I1105 19:12:50.787574 9815 reconciler.go:167] "Reconciler: start to sync state"
Nov 05 19:12:51 node4 kubelet[9815]: E1105 19:12:51.573907 9815 kubelet.go:1745] "Failed creating a mirror pod for" err="pods \"haproxy-node4\" already exists" pod="kube-system/haproxy-node4"
Nov 05 19:12:51 node4 kubelet[9815]: I1105 19:12:51.847708 9815 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
Nov 05 19:12:51 node4 kubelet[9815]: I1105 19:12:51.967443 9815 request.go:685] Waited for 1.078626222s due to client-side throttling, not priority and fairness, request: POST:https://127.0.0.1:6443/api/v1/namespaces/kube-system/serviceaccounts/kube-proxy/token
Nov 05 19:12:56 node4 kubelet[9815]: I1105 19:12:56.256375 9815 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
From this output we can see the command that starts the service.
This command will be used in the supervisor configuration below:
/usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4
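Instead of copying the command out of `systemctl status` (where long lines are cut off unless `-l` is passed), it can also be read from `/proc/<pid>/cmdline`, which stores the process arguments separated by NUL bytes. A minimal sketch; the function name is hypothetical, and arguments that themselves contain spaces would need extra quoting that this simple version does not add.

```shell
#!/bin/bash
# Hypothetical helper: turn a NUL-separated /proc/<pid>/cmdline file into one line.
# Limitation: arguments containing spaces are not re-quoted.
cmdline_to_string() {
  local file=$1
  # NUL bytes separate arguments: replace them with spaces, then drop the trailing one.
  tr '\0' ' ' < "$file" | sed 's/ *$//'
}
```

On the node this would be called as `cmdline_to_string "/proc/$(pidof -s kubelet)/cmdline"`, where `pidof -s` returns a single PID.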
3.
Writing the supervisor program configuration file
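The file written below only takes effect if the main configuration includes `/etc/supervisord.d`. The EPEL package's `/etc/supervisord.conf` normally ships an `[include]` section with `files = supervisord.d/*.ini`, which matches the "Included extra file" line in the supervisord log later on. A quick check, sketched with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the [include] section of a supervisord config
# has an uncommented "files" line that mentions the given directory.
conf_includes_dir() {
  local conf=$1 dir=$2
  awk -v dir="$dir" '
    /^[ \t]*\[/ { in_include = ($0 ~ /^[ \t]*\[include\]/) }   # track current section
    in_include && /^[ \t]*files[ \t]*=/ && index($0, dir) { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$conf"
}
```

Usage: `conf_includes_dir /etc/supervisord.conf supervisord.d || echo "add an [include] section first"`.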
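cat > /etc/supervisord.d/kubelet.ini << EOF
[program:kubelet]
command=/etc/kubernetes/kubelet.sh run ; command that starts the program
user=root
process_name=%(program_name)s ; process name
numprocs=1 ; number of process copies to start
directory=/etc/kubernetes ; working directory to switch to before starting
priority=1 ; relative start priority, lower values start first
autostart=true ; start automatically when supervisord starts (default true)
startsecs=3 ; the program counts as started once it has stayed up for 3 seconds
startretries=3 ; maximum number of consecutive failed start attempts (default 3)
autorestart=true ; restart the program whenever it exits
exitcodes=0 ; "expected" exit codes, only consulted when autorestart=unexpected
stopsignal=QUIT ; signal used to stop the program (default TERM)
stopwaitsecs=10 ; seconds to wait after stopsignal before sending SIGKILL
stopasgroup=true ; send the stop signal to the whole UNIX process group (default false)
killasgroup=true ; send SIGKILL to the whole UNIX process group (default false)
redirect_stderr=true ; redirect stderr into the stdout log
stdout_logfile=/var/log/kubelet/kubelet.log ; log file path
stdout_logfile_maxbytes=10MB ; maximum log file size before rotation
stdout_logfile_backups=20 ; number of rotated log files to keep
EOF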
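Every setting in the `[program:kubelet]` section has to be written as `key=value`; a line that lost its `=` (easy to happen when copying from a web page) will typically make supervisord reject the file. A rough sketch that lists such lines before supervisord is restarted; the function name is hypothetical, it checks only this one rule, and it does not understand indented continuation lines.

```shell
#!/bin/bash
# Hypothetical helper: print every non-empty, non-comment, non-section line that
# lacks "=" and return 1 if any were found. Not a full ini parser.
find_lines_without_equals() {
  local ini=$1
  awk '
    /^[ \t]*$/ || /^[ \t]*[;#]/ || /^[ \t]*\[/ { next }   # skip blanks, comments, [sections]
    index($0, "=") == 0 { print FILENAME ":" FNR ": " $0; bad = 1 }
    END { exit(bad ? 1 : 0) }
  ' "$ini"
}
```

For the file above, `find_lines_without_equals /etc/supervisord.d/kubelet.ini` should print nothing and succeed.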
Following the configuration above, create the kubelet run script and the log directory:
###Note: the script body is the command from section 2; just copy it.
cat > /etc/kubernetes/kubelet.sh << EOF
#!/bin/bash
/usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4
EOF
mkdir -p /var/log/kubelet/
chmod a+x /etc/kubernetes/kubelet.sh
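With this script, bash stays alive as the parent of kubelet, which is why `ps` later shows both `/bin/bash /etc/kubernetes/kubelet.sh run` and the kubelet process, and why the `stopasgroup`/`killasgroup` settings matter for getting stop signals to kubelet. An alternative, sketched here and not what the setup in this post uses, is to start the binary with `exec`, so that the process supervisord watches is kubelet itself. The generator function is hypothetical; it shell-quotes each argument with `printf %q`.

```shell
#!/bin/bash
# Hypothetical helper: write an executable wrapper that replaces the shell with
# the given command via exec, so the supervised PID is the command itself.
# Extra arguments given to the wrapper (such as "run") are ignored.
write_exec_wrapper() {
  local target=$1
  shift
  {
    printf '#!/bin/bash\n'
    printf 'exec'
    printf ' %q' "$@"   # quote every argument so it survives being re-read by bash
    printf '\n'
  } > "$target"
  chmod a+x "$target"
}
```

For example, `write_exec_wrapper /etc/kubernetes/kubelet.sh /usr/local/bin/kubelet --config=/var/lib/kubelet/config.yaml` followed by the remaining flags from section 2.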
4.
Stop the original kubelet service and let supervisor guard kubelet
systemctl disable kubelet
systemctl stop kubelet
systemctl restart supervisord
Check the supervisor daemon's log:
2023-11-05 19:38:26,043 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2023-11-05 19:38:26,043 INFO Included extra file "/etc/supervisord.d/kubelet.ini" during parsing
2023-11-05 19:38:26,055 INFO RPC interface 'supervisor' initialized
2023-11-05 19:38:26,055 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-11-05 19:38:26,056 INFO daemonizing the supervisord process
2023-11-05 19:38:26,056 INFO supervisord started with pid 22557
2023-11-05 19:38:27,058 INFO spawned: 'kubelet' with pid 22562
2023-11-05 19:38:30,443 INFO success: kubelet entered RUNNING state, process has stayed up for > than 3 seconds (startsecs)
The last line shows that kubelet is now successfully guarded by the supervisor daemon.
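Restarting supervisord works, but it also restarts every other program supervisord manages. Supervisor's own `supervisorctl reread` and `supervisorctl update` pick up new or changed program files without that. A sketch of the whole switch-over that combines this with the systemd step above and waits for the RUNNING state; the function name and the 30-second default timeout are assumptions. The systemd unit is stopped first because two kubelets cannot both listen on port 10250.

```shell
#!/bin/bash
# Hypothetical helper: hand kubelet over from systemd to supervisord and wait
# until supervisorctl reports it as RUNNING.
switch_kubelet_to_supervisor() {
  local timeout=${1:-30}   # seconds to wait for RUNNING (assumed default)
  local state i

  # 1. Stop and disable the systemd unit so it releases kubelet's ports.
  systemctl disable --now kubelet || return 1

  # 2. Load new/changed program files without restarting supervisord itself.
  supervisorctl reread || return 1
  supervisorctl update || return 1

  # 3. Poll "supervisorctl status kubelet"; the second column is the state name.
  for ((i = 0; i < timeout; i++)); do
    state=$(supervisorctl status kubelet | awk '{print $2}')
    [ "$state" = "RUNNING" ] && return 0
    sleep 1
  done
  echo "kubelet not RUNNING after ${timeout}s (last state: ${state:-unknown})" >&2
  return 1
}
```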
Check the kubelet log (/var/log/kubelet/kubelet.log, as set by stdout_logfile above):
Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --network-plugin has been deprecated, will be removed along with dockershim.
Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --network-plugin has been deprecated, will be removed along with dockershim.
W1105 19:38:27.107932 22563 feature_gate.go:237] Setting GA feature gate TTLAfterFinished=true. It will be removed in a future release.
W1105 19:38:27.108024 22563 feature_gate.go:237] Setting GA feature gate TTLAfterFinished=true. It will be removed in a future release.
I1105 19:38:27.117410 22563 server.go:446] "Kubelet version" kubeletVersion="v1.23.16"
W1105 19:38:27.117493 22563 feature_gate.go:237] Setting GA feature gate TTLAfterFinished=true. It will be removed in a future release.
W1105 19:38:27.117533 22563 feature_gate.go:237] Setting GA feature gate TTLAfterFinished=true. It will be removed in a future release.
I1105 19:38:27.117631 22563 server.go:874] "Client rotation is on, will bootstrap in background"
I1105 19:38:27.118630 22563 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
I1105 19:38:27.119146 22563 dynamic_cafile_content.go:156] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
I1105 19:38:27.171908 22563 server.go:693] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
I1105 19:38:27.172095 22563 container_manager_linux.go:281] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Next, use ps and netstat to check kubelet's status.
As the output below shows, kubelet is running perfectly, so no more worrying about kubelet crashes, folks!
[root@node4 ~]# netstat -antup |grep kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 22563/kubelet
tcp 0 0 127.0.0.1:42414 0.0.0.0:* LISTEN 22563/kubelet
tcp 0 0 127.0.0.1:58126 127.0.0.1:6443 ESTABLISHED 22563/kubelet
tcp6 0 0 :::10250 :::* LISTEN 22563/kubelet
[root@node4 ~]# ps aux |grep kubelet
root 22562 0.0 0.0 115308 1424 ? S 19:38 0:00 /bin/bash /etc/kubernetes/kubelet.sh run
root 22563 1.0 1.9 1554664 80732 ? Sl 19:38 0:14 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4
5.
Simulate a kubelet crash: force-kill kubelet and see whether it gets brought back up on its own
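The manual kill-and-check that follows can also be scripted, for example to repeat it on many nodes. A sketch, assuming `supervisorctl pid kubelet` prints the PID of the supervised process (and 0 when it is not running): the hypothetical function `wait_for_new_pid` polls once per second until a different, non-zero PID appears and prints it.

```shell
#!/bin/bash
# Hypothetical helper: wait until supervisord reports a new, non-zero PID for a
# program (i.e. it has been respawned) and print that PID.
wait_for_new_pid() {
  local name=$1 old_pid=$2 timeout=${3:-30}
  local pid i
  for ((i = 0; i < timeout; i++)); do
    pid=$(supervisorctl pid "$name")
    # Accept only a positive integer that differs from the old PID
    # (this also rejects "0" and any error text).
    if [[ $pid =~ ^[1-9][0-9]*$ ]] && [ "$pid" != "$old_pid" ]; then
      echo "$pid"
      return 0
    fi
    sleep 1
  done
  return 1
}
```

Usage, mirroring the manual steps: `old=$(supervisorctl pid kubelet)`, then `kill -9 "$(pidof -s kubelet)"`, then `wait_for_new_pid kubelet "$old" 30`. With the bash wrapper used in this post, `supervisorctl pid kubelet` reports the wrapper's PID (22562 in the `ps` output above, matching the "spawned" log line), not kubelet's, which is why kubelet itself is killed via `pidof`.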
[root@node4 ~]# kill -9 22563
[root@node4 ~]# ps aux |grep kubelet
root 46019 0.0 0.0 115308 1420 ? S 20:04 0:00 /bin/bash /etc/kubernetes/kubelet.sh run
root 46020 8.3 1.7 1349348 72012 ? Sl 20:04 0:00 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4
root 46266 0.0 0.0 112712 960 pts/1 S 20:04 0:00 grep --color=auto kubelet
[root@node4 ~]# netstat -antup |grep kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 46020/kubelet
tcp 0 0 127.0.0.1:33615 0.0.0.0:* LISTEN 46020/kubelet
tcp 0 0 127.0.0.1:36402 127.0.0.1:6443 ESTABLISHED 46020/kubelet
tcp6 0 0 :::10250 :::* LISTEN 46020/kubelet
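Besides `ps` and `netstat`, kubelet exposes a health endpoint on the 127.0.0.1:10248 listener seen above: `http://127.0.0.1:10248/healthz` answers with the body `ok` when kubelet considers itself healthy. A small check that could be run after a restart, sketched with a hypothetical function name and assuming `curl` is installed on the node:

```shell
#!/bin/bash
# Hypothetical helper: succeed if kubelet's local healthz endpoint returns "ok".
kubelet_healthy() {
  local url=${1:-http://127.0.0.1:10248/healthz}   # kubelet's default healthz address
  local body
  # -f: fail on HTTP errors, -s: quiet, -S: still show errors, --max-time: do not hang
  body=$(curl -fsS --max-time 3 "$url") || return 1
  [ "$body" = "ok" ]
}
```

Combined with a crash test, `kubelet_healthy && echo "kubelet is back"` gives a stronger signal than a listening socket alone.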
The supervisor daemon's log shows that kubelet was brought back up very quickly:
2023-11-05 20:04:39,637 INFO exited: kubelet (exit status 137; not expected)
2023-11-05 20:04:39,638 INFO spawned: 'kubelet' with pid 46019
2023-11-05 20:04:43,028 INFO success: kubelet entered RUNNING state, process has stayed up for > than 3 seconds (startsecs)
6.
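Basic supervisor management
With the steps above, kubelet is now guarded by supervisor. So how do you manage the services supervisor guards?
It's simple: supervisorctl status shows the state of the guarded services, and stop, start, and restart stop, start, and restart them. I won't go through each one in detail; here is a short run-through: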
[root@node4 ~]# supervisorctl status
kubelet RUNNING pid 46019, uptime 2:02:18
[root@node4 ~]# supervisorctl stop kubelet
kubelet: stopped
[root@node4 ~]# ps aux |grep kubelet
root 30189 0.0 0.0 112712 960 pts/0 S 22:11 0:00 grep --color=auto kubelet
[root@node4 ~]# supervisorctl start kubelet
kubelet: started
[root@node4 ~]# ps aux |grep kubelet
root 30241 0.0 0.0 115308 1424 ? S 22:12 0:00 /bin/bash /etc/kubernetes/kubelet.sh run
root 30242 5.0 1.8 1415396 73600 ? Sl 22:12 0:00 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.6 --node-ip=192.168.123.14 --hostname-override=node4
root 30508 0.0 0.0 112712 964 pts/0 S 22:12 0:00 grep --color=auto kubelet
[root@node4 ~]# supervisorctl restart kubelet
kubelet: stopped
kubelet: started
OK, automated management of a critical service with the supervisor daemon is complete.
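When supervisor guards several programs, or the same check has to be run across many nodes, it can help to list only the programs that are not RUNNING. A sketch that filters `supervisorctl status` output, whose first column is the program name and second the state (as in the `kubelet RUNNING pid 46019, uptime 2:02:18` line above); the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "name state" for every supervised program that is
# not RUNNING; return 1 if there is at least one.
not_running_programs() {
  supervisorctl status | awk '
    NF >= 2 && $2 != "RUNNING" { print $1, $2; bad = 1 }
    END { exit(bad ? 1 : 0) }
  '
}
```

The exit status comes from `awk`, so the function can drive a simple alert, e.g. `not_running_programs || echo "some programs are down"`.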
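Summary
With kubelet guarded by the supervisor daemon, a crashed kubelet is restarted automatically instead of waiting for someone to intervene, which limits the damage from severe incidents such as cascading failures and improves the robustness of the Kubernetes cluster.
Some readers may wonder: if supervisor can guard this critical service, can other critical services such as es or etcd be guarded by supervisor too? The answer is yes. Barring events beyond its control, such as a disk failure, supervisor will keep bringing the service back whenever it exits.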
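Since the kubelet program file is mostly boilerplate, the same settings can be reused for another service that runs as a plain host process (not as a container or static Pod). A sketch that writes a minimal `[program:NAME]` file and creates its log directory; the function name, its arguments, and the choice to keep only a subset of the kubelet settings are assumptions, and each service may need its own stop signal and timeouts.

```shell
#!/bin/bash
# Hypothetical helper: write a supervisord program file for a service that runs
# as a plain host process. Values mirror part of the kubelet example.
write_program_ini() {
  local dir=$1 name=$2 command=$3
  local logdir=${4:-/var/log/$name}   # supervisord will not create this directory
  mkdir -p "$logdir" || return 1
  cat > "$dir/$name.ini" <<EOF
[program:$name]
command=$command
user=root
autostart=true
startsecs=3
startretries=3
autorestart=true
stopasgroup=true
killasgroup=true
redirect_stderr=true
stdout_logfile=$logdir/$name.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=20
EOF
}
```

For example, `write_program_ini /etc/supervisord.d myservice /path/to/start.sh` (hypothetical name and path), followed by `supervisorctl reread` and `supervisorctl update`.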