Contents

- What is Thanos
  - Main features of Thanos
  - Thanos architecture components
- Thanos deployment architectures
  - Sidecar
  - Receive
  - Choosing an architecture
- Deployment
  - Deployment layout
  - Creating the namespace
  - Deploying node-exporter
  - Deploying kube-state-metrics
  - Deploying Prometheus with Thanos-Sidecar
    - Labeling the pinned node
    - Generating the secrets (MinIO configuration, etcd certificates)
    - Starting Prometheus with Thanos-Sidecar
  - Deploying Thanos-store-gateway
  - Deploying Thanos-compact
  - Deploying Thanos-query
  - Deploying Thanos-query-globle
  - Deploying Thanos-query-frontend
  - Deploying Grafana
  - Adding Thanos and MinIO monitoring
- Grafana dashboards (coredns, etcd, Thanos, node-exporter)
- Closing
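What is Thanos

Thanos official site · Thanos image repository on quay.io · Thanos on GitHub

Thanos is a set of components that extends Prometheus to address its storage, scalability, and high-availability limits in large environments. It suits large-cluster monitoring, particularly when metrics must be kept long term and queried globally.

Main features of Thanos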
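- Global query view: the Querier component queries multiple Prometheus instances and deduplicates results across data sources. Even with many Prometheus instances in a large cluster, all monitoring data can be queried through a single endpoint.
- Long-term storage (unlimited retention): Prometheus on its own is suited to short-term storage, while Thanos ships monitoring data to long-term object storage such as Amazon S3, Google Cloud Storage, or MinIO.
- Prometheus compatible: Grafana and other tools that support the Prometheus query API can query Prometheus data through Thanos.
- Downsampling and compaction: the Compactor component periodically compacts, deduplicates, and optimizes the data in object storage to reduce storage overhead and improve query performance.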
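To make the deduplication idea concrete, here is a minimal Python sketch, assuming two Prometheus replicas that scrape the same targets and differ only in a replica label. It is not how Thanos implements deduplication (Thanos uses its own penalty-based algorithm); the function name and the data layout are hypothetical and only illustrate why the replica label has to be dropped before series can be merged.

```python
from collections import defaultdict


def dedup_by_replica(series, replica_label="replica"):
    """Merge series that differ only in the replica label.

    series: list of (labels, samples) pairs, where labels is a dict and
    samples is a list of (timestamp, value) tuples.
    Simplification: when two replicas report the same timestamp,
    the first replica seen wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        # Identity of a series once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    # Return samples ordered by timestamp for each merged series.
    return {key: sorted(points.items()) for key, points in merged.items()}
```

Two replicas of the same series collapse into one series whose samples are the union of both, which is why a gap in one replica can be filled from the other.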
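Thanos architecture components

Following the KISS and Unix philosophies, Thanos is made up of a set of components, each with a specific role:

- Sidecar: deployed alongside each Prometheus instance; it uploads Prometheus data to object storage and exposes the Prometheus data to Querier.
- Store Gateway (Store for short): retrieves historical monitoring data from object storage (AWS S3, Google Cloud Storage, MinIO, and so on).
- Compactor: compacts, deduplicates, and optimizes the data in object storage, improving query performance and reducing storage overhead.
- Receiver: receives and stores the data that Prometheus instances send via Remote Write.
- Ruler/Rule: plays the role of Prometheus's rule evaluation (not of Alertmanager); it evaluates recording and alerting rules against the stored data and fires alerts.
- Querier/Query: the global query component; it pulls data from multiple Prometheus instances and from object storage and exposes a single query interface.
- Query Frontend: a service placed in front of Query (not a web page) that speeds up complex queries through query splitting, caching, and request queuing, improving response times under heavy load.

Prometheus v2.13.0 or later is strongly recommended because of its remote read improvements.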
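The query splitting mentioned for Query Frontend can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a one-day split interval and Unix-second timestamps; the function name is hypothetical and the real component also handles caching, retries, and step alignment.

```python
def split_range(start, end, interval=24 * 3600):
    """Split the range [start, end) into sub-ranges cut at interval boundaries.

    start and end are Unix timestamps in seconds. Each sub-range can be
    sent downstream (and cached) independently.
    """
    ranges = []
    cur = start
    while cur < end:
        # Next multiple of the interval strictly after cur.
        boundary = (cur // interval + 1) * interval
        nxt = min(boundary, end)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges
```

Because the cuts fall on fixed boundaries, a dashboard that is refreshed repeatedly asks for mostly the same sub-ranges, which is what makes caching them worthwhile.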
Thanos deployment architectures
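Sidecar

Sidecar uses Prometheus's reload endpoint, so make sure Prometheus is started with --web.enable-lifecycle.

Advantages:

- Lightweight: Sidecar is a small agent that runs next to the Prometheus instance and requires no major changes to Prometheus.
- Real-time data access: Thanos can read Prometheus's live monitoring data directly through Sidecar, so the newest data is always queryable.
- Long-term storage integration: Prometheus data is uploaded to object storage at regular intervals, which covers Prometheus's lack of native long-term storage.

Disadvantages:

- Depends on Prometheus: Sidecar needs a running Prometheus instance; if Prometheus goes down, Sidecar cannot serve queries either.
- Limited horizontal scaling: Sidecar is not designed to ingest data at large scale. It is a companion to a single Prometheus instance and cannot be scaled out the way Receiver can.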
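For reference, the reload call that --web.enable-lifecycle unlocks is a plain HTTP POST to /-/reload. Below is a minimal sketch using only Python's standard library, assuming Prometheus listens on localhost:9090 as it does inside the pod in this deployment; both function names are hypothetical.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for Prometheus's /-/reload lifecycle endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=5):
    """Ask Prometheus to reload its configuration; returns True on HTTP 200.

    Only works when Prometheus runs with --web.enable-lifecycle.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status == 200
```

Calling reload_prometheus() after editing the ConfigMap is equivalent to running curl -X POST against the same URL.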
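Receive

Advantages:

- Large-scale ingestion: Receiver can efficiently take in large volumes of data from Prometheus instances, which suits large deployments.
- Multi-tenancy: it can handle and isolate data from multiple tenants, which is useful when monitoring several independent environments.
- Horizontal scaling: by sharding data and adding Receiver instances it can keep up with growing ingestion load.
- Deduplication and high availability: Receiver supports highly available multi-instance setups and uses deduplication to avoid storing duplicate data.

Disadvantages:

- No query interface of its own: Receiver cannot be queried directly; the data it receives is queried and analyzed through other Thanos components such as Querier and Store.
- Lower freshness: compared with querying a Prometheus instance directly, Receiver may add some delay in data processing and querying.

Sidecar versus Receiver (comparison taken from ChatGPT):

| Feature | Thanos Sidecar | Thanos Receiver |
| --- | --- | --- |
| Main function | Integrates with a Prometheus instance; provides real-time data access and long-term storage | Receives and stores remote-write data from Prometheus instances |
| Data source | Reads data directly from Prometheus | Prometheus Remote Write data |
| Storage | Periodically uploads Prometheus blocks to object storage | Stores received data locally or in object storage |
| Horizontal scalability | None; tied to a single Prometheus instance | Scales out by adding instances |
| Real-time queries | Supports querying live Prometheus data | Cannot be queried directly |
| Multi-tenancy | Not supported | Supported; suited to multi-tenant environments |
| High availability | Depends on the Prometheus instance | Supports HA deployment and deduplication |
| Typical use | Integrating with existing Prometheus instances and storing data long term | Ingesting and storing data in large, multi-tenant environments |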
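The data sharding that lets Receiver scale out can be pictured as hashing each series to one of several receiver endpoints. The Python sketch below is an assumption-laden illustration, not Thanos's actual hashring code: the hash function, the key format, and the endpoint names are all hypothetical.

```python
import hashlib


def pick_receiver(tenant, labels, endpoints):
    """Map a (tenant, series) pair to one receiver endpoint by hash modulo.

    labels is a dict of label name to value; endpoints is a non-empty list.
    The same tenant and label set always map to the same endpoint.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Sort the labels so that dict ordering does not change the result.
    key = tenant + "|" + ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return endpoints[int.from_bytes(digest[:8], "big") % len(endpoints)]
```

A plain modulo like this reshuffles most series whenever the endpoint list changes, which is one reason real systems tend to prefer consistent hashing.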
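Choosing an architecture

The recommendations below are taken from two blog posts, "多集群 thanos 监控告警实践" and "打造云原生大型分布式监控系统 (三): Thanos 部署与实践". The final choice still has to be validated against your own environment.

The main difference between Sidecar and Receiver is how the newest data is queried:

- Sidecar: the newest data is read directly from the local Prometheus instance.
- Receiver: Prometheus pushes everything out through remote write, so all data is served from the Receiver side and its storage service (S3 or similar).

Recommendations:

- When the Prometheus deployment is small and scrapes few services, query latency is usually acceptable even if Sidecar and the global Query sit in different data centers, as long as both are within the same country.
- When the Prometheus deployment is large and scrapes a lot of data, still prefer the Sidecar architecture where possible: once the data volume spikes, Receiver comes under very heavy load and needs substantial resources and high-performance storage.
- Unless the main goal is analyzing historical metrics, or Prometheus cannot persist data in some special scenario, Sidecar is the recommended choice.

Deployment

This walkthrough uses the Sidecar mode.

Deployment layout

Alerting is handled with Prometheus's own rules, so Thanos-rule is not deployed.

Deployed in both Kubernetes clusters (A and B):

- Prometheus v2.54.1
- node-exporter v1.8.2
- kube-state-metrics v2.11.0
- Thanos-sidecar v0.36.1
- Thanos-query v0.36.1
- Thanos-store-gateway v0.36.1

Listed for only one of the two clusters:

- Thanos-compact v0.36.1
- Thanos-query-globle v0.36.1
- Thanos-query-frontend v0.36.1
- Grafana

For the MinIO deployment, see my earlier post "k8s 1.28.2 集群部署 MinIO 分布式集群"; have the MinIO cluster ready before you start.

Creating the namespace

In every command below, k stands for kubectl. Only one environment is shown; I have two Kubernetes clusters, so Prometheus is deployed twice. All manifests below use the monitoring namespace.

k create ns monitoring

Deploying node-exporter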
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter-svc
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    app.kubernetes.io/name: node-exporter
  type: ClusterIP
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
      - args:
        - --path.rootfs=/rootfs
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        image: docker.m.daocloud.io/prom/node-exporter:v1.8.2
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: http
        volumeMounts:
        - mountPath: /rootfs
          name: root
          readOnly: true
      hostIPC: true
      hostNetwork: true
      hostPID: true
      volumes:
      - hostPath:
          path: /
        name: root

Deploying kube-state-metrics
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - serviceaccounts
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingressclasses
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      automountServiceAccountToken: true
      containers:
      - image: docker.m.daocloud.io/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.11.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /livez
            port: http-metrics
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /readyz
            port: telemetry
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics-sa

Deploying Prometheus with Thanos-Sidecar
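Labeling the pinned node

k label node 192.168.22.125 prometheus=true

Generating the secrets

MinIO configuration

The configuration contains the MinIO access_key and secret_key, so avoid exposing it in plain text through a ConfigMap and read it from a Secret instead. Join the output of the command below into a single line and use it to replace the config value in the Secret further down.

cat <<EOF | base64 -
type: S3
config:
  bucket: "prom-thanos-sidecar"
  endpoint: "minio.api.devops.icu"
  access_key: "gsl2dzAHviNzabSn0ikw"
  secret_key: "82zQ0UMDlOo3LxCQM9TqSygEYrMuxSSRYQdO1KXF"
  insecure: true
EOF

etcd certificates

My cluster was deployed with kubeadm, so the etcd certificates live in /etc/kubernetes/pki/etcd. I create the Secret directly from the local files:

certs_dir=/etc/kubernetes/pki/etcd; \
k create secret generic etcd-pki -n monitoring \
  --from-file=ca=${certs_dir}/ca.crt \
  --from-file=cert=${certs_dir}/server.crt \
  --from-file=key=${certs_dir}/server.key

Starting Prometheus with Thanos-Sidecar

Prometheus stores its data on a local hostPath volume. Because Thanos has to read that data, both containers must run as the same user; otherwise Thanos runs into permission errors and can neither read the data nor upload it to MinIO. The error looks like this:

ts=2024-10-21T06:09:16.284378709Z caller=sidecar.go:410 level=warn err="upload 01JAP2JAZ0AQT8BEYFY30A4VVD: hard link block: hard link file chunks/000001: link /etc/prometheus/data/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001 /etc/prometheus/data/thanos/upload/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001: operation not permitted" uploaded=0

Prometheus flags used below:

- --storage.tsdb.min-block-duration=2h: a new block is cut after at least 2 hours.
- --storage.tsdb.max-block-duration=2h: a new block is cut after at most 2 hours. With both values equal, Prometheus does not compact blocks locally, so the Sidecar uploads them unchanged.
- --storage.tsdb.retention.time=6h: how long Prometheus keeps data locally. The default is 15 days; adjust it to your disk capacity.
- --storage.tsdb.wal-compression: enables WAL compression, which shrinks the WAL files and lowers disk usage.
- --storage.tsdb.no-lockfile: disables the lock file so that it does not interfere with Thanos uploading blocks to MinIO.
- --web.enable-lifecycle: enables hot reload; a POST to localhost:9090/-/reload reloads the configuration file.

---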
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus-svc
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  - name: grpc
    port: 10901
    targetPort: 10901
  selector:
    app: prometheus
  type: ClusterIP
---
apiVersion: v1
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      scrape_timeout: 10s
      external_labels:
        cluster: devops
        # Kubernetes only substitutes $(VAR) in container args and env,
        # not in files mounted from a ConfigMap, so this value is used literally.
        replica: $(POD_NAME)
    rule_files:
    - /etc/prometheus/rules/*.yml
    scrape_configs:
    - job_name: prometheus
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: prometheus
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9090
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kube-apiserver
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kubelet
      metrics_path: /metrics/cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [instance]
        action: replace
        target_label: node
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: etcd
      kubernetes_sd_configs:
      - role: pod
      scheme: https
      tls_config:
        ca_file: /etc/prometheus/etcd-ssl/ca
        cert_file: /etc/prometheus/etcd-ssl/cert
        key_file: /etc/prometheus/etcd-ssl/key
        insecure_skip_verify: false
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_component]
        regex: etcd
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:2379
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: coredns
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        regex: kube-dns
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9153
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: node-exporter
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__address__]
        regex: (.*):10250
        replacement: ${1}:9100
        target_label: __address__
        action: replace
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        action: replace
        target_label: ip
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kube-state-metrics
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;kube-state-metrics
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:8080
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
kind: ConfigMap
metadata:
  name: prometheus-cm
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: prometheus
  name: thanos-config
  namespace: monitoring
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: prometheus
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --config.file=/etc/prometheus/config/prometheus.yml
        - --storage.tsdb.path=/etc/prometheus/data
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --storage.tsdb.retention.time=6h
        - --storage.tsdb.wal-compression
        - --storage.tsdb.no-lockfile
        - --web.enable-lifecycle
        command:
        - /bin/prometheus
        env:
        - name: TZ
          value: Asia/Shanghai
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        name: prometheus
        ports:
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 1024Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/prometheus/config
          name: prometheus-config
        - mountPath: /etc/prometheus/etcd-ssl
          name: etcd-ssl
      - args:
        - sidecar
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --tsdb.path=/etc/prometheus/data
        - --prometheus.url=http://localhost:9090
        - --objstore.config-file=/etc/thanos/config/thanos-sidecar.yml
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        name: thanos-sidecar
        ports:
        - containerPort: 10901
          name: grpc
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/thanos/config/thanos-sidecar.yml
          name: thanos-config
          readOnly: true
          subPath: config
      imagePullSecrets:
      - name: harbor-secret
      initContainers:
      - command:
        - sh
        - -c
        - '[ -d /etc/prometheus/data/thanos ] || chown -R 65534:65534 /etc/prometheus/data'
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        name: init-dir
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
      securityContext:
        runAsUser: 65534
      serviceAccount: prometheus-sa
      terminationGracePeriodSeconds: 0
      volumes:
      - hostPath:
          path: /approot/k8s_data/prometheus
          type: DirectoryOrCreate
        name: prometheus-home
      - configMap:
          name: prometheus-cm
        name: prometheus-config
      - name: thanos-config
        secret:
          secretName: thanos-config
      - name: etcd-ssl
        secret:
          secretName: etcd-pki

Deploying Thanos-store-gateway

The Secret holds the same content as the one used by the sidecar; remember to replace it with your own.

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-sa
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-objstore-config
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-store-gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store-gateway
  serviceName: thanos-store-gateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store-gateway
    spec:
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --no-cache-index-header
        - --objstore.config-file=/etc/thanos/objstore.yaml
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-store-gateway
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-store-gateway-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

Deploying Thanos-compact

The compactor mounts the thanos-objstore-config Secret created for the store gateway above.
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-compact
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-compact
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/compact
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objstore.yaml
        - --compact.enable-vertical-compaction
        - --deduplication.replica-label=replica
        - --deduplication.func=penalty
        - --delete-delay=1d
        - --retention.resolution-raw=7d
        - --retention.resolution-5m=15d
        - --retention.resolution-1h=30d
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-compact-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

Deploying Thanos-query

- --query.replica-label names the label used for deduplication; it is the replica label configured in Prometheus's external_labels.
- The gRPC port of Thanos-query gets its own Service, exposed as a NodePort. A global Thanos-query then registers each cluster's Thanos-query as an endpoint, and queries ultimately go through Thanos-query-frontend.
- If resources allow, you can also deploy one more Thanos-query per cluster to act as the global query, so that internal and external queries are kept apart.

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-np-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    nodePort: 31901
    port: 10901
    targetPort: grpc
  selector:
    app.kubernetes.io/name: thanos-query
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=dnssrv+_grpc._tcp.thanos-store-gateway-headless.monitoring.svc.cluster.local
        - --endpoint=dnssrv+_grpc._tcp.prometheus-svc.monitoring.svc.cluster.local
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-sa

Deploying Thanos-query-globle

This is the global Thanos-query. For --endpoint I picked two nodes from each of the two clusters.

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query-globle
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query-globle
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query-globle
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=192.168.22.112:31901
        - --endpoint=192.168.22.113:31901
        - --endpoint=192.168.22.122:31901
        - --endpoint=192.168.22.123:31901
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query-globle
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-globle-sa

Deploying Thanos-query-frontend
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:labels:app.kubernetes.io/name: thanos-query-frontendname: thanos-query-frontend-sanamespace: monitoring
---
apiVersion: v1
kind: Service
metadata:labels:app.kubernetes.io/name: thanos-query-frontendname: thanos-query-frontend-svcnamespace: monitoring
spec:ports:- name: httpport: 10902protocol: TCPtargetPort: httpselector:app.kubernetes.io/name: thanos-query-frontendtype: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:labels:app.kubernetes.io/name: thanos-query-frontendname: thanos-query-frontendnamespace: monitoring
spec:replicas: 1selector:matchLabels:app.kubernetes.io/name: thanos-query-frontendtemplate:metadata:labels:app.kubernetes.io/name: thanos-query-frontendspec:containers:- args:- query-frontend- --log.levelinfo- --log.formatlogfmt- --http-address0.0.0.0:10902- --query-frontend.downstream-urlhttp://thanos-query-globle-svc.monitoring.svc.cluster.local:10902env:- name: HOST_IP_ADDRESSvalueFrom:fieldRef:fieldPath: status.hostIPimage: quay.io/thanos/thanos:v0.36.1imagePullPolicy: IfNotPresentlivenessProbe:failureThreshold: 4httpGet:path: /-/healthyport: httpscheme: HTTPinitialDelaySeconds: 0periodSeconds: 30successThreshold: 1timeoutSeconds: 1name: thanos-query-frontendports:- containerPort: 10902name: httpprotocol: TCPreadinessProbe:failureThreshold: 20httpGet:path: /-/readyport: httpscheme: HTTPinitialDelaySeconds: 0periodSeconds: 5successThreshold: 1timeoutSeconds: 1securityContext:allowPrivilegeEscalation: falsecapabilities:drop:- ALLreadOnlyRootFilesystem: truerunAsGroup: 65532runAsNonRoot: truerunAsUser: 65534seccompProfile:type: RuntimeDefaultsecurityContext:fsGroup: 65534runAsGroup: 65532runAsNonRoot: truerunAsUser: 65534seccompProfile:type: RuntimeDefaultserviceAccountName: thanos-query-frontend-sa
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-query-frontend
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: thanos.devops.icu
    http:
      paths:
      - backend:
          service:
            name: thanos-query-frontend-svc
            port:
              number: 10902
        path: /
        pathType: Prefix

Grafana deployment

NFS is used here to persist the dashboard JSON files, so changing or adding a dashboard is easy: just upload the file to the NFS share.

---
apiVersion: v1
data:
  grafana.ini: |
    [paths]
    provisioning = /etc/grafana/provisioning
kind: ConfigMap
metadata:
  name: grafana-cm
  namespace: monitoring
---
apiVersion: v1
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://thanos-query-globle-svc.monitoring.svc.cluster.local:10902
kind: ConfigMap
metadata:
  name: grafana-datasource
  namespace: monitoring
---
apiVersion: v1
data:
  dashboards.yaml: |
    apiVersion: 1
    providers:
    - name: 'a unique provider name'
      orgId: 1
      folder: ''
      folderUid: ''
      type: file
      disableDeletion: false
      editable: true
      updateIntervalSeconds: 10
      allowUiUpdates: true
      options:
        # string, required: path to dashboard files on disk
        path: /etc/grafana/provisioning/dashboards/views
kind: ConfigMap
metadata:
  name: grafana-dashboard
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana-svc
  namespace: monitoring
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: http-grafana
  selector:
    app.kubernetes.io/name: grafana
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: grafana
  template:
    metadata:
      labels:
        app.kubernetes.io/name: grafana
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: docker.m.daocloud.io/grafana/grafana:11.3.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 3000
          timeoutSeconds: 1
        name: grafana
        ports:
        - containerPort: 3000
          name: http-grafana
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 2
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 250m
            memory: 750Mi
        volumeMounts:
        - mountPath: /etc/grafana/grafana.ini
          name: grafana-config
          subPath: grafana.ini
        - mountPath: /etc/grafana/provisioning/datasources/prometheus.yaml
          name: grafana-datasource
          subPath: prometheus.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard.yaml
          name: grafana-dashboard
          subPath: dashboards.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/views
          name: grafana
          subPathExpr: $(POD_NAME)
      securityContext:
        fsGroup: 472
        supplementalGroups:
        - 0
      volumes:
      - configMap:
          name: grafana-cm
        name: grafana-config
      - configMap:
          name: grafana-datasource
        name: grafana-datasource
      - configMap:
          name: grafana-dashboard
        name: grafana-dashboard
  volumeClaimTemplates:
  - metadata:
      name: grafana
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: nfs-client
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.devops.icu
    http:
      paths:
      - backend:
          service:
            name: grafana-svc
            port:
              number: 3000
        path: /
        pathType: Prefix

Adding Thanos and MinIO monitoring

Scraping MinIO metrics with Prometheus requires authentication. Either configure JWT authentication with the mc command (see the official documentation for mc admin prometheus generate), or set MINIO_PROMETHEUS_AUTH_TYPE=public on MinIO so that Prometheus can access the metrics API directly. The environment variable only takes effect after MinIO is restarted.

- job_name: minio
  metrics_path: /minio/v2/metrics/cluster
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: storage;minio-svc
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.*)
    target_label: __address__
    replacement: ${1}:9000
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
- job_name: thanos-query
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: monitoring;thanos-query-svc
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.*)
    target_label: __address__
    replacement: ${1}:10902
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
- job_name: thanos-store-gateway
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: monitoring;thanos-store-gateway-headless
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.*)
    target_label: __address__
    replacement: ${1}:10902
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
- job_name: thanos-compact
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: monitoring;thanos-compact-headless
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.*)
    target_label: __address__
    replacement: ${1}:10902
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace

Grafana dashboard

Below are the IDs of a few dashboards I have configured. Because I run two k8s clusters, each dashboard needs a cluster variable added, and most of them need some further tuning of your own.

coredns

14981

etcd

Uses the officially provided template, grafana.json.

Thanos

12937

node-exporter

12633 or 21902

16098

Finally

The YAML and dashboard JSON files are available on Gitee: https://gitee.com/chen2ha/yaml_for_kubernetes/tree/master/thanos
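
As a footnote to the MinIO monitoring section: if the JWT option is chosen instead of MINIO_PROMETHEUS_AUTH_TYPE=public, the minio job needs to present a bearer token. The fragment below is a minimal sketch of that variant. `<TOKEN>` is a placeholder for the token printed by `mc admin prometheus generate`, and the remaining relabel rules from the original job are assumed to stay unchanged.

```yaml
# Sketch only: the minio scrape job with JWT authentication added.
# <TOKEN> is a placeholder; paste the token generated by mc here.
- job_name: minio
  bearer_token: <TOKEN>
  metrics_path: /minio/v2/metrics/cluster
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: storage;minio-svc
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.*)
    target_label: __address__
    replacement: ${1}:9000
```

Keeping the token in the Prometheus configuration means the configuration file itself becomes sensitive, which is one reason the public auth type may be preferred inside a trusted cluster network.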