About the author: Jingshun (靖顺), OceanBase development engineer, focusing on database diagnostics and tuning.

1. Preface
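In early 2024 I talked with frontline operations engineers, and they all raised the same difficulty: when OceanBase runs into a problem, troubleshooting is hard and sometimes depends on the vendor's support staff. Online communication is inefficient, however, and time is precious during an incident. They reported spending too much time on collecting information, which hurt their service SLA. I therefore recommended obdiag and suggested collecting information with a single command. Several support engineers then gave feedback: obdiag's diagnostic collection is comprehensive, but each collection task is separate (logs, host information, SQL information, and so on), which is still tedious when investigating a single problem. They wanted one-click, package-style collection for common failure scenarios so that information gathering could be completed more efficiently.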
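Following the customer-first principle, this requirement had to be handled quickly and with priority. The result is obdiag 1.6, released on January 31, 2024, which supports scenario-based one-click collection of diagnostic information.

2. Using obdiag scenario-based information collection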
2.1 Supported scenes
Run the following command to list the supported scenes:
obdiag gather scene list
The output looks like this:
#obdiag gather scene list

[Other Problem Gather Scenes]:
---------------------------------------------------------------------------------------
command                                                  info_en              info_cn
---------------------------------------------------------------------------------------
obdiag gather scene run --scene=other.application_error  [application error]  [应用报错问题]
---------------------------------------------------------------------------------------

[Obproxy Problem Gather Scenes]:
----------------------------------------------------------------------------------
command                                          info_en            info_cn
----------------------------------------------------------------------------------
obdiag gather scene run --scene=obproxy.restart  [obproxy restart]  [obproxy无故重启]
----------------------------------------------------------------------------------

[Observer Problem Gather Scenes]:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                                               info_en                        info_cn
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag gather scene run --scene=observer.backup                       [backup problem]               [数据备份问题]
obdiag gather scene run --scene=observer.backup_clean                 [backup clean]                 [备份清理问题]
obdiag gather scene run --scene=observer.clog_disk_full               [clog disk full]               [clog盘满]
obdiag gather scene run --scene=observer.cluster_down                 [cluster down]                 [集群无法连接]
obdiag gather scene run --scene=observer.compaction                   [compaction]                   [合并问题]
obdiag gather scene run --scene=observer.cpu_high                     [High CPU]                     [CPU高]
obdiag gather scene run --scene=observer.delay_of_primary_and_backup  [delay of primary and backup]  [主备库延迟]
obdiag gather scene run --scene=observer.io                           [io problem]                   [io问题]
obdiag gather scene run --scene=observer.log_archive                  [log archive]                  [日志归档问题]
obdiag gather scene run --scene=observer.long_transaction             [long transaction]             [长事务]
obdiag gather scene run --scene=observer.memory                       [memory problem]               [内存问题]
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"  [SQL performance problem]  [SQL性能问题]
obdiag gather scene run --scene=observer.recovery                     [recovery]                     [数据恢复问题]
obdiag gather scene run --scene=observer.restart                      [restart]                      [observer无故重启]
obdiag gather scene run --scene=observer.rootservice_switch           [rootservice switch]           [有主改选或者无主选举的切主]
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"  [SQL execution error]  [SQL 执行出错]
obdiag gather scene run --scene=observer.suspend_transaction          [suspend transaction]          [悬挂事务]
obdiag gather scene run --scene=observer.unit_data_imbalance          [unit data imbalance]          [unit迁移/缩小 副本不均衡问题]
obdiag gather scene run --scene=observer.unknown                      [unknown problem]              [未能明确问题的场景]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.2 Usage
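Run the following command to collect, in one step, all diagnostic information for a given scene:
obdiag gather scene run --scene={SceneName}
--scene={SceneName}: SceneName is the name of the scene to collect.

Example 1:
obdiag gather scene run --scene=observer.unknown

The options are described below:

| Option | Required | Data type | Default | Description |
|---|---|---|---|---|
| --scene | Yes | string | empty | Scene name. Run obdiag gather scene list to see which scenes the current version supports. |
| --from | No | string | empty | Start time for log collection, in the format yyyy-mm-dd hh:mm:ss, without quotes, for example 1970-01-01 12:00:00. |
| --to | No | string | empty | End time for log collection, in the format yyyy-mm-dd hh:mm:ss, without quotes, for example 1970-01-01 13:00:00. |
| --since | No | string | empty | Collect logs from the most recent period, in the format n<m\|h\|d>, where n is a number, m means minutes, h means hours, and d means days. For example, 30m collects the logs of the last 30 minutes. |
| --env | No | string | empty | Some scenes need extra parameters; they are all passed through --env. |
| --store_dir | No | string | directory where the command is run | Local path where the results are stored. |
| -c | No | string | ~/.obdiag/config.yml | Path to the configuration file. |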
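The time options above follow simple textual formats, so their values can be checked before obdiag is invoked. The following is a minimal sketch and not part of obdiag: `parse_since` and `build_scene_command` are hypothetical helper names, and passing each option value as a separate argv element is an assumption about how the CLI accepts them.

```python
import re
from datetime import datetime, timedelta

# Units accepted by --since according to the option table: m, h, d.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def parse_since(value):
    """Convert a --since value such as '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match is None:
        raise ValueError(f"invalid --since value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def build_scene_command(scene, since=None, time_from=None, time_to=None, store_dir=None):
    """Build an argv list for 'obdiag gather scene run' (hypothetical helper)."""
    argv = ["obdiag", "gather", "scene", "run", f"--scene={scene}"]
    if since is not None:
        parse_since(since)  # validation only; the raw string is passed on
        argv += ["--since", since]
    for flag, value in (("--from", time_from), ("--to", time_to)):
        if value is not None:
            datetime.strptime(value, TIME_FORMAT)  # raises ValueError if malformed
            argv += [flag, value]
    if store_dir is not None:
        argv += ["--store_dir", store_dir]
    return argv


if __name__ == "__main__":
    print(parse_since("30m"))
    print(build_scene_command("observer.unknown", since="30m"))
```

Because the result is an argv list, it could be handed to `subprocess.run` without shell quoting, which sidesteps the question of whether the space inside a `--from` value needs quotes.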
Examples:

Application error
obdiag gather scene run --scene=other.application_error

obproxy restarts unexpectedly
obdiag gather scene run --scene=obproxy.restart

Data backup problem
obdiag gather scene run --scene=observer.backup

Backup cleanup problem
obdiag gather scene run --scene=observer.backup_clean

clog disk full
obdiag gather scene run --scene=observer.clog_disk_full

Compaction problem
obdiag gather scene run --scene=observer.compaction

High CPU
obdiag gather scene run --scene=observer.cpu_high

Delay between primary and standby
obdiag gather scene run --scene=observer.delay_of_primary_and_backup

Log archive problem
obdiag gather scene run --scene=observer.log_archive

Long transaction
obdiag gather scene run --scene=observer.long_transaction

Memory problem
obdiag gather scene run --scene=observer.memory

SQL performance problem (the trace_id in env corresponds to the trace_id in gv$ob_sql_audit)
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"

Data recovery problem
obdiag gather scene run --scene=observer.recovery

observer restarts unexpectedly
obdiag gather scene run --scene=observer.restart

Leader switch caused by re-election with or without an existing leader
obdiag gather scene run --scene=observer.rootservice_switch

SQL execution error (the trace_id in env corresponds to the trace_id in gv$ob_sql_audit)
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"

Suspended transaction
obdiag gather scene run --scene=observer.suspend_transaction

Unit migration/shrink, replica imbalance
obdiag gather scene run --scene=observer.unit_data_imbalance

Problem not yet identified
obdiag gather scene run --scene=observer.unknown

IO problem
obdiag gather scene run --scene=observer.io

3. Adding custom scenes
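Scenario-based collection can be extended in two ways:
- YAML orchestration: the collection items are arranged in a YAML file. Once the file is added, the executor runs the items in the listed order to collect the required information. This suits simple scenes, and any user can add one.
- Hardcoded: the collection flow is controlled by a hand-written Python script, and the executor automatically switches to hardcode mode when it runs such a scene. This requires downloading the obdiag source code and rebuilding it after adding the scene, so it suits developers.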
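The difference between the two modes can be pictured as a lookup at dispatch time. This is only an illustrative sketch under assumptions: the registry, the `run_scene` function, and the stand-in class are hypothetical and do not reproduce obdiag's actual code.

```python
class CpuHighScene:
    """Hypothetical stand-in for a hardcoded scene class."""

    def execute(self):
        # A real hardcoded scene decides its own collection flow in Python.
        return ["obstack", "perf"]


# Hypothetical registry: scene name -> hardcoded scene class.
HARDCODED_SCENES = {"observer.cpu_high": CpuHighScene}


def run_scene(name, yaml_scenes, hardcoded=HARDCODED_SCENES):
    """Return the mode used and the ordered work items for a scene."""
    if name in hardcoded:
        # Hardcoded scenes take over the whole flow.
        return "hardcode", hardcoded[name]().execute()
    if name in yaml_scenes:
        # YAML scenes are plain step lists, run in the order written.
        return "yaml", [step["type"] for step in yaml_scenes[name]]
    raise KeyError(f"unknown scene: {name}")


if __name__ == "__main__":
    yaml_scenes = {"observer.io": [{"type": "sql"}, {"type": "ssh"}, {"type": "log"}]}
    print(run_scene("observer.io", yaml_scenes))
    print(run_scene("observer.cpu_high", yaml_scenes))
```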
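3.1 Adding a scene through YAML orchestration
Add the scene file under ~/.obdiag/gather/tasks in your home directory. Note that one YAML file corresponds to one scene, as shown below: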
.
├── obproxy
│ └── restart.yaml
├── observer
│ ├── backup_clean.yaml
│ ├── backup.yaml
│ ├── clog_disk_full.yaml
│ ├── cluster_down.yaml
│ ├── compaction.yaml
│ ├── delay_of_primary_and_backup.yaml
│ ├── io.yaml
│ ├── log_archive.yaml
│ ├── long_transaction.yaml
│ ├── memory.yaml
│ ├── recovery.yaml
│ ├── restart.yaml
│ ├── rootservice_switch.yaml
│ ├── suspend_transaction.yaml
│ ├── unit_data_imbalance.yaml
│ └── unknown.yaml
└── other
  └── application_error.yaml
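For example, you can add a scene to the observer group by creating ~/.obdiag/gather/tasks/observer/test.yaml.
For details on how to write a scene file, see the official documentation.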
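The layout above (one YAML file per scene, grouped in a directory per component) implies a direct mapping between file paths and scene names. The sketch below illustrates that mapping only; `discover_yaml_scenes` is a hypothetical helper, not an obdiag function, and the demo uses a temporary directory instead of the real ~/.obdiag/gather/tasks.

```python
import tempfile
from pathlib import Path


def discover_yaml_scenes(tasks_dir):
    """Map '<component>.<file stem>' scene names to their YAML paths."""
    root = Path(tasks_dir)
    return {f"{p.parent.name}.{p.stem}": p for p in sorted(root.glob("*/*.yaml"))}


def demo():
    """Build a tiny tasks tree in a temp directory and list its scene names."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("observer/io.yaml", "observer/test.yaml", "obproxy/restart.yaml"):
            path = Path(tmp) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text('info_en: "[demo]"\n', encoding="utf-8")
        return sorted(discover_yaml_scenes(tmp))


if __name__ == "__main__":
    print(demo())
```

Under this convention, a file at observer/test.yaml would surface as the scene name observer.test.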
Example:

info_en: "[io problem]"
info_cn: "[io问题]"
command: obdiag gather scene run --scene=observer.io
task:
  - version: "[2.0.0.0, 4.0.0.0]"
    steps:
      - type: sql
        sql: "show variables like 'version_comment';"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.v$ob_cluster"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.__all_zone WHERE name='idc';"
        global: true
      - type: sql
        sql: "select svr_ip,zone,with_rootserver,status,block_migrate_in_time,start_service_time,stop_time,build_version from oceanbase.__all_server order by zone;"
        global: true
      - type: sql
        sql: "SELECT zone, concat(svr_ip, ':', svr_port) observer, cpu_capacity, cpu_total, cpu_assigned, cpu_assigned_percent, mem_capacity, mem_total, mem_assigned, mem_assigned_percent, unit_Num, round(load, 2) load, round(cpu_weight, 2) cpu_weight, round(memory_weight, 2) mem_weight, leader_count FROM oceanbase.__all_virtual_server_stat ORDER BY zone,svr_ip;"
        global: true
      - type: sql
        sql: "select tenant_id,tenant_name,primary_zone,compatibility_mode from oceanbase.__all_tenant;"
        global: true
      - type: sql
        sql: "show parameters like '%syslog_level%';"
        global: true
      - type: sql
        sql: "show parameters like '%syslog_io_bandwidth_limit%';"
        global: true
      - type: sql
        sql: "select count(*),tenant_id,zone_list,unit_count from oceanbase.__all_resource_pool group by tenant_id,zone_list,unit_count;"
        global: true
      - type: ssh
        ssh: "df -h"
        global: false
      - type: ssh
        ssh: "cat /proc/sys/fs/aio-nr"
        global: false
      - type: ssh
        ssh: "cat /proc/sys/fs/aio-max-nr"
        global: false
      - type: log
        global: false
        grep: "IO"
      - type: sysstat
        global: false
        sysstat: ""
  - version: "[4.0.0.0, *]"
    steps:
      - type: sql
        sql: "show variables like 'version_comment';"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.DBA_OB_ZONES;"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.DBA_OB_SERVERS;"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.GV$OB_SERVERS;"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.DBA_OB_UNIT_CONFIGS;"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.DBA_OB_RESOURCE_POOLS;"
        global: true
      - type: sql
        sql: "SELECT * FROM oceanbase.DBA_OB_TENANTS;"
        global: true
      - type: sql
        sql: "SELECT c.TENANT_ID, e.TENANT_NAME, concat(c.NAME, ': ', d.NAME) `pool:conf`,concat(c.UNIT_COUNT, ' unit: ', d.min_cpu, 'C/', ROUND(d.MEMORY_SIZE/1024/1024/1024,0), 'G') unit_info FROM oceanbase.DBA_OB_RESOURCE_POOLS c, oceanbase.DBA_OB_UNIT_CONFIGS d, oceanbase.DBA_OB_TENANTS e WHERE c.UNIT_CONFIG_ID=d.UNIT_CONFIG_ID AND c.TENANT_ID=e.TENANT_ID AND c.TENANT_ID>1000 ORDER BY c.TENANT_ID;"
        global: true
      - type: sql
        sql: "SELECT a.TENANT_NAME,a.TENANT_ID,b.SVR_IP FROM oceanbase.DBA_OB_TENANTS a, oceanbase.GV$OB_UNITS b WHERE a.TENANT_ID=b.TENANT_ID;"
        global: true
      - type: sql
        sql: "show parameters like '%syslog_level%';"
        global: true
      - type: sql
        sql: "show parameters like '%syslog_io_bandwidth_limit%';"
        global: true
      - type: sql
        sql: "select * from __all_virtual_io_quota limit 20"
        global: true
      - type: ssh
        ssh: "df -h"
        global: false
      - type: ssh
        ssh: "cat /proc/sys/fs/aio-nr"
        global: false
      - type: ssh
        ssh: "cat /proc/sys/fs/aio-max-nr"
        global: false
      - type: log
        global: false
        grep: "IO"
      - type: sysstat
        global: false
        sysstat: ""
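To make the structure of such a scene file concrete, here is a rough sketch of how an executor could pick the task block for a cluster version and then run its steps in order. It is not obdiag's implementation: the function names are hypothetical, the scene is a plain dict rather than parsed YAML, and treating the lower version bound as inclusive and the upper bound as exclusive is an assumption.

```python
def parse_version(text):
    """Turn '4.2.1.0' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def version_in_range(version, spec):
    """Check a version against a spec such as '[2.0.0.0, 4.0.0.0]' or '[4.0.0.0, *]'.

    Assumption: lower bound inclusive, upper bound exclusive, '*' means no upper bound.
    """
    low, high = (part.strip() for part in spec.strip("[]").split(","))
    current = parse_version(version)
    if current < parse_version(low):
        return False
    return high == "*" or current < parse_version(high)


def select_steps(scene, version):
    """Return the steps of the first task whose version range matches."""
    for task in scene["task"]:
        if version_in_range(version, task["version"]):
            return task["steps"]
    return []


def run_steps(steps, handlers):
    """Run steps strictly in the order they are written, dispatching on 'type'."""
    return [handlers[step["type"]](step) for step in steps]


if __name__ == "__main__":
    scene = {
        "task": [
            {"version": "[2.0.0.0, 4.0.0.0]",
             "steps": [{"type": "sql", "sql": "SELECT * FROM oceanbase.v$ob_cluster"}]},
            {"version": "[4.0.0.0, *]",
             "steps": [{"type": "sql", "sql": "SELECT * FROM oceanbase.DBA_OB_ZONES;"},
                       {"type": "ssh", "ssh": "df -h"}]},
        ]
    }
    # Stand-in handlers that only describe what would be executed.
    handlers = {
        "sql": lambda step: f"would run SQL: {step['sql']}",
        "ssh": lambda step: f"would run on each node: {step['ssh']}",
    }
    for line in run_steps(select_steps(scene, "4.2.1.0"), handlers):
        print(line)
```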
3.2 Adding a hardcoded scene
Source code: GitHub - oceanbase/oceanbase-diagnostic-tool (OceanBase Diagnostic Tool is designed to help OceanBase users quickly gather necessary information and analyze the cause of the problem.)
Developers add scenes under {project_dir}/handler/gather/scenes. One .py file per scene is recommended.
│ ├── gather/                        one-click collection features
│ │ ├── gather_awr.py                AWR report collection
│ │ ├── gather_log.py                log collection
│ │ ├── gather_obadmin.py            clog/slog parsing
│ │ ├── gather_obproxy_log.py        obproxy log collection
│ │ ├── gather_obstack2.py           stack information collection
│ │ ├── gather_perf.py               flame graph collection
│ │ ├── gather_plan_monitor.py       parallel SQL collection
│ │ ├── gather_scenes.py             entry point for scenario-based collection
│ │ ├── gather_sysstat.py            host information collection
│ │ ├── scenes/                      handlers for scenario-based collection
│ │ │ ├── base.py
│ │ │ ├── cpu_high.py                high CPU scene
│ │ │ ├── list.py                    scene list display
│ │ │ ├── register.py                registration of hardcoded scenes
│ │ │ └── sql_problem.py             SQL problem collection
│ │ ├── step/                        executors for scenario-based collection
│ │ │ ├── base.py
│ │ │ ├── sql.py                     SQL executor
│ │ │ └── ssh.py                     SSH executor
│ │ └── tasks/                       YAML files of the collection scenes

3.2.1 Template
#!/usr/bin/env python
# -*- coding: UTF-8 -*
# Copyright (c) 2022 OceanBase
# OceanBase Diagnostic Tool is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
# http://license.coscl.org.cn/MulanPSL2
# THIS SOFTWARE IS PROVIDED ON AN AS IS BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
"""
file: test.py
desc:
"""


class TestScene(object):
    def __init__(self, nodes, cluster, report_path, task_variable_dict=None, args=None, env={}):
        if task_variable_dict is None:
            self.task_variable_dict = {}
        else:
            self.task_variable_dict = task_variable_dict
        self.nodes = nodes
        self.cluster = cluster
        self.report_path = report_path
        self.args = args
        self.env = env
        self.is_ssh = True

    def execute(self):  # execution entry point
        pass

    def xxx(self):
        pass

3.2.2 Example
#!/usr/bin/env python
# -*- coding: UTF-8 -*
# Copyright (c) 2022 OceanBase
# OceanBase Diagnostic Tool is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
# http://license.coscl.org.cn/MulanPSL2
# THIS SOFTWARE IS PROVIDED ON AN AS IS BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
"""
file: test.py
desc:
"""

import os
from utils.shell_utils import SshHelper
from common.logger import logger
from handler.gather.gather_obstack2 import GatherObstack2Handler
from handler.gather.gather_perf import GatherPerfHandler
# ParserAction is used below but was not imported in the original listing;
# the module path here is an assumption, check it against the obdiag source.
from utils.parser_utils import ParserAction


class TestScene(object):
    def __init__(self, nodes, cluster, report_path, task_variable_dict=None, args=None, env={}):
        if task_variable_dict is None:
            self.task_variable_dict = {}
        else:
            self.task_variable_dict = task_variable_dict
        self.nodes = nodes
        self.cluster = cluster
        self.report_path = report_path
        self.args = args
        self.env = env
        self.is_ssh = True

    def execute(self):  # execution entry point
        self.__gather_obstack()  # example
        self.__gather_perf()
        self.__gather_cmd_info()

    def __gather_obstack(self):
        logger.info("gather obstack start")
        obstack = GatherObstack2Handler(nodes=self.nodes, gather_pack_dir=self.report_path, is_scene=True)
        obstack.handle(self.args)
        logger.info("gather obstack end")

    def __gather_perf(self):
        logger.info("gather perf start")
        perf = GatherPerfHandler(nodes=self.nodes, gather_pack_dir=self.report_path, is_scene=True)
        self.args = ParserAction.add_attribute_to_namespace(self.args, 'scope', 'all')
        perf.handle(self.args)
        logger.info("gather perf end")

    def __gather_cmd_info(self):
        # Called from execute() but not shown in the original listing; placeholder only.
        pass

4. Appendix
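obdiag official documentation: OceanBase website
obdiag GitHub repository: GitHub - oceanbase/oceanbase-diagnostic-tool

Part 1: How to Become a "Miracle Doctor" (OceanBase Diagnostics Series, No. 1)
Part 2: Inside the SQL Audit View (OceanBase Diagnostics Series, No. 2)
Part 3: Collecting Diagnostic Information Quickly: The Agile Diagnostic Tool obdiag in Practice (OceanBase Diagnostics Series, No. 3)
Part 4: Analyzing OB Cluster Logs Quickly: obdiag's Analysis Capabilities in Practice (OceanBase Diagnostics Series, No. 4)
Part 5: Prevention First: The OceanBase Inspection Tool in Practice (OceanBase Diagnostics Series, No. 5)
Part 6: obdiag Helps You Read Full-Link Diagnostic Logs (OceanBase Diagnostics Series, No. 6)
Part 7: How to Troubleshoot Compaction Problems (OceanBase Diagnostics Series, No. 7)
Part 8: Troubleshooting Lock Conflicts Made Easy (OceanBase Diagnostics Series, No. 8)
Part 9: How obdiag Collects Diagnostic Information for 20+ Failure Scenarios in One Step (OceanBase Diagnostics Series, No. 9)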