Preface
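This article starts from a real kernel crash and walks through a hands-on case showing how to use the crash tool to analyze and debug a kernel dump (kdump). By reading the crash log, tracing the call stack, locating the key addresses, and checking the code logic, it lays out a systematic approach to kernel problem analysis along with practical techniques. The guide is based on InLinux2312-LTS-SP1 and aims to help readers quickly learn how to investigate kernel kdump problems and handle faults more efficiently.
Inspur Yunqi operating system (InLinux) version
All steps below were performed and analyzed on InLinux2312-LTS-SP1.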
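Problem analysis
Symptoms
The test environment has three servers. Each server has 2 × 6.4T NVMe drives and 10 × 12T SATA disks, with bcache providing cache acceleration. Each NVMe drive is split into 5 partitions, and each NVMe partition serves as the cache device for one 12T SATA disk. To raise the storage density per server, the 12T SATA disks were to be replaced with 16T SATA disks.
The on-site steps were as follows.
1. Create the bcache devices.
make-bcache -C /dev/nvme2n1p1 -B /dev/sda --writeback --force --wipe-bcache
/dev/sda is a 12T SATA disk. /dev/nvme2n1p1 is the first partition of the NVMe drive; its size is 1024G. The partition was created with:
parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1024GiB
With 10 disks and 2 NVMe drives, each NVMe drive split into 5 partitions, 10 bcache devices were created in total.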
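2. Run the fio test on bcache0.
cat /home/script/run-fio-randrw.sh
bcache_name=$1
if [ -z "${bcache_name}" ]; then
    echo "bcache_name is empty"
    exit -1
fi
fio --filename=/dev/${bcache_name} --ioengine=libaio --rw=randrw --bs=4k --size=100% --iodepth=128 --numjobs=4 --direct=1 --name=randrw --group_reporting --runtime=30 --ramp_time=5 --lockmem=1G | tee -a ./randrw-iops_k1.log
Run it several times:
bash run-fio-randrw.sh bcache0
3. Power off.
poweroff
No bcache data cleanup was performed before powering off.
4. Replace the 12T SATA disks with 16T SATA disks.
After shutdown, pull the 12T disks and install the 16T disks.
5. Change the nvme2n1 partition size to 1536G. A kernel panic is triggered as soon as partitioning completes.
parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
6. After a reboot the system cannot boot normally and keeps rebooting.
7. Boot into rescue mode from the installation disc and wipe the superblock of nvme2n1p1. After another reboot the system starts normally.
wipefs -af /dev/nvme2n1p1
8. Repartition. The kernel panic is triggered again.
parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
The same steps on the other two servers did not trigger a panic. On the affected server, once a NULL check on the root member of struct cache_set was added, the system booted normally.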
Log analysis
Error log
[root@storage-aqkp-002 127.0.0.1-2024-11-10-11:47:37]# cat vmcore-dmesg.txt |grep bcache
[ 21.365228] bcache: bch_journal_replay() journal replay done, 9 keys in 5 entries, seq 987460
[ 21.382581] bcache: register_cache() registered cache device nvme3n1p4
[ 21.524130] bcache: bch_journal_replay() journal replay done, 9 keys in 5 entries, seq 1019863
[ 21.535174] bcache: register_cache() registered cache device nvme3n1p2
[ 21.698388] bcache: bch_journal_replay() journal replay done, 9 keys in 5 entries, seq 1109121
[ 21.708619] bcache: register_cache() registered cache device nvme3n1p3
[ 21.868881] bcache: bch_journal_replay() journal replay done, 0 keys in 1 entries, seq 1127759
[ 21.879083] bcache: register_cache() registered cache device nvme3n1p5
[ 22.054332] bcache: bch_journal_replay() journal replay done, 9 keys in 5 entries, seq 1102627
[ 22.064518] bcache: register_cache() registered cache device nvme3n1p1
[ 249.369289] bcache: register_bcache() error : device already registered
[ 249.369415] bcache: register_bcache() error : device already registered
[ 249.370308] bcache: register_bcache() error : device already registered
[ 249.370517] bcache: register_bcache() error : device already registered
[ 249.371315] bcache: register_bcache() error : device already registered
[ 359.459929] nvme2n1:
[ 359.473124] nvme2n1: p1
[ 359.618056] bcache: prio_read() bad csum reading priorities
[ 359.624878] bcache: bch_cache_set_error() error on f774c122-6c02-469b-b798-ca53c10efa76: IO error reading priorities, disabling caching
[ 359.638311] bcache: register_cache() error nvme2n1p1: failed to run cache set
[ 359.646709] bcache: register_bcache() error : failed to register device
[ 359.658968] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000200
[ 359.669077] Mem abort info:
[ 359.672871] ESR = 0x96000044
[ 359.676929] EC = 0x25: DABT (current EL), IL = 32 bits
[ 359.683221] SET = 0, FnV = 0
[ 359.687253] EA = 0, S1PTW = 0
[ 359.691368] Data abort info:
[ 359.695212] ISV = 0, ISS = 0x00000044
[ 359.700003] CM = 0, WnR = 1
[ 359.703909] user pgtable: 4k pages, 48-bit VAs, pgdp=00002040022e2000
[ 359.711284] [0000000000000200] pgd=0000000000000000, p4d=0000000000000000
[ 359.719262] Internal error: Oops: 0000000096000044 [#1] SMP
[ 359.725760] Modules linked in: xt_set ipt_rpfilter xt_multiport iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_nat xt_addrtype ip6table_nat ip6_tables ipt
able_mangle xt_physdev xt_conntrack xt_comment xt_mark iptable_filter nf_conntrack_netlink nfnetlink sch_ingress iptable_nat xt_MASQUERADE ip_tables rbd ceph libceph dns_resolver overlay openvswitch nsh n
f_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 8021q garp mrp bonding vfat fat dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser rdma_
cm iw_cm ib_cm libiscsi scsi_transport_iscsi hns_roce_hw_v2 ib_uverbs ib_core bcache dm_mod crc64 ipmi_ssif ses enclosure aes_ce_blk aes_ce_cipher realtek acpi_ipmi hisi_sas_v3_hw hibmc_drm ghash_ce hclgesha1_ce hisi_sas_main nvme drm_vram_helper hns3 ipmi_si drm_ttm_helper nvme_core libsas hnae3 ipmi_devintf ttm host_edma_drv sg scsi_transport_sas i2c_designware_platform
[ 359.725845] nfit
[ 359.730936] bcache: register_bcache() error : device already registered
[ 359.815384] ipmi_msghandler i2c_designware_core hisi_uncore_ddrc_pmu hisi_uncore_hha_pmu hisi_uncore_l3c_pmu libnvdimm hisi_uncore_pmu sch_fq_codel br_netfilter bridge stp llc fuse ext4 mbcache jbd2 s
d_mod t10_pi ahci libahci sha2_ce sha256_arm64 sbsa_gwdt libata megaraid_sas(OE) aes_neon_bs aes_neon_blk crypto_simd cryptd
[ 359.833119] bcache: register_bcache() error : device already registered
[ 359.856792] CPU: 57 PID: 7773 Comm: kworker/57:2 Kdump: loaded Tainted: G OE 5.10.0-202.0.0.115.ile2312sp1.aarch64 #1
[ 359.856793] Hardware name: Enginetech EG920A-G20/BC82AMDDRA, BIOS 6.67 11/15/2023
[ 359.856819] Workqueue: events cache_set_flush [bcache]
[ 359.894922] pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 359.901919] pc : cache_set_flush+0x94/0x190 [bcache]
[ 359.907876] lr : cache_set_flush+0x88/0x190 [bcache]
[ 359.913815] sp : ffff800046373d50
[ 359.918104] x29: ffff800046373d50 x28: 0000000000000000
[ 359.924380] x27: ffff800012213c48 x26: ffffbe503baba218
[ 359.930651] x25: ffff49cc48ca0808 x24: ffff49cc06674000
[ 359.936916] x23: ffff49cc48ca0808 x22: ffff49cc48ca0000
[ 359.943172] x21: ffff49cc48ca04a8 x20: 0000000000000000
[ 359.949419] x19: 0000000000000200 x18: 0000000000000000
[ 359.955662] x17: 0000000000000000 x16: ffffbe503a531760
[ 359.961896] x15: 0000000000000004 x14: ffff49cc00004990
[ 359.968123] x13: 0000000000000000 x12: ffff49cc3dd02a40
[ 359.974342] x11: ffff49cc3dd02910 x10: ffff2a0c0040b6c2
[ 359.980556] x9 : ffffbe503a591d88 x8 : ffff49cc3dd02938
[ 359.986770] x7 : ffff49cc07f03a18 x6 : 0000000000000000
[ 359.992977] x5 : ffff29cc59c16218 x4 : ffff49cc48ca0808
[ 359.999182] x3 : 0000000000000000 x2 : ffff49cc48ca0808
[ 360.004565] bcache: bch_journal_replay() journal replay done, 11 keys in 6 entries, seq 1096092
[ 360.005380] x1 : ffff49cc48ca0808 x0 : 0000000000000001
[ 360.016207] bcache: register_cache() registered cache device nvme2n1p3
[ 360.022922] Call trace:
[ 360.022934] cache_set_flush+0x94/0x190 [bcache]
[ 360.022946] process_one_work+0x1d8/0x4e0
[ 360.045082] bcache: register_bcache() error : device already registered
[ 360.045966] worker_thread+0x154/0x420
[ 360.045970] kthread+0x108/0x150
[ 360.046495] bcache: register_bcache() error : device already registered
[ 360.066044] bcache: register_bcache() error : device already registered
[ 360.066162] bcache: register_bcache() error : device already registered
[ 360.070249] ret_from_fork+0x10/0x18
[ 360.070254] Code: 940043e2 72001c1f 54000700 f90006f3 (f9010297)
[ 360.090288] bcache: register_bcache() error : device already registered
[ 360.091355] bcache: register_bcache() error : device already registered
[ 360.097327] SMP: stopping secondary CPUs
[ 360.119238] Starting crashdump kernel...
Log analysis results
Forward code analysis
From the log, the call path that leads to the fault is:
register_cache_set -> run_cache_set -> prio_read (fails)
register_cache_set -> bch_cache_set_unregister -> bch_cache_set_stop -> __cache_set_unregister -> cache_set_flush -> list_add
When a bcache device created with make-bcache in user space is registered, the kernel calls register_cache_set.
register_cache_set first checks the uuid to make sure it is unique, then calls bch_cache_set_alloc to initialize the structure members and register the closure callbacks. Here the callback of the caching closure is set to __cache_set_unregister. After that it runs run_cache_set, which first reads the journal from the bcache disk and then initializes the btree root. According to the error in the log, IO error reading priorities, run_cache_set bailed out before the root member of struct cache_set was initialized, so root is still NULL. The later cache_set_flush then operates on root and inevitably panics the kernel.
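The failure path described above can be pictured with a small toy model. This is a sketch in Python, not kernel code: the class, the function bodies and the check_null switch are hypothetical stand-ins chosen for illustration, and only the function names mirror the bcache routines mentioned in the text.

```python
class NullPointerDereference(Exception):
    """Stands in for the kernel oops in this toy model."""


class CacheSet:
    def __init__(self):
        self.root = None        # zero-initialized, so root starts out NULL
        self.btree_cache = []   # stands in for the btree_cache list


def run_cache_set(c, prio_read_ok):
    """Return 0 on success; fail before root is assigned if prio_read fails."""
    if not prio_read_ok:
        return -5               # -EIO, c.root is never set
    c.root = {"list": "root-node"}
    return 0


def cache_set_flush(c, check_null):
    """Put root back on the btree cache list, optionally guarding against NULL."""
    if check_null and c.root is None:
        return
    if c.root is None:
        raise NullPointerDereference("write through a NULL root")
    c.btree_cache.append(c.root["list"])


def register_cache_set(prio_read_ok, check_null):
    """Mirror the error path: a failed run_cache_set ends in cache_set_flush."""
    c = CacheSet()
    if run_cache_set(c, prio_read_ok) < 0:
        cache_set_flush(c, check_null)
        return "failed to run cache set"
    return None
```

With prio_read_ok=False and check_null=False the model raises, which corresponds to the panic; with check_null=True the error is reported and teardown completes, which corresponds to the NULL check that let the affected server boot.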
static const char *register_cache_set(struct cache *ca)
{
    char buf[12];
    const char *err = "cannot allocate memory";
    struct cache_set *c;

    // check for a duplicate uuid
    list_for_each_entry(c, &bch_cache_sets, list)
        if (!memcmp(c->set_uuid, ca->sb.set_uuid, 16)) {
            if (c->cache)
                return "duplicate cache set member";
            goto found;
        }

    // allocate the in-memory cache set and initialize it from the superblock
    c = bch_cache_set_alloc(&ca->sb);
    if (!c)
        return err;

    err = "error creating kobject";
    if (kobject_add(&c->kobj, bcache_kobj, "%pU", c->set_uuid) ||
        kobject_add(&c->internal, &c->kobj, "internal"))
        goto err;

    // add the accounting statistics under /sys/block/bcache0/bcache/
    // (stats_total, stats_five_minute, stats_hour, stats_day)
    if (bch_cache_accounting_add_kobjs(&c->accounting, &c->kobj))
        goto err;

    // initialize the bcache entries under debugfs
    bch_debug_init_cache_set(c);

    // add the cache set to the global list of cache sets
    list_add(&c->list, &bch_cache_sets);
found:
    // create links such as /sys/block/bcache0/bcache/cache/cache0/set
    // and the cache0 link inside the cache set directory
    sprintf(buf, "cache%i", ca->sb.nr_this_dev);
    if (sysfs_create_link(&ca->kobj, &c->kobj, "set") ||
        sysfs_create_link(&c->kobj, &ca->kobj, buf))
        goto err;

    // tie the cache and the cache set together
    kobject_get(&ca->kobj);
    ca->set = c;
    ca->set->cache = ca;

    err = "failed to run cache set";
    if (run_cache_set(c) < 0)
        goto err;

    return NULL;
err:
    // on error, unregister the bcache device
    bch_cache_set_unregister(c);
    return err;
}

struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
{
    int iter_size;
    struct cache *ca = container_of(sb, struct cache, sb);
    struct cache_set *c = kzalloc(sizeof(struct cache_set), GFP_KERNEL);

    if (!c)
        return NULL;

    __module_get(THIS_MODULE);
    // initialize the closures used for asynchronous execution
    closure_init(&c->cl, NULL);
    set_closure_fn(&c->cl, cache_set_free, system_wq);

    closure_init(&c->caching, &c->cl);
    set_closure_fn(&c->caching, __cache_set_unregister, system_wq);
    ...
bch_cache_set_alloc sets the callback of the caching closure to __cache_set_unregister.
void bch_cache_set_unregister(struct cache_set *c)
{
    set_bit(CACHE_SET_UNREGISTERING, &c->flags);
    // stop the bcache cache device and backing devices
    bch_cache_set_stop(c);
}
In this case run_cache_set returns an error.
static int run_cache_set(struct cache_set *c)
{
    const char *err = "cannot allocate memory";
    struct cached_dev *dc, *t;
    struct cache *ca = c->cache;
    struct closure cl;
    LIST_HEAD(journal);
    struct journal_replay *l;

    closure_init_stack(&cl);

    c->nbuckets = ca->sb.nbuckets;
    set_gc_sectors(c);

    if (CACHE_SYNC(&c->cache->sb)) {
        struct bkey *k;
        struct jset *j;

        err = "cannot allocate memory for journal";
        if (bch_journal_read(c, &journal))
            goto err;

        pr_debug("btree_journal_read() done\n");

        err = "no journal entries found";
        if (list_empty(&journal))
            goto err;

        j = &list_entry(journal.prev, struct journal_replay, list)->j;

        err = "IO error reading priorities";
        if (prio_read(ca, j->prio_bucket[ca->sb.nr_this_dev]))
            goto err;

        /*
         * If prio_read() fails it'll call cache_set_error and we'll
         * tear everything down right away, but if we perhaps checked
         * sooner we could avoid journal replay.
         */

        k = &j->btree_root;

        err = "bad btree root";
        if (__bch_btree_ptr_invalid(c, k))
            goto err;

        err = "error reading btree root";
        // the root member of cache_set is initialized here; if an earlier
        // step fails, this is never reached and root stays NULL
        c->root = bch_btree_node_get(c, NULL, k,
                                     j->btree_level,
                                     true, NULL);
        if (IS_ERR_OR_NULL(c->root))
            goto err;

        list_del_init(&c->root->list);
        rw_unlock(true, c->root);
        ...
err:
    while (!list_empty(&journal)) {
        l = list_first_entry(&journal, struct journal_replay, list);
        list_del(&l->list);
        kfree(l);
    }

    closure_sync(&cl);

    bch_cache_set_error(c, "%s", err);

    return -EIO;
}
After run_cache_set fails, bcache calls bch_cache_set_unregister to unregister the bcache device. bch_cache_set_unregister calls bch_cache_set_stop, which invokes the previously registered asynchronous callback __cache_set_unregister to carry out the unregistration.
void bch_cache_set_stop(struct cache_set *c)
{
    if (!test_and_set_bit(CACHE_SET_STOPPING, &c->flags))
        /* closure_fn set to __cache_set_unregister() */
        // asynchronous callback mechanism: runs the function registered
        // earlier; callbacks execute in the order they were registered
        closure_queue(&c->caching);
}

static inline void closure_queue(struct closure *cl)
{
    struct workqueue_struct *wq = cl->wq;
    /**
     * Changes made to closure, work_struct, or a couple of other structs
     * may cause work.func not pointing to the right location.
     */
    BUILD_BUG_ON(offsetof(struct closure, fn)
                 != offsetof(struct work_struct, func));
    if (wq) {
        INIT_WORK(&cl->work, cl->work.func);
        BUG_ON(!queue_work(wq, &cl->work));
    } else
        // this is where the registered callback, here
        // __cache_set_unregister, gets executed
        cl->fn(cl);
}
Reverse analysis with crash
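ARM register overview
X0 to X7 pass arguments and return results. X19 to X28 are callee-saved registers: a function that uses them must save and restore them, so their values survive across calls.
FP (X29) is the frame pointer and LR (X30) is the link register.
On ARM, FP (Frame Pointer) and LR (Link Register) are two important registers for function calls and stack frame management:
FP (Frame Pointer) usually points to the start of the current function's stack frame. When a new function is called, the caller's FP is pushed onto the stack and FP is then set to the start of the new frame. FP can be used to follow the chain of calls on the stack, which helps when inspecting the call history during debugging.
LR (Link Register) holds the return address. When a function is called, LR receives the address of the instruction after the call; on return, the value in LR is copied into the program counter (PC), returning to the caller.
PC (Program Counter) is the ARM64 program counter and records the address of the instruction being executed. At crash time, PC points to the instruction that caused the fault.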
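Assembly instructions
(1) The stp instruction
In ARM assembly, stp stores the values of two registers to memory. stp stands for Store Pair. It is commonly used to save registers on function entry, especially when several registers must be saved together, and it replaces two separate str instructions.
Syntax
stp reg1, reg2, [address, offset]
reg1 and reg2: the two registers whose contents are stored.
[address, offset]: the destination memory address, given as a base address plus an offset.
Example: stp x19, x20, [sp, #16]
Explanation
x19 and x20 are the two registers to store. [sp, #16] is the memory address: the base is sp (the stack pointer) and the offset is #16.
Specifically, stp x19, x20, [sp, #16] does the following:
It stores x19 on the stack at offset 16 bytes. It stores x20 immediately after x19, in the next 8 bytes.
Memory layout
Assume sp is currently 0xffffbe50121fa000. After the instruction executes, x19 is stored at 0xffffbe50121fa010 (sp + 16) and x20 at 0xffffbe50121fa018 (sp + 24).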
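Use cases
stp is typically used to save a pair of registers, particularly on function entry (for example the return address and callee-saved registers), so that they can be restored when the function returns.
Example
A function prologue usually contains code like the following to save register state:
stp x19, x20, [sp, #-16]!
This instruction means:
The trailing ! selects pre-indexed addressing: sp is first decremented by 16 (sp = sp - 16), and x19 and x20 are then stored at the new sp. The updated sp remains in effect after the instruction.
When saving and restoring stack frames, stp handles several registers efficiently and reduces the number of individual str instructions needed.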
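The two addressing forms used in the stp examples can be checked with a few lines of arithmetic. The sketch below is illustrative Python; stp_addresses is a hypothetical helper name, and it models only the address calculation for 64-bit registers, not the store itself.

```python
MASK64 = (1 << 64) - 1


def stp_addresses(sp, offset, pre_index=False):
    """Model `stp Xa, Xb, [sp, #offset]` (or `[sp, #offset]!` when pre_index).

    Returns (address of first register, address of second register, sp afterwards).
    """
    base = (sp + offset) & MASK64
    sp_after = base if pre_index else sp   # only the `!` form writes back to sp
    return base, (base + 8) & MASK64, sp_after


# stp x19, x20, [sp, #16] with the sp value assumed in the text
first, second, sp_after = stp_addresses(0xFFFFBE50121FA000, 16)
print(hex(first), hex(second), hex(sp_after))
```

For the pre-indexed form, stp_addresses(sp, -16, pre_index=True) returns sp - 16 both as the first store address and as the new sp.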
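(2) The ldp instruction
The load counterpart of stp is ldp (Load Pair), which loads a pair of values from memory into two registers. It is used the same way, except that it reads from memory.
Example
ldp x19, x20, [sp, #16]
This instruction loads the data at sp + 16 into x19 and x20.
Steps for analyzing the call stack with crash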
Debugging a kdump vmcore file requires the crash command and the kernel debuginfo RPM packages.
yum install crash kernel-debuginfo kernel-debugsource -y
[root@storage-aqkp-002 127.0.0.1-2024-11-10-11:47:37]# crash /usr/lib/debug/lib/modules/5.10.0-202.0.0.115.ile2312sp1.aarch64/vmlinux /var/crash/127.0.0.1-2024-11-10-11\:47\:37/vmcore
crash 8.0.2-1.ile2312sp1
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

WARNING: kernel version inconsistency between vmlinux and dumpfile

KERNEL: /usr/lib/debug/lib/modules/5.10.0-202.0.0.115.ile2312sp1.aarch64/vmlinux [TAINTED]
DUMPFILE: /var/crash/127.0.0.1-2024-11-10-11:47:37/vmcore [PARTIAL DUMP]
CPUS: 96
DATE: Sun Nov 10 11:46:56 CST 2024
UPTIME: 00:06:00
LOAD AVERAGE: 0.15, 0.28, 0.17
TASKS: 1763
NODENAME: storage-aqkp-002
RELEASE: 5.10.0-202.0.0.115.ile2312sp1.aarch64
VERSION: #1 SMP Mon Jun 17 01:51:52 UTC 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 704 GB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000200"
PID: 7773
COMMAND: "kworker/57:2"
TASK: ffff49cc44d69340 [THREAD_INFO: ffff49cc44d69340]
CPU: 57
STATE: TASK_RUNNING (PANIC)

crash> mod -s bcache /usr/lib/debug/lib/modules/5.10.0-202.0.0.115.ile2312sp1.aarch64/kernel/drivers/md/bcache/bcache.ko-5.10.0-202.0.0.115.ile2312sp1.aarch64.debug
MODULE NAME BASE SIZE OBJECT FILE
ffffbe501221b040 bcache ffffbe50121e2000 319488 /usr/lib/debug/lib/modules/5.10.0-202.0.0.115.ile2312sp1.aarch64/kernel/drivers/md/bcache/bcache.ko-5.10.0-202.0.0.115.ile2312sp1.aarch64.debug
crash>
crash> bt
PID: 7773 TASK: ffff49cc44d69340 CPU: 57 COMMAND: "kworker/57:2"
 #0 [ffff800046373800] machine_kexec at ffffbe5039eb54a8
 #1 [ffff8000463739b0] __crash_kexec at ffffbe503a052824
 #2 [ffff8000463739e0] crash_kexec at ffffbe503a0529cc
 #3 [ffff800046373a60] die at ffffbe5039e9445c
 #4 [ffff800046373ac0] die_kernel_fault at ffffbe5039ec698c
 #5 [ffff800046373af0] __do_kernel_fault at ffffbe5039ec6a38
 #6 [ffff800046373b20] do_page_fault at ffffbe503ac76ba4
 #7 [ffff800046373b70] do_translation_fault at ffffbe503ac76ebc
 #8 [ffff800046373b90] do_mem_abort at ffffbe5039ec68ac
 #9 [ffff800046373bc0] el1_abort at ffffbe503ac669bc
#10 [ffff800046373bf0] el1_sync_handler at ffffbe503ac671d4
#11 [ffff800046373d30] el1_sync at ffffbe5039e82230
#12 [ffff800046373d50] cache_set_flush at ffffbe50121fa4c4 [bcache]
#13 [ffff800046373da0] process_one_work at ffffbe5039f5af68
#14 [ffff800046373e00] worker_thread at ffffbe5039f5b3c4
#15 [ffff800046373e50] kthread at ffffbe5039f634b8
crash> dis cache_set_flush+0x94
0xffffbe50121fa4c8 <cache_set_flush+148>: str x23, [x20, #512]
crash> dis -s cache_set_flush+0x94
FILE: ./include/linux/list.h
LINE: 71

  66 {
  67     if (!__list_add_valid(new, prev, next))
  68         return;
  69
  70     next->prev = new;
* 71     new->next = next;
  72     new->prev = prev;
  73     WRITE_ONCE(prev->next, new);
  74 }
crash>
For crash analysis, installing the kernel-debuginfo package is not enough; the module debug information must also be loaded.
# load the bcache debug information
mod -s bcache /usr/lib/debug/lib/modules/5.10.0-202.0.0.115.ile2312sp1.aarch64/kernel/drivers/md/bcache/bcache.ko-5.10.0-202.0.0.115.ile2312sp1.aarch64.debug
Take the address of the faulting instruction from vmcore-dmesg.txt: on ARM this is the content of the PC register, on x86 the RIP register. For this crash it is pc : cache_set_flush+0x94/0x190, and dis -s cache_set_flush+0x94 shows the source line where the fault occurred.
Then combine the assembly with the register contents in vmcore-dmesg.txt to analyze the problem.
crash> dis cache_set_flush
0xffffbe50121fa434 <cache_set_flush>: mov x9, x30
0xffffbe50121fa438 <cache_set_flush+4>: nop
0xffffbe50121fa43c <cache_set_flush+8>: paciasp
0xffffbe50121fa440 <cache_set_flush+12>: stp x29, x30, [sp, #-80]!
0xffffbe50121fa444 <cache_set_flush+16>: mov x29, sp
0xffffbe50121fa448 <cache_set_flush+20>: stp x21, x22, [sp, #32]
0xffffbe50121fa44c <cache_set_flush+24>: mov x21, x0
0xffffbe50121fa450 <cache_set_flush+28>: sub x22, x0, #0x4a8
0xffffbe50121fa454 <cache_set_flush+32>: stp x19, x20, [sp, #16]
0xffffbe50121fa458 <cache_set_flush+36>: add x0, x22, #0x128
0xffffbe50121fa45c <cache_set_flush+40>: stp x23, x24, [sp, #48]
0xffffbe50121fa460 <cache_set_flush+44>: ldur x24, [x21, #-56]
0xffffbe50121fa464 <cache_set_flush+48>: bl 0xffffbe50121f8e88 <bch_cache_accounting_destroy>
0xffffbe50121fa468 <cache_set_flush+52>: add x0, x22, #0xc0
0xffffbe50121fa46c <cache_set_flush+56>: bl 0xffffbe501220b890 <bcache_device_free+2504>
0xffffbe50121fa470 <cache_set_flush+60>: add x0, x22, #0x60
0xffffbe50121fa474 <cache_set_flush+64>: bl 0xffffbe501220b6e0 <bcache_device_free+2072>
0xffffbe50121fa478 <cache_set_flush+68>: ldr x0, [x22, #2256]
0xffffbe50121fa47c <cache_set_flush+72>: cbz x0, 0xffffbe50121fa48c <cache_set_flush+88>
0xffffbe50121fa480 <cache_set_flush+76>: cmn x0, #0x1, lsl #12
0xffffbe50121fa484 <cache_set_flush+80>: b.hi 0xffffbe50121fa48c <cache_set_flush+88> // b.pmore
0xffffbe50121fa488 <cache_set_flush+84>: bl 0xffffbe501220b56c <bcache_device_free+1700>
0xffffbe50121fa48c <cache_set_flush+88>: add x0, x22, #0x8, lsl #12
0xffffbe50121fa490 <cache_set_flush+92>: ldr x20, [x0, #17656]
0xffffbe50121fa494 <cache_set_flush+96>: cmn x20, #0x1, lsl #12
0xffffbe50121fa498 <cache_set_flush+100>: b.hi 0xffffbe50121fa4d8 <cache_set_flush+164> // b.pmore
0xffffbe50121fa49c <cache_set_flush+104>: str x25, [sp, #64]
0xffffbe50121fa4a0 <cache_set_flush+108>: add x19, x20, #0x200
0xffffbe50121fa4a4 <cache_set_flush+112>: add x25, x21, #0x360
0xffffbe50121fa4a8 <cache_set_flush+116>: mov x0, x19
0xffffbe50121fa4ac <cache_set_flush+120>: ldr x23, [x21, #864]
0xffffbe50121fa4b0 <cache_set_flush+124>: mov x1, x25
0xffffbe50121fa4b4 <cache_set_flush+128>: mov x2, x23
0xffffbe50121fa4b8 <cache_set_flush+132>: bl 0xffffbe501220b440 <bcache_device_free+1400>
0xffffbe50121fa4bc <cache_set_flush+136>: tst w0, #0xff
0xffffbe50121fa4c0 <cache_set_flush+140>: b.eq 0xffffbe50121fa5a0 <cache_set_flush+364> // b.none
0xffffbe50121fa4c4 <cache_set_flush+144>: str x19, [x23, #8]
0xffffbe50121fa4c8 <cache_set_flush+148>: str x23, [x20, #512]
0xffffbe50121fa4cc <cache_set_flush+152>: str x25, [x20, #520]
0xffffbe50121fa4d0 <cache_set_flush+156>: str x19, [x21, #864]
0xffffbe50121fa4d4 <cache_set_flush+160>: ldr x25, [sp, #64]
0xffffbe50121fa4d8 <cache_set_flush+164>: ldur x0, [x21, #-72]
0xffffbe50121fa4dc <cache_set_flush+168>: tst w0, #0x8
0xffffbe50121fa4e0 <cache_set_flush+172>: b.ne 0xffffbe50121fa4f8 <cache_set_flush+196> // b.any
0xffffbe50121fa4e4 <cache_set_flush+176>: ldr x0, [x22, #2056]
0xffffbe50121fa4e8 <cache_set_flush+180>: add x23, x21, #0x360
0xffffbe50121fa4ec <cache_set_flush+184>: sub x19, x0, #0x200
0xffffbe50121fa4f0 <cache_set_flush+188>: cmp x23, x0
0xffffbe50121fa4f4 <cache_set_flush+192>: b.ne 0xffffbe50121fa584 <cache_set_flush+336> // b.any
0xffffbe50121fa4f8 <cache_set_flush+196>: ldr x0, [x24, #2504]
0xffffbe50121fa4fc <cache_set_flush+200>: cbz x0, 0xffffbe50121fa504 <cache_set_flush+208>
0xffffbe50121fa500 <cache_set_flush+204>: bl 0xffffbe501220b56c <bcache_device_free+1700>
0xffffbe50121fa504 <cache_set_flush+208>: add x19, x22, #0x8, lsl #12
0xffffbe50121fa508 <cache_set_flush+212>: ldr x0, [x19, #18568]
0xffffbe50121fa50c <cache_set_flush+216>: cbz x0, 0xffffbe50121fa52c <cache_set_flush+248>
0xffffbe50121fa510 <cache_set_flush+220>: mov x0, #0xc710 // #50960
0xffffbe50121fa514 <cache_set_flush+224>: add x22, x22, x0
0xffffbe50121fa518 <cache_set_flush+228>: mov x0, x22
0xffffbe50121fa51c <cache_set_flush+232>: bl 0xffffbe501220b710 <bcache_device_free+2120>
0xffffbe50121fa520 <cache_set_flush+236>: ldr x1, [x19, #18216]
0xffffbe50121fa524 <cache_set_flush+240>: mov x0, x22
0xffffbe50121fa528 <cache_set_flush+244>: blr x1
0xffffbe50121fa52c <cache_set_flush+248>: str xzr, [x21]
0xffffbe50121fa530 <cache_set_flush+252>: str xzr, [x21, #24]
0xffffbe50121fa534 <cache_set_flush+256>: dmb ish
0xffffbe50121fa538 <cache_set_flush+260>: mov w1, #0x1 // #1
0xffffbe50121fa53c <cache_set_flush+264>: mov x0, x21
0xffffbe50121fa540 <cache_set_flush+268>: movk w1, #0x4000, lsl #16
0xffffbe50121fa544 <cache_set_flush+272>: bl 0xffffbe50121ef064 <closure_sub>
0xffffbe50121fa548 <cache_set_flush+276>: ldp x19, x20, [sp, #16]
0xffffbe50121fa54c <cache_set_flush+280>: ldp x21, x22, [sp, #32]
0xffffbe50121fa550 <cache_set_flush+284>: ldp x23, x24, [sp, #48]
0xffffbe50121fa554 <cache_set_flush+288>: ldp x29, x30, [sp], #80
0xffffbe50121fa558 <cache_set_flush+292>: autiasp
0xffffbe50121fa55c <cache_set_flush+296>: ret
0xffffbe50121fa560 <cache_set_flush+300>: mov x0, x19
0xffffbe50121fa564 <cache_set_flush+304>: mov x1, #0x0 // #0
0xffffbe50121fa568 <cache_set_flush+308>: bl 0xffffbe50121e9714 <__bch_btree_node_write>
0xffffbe50121fa56c <cache_set_flush+312>: mov x0, x20
0xffffbe50121fa570 <cache_set_flush+316>: bl 0xffffbe501220b704 <bcache_device_free+2108>
0xffffbe50121fa574 <cache_set_flush+320>: ldr x1, [x19, #512]
0xffffbe50121fa578 <cache_set_flush+324>: sub x19, x1, #0x200
0xffffbe50121fa57c <cache_set_flush+328>: cmp x23, x1
0xffffbe50121fa580 <cache_set_flush+332>: b.eq 0xffffbe50121fa4f8 <cache_set_flush+196> // b.none
0xffffbe50121fa584 <cache_set_flush+336>: add x20, x19, #0x90
0xffffbe50121fa588 <cache_set_flush+340>: mov x0, x20
0xffffbe50121fa58c <cache_set_flush+344>: bl 0xffffbe501220b4e8 <bcache_device_free+1568>
0xffffbe50121fa590 <cache_set_flush+348>: ldr x0, [x19, #176]
0xffffbe50121fa594 <cache_set_flush+352>: tst w0, #0x2
0xffffbe50121fa598 <cache_set_flush+356>: b.eq 0xffffbe50121fa56c <cache_set_flush+312> // b.none
0xffffbe50121fa59c <cache_set_flush+360>: b 0xffffbe50121fa560 <cache_set_flush+300>
0xffffbe50121fa5a0 <cache_set_flush+364>: ldr x25, [sp, #64]
0xffffbe50121fa5a4 <cache_set_flush+368>: b 0xffffbe50121fa4d8 <cache_set_flush+164>
0xffffbe50121fa5a8 <cache_set_flush+372>: nop
0xffffbe50121fa5ac <cache_set_flush+376>: nop
0xffffbe50121fa5b0 <cache_set_flush+380>: ldrsb w4, [x5, #2724]
0xffffbe50121fa5b4 <cache_set_flush+384>: .inst 0xffffbe50 ; undefined
0xffffbe50121fa5b8 <cache_set_flush+388>: nop
0xffffbe50121fa5bc <cache_set_flush+392>: ldr x16, 0xffffbe50121fa5b0 <cache_set_flush+380>
0xffffbe50121fa5c0 <cache_set_flush+396>: br x16
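crash>
pc : cache_set_flush+0x94/0x190 means the program crashed while executing the instruction at offset 0x94 of cache_set_flush. The disassembly shows that this instruction is str x23, [x20, #512], which triggered the NULL pointer access.
In the log, pc : cache_set_flush+0x94/0x190 gives the value of the program counter (pc) at the time of the crash, that is, where the current instruction sits inside cache_set_flush. It breaks down as follows:
cache_set_flush+0x94/0x190
cache_set_flush is the function name: the CPU was executing inside cache_set_flush. 0x94 is the offset of the program counter from the start of the function (148 in decimal). /0x190 is the total length of the function (400 in decimal); it serves as a reference for how far into the function the crash occurred.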
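The breakdown of the pc line can be automated. The following Python sketch parses a symbol+offset/size string as printed in the oops and converts the offset into an absolute address; parse_pc and fault_address are hypothetical helper names, and the function start address is the one shown by dis cache_set_flush.

```python
import re

# Matches e.g. "pc : cache_set_flush+0x94/0x190 [bcache]"
PC_RE = re.compile(r"pc\s*:\s*(\w+)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)")


def parse_pc(line):
    """Return (symbol, offset, size) from an oops 'pc :' line."""
    m = PC_RE.search(line)
    if m is None:
        raise ValueError("no pc entry found in: %r" % line)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def fault_address(symbol_start, offset):
    """Absolute address of the faulting instruction."""
    return symbol_start + offset


sym, off, size = parse_pc("[ 359.901919] pc : cache_set_flush+0x94/0x190 [bcache]")
print(sym, off, size, hex(fault_address(0xFFFFBE50121FA434, off)))
```

The resulting address, 0xffffbe50121fa4c8, is the one crash prints for cache_set_flush+148.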
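crash> dis -s cache_set_flush+0x94
FILE: ./include/linux/list.h
LINE: 71

  66 {
  67     if (!__list_add_valid(new, prev, next))
  68         return;
  69
  70     next->prev = new;
* 71     new->next = next;
  72     new->prev = prev;
  73     WRITE_ONCE(prev->next, new);
  74 }
The dis -s cache_set_flush+0x94 output above shows that the fault occurs during a linked-list operation.
The output of dis cache_set_flush gives the disassembly of the whole function. The sections below focus on the parts needed to find the cause of the crash and pin down the NULL pointer access.
Key parts of the assembly
Register contents
x0 carries the first argument. For cache_set_flush this is a pointer to the caching closure embedded in struct cache_set, and x0 is reused for other values as the function proceeds.
x20 is loaded by ldr x20, [x0, #17656]. At that point x0 is the cache_set address plus 0x8000, so the load reads the member at offset 0x8000 + 17656 = 0xc4f8 of struct cache_set into x20.
x19 holds an address derived from x20 (add x19, x20, #0x200) and is overwritten again later in the function.
Execution flow
Setting up the stack frame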
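stp x29, x30, [sp, #-80]!
mov x29, sp
These instructions save the caller's frame pointer (FP, x29) and the return address (LR, x30). The stack is extended downward by 80 bytes, the caller's FP and LR are stored at the new top of the stack, and x29 is then set to the new sp.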
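The frame size can be cross-checked against the dump. The oops reports x29 = ffff800046373d50 inside cache_set_flush, and bt lists the next frame (process_one_work) at ffff800046373da0. The Python sketch below reproduces that relationship; the slot table is read off the stp/str instructions of the function, and the helper names are hypothetical.

```python
FRAME_SIZE = 80  # from `stp x29, x30, [sp, #-80]!`

# Save slots relative to the new sp, taken from the stp/str instructions
# in the cache_set_flush disassembly.
SLOTS = {
    "x29": 0, "x30": 8,
    "x19": 16, "x20": 24,
    "x21": 32, "x22": 40,
    "x23": 48, "x24": 56,
    "x25": 64,
}


def frame_layout(sp_after_prologue):
    """Map each saved register to the stack address of its slot."""
    return {reg: sp_after_prologue + off for reg, off in SLOTS.items()}


def caller_sp(sp_after_prologue):
    """Stack pointer of the caller, i.e. sp before the prologue ran."""
    return sp_after_prologue + FRAME_SIZE


print(hex(caller_sp(0xFFFF800046373D50)))
```

Under these assumptions the caller's sp comes out as 0xffff800046373da0, which matches frame #13 in the bt output.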
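Argument passing and pointer arithmetic
The x0 : 0000000000000001 shown in the log is no longer the first argument: by the time of the crash x0 had been overwritten by the return value of the call at cache_set_flush+132. The function saves its first argument in x21 on entry.
mov x21, x0
sub x22, x0, #0x4a8
Subtracting 0x4a8 from the first argument steps back over the offset of the caching member (1192) and yields the address of the enclosing cache_set structure, which is kept in x22.
x21 holds the first argument, the address of the caching member.
In the log x21 is ffff49cc48ca04a8 and x22 is ffff49cc48ca0000.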
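The sub x22, x0, #0x4a8 instruction is the compiled form of a container_of() step. As a quick check with the register values from the oops, here is a minimal Python sketch; container_of below is an illustrative re-implementation of the pointer arithmetic only, and the offset 1192 is the one crash reports for cache_set.caching.

```python
MASK64 = (1 << 64) - 1

CACHING_OFFSET = 1192  # offset of `caching` in struct cache_set (0x4a8)


def container_of(member_ptr, member_offset):
    """Address of the enclosing structure, given a pointer to one of its members."""
    return (member_ptr - member_offset) & MASK64


x21 = 0xFFFF49CC48CA04A8          # first argument: &c->caching
x22 = container_of(x21, CACHING_OFFSET)
print(hex(x22))
```

The result equals the x22 value in the register dump, which confirms that x22 is the cache_set pointer c.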
crash> struct -o cache_set
struct cache_set {
     [0] struct closure cl;
    [80] struct list_head list;
    [96] struct kobject kobj;
   [192] struct kobject internal;
   [288] struct dentry *debug;
   [296] struct cache_accounting accounting;
  [1120] unsigned long flags;
  [1128] atomic_t idle_counter;
  [1132] atomic_t at_max_writeback_rate;
  [1136] struct cache *cache;
  [1144] struct bcache_device **devices;
  [1152] unsigned int devices_max_used;
  [1156] atomic_t attached_dev_nr;
  [1160] struct list_head cached_devs;
  [1176] uint64_t cached_dev_sectors;
  [1184] atomic_long_t flash_dev_dirty_sectors;
  [1192] struct closure caching;
  [1272] struct closure sb_write;
  [1352] struct semaphore sb_write_mutex;
  [1376] mempool_t search;
  [1448] mempool_t bio_meta;
  [1520] struct bio_set bio_split;
  [1952] struct shrinker shrink;
  [2016] struct mutex bucket_lock;
  [2048] unsigned short bucket_bits;
  [2050] unsigned short block_bits;
  [2052] unsigned int btree_pages;
  [2056] struct list_head btree_cache;
  [2072] struct list_head btree_cache_freeable;
  [2088] struct list_head btree_cache_freed;
Offset 296 (0x128) is the accounting member. x0 is set to the cache_set address plus 296, which is the address of c->accounting.
Memory access and function calls
add x0, x22, #0x128
stp x23, x24, [sp, #48]
ldur x24, [x21, #-56]
bl 0xffffbe50121f8e88 <bch_cache_accounting_destroy>
crash> struct -o cache_set.cache
struct cache_set {
  [1136] struct cache *cache;
}
crash> struct -o cache_set.caching
struct cache_set {
  [1192] struct closure caching;
}
Corresponding C code: bch_cache_accounting_destroy(&c->accounting);
The ldur reads from x21 - 56. Since 1136 - 1192 = -56, this steps from the caching member back to the cache member and loads the cache pointer into x24.
bch_cache_accounting_destroy is called with x0 as its argument, which is the address of c->accounting.
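Because x21 points at the caching member rather than at the start of the structure, every x21-relative offset in the disassembly has to be shifted by 1192 before it can be looked up. The Python sketch below does that lookup for a few members; the table is a small excerpt of the struct -o cache_set output, and member_at is a hypothetical helper name.

```python
# Excerpt of `struct -o cache_set` (member offsets in bytes).
CACHE_SET_OFFSETS = {
    "flags": 1120,
    "cache": 1136,
    "caching": 1192,
    "btree_cache": 2056,
}
CACHING_OFFSET = CACHE_SET_OFFSETS["caching"]


def member_at(x21_relative):
    """Name the cache_set member addressed as [x21, #x21_relative]."""
    absolute = CACHING_OFFSET + x21_relative
    for name, offset in CACHE_SET_OFFSETS.items():
        if offset == absolute:
            return name
    raise KeyError("no member at offset %d in this excerpt" % absolute)


for rel in (-56, -72, 864):
    print(rel, "->", member_at(rel))
```

This identifies ldur x24, [x21, #-56] as a read of c->cache, ldur x0, [x21, #-72] as a read of c->flags, and the accesses at [x21, #864] (also add x25, x21, #0x360) as c->btree_cache.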
0xffffbe50121fa468 <cache_set_flush+52>: add x0, x22, #0xc0
0xffffbe50121fa46c <cache_set_flush+56>: bl 0xffffbe501220b890 <bcache_device_free+2504>
Offset 0xc0 corresponds to the internal member of struct cache_set.
crash> struct -o cache_set.internal
struct cache_set {
  [192] struct kobject internal;
}
The assembly above corresponds to the C code kobject_put(&c->internal);
0xffffbe50121fa470 <cache_set_flush+60>: add x0, x22, #0x60
0xffffbe50121fa474 <cache_set_flush+64>: bl 0xffffbe501220b6e0 <bcache_device_free+2072>
Offset 0x60 corresponds to the kobj member of struct cache_set.
crash> struct -o cache_set.kobj
struct cache_set {
  [96] struct kobject kobj;
}
The assembly above corresponds to the C code
kobject_del(&c->kobj);
The member offsets confirm this.
crash> struct -o cache_set.kobj -x
struct cache_set {
  [0x60] struct kobject kobj;
}
crash> struct -o cache_set.internal -x
struct cache_set {
  [0xc0] struct kobject internal;
}
crash>
0xc0 and 0x60 match the offsets used as arguments in the assembly.
NULL check and access to x20
0xffffbe50121fa478 <cache_set_flush+68>: ldr x0, [x22, #2256]
0xffffbe50121fa47c <cache_set_flush+72>: cbz x0, 0xffffbe50121fa48c <cache_set_flush+88>
0xffffbe50121fa480 <cache_set_flush+76>: cmn x0, #0x1, lsl #12
0xffffbe50121fa484 <cache_set_flush+80>: b.hi 0xffffbe50121fa48c <cache_set_flush+88> // b.pmore
0xffffbe50121fa488 <cache_set_flush+84>: bl 0xffffbe501220b56c <bcache_device_free+1700>
crash> struct -o cache_set
struct cache_set {
     [0] struct closure cl;
    [80] struct list_head list;
...
   [296] struct cache_accounting accounting;
...
  [2192] struct gc_stat gc_stats;
  [2240] size_t nbuckets;
  [2248] size_t avail_nbuckets;
  [2256] struct task_struct *gc_thread;
The ldr reads the member at offset 2256 from x22 into x0, which is the gc_thread pointer. cbz then checks whether it is NULL. If it is NULL, execution jumps to cache_set_flush+88; otherwise the cmn / b.hi pair checks for an error pointer, and only a valid pointer reaches the bl.
Corresponding C code:
if (!IS_ERR_OR_NULL(c->gc_thread))
    kthread_stop(c->gc_thread);
Handling x20
0xffffbe50121fa48c <cache_set_flush+88>: add x0, x22, #0x8, lsl #12
0xffffbe50121fa490 <cache_set_flush+92>: ldr x20, [x0, #17656]
0xffffbe50121fa494 <cache_set_flush+96>: cmn x20, #0x1, lsl #12
0xffffbe50121fa498 <cache_set_flush+100>: b.hi 0xffffbe50121fa4d8 <cache_set_flush+164> // b.pmore
0xffffbe50121fa49c <cache_set_flush+104>: str x25, [sp, #64]
0xffffbe50121fa4a0 <cache_set_flush+108>: add x19, x20, #0x200
0xffffbe50121fa4a4 <cache_set_flush+112>: add x25, x21, #0x360
0xffffbe50121fa4a8 <cache_set_flush+116>: mov x0, x19
0xffffbe50121fa4ac <cache_set_flush+120>: ldr x23, [x21, #864]
0xffffbe50121fa4b0 <cache_set_flush+124>: mov x1, x25
0xffffbe50121fa4b4 <cache_set_flush+128>: mov x2, x23
0xffffbe50121fa4b8 <cache_set_flush+132>: bl 0xffffbe501220b440 <bcache_device_free+1400>
add x0, x22, #0x8, lsl #12 adds 0x8 shifted left by 12 bits to x22 and stores the result in x0. x22 points to struct cache_set.
#0x8, lsl #12 shifts 0x8 left by 12 bits, giving 0x8000, so x0 = x22 + 0x8000. The following ldr x20, [x0, #17656] therefore reads from offset 0x8000 + 17656 = 0xc4f8 of struct cache_set, which is exactly the root member. x20 thus holds the value of c->root, the pointer to the btree root node.
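The add / ldr pair is how the compiler reaches a member that lies too far from the structure base for a single load. For a 64-bit ldr with an unsigned immediate, the offset is a 12-bit value scaled by 8, so the largest directly encodable offset is 4095 * 8 = 32760 bytes. The Python sketch below recombines the two immediates; effective_offset is a hypothetical helper name.

```python
LDR_X_MAX_IMM = 4095 * 8   # largest unsigned offset of a 64-bit `ldr`


def effective_offset(add_imm, shift, ldr_imm):
    """Structure offset reached by `add xN, base, #add_imm, lsl #shift`
    followed by `ldr xM, [xN, #ldr_imm]`."""
    return (add_imm << shift) + ldr_imm


root_offset = effective_offset(0x8, 12, 17656)
print(hex(root_offset), root_offset > LDR_X_MAX_IMM)
```

The combined offset is 0xc4f8 (50424), which exceeds 32760 and explains why the access to root is split into two instructions.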
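crash> struct -o cache_set.root -x
struct cache_set {
  [0xc4f8] struct btree *root;
}
Next, x20 is tested. If the test matches, execution jumps to cache_set_flush+164.
cmn x20, #0x1, lsl #12 (at 0xffffbe50121fa494). Effect: adds x20 and 0x1 << 12 (0x1000) and updates the condition flags; the sum itself is discarded.
Explanation: cmn only sets the condition flags so that a following conditional branch can use them. The carry and zero flags of x20 + 0x1000 are what matter here.
b.hi 0xffffbe50121fa4d8 (at 0xffffbe50121fa498). Effect: the branch is taken when the preceding cmn set the carry flag and cleared the zero flag, that is, when x20 read as an unsigned value is greater than 0xfffffffffffff000 (-4096).
Explanation: values in that range are error pointers (-4095 to -1), so the jump to cache_set_flush+164 skips the list operation for an error pointer. A NULL x20 does not take the branch and falls through to the list operation.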
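The flag logic of the cmn / b.hi pair is easy to get wrong, so it helps to emulate it. The Python sketch below models the carry and zero flags of a 64-bit addition and compares the result with an IS_ERR()-style range check; both helper functions are illustrative re-implementations, with MAX_ERRNO taken as 4095 as in the kernel headers.

```python
MASK64 = (1 << 64) - 1
MAX_ERRNO = 4095


def cmn_b_hi_taken(x, imm=0x1000):
    """True if `cmn x, #imm` followed by `b.hi` takes the branch.

    b.hi requires carry set and zero clear after the 64-bit addition x + imm.
    """
    total = x + imm
    carry = total > MASK64
    zero = (total & MASK64) == 0
    return carry and not zero


def is_err(ptr):
    """IS_ERR()-style check: pointer lies in the top MAX_ERRNO values."""
    return ptr >= ((-MAX_ERRNO) & MASK64)


for value in (0, 0xFFFF49CC48CA0000, (-5) & MASK64, (-4096) & MASK64):
    print(hex(value), cmn_b_hi_taken(value), is_err(value))
```

Both checks agree on every value, and both return False for 0, so a NULL pointer is not filtered out by this instruction pair.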
Corresponding C code:
if (!IS_ERR(c->root))
    list_add(&c->root->list, &c->btree_cache);
Unlike the gc_thread check, there is no cbz before the cmn here, so the compiled code tests only for error pointers and a NULL root is not filtered out.
static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    if (!__list_add_valid(new, prev, next))
        return;

    next->prev = new;
    new->next = next;
    new->prev = prev;
    WRITE_ONCE(prev->next, new);
}
This matches the source location reported by dis -s cache_set_flush+0x94.
The assembly also shows that x20 holds the root member of cache_set, and vmcore-dmesg.txt reports x20: 0000000000000000.
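Memory writes and further function calls
str x19, [x23, #8]
str x23, [x20, #512]
The first store is next->prev = new and the second is new->next = next, where new is &c->root->list. The second store writes to x20 + 512.
crash> struct -o btree.list
struct btree {
  [512] struct list_head list;
}
list sits at offset 512 of struct btree. With root being NULL, the write to offset 512 lands at address 0x200, which is the faulting address reported in the oops, and the kernel crashes.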
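The four stores of __list_add can be replayed with the register values from the oops to see which one faults. The Python sketch below lists the (address, value) pair of each store; list_add_stores is a hypothetical helper, the offsets 0 and 8 are the next and prev fields of struct list_head, and 512 is the offset of list in struct btree.

```python
LIST_OFFSET = 512        # offset of `list` in struct btree
NEXT_OFF, PREV_OFF = 0, 8  # fields of struct list_head


def list_add_stores(root, head, head_next):
    """(address, value) of each store done by list_add(&root->list, head)."""
    new = root + LIST_OFFSET
    return [
        (head_next + PREV_OFF, new),   # next->prev = new   str x19, [x23, #8]
        (new + NEXT_OFF, head_next),   # new->next = next   str x23, [x20, #512]
        (new + PREV_OFF, head),        # new->prev = prev   str x25, [x20, #520]
        (head + NEXT_OFF, new),        # prev->next = new   str x19, [x21, #864]
    ]


x20 = 0x0                   # c->root
x25 = 0xFFFF49CC48CA0808    # &c->btree_cache (x21 + 0x360)
x23 = 0xFFFF49CC48CA0808    # c->btree_cache.next, the list is empty
for addr, value in list_add_stores(x20, x25, x23):
    print(hex(addr), "<-", hex(value))
```

The first store targets a valid kernel address, while the second targets 0x200, which is both the value of x19 in the register dump and the virtual address in the "Unable to handle kernel NULL pointer dereference" message.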
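Conclusion
The disassembly of cache_set_flush shows that when ldr x20, [x0, #17656] executes, x0 is the cache_set pointer plus 0x8000, and x20 receives the content at offset 0xc4f8 of the structure, which is the root member of struct cache_set. x20 is NULL, and the list operation on it causes the kernel panic.
Cause of the crash: cache_set_flush dereferences a pointer (x20) that was never initialized. The initialization path of struct cache_set has to be checked to confirm whether the pointer is set; here run_cache_set failed before root was assigned. If the pointer can be NULL or invalid at this point, the code must be fixed so that this case is handled, which is what the NULL check on root mentioned in the symptoms does.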
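Summary
When the kernel produces a kdump, the analysis generally follows these steps:
1. Read the kernel log from the time of the crash and use the PC (or RIP on x86) register to locate the faulting function and address.
2. Confirm from the on-site operations which step triggered the crash.
3. Walk through the code and the call stack.
4. Analyze the dump file with crash to confirm the code that caused the problem.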
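For repeated investigations, the crash commands used in this article can be generated from the pc entry instead of being typed by hand. The Python sketch below only builds the command strings; crash_commands is a hypothetical helper, and the debug file path in the example is a placeholder rather than a real location.

```python
def crash_commands(module, debug_file, symbol, offset):
    """Build the crash commands used in this analysis for a given fault site."""
    where = "%s+%#x" % (symbol, offset)
    return [
        "mod -s %s %s" % (module, debug_file),  # load module debug info
        "bt",                                   # backtrace of the panic task
        "dis %s" % where,                       # faulting instruction
        "dis -s %s" % where,                    # matching source lines
        "dis %s" % symbol,                      # whole function
    ]


for line in crash_commands("bcache", "/path/to/bcache.ko.debug",
                           "cache_set_flush", 0x94):
    print(line)
```

The printed lines can be pasted into a crash session, or saved to a file and replayed if the crash build in use supports reading commands from an input file.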