VMware virtual machine reports "The CPU has been disabled by the guest operating system. Power off or reset the virtual machine."

alex Linux

In vCenter, the virtual machine is shown with its CPU disabled:

 
 

(Figure 1: screenshot)

 
 

The vmware.log on the ESXi host shows:

2019-09-30T02:54:20.165Z| vcpu-1| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-3| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-4| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-5| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-0| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-2| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-6| I125: APIC THERMLVT write: 0x10000

2019-09-30T02:54:20.165Z| vcpu-0| I125: Vix: [248776 vmxCommands.c:7739]: VMAutomation_HandleCLIHLTEvent. Do nothing.

2019-09-30T02:54:20.165Z| vcpu-0| I125: MsgHint: msg.monitorevent.halt

2019-09-30T02:54:20.165Z| vcpu-0| I125+ The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
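A minimal sketch of how such a log could be scanned for the guest halt event. The line layout is inferred only from the excerpt above, and the names `LINE_RE`, `parse_line` and `find_guest_halt` are hypothetical helpers, not part of any VMware tool.

```python
import re

# Assumed layout, inferred from the excerpt: "<timestamp>| <thread>| <level>: <message>".
# Continuation lines use "+" instead of ":" after the level.
LINE_RE = re.compile(
    r"^(?P<ts>[^|\s]+)\|\s*(?P<thread>[^|]+)\|\s*(?P<level>\w+)[:+]\s?(?P<msg>.*)$"
)

# Strings that mark the halt event in the excerpt above.
HALT_MARKERS = ("msg.monitorevent.halt", "CPU has been disabled")


def parse_line(line):
    """Return a dict with ts/thread/level/msg, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def find_guest_halt(lines):
    """Return the parsed entries that indicate the guest halted its CPUs."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and any(marker in entry["msg"] for marker in HALT_MARKERS):
            hits.append(entry)
    return hits


SAMPLE = [
    "2019-09-30T02:54:20.165Z| vcpu-1| I125: APIC THERMLVT write: 0x10000",
    "2019-09-30T02:54:20.165Z| vcpu-0| I125: MsgHint: msg.monitorevent.halt",
    "2019-09-30T02:54:20.165Z| vcpu-0| I125+ The CPU has been disabled by the guest "
    "operating system. Power off or reset the virtual machine.",
]

for hit in find_guest_halt(SAMPLE):
    print(hit["ts"], hit["thread"], hit["msg"])
```

The timestamp of the first hit gives a point in time to compare against the vmcore collected inside the guest.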

 
 

Resolution:

  • Update the kernel package to at least kernel-3.10.0-514.el7 or newer as the patches have been pulled into the newer kernel versions.
  • The fixes for this issue have also been included in RHEL7.2 EUS kernel version 3.10.0-327.62.1.el7. The errata for this is (RHBA-2017:3256).
  • Lately, the issue with a similar symptom was also reported on the RHEL 7.5 kernel. However, it had a different root cause, which was investigated in the internal BZ 1636066 and is now described in our KCS 4094221.
  • As a workaround, try disabling Transparent HugePages until the patches can be applied.
  • If there are any third party, non-Red Hat shipped modules loaded, consider removing them. If after removing them, the server crashes again, please contact Red Hat.
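The first, second and fourth bullets can be turned into a quick check on a guest. This is only a sketch: it assumes a RHEL 7 style release string such as `3.10.0-327.el7.x86_64`, treats the 327 stream as the 7.2 EUS stream, and reads the standard sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`. The function names are hypothetical.

```python
import platform
import re

THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def build_numbers(release):
    """Extract the numeric build fields after '3.10.0-' as a tuple of ints."""
    m = re.match(r"3\.10\.0-(\d+(?:\.\d+)*)", release)
    if not m:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def has_fix(release):
    """True/False for 3.10.0 kernels, None if the release string is not a 3.10.0 kernel."""
    nums = build_numbers(release)
    if nums is None:
        return None
    if nums[0] == 327:
        # RHEL 7.2 EUS stream: fixed in 3.10.0-327.62.1.el7 according to the list above.
        return nums >= (327, 62, 1)
    # Otherwise the list above names kernel-3.10.0-514.el7 as the minimum.
    return nums >= (514,)


def active_thp_mode(text):
    """Return the bracketed value from a line such as 'always madvise [never]'."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


if __name__ == "__main__":
    release = platform.release()
    print("kernel:", release, "has fix:", has_fix(release))
    try:
        with open(THP_PATH) as fh:
            print("THP mode:", active_thp_mode(fh.read()))
    except OSError:
        print("THP setting not readable on this system")
```

A result of `has fix: False` together with a THP mode of `always` would match the situation described in this post.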

    The following analysis of the dump follows the steps in the official KB article.

     
     

     
     

    [root@test /]# crash /usr/lib/debug/usr/lib/modules/3.10.0-327.el7.x86_64/vmlinux /var/crash/127.0.0.1-2019-09-28-09\:41\:08/vmcore <--- open the vmcore with the crash utility

    crash> bt

    PID: 10620 TASK: ffff880232319700 CPU: 2 COMMAND: "dotenv-generato"

    #0 [ffff8801a070f610] machine_kexec at ffffffff81051beb

    #1 [ffff8801a070f670] crash_kexec at ffffffff810f2542

    #2 [ffff8801a070f740] oops_end at ffffffff8163e1a8

    #3 [ffff8801a070f768] no_context at ffffffff8162e2b8

    #4 [ffff8801a070f7b8] __bad_area_nosemaphore at ffffffff8162e34e

    #5 [ffff8801a070f800] bad_area_nosemaphore at ffffffff8162e4b8

    #6 [ffff8801a070f810] __do_page_fault at ffffffff81640fce

    #7 [ffff8801a070f868] do_page_fault at ffffffff81641113

    #8 [ffff8801a070f890] page_fault at ffffffff8163d408

    [exception RIP: down_read_trylock+9] <----- the kernel panicked here

    RIP: ffffffff810aa989 RSP: ffff8801a070f948 RFLAGS: 00010206

    RAX: 0000000000000000 RBX: ffff8800846a0b40 RCX: ffff8800846a0b40

    RDX: 0000000000000001 RSI: 0000000000000301 RDI: 000000000000002d

    RBP: ffff8801a070f948 R8: 0000000000000015 R9: ffff8800846a0b40

    R10: ffff88023ffd8000 R11: 0000000000000000 R12: ffff8800846a0b41

    R13: ffffea0005ee5480 R14: 000000000000002d R15: 0000000000000000

    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000

    #9 [ffff8801a070f950] page_lock_anon_vma_read at ffffffff811a2e65 <-------- last caller before the fault

    #10 [ffff8801a070f980] try_to_unmap_anon at ffffffff811a3291

    #11 [ffff8801a070f9d0] try_to_unmap at ffffffff811a33dd

    #12 [ffff8801a070f9e8] migrate_pages at ffffffff811c7449

    #13 [ffff8801a070fa90] compact_zone at ffffffff8118f259

    #14 [ffff8801a070fae0] compact_zone_order at ffffffff8118f45c

    #15 [ffff8801a070fb80] try_to_compact_pages at ffffffff8118f811

    #16 [ffff8801a070fbe0] __alloc_pages_direct_compact at ffffffff816305c8

    #17 [ffff8801a070fc40] __alloc_pages_nodemask at ffffffff811734e8

    #18 [ffff8801a070fd78] alloc_pages_vma at ffffffff811b78ca

    #19 [ffff8801a070fde0] do_huge_pmd_anonymous_page at ffffffff811cc2d3

    #20 [ffff8801a070fe40] handle_mm_fault at ffffffff81196c78

    #21 [ffff8801a070fed0] __do_page_fault at ffffffff81640e22

    #22 [ffff8801a070ff28] do_page_fault at ffffffff81641113

    #23 [ffff8801a070ff50] page_fault at ffffffff8163d408

    RIP: 000000000040cac1 RSP: 000000c000042cf8 RFLAGS: 00010206

    RAX: 00007ffff7fbdd98 RBX: 0000000000000000 RCX: 000000c000400000

    RDX: 000000c000038700 RSI: 0000000000000008 RDI: 000000000065a020

    RBP: 000000c000042d88 R8: 0000000000000001 R9: 0000000000000001

    R10: 0000000000000200 R11: 0000000000000000 R12: 0000000000000040

    R13: 0000000000000001 R14: 0000000000000000 R15: 000000c0000603c0

    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

    crash>

     
     

    Inspect the location of the panic:

    crash> dis -rl down_read_trylock+0x9

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/kernel/rwsem.c: 32

    0xffffffff810aa980 <down_read_trylock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]

    0xffffffff810aa985 <down_read_trylock+5>: push %rbp

    0xffffffff810aa986 <down_read_trylock+6>: mov %rsp,%rbp

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/include/asm/rwsem.h: 83

    0xffffffff810aa989 <down_read_trylock+9>: mov (%rdi),%rax

    crash>

     
     

    (Figure 2: screenshot)

     
     

    The output above shows that the panic was caused by a corrupted rwsem address.

    RDI: 000000000000002d <----- this value does not look like a valid pointer

    crash> bt | awk '/exception RIP: down_read_trylock/,/RDI:/ {print}' | grep RDI

    RDX: 0000000000000001 RSI: 0000000000000301 RDI: 000000000000002d

    crash> eval 000000000000002d | grep binary

    binary: 0000000000000000000000000000000000000000000000000000000000101101

    crash>
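A rough way to see why 0x2d stands out: on x86_64 with 48-bit virtual addresses, kernel pointers sit in the upper canonical half, at or above 0xffff800000000000. The sketch below assumes that layout, and the helper name is hypothetical. It compares RDI with two other registers from the exception frame.

```python
# Lower bound of the upper canonical half, assuming 48-bit virtual addresses.
KERNEL_SPACE_START = 0xFFFF800000000000
MAX_U64 = 0xFFFFFFFFFFFFFFFF


def looks_like_kernel_pointer(value):
    """True if value falls in the kernel half of the x86_64 address space."""
    return KERNEL_SPACE_START <= value <= MAX_U64


# Register values copied from the exception frame in the backtrace.
REGISTERS = {
    "RDI": 0x000000000000002D,
    "RBX": 0xFFFF8800846A0B40,
    "R13": 0xFFFFEA0005EE5480,
}

for name, value in REGISTERS.items():
    verdict = "plausible" if looks_like_kernel_pointer(value) else "not a kernel pointer"
    print(f"{name}: {value:#018x} {verdict}")
```

Only RDI fails the check, and RDI is the address that `mov (%rdi),%rax` tried to read at `down_read_trylock+9`.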

     
     

    Find out where this value comes from. The panic occurred during the call made from page_lock_anon_vma_read:

    crash> dis -rl ffffffff811a2e65

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 446

    0xffffffff811a2e10 <page_lock_anon_vma_read>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]

    0xffffffff811a2e15 <page_lock_anon_vma_read+5>: push %rbp

    0xffffffff811a2e16 <page_lock_anon_vma_read+6>: mov %rsp,%rbp

    0xffffffff811a2e19 <page_lock_anon_vma_read+9>: push %r14

    0xffffffff811a2e1b <page_lock_anon_vma_read+11>: push %r13

    0xffffffff811a2e1d <page_lock_anon_vma_read+13>: mov %rdi,%r13

    0xffffffff811a2e20 <page_lock_anon_vma_read+16>: push %r12

    0xffffffff811a2e22 <page_lock_anon_vma_read+18>: push %rbx

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 452

    0xffffffff811a2e23 <page_lock_anon_vma_read+19>: mov 0x8(%rdi),%r12

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 453

    0xffffffff811a2e27 <page_lock_anon_vma_read+23>: mov %r12,%rax

    0xffffffff811a2e2a <page_lock_anon_vma_read+26>: and $0x3,%eax

    0xffffffff811a2e2d <page_lock_anon_vma_read+29>: cmp $0x1,%rax

    0xffffffff811a2e31 <page_lock_anon_vma_read+33>: je 0xffffffff811a2e48 <page_lock_anon_vma_read+56>

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 447

    0xffffffff811a2e33 <page_lock_anon_vma_read+35>: xor %ebx,%ebx

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 505

    0xffffffff811a2e35 <page_lock_anon_vma_read+37>: mov %rbx,%rax

    0xffffffff811a2e38 <page_lock_anon_vma_read+40>: pop %rbx

    0xffffffff811a2e39 <page_lock_anon_vma_read+41>: pop %r12

    0xffffffff811a2e3b <page_lock_anon_vma_read+43>: pop %r13

    0xffffffff811a2e3d <page_lock_anon_vma_read+45>: pop %r14

    0xffffffff811a2e3f <page_lock_anon_vma_read+47>: pop %rbp

    0xffffffff811a2e40 <page_lock_anon_vma_read+48>: retq

    0xffffffff811a2e41 <page_lock_anon_vma_read+49>: nopl 0x0(%rax)

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/include/asm/atomic.h: 26

    0xffffffff811a2e48 <page_lock_anon_vma_read+56>: mov 0x18(%rdi),%eax

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 455

    0xffffffff811a2e4b <page_lock_anon_vma_read+59>: test %eax,%eax

    0xffffffff811a2e4d <page_lock_anon_vma_read+61>: js 0xffffffff811a2e33 <page_lock_anon_vma_read+35>

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 459

    0xffffffff811a2e4f <page_lock_anon_vma_read+63>: mov -0x1(%r12),%r14

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 458

    0xffffffff811a2e54 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx

    /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/mm/rmap.c: 460

    0xffffffff811a2e59 <page_lock_anon_vma_read+73>: add $0x8,%r14

    0xffffffff811a2e5d <page_lock_anon_vma_read+77>: mov %r14,%rdi

    0xffffffff811a2e60 <page_lock_anon_vma_read+80>: callq 0xffffffff810aa980 <down_read_trylock>

    0xffffffff811a2e65 <page_lock_anon_vma_read+85>: test %eax,%eax

    crash>

     
     

    Tracing the code shows that the kernel was trying to take the read lock on the page's anon_vma.

    (Figure 3: screenshot)

     
     

    Look at how the anon_vma is reached:

    crash> struct -o page.mapping

    struct page {

    [8] struct address_space *mapping;

    }

    crash> struct -o anon_vma

    struct anon_vma {

    [0] struct anon_vma *root;

    [8] struct rw_semaphore rwsem;

    [40] atomic_t refcount;

    [48] struct rb_root rb_root;

    }

    SIZE: 56

    crash>

     
     

     
     

    Continue tracing:

    crash> dis -r ffffffff811a3291

    0xffffffff811a3270 <try_to_unmap_anon>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]

    0xffffffff811a3275 <try_to_unmap_anon+5>: push %rbp

    0xffffffff811a3276 <try_to_unmap_anon+6>: mov %rsp,%rbp

    0xffffffff811a3279 <try_to_unmap_anon+9>: push %r15

    0xffffffff811a327b <try_to_unmap_anon+11>: push %r14

    0xffffffff811a327d <try_to_unmap_anon+13>: mov %esi,%r14d

    0xffffffff811a3280 <try_to_unmap_anon+16>: push %r13

    0xffffffff811a3282 <try_to_unmap_anon+18>: push %r12

    0xffffffff811a3284 <try_to_unmap_anon+20>: push %rbx

    0xffffffff811a3285 <try_to_unmap_anon+21>: mov %rdi,%rbx

    0xffffffff811a3288 <try_to_unmap_anon+24>: sub $0x18,%rsp

    0xffffffff811a328c <try_to_unmap_anon+28>: callq 0xffffffff811a2e10 <page_lock_anon_vma_read>

    0xffffffff811a3291 <try_to_unmap_anon+33>: mov %rax,%rcx

    crash>

     
     

    crash> whatis page_lock_anon_vma_read

    struct anon_vma *page_lock_anon_vma_read(struct page *);

     
     

    crash> dis -r ffffffff811a2e65

    0xffffffff811a2e10 <page_lock_anon_vma_read>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]

    0xffffffff811a2e15 <page_lock_anon_vma_read+5>: push %rbp

    0xffffffff811a2e16 <page_lock_anon_vma_read+6>: mov %rsp,%rbp

    0xffffffff811a2e19 <page_lock_anon_vma_read+9>: push %r14

    0xffffffff811a2e1b <page_lock_anon_vma_read+11>: push %r13

    0xffffffff811a2e1d <page_lock_anon_vma_read+13>: mov %rdi,%r13

    0xffffffff811a2e20 <page_lock_anon_vma_read+16>: push %r12

    0xffffffff811a2e22 <page_lock_anon_vma_read+18>: push %rbx <--- page* is pushed onto the stack

    0xffffffff811a2e23 <page_lock_anon_vma_read+19>: mov 0x8(%rdi),%r12 <--- 452 anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);

    0xffffffff811a2e27 <page_lock_anon_vma_read+23>: mov %r12,%rax

    0xffffffff811a2e2a <page_lock_anon_vma_read+26>: and $0x3,%eax

    0xffffffff811a2e2d <page_lock_anon_vma_read+29>: cmp $0x1,%rax

    0xffffffff811a2e31 <page_lock_anon_vma_read+33>: je 0xffffffff811a2e48 <page_lock_anon_vma_read+56>

    0xffffffff811a2e33 <page_lock_anon_vma_read+35>: xor %ebx,%ebx <--- 453 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)

    0xffffffff811a2e35 <page_lock_anon_vma_read+37>: mov %rbx,%rax <--- 454 goto out;

    0xffffffff811a2e38 <page_lock_anon_vma_read+40>: pop %rbx

    0xffffffff811a2e39 <page_lock_anon_vma_read+41>: pop %r12

    0xffffffff811a2e3b <page_lock_anon_vma_read+43>: pop %r13

    0xffffffff811a2e3d <page_lock_anon_vma_read+45>: pop %r14

    0xffffffff811a2e3f <page_lock_anon_vma_read+47>: pop %rbp

    0xffffffff811a2e40 <page_lock_anon_vma_read+48>: retq

    0xffffffff811a2e41 <page_lock_anon_vma_read+49>: nopl 0x0(%rax)

    0xffffffff811a2e48 <page_lock_anon_vma_read+56>: mov 0x18(%rdi),%eax

    0xffffffff811a2e4b <page_lock_anon_vma_read+59>: test %eax,%eax

    0xffffffff811a2e4d <page_lock_anon_vma_read+61>: js 0xffffffff811a2e33 <page_lock_anon_vma_read+35>

    0xffffffff811a2e4f <page_lock_anon_vma_read+63>: mov -0x1(%r12),%r14 <---- 459 root_anon_vma = ACCESS_ONCE(anon_vma->root);

    0xffffffff811a2e54 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx <--- 458 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);

    0xffffffff811a2e59 <page_lock_anon_vma_read+73>: add $0x8,%r14 <--- &root_anon_vma->rwsem

    0xffffffff811a2e5d <page_lock_anon_vma_read+77>: mov %r14,%rdi <--- %r14 is rwsem

    0xffffffff811a2e60 <page_lock_anon_vma_read+80>: callq 0xffffffff810aa980 <down_read_trylock>

    0xffffffff811a2e65 <page_lock_anon_vma_read+85>: test %eax,%eax

    crash>

     
     

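    The page* was pushed onto the stack.

    %rbx holds the anon_vma pointer derived from page->mapping (the mapping value minus PAGE_MAPPING_ANON), while %r14 holds the address of the root anon_vma's rwsem (the value loaded from anon_vma->root plus 0x8). Both values are preserved up to and through the call to down_read_trylock.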
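The pointer arithmetic in the annotated disassembly can be replayed with the register values from the exception frame. This is a sketch only. The constants follow the `and $0x3` / `cmp $0x1` instructions and the `struct -o anon_vma` offsets shown earlier, and the function names are hypothetical.

```python
PAGE_MAPPING_ANON = 0x1    # low bit set on page->mapping for anonymous pages
PAGE_MAPPING_FLAGS = 0x3   # mask applied by `and $0x3,%eax`
RWSEM_OFFSET = 0x8         # offset of rwsem inside struct anon_vma


def anon_vma_from_mapping(mapping):
    """Return the anon_vma address if mapping is tagged anonymous, else None."""
    if (mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON:
        return None
    return mapping - PAGE_MAPPING_ANON


def root_from_rwsem(rwsem_addr):
    """Undo `add $0x8,%r14` to get the value that was loaded from anon_vma->root."""
    return rwsem_addr - RWSEM_OFFSET


r12 = 0xFFFF8800846A0B41   # R12 at the exception: the raw page->mapping value
r14 = 0x2D                 # R14 (copied to RDI) at the exception

anon_vma = anon_vma_from_mapping(r12)
print(f"anon_vma (should equal RBX): {anon_vma:#x}")
print(f"value read from anon_vma->root: {root_from_rwsem(r14):#x}")
```

The first result equals RBX (ffff8800846a0b40), which supports the reading that the mapping was decoded correctly and that the bad value came from the memory the anon_vma pointer refers to.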

     
     

     
     

    crash> bt -f | awk '/page_lock_anon_vma_read/,/page_referenced/ {print}'

    #9 [ffff8801a070f950] page_lock_anon_vma_read at ffffffff811a2e65

    ffff8801a070f958: ffffea0005ee5480 ffffea0005ee5440

    >> %rbx << %r12

    ffff8801a070f968: ffffea00078b2c40 0000000000000301

    %r13 %r14

    ffff8801a070f978: ffff8801a070f9c8 ffffffff811a3291

    %rbp %rip

    #10 [ffff8801a070f980] try_to_unmap_anon at ffffffff811a3291

    ffff8801a070f988: ffffea0005ee5480 ffff8801a070fa50

    ffff8801a070f998: 0000000000000001 ffffea0005ee5480

    ffff8801a070f9a8: ffffea0005ee5440 ffffea00078b2c40

    ffff8801a070f9b8: 0000000000000000 0000000000000000

    ffff8801a070f9c8: ffff8801a070f9e0 ffffffff811a33dd

    #11 [ffff8801a070f9d0] try_to_unmap at ffffffff811a33dd

    ffff8801a070f9d8: ffffea0005ee5480 ffff8801a070fa88

    ffff8801a070f9e8: ffffffff811c7449

    #12 [ffff8801a070f9e8] migrate_pages at ffffffff811c7449

    ffff8801a070f9f0: ffff8800846a0b40 ffff880232319700

    ffff8801a070fa00: ffff880100000001 00000000a070fb00

    ffff8801a070fa10: 0000000000000000 0000000000000000

    ffff8801a070fa20: 000000000000000f ffff8801a070faf0

    ffff8801a070fa30: ffffffff8118e260 ffffea0005ee54a0

    ffff8801a070fa40: ffff8801a070fb00 0000000000000000

    ffff8801a070fa50: ffff880237052000 00000000641e072f

    ffff8801a070fa60: ffff88023ffd8000 ffff8801a070fb00

    ffff8801a070fa70: 0000000000140000 ffff8801a070faf0

    ffff8801a070fa80: ffff880232319700 ffff8801a070fad8

    ffff8801a070fa90: ffffffff8118f259

    #13 [ffff8801a070fa90] compact_zone at ffffffff8118f259

    ffff8801a070fa98: 00000000ab51b7e8 ffff8801a070fb00

    ffff8801a070faa8: 0000000000000020 ffff8801a070faf0

    ffff8801a070fab8: ffff8801a070fd17 ffff88023ffd8000

    ffff8801a070fac8: ffff88023ffd9008 0000000000000000

    ffff8801a070fad8: ffff8801a070fb78 ffffffff8118f45c

    #14 [ffff8801a070fae0] compact_zone_order at ffffffff8118f45c
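The register labels under frame #9 follow from the push order at the top of `page_lock_anon_vma_read`. A small sketch of that mapping, using the six stack words printed for the frame; the helper name is hypothetical.

```python
# Push order in the prologue of page_lock_anon_vma_read, from the disassembly.
PUSH_ORDER = ["rbp", "r14", "r13", "r12", "rbx"]


def saved_registers(frame_words):
    """Label stack words given in ascending address order.

    The last register pushed sits at the lowest address, so the labels are the
    push order reversed. The return address was pushed by the caller's callq
    before any of them, so it comes last.
    """
    names = list(reversed(PUSH_ORDER)) + ["return address"]
    return dict(zip(names, frame_words))


# Words at ffff8801a070f958 .. ffff8801a070f980 from `bt -f`.
FRAME_9 = [
    0xFFFFEA0005EE5480,
    0xFFFFEA0005EE5440,
    0xFFFFEA00078B2C40,
    0x0000000000000301,
    0xFFFF8801A070F9C8,
    0xFFFFFFFF811A3291,
]

for name, value in saved_registers(FRAME_9).items():
    print(f"{name:>14}: {value:#018x}")
```

These are the caller's register values. `try_to_unmap_anon` copied its first argument into %rbx before the call, which is why the saved %rbx slot contains the page pointer ffffea0005ee5480.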

     
     

    crash> page.mapping ffffea0005ee5480

    mapping = 0xffff8800846a0b41

    crash> bt | awk '/exception RIP: down_read_trylock/,/ORIG_RAX:/ {print}' | grep -e R14 -e RBX

    RAX: 0000000000000000 RBX: ffff8800846a0b40 RCX: ffff8800846a0b40

    R13: ffffea0005ee5480 R14: 000000000000002d R15: 0000000000000000

    crash>

     
     

    From the above we get the following addresses and values:

    page*: ffffea0005ee5480

    page->mapping: 0xffff8800846a0b41

    &anon_vma->root->rwsem (R14, passed in RDI): 000000000000002d, so anon_vma->root itself is 0x2d - 0x8 = 0x25

    &anon_vma->root: ffff8800846a0b40

     
     

    Verify each value:

    crash> kmem ffffea0005ee5480

    PAGE PHYSICAL MAPPING INDEX CNT FLAGS

    ffffea0005ee5480 17b952000 ffff8800846a0b41 7ffff33a6 2 2fffff00080009 locked,uptodate,swapbacked

    crash> kmem 000000000000002d

    PAGE PHYSICAL MAPPING INDEX CNT FLAGS

    ffffea0000000000 0 0 0 0 0

    crash> kmem 0xffff8800846a0b41

    CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE

    ffff880237058700 vm_area_struct 216 51394 51408 2856 4k

    SLAB MEMORY NODE TOTAL ALLOCATED FREE

    ffffea000211a800 ffff8800846a0000 0 18 18 0

    FREE / [ALLOCATED]

    [ffff8800846a0af8]

     
     

    PAGE PHYSICAL MAPPING INDEX CNT FLAGS

    ffffea000211a800 846a0000 0 0 1 1fffff00000080 slab

    crash> kmem ffff8800846a0b40

    CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE

    ffff880237058700 vm_area_struct 216 51394 51408 2856 4k

    SLAB MEMORY NODE TOTAL ALLOCATED FREE

    ffffea000211a800 ffff8800846a0000 0 18 18 0

    FREE / [ALLOCATED]

    [ffff8800846a0af8]

     
     

    PAGE PHYSICAL MAPPING INDEX CNT FLAGS

    ffffea000211a800 846a0000 0 0 1 1fffff00000080 slab

    crash>
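The PAGE, PHYSICAL and MEMORY columns in the kmem output can be cross-checked by hand. The sketch below assumes the usual x86_64 layout for this kernel generation: vmemmap at 0xffffea0000000000 (the PAGE value kmem printed for physical address 0), a 64-byte `struct page`, 4 KiB pages, and a direct map starting at 0xffff880000000000. These constants are assumptions, not values read from the dump.

```python
VMEMMAP_BASE = 0xFFFFEA0000000000   # assumed start of the struct page array
STRUCT_PAGE_SIZE = 64               # assumed sizeof(struct page)
PAGE_SHIFT = 12                     # 4 KiB pages
PAGE_OFFSET = 0xFFFF880000000000    # assumed start of the kernel direct map


def page_to_phys(page_addr):
    """Translate a struct page address into the physical address it describes."""
    pfn = (page_addr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE
    return pfn << PAGE_SHIFT


def phys_to_virt(phys):
    """Translate a physical address into its direct-map virtual address."""
    return PAGE_OFFSET + phys


anon_page = 0xFFFFEA0005EE5480   # the page being unmapped
slab_page = 0xFFFFEA000211A800   # the slab page reported by kmem

print(f"{anon_page:#x} -> phys {page_to_phys(anon_page):#x}")
print(f"{slab_page:#x} -> phys {page_to_phys(slab_page):#x}")
print(f"slab memory at {phys_to_virt(page_to_phys(slab_page)):#x}")
```

Under these assumptions the results agree with the PHYSICAL values 17b952000 and 846a0000 and with the slab MEMORY address ffff8800846a0000 in the output above.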

     
     

    crash> anon_vma 000000000000002d

    struct: invalid kernel virtual address: 000000000000002d

    crash>
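One further observation can be drawn from the earlier kmem output, offered here as a reading of the numbers rather than as a conclusion from the KB. The address taken as the anon_vma (ffff8800846a0b40) falls inside the allocated object that kmem lists at ffff8800846a0af8 in the `vm_area_struct` cache, whose OBJSIZE is 216. The helper below is hypothetical and only does the subtraction.

```python
OBJ_START = 0xFFFF8800846A0AF8   # the [ALLOCATED] object printed by kmem
OBJ_SIZE = 216                   # OBJSIZE of the vm_area_struct cache
ANON_VMA = 0xFFFF8800846A0B40    # page->mapping with the low flag bit cleared


def offset_within(obj_start, obj_size, addr):
    """Return addr's byte offset inside the object, or None if it lies outside."""
    offset = addr - obj_start
    return offset if 0 <= offset < obj_size else None


offset = offset_within(OBJ_START, OBJ_SIZE, ANON_VMA)
print(f"offset inside the slab object: {offset} bytes ({offset:#x})")
```

An offset of 72 bytes into a live object of another type is consistent with page->mapping referring to memory that no longer holds an anon_vma, which would explain the meaningless value read from its root field.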

     
     

     
     

    The analysis above follows the official KB article; refer to it for details:

    https://access.redhat.com/solutions/2779851


  • Published by alex on 2019-10-12 15:41:55
  • When reposting, please keep the link to this article: https://www.qnjslm.com/ITHelp/1030.html