Checking kernel version
[root@grac41 Desktop]# cat /etc/oracle-release
Oracle Linux Server release 6.5
[root@grac41 Desktop]# uname -r
2.6.39-400.214.4.el6uek.x86_64
[root@grac41 Desktop]# rpm -qa | grep `uname -r`
kernel-uek-devel-2.6.39-400.214.4.el6uek.x86_64
kernel-uek-2.6.39-400.214.4.el6uek.x86_64
Download and Install required packages
Install the crash package
[root@grac41 crash]# yum install crash
Loaded plugins: refresh-packagekit, security
adobe-linux-x86_64 | 951 B 00:00
public_ol6_UEKR3_latest | 1.2 kB 00:00
public_ol6_latest | 1.4 kB 00:00
public_ol6_latest/primary | 37 MB 00:34
public_ol6_latest 25211/25211
Setting up Install Process
Package crash-6.1.0-5.0.1.el6.x86_64 already installed and latest version
Nothing to do

Download and install the debuginfo and debuginfo-common packages
# export DLP="https://oss.oracle.com/ol6/debuginfo"
# wget ${DLP}/kernel-uek-debuginfo-`uname -r`.rpm
# wget ${DLP}/kernel-uek-debuginfo-common-`uname -r`.rpm

[root@grac41 Desktop]# ls -l /KITS/RPMS
total 312604
-rw-r--r--. 1 root root 208 May 7 10:45 INFO
-rw-r--r--. 1 root root 278827216 Mar 26 04:33 kernel-uek-debuginfo-2.6.39-400.214.4.el6uek.x86_64.rpm
-rw-r--r--. 1 root root 41264124 Mar 26 04:32 kernel-uek-debuginfo-common-2.6.39-400.214.4.el6uek.x86_64.rpm

# rpm -Uhv kernel-uek-debuginfo-2.6.39-400.214.4.el6uek.x86_64.rpm kernel-uek-debuginfo-common-2.6.39-400.214.4.el6uek.x86_64.rpm
Preparing...                ########################################### [100%]
   1:kernel-uek-debuginfo-co########################################### [ 50%]
   2:kernel-uek-debuginfo   ########################################### [100%]
[root@grac41 RPMS]# ls /usr/lib/debug/lib/modules/
2.6.39-400.214.4.el6uek.x86_64
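The download and install steps above can be combined into a small helper. This is a minimal sketch that assumes the Oracle debuginfo location held in DLP and the package naming shown above. The function names debuginfo_urls and install_debuginfo are hypothetical, and the final check assumes the vmlinux layout under /usr/lib/debug/lib/modules used later in this post.

```shell
#!/bin/bash
# Sketch: fetch and install the UEK debuginfo packages for one kernel release.
# DLP is the debuginfo location used above; the function names are hypothetical.
DLP=${DLP:-https://oss.oracle.com/ol6/debuginfo}

# Print the two debuginfo RPM URLs for kernel release $1 (default: running kernel).
debuginfo_urls() {
    local rel=${1:-$(uname -r)}
    printf '%s/kernel-uek-debuginfo-%s.rpm\n' "$DLP" "$rel"
    printf '%s/kernel-uek-debuginfo-common-%s.rpm\n' "$DLP" "$rel"
}

# Download both RPMs into the current directory (skipping files that already
# exist), install them in one rpm transaction, then confirm vmlinux is present.
install_debuginfo() {
    local rel=${1:-$(uname -r)} url
    for url in $(debuginfo_urls "$rel"); do
        wget -nc "$url" || return 1
    done
    rpm -Uhv "kernel-uek-debuginfo-$rel.rpm" \
             "kernel-uek-debuginfo-common-$rel.rpm" || return 1
    test -f "/usr/lib/debug/lib/modules/$rel/vmlinux"
}
```

Installing both packages in a single rpm call mirrors the manual step above, where rpm resolves the dependency between debuginfo and debuginfo-common within one transaction.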
Prepare kdump by adding crashkernel=128m to /etc/grub.conf
Add crashkernel=128m to the kernel line in /etc/grub.conf:
title Oracle Linux Server (2.6.39-400.214.4.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.39-400.214.4.el6uek.x86_64 ro root=/dev/mapper/vg_oel64-lv_root rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=de rd_LVM_LV=vg_oel64/lv_swap rd_LVM_LV=vg_oel64/lv_root rhgb quiet numa=off transparent_hugepage=never crashkernel=128m
--> reboot

After the reboot, confirm that kdump is active:
[root@grac41 Desktop]# service kdump status
Kdump is operational
[root@grac41 Desktop]# cat /sys/kernel/kexec_crash_loaded
1
[root@grac41 Desktop]# cat /proc/iomem | grep Crash
2e000000-35ffffff : Crash kernel
--> The reserved range 2e000000-35ffffff is 0x8000000 bytes, i.e. exactly the 128 MB requested with crashkernel=128m.
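The manual checks above can be scripted. This is a minimal sketch, and the function names crash_reserved_mib and kdump_ready are hypothetical. It makes three checks: crashkernel= is on the running kernel's command line, /sys/kernel/kexec_crash_loaded reads 1, and the size of the "Crash kernel" range in /proc/iomem matches the reservation. It assumes it is run as root, because newer kernels hide /proc/iomem addresses from other users.

```shell
#!/bin/bash
# Sketch: script the post-reboot kdump checks shown above (run as root).
# Function names are hypothetical.

# Read /proc/iomem-style text on stdin and print the "Crash kernel" size in MiB.
crash_reserved_mib() {
    local line range start end
    line=$(grep -m1 'Crash kernel') || return 1
    range=${line%% :*}                  # keep "start-end", drop " : Crash kernel"
    range=${range//[[:space:]]/}        # drop any indentation
    start=${range%-*}
    end=${range#*-}
    # Ranges are inclusive, hence the +1 before converting bytes to MiB.
    echo $(( (0x$end - 0x$start + 1) / 1024 / 1024 ))
}

# Report whether a capture kernel is reserved and loaded.
kdump_ready() {
    grep -q 'crashkernel=' /proc/cmdline \
        || { echo "crashkernel= not on the kernel command line"; return 1; }
    [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" = 1 ] \
        || { echo "no crash kernel loaded"; return 1; }
    echo "crash kernel loaded, reserved: $(crash_reserved_mib < /proc/iomem) MiB"
}
```

With the iomem line from this system, `echo "2e000000-35ffffff : Crash kernel" | crash_reserved_mib` prints 128.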
Test kdump, i.e. trigger a kernel crash
--> Clearly you shouldn't do this on a production machine!
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
--> The system hangs immediately. The reboot takes some time because the kdump capture kernel first writes the vmcore to /var/crash.
Invoke crash utility and extract crash details
Check the debuginfo kernel modules and the vmcore location
[root@grac41 RPMS]# ls /usr/lib/debug/lib/modules/
2.6.39-400.214.4.el6uek.x86_64
[root@grac41 RPMS]# ls /var/crash
127.0.0.1-2014-05-05-03:56:2
[root@grac41 RPMS]# ls -l /var/crash/127.0.0.1-2014-05-05-03:56:28
-rw-------. 1 root root 79861964 May 5 03:56 vmcore

To determine the version of the kernel that produced a vmcore file:
# crash --osrelease /var/crash/127.0.0.1-2014-05-05-03:56:28/vmcore
2.6.39-400.214.4.el6uek.x86_64

The matching vmlinux file must exist in /usr/lib/debug/lib/modules/<kernel_version>/. The kernel-uek-debuginfo package installed above provides it.

Invoke the crash utility:
# crash /usr/lib/debug/lib/modules/2.6.39-400.214.4.el6uek.x86_64/vmlinux /var/crash/127.0.0.1-2014-05-05-03:56:28/vmcore
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.39-400.214.4.el6uek.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-05-05-03:56:28/vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Tue May 6 12:39:52 2014
UPTIME: 00:04:04
LOAD AVERAGE: 4.62, 2.01, 0.79
TASKS: 586
NODENAME: grac41.example.com
RELEASE: 2.6.39-400.214.4.el6uek.x86_64
VERSION: #1 SMP Tue Mar 25 18:05:58 PDT 2014
MACHINE: x86_64 (3266 Mhz)
MEMORY: 3.3 GB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)
PID: 7345
COMMAND: "bash"
TASK: ffff880072ae4580 [THREAD_INFO: ffff8800778ea000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
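The steps above (find the vmcore, check its kernel release with crash --osrelease, locate the matching vmlinux, start crash) can be wrapped in two helpers. This is a sketch that assumes the default kdump layout /var/crash/<host>-<date>/vmcore. The function names latest_vmcore and open_vmcore are hypothetical. The optional command file is passed to crash with -i, which runs the file's commands before the prompt appears. The file should end with "q" so that crash exits instead of waiting for interactive input.

```shell
#!/bin/bash
# Sketch: open the newest vmcore with the vmlinux that matches its kernel.
# Assumes the default /var/crash layout; function names are hypothetical.

# Print the most recently modified vmcore below $1 (default /var/crash).
latest_vmcore() {
    local dir=${1:-/var/crash} latest="" f
    for f in "$dir"/*/vmcore; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest=$f
        fi
    done
    [ -n "$latest" ] || return 1
    printf '%s\n' "$latest"
}

# Open vmcore $1 in crash. Optional $2 is a command file run via "crash -i".
open_vmcore() {
    local core=$1 cmds=$2 rel vmlinux
    rel=$(crash --osrelease "$core") || return 1
    vmlinux=/usr/lib/debug/lib/modules/$rel/vmlinux
    if [ ! -f "$vmlinux" ]; then
        echo "missing $vmlinux: install kernel-uek-debuginfo-$rel" >&2
        return 1
    fi
    if [ -n "$cmds" ]; then
        crash -s -i "$cmds" "$vmlinux" "$core"   # -s: skip the startup banner
    else
        crash "$vmlinux" "$core"
    fi
}
```

For example, `open_vmcore "$(latest_vmcore)"` starts an interactive session on the most recent dump. The helper compares file modification times rather than directory names, so it does not depend on the host part of the directory name.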
View system related crash dump data
Display the backtrace of the task that was running when the system crashed
crash> bt
PID: 7345 TASK: ffff880072ae4580 CPU: 0 COMMAND: "bash"
 #0 [ffff8800778eba40] machine_kexec at ffffffff8103aa89
 #1 [ffff8800778ebab0] crash_kexec at ffffffff810b91e3
 #2 [ffff8800778ebb80] oops_end at ffffffff8150dc38
 #3 [ffff8800778ebbb0] no_context at ffffffff810484cc
 #4 [ffff8800778ebbf0] __bad_area_nosemaphore at ffffffff810485f5
 #5 [ffff8800778ebc40] bad_area at ffffffff810487ae
 #6 [ffff8800778ebc70] do_page_fault at ffffffff8151085b
 #7 [ffff8800778ebd80] page_fault at ffffffff8150d1d5
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81319806 RSP: ffff8800778ebe38 RFLAGS: 00010092
    RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000063
    RBP: ffff8800778ebe38 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff81819ce0
    R13: 0000000000000286 R14: 0000000000000004 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #8 [ffff8800778ebe40] __handle_sysrq at ffffffff8131a049
 #9 [ffff8800778ebe80] write_sysrq_trigger at ffffffff8131a0fa
#10 [ffff8800778ebeb0] proc_reg_write at ffffffff811cc891
#11 [ffff8800778ebf00] vfs_write at ffffffff8116d0f8
#12 [ffff8800778ebf30] sys_write at ffffffff8116d2c1
#13 [ffff8800778ebf80] system_call_fastpath at ffffffff81514ec2
    RIP: 00000039d0edb790 RSP: 00007fff7f86a048 RFLAGS: 00000213
    RAX: 0000000000000001 RBX: ffffffff81514ec2 RCX: 0000000000000001
    RDX: 0000000000000002 RSI: 00007fcc62282000 RDI: 0000000000000001
    RBP: 00007fcc62282000 R8: 000000000000000a R9: 00007fcc62271700
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
    R13: 00000039d118e780 R14: 0000000000000002 R15: 00000039d118e780
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
--> Here we can see that the kernel crashed in sysrq_handle_crash(), called from __handle_sysrq(), because of a page fault. This is the crash we triggered by writing "c" to /proc/sysrq-trigger.
crash> bt -a
--> reports the same data here, because this system has only a single CPU

Display memory usage when the system crashed
crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM   812770       3.1 GB         ----
      FREE   379780       1.4 GB   46% of TOTAL MEM
      USED   432990       1.7 GB   53% of TOTAL MEM
    SHARED    56792     221.8 MB    6% of TOTAL MEM
   BUFFERS     9273      36.2 MB    1% of TOTAL MEM
    CACHED   149823     585.2 MB   18% of TOTAL MEM
      SLAB    30012     117.2 MB    3% of TOTAL MEM

TOTAL SWAP  1302527         5 GB         ----
 SWAP USED        0            0    0% of TOTAL SWAP
 SWAP FREE  1302527         5 GB  100% of TOTAL SWAP

Display swap usage when the system crashed
crash> swap
FILENAME   TYPE        SIZE      USED   PCT  PRIORITY
/dm-1      PARTITION   5210108k  0k     0%   -1

Display the run queue when the system crashed
crash> runq
CPU 0 RUNQUEUE: ffff8800d2012180
  CURRENT: PID: 7345 TASK: ffff880072ae4580 COMMAND: "bash"
  RT PRIO_ARRAY: ffff8800d20122d0
     [no tasks queued]
  CFS RB_ROOT: ffff8800d2012218
     [130] PID: 7975 TASK: ffff8800c7662640 COMMAND: "sh"

Display interrupt statistics when the system crashed
crash> irq -s
           CPU0
  0:     366397   XT-PIC-XT-PIC  timer
  1:         63   XT-PIC-XT-PIC  i8042
  2:          0   XT-PIC-XT-PIC  cascade
  5:      66062   XT-PIC-XT-PIC  ahci,Intel 82801AA-ICH,eth2
  8:          0   XT-PIC-XT-PIC  rtc0
  9:      12091   XT-PIC-XT-PIC  acpi,vboxguest,eth1
 10:         74   XT-PIC-XT-PIC  ehci_hcd:usb1,eth0
 11:         39   XT-PIC-XT-PIC  ohci_hcd:usb2
 12:        250   XT-PIC-XT-PIC  i8042
 14:          0   XT-PIC-XT-PIC  ata_piix
 15:        384   XT-PIC-XT-PIC  ata_piix
--> The timer interrupt (IRQ 0) was serviced 366397 times on CPU0. The first column is the IRQ number. On the legacy XT-PIC (8259) interrupt controller, lower IRQ numbers generally have higher priority.

Display system information at the time of the crash
crash> sys
KERNEL: /usr/lib/debug/lib/modules/2.6.39-400.214.4.el6uek.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-05-05-03:56:28/vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Tue May 6 12:39:52 2014
UPTIME: 00:04:04
LOAD AVERAGE: 4.62, 2.01, 0.79
TASKS: 586
NODENAME: grac41.example.com
RELEASE: 2.6.39-400.214.4.el6uek.x86_64
VERSION: #1 SMP Tue Mar 25 18:05:58 PDT 2014
MACHINE: x86_64 (3266 Mhz)
MEMORY: 3.3 GB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)
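On a busier system it helps to pick the dominant interrupt source out of the irq -s output automatically. This is a sketch that assumes the output was saved to a file first, for example with crash's shell-style redirection (crash> irq -s > /tmp/irq-s.txt). The helper busiest_irq is hypothetical. It sums every numeric per-CPU column, so it should also handle irq -s output with more than one CPU column.

```shell
#!/bin/bash
# Sketch: report the IRQ with the highest total count from saved "irq -s" output.
# busiest_irq is a hypothetical helper; input is read from stdin.
busiest_irq() {
    awk '
    BEGIN { max = 0 }
    $1 ~ /^[0-9]+:$/ {
        irq = substr($1, 1, length($1) - 1)        # "0:" -> "0"
        total = 0; i = 2
        while (i <= NF && $i ~ /^[0-9]+$/) { total += $i; i++ }
        # Field i is the interrupt chip (e.g. XT-PIC-XT-PIC); the rest lists devices.
        name = ""
        for (j = i + 1; j <= NF; j++) name = name (name == "" ? "" : " ") $j
        if (total > max) { max = total; best = irq; devs = name }
    }
    END { if (best != "") print best, max, devs }'
}
```

Feeding the irq -s output above into busiest_irq reports IRQ 0 (timer) with 366397 interrupts.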
Investigate process details for the PID that crashed the system
Find the PID of the active process that caused the crash
crash> bt
PID: 7345 TASK: ffff880072ae4580 CPU: 0 COMMAND: "bash"
..
crash> ps 7345
   PID   PPID  CPU  TASK              ST  %MEM  VSZ     RSS   COMM
>  7345  7342  0    ffff880072ae4580  RU  0.1   108520  1876  bash

crash> files 7345
PID: 7345 TASK: ffff880072ae4580 CPU: 0 COMMAND: "bash"
ROOT: / CWD: /root/Desktop
 FD  FILE              DENTRY            INODE             TYPE  PATH
  0  ffff880072ad3880  ffff88007e811b40  ffff88007eb94a88  CHR   /dev/pts/0
  1  ffff880072ba0080  ffff88009adb8840  ffff88009adb7838  REG   /proc/sysrq-trigger
  2  ffff880072ad3880  ffff88007e811b40  ffff88007eb94a88  CHR   /dev/pts/0
 10  ffff880072ad3880  ffff88007e811b40  ffff88007eb94a88  CHR   /dev/pts/0
255  ffff880072ad3880  ffff88007e811b40  ffff88007eb94a88  CHR   /dev/pts/0
--> FD 1 (stdout) of this bash process points to /proc/sysrq-trigger, which matches the "echo c > /proc/sysrq-trigger" command that triggered the crash.

Display only thread group leaders
crash> ps -G | grep ocssd.bin
5408 1 0 ffff88009731c5c0 IN 3.5 666372 120320 ocssd.bin

Display only user tasks
crash> ps -u | grep ocssd.bin
5408 1 0 ffff88009731c5c0 IN 3.5 666372 120320 ocssd.bin
5414 1 0 ffff88009735a600 IN 3.5 666372 120320 ocssd.bin
5415 1 0 ffff880097378640 IN 3.5 666372 120320 ocssd.bin
...
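The bt, then ps PID, then files PID sequence above can be scripted so that the PID is taken from the backtrace header rather than copied by hand. This is a sketch with two hypothetical helpers. panic_pid reads bt output on stdin and prints the first PID. write_pid_cmds writes a command file for crash's -i option, ending with "q" so that crash exits when the commands are done.

```shell
#!/bin/bash
# Sketch: derive follow-up crash commands for the task shown in the bt header.
# Both helpers are hypothetical; the commands mirror the ps/files steps above.

# Print the PID from the first "PID: ... TASK: ... COMMAND: ..." line on stdin.
panic_pid() {
    awk '$1 == "PID:" { print $2; found = 1; exit } END { exit !found }'
}

# Write crash commands for PID $1 into file $2; the trailing "q" ends the session.
write_pid_cmds() {
    local pid=$1 out=$2
    printf '%s\n' "ps $pid" "files $pid" "q" > "$out"
}
```

Usage, assuming VMLINUX and VMCORE hold the two paths passed to crash earlier: write "bt" and "q" into /tmp/bt.cmds, run `pid=$(crash -s -i /tmp/bt.cmds "$VMLINUX" "$VMCORE" | panic_pid)`, then `write_pid_cmds "$pid" /tmp/pid.cmds` and `crash -s -i /tmp/pid.cmds "$VMLINUX" "$VMCORE"`.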
Using crash help (sample for the ps command)
crash> help ps
NAME
ps - display process status information
SYNOPSIS
ps [-k|-u|-G][-s][-p|-c|-t|-l|-a|-g|-r] [pid | taskp | command] ...
DESCRIPTION
This command displays process status for selected, or all, processes
..