Table of Contents
Test system
- 8 CPUs OEL 6.4 Oracle 11.2.0.4 RAC
Compare the 2 sessions above running on the same hardware and same CPU load
Session 1: Top 10 Foreground Events by Total Wait Time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tota Wait % DB Event Waits Time Avg(ms) time Wait Class ------------------------------ ------------ ---- ------- ------ ---------- log file sync 5,007 127. 23 93.9 Commit DB CPU 6.6 5.3 gc current block busy 34 1 29 .8 Cluster db file sequential read 27 .2 9 .2 User I/O gc current block 2-way 166 .2 1 .1 Cluster db file scattered read 19 .1 6 .1 User I/O gc buffer busy release 15 .1 6 .1 Cluster gc current block 3-way 69 .1 1 .0 Cluster SQL*Net message to client 10,011 0 0 .0 Network control file sequential read 112 0 0 .0 System I/O Session2: Top 10 Foreground Events by Total Wait Time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tota Wait % DB Event Waits Time Avg(ms) time Wait Class ------------------------------ ------------ ---- ------- ------ ---------- log file sync 5,008 41.1 8 81.2 Commit DB CPU 6.2 12.2 reliable message 1 1 1031 2.0 Other gc current block busy 52 .9 17 1.8 Cluster gc buffer busy release 41 .4 10 .8 Cluster control file sequential read 112 .1 1 .3 System I/O db file sequential read 15 .1 8 .2 User I/O db file scattered read 11 .1 7 .1 User I/O gc cr block busy 4 .1 14 .1 Cluster SQL*Net message to client 10,011 0 0 .1 Network
- AWR reports tells us we are running the same number of COMMITs ( 5000 ) but session2 is about 3x faster than session1.
- Note also the overall CPU load is the same for both tests – in Session1 we only run a CPU intensive process on the same CPU as our lgwr process whereas in Session2 the CPU intensive process is running on a different CPU.
Running both sq load process and lgwr on CPU 5
Compile and run sq.c #include <math.h> double sqrt(); main () { int i; double z; printf("PID: %d\n", getpid()); for (;;) { for (i = 0; i < 1000000; i++) { z = sqrt (i); } poll (0, 0, 1); } } $ cc sq.c -o sq -lm Attach sq process to CPU 5 and increase priority $ taskset -c 5 sq PID: 12657 # renice -15 12657 Now reschedule the LGWR to CPU 5 and monitor both process with top $ ps -elf | egrep 'NI|ora_lgwr' | grep -v grep F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 0 S oracle 6181 1 0 80 0 - 394456 semtim 18:31 ? 00:00:01 ora_lgwr_grac41 $ taskset -pc 5 6181 pid 6181's current affinity list: 0-7 pid 6181's new affinity list: 5 Run the JDBC test case $ java UCPDemo_si 1 5000 -noseq Started at: Fri Nov 15 19:12:43 CET 2013 - Number of Threads: 1 - tx_cnt:5000 - mode: 2 Debug mode: false -> SQL: insert into rac_perftest values (?, ?, ?, sysdate) Connected at: 19:12:45 to instance: grac41 at host: grac41.example.com Released connection to pool - Instance: grac41 - Records inserted: 5000 Runtime : 130 [s] Monitor both processes using top # top -H -p 12657,6181 ( press f and j to add CPU affinity flag P ) Cpu(s): 15.6%us, 6.3%sy, 0.0%ni, 77.2%id, 0.4%wa, 0.0%hi, 0.5%si, 0.0%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 12657 oracle 5 -15 6544 308 244 R 80.0 0.0 8:47.77 5 sq 6181 oracle 20 0 1540m 32m 28m D 7.6 0.8 0:10.13 5 oracle --> P = 5 means both processes are running on CPU 5 lgwr only gets around 7 % of CPU There is a lot of CPU available ( 77% ) but lgwr still runs into CPU starvation sq process takes about 80% CPUtime from CPU 5 and runs with a higher priority as lgwr process
Running both sq load process and lgwr on different CPUs
Reschedule lgwr process from CPU 5 to CPU 7 $ taskset -pc 7 6181 pid 6181's current affinity list: 5 pid 6181's new affinity list: 7 Run test case: $ java UCPDemo_si 1 5000 -noseq Started at: Fri Nov 15 19:23:17 CET 2013 - Number of Threads: 1 - tx_cnt:5000 - mode: 2 Debug mode: false -> SQL: insert into rac_perftest values (?, ?, ?, sysdate) Connected at: 19:23:18 to instance: grac41 at host: grac41.example.com Released connection to pool - Instance: grac41 - Records inserted: 5000 Runtime : 45 [s] Releated Top Output Cpu(s): 19.0%us, 9.0%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.5%si, 0.0%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 12657 oracle 5 -15 6544 308 244 S 81.5 0.0 17:50.60 5 sq 6181 oracle 20 0 1540m 32m 28m D 12.0 0.8 0:26.33 7 oracle --> sq process and lgwr runs on different CPUs - lgwr process now takes 12% CPU sq process still takes about 81 % of CPU 7 but lgwr process now running on CPU 5 get enough CPU time Application runtime drops from 130 s to 45 s without changing anything
Summary:
- Always check that lgwr process gets enough CPU time – total CPU shown in top output is not always the limiting factor
- RT processes or lot of high priority process together with low number of CPUs can worthen the problem
- First decrease priority for NON-Oracle RT processes ( resetting the priority of our sq load process fixes this problem : $ renice 0 12657 )
- If this doesn’t help try to renice LGWR oracle shadow processes ( increase priority )
Attractive section of content. I just stumbled upon your blog
and in accession capital to assert that I acquire in fact enjoyed account your blog posts.
Any way I’ll be subscribing to your feeds and even I achievement you
access consistently fast.