Table of Contents
Testcase:
- 3-Node RAC cluster 11.2.0.4
- Insert 10.000 rows per thread using 10 threads but commit data every 100 records to avoid LOG FILE SYNC to become dominating wait event
- Java test program : UCPDemo.java
Run test case: $ java UCPDemo grac4 10 10000 100 -noseq -nodebug Started at: Wed Nov 27 15:24:45 CET 2013 - URL: jdbc:oracle:thin:@//grac4-scan.grid4.example.com:1521/grac4 - Number of Threads: 10 - tx_cnt:10000 - commit_cnt:100 - mode: 2 Debug mode: false -> SQL: insert into rac_perftest values (?, ?, ?, sysdate) ... Thread_id: 8 - Instance: grac42 - Row Inserted : 10000 - Commits done: 93 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 46 - Thread TPS: 0.85 Thread_id: 5 - Instance: grac43 - Row Inserted : 10000 - Commits done: 101 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 18 - Thread TPS: 0.92 Thread_id: 4 - Instance: grac42 - Row Inserted : 10000 - Commits done: 95 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 24 - Thread TPS: 0.86 Thread_id: 1 - Instance: grac42 - Row Inserted : 10000 - Commits done: 110 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 49 - Thread TPS: 1.00 Thread_id: 9 - Instance: grac43 - Row Inserted : 10000 - Commits done: 110 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 40 - Thread TPS: 0.99 Thread_id: 2 - Instance: grac43 - Row Inserted : 10000 - Commits done: 98 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 47 - Thread TPS: 0.88 Thread_id: 7 - Instance: grac43 - Row Inserted : 10000 - Commits done: 95 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 135 - Thread TPS: 0.86 Thread_id: 6 - Instance: grac41 - Row Inserted : 10000 - Commits done: 115 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 26 - Thread TPS: 1.04 Thread_id: 3 - Instance: grac41 - Row Inserted : 10000 - Commits done: 99 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 76 - Thread TPS: 0.89 Thread_id: 10 - Instance: grac41 - Row Inserted : 10000 - Commits done: 84 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 15 - Thread TPS: 0.75 Runtime : 111 [s] - TPS: 8.61
Related AWR report
Reading and understanding related AWR report % Total I# User I/O Sys I/O Other Applic Commit Network Concurcy Config Cluster DB CPU DB time ---- -------- -------- --------- ------- ---------- ------- -------- ------- -------- ------ -------- 1 4.0 0.0 0.4 1.6 0.3 3.4 0.6 0.1 81.5 11.6 32.8 2 0.0 0.0 1.3 0.0 0.2 4.0 0.3 0.1 78.4 6.9 28.2 3 0.1 0.0 2.2 0.0 1.1 5.8 0.3 0.1 78.0 5.9 39.1 ~~~ ~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~ ~~~~~~ ~~~~~~~~~~ ~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~ ~~~~~ ~~~~~~~~~ Avg 1.3 0.0 1.3 0.5 0.5 4.4 0.4 0.1 79.3 8.1 --> huge clusterwaits for all 3 instances ( ~ 80 %) Wait Event Wait Time Summary Avg Wait Time (ms) --------------------------------------------------- ---------------------- ------------------------------ ---------------------------------------- I# Class Event Waits %Timeouts Total(s) Avg(ms) %DB time Avg Min Max Std Dev Cnt ---- ---------- ---------------------------------------- ------------ --------- ------------- ------- -------- -------- -------- -------- -------- ---- * Cluster gc buffer busy release 33,043 0.0 154.97 4.7 31.79 4.49 4.06 5.03 0.49 3 Cluster gc buffer busy acquire 27,306 0.0 121.96 4.5 25.02 4.35 4.08 4.64 0.28 3 Cluster gc current block busy 15,904 0.0 100.46 6.3 20.61 6.32 6.13 6.55 0.21 3 DB CPU N/A N/A 39.26 N/A 8.05 3 Other gcs log flush sync 19,324 0.2 17.90 0.9 3.67 0.85 0.63 1.27 0.36 3 Concurrenc enq: TX - index contention 1,768 0.0 12.05 6.8 2.47 6.83 6.40 7.26 0.43 3 User I/O db file scattered read 233 0.0 5.89 25.3 1.21 22.87 16.40 28.32 6.03 3 User I/O db file sequential read 407 0.0 5.53 13.6 1.13 14.14 12.56 15.52 1.49 3 Other buffer deadlock 23,134 100.0 4.89 0.2 1.00 0.20 0.03 0.33 0.16 3 Concurrenc latch: cache buffers chains 20,540 0.0 4.82 0.2 0.99 0.22 0.19 0.28 0.05 3 --> High gc cluster waits for following events: gc buffer busy release, gc buffer busy acquire, gc current block busy Top Timed Background Events DB/Inst: GRAC4/grac41 Snaps: 859-860 Wait Event Wait Time Summary Avg Wait Time (ms) --------------------------------------------------- ---------------------- ------------------------------ ---------------------------------------- I# Class Event Waits %Timeouts Total(s) Avg(ms) %DB time Avg Min Max Std Dev Cnt ---- ---------- ---------------------------------------- ------------ --------- ------------- ------- -------- -------- -------- -------- -------- ---- * background cpu time N/A N/A 45.00 N/A 46.60 3 Other gcs log flush sync 19,323 0.2 17.90 0.9 18.54 0.85 0.63 1.27 0.36 3 User I/O db file sequential read 228 0.0 3.56 15.6 3.69 15.50 12.73 16.98 2.40 3 User I/O db file scattered read 86 0.0 1.70 19.7 1.76 23.14 16.29 29.28 6.53 3 -> background reports wait time for gcs log flush sync. LMS process needs to wait until LGWR process has written redo info to disk for that block before transfering the block. For details see following link. Current Blocks Served Statistics DB/Inst: GRAC4/grac41 Snaps: 859-860 I# Pins % <1ms % <10ms % <100m % <1s % <10s Flushes % <1ms % <10ms % <100m % <1s % <10s Writes % <1ms % <10ms % <100m % <1s % <10s ---- ----------- ------- ------- ------- ------- ------- ----------- ------- ------- ------- ------- ------- ----------- ------- ------- ------- ------- ------- 1 4,224 94.13 5.40 0.45 0.02 0.00 5,187 0.10 98.98 0.91 0.02 0.00 19 0.00 0.00 68.42 31.58 0.00 2 4,431 59.29 40.60 0.09 0.02 0.00 4,883 67.40 32.48 0.10 0.02 0.00 11 0.00 0.00 63.64 27.27 9.09 3 4,805 39.90 60.06 0.04 0.00 0.00 4,964 54.51 45.37 0.12 0.00 0.00 23 0.00 0.00 47.83 52.17 0.00 ~~~ ~~~~~~~~~~~ ~~~~~~~~~~~ ~~~~~~~~~~~ Sum 13,460 15,034 53 Avg 4,487 5,011 18 Std 294 157 6 -> High number for Pins ( get exclusive access to a block before it can be flushed ) and Flushes ( LGWR needs to write REDO before transfering a block) Tablespace Object Subobject Obj. Statistic Owner Name Name Name Type Value %Total %Capture ------------------------ ---------- ---------- -------------------- ---------- ----- ------------ ------ -------- buffer busy waits SCOTT USERS PK_RAC_PERFTEST INDEX 1,079 N/A 93.8 SCOTT USERS RAC_PERFTEST TABLE 70 N/A 6.1 db block changes SCOTT USERS PK_RAC_PERFTEST INDEX 52,560 N/A 49.4 SCOTT USERS RAC_PERFTEST TABLE 52,240 N/A 49.1 gc buffer busy SCOTT USERS PK_RAC_PERFTEST INDEX 48,978 N/A 98.6 SCOTT USERS RAC_PERFTEST TABLE 675 N/A 1.4 gc cu blocks received SCOTT USERS PK_RAC_PERFTEST INDEX 16,176 90.1 96.4 SCOTT USERS RAC_PERFTEST TABLE 249 1.4 1.5 gc cu blocks served SCOTT USERS PK_RAC_PERFTEST INDEX 16,176 90.8 96.4 SCOTT USERS RAC_PERFTEST TABLE 249 1.4 1.5 row lock waits SCOTT USERS PK_RAC_PERFTEST INDEX 839 N/A 100.0 --> AWR shows about 49.000 gc buffer busy waits for index RAC_PERFTEST and about 16000 gc current blocks received/served for about 100.000 insert. For every 6.th insert operation ( note testcase inserted 100,000 rows ) we see a index block t transfer via GCS and for every 2.nd insert we get a gc buffer busy waits. -> Huge contention !
Related ASH scripts
Script : ash_gcwait_to_block.sql -> Check ASH for Blocks heavily used by gc cluster waits -> Event: gc buffer busy acquire - Threshold: 3 -> Event: gc buffer busy release - Threshold: 3 INST_ID OWNER OBJECT_NAME OBJECT_TYPE CURRENT_OBJ# file block CNT ---------- ------------ ----------------------- -------------------- ------------ ------ -------- ---------- 3 SCOTT PK_RAC_PERFTEST INDEX 89118 4 589615 4 3 SCOTT PK_RAC_PERFTEST INDEX 89118 4 587643 5 3 89112 4 587472 4 1 89112 4 585715 6 -> Event: gc current block busy - Threshold: 3 INST_ID OWNER OBJECT_NAME OBJECT_TYPE CURRENT_OBJ# file block CNT ---------- ----------- ------------------------ ------------- ------------ ------ -------- ---------- 3 SCOTT PK_RAC_PERFTEST INDEX 89118 4 587643 5 3 89109 4 585643 4 2 Undo Block -1 0 0 4 3 Undo Block -1 0 0 6 1 Undo Block -1 0 0 7 --> We see that index PK_RAC_PERFTEST is heavily used by all instances The reported object_id 89112 results from a former drop operation Check ASH for Objects heavily used by gc cluster waits Script : ash_gcwait_to_obj.sql -> Event: gc buffer busy acquire - Threshold: 10 INST_ID OWNER OBJECT_NAME OBJECT_TYPE CNT ---------- ------------ ---------------------- -------------------- ---------- 3 SCOTT PK_RAC_PERFTEST INDEX 119 2 SCOTT PK_RAC_PERFTEST INDEX 127 -> Event: gc buffer busy release - Threshold: 10 INST_ID OWNER OBJECT_NAME OBJECT_TYPE CNT ---------- ------------ ----------------------- -------------------- ---------- 2 SCOTT PK_RAC_PERFTEST INDEX 28 3 SCOTT PK_RAC_PERFTEST INDEX 112 1 SCOTT PK_RAC_PERFTEST INDEX 185 -> Event: gc current block busy - Threshold: 10 INST_ID OWNER OBJECT_NAME OBJECT_TYPE CNT ---------- --------- ----------------------- ------------------- ---------- 3 SCOTT PK_RAC_PERFTEST INDEX 65 1 SCOTT PK_RAC_PERFTEST INDEX 66 2 SCOTT PK_RAC_PERFTEST INDEX 71 -> Breaking down the results at object level shows that is index is heavily used Check ASH for SQL_ID/Waits heavily used by gc cluster waits Script : ash_gcwait_to_sql.sql -> Event: gc buffer busy acquire - Wait-Threshold: 40 -> Find waits grouped by Instance,SessioniID,SQLID,Event INST_ID SESSION_ID SQL_ID EVENT CNT ---------- ---------- ------------ ----------------------- ---------- 1 7 77bfwdy8jxa0v gc current block busy 45 1 7 77bfwdy8jxa0v gc buffer busy release 114 1 35 77bfwdy8jxa0v gc buffer busy release 52 1 39 77bfwdy8jxa0v gc buffer busy release 64 1 223 77bfwdy8jxa0v gc current block busy 42 1 223 77bfwdy8jxa0v gc buffer busy release 127 2 52 77bfwdy8jxa0v gc current block busy 54 2 52 77bfwdy8jxa0v gc buffer busy acquire 107 2 57 77bfwdy8jxa0v gc buffer busy acquire 46 2 61 77bfwdy8jxa0v gc buffer busy acquire 45 3 29 77bfwdy8jxa0v gc buffer busy release 53 3 48 77bfwdy8jxa0v gc buffer busy acquire 47 -> Here we can figure out that gc buffer busy release, gc buffer busy acquire and gc current block busy are the expectedtop events for RHS index growth. -> Only a single SQLID is reported
Getting runtime information using gv$session/gv$session_wait
- re-run that script multiple times to get a clear picture what is going on
SQL> select w.inst_id, w.sid,w.program, w.event , w.STATE , w.SECONDS_IN_WAIT SEC_in_WAIT , w.P1TEXT || ': ' || w.P1 || ' - ' || w.P2TEXT || ': ' || w.P2 || ' - ' || w.P3TEXT || ': ' || w.P3 as P1_P2_P3 from gv$session w where program like '%LGWR%' or program like '%LMS%' or program like '%JDBC%' order by program; INST_ID SID PROGRAM EVENT STATE SEC_IN_WAIT P1_P2_P3 ------- ---------- -------------------------------- --------------------------- ------------------ ----------- ---------------------------------------------------------------- 1 7 JDBC Thin Client log file sync WAITING 0 buffer#: 1574 - sync scn: 18456348 - : 0 3 62 JDBC Thin Client gc buffer busy release WAITING 0 file#: 4 - block#: 587486 - class#: 1 1 35 JDBC Thin Client gc buffer busy release WAITING 0 file#: 4 - block#: 587486 - class#: 1 3 57 JDBC Thin Client gc current request WAITING 0 file#: 4 - block#: 587486 - id#: 33554433 3 29 JDBC Thin Client gc buffer busy acquire WAITING 0 file#: 4 - block#: 587486 - class#: 1 3 1 JDBC Thin Client gc buffer busy acquire WAITING 0 file#: 4 - block#: 587486 - class#: 1 2 59 JDBC Thin Client gc buffer busy acquire WAITING 0 file#: 4 - block#: 587486 - class#: 1 2 54 JDBC Thin Client gc current request WAITING 0 file#: 4 - block#: 587486 - id#: 33554433 2 52 JDBC Thin Client gc buffer busy acquire WAITING 0 file#: 4 - block#: 587486 - class#: 1 1 223 JDBC Thin Client gc buffer busy release WAITING 0 file#: 4 - block#: 587486 - class#: 1 1 96 oracle@grac41.example.com (LGWR) rdbms ipc message WAITED KNOWN TIME 0 timeout: 23 - : 0 - : 0 1 158 oracle@grac41.example.com (LMS0) gcs remote message WAITING 0 waittime: 30 - poll: 0 - event: 0 1 188 oracle@grac41.example.com (LMS1) gcs log flush sync WAITING 0 waittime: 30 - poll: 0 - event: 138 2 18 oracle@grac42.example.com (LGWR) rdbms ipc message WAITING 0 timeout: 52 - : 0 - : 0 2 13 oracle@grac42.example.com (LMS0) gcs remote message WAITING 0 waittime: 30 - poll: 0 - event: 0 3 19 oracle@grac43.example.com (LGWR) rdbms ipc message WAITING 0 timeout: 135 - : 0 - : 0 3 12 oracle@grac43.example.com (LMS0) gcs remote message WAITING 0 waittime: 30 - poll: 0 - event: 0 ---> Event gc buffer busy release, gc buffer busy acquire and even gc current request shows what we are using again and again the same index block All user sessions are requesting the same block: file#: 4 - block#: 587486
Summary
- For RHS index growth it is expect to see top events: gc buffer busy release, gc buffer busy acquire and gc current block busy
- Use GV$session P1,P2,P3 information to get an idea what index blocks are in use by what wait events.
- GC Buffer Busy Acquire/GC Buffer Busy Release are side effects of anther root cause (here we have RHS index growth )
Fixing the performance problem by switch to HASH partiontioning index
Create table and hash index : SQL> create table rac_perftest ( id number, inst_name varchar(8), host_name varchar(24), ins_date date); SQL> CREATE UNIQUE INDEX "HASH_IDX" on rac_perftest (ID) GLOBAL PARTITION BY HASH (id) PARTITIONS 64; Rerun test case: $ java UCPDemo grac4 10 5000 100 -noseq -nodebug Started at: Fri Nov 29 15:07:06 CET 2013 - URL: jdbc:oracle:thin:@//grac4-scan.grid4.example.com:1521/grac4 - Number of Threads: 10 - Row to be inserted:50000 - commit_cnt:100 - mode: 2 Debug mode: false -> SQL: insert into rac_perftest values (?, ?, ?, sysdate) ... Finshed->Thread_id: 9 - Instance: grac41 - Row Inserted : 8992 - Commits done: 89 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 130 - Thread TPS: 2.53 Finshed->Thread_id: 2 - Instance: grac43 - Row Inserted : 3158 - Commits done: 31 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 36 - Thread TPS: 0.88 Finshed->Thread_id: 4 - Instance: grac42 - Row Inserted : 3557 - Commits done: 35 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 89 - Thread TPS: 1.00 Finshed->Thread_id: 3 - Instance: grac42 - Row Inserted : 3491 - Commits done: 34 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 36 - Thread TPS: 0.97 Finshed->Thread_id: 5 - Instance: grac43 - Row Inserted : 3201 - Commits done: 32 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 59 - Thread TPS: 0.91 Finshed->Thread_id: 10 - Instance: grac41 - Row Inserted : 8870 - Commits done: 88 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 43 - Thread TPS: 2.50 Finshed->Thread_id: 6 - Instance: grac41 - Row Inserted : 8893 - Commits done: 88 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 37 - Thread TPS: 2.50 Finshed->Thread_id: 1 - Instance: grac43 - Row Inserted : 3183 - Commits done: 31 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 19 - Thread TPS: 0.88 Finshed->Thread_id: 7 - Instance: grac42 - Row Inserted : 3511 - Commits done: 35 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 24 - Thread TPS: 1.00 Finshed->Thread_id: 8 - Instance: grac43 - Row Inserted : 3154 - Commits done: 31 MIN Commit time [ms]: 1 - MAX Commit time [ms]: 155 - Thread TPS: 0.88 Runtime : 37 [s] - TPS: 13.31 --> Application runtime improved from 72 s to 37 s ( Not bad for only changing the index type ) Review releated AWR reports : Using standard PK Wait Event Wait Time Summary Avg Wait Time (ms) --------------------------------------------------- ---------------------- ------------------------------ ---------------------------------------- I# Class Event Waits %Timeouts Total(s) Avg(ms) %DB time Avg Min Max Std Dev Cnt ---- ---------- ---------------------------------------- ------------ --------- ------------- ------- -------- -------- -------- -------- -------- ---- * Cluster gc buffer busy release 36,995 0.0 184.88 5.0 31.23 4.76 3.96 5.30 0.71 3 Cluster gc buffer busy acquire 28,582 0.0 130.13 4.6 21.98 4.35 3.82 4.90 0.54 3 Cluster gc current block busy 16,675 0.0 111.44 6.7 18.82 6.68 6.47 7.04 0.31 3 Applicatio enq: RO - fast object reuse 455 0.0 59.00 129.7 9.97 101.02 0.63 301.67 173.77 3 DB CPU N/A N/A 39.78 N/A 6.72 3 Other gcs log flush sync 19,571 0.3 18.76 1.0 3.17 0.88 0.66 1.26 0.33 3 Concurrenc enq: TX - index contention 1,895 0.0 14.42 7.6 2.44 7.62 6.79 8.44 0.83 3 Concurrenc latch: cache buffers chains 22,280 0.0 5.53 0.2 0.93 0.24 0.20 0.28 0.04 3 Other buffer deadlock 21,766 100.0 4.08 0.2 0.69 0.18 0.04 0.31 0.14 3 Concurrenc buffer busy waits 1,100 0.0 3.03 2.8 0.51 2.72 2.54 2.90 0.18 3 Using hash partitioning index ( note base table is still a ordinary table ) * DB CPU N/A N/A 55.70 N/A 21.45 3 Cluster gc current block busy 9,018 0.0 55.44 6.1 21.35 6.30 4.71 7.46 1.42 3 Cluster gc current block 2-way 16,224 0.0 29.59 1.8 11.39 1.97 1.21 2.48 0.68 3 Cluster gc current grant busy 11,382 0.0 22.08 1.9 8.50 2.00 1.60 2.21 0.35 3 Other gcs log flush sync 23,589 0.4 20.33 0.9 7.83 0.81 0.52 1.30 0.42 3 Cluster gc current block 3-way 5,774 0.0 14.85 2.6 5.72 2.46 1.67 2.95 0.70 3 Cluster gc buffer busy acquire 1,936 0.0 6.40 3.3 2.46 3.29 2.43 4.29 0.94 3 Concurrenc cursor: pin S wait on X 9 0.0 3.36 373.2 1.29 431.71 77.00 614.80 307.25 3 Cluster gc current block congested 537 0.0 2.96 5.5 1.14 5.43 5.22 5.66 0.22 3
Summary:
- After switching to hash partitioning index gc buffer busy release/aquire wait event was dramatically reduced
- Remaining gc current block busy indicates that LMS needs to perform add. work before sending a block
- gcs log flush sync is identical for both runs – still LGWR needs to be posted about 20.000 x before LMS can send the block
- Overall Application runtime improved from 72 s to 37 s ( Not bad for only changing the index type )
Brilliant writeup for debugging the events.
Hi Haris,
the link should work now.
best regards
Helmut
Hi Helmut
Thank you so much.
Regards
Haris.
Excellent troubleshooting and detail explanations. One request, would you please share the script ash_gcwait_to_sql.sql?
Cheers!
Awesome explaination..Need to setup TestCase for RAC but not able to find the tables and scripts related to UCPDemo.java program. Can we have the complete details on setup and scripts.
Regards
Anwar