RAC Perfomance tuning with AWR/EM reports

Overview

Elapsed time Time: Interval of time between the start and end snapshots            
DB Time          : DB Time is the time spent in the database by Foreground Sessions which includes 
                      CPU time, IO time and No-Idle Wait time 
                   Includes Cluster Waits, Commit Waits ..... 
                   Max DB Time in Cluster: MAX DB Time = Number_of_instances * AWR_Report_time * Number_of_CPUs  
                   If it is more and multiples of elapsed time then one has to jump to the 
                   TOP 5 Timed Events sessions  to investigate  further.
CPU Time         : CPU time represents time spent on CPU and does not include time waiting for CPU

Concurrency Top Wait Event Class

EM Report

em_perf_step1_new

  • Concurrency is the top Wait class
  • Transaction rate is about 140 Tx/sec

AWR snapshot:

awr_perf_pict1   awr_perf_pict2

  • There are 2 CPUs per instance  /  6 cores are active for the complete cluster
  • There is a huge DB Time compared to the Elapsed Time
  • Question:  Are we CPU bound ?

awr_perf_pict3

  • Only about 12% from out DB Time is CPU time –> Lot of waits here
  • For BG processes we don’t see to much Waits about 70% of bg time are on CPU ( bg CPU )
  • The definitely means that we aren’t CPU bound so lets have a look on FG Wait TOP 5 Wait Events first

awr_perf_pict4

  • Top Wait Event is Row cache lock with a Wait Time of 780 seconds
  • Let’s see wether we can identify a SQL statement

awr_perf_pict5

  • Checking Insert statement taking 89 % DB time insert into rac_perftest values (t1_seq.nextval, :1 , :2 , sysdate)
  • Insert statement triggers a Row Cache log by using seq.nextval()
  • Following statement updating/selecting seq$ are the recursive statements triggered by the insert statement
  • Statement 1: update seq$ set increment$=:2, minvalue=:3, maxvalue=:4, cycle#=:5, order$=:6, cache=:7,highwater=:8, audit$=:9, flags=:10 where obj#=:1
  • Statement 2: select increment$, minvalue, maxvalue, cycle#, order$, cache, highwater, audit$, flags from seq$ where obj#=:1
  • Both statments update seq$ and insert into rac_perftest were executed 20.000 times. Looks like a 1:1 relationship

awr_perf_pict6

  • Table SEQ$, table RAC_PERFTEST and Index PK_RAC_PERFTEST are changed more than 20,000 times ( db block changes Wait Event )
  • Access to SEQ$ is responsible for about 20.000 Cluster Waits for gc cr blocks received/gc cu blocks received

Tuning Step 1 :  Avoid sequences by using a JAVA generated sequence

Change SQL statement with sequence usage
   insert into rac_perftest values (t1_seq.nextval, ?, ?, sysdate) ";
to a SQL statment generating a JAVA based sequence using AtomicLong datatype
   private static AtomicLong counter = new AtomicLong();
   insert into rac_perftest values (?, ?, ?, sysdate) ";

 

Cluster Waits Top Wait Class

EM performance report 

em_perf_step2_new

  • After first tuning step ( 12:26)  the Concurrency wait event was nearly completely eliminated
  • Transaction rate nearly doubles
  • Cluster Wait Event becomes the dominant Wait Event

AWR snapshot

awr_perf_pict7

  •  Cluster Wait Events : Gc buffer busy acquire and GC current block busy taking 54 % of our DB time 
  • TX index contention Wait event shows we have index block splits

awr_perf_pict8

  • GCS log fush sync is the top Wait Event for our BG process
  • This is a sample where we have some contention from the Block Sending Side as dirty buffers needs to be flushed  before transfering

awr_perf_pict10

  • GC Buffer Busy wait event listed 40.000 times together with GC current blocks received ( 18.000 times ) for Index PK_RAC_PERFTEST
  • Huge Index Contention

awr_perf_pict9

  • Insert statement insert into rac_perftest is respnsible for 99 % for our Clusterwaits
  • Use 10046 trace to get an idea whether the index keys used for insert triggers a RHS index contention

Verify RHS contention using gv$active_session_history view

SQL> select /*+ materialize */    event, current_obj#, current_file#, current_block#, Count(*) CNT,
  2           min(to_char(sample_time,'DD-MON HH:MI')) First_Wait,  max(to_char(sample_time,'DD-MON HH:MI')) Last_Wait
  3       from gv$active_session_history where event like lower('&event') and sample_time>&sample_time_start and  sample_time<&sample_time_end
  4        group by event, current_obj#, current_file#, current_block#
  5        having count(*) >= &grp_count
  6          order by CNT desc, current_block# ;
old   3:     from gv$active_session_history where event like lower('&event') and sample_time>&sample_time_start and  sample_time<&sample_time_end
new   3:     from gv$active_session_history where event like lower('gc%') and sample_time>trunc(sysdate) and  sample_time<sysdate
old   5:      having count(*) >= &grp_count
new   5:      having count(*) >= 5

EVENT                             CURRENT_OBJ#    file    block         CNT FIRST_WAIT   LAST_WAIT
-------------------------------- ------------ ------ -------- ---------- ------------ ------------
gcs log flush sync               -1           0        0          98 17-FEB 09:06 17-FEB 09:19
gc buffer busy acquire            95093       4   585283          14 17-FEB 09:17 17-FEB 09:19
gc buffer busy acquire            95087       4   585112          10 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95087       4   585027           9 17-FEB 09:12 17-FEB 09:13
gc current block busy             95093       4   585283           8 17-FEB 09:17 17-FEB 09:19
gc buffer busy acquire            95087       4   585100           7 17-FEB 09:12 17-FEB 09:12
gc buffer busy acquire            95087       4   585105           7 17-FEB 09:12 17-FEB 09:12
gc buffer busy acquire            95093       4   586558           7 17-FEB 09:19 17-FEB 09:19
gc buffer busy acquire            95093       4   576094           6 17-FEB 09:18 17-FEB 09:18
gc buffer busy acquire            95087       4   585085           6 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95087       4   585094           6 17-FEB 09:12 17-FEB 09:12
gc buffer busy acquire            95087       4   585218           6 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95093       4   585606           6 17-FEB 09:18 17-FEB 09:18
gc buffer busy acquire            95093       4   586555           6 17-FEB 09:19 17-FEB 09:19
gc buffer busy acquire            95093       4   586619           6 17-FEB 09:20 17-FEB 09:20
gc buffer busy acquire            95087       4   585127           5 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95087       4   585185           5 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95087       4   585196           5 17-FEB 09:13 17-FEB 09:13
gc buffer busy acquire            95093       4   585657           5 17-FEB 09:18 17-FEB 09:18
gc buffer busy acquire            95093       4   586186           5 17-FEB 09:19 17-FEB 09:19
gc buffer busy acquire            95093       4   586324           5 17-FEB 09:19 17-FEB 09:19
gc buffer busy acquire            95093       4   586332           5 17-FEB 09:19 17-FEB 09:19
gc buffer busy acquire            95093       4   586335           5 17-FEB 09:19 17-FEB 09:19
gc buffer busy release            95093       4   586557           5 17-FEB 09:18 17-FEB 09:18

24 rows selected.

SQL> 
SQL> with  ash_gc as
  2  (select * from
  3    (
  4    select /*+ materialize */ inst_id,  current_obj#, current_file#, current_block#, Count(*) CNT,
  5           min(to_char(sample_time,'DD-MON HH:MI')) First_Wait,  max(to_char(sample_time,'DD-MON HH:MI')) Last_Wait
  6       from gv$active_session_history where event like lower('&event') and sample_time>&sample_time_start and  sample_time<&sample_time_end
  7        group by inst_id, current_obj#, current_file#, current_block#
  8        having count(*) >=  &grp_count  order by CNT desc, current_block#
  9      )
 10  )
 11  select * from (
 12  select inst_id,owner, object_name,object_type, current_file#, current_block#, cnt, current_obj#, First_Wait,Last_Wait
 13    from ash_gc a, dba_objects o
 14    where (a.current_obj#(+)=o.data_object_id) and a.current_obj#>=1
 15  union
 16  select inst_id, '','','Undo Header/Undo block' ,current_file#, current_block#, cnt, current_obj#, First_Wait,Last_Wait
 17    from ash_gc a
 18    where a.current_obj#=0
 19  union
 20  select inst_id, '','','Undo Block' , current_file#, current_block#, cnt, current_obj#,First_Wait,Last_Wait
 21    from ash_gc a
 22    where a.current_obj#=-1
 23  ) order by 7 desc
 24  /
old   6:     from gv$active_session_history where event like lower('&event') and sample_time>&sample_time_start and  sample_time<&sample_time_end
new   6:     from gv$active_session_history where event like lower('gc%') and sample_time>trunc(sysdate) and  sample_time<sysdate
old   8:      having count(*) >=  &grp_count  order by CNT desc, current_block#
new   8:      having count(*) >=  5  order by CNT desc, current_block#

   INST_ID OWNER      OBJECT_NAME          OBJECT_TYPE          file    block       CNT CURRENT_OBJ# FIRST_WAIT     LAST_WAIT
---------- ---------- -------------------- -------------------- ------ -------- ---------- ------------ ------------ ------------
     1                                     Undo Block               0        0        50         -1     17-FEB 03:07 17-FEB 09:19
     2                                     Undo Block               0        0        32         -1     17-FEB 03:07 17-FEB 09:19
     3                                     Undo Block               0        0        24         -1     17-FEB 03:07 17-FEB 09:19
     3     SCOTT      PK_RAC_PERFTEST      INDEX                    4   585283        13          95093 17-FEB 09:17 17-FEB 09:19
     2     SCOTT      PK_RAC_PERFTEST      INDEX                    4   585283         9          95093 17-FEB 09:17 17-FEB 09:19
     1     SCOTT      PK_RAC_PERFTEST      INDEX                    4   585283         9          95093 17-FEB 09:17 17-FEB 09:19
     2     SCOTT      PK_RAC_PERFTEST      INDEX                    4   585645         6          95093 17-FEB 09:18 17-FEB 09:18
     3     SCOTT      PK_RAC_PERFTEST      INDEX                    4   586619         6          95093 17-FEB 09:20 17-FEB 09:20
     3                                     Undo Block               2     2771         5         -1     17-FEB 03:03 17-FEB 03:03

 

Tuning Step 2 : Avoid RHS index growth by using Hash Indexes

Avoid Right-Hand-Side index growth by creating a Hash Index:
CREATE UNIQUE INDEX "HASH_IDX" on  rac_perftest (ID) GLOBAL PARTITION BY HASH (id) PARTITIONS 64;
  •  In 99% of OLTP performance issues write hot spots occur on indexes
  • For more details please read following article

 

 

 COMMIT top Wait Event Class

 EM performance report

em_perf_step3_new

  • Cluster Wait Events dropped a lot and TPS increased significant
  • Commit Class becomes now dominant Wait Event
  • CPU Wait also increases  ( which is a good sign as we have 10 Insert Threads for 6 CPUs )

AWR snapshot

awr_perf_pict11

  • Log File Sync is Top Wait Event taking 90 % of our DB Time
  • Monitoring  LOG Writer IO statistics is the next step

awr_perf_pict12

  • LGWR only shows 70 Writes per second – Very slow !!

–> Suggestion :  Test your Redo Write performance with dd

Tuning Step 3:  Move Redo logs from overloaded Disks to SSD

em_perf_step4_new

  • Transaction rate improves by factor 4
  • Test Case runtime drops from 10 minutes to 3 minutes
  • Even COMMIT is our Top Wait Event Class we also see CPU Waits – which is good
  • Newly seen Cluster Waits are ok. They are related to table blocks still transferred between instances
  • Following link explains LOG File Sync Wait Event more in detail.

Leave a Reply

Your email address will not be published. Required fields are marked *