Table of Contents
Problem: Transaction recovery: lock conflict caught and ignored in alert.log
Problem description: Alert.log : 2PC commit crashes due to TEXT Error 604 trapped in 2PC on transaction 39.4.8558636. Cleaning up. Error stack returned to user: ORA-00604: error occurred at recursive SQL level 1 ORA-06550: line 1, column 7: PLS-00306: wrong number or types of arguments in call to 'SYNCRN' ORA-06550: line 1, column 7: PL/SQL: Statement ignored --- Transaction recovery: lock conflict caught and ignored Transaction recovery: lock conflict caught and ignored Transaction recovery: lock conflict caught and ignored Transaction recovery: lock conflict caught and ignored Transaction recovery: lock conflict caught and ignored dba_2pc_pending : "local_tran_id" "global_tran_id" "state" "mixed" "fail_time" "force_time" "retry_time" "os_user" "os_terminal" "host" "db_user" "commit#" 79.5.15008388 1145324612.E8CC44F0C9769846A35406B43734A05500000000 forced rollback no 10.21.2014 23:07:59 10.21.2014 23:10:55 10.29.2014 05:43:28 epcx-qa NLWSL1124 DDNS\NLWSL1124 13756556056355 106.24.1303431 1145324612.04FB8C3C6874764FA5BEEA3449ADEB1E00000000 forced rollback no 10.22.2014 03:13:17 10.22.2014 03:25:55 10.29.2014 05:43:28 epcx-qa NLWSL1124 DDNS\NLWSL1124 13756628791127 31.10.19004224 1145324612.E4A2D008CCE7884AA82662FB01BA6C5700000000 forced rollback no 10.28.2014 10:35:39 10.28.2014 10:40:55 10.29.2014 04:25:16 irecruit-qa NLWSL172 DDNS\NLWSL172 13761020351676 --> Admin has forced rollback that session Lets concentrate on the last TX local_tran_id 31.10.19004224 Tx failed at 10.28.2014 10:35:39 Tx forced at 10.28.2014 10:40:55 <- TX forced rollback Tx retry at 10.29.2014 04:25:16 <- Retry fails due to FORCED ROLLBACK Checking transaction size and status for DEAD transactions see (Doc ID 1494886.1) =========== ktuxeusn – Undo Segment Number ktuxeslt – Slot number ktuxesqn – Sequence ktuxesta – State ktuxesiz – Undo Blocks Remaining ktuxecfl – Flag ================ select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo from x$ktuxe where ktuxesta <> 'INACTIVE' and ktuxecfl like '%DEAD%' order by ktuxesiz asc; USN SLOT SEQ STATE UNDO ---------- ---------- ---------- ---------------- ---------- 43 4 8073724 ACTIVE 1 80 22 12620576 ACTIVE 1 192 21 102488 ACTIVE 1 124 16 58091 ACTIVE 1 85 30 3817135 ACTIVE 1 --> Seems we need only to recover one block !! but tx recovery fails due to some deadlock Root cause : Problem is that during a 2PC we are crashing due to TEXT bug + someone have rollbacked that TX with rollback force 'LOCAL_TRAN_ID' C. installed patch: Bug 17501296 ORA-604 / PLS-306 attempting to delete rows from table with Text index after upgrade to 11.2.0.4 This should avoid new crashed in 2PC but the old TX are still not recovered. In alert.log we can see Serial Transaction recovery caught exception 30319 Serial Transaction recovery caught exception 30319 Bug 17614504 : SMON: SERIAL TRANSACTION RECOVERY CAUGHT EXCEPTION 30319, HIGH CPU LOAD "Serial Transaction recovery caught exception 30319" is just an informative message and indicates that SMON spent more that 2 and 1/2 minutes in txn recovery and the recovery has been interrupted as SMON has other pending tasks. This is common when instance is being recovered and there are very long to be recovered transactions. Diagnostics : 1) --> Set event : events(10015 and 30305) traces see Bug 18526660 : SMON TRANSACTION RECOVERY IS VERY SLOW 0) select * from x$ktuxe 1) alter session set events 'immediate trace name ktpr_debug level 3'; -> This should produce an ora* trace file, find it and upload it. 2) oradebug setospid <ospid of SMON process> oradebug unlimit oradebug dump errorstack 3 oradebug event 10015 trace name context forever, level 10 oradebug event 30305 trace name context forever, level 5 3) let it run for 15 mins with the trace events on 4) turn off the debug: oradebug setospid <ospid of SMON process> oradebug dump errorstack 3 oradebug event 10015 trace name context off oradebug event 30305 trace name context off 5) select x$ktuxe --> Set event : events(10015 and 30305) traces Apply WA from Bug 18853638 Bug 18853638 - GV$FAST_START_TRANSACTIONS SHOWS STANDSTILL RECOVERING TRANSACTIONS (RCA) The issue was not resolved until we suggest the customer to set '_smu_debug_mode' to 1024 on 5 nodes out of the 6 to force the transaction to be recovered only in one instance. If you have a 4-note cluster run on 3 nodes: ALTER SYSTEM SET "_smu_debug_mode"=1024; This will prevent SMON from performing dead-transaction recovery . They can enable txn recovery later at a more convenient time. The good part is that any data modified by this dead transaction but required by other transactions will be recovered on "as needed" basis. To resume transaction recovery do the following: ALTER SYSTEM SET "_smu_debug_mode"=0;
Reference
- PROCEDURE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY Specification (Doc ID 67590.1)
- ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry (Doc ID 290405.1)
- How To Resolve Stranded DBA_2PC_PENDING Entries (Doc ID 401302.1)
- Bug 1561125 SMON may stop recovering dead transactions causing sessions to hang
- Master Note: Troubleshooting Database Transaction Recovery (Doc ID 1494886.1)
- Bug 18853638 – GV$FAST_START_TRANSACTIONS SHOWS STANDSTILL RECOVERING TRANSACTIONS (RCA)
- Bug 17614504 : SMON: SERIAL TRANSACTION RECOVERY CAUGHT EXCEPTION 30319, HIGH CPU LOAD
- Bug 18526660 : SMON TRANSACTION RECOVERY IS VERY SLOW