Table of Contents
Verify ASM disk status by check RDBMS alert.log, clusterware alert.log, OCSSD trace and OS logs
Rdbms alert.log reports: Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_mmon_26635.trc: ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 5: Input/output error Additional information: -1 Additional information: 16384 WARNING: Write Failed. group:1 disk:1 AU:1140 offset:638976 size:16384 Fri Oct 18 03:15:48 2013 Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_lgwr_26623.trc: ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 5: Input/output error Additional information: -1 Additional information: 2560 WARNING: Write Failed. group:1 disk:1 AU:2828 offset:3584 size:2560 Fri Oct 18 03:15:47 2013 Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_ckpt_26625.trc: ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 5: Input/output error Additional information: -1 Additional information: 16384 WARNING: Write Failed. group:1 disk:1 AU:1127 offset:49152 size:16384 Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_ckpt_26625.trc: ORA-15080: synchronous I/O operation to a disk failed WARNING: failed to write mirror side 2 of virtual extent 0 logical extent 1 of file 260 in group 1 on disk 1 allocation unit 1127 Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_lgwr_26623.trc: CRS alert.log reports that we have problems with disk /dev/oracleasm/disks/DATA2 with an active vote file [cssd(4294)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/DATA2; details at (:CSSNM00059:) in /u01/app/11203/grid/log/grac1/cssd/ocssd.log. 2013-10-18 03:15:42.850 [cssd(4294)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/DATA2; details at (:CSSNM00060:) in /u01/app/11203/grid/log/grac1/cssd/ocssd.log. 2013-10-18 03:16:07.872 [cssd(4294)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/disks/DATA2 will be considered not functional in 49850 milliseconds 2013-10-18 03:16:37.882 [cssd(4294)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/disks/DATA2 will be considered not functional in 19840 milliseconds 2013-10-18 03:16:48.456 [cssd(4294)]CRS-1626:A Configuration change request completed successfully 2013-10-18 03:16:48.466 [cssd(4294)]CRS-1601:CSSD Reconfiguration complete. Active nodes are grac1 grac2 . 2013-10-18 03:41:37.527 . Get an overview from OCSSD log $ grep voting /u01/app/11203/grid/log/grac1/cssd/ocssd.log 2013-10-17 15:10:38.470: [ CSSD][4259710720] Listing unique IDs for 3 voting files: 2013-10-17 15:10:38.470: [ CSSD][4259710720] voting file 1: 7a73147a-81dc4f71-bfc87573-43aee181 2013-10-17 15:10:38.470: [ CSSD][4259710720] voting file 2: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2 2013-10-17 15:10:38.470: [ CSSD][4259710720] voting file 3: 4cfccf65-b6e24ff0-bfd69b60-297f87c4 2013-10-17 15:10:38.965: [ CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA3 now online 2013-10-17 15:10:39.013: [ CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA2 now online 2013-10-17 15:10:39.081: [ CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA1 now online 2013-10-18 03:15:41.852: [ CSSD][4248069888]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA2 now offline 2013-10-18 03:16:48.258: [ CSSD][4262254336] Listing unique IDs for 2 voting files: 2013-10-18 03:16:48.258: [ CSSD][4262254336] voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2 2013-10-18 03:16:48.258: [ CSSD][4262254336] voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4 2013-10-18 03:16:48.456: [ CSSD][4232300288] Listing unique IDs for 2 voting files: 2013-10-18 03:16:48.456: [ CSSD][4232300288] voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2 2013-10-18 03:16:48.456: [ CSSD][4232300288] voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4 2013-10-18 03:16:48.456: [ CSSD][4232300288]clssnmSetParamsFromConfig: remote SIOT 27000, local SIOT 27000, LIOT 200000, misstime 30000, reboottime 3000, impending misstime 15000, voting file reopen delay 4000 2013-10-18 03:16:48.918: [ CSSD][4249646848]clssnmvDiskDestroy: removing the voting disk /dev/oracleasm/disks/DATA2 Check for details from OCSSD log: 2013-10-18 03:16:48.456: [ CSSD][4232300288]clssnmCompleteConfigChange: Committed configuration change for CIN 0:1382059008:0 2013-10-18 03:16:48.456: [ CSSD][4232300288] misscount 30 reboot latency 3 2013-10-18 03:16:48.456: [ CSSD][4232300288] long I/O timeout 200 short I/O timeout 27 2013-10-18 03:16:48.456: [ CSSD][4232300288] diagnostic wait 0 active version 11.2.0.3.0 2013-10-18 03:16:48.456: [ CSSD][4232300288] Listing unique IDs for 2 voting files: 2013-10-18 03:16:48.456: [ CSSD][4232300288] voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2 2013-10-18 03:16:48.456: [ CSSD][4232300288] voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4 2013-10-18 03:16:48.456: [ CSSD][4232300288]clssnmSetParamsFromConfig: remote SIOT 27000, local SIOT 27000, LIOT 200000, misstime 30000, reboottime 3000, impending misstime 15000, voting file reopen delay 4000 2013-10-18 03:16:48.456: [ CSSD][4232300288]clssnmvDiskStateChange: state from configured to deconfigured disk /dev/oracleasm/disks/DATA2 2013-10-18 03:16:48.456: [ CSSD][4232300288]clssnmCompleteGMReq: Completed request type 1 with status 1 Verify OS log /var/log/messages reports disks errors at this time Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 08 00 4f 00 02 00 00 Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 524367 Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 08 00 50 00 00 01 00 Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 524368 Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 23 a5 1f 00 00 20 00 Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 2336031 Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] READ CAPACITY(16) failed Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Sense not available. Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] READ CAPACITY failed
- At this stage we know we an IO problem with a diskgroup servicing a voting disk
Relocate Voting disk and add a 2.nd OCR location before touching this risky ASM diskgroup
$ crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE 25fcfbdb854a4f49bf0addd0fa32d0a2 (/dev/oracleasm/disks/DATA3) [DATA] 2. ONLINE 4cfccf65b6e24ff0bfd69b60297f87c4 (/dev/oracleasm/disks/DATA1) [DATA] Located 2 voting disk(s). --> As expected we can only see 2 voting disk Replace the Voting disk to a new diskgroup $ crsctl replace votedisk +OCR Successful addition of voting disk 491496d5c8fd4f2bbfcf7a004a362538. Successful addition of voting disk 4413ebdbad0c4f5bbf5644a8e75a076d. Successful addition of voting disk 9b7d7440f2eb4fbabf23c6f2ca92471c. Successful deletion of voting disk 25fcfbdb854a4f49bf0addd0fa32d0a2. Successful deletion of voting disk 4cfccf65b6e24ff0bfd69b60297f87c4. Successfully replaced voting disk group with +OCR. CRS-4266: Voting file(s) successfully replaced $ crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE 491496d5c8fd4f2bbfcf7a004a362538 (/dev/oracleasm/disks/OCR1) [OCR] 2. ONLINE 4413ebdbad0c4f5bbf5644a8e75a076d (/dev/oracleasm/disks/OCR2) [OCR] 3. ONLINE 9b7d7440f2eb4fbabf23c6f2ca92471c (/dev/oracleasm/disks/OCR3) [OCR] Located 3 voting disk(s). --> We have again 3 voting disk Add a 2.nd OCR location $ $GRID_HOME/bin/ocrconfig -add +OCR $ $GRID_HOME/bin/ocrcheck Status of Oracle Cluster Registry is as follows : Version : 3 Total space (kbytes) : 262120 Used space (kbytes) : 3400 Available space (kbytes) : 258720 ID : 29535125 Device/File Name : +DATA Device/File integrity check succeeded Device/File Name : +OCR Device/File integrity check succeeded
Add the disk back to our diskgroup
If you try to use the old media ( maybe because only your controller was changed or you use rreuse a Virtual disk from Virtualbox) you may get the following errors: SQL> ALTER DISKGROUP DATA drop disk '/dev/oracleasm/disks/DATA2'; ALTER DISKGROUP DATA drop disk '/dev/oracleasm/disks/DATA2' ERROR at line 1: ORA-15032: not all alterations performed ORA-15054: disk "/DEV/ORACLEASM/DISKS/DATA2" does not exist in diskgroup "DATA" But if you try to add the disk to the diskgroup you get following error: SQL> ALTER DISKGROUP DATA add disk '/dev/oracleasm/disks/DATA2'; ALTER DISKGROUP DATA add disk '/dev/oracleasm/disks/DATA2' ERROR at line 1: ORA-15032: not all alterations performed ORA-15033: disk '/dev/oracleasm/disks/DATA2' belongs to diskgroup "DATA" Fix: Be carefull as the next step can destroy your data. Double check that you know what you are doing. To avoid dd of='/dev/oracleasm/disks/DATA2' .. command to clean up the disk header use SQL> ALTER DISKGROUP DATA add disk '/dev/oracleasm/disks/DATA2' force; $ asmcmd lsdg -g DATA Inst_ID State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name 1 MOUNTED NORMAL Y 512 4096 1048576 15342 9454 5114 -386 1 N DATA/ 3 MOUNTED NORMAL Y 512 4096 1048576 15342 9454 5114 -386 1 N DATA/ 2 MOUNTED NORMAL Y 512 4096 1048576 15342 9454 5114 -386 1 N DATA/ --> Here our diskgroup is back again If you wonder about negative values for Usable_file_MB go on reading this page.