Recover from a lost ASM disk with a voting disk

Verify ASM disk status by check RDBMS  alert.log, clusterware alert.log, OCSSD trace and OS logs

Rdbms alert.log reports:
Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_mmon_26635.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 16384
WARNING: Write Failed. group:1 disk:1 AU:1140 offset:638976 size:16384
Fri Oct 18 03:15:48 2013
Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_lgwr_26623.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 2560
WARNING: Write Failed. group:1 disk:1 AU:2828 offset:3584 size:2560
Fri Oct 18 03:15:47 2013
Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_ckpt_26625.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 16384
WARNING: Write Failed. group:1 disk:1 AU:1127 offset:49152 size:16384
Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_ckpt_26625.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 2 of virtual extent 0 logical extent 1 of file 260 in group 1 on disk 1 allocation unit 1127 
Errors in file /u01/app/oracle/diag/rdbms/grace2/grac21/trace/grac21_lgwr_26623.trc:

CRS alert.log   reports that we have problems with disk /dev/oracleasm/disks/DATA2 with an active vote file 
[cssd(4294)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/DATA2; details at (:CSSNM00059:) in /u01/app/11203/grid/log/grac1/cssd/ocssd.log.
2013-10-18 03:15:42.850
[cssd(4294)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/DATA2; details at (:CSSNM00060:) in /u01/app/11203/grid/log/grac1/cssd/ocssd.log.
2013-10-18 03:16:07.872
[cssd(4294)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/disks/DATA2 will be considered not functional in 49850 milliseconds
2013-10-18 03:16:37.882
[cssd(4294)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/disks/DATA2 will be considered not functional in 19840 milliseconds
2013-10-18 03:16:48.456
[cssd(4294)]CRS-1626:A Configuration change request completed successfully
2013-10-18 03:16:48.466
[cssd(4294)]CRS-1601:CSSD Reconfiguration complete. Active nodes are grac1 grac2 .
2013-10-18 03:41:37.527
.
Get an overview from OCSSD log
$ grep voting /u01/app/11203/grid/log/grac1/cssd/ocssd.log
2013-10-17 15:10:38.470: [    CSSD][4259710720]  Listing unique IDs for 3 voting files:
2013-10-17 15:10:38.470: [    CSSD][4259710720]    voting file 1: 7a73147a-81dc4f71-bfc87573-43aee181
2013-10-17 15:10:38.470: [    CSSD][4259710720]    voting file 2: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2
2013-10-17 15:10:38.470: [    CSSD][4259710720]    voting file 3: 4cfccf65-b6e24ff0-bfd69b60-297f87c4
2013-10-17 15:10:38.965: [    CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA3 now online
2013-10-17 15:10:39.013: [    CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA2 now online
2013-10-17 15:10:39.081: [    CSSD][4277974784]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA1 now online
2013-10-18 03:15:41.852: [    CSSD][4248069888]clssnmvDiskAvailabilityChange: voting file /dev/oracleasm/disks/DATA2 now offline
2013-10-18 03:16:48.258: [    CSSD][4262254336]  Listing unique IDs for 2 voting files:
2013-10-18 03:16:48.258: [    CSSD][4262254336]    voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2
2013-10-18 03:16:48.258: [    CSSD][4262254336]    voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4
2013-10-18 03:16:48.456: [    CSSD][4232300288]  Listing unique IDs for 2 voting files:
2013-10-18 03:16:48.456: [    CSSD][4232300288]    voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2
2013-10-18 03:16:48.456: [    CSSD][4232300288]    voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4
2013-10-18 03:16:48.456: [    CSSD][4232300288]clssnmSetParamsFromConfig: remote SIOT 27000, local SIOT 27000, LIOT 200000, misstime 30000, reboottime 3000, impending misstime 15000, voting file reopen delay 4000
2013-10-18 03:16:48.918: [    CSSD][4249646848]clssnmvDiskDestroy: removing the voting disk /dev/oracleasm/disks/DATA2

Check for details from OCSSD log:
2013-10-18 03:16:48.456: [    CSSD][4232300288]clssnmCompleteConfigChange: Committed configuration change for CIN 0:1382059008:0
2013-10-18 03:16:48.456: [    CSSD][4232300288]  misscount          30    reboot latency      3
2013-10-18 03:16:48.456: [    CSSD][4232300288]  long I/O timeout  200    short I/O timeout  27
2013-10-18 03:16:48.456: [    CSSD][4232300288]  diagnostic wait     0  active version 11.2.0.3.0
2013-10-18 03:16:48.456: [    CSSD][4232300288]  Listing unique IDs for 2 voting files:
2013-10-18 03:16:48.456: [    CSSD][4232300288]    voting file 1: 25fcfbdb-854a4f49-bf0addd0-fa32d0a2
2013-10-18 03:16:48.456: [    CSSD][4232300288]    voting file 2: 4cfccf65-b6e24ff0-bfd69b60-297f87c4
2013-10-18 03:16:48.456: [    CSSD][4232300288]clssnmSetParamsFromConfig: remote SIOT 27000, local SIOT 27000, LIOT 200000, misstime 30000, reboottime 3000, impending misstime 15000, voting file reopen delay 4000
2013-10-18 03:16:48.456: [    CSSD][4232300288]clssnmvDiskStateChange: state from configured to deconfigured disk /dev/oracleasm/disks/DATA2
2013-10-18 03:16:48.456: [    CSSD][4232300288]clssnmCompleteGMReq: Completed request type 1 with status 1

Verify OS log /var/log/messages reports disks errors at this time
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 08 00 4f 00 02 00 00
Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 524367
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 08 00 50 00 00 01 00
Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 524368
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Unhandled error code
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 23 a5 1f 00 00 20 00
Oct 18 03:15:41 grac1 kernel: end_request: I/O error, dev sdc, sector 2336031
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] READ CAPACITY(16) failed
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] Sense not available.
Oct 18 03:15:41 grac1 kernel: sd 4:0:0:0: [sdc] READ CAPACITY failed
  • At this stage we know we an IO problem with a diskgroup servicing a voting disk

 

Relocate Voting disk and add a 2.nd OCR location before touching this risky ASM diskgroup

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   25fcfbdb854a4f49bf0addd0fa32d0a2 (/dev/oracleasm/disks/DATA3) [DATA]
 2. ONLINE   4cfccf65b6e24ff0bfd69b60297f87c4 (/dev/oracleasm/disks/DATA1) [DATA]
Located 2 voting disk(s).
--> As expected we can only see 2 voting disk 

Replace the Voting disk to a new diskgroup 
$ crsctl  replace votedisk +OCR
Successful addition of voting disk 491496d5c8fd4f2bbfcf7a004a362538.
Successful addition of voting disk 4413ebdbad0c4f5bbf5644a8e75a076d.
Successful addition of voting disk 9b7d7440f2eb4fbabf23c6f2ca92471c.
Successful deletion of voting disk 25fcfbdb854a4f49bf0addd0fa32d0a2.
Successful deletion of voting disk 4cfccf65b6e24ff0bfd69b60297f87c4.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   491496d5c8fd4f2bbfcf7a004a362538 (/dev/oracleasm/disks/OCR1) [OCR]
 2. ONLINE   4413ebdbad0c4f5bbf5644a8e75a076d (/dev/oracleasm/disks/OCR2) [OCR]
 3. ONLINE   9b7d7440f2eb4fbabf23c6f2ca92471c (/dev/oracleasm/disks/OCR3) [OCR]
Located 3 voting disk(s).
--> We have again 3 voting disk 

Add a 2.nd OCR location
$ $GRID_HOME/bin/ocrconfig -add +OCR
$ $GRID_HOME/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version                  :          3
     Total space (kbytes)     :     262120
     Used space (kbytes)      :       3400
     Available space (kbytes) :     258720
     ID                       :   29535125
     Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
     Device/File Name         :       +OCR
                                    Device/File integrity check succeeded

 

Add the disk back to our diskgroup

If you try to use the old media ( maybe because only your controller was changed or you use 
rreuse a Virtual disk from Virtualbox) you may get the following errors:
SQL> ALTER DISKGROUP DATA drop disk  '/dev/oracleasm/disks/DATA2';
ALTER DISKGROUP DATA drop disk    '/dev/oracleasm/disks/DATA2'
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/DEV/ORACLEASM/DISKS/DATA2" does not exist in diskgroup "DATA"

But if you try  to add the disk to the diskgroup you get following error:
SQL>  ALTER DISKGROUP DATA add  disk  '/dev/oracleasm/disks/DATA2';
 ALTER DISKGROUP DATA add  disk  '/dev/oracleasm/disks/DATA2'
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/oracleasm/disks/DATA2' belongs to diskgroup "DATA"

Fix: 
Be carefull as the next step can destroy your data. Double check that you know what you are doing. 
To avoid dd of='/dev/oracleasm/disks/DATA2' .. command to clean up  the disk header use 
SQL> ALTER DISKGROUP DATA  add  disk  '/dev/oracleasm/disks/DATA2' force;

$ asmcmd lsdg -g DATA
Inst_ID  State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
      1  MOUNTED  NORMAL  Y         512   4096  1048576     15342     9454             5114            -386              1             N  DATA/
      3  MOUNTED  NORMAL  Y         512   4096  1048576     15342     9454             5114            -386              1             N  DATA/
      2  MOUNTED  NORMAL  Y         512   4096  1048576     15342     9454             5114            -386              1             N  DATA/
--> Here our diskgroup is back again  
If you wonder about negative values for   Usable_file_MB go on reading this page.

 

References

Leave a Reply

Your email address will not be published. Required fields are marked *