Node Eviction due to failed DHB ( Disk Heartbeat )
Triggering a DHB Note Eviction on Node grac2 :
Disable the first voting disk ( we still have 2 remaining votedisks - no eviction should occur ) ) # echo 1 > /sys/block/sdg/device/delete Monitor $GRID_HOME/log/alertgrac2.log 2013-08-02 18:53:26.148 cssd(21694)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/OCR1; details at (:CSSNM00059:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log [cssd(21694)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1 will be considered not functional in 99340 milliseconds 2013-08-02 18:55:55.848 [cssd(21694)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1 will be considered not functional in 49320 milliseconds [cssd(21694)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1 will be considered not functional in 19310 milliseconds 2013-08-02 18:56:45.871 [cssd(21694)]CRS-1604:CSSD voting file is offline: /dev/oracleasm/disks/OCR1; details at (:CSSNM00058:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log. Check voting disk status: # $GRID_HOME/bin/crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. OFFLINE 5c9b3326e5a74f5abf9d300f3e76c6d0 (/dev/oracleasm/disks/OCR1) [OCR] 2. ONLINE a6a0becb42874faabf14a50ab74996c2 (/dev/oracleasm/disks/OCR2) [OCR] 3. ONLINE 417df0660a424fb1bf2db116acfdb47b (/dev/oracleasm/disks/OCR3) [OCR] Disable the 2.nd voting disk ( eviction takes place ) # echo 1 > /sys/block/sdg/device/delete Monitor $GRID_HOME/log/alertgrac2.log [cssd(21694)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/OCR3; details at (:CSSNM00059:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log. 2013-08-02 19:43:06.212 [cssd(21694)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 99760 milliseconds 2013-08-02 19:43:56.239 [cssd(21694)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 49740 milliseconds [cssd(21694)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1 will be considered not functional in 19310 milliseconds 2013-08-02 18:56:45.871 [cssd(21694)]CRS-1604:CSSD voting file is offline: /dev/oracleasm/disks/OCR1; details at (:CSSNM00058:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log. c2013-08-02 19:22:20.836 [ctssd(21782)]CRS-2409:The clock on host grac2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode. 2013-08-02 19:41:26.958 [cssd(21694)]CRS-1649:An I/O error occured for voting file: /dev/oracleasm/disks/OCR3; details at (:CSSNM00059:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log. 2013-08-02 19:43:06.212 [cssd(21694)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 99760 milliseconds 2013-08-02 19:43:56.239 [cssd(21694)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 49740 milliseconds 2013-08-02 19:44:26.252 [cssd(21694)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 19730 milliseconds 2013-08-02 19:44:46.265 [cssd(21694)]CRS-1604:CSSD voting file is offline: /dev/oracleasm/disks/OCR3; details at (:CSSNM00058:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log. 2013-08-02 19:44:46.265 [cssd(21694)]CRS-1606:The number of voting files available, 1, is less than the minimum number of voting files required, 2, resulting in CSSD termination to ensure data integrity; details at (:CSSNM00018:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:44:46.265 [cssd(21694)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:44:46.309 [cssd(21694)]CRS-1652:Starting clean up of CRSD resources. 2013-08-02 19:44:47.792 [/u01/app/11203/grid/bin/oraagent.bin(22056)]CRS-5016:Process "/u01/app/11203/grid/opmn/bin/onsctli" spawned by agent "/u01/app/11203/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11203/grid/log/grac2/agent/crsd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:44:48.422 [/u01/app/11203/grid/bin/oraagent.bin(22056)]CRS-5016:Process "/u01/app/11203/grid/bin/lsnrctl" spawned by agent "/u01/app/11203/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11203/grid/log/grac2/agent/crsd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:44:48.423 [/u01/app/11203/grid/bin/oraagent.bin(22056)]CRS-5016:Process "/u01/app/11203/grid/bin/lsnrctl" spawned by agent "/u01/app/11203/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11203/grid/log/grac2/agent/crsd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:44:48.430 [cssd(21694)]CRS-1654:Clean up of CRSD resources finished successfully. 2013-08-02 19:44:48.431 [cssd(21694)]CRS-1655:CSSD on node grac2 detected a problem and started to shutdown. 2013-08-02 19:44:48.487 [ohasd(21483)]CRS-2765:Resource 'ora.crsd' has failed on server 'grac2'. 2013-08-02 19:44:48.509 [/u01/app/11203/grid/bin/oraagent.bin(22056)]CRS-5822:Agent '/u01/app/11203/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:5} in /u01/app/11203/grid/log/grac2/agent/crsd/oraagent_grid/oraagent_grid.log. 2013-08-02 19:44:48.510 [/u01/app/11203/grid/bin/orarootagent.bin(22073)]CRS-5822:Agent '/u01/app/11203/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:86} in /u01/app/11203/grid/log/grac2/agent/crsd/orarootagent_root/orarootagent_root.log. 2013-08-02 19:44:48.512 [/u01/app/11203/grid/bin/oraagent.bin(22296)]CRS-5822:Agent '/u01/app/11203/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:5:299} in /u01/app/11203/grid/log/grac2/agent/crsd/oraagent_oracle/oraagent_oracle.log. 2013-08-02 19:44:51.916 [crsd(26171)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /u01/app/11203/grid/log/grac2/crsd/crsd.log. 2013-08-02 19:44:51.935 [ohasd(21483)]CRS-2765:Resource 'ora.ctssd' has failed on server 'grac2'. 2013-08-02 19:44:52.158 [ohasd(21483)]CRS-2765:Resource 'ora.evmd' has failed on server 'grac2'. 2013-08-02 19:44:52.340 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:44:52.517 [ohasd(21483)]CRS-2765:Resource 'ora.asm' has failed on server 'grac2'. 2013-08-02 19:44:52.969 [ctssd(26187)]CRS-2402:The Cluster Time Synchronization Service aborted on host grac2. Details at (:ctss_css_init1:) in /u01/app/11203/grid/log/grac2/ctssd/octssd.log. 2013-08-02 19:44:56.940 [ohasd(21483)]CRS-2765:Resource 'ora.crsd' has failed on server 'grac2'. 2013-08-02 19:44:58.212 [ctssd(26221)]CRS-2402:The Cluster Time Synchronization Service aborted on host grac2. Details at (:ctss_css_init1:) in /u01/app/11203/grid/log/grac2/ctssd/octssd.log. 2013-08-02 19:45:03.254 [ctssd(26233)]CRS-2402:The Cluster Time Synchronization Service aborted on host grac2. Details at (:ctss_css_init1:) in /u01/app/11203/grid/log/grac2/ctssd/octssd.log. 2013-08-02 19:45:04.740 [ohasd(21483)]CRS-2765:Resource 'ora.cssd' has failed on server 'grac2'. 2013-08-02 19:45:04.761 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:45:04.767 [ohasd(21483)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'grac2'. 2013-08-02 19:45:04.941 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:45:05.118 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:45:06.593 [ohasd(21483)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'grac2'. 2013-08-02 19:45:07.277 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:45:07.488 [/u01/app/11203/grid/bin/oraagent.bin(21601)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11203/grid/log/grac2/agent/ohasd/oraagent_grid/oraagent_grid.log" 2013-08-02 19:45:13.144 [cssd(26283)]CRS-1713:CSSD daemon is started in clustered mode 2013-08-02 19:45:13.187 [cssd(26283)]CRS-1637:Unable to locate configured voting file with ID 5c9b3326-e5a74f5a-bf9d300f-3e76c6d0; details at (:CSSNM00020:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:45:13.187 [cssd(26283)]CRS-1637:Unable to locate configured voting file with ID 417df066-0a424fb1-bf2db116-acfdb47b; details at (:CSSNM00020:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:45:13.187 [cssd(26283)]CRS-1705:Found 1 configured voting files but 2 voting files are required, terminating to ensure data integrity; details at (:CSSNM00021:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:45:13.188 [cssd(26283)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac2/cssd/ocssd.log 2013-08-02 19:45:13.200 [cssd(26283)]CRS-1603:CSSD on node grac2 shutdown by user.
CRS errors reported in CRS alert.log
Node grac3 was triggered for voting disks errors: CRS Alert Log Events: 2013-10-24 09:50:54 No I/O has completed after 50% of the maximum interval files/alertgrac3.log [cssd(3002)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR3 will be considered not functional in 99010 milliseconds 2013-10-24 09:52:36 No I/O has completed after 50% of the maximum interval files/alertgrac3.log [cssd(3002)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR2 will be considered not functional in 99550 milliseconds --> here we have lost 2 of your 3 voting disks 2013-10-24 09:54:16 The CSS daemon is terminating due to a fatal error files/alertgrac3.log [cssd(3002)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac3/cssd/ocssd.log 2013-10-24 09:55:06 The CSS daemon is terminating due to a fatal error files/alertgrac3.log [cssd(2203)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac3/cssd/ocssd.log 2013-10-24 09:55:19 CRS-1603:CSSD on node %s shutdown by user. files/alertgrac3.log [cssd(3395)]CRS-1603:CSSD on node grac3 shutdown by user. 2013-10-24 09:55:19 The CSS daemon is terminating due to a fatal error files/alertgrac3.log [cssd(3395)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac3/cssd/ocssd.log 2013-10-24 09:56:08 Node %s, number %n, was manually shut down grac1/alertgrac1.log [cssd(3748)]CRS-1625:Node grac3, number 3, was manually shut down 2013-10-24 09:56:08 CRS-1601:CSSD Reconfiguration complete. Active nodes are %s grac1/alertgrac1.log [cssd(3748)]CRS-1601:CSSD Reconfiguration complete. Active nodes are grac1 grac2 . 2013-10-24 09:56:08 CRS-1601:CSSD Reconfiguration complete. Active nodes are %s grac1/alertgrac1.log [cssd(3748)]CRS-1601:CSSD Reconfiguration complete. Active nodes are grac1 grac2 . 2013-10-24 09:56:08 Node %s, number %n, was manually shut down grac1/alertgrac1.log [cssd(3748)]CRS-1625:Node grac3, number 3, was manually shut down --> grac1 reports that we have lost node grac3 2013-10-24 09:56:35 CRS-1601:CSSD Reconfiguration complete. Active nodes are %s files/alertgrac2.log [cssd(2977)]CRS-1601:CSSD Reconfiguration complete. Active nodes are grac1 grac2 . 2013-10-24 09:56:35 Node %s, number %n, was manually shut down files/alertgrac2.log [cssd(2977)]CRS-1625:Node grac3, number 3, was manually shut down 2013-10-24 10:05:30 CRS-1603:CSSD on node %s shutdown by user. files/alertgrac3.log [cssd(3190)]CRS-1603:CSSD on node grac3 shutdown by user. 2013-10-24 10:05:30 The CSS daemon is terminating due to a fatal error files/alertgrac3.log [cssd(3190)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11203/grid/log/grac3/cssd/ocssd.log --> grac2 reports that we have lost node grac3 Node Eviction Events: 2013-10-24 09:54:16 Voting disks unavailable cssd/ocssd.log Voting disks unavailable 2013-10-24 09:54:16.582: [ CSSD][2156365568](:CSSNM00018:)clssnmvDiskCheck: Aborting, 1 of 3 configured voting disks available, need 2 2013-10-24 09:54:16 Voting disks unavailable cssd/ocssd.log Voting disks unavailable 2013-10-24 09:54:16.582: [ CSSD][2156365568](:CSSNM00018:)clssnmvDiskCheck: Aborting, 1 of 3 configured voting disks available, need 2 2013-10-24 09:54:16 Voting disks unavailable files/alertgrac3.log Voting disks unavailable [cssd(3002)]CRS-1606:The number of voting files available, 1, is less than the minimum number of voting files required, 2, resulting in CSSD termination to ensure data integrity; details at (:CSSNM00018:) in /u01/app/11203/grid/log/grac3/cssd/ocssd.log