What to do first?
- Check your disk space using: # df
- Check whether a firewall is running: # service iptables status (this command is very important)
- Use nslookup and ping to verify your Cluster Interconnect
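The checklist above can be wrapped in a small helper so that each pre-check prints a single result line. This is only a sketch: the function name run_check is made up for illustration, and the host name in the commented calls is taken from Scenario 1 and needs to be replaced with your own interconnect name.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Runs one pre-check, discards its output and prints PASS or FAIL
# depending on the exit status of the command.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
    fi
}

# Example calls (assumption: grac41int is the interconnect host name):
#   run_check "root filesystem readable" df -P /
#   run_check "interconnect resolves"    nslookup grac41int
#   run_check "interconnect pings"       ping -c 2 grac41int
```

The firewall check is left out on purpose: for "service iptables status" a successful exit status means the firewall is running, which is the opposite of what you want here, so inspect that output by hand.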
Scenario 1: Wrong IP Address
Errors:
  GIPC reports error [29] msg [gipcretConnectionRefused]
  CHM reports clsu_get_private_ip failed

Check CRS status
[root@grac41 Desktop]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

[root@grac41 network-scripts]# my_crs_stat_init
NAME                           TARGET     STATE      SERVER       STATE_DETAILS
-------------------------      ---------- ---------- ------------ ------------------
ora.asm                        ONLINE     OFFLINE                 Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE
ora.crf                        ONLINE     ONLINE     grac41
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     UNKNOWN    grac41
ora.cssdmonitor                ONLINE     ONLINE     grac41
ora.ctssd                      ONLINE     OFFLINE
ora.diskmon                    OFFLINE    OFFLINE
ora.drivers.acfs               ONLINE     ONLINE     grac41
ora.evmd                       ONLINE     OFFLINE
ora.gipcd                      ONLINE     ONLINE     grac41
ora.gpnpd                      ONLINE     ONLINE     grac41
ora.mdnsd                      ONLINE     ONLINE     grac41
--> ASM, HAIP, CRSD, CTSSD, DISKMON and EVMD resources are OFFLINE!

Check traces - ohasd trace file
[root@grac41 ohasd]# cat ohasd.log | grep -i failed
2014-04-22 15:09:17.966: [ AGFW][2735122176]{0:0:2} ora.cluster_interconnect.haip 1 1 received state from probe request. Old state = UNKNOWN, New state = FAILED
2014-04-22 15:09:30.292: [ GPNP][2745628416]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2014-04-22 15:09:30.602: [ GPNP][2717640448]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
--> HAIP goes to FAILED status

Try to find trace files that are updated repeatedly - maybe some RAC process is trying to fix the network problem
[grid@grac41 grac41]$ date; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:40 CEST 2014
2014-04-22 13:24:30.0571859790 ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610 ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:38.0881994320 ./ohasd/ohasd.log
2014-04-22 13:24:38.3523314350 ./gipcd/gipcd.log
2014-04-22 13:24:39.0876989250 ./crfmond/crfmond.log
[grid@grac41 grac41]$ date; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:43 CEST 2014
2014-04-22 13:24:30.0571859790 ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610 ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060 ./ohasd/ohasd.log
2014-04-22 13:24:43.3668374000 ./gipcd/gipcd.log
2014-04-22 13:24:43.7580328990 ./crfmond/crfmond.log
[grid@grac41 grac41]$ date; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:47 CEST 2014
2014-04-22 13:24:30.0571859790 ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610 ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060 ./ohasd/ohasd.log
2014-04-22 13:24:44.0972023860 ./crfmond/crfmond.log
2014-04-22 13:24:46.4033548850 ./gipcd/gipcd.log
--> Here we can see that ./ohasd/ohasd.log, ./gipcd/gipcd.log and ./crfmond/crfmond.log are updated every few seconds

Use tail to see what's going on:
[grid@grac41 grac41]$ tail -f ./gpnpd/gpnpd.log
2014-04-22 13:19:59.175: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:21:29.469: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:22:59.792: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:24:30.057: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:26:00.383: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:27:30.622: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:29:00.869: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:30:31.203: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:32:01.459: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:33:31.770: [ OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]

[grid@grac41 grac41]$ tail -f ./ohasd/ohasd.log
2014-04-22 13:33:42.806: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:47.817: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:52.839: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:57.848: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:03.859: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:09.874: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:15.881: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:20.900: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:25.920: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:30.934: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

[grid@grac41 grac41]$ tail -f ./crfmond/crfmond.log
[ CLWAL][467654400]clsw_Initialize: OLR initlevel [70000]
2014-04-22 13:34:49.349: [ CRFM][467654400]crfm_connstr: clsu_get_private_ip failed(7).
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_connect_to: send fail(gipcret: 13)
2014-04-22 13:34:49.458: [ CRFM][467654400]crfmctx dump follows
2014-04-22 13:34:49.458: [ CRFM][467654400]****************************
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: connection local name: tcp://0.0.0.0:45871
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: connection peer name: tcp://192.168.1.101:61021
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: connaddr: tcp://grac41:61021
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: ctype: 2
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: mytype: 0
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: hostname grac41
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: myport:
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: rhostname
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: rport:
2014-04-22 13:34:49.458: [ CRFM][467654400]crfm_dumpctx: flags: 1
2014-04-22 13:34:49.458: [ CRFM][467654400]****************************
According to the above traces, clsu_get_private_ip failed to get the private IP; the connection peer is tcp://192.168.1.101

Check Network status and DNS
[root@grac41 Desktop]# ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe89:e9a2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13307 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22041591 (21.0 MiB)  TX bytes:1211055 (1.1 MiB)
          Interrupt:9 Base address:0xd240
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe6b:e2bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22191772 (21.1 MiB)  TX bytes:1230703 (1.1 MiB)
          Interrupt:5 Base address:0xd260
--> Check public and private interface for errors / Looks good

[root@grac41 Desktop]# nslookup grac41
Name:    grac41.example.com
Address: 192.168.1.101
[root@grac41 Desktop]# nslookup grac41int
Name:    grac41int.example.com
Address: 192.168.2.101
[root@grac41 Desktop]# nslookup 192.168.1.101
101.1.168.192.in-addr.arpa      name = grac41.example.com.
[root@grac41 Desktop]# nslookup 192.168.2.101
101.2.168.192.in-addr.arpa      name = grac41int.example.com.
--> DNS and network seem to be OK

Restart CRS
[root@grac41 Desktop]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'grac41'
CRS-2673: Attempting to stop 'ora.crf' on 'grac41'
CRS-2673: Attempting to stop 'ora.ctssd' on 'grac41'
CRS-2673: Attempting to stop 'ora.evmd' on 'grac41'
...
CRS-2673: Attempting to stop 'ora.gpnpd' on 'grac41'
CRS-2677: Stop of 'ora.gpnpd' on 'grac41' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'grac41' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Cleanup /var/tmp/.oracle
# rm /var/tmp/.oracle/*
[root@grac41 Desktop]# crsctl start crs
[root@grac41 Desktop]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
--> Problem persists

Check OS logfile
# cat /var/log/messages
--> Nothing related

Run ocrcheck (and ocrdump) to check whether we can access our OCR repository
[root@grac41 Desktop]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  : 3
         Total space (kbytes)     : 262120
         Used space (kbytes)      : 4076
         Available space (kbytes) : 258044
         ID                       : 630679368
         Device/File Name         : +OCR
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

Query voting disk:
[grid@grac41 grac41]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                 Disk group
--  -----    -----------------                ---------                 ---------
 1. ONLINE   b0e94e5d83054fe9bf58b6b98bfacd65 (/dev/asmdisk1_udev_sdf1) [OCR]
 2. ONLINE   88c2a08b4c8c4f85bf0109e0990388e4 (/dev/asmdisk1_udev_sdg1) [OCR]
 3. ONLINE   1108f9a41e814fb2bfed879ff0039dd0 (/dev/asmdisk1_udev_sdh1) [OCR]
Located 3 voting disk(s).

Debugging GIPCD and GPnPD daemons using strace
As the GIPCD and GPnPD daemon traces get updated every 5 seconds, let's check the gipcd process with strace
# ps -elf | egrep 'gpnpd.bin|gipcd.bin'
# strace -t -f -p 24376 2>&1 | grep '192.168' | grep eth
[pid 24872] 09:17:28 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
..
[pid 24872] 09:17:33 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
..
--> Again we don't get an OS error, but we are looping, running the same ioctl() calls
    It seems the daemon is not happy with the information returned by the ioctl() calls and rereads it every 5 seconds

Check GPnP profile
[root@grac41 Desktop]# gpnptool get > profile.xml
Edit profile.xml and extract the adapter usage
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>

Verify with ifconfig
[root@grac41 Desktop]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet addr:127.0.0.1  Mask:255.0.0.0
--> eth1 is using 192.168.2.101 but according to the GPnP profile it should use 192.168.1.101
    eth2 is using 192.168.1.101 but according to the GPnP profile it should use 192.168.2.101

Problem found: while manually editing ifcfg-eth1 and ifcfg-eth2 ( /etc/sysconfig/network-scripts ), the HWADDR entries were filled in incorrectly

Reconfigure and restart the network and CRS
[root@grac41 network-scripts]# cat ifcfg-eth2
HWADDR=08:00:27:89:E9:A2
IPADDR=192.168.2.101
NAME=eth2
[root@grac41 network-scripts]# cat ifcfg-eth1
IPADDR=192.168.1.101
NAME=eth1
HWADDR=08:00:27:6B:E2:BD
After changing HWADDR to follow the above ifconfig output, the network looks good
[root@grac41 network-scripts]# service network restart
[root@grac41 network-scripts]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0

Restart CRS
[root@grac41 network-scripts]# crsctl stop crs -f
[root@grac41 network-scripts]# crsctl start crs
[root@grac41 network-scripts]# crsctl check cluster -all
**************************************************************
grac41:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac42:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac43:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

Lessons learned
- Verify carefully that IP addresses and network device names are consistent clusterwide
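The manual comparison between the GPnP profile and the ifconfig output in this scenario can be scripted. The following is a rough sketch, not an Oracle tool: the function name check_adapters and both input file formats are assumptions. You would have to extract "adapter subnet" pairs from profile.xml and "adapter address" pairs from ifconfig yourself, and the comparison only handles a 255.255.255.0 netmask as used in this cluster.

```shell
#!/bin/sh
# check_adapters PROFILE_FILE LIVE_FILE
#   PROFILE_FILE: one "adapter subnet" pair per line, e.g. "eth1 192.168.1.0"
#   LIVE_FILE:    one "adapter address" pair per line, e.g. "eth1 192.168.2.101"
# Prints OK or MISMATCH for every live adapter that is listed in the profile.
# Assumption: /24 networks, so only the first three octets are compared.
check_adapters() {
    awk '
        NR == FNR { want[$1] = $2; next }   # first file: adapter -> expected subnet
        ($1 in want) {
            split(want[$1], w, ".")
            split($2, h, ".")
            if (w[1] == h[1] && w[2] == h[2] && w[3] == h[3])
                print "OK " $1 " " $2
            else
                print "MISMATCH " $1 " has " $2 ", profile expects subnet " want[$1]
        }
    ' "$1" "$2"
}
```

Fed with the values from this scenario (profile: eth1 on 192.168.1.0 and eth2 on 192.168.2.0; live: eth1 with 192.168.2.101 and eth2 with 192.168.1.101), it reports a MISMATCH line for each adapter.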
Scenario 2: Filesystem full ( 12c )
[root@gract1 Desktop]# crsi
*****  Local Resources: *****
Resource NAME                 INST   TARGET       STATE        SERVER          STATE_DETAILS
---------------------------   ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                          1   ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip    1   ONLINE       OFFLINE      -               STABLE
ora.crf                          1   ONLINE       OFFLINE      -               STABLE
ora.crsd                         1   ONLINE       OFFLINE      -               STABLE
ora.cssd                         1   ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                  1   OFFLINE      OFFLINE      -               STABLE
ora.ctssd                        1   ONLINE       OFFLINE      -               STABLE
ora.diskmon                      1   OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs                 1   ONLINE       ONLINE       gract1          STABLE
ora.evmd                         1   ONLINE       OFFLINE      gract1          STARTING
ora.gipcd                        1   ONLINE       OFFLINE      -               STABLE
ora.gpnpd                        1   ONLINE       OFFLINE      -               STABLE
ora.mdnsd                        1   ONLINE       OFFLINE      gract1          STARTING
ora.storage                      1   ONLINE       OFFLINE      -               STABLE

Related client trace
2014-08-22 10:57:07.750: [ OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.750: [ OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.750: [ OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]
2014-08-22 10:57:07.821: [ OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.821: [ OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.821: [ OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]

Root cause: the root file system is 100% full, so no traces can be written
# df -k
Filesystem                   1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_oel64-lv_root  39603624 37798864         0 100% /
tmpfs                          4194304      272   4194032   1% /dev/shm
/dev/sda1                       495844   101751    368493  22% /boot
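A full file system like the one in this scenario is easy to catch before starting CRS. The sketch below filters df output by usage; the function name full_filesystems and the 90% threshold in the usage line are arbitrary choices for illustration, and the input is expected in the one-line-per-filesystem layout that "df -P" produces.

```shell
#!/bin/sh
# full_filesystems LIMIT
# Reads "df -P" style output on stdin and prints "mountpoint use%"
# for every file system whose usage is at or above LIMIT percent.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "100%" -> "100"
            if (use + 0 >= limit)
                print $6 " " $5
        }
    '
}

# Usage (the threshold is an assumption, pick your own):
#   df -P | full_filesystems 90
```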
Scenario 3: Firewall ON
*****  Cluster Resources: *****
Resource NAME                 INST   TARGET       STATE        SERVER          STATE_DETAILS
---------------------------   ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                          1   ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip    1   ONLINE       OFFLINE      -               STABLE
ora.crf                          1   ONLINE       OFFLINE      -               STABLE
ora.crsd                         1   ONLINE       OFFLINE      -               STABLE
ora.cssd                         1   ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                  1   ONLINE       ONLINE       gract2          STABLE
ora.ctssd                        1   ONLINE       OFFLINE      -               STABLE
ora.diskmon                      1   OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs                 1   ONLINE       ONLINE       gract2          STABLE
ora.evmd                         1   ONLINE       INTERMEDIATE gract2          STABLE
ora.gipcd                        1   ONLINE       ONLINE       gract2          STABLE
ora.gpnpd                        1   ONLINE       ONLINE       gract2          STABLE
ora.mdnsd                        1   ONLINE       ONLINE       gract2          STABLE
ora.storage                      1   ONLINE       OFFLINE      -               STABLE
--> CSSD doesn't become ONLINE

Client log:
2014-08-23 11:49:21.920: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:49:42.948: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:10.978: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:46.008: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [ OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]

20665 <... connect resumed> ) = 0
20665 connect(66, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"}, 110 <unfinished ...>
20665 <... connect resumed> ) = 0
20665 connect(73, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110 <unfinished ...>
20665 <... connect resumed> ) = -1 ECONNREFUSED (Connection refused)

ocssd.log:
2014-08-23 12:32:58.427: [ CSSD][1279260416]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, DHB has rcfg 304252836, wrtcnt, 3207223, LATS 4294823390, lastSeqNo 3207220, uniqueness 1408783210, timestamp 1408789980/5988764
2014-08-23 12:32:58.427: [ CSSD][1283991296]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, DHB has rcfg 304252836, wrtcnt, 3207224, LATS 4294823390, lastSeqNo 3207221, uniqueness 1408783210, timestamp 1408789980/5988864
- Fix : Disable Firewall
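The "has a disk HB, but no network HB" message is the key symptom here, so it can help to list which nodes it is reported for. The sketch below does this for an ocssd.log read from stdin; the function name no_network_hb_nodes is made up, and the pattern is derived only from the two log lines shown in this scenario. The commented firewall commands assume an Enterprise Linux 6 style system where iptables is managed through service and chkconfig.

```shell
#!/bin/sh
# no_network_hb_nodes
# Reads ocssd.log content on stdin and prints "node-number node-name" once
# for every node that has a disk heartbeat but no network heartbeat.
no_network_hb_nodes() {
    sed -n 's/.*clssnmvDHBValidateNCopy: node \([0-9][0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' |
        sort -u
}

# Usage (replace the path with the location of your ocssd.log):
#   no_network_hb_nodes < /path/to/ocssd.log
#
# Disabling the firewall (assumption: EL6-style init scripts):
#   service iptables stop
#   chkconfig iptables off
```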
References
- Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1210883.1)
- Grid Infrastructure Installation root.sh Failed with “Failed to start CTSS” (Doc ID 1277307.1)
- Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
- Top 5 Grid Infrastructure Startup Issues (Doc ID 1368382.1)