Case V : GIPCD doesn’t start – mismatch between profile.xml and the PUBLIC INTERFACE address
Potential problem:
- /etc/hosts and nslookup not in sync
- PUBLIC interface was changed without changing profile.xml
- DNS returned a wrong host address
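The first and third causes come down to a host name mapping to an unexpected address. A minimal sketch in Python for listing what a hosts file says about a name, assuming the usual layout (address first, then one or more names). The helper name hosts_addresses and the sample text are illustrative only and not part of any Oracle tool.

```python
def hosts_addresses(hosts_text, name):
    """Return every address that the hosts-file text maps to `name`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            found.append(address)
    return found


# Illustrative sample; on a real node read the text from /etc/hosts instead.
sample = """
127.0.0.1      localhost
192.168.7.121  hract21 hract21.example.com   # stale entry
"""
print(hosts_addresses(sample, "hract21"))
```

On a live node the returned list can be compared by hand with the address that nslookup reports for the same name; any difference points to the first cause in the list above.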
Force that error and monitor the Clusterware resource status after startup:

*****  Local Resources: *****
Resource NAME                  INST  TARGET       STATE        SERVER          STATE_DETAILS
---------------------------    ----  ------------ ------------ --------------- -----------------------------------------
ora.asm                        1     ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1     ONLINE       OFFLINE      -               STABLE
ora.crf                        1     ONLINE       ONLINE       hract21         STABLE
ora.crsd                       1     ONLINE       OFFLINE      -               STABLE
ora.cssd                       1     ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1     ONLINE       ONLINE       hract21         STABLE
ora.ctssd                      1     ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1     ONLINE       OFFLINE      -               STABLE
ora.drivers.acfs               1     ONLINE       ONLINE       hract21         STABLE
ora.evmd                       1     ONLINE       INTERMEDIATE hract21         STABLE
ora.gipcd                      1     ONLINE       OFFLINE      -               STABLE
ora.gpnpd                      1     ONLINE       ONLINE       hract21         STABLE
ora.mdnsd                      1     ONLINE       ONLINE       hract21         STABLE
ora.storage                    1     ONLINE       OFFLINE      -               STABLE
--> GIPCD doesn't start

CLUVFY:
[grid@hract21 CLUVFY]$ cluvfy comp nodecon -n hract21,hract22
Verifying node connectivity
ERROR:
PRVF-6006 : unable to reach the IP addresses "hract21,hract22" from the local node
PRKC-1071 : Nodes "hract21,hract22" did not respond to ping in "3" seconds,
PRKN-1035 : Host "hract21" is unreachable
PRKN-1035 : Host "hract22" is unreachable
Verification cannot proceed
Verification of node connectivity was unsuccessful on all the specified nodes.
TRACEFILE review :
gipcd.trc:
2015-02-17 11:48:39.300878 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
2015-02-17 11:48:39.300880 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
2015-02-17 11:48:39.300882 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: slos loc : bind
2015-02-17 11:48:39.300884 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: slos info: addr '192.168.7.121:0'
2015-02-17 11:48:39.300920 :GIPCXCPT:3369244416: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] failed to bind endp 0x7fb6a4027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.7.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fb6a4033bd0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fb6a4033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-17 11:48:39.300928 :GIPCXCPT:3369244416: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)

Grep Command :
[grid@hract21 trace]$ grep "2015-02-17 11:4" * | egrep 'gipcmodNetworkProcessBind'
gipcd.trc:2015-02-17 11:48:38.129278 :GIPCXCPT:2967607040: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
gipcd.trc:2015-02-17 11:48:38.129280 :GIPCXCPT:2967607040: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
gipcd.trc:2015-02-17 11:48:38.129281 :GIPCXCPT:2967607040: gipcmodNetworkProcessBind: slos loc : bind
gipcd.trc:2015-02-17 11:48:38.129283 :GIPCXCPT:2967607040: gipcmodNetworkProcessBind: slos info: addr '192.168.7.121:0'
--> The grep command is a quick way to isolate the failing bind() and the address gipcd tried to use.
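The same filtering can be scripted when there are many trace files to scan. The sketch below pulls the address out of the "slos info" records shown above and reports each distinct address once. The function name failed_bind_addresses and the regular expression are assumptions based only on the record layout in this excerpt; other Clusterware versions may format these lines differently.

```python
import re

# Matches the "slos info" record that carries the address passed to bind().
# The pattern is derived from the gipcd.trc excerpt above (an assumption).
BIND_ADDR = re.compile(
    r"gipcmodNetworkProcessBind:\s*slos info:\s*addr\s*'([^']+)'"
)


def failed_bind_addresses(lines):
    """Return the distinct addresses from failed-bind records, first seen first."""
    seen = []
    for line in lines:
        match = BIND_ADDR.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


trace = [
    "2015-02-17 11:48:39.300880 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: "
    "slos dep : Cannot assign requested address (99)",
    "2015-02-17 11:48:39.300884 :GIPCXCPT:3369244416: gipcmodNetworkProcessBind: "
    "slos info: addr '192.168.7.121:0'",
    "gipcd.trc:2015-02-17 11:48:38.129283 :GIPCXCPT:2967607040: "
    "gipcmodNetworkProcessBind: slos info: addr '192.168.7.121:0'",
]
print(failed_bind_addresses(trace))
```

Against a real trace directory, the lines argument could simply be an open file object for gipcd.trc.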
DTRACE SCRIPT :
/* Generic DTRACE script tracking IP-Address and ports for bind() system calls: */
syscall::bind:entry
{
    self->fd = arg0;
    self->sockaddr = arg1;
    sockaddrp = (struct sockaddr *)copyin(self->sockaddr, sizeof(struct sockaddr));
    s = (char *)sockaddrp;
    /* port is stored in network byte order in bytes 2 and 3 */
    self->port = (unsigned short)(*(s+3)) + (unsigned short)((*(s+2)*256));
    /* IPv4 address occupies bytes 4 to 7 */
    self->ip1 = *(s+4);
    self->ip2 = *(s+5);
    self->ip3 = *(s+6);
    self->ip4 = *(s+7);
}

/* Generic DTRACE script tracking failed bind() system calls: */
syscall::bind:return
/arg0<0 && execname != "crsctl.bin"/
{
    printf("- Exec: %s - PID: %d bind() failed with error : %d - fd : %d - IP: %d.%d.%d.%d - Port: %d ",
        execname, pid, arg0, self->fd, self->ip1, self->ip2, self->ip3, self->ip4, self->port);
}

DTRACE OUTPUT :
[root@hract21 DTRACE]# dtrace -s check_rac.d
dtrace: script 'check_rac.d' matched 21 probes
CPU  ID  FUNCTION:NAME
  0   1  :BEGIN GRIDHOME: /u01/app/121/grid - GRIDHOME/bin: /u01/app/121/grid/bin - Temp Loc: /var/tmp/.oracle - PIDFILE: hract21.pid - Port for bind: 53
  0   9  open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
  0   9  open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
  0  89  connect:return - Exec: mdnsd.bin - PID: 26518 connect() failed with error : -101 - fd : 39 - IP: 17.17.17.17 - Port: 256
  0 103  bind:return - Exec: gipcd.bin - PID: 26658 bind() failed with error : -99 - fd : 87 - IP: 192.168.7.121 - Port: 0
  0 103  bind:return - Exec: gipcd.bin - PID: 26696 bind() failed with error : -99 - fd : 87 - IP: 192.168.7.121 - Port: 0
  0 103  bind:return - Exec: gipcd.bin - PID: 26722 bind() failed with error : -99 - fd : 87 - IP: 192.168.7.121 - Port: 0
  0 103  bind:return - Exec: gipcd.bin - PID: 26740 bind() failed with error : -99 - fd : 87 - IP: 192.168.7.121 - Port: 0
  0 103  bind:return - Exec: gipcd.bin - PID: 26757 bind() failed with error : -99 - fd : 87 - IP: 192.168.7.121 - Port: 0

Investigate & Fix :
Check profile.xml
[root@hract21 network-scripts]# $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
    <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
    <orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
-> eth1 is our PUBLIC network interface, with 192.168.5.0 as the related NETWORK address

[root@hract21 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:7D:8E:49
          inet addr:192.168.5.121  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe7d:8e49/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--> ifconfig looks good, so why is gipcd.bin picking up 192.168.7.121 ?

[root@hract21 Desktop]# ping hract21
PING hract21 (192.168.7.121) 56(84) bytes of data.
--> ping resolves hract21 to the wrong address too, and hangs

[root@hract21 Desktop]# grep hract21 /etc/hosts
192.168.7.121 hract21 hract21.example.com
--> /etc/hosts maps hract21 to 192.168.7.121, which is not on the public network 192.168.5.0 and is not configured on eth1

FIX --> Correct the hract21 entry in /etc/hosts so that it matches the address configured on eth1 (192.168.5.121)
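The comparison done by eye above (does the address the host name resolves to belong to the public network recorded in profile.xml?) can be sketched with Python's standard ipaddress module. The function name in_public_network is hypothetical; the network address is taken from the profile.xml output and the netmask from the ifconfig output shown above.

```python
import ipaddress


def in_public_network(address, network, netmask):
    """Return True if `address` lies inside the subnet `network`/`netmask`."""
    subnet = ipaddress.ip_network(f"{network}/{netmask}")
    return ipaddress.ip_address(address) in subnet


# Address from /etc/hosts versus the public network from profile.xml
print(in_public_network("192.168.7.121", "192.168.5.0", "255.255.255.0"))  # False
# Address actually configured on eth1
print(in_public_network("192.168.5.121", "192.168.5.0", "255.255.255.0"))  # True
```

A False result for the address a node name resolves to corresponds to the mismatch in this case: gipcd tries to bind to an address that no local interface owns and fails with error 99.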