Overview
- Changing the PUBLIC interface in a RAC environment is not that simple; you need to take the following into account:
- Nameserver changes
- DHCP server changes including VIPs
- /etc/hosts changes
- GNS VIP changes
- PUBLIC interface changes
# oifcfg getif -> eth1 192.168.5.0 global public
- In any case, read: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
If you still run into problems, here are some debugging tips:
- Note: this tutorial uses the 12.1.0.2 Clusterware (CW) log file structure, which simplifies grepping a lot,
as all traces can be found at: $GRID_HOME/diag/crs/hract21/crs/trace
- Download the crsi script and run it with the watch utility while your CRS stack is booting.
This gives you a good idea which component is failing or gets restarted and finally switches
to status OFFLINE.
- As said again and again, cluvfy is your friend for quickly identifying the root cause.
- If the network adapter info in profile.xml doesn’t match the ifconfig data, GIPCD will not start (this is true for both the PUBLIC and the CLUSTER_INTERCONNECT info).
In this tutorial we debug the following scenarios by reading log files, running OS commands and running cluvfy:
- Case I : Nameserver not responding – GIPCD not starting
- Case II : Different IP address in /etc/hosts and NameServer Lookup – GIPCD not starting
- Case III : Wrong Cluster Interconnect Address – GIPCD not starting
- Case IV : DHCP server sends wrong IP address – VIPs not starting
- Case V : Wrong GNS VIP address – GNS not starting
Potential Errors and Error types
In general we have 2 types of network-related errors:
- OS-related errors (either the bind() or the getaddrinfo() system call fails)
- If you want to find GIPCD-related errors between 2015-02-03 12:00:00 and 2015-02-03 12:09:59, you may run: $ grep "2015-02-03 12:0" * | grep " slos "
- In this tutorial we handle failures of the bind() and getaddrinfo() system calls, but you may want to check your traces for
send(), recv(), listen() and connect() system call failures too!
- Note: only GIPCD prints OS errors with a slos printout like: slos loc : getaddrinfo
- For other components like the MDNSD daemon, you may grep your CW traces
for error strings such as: "Address already in use", "Connection timed out", "Cannot assign requested address"
- Logical Errors
- These are not easy to debug, as we need to read and understand the CW logs in more detail.
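The grep pipeline above can also be scripted if you want to post-process the hits. This is a minimal sketch that assumes the 12.1.0.2 layout, where every trace line starts with its `YYYY-MM-DD HH:MM:SS.ffffff` timestamp and the traces are `*.trc` files in one directory. The function name `find_slos_lines` is hypothetical. Unlike `grep "..." *`, it only matches the timestamp at the start of a line and skips non-`.trc` files.

```python
import glob
import os


def find_slos_lines(trace_dir, ts_prefix, marker=" slos "):
    """Return (file, line) pairs whose line starts with ts_prefix and contains marker.

    Roughly mirrors: grep "<ts_prefix>" * | grep " slos "
    Assumption: every trace line begins with its timestamp.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(trace_dir, "*.trc"))):
        # errors="replace" keeps going if a trace contains non-UTF-8 bytes
        with open(path, errors="replace") as fh:
            for line in fh:
                if line.startswith(ts_prefix) and marker in line:
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: GIPCD OS errors between 12:00:00 and 12:09:59
    trace_dir = os.path.expandvars("$GRID_HOME/diag/crs/hract21/crs/trace")
    for fname, line in find_slos_lines(trace_dir, "2015-02-03 12:0"):
        print(f"{fname}:{line}")
```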
Error Details
Error I: Name server related errors – getaddrinfo() fails
OS system call getaddrinfo() fails with errno 110: Connection timed out (110)
--> see Case I
Search all CW traces with timestamps 2015-02-03 09:20:00 --> 2015-02-03 09:29:59 for the failed OS call getaddrinfo:
[grid@hract21 trace]$ grep "2015-02-03 09:2" * | grep " getaddrinfo"
gipcd_2.trc:2015-02-03 09:20:09.946273 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo(
gipcd_2.trc:2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo
Error II: bind() fails because the local IP address is not available on your system (verify with ifconfig)
OS system call bind() fails with errno 99: Cannot assign requested address (99)
--> see Case II,III
Search all CW traces with timestamps 2015-02-03 15:30:00 --> 2015-02-03 15:39:59 for the failed OS call bind:
[grid@hract21 trace]$ grep "2015-02-03 15:3" * | grep " bind"
gipcd_2.trc:2015-02-03 15:34:47.898380 :GIPCXCPT:2106038016: gipcmodNetworkProcessBind: slos loc : bind
gipcd_2.trc:2015-02-03 16:39:43.587972 :GIPCXCPT:1288218368: gipcmodNetworkProcessBind: slos loc : bind
--> If the OS system call bind() fails with errno 98: Address already in use (98),
please read:
Troubleshooting Clusterware and Clusterware component error : Address already in use
Error III: Logical errors (not related to OS errors)
- Wrong DHCP Server response : see Case IV
- Wrong GNS Server address : see Case V
Case I: Nameserver not responding – GIPCD not starting
[root@hract21 Desktop]# watch crsi
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> ora.gipcd is in state INTERMEDIATE/OFFLINE, ora.evmd is in state INTERMEDIATE
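When the crsi output keeps refreshing under watch, it helps to filter for resources whose STATE differs from their TARGET. This is a minimal sketch that assumes the whitespace-separated column layout shown above (NAME INST TARGET STATE SERVER STATE_DETAILS). The function name `not_online` is hypothetical, and crsi itself is the author's script, so its exact output format may differ.

```python
def not_online(status_text):
    """Return (name, target, state, details) for resources whose STATE != TARGET."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows look like: ora.gipcd 1 ONLINE OFFLINE - STABLE
        # Header, separator and banner lines are skipped by this test.
        if len(fields) < 6 or not fields[0].startswith("ora."):
            continue
        name, _inst, target, state, _server = fields[:5]
        details = " ".join(fields[5:])
        if state != target:
            problems.append((name, target, state, details))
    return problems
```

For example, feeding it the Case I table above would flag ora.evmd (INTERMEDIATE) and ora.gipcd (OFFLINE). ora.diskmon with TARGET and STATE both OFFLINE would not be flagged.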
As GIPCD doesn't come up, review the trace file gipcd.trc:
2015-02-03 09:20:14.952363 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos op : sgipcnPopulateAddrInfo
2015-02-03 09:20:14.952373 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos dep : Connection timed out (110)
2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo(
2015-02-03 09:20:14.952391 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos info: server not available,try again
2015-02-03 09:20:14.952455 :GIPCXCPT:2157598464: gipcResolveF [gipcInternalBind : gipcInternal.c : 537]: EXCEPTION[ ret gipcretFail (1) ] failed to resolve address 0x7f035c033c10 [0000000000000311] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x4000
2015-02-03 09:20:14.952486 :GIPCXCPT:2157598464: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretFail (1) ] failed to bind endp 0x7f035c033070 [000000000000030f] { gipcEndpoint : localAddr 'tcp://hract21.example.com', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x40008000, flags-2 0x0, usrFlags 0x240a0 }, addr 0x7f035c034890 [0000000000000316] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x200a0
2015-02-03 09:20:14.952552 :GIPCXCPT:2157598464: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretFail (1)
--> The getaddrinfo() system call fails -> nameserver lookup issue
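To reproduce the failing call outside Clusterware, you can call getaddrinfo() directly. Python's `socket.getaddrinfo()` wraps the C library function of the same name. This is a minimal sketch and the helper name `check_resolution` is hypothetical. getaddrinfo() reports failures through EAI_* codes, which Python raises as `socket.gaierror`. With glibc, an unreachable nameserver typically shows up as EAI_AGAIN ("Temporary failure in name resolution").

```python
import socket


def check_resolution(host, port=None):
    """Resolve host with getaddrinfo(), the call that fails in gipcd.trc.

    Returns (True, [IPv4 addresses]) on success or (False, error text) on failure.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # exc.errno holds the EAI_* code, e.g. socket.EAI_AGAIN when the
        # nameserver cannot be reached, socket.EAI_NONAME for an unknown name
        return False, f"getaddrinfo({host!r}) failed: {exc.strerror} (EAI code {exc.errno})"
    # For AF_INET the sockaddr is (ip, port); de-duplicate while keeping order
    addrs = list(dict.fromkeys(info[4][0] for info in infos))
    return True, addrs
```

While the nameserver is down, `check_resolution("hract21.example.com")` should return `(False, ...)` after the resolver timeout, matching the nslookup output below.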
Verify Error with OS commands
[grid@hract21 trace]$ nslookup hract21
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
Verify Error with cluvfy
[grid@hract21 CLUVFY]$ cluvfy comp nodeapp -n hract21
PRVF-0002 : could not retrieve local node name
Fix -> Verify the Nameserver is up and running
1) Is your nameserver running?
[root@ns1 ~]# service named status
version: 9.9.3-RedHat-9.9.3-P1.el6
CPUs found: 4
worker threads: 4
UDP listeners per interface: 4
number of zones: 101
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid 9193) is running...
2) Can you ping your nameserver?
[oracle@hract21 JAVA]$ ping ns1.example.com
PING ns1.example.com (192.168.5.50) 56(84) bytes of data.
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=2 ttl=64 time=0.293 ms
3) Verify that the nameserver is listening on the required IP address and port
[root@ns1 ~]# netstat -auen | grep ":53 "
udp 0 0 192.168.5.50:53 0.0.0.0:* 25 56734
udp 0 0 127.0.0.1:53 0.0.0.0:* 25 56732
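Step 3 shows named listening on 192.168.5.50:53. To confirm from the cluster node that the server actually answers, you can send one DNS query over UDP and wait for any reply. This is a minimal sketch following the RFC 1035 message format. It assumes that a single A-record query over UDP is enough to prove the server responds. The helper names are hypothetical and there is no label-length validation. A timeout here corresponds to nslookup's ";; connection timed out".

```python
import socket
import struct


def build_dns_query(name, qid=0x1234, qtype=1):
    """Build a minimal RFC 1035 query for name (qtype 1 = A record)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def query_nameserver(server_ip, name, timeout=2.0):
    """Send one UDP query to server_ip:53; return True if a matching reply arrives."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))  # may raise OSError, e.g. network unreachable
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            # Same symptom as nslookup's ";; connection timed out"
            return False
    # A valid reply carries at least the 12-byte header echoing our query ID
    return len(reply) >= 12 and reply[:2] == query[:2]
```

For example, `query_nameserver("192.168.5.50", "hract21.example.com")` would be expected to return False in Case I and True once named is reachable again.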
Case II : Different IP address in /etc/hosts and NameServer Lookup – GIPCD not starting
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE - STABLE
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> CSSD and GIPCD remain OFFLINE: STATE_DETAILS switches from STABLE to STARTING, but the resources don't come up
gipcd.trc:
2015-02-03 15:35:02.928327 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
2015-02-03 15:35:02.928333 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
2015-02-03 15:35:02.928337 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos loc : bind
2015-02-03 15:35:02.928342 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos info: addr '192.168.6.121:0'
2015-02-03 15:35:02.928391 :GIPCXCPT:937420544: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] failed to bind endp 0x7f4624027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.6.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4624033be0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7f4624033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 15:35:02.928405 :GIPCXCPT:937420544: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928419 :GIPCXCPT:937420544: gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928429 :GIPCHDEM:937420544: gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] terminating daemon thread due to exception
2015-02-03 15:35:02.928455 :GIPCXCPT:1281627904: gipchaInternalRegister: daemon thread state invalid gipchaThreadStateFailed (5), ret gipcretFail (1)
2015-02-03 15:35:02.928477 :GIPCHGEN:1281627904: gipchaRegisterF [gipchaInternalResolve : gipchaInternal.c : 1204]: EXCEPTION[ ret gipcretFail (1) ] failed to register ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, name '(null)', flags 0x4000
2015-02-03 15:35:02.928544 :GIPCHGEN:1281627904: gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 863]: EXCEPTION[ ret gipcretFail (1) ] failed to resolve ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, host 'hract21', port 'gipcdha_hract21_', flags 0x0
2015-02-03 15:35:02.928569 :GIPCXCPT:1281627904: gipcInternalResolve: failed to resolve addr 0x7f4638099680 [000000000000016a] { gipcAddress : name 'gipcha://hract21:gipcdha_hract21_', objFlags 0x0, addrFlags 0x4 }, ret gipcretFail (1)
Verify Error with OS commands
[grid@hract21 trace]$ nslookup hract21
Server: 192.168.5.50
Address: 192.168.5.50#53
Name: hract21.example.com
Address: 192.168.5.121
[grid@hract21 trace]$ ping hract21
PING hract21 (192.168.6.121) 56(84) bytes of data.
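--> Oops, why do nslookup and ping return different results?
ping resolves the name through the system resolver, which usually consults /etc/hosts first ("hosts: files dns" in /etc/nsswitch.conf), whereas nslookup always queries the DNS server directly.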
Verify IP address from /etc/hosts
[grid@hract21 trace]$ grep hract21 /etc/hosts
192.168.6.121 hract21 hract21.example.com
Verify Error with cluvfy
[grid@hract21 CLUVFY]$ cluvfy comp nodereach -n hract21
Verifying node reachability
Checking node reachability...
PRVF-6006 : unable to reach the IP addresses "hract21" from the local node
PRKC-1071 : Nodes "hract21" did not respond to ping in "3" seconds,
PRKN-1035 : Host "hract21" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes.
-> Fix: Keep your /etc/hosts and your BIND server in sync.
When changing the BIND server configuration, always verify the corresponding entries in /etc/hosts too.
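The mismatch can also be detected by a script after each BIND change. This is a minimal sketch that compares the /etc/hosts entries for a name with the addresses returned by DNS, for example as printed by nslookup. The function names are hypothetical. The DNS answer is passed in rather than queried, so the comparison logic stays self-contained.

```python
import ipaddress


def hosts_lookup(hosts_text, name):
    """Return the IPs that an /etc/hosts-style text maps to name (aliases included)."""
    ips = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and name in fields[1:]:
            ips.append(fields[0])
    return ips


def compare_hosts_and_dns(hosts_text, name, dns_ips):
    """Return a message if /etc/hosts and DNS disagree for name, else None."""
    file_ips = {ipaddress.ip_address(ip) for ip in hosts_lookup(hosts_text, name)}
    dns_set = {ipaddress.ip_address(ip) for ip in dns_ips}
    if not file_ips or file_ips == dns_set:
        return None  # no /etc/hosts entry, or both sources agree
    return (f"{name}: /etc/hosts={sorted(map(str, file_ips))} "
            f"DNS={sorted(map(str, dns_set))}")
```

With the data from this case, `/etc/hosts` maps hract21.example.com to 192.168.6.121 while DNS returns 192.168.5.121, so the check reports a mismatch.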
Case III : Wrong Cluster Interconnect Address – GIPCD not starting
[root@hract21 Desktop]# watch crsi
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE hract21 STARTING
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> GPNPD remains in status INTERMEDIATE, while GIPCD is in state OFFLINE
gipcd.trc:
2015-02-03 16:39:18.324221 :GIPCHDEM:20907776: gipchaDaemonThread: starting daemon thread hctx 0x22d39b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'df31173e-00000000', name2 02ff-37da-c08f-50b4, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xcd60 }
2015-02-03 16:39:23.327691 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc032310 [0000000000000308] { gipcAddress : name 'tcp://192.168.5.121', objFlags 0x0, addrFlags 0x5 }
2015-02-03 16:39:23.327721 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
2015-02-03 16:39:23.327727 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
2015-02-03 16:39:23.327732 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos loc : bind
2015-02-03 16:39:23.327736 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos info: addr '192.168.5.121:0'
2015-02-03 16:39:23.327806 :GIPCXCPT:20907776: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 16:39:23.327823 :GIPCXCPT:20907776: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327838 :GIPCXCPT:20907776: gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327851 :GIPCHDEM:20907776: gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] terminating daemon thread due to exception
2015-02-03 16:39:23.327943 : GIPCNET:20907776: gipcmodNetworkUnprepare: failed to unprepare waits for endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x8, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x26008000, flags-2 0x0, usrFlags 0x20020 }
--> Here the bind() system call fails with errno 99 (EADDRNOTAVAIL), which means the IP address 192.168.5.121 is not (yet) available on this host!
[root@hract21 Desktop]# cat /usr/include/asm-generic/errno.h | grep 99
#define EADDRNOTAVAIL 99 /* Cannot assign requested address */
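You can reproduce the errno without Clusterware by binding a socket to the address yourself. GIPCD binds to port 0 (addr '192.168.5.121:0' in the trace), which lets the kernel pick a free port, so only the IP address matters. This is a minimal sketch and `try_bind` is a hypothetical helper. It assumes the Linux default net.ipv4.ip_nonlocal_bind=0. With that sysctl set to 1, binding to a non-local address would succeed.

```python
import errno
import os
import socket


def try_bind(ip, port=0):
    """Bind a TCP socket to ip:port (port 0 = any free port, as in '192.168.5.121:0').

    Returns 0 on success, otherwise the errno of the failed bind().
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError as exc:
            return exc.errno
    return 0


if __name__ == "__main__":
    for ip in ("192.168.5.121", "192.168.6.121"):
        err = try_bind(ip)
        if err == 0:
            print(f"{ip}: bind OK")
        else:
            # On Linux EADDRNOTAVAIL is errno 99: "Cannot assign requested address"
            print(f"{ip}: {errno.errorcode.get(err, err)} ({err}) {os.strerror(err)}")
```

On hract21 in this case, 192.168.6.121 would bind fine while 192.168.5.121 would fail with EADDRNOTAVAIL, exactly as in gipcd.trc.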
Verify Error with OS commands:
[root@hract21 Desktop]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.6.121 Bcast:192.168.6.255 Mask:255.255.255.0
[root@hract21 Desktop]# ifconfig eth2
eth2 Link encap:Ethernet HWaddr 08:00:27:4E:C9:BF
inet addr:192.168.2.121 Bcast:192.168.2.255 Mask:255.255.255.0
[root@hract21 Desktop]# $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> GPnPD expects the PUBLIC interface eth1 to be bound to an IP address in network 192.168.5.0 (192.168.5.121), not to 192.168.6.121
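The check done by eye above (profile network per adapter vs. the ifconfig address) can be sketched in a few lines. This is a minimal sketch that assumes you feed it the `gpnptool get` output and the address/netmask pairs reported by ifconfig. The function names are hypothetical. The regex only handles single-line `<gpnp:Network .../>` elements like those shown above and is not a full XML parser. It also checks only one address per adapter.

```python
import ipaddress
import re

# Matches <gpnp:Network id=... IP=... Adapter=... Use=.../> but not
# <gpnp:HostNetwork ...> or <gpnp:Network-Profile> (whitespace required after "Network")
NETWORK_RE = re.compile(r"<gpnp:Network\s+([^>]*?)/?>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def profile_networks(profile_xml):
    """Yield (adapter, subnet_ip, use) for each gpnp:Network element."""
    for match in NETWORK_RE.finditer(profile_xml):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        yield attrs["Adapter"], attrs["IP"], attrs["Use"]


def check_interfaces(profile_xml, local_ifaces):
    """Compare the GPnP profile with the local interfaces.

    local_ifaces maps adapter name -> "address/netmask" as shown by ifconfig.
    Returns a list of mismatch messages (empty if everything matches).
    """
    problems = []
    for adapter, subnet_ip, use in profile_networks(profile_xml):
        if adapter not in local_ifaces:
            problems.append(f"{adapter} ({use}): adapter not found")
            continue
        iface = ipaddress.ip_interface(local_ifaces[adapter])
        if iface.network.network_address != ipaddress.ip_address(subnet_ip):
            problems.append(f"{adapter} ({use}): profile expects {subnet_ip}, "
                            f"interface is in {iface.network}")
    return problems
```

With the values from this case (eth1 = 192.168.6.121/255.255.255.0, eth2 = 192.168.2.121/255.255.255.0), only eth1 is reported.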
Verify Error with cluvfy:
[grid@hract21 CLUVFY]$ cluvfy comp gpnp -n hract21
Verifying GPNP integrity
--> cluvfy comp gpnp hangs
Fix: Change interface eth1 back to 192.168.5.121 and restart the cluster stack
Case IV : DHCP server returns wrong IP address – VIPs not starting
Possible causes:
- Multiple DHCP servers
- DHCP server not available
Lower CRS stack starts
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE ONLINE hract21 STABLE
ora.cluster_interconnect.haip 1 ONLINE ONLINE hract21 STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE ONLINE hract21 STABLE
ora.cssd 1 ONLINE ONLINE hract21 STABLE
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE ONLINE hract21 OBSERVER,STABLE
ora.diskmon 1 OFFLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE ONLINE hract21 STABLE
ora.gipcd 1 ONLINE ONLINE hract21 STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE ONLINE hract21 STABLE
--> Lower CRS stack is up and running
VIPs are in state STARTING:
ora.hract21.vip 1 ONLINE OFFLINE hract21 STARTING
ora.hract22.vip 1 ONLINE ONLINE hract22 STABLE
ora.hract23.vip 1 ONLINE ONLINE hract23 STABLE
ora.mgmtdb 1 ONLINE ONLINE hract23 Open,STABLE
ora.oc4j 1 ONLINE ONLINE hract22 STABLE
ora.scan1.vip 1 ONLINE OFFLINE hract21 STARTING
crsd_orarootagent_root.trc
2015-02-03 12:06:42.065910 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP client id = hract21-vip
2015-02-03 12:06:42.065929 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP Server Port = 67
2015-02-03 12:06:42.065940 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet from = 192.168.5.121
2015-02-03 12:06:42.065949 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet to = 255.255.255.255
2015-02-03 12:06:47.068966 :GIPCXCPT:2822174464: gipcWaitF [clsdhcp_sendmessage : clsdhcp.c : 616]:
EXCEPTION[ ret (uknown) (910) ] failed to wait on obj 0x7fcb8c04d770 [0000000000000ddf]
{ gipcEndpoint : localAddr 'udp://0.0.0.0:68', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0,
objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fcb8c037e70, sendp 0x7fcb8c037cb0 status 13flags 0x20000002, flags-2 0x0, usrFlags 0x8000 }, reqList 0x7fcba8364658, nreq 1, creq 0x7fcba8364b20 timeout 5000 ms, flags 0x4000
--> After sending a DHCP request, gipcWaitF fails after its 5000 ms timeout, which means we have trouble contacting our DHCP server
or getting the required DHCP address
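The orarootagent trace shows a DHCP request broadcast to 255.255.255.255 (server port 67, client id hract21-vip) and the wait on local port 68 timing out. For reference, here is a minimal sketch of what such a DHCPDISCOVER looks like on the wire per RFC 2131/2132. It is an illustration, not Oracle's implementation. How Clusterware actually encodes its client ID is an assumption here (option 61 with type byte 0), and `build_dhcp_discover` is a hypothetical name. Actually sending the packet would require root privileges to bind client port 68 and SO_BROADCAST on the socket.

```python
import os
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_dhcp_discover(mac, client_id, xid=None):
    """Build a DHCPDISCOVER (RFC 2131) carrying a client identifier (RFC 2132, option 61).

    mac: hardware address like "08:00:27:7D:8E:49"
    client_id: string such as "hract21-vip" (encoding below is an assumption)
    """
    chaddr = bytes.fromhex(mac.replace(":", ""))
    if xid is None:
        xid = struct.unpack("!I", os.urandom(4))[0]  # random transaction ID
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",  # fixed 236-byte BOOTP header
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        len(chaddr),  # hlen: 6
        0,            # hops
        xid,          # transaction ID
        0,            # secs
        0x8000,       # flags: broadcast bit (client has no address yet)
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr, yiaddr, siaddr, giaddr
        chaddr,       # zero-padded to 16 bytes by struct
        b"", b"",     # sname, file (zero-filled)
    )
    cid = b"\x00" + client_id.encode("ascii")  # type 0 = non-hardware identifier
    options = (
        bytes([53, 1, 1])               # option 53: DHCP message type = DISCOVER
        + bytes([61, len(cid)]) + cid   # option 61: client identifier
        + bytes([255])                  # end option
    )
    return header + DHCP_MAGIC_COOKIE + options
```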
Verify Error with OS commands
Download and install dhcping:
Download location: http://pkgs.repoforge.org/dhcping, package: dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# rpm -i /media/sf_kits/Linux/dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# dhcping -i eth1
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
no answer
--> Here we see that the DHCP answer comes from a wrong address (192.168.3.50)
[root@ns1 dhcp]# dhcping -h 08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
no answer
--> This confirms that our DHCP server is running on the wrong IP address (192.168.3.50) and
can't serve DHCP requests for 192.168.5.xx addresses
Working dhcping output, just for reference:
[root@hract21 Desktop]# dhcping -h 08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
Got answer from: 192.168.5.50
Verify Error with cluvfy commands
[root@hract21 CLUVFY]# cluvfy comp dhcp -clustername ract2 -verbose
Verifying DHCP Check
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
PRVG-5726 : Failed to discover DHCP servers on public network listening on port "67" using command "/u01/app/121/grid/bin/crsctl discover dhcp -clientid ract2-scan1-vip "
CRS-10010: unable to discover DHCP server in the network listening on port 67 for client ID ract2-scan1-vip
CRS-4000: Command discover failed, or completed with errors.
PRVF-5704 : No DHCP server were discovered on the public network listening on port 67
Verification of DHCP Check was unsuccessful on all the specified nodes.
Additional info about DHCP setup
- I always end up looking at /etc/dhcpd.conf, which is wrong: use the /etc/dhcp/dhcpd.conf file instead!
- Note: if you change /etc/dhcp/dhcpd.conf, you may also need to change /etc/sysconfig/dhcpd
DHCP config files:
/etc/dhcp/dhcpd.conf
/etc/sysconfig/dhcpd
Case V : Wrong GNS VIP address – GNS not starting
[root@hract21 network-scripts]# watch 'crs | grep gns'
ora.gns 1 ONLINE OFFLINE - STABLE
ora.gns.vip 1 ONLINE ONLINE hract21 STABLE
-> GNS VIP is ONLINE but GNS doesn't start
gnsd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
CLSB:489064000: Argument count (argc) for this daemon is 7
CLSB:489064000: Argument 0 is: /u01/app/121/grid/bin/gnsd.bin
CLSB:489064000: Argument 1 is: -trace-level
CLSB:489064000: Argument 2 is: 1
CLSB:489064000: Argument 3 is: -ip-address
CLSB:489064000: Argument 4 is: 192.168.6.58
CLSB:489064000: Argument 5 is: -startup-endpoint
CLSB:489064000: Argument 6 is: ipc://GNS_hract21_4625_9fe54b1833d5fbd2
2015-02-03 17:29:15.339039 : CLSNS:489064000: main::clsns_SetTraceLevel:trace level set to 1.
2015-02-03 17:29:16.226261 : GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226283 : GNS:489064000: main::clsgndmain: GNS starting on hract21. Process ID: 29196
2015-02-03 17:29:16.226299 : GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226338 : GNS:489064000: main::clsgnSetTraceLevel: trace level set to 1.
..
2015-02-03 17:29:17.490335 : GNS:489064000: main::clsgndGetInstanceInfo: version: 12.1.0.2.0 (0xc100200)
endpoints: tcp://192.168.6.58:63806 process ID: "29196" state: "Initializing".
2015-02-03 17:29:17.491219 : GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
2015-02-03 17:29:17.496441 : GNS:349841152: Resolve::clsgndnsCreateContainerCallback: listening on port 53 address "192.168.6.58"
2015-02-03 17:29:17.499552 : CLSDMT:351942400: PID for the Process [29196], connkey 12
2015-02-03 17:29:17.505626 : GNS:343537408: Command #0::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.512072 : GNS:4160747264: Command #1::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.516675 : GNS:4156544768: Command #2::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.518326 : GNS:4154443520: Command #3::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.747693 : GNS:4152342272: Self-check::clsgndscRun: Name: "GNSTESTHOST.grid12c.example.com" Address: 1.2.3.4.
2015-02-03 17:29:53.882538 : GNS:351942400: main::clsgndCLSDMExit: CLSDM request to quit received - requester: agent.
2015-02-03 17:29:53.882610 : GNS:351942400: main::clsgndCLSDMExit: terminating GNSD on behalf of CLSDM - requester: agent.
--> Here we have a problem: GNS was terminated at the request of its agent (CLSDM request to quit)
crsd_orarootagent_root.trc:
2015-02-03 17:29:24.470729 : CLSNS:292816640: main::clsnsgFind:(:CLSNS00230:):query to find
GNS using service name "_Oracle-GNS._tcp" failed.: 1: clskec:has:CLSNS:5 3 args[has:CLSNS:5][mod=clsns_DNSSD_FindServers][loc=(:CLSNS00152:)]
2015-02-03 17:29:24.470771 :
GNS:292816640: main::clsgnctrGetGNSAddressUsingCLSNS: (:CLSGN01053:) GNS address retrieval failed with
error CLSNS-00025 (GNS_SERV_FIND_FAIL) - throwing CLSGN-00070. 1: clskec:has:CLSNS:25 3 args[has:CLSNS:25][mod=clsnsgFind][loc=(:CLSNS00216:)]
Verify Error with OS commands:
Check GNS and PUBLIC network interface
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.6.58
Domain served by GNS: grid12c.example.com
Check the PUBLIC network interface
[root@hract21 network-scripts]# ifconfig
eth1:1 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.156 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:2 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.157 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:3 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.153 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:4 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.151 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:5 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.152 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:6 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.6.58 Bcast:192.168.6.255 Mask:255.255.255.0
--> The VIPs use 192.168.5.X as base address, whereas our GNS VIP uses 192.168.6.58.
This is not correct: the VIPs and the GNS VIP should be in the same network!
Let's check whether somebody changed the GNS base address:
[grid@hract21 trace]$ grep clsgndadvAdvertise gnsd.trc
2015-02-02 12:32:09.447471 : GNS:3141969472: main::clsgndadvAdvertise:
Listening for commands on endpoint(s): tcp://192.168.5.58:46453.
2015-02-03 17:22:00.410829 : GNS:4114409024: main::clsgndadvAdvertise:
Listening for commands on endpoint(s): tcp://192.168.5.58:25702.
2015-02-03 17:24:51.165609 : GNS:2221307456: main::clsgndadvAdvertise:
Listening for commands on endpoint(s):tcp://192.168.6.58:27105.
2015-02-03 17:29:17.491219 : GNS:489064000: main::clsgndadvAdvertise:
Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
--> GNS base address was changed from 192.168.5.58 to 192.168.6.58 !
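On a long-running cluster, gnsd.trc can contain many clsgndadvAdvertise messages, so it helps to print only the points where the advertised address changes. This is a minimal sketch that assumes each advertise message sits on a single trace line (it is wrapped above only for display). The name `gns_address_changes` is hypothetical.

```python
import re

# Timestamp at line start, then the clsgndadvAdvertise message with a tcp:// endpoint
ADVERTISE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\s*:.*clsgndadvAdvertise:"
    r".*tcp://(?P<ip>[\d.]+):(?P<port>\d+)"
)


def gns_address_changes(lines):
    """Return (timestamp, old_ip, new_ip) each time the advertised GNS address changes."""
    changes, last_ip = [], None
    for line in lines:
        m = ADVERTISE_RE.search(line)
        if not m:
            continue
        ip = m.group("ip")  # the port changes on every restart, so only the IP is compared
        if last_ip is not None and ip != last_ip:
            changes.append((m.group("ts"), last_ip, ip))
        last_ip = ip
    return changes
```

Applied to the four messages above, it would report a single change at 2015-02-03 17:24:51 from 192.168.5.58 to 192.168.6.58.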
Verify Error with cluvfy
[grid@hract21 CLUVFY]$ cluvfy comp gns -postcrsinst -verbose
Verifying GNS integrity
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid12c.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
PRVF-5213 : GNS resource configuration check failed
PRCI-1156 : The GNS VIP 192.168.6.58 does not match any of the available subnets 192.168.5.0, 192.168.2.0.
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.6.58" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid12c.example.com" are reachable
WARNING:
PRVF-5218 : "hract21-vip.grid12c.example.com" did not resolve into any IP address
PRVF-5827 : The response time for name lookup for name "hract21-vip.grid12c.example.com" exceeded 15 seconds
Checking status of GNS resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
hract21 no yes
hract22 no yes
hract23 no yes
PRVF-5211 : GNS resource is not running on any node of the cluster
Checking status of GNS VIP resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
hract21 yes yes
hract22 no yes
hract23 no yes
GNS integrity check failed
Verification of GNS integrity was unsuccessful.
Checks did not pass for the following node(s):
hract21
--> cluvfy is very helpful here, as it compares the network addresses with the GNS VIP address.
If the GNS VIP and the network addresses don't match, cluvfy throws PRVF-5213 and PRCI-1156 errors.
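The core of this cluvfy check (does the GNS VIP fall into one of the cluster subnets?) can be sketched with Python's ipaddress module. This is a minimal sketch, not cluvfy's actual logic. It assumes /24 masks, taken from the ifconfig output above (Mask:255.255.255.0), because cluvfy prints the subnets without a mask. The function name is hypothetical.

```python
import ipaddress


def gns_vip_subnet(vip, subnets):
    """Return the first subnet (CIDR string) containing vip, or None (cf. PRCI-1156)."""
    addr = ipaddress.ip_address(vip)
    for net in subnets:
        if addr in ipaddress.ip_network(net):
            return net
    return None


if __name__ == "__main__":
    # Subnets as listed by cluvfy (public and interconnect), masks assumed to be /24
    cluster_subnets = ["192.168.5.0/24", "192.168.2.0/24"]
    vip = "192.168.6.58"
    if gns_vip_subnet(vip, cluster_subnets) is None:
        print(f"GNS VIP {vip} does not match any of the subnets {', '.join(cluster_subnets)}")
```

In practice the GNS VIP should be in the public subnet specifically. Passing only the public network (192.168.5.0/24) would make the check stricter.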
Fix -> Change GNS VIP back to the original address and restart GNS
[root@hract21 network-scripts]# srvctl modify gns -vip 192.168.5.58
[root@hract21 network-scripts]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.5.58
Domain served by GNS: grid12c.example.com
[root@hract21 network-scripts]# srvctl start gns
[root@hract21 network-scripts]# srvctl config gns -a -l
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid12c.example.com
GNS version: 12.1.0.2.0
Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
Name of the cluster where GNS is running: ract2
Cluster type: server.
GNS log level: 1.
GNS listening addresses: tcp://192.168.5.58:30218.
GNS is individually enabled on nodes:
GNS is individually disabled on nodes:
Reference