Overview
- Changing the PUBLIC interface in a RAC environment is not that simple; you need to take into account:
  - Name server changes
  - DHCP server changes, including VIPs
  - /etc/hosts changes
  - GNS VIP changes
  - PUBLIC interface changes (verify with: # oifcfg getif -> eth1 192.168.5.0 global public)
- In any case, you should read: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
If you still run into problems, here are some debugging details:
- Note: this tutorial uses the 12.1.0.2 Clusterware (CW) log file structure, which simplifies grepping a lot, as all traces can be found in: $GRID_HOME/diag/crs/hract21/crs/trace
- Download the crsi script and run it with the watch utility while your CRS stack boots. This gives you a good idea of which component is failing, gets restarted and finally switches to status OFFLINE.
- As said again and again: cluvfy is your friend for quickly identifying the root problem.
- If the network adapter info in profile.xml doesn't match the ifconfig data, GIPCD will not start (this is true for both the PUBLIC and the cluster interconnect info).
In this tutorial we debug the following scenarios by reading log files, running OS commands and running cluvfy:
- Case I : Nameserver not responding – GIPCD not starting
- Case II : Different IP address in /etc/hosts and NameServer Lookup – GIPCD not starting
- Case III : Wrong Cluster Interconnect Address – GIPCD not starting
- Case IV : DHCP server sends wrong IP address – VIPs not starting
- Case V : Wrong GNS VIP address – GNS not starting
Potential Errors and Error types
In general, there are 2 types of network-related errors:
- OS-related errors (either the bind() or the getaddrinfo() system call fails)
  - To find GIPCD-related errors between 2015-02-03 12:00:00 and 2015-02-03 12:09:59, run in the trace directory: $ grep "2015-02-03 12:0" * | grep " slos "
  - This tutorial handles failing bind() and getaddrinfo() system calls, but you may also check your traces for send(), recv(), listen() and connect() system call failures!
  - Note: only GIPCD prints OS errors with an slos printout like: slos loc : getaddrinfo
  - For other components, like the MDNSD daemon, grep your CW traces for the error strings "Address already in use", "Connection timed out" and "Cannot assign requested address".
- Logical errors
  - Are not easy to debug, as you need to read and understand the CW logs in more detail.
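The two grep recipes above can be wrapped into small shell functions so you only pass a timestamp prefix. This is a minimal sketch, assuming bash, trace files named *.trc in one directory and trace lines that start with "YYYY-MM-DD HH:MM:SS" as in 12.1.0.2; the function names find_slos_errors and find_net_error_strings are hypothetical helpers, not Clusterware tools.

```bash
# Hypothetical helpers wrapping the manual grep pipelines from the list above.
# Assumption: a timestamp prefix such as "2015-02-03 12:0" selects the
# window 12:00:00-12:09:59, exactly like the manual grep.

# find_slos_errors TS_PREFIX [TRACE_DIR]
# Print "file:line" for every GIPCD "slos" line that contains TS_PREFIX.
find_slos_errors() {
  local ts_prefix=$1 trace_dir=${2:-.}
  # -H keeps the file name even when the glob matches only one trace file;
  # -F treats the timestamp as a literal string, not a regex.
  grep -H -F -- "$ts_prefix" "$trace_dir"/*.trc 2>/dev/null | grep -F ' slos '
}

# find_net_error_strings TS_PREFIX [TRACE_DIR]
# Components other than GIPCD (e.g. MDNSD) print no "slos" lines, so look
# for the errno message texts instead.
find_net_error_strings() {
  local ts_prefix=$1 trace_dir=${2:-.}
  grep -H -F -- "$ts_prefix" "$trace_dir"/*.trc 2>/dev/null |
    grep -E 'Address already in use|Connection timed out|Cannot assign requested address'
}
```

Usage, run from anywhere: find_slos_errors "2015-02-03 12:0" "$GRID_HOME/diag/crs/hract21/crs/trace".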
Error Details
Error I: Name server related errors – getaddrinfo() fails
OS system call getaddrinfo() fails with errno 110 (Error: Connection timed out (110)) --> see Case I
Search all CW traces with timestamps 2015-02-03 09:20:00 --> 2015-02-03 09:29:59 for the failed OS call getaddrinfo:
[grid@hract21 trace]$ grep "2015-02-03 09:2" * | grep " getaddrinfo"
gipcd_2.trc:2015-02-03 09:20:09.946273 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo(
gipcd_2.trc:2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo
Error II: bind() fails because the local IP address is not available on your system (verify with ifconfig)
OS system call bind() fails with errno 99 (Error: Cannot assign requested address (99)) --> see Case II, III
Search all CW traces with timestamps 2015-02-03 15:30:00 --> 2015-02-03 15:39:59 for the failed OS call bind:
[grid@hract21 trace]$ grep "2015-02-03 15:3" * | grep " bind"
gipcd_2.trc:2015-02-03 15:34:47.898380 :GIPCXCPT:2106038016: gipcmodNetworkProcessBind: slos loc : bind
gipcd_2.trc:2015-02-03 16:39:43.587972 :GIPCXCPT:1288218368: gipcmodNetworkProcessBind: slos loc : bind
--> If the bind() system call fails with errno 98 (Error: Address already in use (98)), please read: Troubleshooting Clusterware and Clusterware component error : Address already in use
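To translate an errno number from a trace into its symbolic name, you can read the Linux kernel header directly. The sketch below assumes the usual header layout "#define NAME NUMBER /* message */"; errno_name is a hypothetical helper, and the header path is the Linux default (errnos 1-34 live in errno-base.h instead).

```bash
# errno_name NUMBER [HEADER]
# Map an errno number seen in a CW trace, e.g. "Cannot assign requested
# address (99)", to "SYMBOL: message". Exits 1 if the number is not found.
errno_name() {
  local num=$1 header=${2:-/usr/include/asm-generic/errno.h}
  awk -v n="$num" '
    $1 == "#define" && $3 == n {
      msg = $0
      if (index(msg, "/*")) {
        sub(/.*\/\*[ \t]*/, "", msg)   # drop everything up to "/* "
        sub(/[ \t]*\*\/.*/, "", msg)   # drop " */" and anything after it
        print $2 ": " msg
      } else {
        print $2
      }
      found = 1
    }
    END { exit !found }
  ' "$header"
}
```

For example, errno_name 99 should print "EADDRNOTAVAIL: Cannot assign requested address" on a typical Linux system, and errno_name 110 maps to ETIMEDOUT.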
Error III: Logical errors (not related to OS errors)
- Wrong DHCP Server response : see Case IV
- Wrong GNS Server address : see Case V
Case I: Nameserver not responding – GIPCD not starting
[root@hract21 Desktop]# watch crsi
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> ora.gipcd is in state INTERMEDIATE/OFFLINE, ora.evmd is in state INTERMEDIATE
As GIPCD doesn't come up, review the trace file gipcd.trc:
2015-02-03 09:20:14.952363 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos op : sgipcnPopulateAddrInfo
2015-02-03 09:20:14.952373 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos dep : Connection timed out (110)
2015-02-03 09:20:14.952381 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos loc : getaddrinfo(
2015-02-03 09:20:14.952391 :GIPCXCPT:2157598464: gipcmodNetworkResolve: slos info: server not available,try again
2015-02-03 09:20:14.952455 :GIPCXCPT:2157598464: gipcResolveF [gipcInternalBind : gipcInternal.c : 537]: EXCEPTION[ ret gipcretFail (1) ] failed to resolve address 0x7f035c033c10 [0000000000000311] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x4000
2015-02-03 09:20:14.952486 :GIPCXCPT:2157598464: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretFail (1) ] failed to bind endp 0x7f035c033070 [000000000000030f] { gipcEndpoint : localAddr 'tcp://hract21.example.com', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x40008000, flags-2 0x0, usrFlags 0x240a0 }, addr 0x7f035c034890 [0000000000000316] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x8 }, flags 0x200a0
2015-02-03 09:20:14.952552 :GIPCXCPT:2157598464: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretFail (1)
--> The getaddrinfo() system call is failing -> name server lookup issue
Verify the error with OS commands:
[grid@hract21 trace]$ nslookup hract21
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
Verify the error with cluvfy:
[grid@hract21 CLUVFY]$ cluvfy comp nodeapp -n hract21
PRVF-0002 : could not retrieve local node name
Fix -> Verify that the name server is up and running
1) Is your name server running?
[root@ns1 ~]# service named status
version: 9.9.3-RedHat-9.9.3-P1.el6
CPUs found: 4
worker threads: 4
UDP listeners per interface: 4
number of zones: 101
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid 9193) is running...
2) Can you ping your name server?
[oracle@hract21 JAVA]$ ping ns1.example.com
PING ns1.example.com (192.168.5.50) 56(84) bytes of data.
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from ns1.example.com (192.168.5.50): icmp_seq=2 ttl=64 time=0.293 ms
3) Verify that the name server is listening on the required IP address and port:
[root@ns1 ~]# netstat -auen | grep ":53 "
udp 0 0 192.168.5.50:53 0.0.0.0:* 25 56734
udp 0 0 127.0.0.1:53 0.0.0.0:* 25 56732
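nslookup talks to the name server directly, while GIPCD goes through getaddrinfo(). To reproduce the lookup the way GIPCD performs it, you can use "getent ahosts", which resolves through glibc getaddrinfo() and NSS, bounded by the coreutils timeout command. This is a sketch under those assumptions (glibc-based Linux, GNU coreutils); check_resolution is a hypothetical helper name.

```bash
# check_resolution HOST [TIMEOUT_SECONDS]
# Resolve HOST via getaddrinfo() (through "getent ahosts"); print each
# returned address once. A hanging name server shows up as a failure
# after TIMEOUT_SECONDS instead of a stall.
check_resolution() {
  local host=$1 secs=${2:-10} out
  if out=$(timeout "$secs" getent ahosts "$host"); then
    # getent prints one line per socket type; keep each address once.
    printf '%s\n' "$out" | awk '!seen[$1]++ { print $1 }'
    return 0
  fi
  echo "getaddrinfo() lookup of $host failed or timed out after ${secs}s" >&2
  return 1
}
```

In Case I, check_resolution hract21.example.com 10 would be expected to fail in the same way as the nslookup call, while it succeeds again once the name server answers.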
Case II : Different IP address in /etc/hosts and NameServer Lookup – GIPCD not starting
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE - STABLE
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> CSSD and GIPCD remain OFFLINE: their STATE_DETAILS switch from STABLE to STARTING, but they don't come up
gipcd.trc:
2015-02-03 15:35:02.928327 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
2015-02-03 15:35:02.928333 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
2015-02-03 15:35:02.928337 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos loc : bind
2015-02-03 15:35:02.928342 :GIPCXCPT:937420544: gipcmodNetworkProcessBind: slos info: addr '192.168.6.121:0'
2015-02-03 15:35:02.928391 :GIPCXCPT:937420544: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] failed to bind endp 0x7f4624027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.6.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4624033be0 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7f4624033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 15:35:02.928405 :GIPCXCPT:937420544: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928419 :GIPCXCPT:937420544: gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 15:35:02.928429 :GIPCHDEM:937420544: gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] terminating daemon thread due to exception
2015-02-03 15:35:02.928455 :GIPCXCPT:1281627904: gipchaInternalRegister: daemon thread state invalid gipchaThreadStateFailed (5), ret gipcretFail (1)
2015-02-03 15:35:02.928477 :GIPCHGEN:1281627904: gipchaRegisterF [gipchaInternalResolve : gipchaInternal.c : 1204]: EXCEPTION[ ret gipcretFail (1) ] failed to register ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, name '(null)', flags 0x4000
2015-02-03 15:35:02.928544 :GIPCHGEN:1281627904: gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 863]: EXCEPTION[ ret gipcretFail (1) ] failed to resolve ctx 0xfd09b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'a94decf7-00000000', name2 5132-2561-c03c-e03e, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xd68 }, host 'hract21', port 'gipcdha_hract21_', flags 0x0
2015-02-03 15:35:02.928569 :GIPCXCPT:1281627904: gipcInternalResolve: failed to resolve addr 0x7f4638099680 [000000000000016a] { gipcAddress : name 'gipcha://hract21:gipcdha_hract21_', objFlags 0x0, addrFlags 0x4 }, ret gipcretFail (1)
Verify the error with OS commands:
[grid@hract21 trace]$ nslookup hract21
Server: 192.168.5.50
Address: 192.168.5.50#53
Name: hract21.example.com
Address: 192.168.5.121
[grid@hract21 trace]$ ping hract21
PING hract21 (192.168.6.121) 56(84) bytes of data.
--> Oops, why do nslookup and ping return different results? nslookup queries the name server directly, while ping resolves the name through the system resolver, which (with the usual "hosts: files dns" order in /etc/nsswitch.conf) reads /etc/hosts first.
Verify the IP address in /etc/hosts:
[grid@hract21 trace]$ grep hract21 /etc/hosts
192.168.6.121 hract21 hract21.example.com
Verify the error with cluvfy:
[grid@hract21 CLUVFY]$ cluvfy comp nodereach -n hract21
Verifying node reachability
Checking node reachability...
PRVF-6006 : unable to reach the IP addresses "hract21" from the local node
PRKC-1071 : Nodes "hract21" did not respond to ping in "3" seconds,
PRKN-1035 : Host "hract21" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes.
-> Fix: Keep your /etc/hosts and your BIND server in sync. When changing the BIND server configuration, always verify the change in /etc/hosts too.
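The /etc/hosts versus DNS comparison can be scripted so it runs after every BIND change. A minimal sketch, assuming you pass in the address the name server returned (for example taken from nslookup or dig output); hosts_file_ip and compare_hosts_and_dns are hypothetical helper names.

```bash
# hosts_file_ip NAME [HOSTS_FILE]
# Print the first address that the hosts file maps NAME to.
hosts_file_ip() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v h="$name" '
    { sub(/#.*/, "") }                                       # strip comments
    { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
  ' "$file"
}

# compare_hosts_and_dns NAME DNS_IP [HOSTS_FILE]
# DNS_IP is the address the name server returned for NAME.
# Returns 1 only when both sources know NAME but disagree.
compare_hosts_and_dns() {
  local name=$1 dns_ip=$2 file=${3:-/etc/hosts} hosts_ip
  hosts_ip=$(hosts_file_ip "$name" "$file")
  if [ -z "$hosts_ip" ]; then
    echo "$name: no entry in $file"
  elif [ "$hosts_ip" = "$dns_ip" ]; then
    echo "$name: OK ($hosts_ip)"
  else
    echo "$name: MISMATCH $file=$hosts_ip DNS=$dns_ip"
    return 1
  fi
}
```

With the data from Case II, compare_hosts_and_dns hract21 192.168.5.121 reports a mismatch against the /etc/hosts entry 192.168.6.121.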
Case III : Wrong Cluster Interconnect Address – GIPCD not starting
[root@hract21 Desktop]# watch crsi
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE hract21 STARTING
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> GPNPD remains in state INTERMEDIATE, GIPCD is in state OFFLINE
gipcd.trc:
2015-02-03 16:39:18.324221 :GIPCHDEM:20907776: gipchaDaemonThread: starting daemon thread hctx 0x22d39b0 [0000000000000011] { gipchaContext : host 'hract21', name 'gipcd_ha_name', luid 'df31173e-00000000', name2 02ff-37da-c08f-50b4, numNode 0, numInf 0, maxPriority 0, clientMode 1, nodeIncarnation 00000000-00000000 usrFlags 0x0, flags 0xcd60 }
2015-02-03 16:39:23.327691 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc032310 [0000000000000308] { gipcAddress : name 'tcp://192.168.5.121', objFlags 0x0, addrFlags 0x5 }
2015-02-03 16:39:23.327721 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos op : sgipcnTcpBind
2015-02-03 16:39:23.327727 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos dep : Cannot assign requested address (99)
2015-02-03 16:39:23.327732 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos loc : bind
2015-02-03 16:39:23.327736 :GIPCXCPT:20907776: gipcmodNetworkProcessBind: slos info: addr '192.168.5.121:0'
2015-02-03 16:39:23.327806 :GIPCXCPT:20907776: gipcBindF [gipcInternalEndpoint : gipcInternal.c : 468]: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] failed to bind endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x20008000, flags-2 0x0, usrFlags 0x20020 }, addr 0x7fa3dc033070 [000000000000030d] { gipcAddress : name 'tcp://hract21.example.com', objFlags 0x0, addrFlags 0x4 }, flags 0x20020
2015-02-03 16:39:23.327823 :GIPCXCPT:20907776: gipcInternalEndpoint: failed to bind address to endpoint name 'tcp://hract21.example.com', ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327838 :GIPCXCPT:20907776: gipchaDaemonThread: gipcEndpointPtr failed (tcp://hract21.example.com), ret gipcretAddressNotAvailable (39)
2015-02-03 16:39:23.327851 :GIPCHDEM:20907776: gipchaDaemonThreadEntry: EXCEPTION[ ret gipcretAddressNotAvailable (39) ] terminating daemon thread due to exception
2015-02-03 16:39:23.327943 : GIPCNET:20907776: gipcmodNetworkUnprepare: failed to unprepare waits for endp 0x7fa3dc027990 [0000000000000306] { gipcEndpoint : localAddr 'tcp://192.168.5.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x8, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7fa3dc033c80 status 13flags 0x26008000, flags-2 0x0, usrFlags 0x20020 }
--> Here the bind() system call fails with errno 99, which means the IP address 192.168.5.121 is not available (yet)!
[root@hract21 Desktop]# cat /usr/include/asm-generic/errno.h | grep 99
#define EADDRNOTAVAIL 99 /* Cannot assign requested address */
Verify the error with OS commands:
[root@hract21 Desktop]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.6.121 Bcast:192.168.6.255 Mask:255.255.255.0
[root@hract21 Desktop]# ifconfig eth2
eth2 Link encap:Ethernet HWaddr 08:00:27:4E:C9:BF
inet addr:192.168.2.121 Bcast:192.168.2.255 Mask:255.255.255.0
[root@hract21 Desktop]# $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> GPnPD expects the PUBLIC interface eth1 to be bound to IP address 192.168.5.121, not 192.168.6.121
Verify the error with cluvfy:
[grid@hract21 CLUVFY]$ cluvfy comp gpnp -n hract21
Verifying GPNP integrity
--> cluvfy comp gpnp hangs
Fix: Change interface eth1 back to 192.168.5.121 and restart the cluster stack
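The mismatch above (profile network 192.168.5.0 for eth1, but eth1 configured with 192.168.6.121) can be checked with plain bash arithmetic. This is a sketch under stated assumptions: the GPnP profile stores only the network address, so the netmask is taken from ifconfig (255.255.255.0 here); profile_networks assumes the attribute order IP, Adapter, Use shown in the profile output; all function names are hypothetical.

```bash
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
# Octets must not carry leading zeros (bash would read them as octal).
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_subnet IP NETWORK NETMASK: succeed if IP lies in NETWORK/NETMASK.
ip_in_subnet() {
  local ip net mask
  ip=$(ip_to_int "$1") net=$(ip_to_int "$2") mask=$(ip_to_int "$3")
  (( (ip & mask) == (net & mask) ))
}

# profile_networks: read "gpnptool get" XML on stdin and print
# "ADAPTER NETWORK USE" for each <gpnp:Network .../> element.
profile_networks() {
  grep -o '<gpnp:Network [^>]*>' |
    sed -n 's/.*IP="\([^"]*\)".*Adapter="\([^"]*\)".*Use="\([^"]*\)".*/\2 \1 \3/p'
}
```

Usage: $GRID_HOME/bin/gpnptool get 2>/dev/null | profile_networks lists eth1 192.168.5.0 public, and ip_in_subnet 192.168.6.121 192.168.5.0 255.255.255.0 then fails, which is the mismatch GPnPD trips over.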
Case IV : DHCP server returns wrong IP address – VIPs not starting
Possible causes:
- Multiple DHCP servers
- DHCP server not available
The lower CRS stack starts:
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE ONLINE hract21 STABLE
ora.cluster_interconnect.haip 1 ONLINE ONLINE hract21 STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE ONLINE hract21 STABLE
ora.cssd 1 ONLINE ONLINE hract21 STABLE
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE ONLINE hract21 OBSERVER,STABLE
ora.diskmon 1 OFFLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE ONLINE hract21 STABLE
ora.gipcd 1 ONLINE ONLINE hract21 STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE ONLINE hract21 STABLE
--> The lower CRS stack is up and running, but the VIPs are in state STARTING:
ora.hract21.vip 1 ONLINE OFFLINE hract21 STARTING
ora.hract22.vip 1 ONLINE ONLINE hract22 STABLE
ora.hract23.vip 1 ONLINE ONLINE hract23 STABLE
ora.mgmtdb 1 ONLINE ONLINE hract23 Open,STABLE
ora.oc4j 1 ONLINE ONLINE hract22 STABLE
ora.scan1.vip 1 ONLINE OFFLINE hract21 STARTING
crsd_orarootagent_root.trc:
2015-02-03 12:06:42.065910 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP client id = hract21-vip
2015-02-03 12:06:42.065929 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP Server Port = 67
2015-02-03 12:06:42.065940 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet from = 192.168.5.121
2015-02-03 12:06:42.065949 :CLSDYNAM:2822174464: [ora.hract21.vip]{1:35451:9} [start] DHCP sending packet to = 255.255.255.255
2015-02-03 12:06:47.068966 :GIPCXCPT:2822174464: gipcWaitF [clsdhcp_sendmessage : clsdhcp.c : 616]: EXCEPTION[ ret (uknown) (910) ] failed to wait on obj 0x7fcb8c04d770 [0000000000000ddf] { gipcEndpoint : localAddr 'udp://0.0.0.0:68', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fcb8c037e70, sendp 0x7fcb8c037cb0 status 13flags 0x20000002, flags-2 0x0, usrFlags 0x8000 }, reqList 0x7fcba8364658, nreq 1, creq 0x7fcba8364b20 timeout 5000 ms, flags 0x4000
--> After sending a DHCP request, we fail in gipcWaitF, which means we have trouble contacting our DHCP server or getting the required DHCP address.
Verify the error with OS commands. Download and install dhcping (download location: http://pkgs.repoforge.org/dhcping, package: dhcping-1.2-2.2.el6.rf.x86_64.rpm):
[root@hract21 Desktop]# rpm -i /media/sf_kits/Linux/dhcping-1.2-2.2.el6.rf.x86_64.rpm
[root@hract21 Desktop]# dhcping -i eth1
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
Got answer from: 192.168.3.50
received from 192.168.3.50, expected from 0.0.0.0
no answer
--> Here we see that the answer comes from the wrong DHCP server address (192.168.3.50)
[root@ns1 dhcp]# dhcping -h 08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
no answer
--> This confirms that our DHCP server is running on the wrong IP address (192.168.3.50) and cannot serve a DHCP request for a 192.168.5.x address
Working dhcping output, just for reference:
[root@hract21 Desktop]# dhcping -h 08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199
Got answer from: 192.168.5.50
Verify the error with cluvfy:
[root@hract21 CLUVFY]# cluvfy comp dhcp -clustername ract2 -verbose
Verifying DHCP Check
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
PRVG-5726 : Failed to discover DHCP servers on public network listening on port "67" using command "/u01/app/121/grid/bin/crsctl discover dhcp -clientid ract2-scan1-vip "
CRS-10010: unable to discover DHCP server in the network listening on port 67 for client ID ract2-scan1-vip
CRS-4000: Command discover failed, or completed with errors.
PRVF-5704 : No DHCP server were discovered on the public network listening on port 67
Verification of DHCP Check was unsuccessful on all the specified nodes.
Additional info about the DHCP setup:
- I always look at /etc/dhcpd.conf, which is wrong; use the /etc/dhcp/dhcpd.conf file instead!
- Note: if changing /etc/dhcpd.conf, you may need to change /etc/sysconfig/dhcpd
- DHCP config files: /etc/dhcp/dhcpd.conf, /etc/sysconfig/dhcpd
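Because both "multiple DHCP servers" and "wrong DHCP server" show up in the dhcping output as unexpected "Got answer from:" lines, a small filter can flag them. A sketch, assuming dhcping prints these lines to stdout in the format shown in the transcript above; dhcp_responders and check_dhcp_responders are hypothetical helper names.

```bash
# dhcp_responders: read dhcping output on stdin and print every distinct
# server that answered ("Got answer from: X"), in order of first answer.
dhcp_responders() {
  awk '$1 == "Got" && $2 == "answer" && $3 == "from:" && !seen[$4]++ { print $4 }'
}

# check_dhcp_responders EXPECTED_SERVER
# Succeed only if EXPECTED_SERVER, and no other server, answered.
check_dhcp_responders() {
  local expected=$1 responders
  responders=$(dhcp_responders)
  if [ -z "$responders" ]; then
    echo "no DHCP server answered"
    return 1
  elif [ "$responders" != "$expected" ]; then
    echo "unexpected DHCP responder(s): $(printf '%s' "$responders" | tr '\n' ' ')"
    return 1
  fi
  echo "OK: only $expected answered"
}
```

Usage: dhcping -h 08:00:27:7D:8E:49 -s 192.168.5.50 -c 192.168.5.199 | check_dhcp_responders 192.168.5.50. With the broken setup above, the answer from 192.168.3.50 would be reported as unexpected.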
Case V : Wrong GNS VIP address – GNS not starting
[root@hract21 network-scripts]# watch 'crs | grep gns'
ora.gns 1 ONLINE OFFLINE - STABLE
ora.gns.vip 1 ONLINE ONLINE hract21 STABLE
-> GNS VIP is ONLINE but GNS doesn't start
gnsd.trc:
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
CLSB:489064000: Argument count (argc) for this daemon is 7
CLSB:489064000: Argument 0 is: /u01/app/121/grid/bin/gnsd.bin
CLSB:489064000: Argument 1 is: -trace-level
CLSB:489064000: Argument 2 is: 1
CLSB:489064000: Argument 3 is: -ip-address
CLSB:489064000: Argument 4 is: 192.168.6.58
CLSB:489064000: Argument 5 is: -startup-endpoint
CLSB:489064000: Argument 6 is: ipc://GNS_hract21_4625_9fe54b1833d5fbd2
2015-02-03 17:29:15.339039 : CLSNS:489064000: main::clsns_SetTraceLevel:trace level set to 1.
2015-02-03 17:29:16.226261 : GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226283 : GNS:489064000: main::clsgndmain: GNS starting on hract21. Process ID: 29196
2015-02-03 17:29:16.226299 : GNS:489064000: main::clsgndmain: ##########################################
2015-02-03 17:29:16.226338 : GNS:489064000: main::clsgnSetTraceLevel: trace level set to 1.
..
2015-02-03 17:29:17.490335 : GNS:489064000: main::clsgndGetInstanceInfo: version: 12.1.0.2.0 (0xc100200) endpoints: tcp://192.168.6.58:63806 process ID: "29196" state: "Initializing".
2015-02-03 17:29:17.491219 : GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
2015-02-03 17:29:17.496441 : GNS:349841152: Resolve::clsgndnsCreateContainerCallback: listening on port 53 address "192.168.6.58"
2015-02-03 17:29:17.499552 : CLSDMT:351942400: PID for the Process [29196], connkey 12
2015-02-03 17:29:17.505626 : GNS:343537408: Command #0::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.512072 : GNS:4160747264: Command #1::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.516675 : GNS:4156544768: Command #2::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.518326 : GNS:4154443520: Command #3::clsgndcpRunProcessor: Waiting for client command
2015-02-03 17:29:17.747693 : GNS:4152342272: Self-check::clsgndscRun: Name: "GNSTESTHOST.grid12c.example.com" Address: 1.2.3.4.
2015-02-03 17:29:53.882538 : GNS:351942400: main::clsgndCLSDMExit: CLSDM request to quit received - requester: agent.
2015-02-03 17:29:53.882610 : GNS:351942400: main::clsgndCLSDMExit: terminating GNSD on behalf of CLSDM - requester: agent.
--> Here we have a problem: GNS was terminated
crsd_orarootagent_root.trc:
2015-02-03 17:29:24.470729 : CLSNS:292816640: main::clsnsgFind:(:CLSNS00230:):query to find GNS using service name "_Oracle-GNS._tcp" failed.: 1: clskec:has:CLSNS:5 3 args[has:CLSNS:5][mod=clsns_DNSSD_FindServers][loc=(:CLSNS00152:)]
2015-02-03 17:29:24.470771 : GNS:292816640: main::clsgnctrGetGNSAddressUsingCLSNS: (:CLSGN01053:) GNS address retrieval failed with error CLSNS-00025 (GNS_SERV_FIND_FAIL) - throwing CLSGN-00070. 1: clskec:has:CLSNS:25 3 args[has:CLSNS:25][mod=clsnsgFind][loc=(:CLSNS00216:)]
Verify the error with OS commands. Check the GNS configuration and the PUBLIC network interface:
[root@hract21 Desktop]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.6.58
Domain served by GNS: grid12c.example.com
Check the PUBLIC network interface:
[root@hract21 network-scripts]# ifconfig
eth1:1 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.156 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:2 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.157 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:3 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.153 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:4 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.151 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:5 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.5.152 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:6 Link encap:Ethernet HWaddr 08:00:27:7D:8E:49
inet addr:192.168.6.58 Bcast:192.168.6.255 Mask:255.255.255.0
--> The VIPs use 192.168.5.x as base address, whereas our GNS VIP uses 192.168.6.58. This is not correct: the VIPs and the GNS VIP should have the same network address!
Let's check whether somebody changed the GNS base address:
[grid@hract21 trace]$ grep clsgndadvAdvertise gnsd.trc
2015-02-02 12:32:09.447471 : GNS:3141969472: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.5.58:46453.
2015-02-03 17:22:00.410829 : GNS:4114409024: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.5.58:25702.
2015-02-03 17:24:51.165609 : GNS:2221307456: main::clsgndadvAdvertise: Listening for commands on endpoint(s):tcp://192.168.6.58:27105.
2015-02-03 17:29:17.491219 : GNS:489064000: main::clsgndadvAdvertise: Listening for commands on endpoint(s): tcp://192.168.6.58:63806.
--> The GNS base address was changed from 192.168.5.58 to 192.168.6.58!
Verify the error with cluvfy:
[grid@hract21 CLUVFY]$ cluvfy comp gns -postcrsinst -verbose
Verifying GNS integrity
Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid12c.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
PRVF-5213 : GNS resource configuration check failed
PRCI-1156 : The GNS VIP 192.168.6.58 does not match any of the available subnets 192.168.5.0, 192.168.2.0.
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.6.58" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid12c.example.com" are reachable
WARNING:
PRVF-5218 : "hract21-vip.grid12c.example.com" did not resolve into any IP address
PRVF-5827 : The response time for name lookup for name "hract21-vip.grid12c.example.com" exceeded 15 seconds
Checking status of GNS resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
hract21 no yes
hract22 no yes
hract23 no yes
PRVF-5211 : GNS resource is not running on any node of the cluster
Checking status of GNS VIP resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
hract21 yes yes
hract22 no yes
hract23 no yes
GNS integrity check failed
Verification of GNS integrity was unsuccessful.
Checks did not pass for the following node(s):
hract21
--> Cluvfy is very helpful here, as it compares the network addresses with the GNS address. If the GNS and network addresses don't match, cluvfy throws PRVF-5213 and PRCI-1156 errors.
Fix -> Change the GNS VIP back to the original address and restart GNS:
[root@hract21 network-scripts]# srvctl modify gns -vip 192.168.5.58
[root@hract21 network-scripts]# srvctl config gns
GNS is enabled.
GNS VIP addresses: 192.168.5.58
Domain served by GNS: grid12c.example.com
[root@hract21 network-scripts]# srvctl start gns
[root@hract21 network-scripts]# srvctl config gns -a -l
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid12c.example.com
GNS version: 12.1.0.2.0
Globally unique identifier of the cluster where GNS is running: 3d7c30fc9a0eeff3ff12b79970a14c12
Name of the cluster where GNS is running: ract2
Cluster type: server.
GNS log level: 1.
GNS listening addresses: tcp://192.168.5.58:30218.
GNS is individually enabled on nodes:
GNS is individually disabled on nodes:
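The manual "grep clsgndadvAdvertise gnsd.trc" check can be reduced to the points where the GNS listening address actually changed. A sketch, assuming the trace line format shown in this case (timestamp, then "Listening for commands on endpoint(s): tcp://IP:PORT"); gns_address_history is a hypothetical helper name.

```bash
# gns_address_history [GNSD_TRACE]
# Print "date time address" whenever the address in the GNS
# "Listening for commands on endpoint(s)" lines differs from the previous one.
gns_address_history() {
  local trace=${1:-gnsd.trc}
  grep -F 'clsgndadvAdvertise: Listening for commands on endpoint' "$trace" |
    awk '
      match($0, /tcp:\/\/[0-9.]+/) {
        ip = substr($0, RSTART + 6, RLENGTH - 6)   # strip the "tcp://" prefix
        if (ip != last) { print $1, $2, ip; last = ip }
      }'
}
```

Run against the gnsd.trc from this case, it should print two lines: the original 192.168.5.58 entry from 2015-02-02 and the switch to 192.168.6.58 at 2015-02-03 17:24:51, which you can then compare against the public subnet reported by oifcfg getif.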
Reference
- How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
- Troubleshooting Clusterware and Clusterware component error : Address already in use
- RAC Scripts: http://www.hhutzler.de/blog/rac-scripts