Case IV: GPNPD doesn’t start – mismatch between profile.xml and the PRIVATE INTERFACE address
Potential problem:
- The address of the PRIVATE interface (cluster interconnect) was changed without changing profile.xml
Monitor Clusterware Resource status after startup:
***** Local Resources: *****
Resource NAME                 INST TARGET       STATE        SERVER          STATE_DETAILS
----------------------------- ---- ------------ ------------ --------------- -------------
ora.asm                       1    ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip 1    ONLINE       OFFLINE      -               STABLE
ora.crf                       1    ONLINE       ONLINE       hract21         STABLE
ora.crsd                      1    ONLINE       OFFLINE      -               STABLE
ora.cssd                      1    ONLINE       OFFLINE      hract21         STARTING
ora.cssdmonitor               1    ONLINE       ONLINE       hract21         STABLE
ora.ctssd                     1    ONLINE       OFFLINE      -               STABLE
ora.diskmon                   1    ONLINE       OFFLINE      -               STABLE
ora.drivers.acfs              1    ONLINE       ONLINE       hract21         STABLE
ora.evmd                      1    ONLINE       INTERMEDIATE hract21         STABLE
ora.gipcd                     1    ONLINE       ONLINE       hract21         STABLE
ora.gpnpd                     1    ONLINE       INTERMEDIATE hract21         STABLE
ora.mdnsd                     1    ONLINE       ONLINE       hract21         STABLE
ora.storage                   1    ONLINE       OFFLINE      -               STABLE
--> GPnPD daemon does not start

CLUVFY: Cluvfy fails with PRVG-11050 error
[grid@hract21 CLUVFY]$ ssh hract22 ~/CLUVFY/bin/cluvfy stage -post crsinst -n hract21,hract22
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "hract22"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
ERROR:
PRVG-11050 : No matching interfaces "eth2" for subnet "192.168.2.0" on nodes "hract21"

TRACEFILE review :
alert.log:
2015-02-17 09:42:27.823 [OCSSD(15855)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/hract21/crs/trace/ocssd.trc
2015-02-17 09:42:27.824 [OCSSD(15855)]CRS-1603: CSSD on node hract21 shutdown by user.
2015-02-17 09:42:27.823 [CSSDAGENT(15844)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/grid/diag/crs/hract21/crs/trace/ohasd_cssdagent_root.trc.
Tue Feb 17 09:42:32 2015
Errors in file /u01/app/grid/diag/crs/hract21/crs/trace/ocssd.trc (incident=2977):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/grid/diag/crs/hract21/crs/incident/incdir_2977/ocssd_i2977.trc
2015-02-17 09:42:33.019 [OCSSD(15855)]CRS-8503: Oracle Clusterware OCSSD process with operating system process ID 15855 experienced fatal signal or exception code 6
Sweep [inc][2977]: completed
2015-02-17 09:42:38.005 [OHASD(11954)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00163:) {0:0:2} in /u01/app/grid/diag/crs/hract21/crs/trace/ohasd.trc.

ocssd.trc:
2015-02-17 09:42:32.451021 : CSSD:2417551104: clssnmvDHBValidateNCopy: node 2, hract22, has a disk HB, but no network HB, DHB has rcfg 319544228, wrtcnt, 963949, LATS 92477974, lastSeqNo 963946, uniqueness 1424074596, timestamp 1424162551/21220694
2015-02-17 09:42:32.451113 : CSSD:2422281984: clssnmvDHBValidateNCopy: node 2, hract22, has a disk HB, but no network HB, DHB has rcfg 319544228, wrtcnt, 963950, LATS 92477974, lastSeqNo 963947, uniqueness 1424074596, timestamp 1424162552/21220904
Trace file /u01/app/grid/diag/crs/hract21/crs/trace/ocssd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
DDE: Flood control is not active
CLSB:2467473152: Oracle Clusterware infrastructure error in OCSSD (OS PID 15855): Fatal signal 6 has occurred in program ocssd thread 2467473152; nested signal count is 1
Incident 2977 created, dump file: /u01/app/grid/diag/crs/hract21/crs/incident/incdir_2977/ocssd_i2977.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
2015-02-17 09:42:33.108629 : CSSD:2450904832: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2015-02-17 09:42:33.451785 : CSSD:2417551104: clssnmvDHBValidateNCopy: node 2, hract22, has a disk HB, but no network HB, DHB has rcfg 319544228, wrtcnt, 963952, LATS 92478974, lastSeqNo 963949, uniqueness 1424074596, timestamp 1424162552/21221694
2015-02-17 09:42:33.451933 : CSSD:2422281984: clssnmvDHBValidateNCopy: node 2, hract22, has a disk HB, but no network HB, DHB has rcfg 319544228, wrtcnt, 963953, LATS 92478974, lastSeqNo 963950, uniqueness 1424074596, timestamp 1424162553/21221904
--> At this point we know that we have a networking problem: node 2 (hract22) has a disk heartbeat but no network heartbeat

DTRACE OUTPUT:
- In this case DTRACE will not help. Oracle retrieves the IP addresses via ioctl and compares them to profile.xml
32373 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("192.168.1.7")}}, {"eth1", {AF_INET, inet_addr("192.168.5.121")}}, {"eth2", {AF_INET, inet_addr("192.168.7.121")}}, {"eth3", {AF_INET, inet_addr("192.168.3.121")}}}}) = 0

Investigate & Fix :
Check profile.xml
[root@hract21 network-scripts]# $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*" SPFile="+DATA/ract2/ASMPARAMETERFILE/registry.253.870352347" Mode="remote"/>
--> eth2 is our CI (cluster interconnect) network interface, with 192.168.2.0 as the related network address

[grid@hract21 trace]$ ping -I eth2 192.168.2.122
Warning: cannot bind to specified iface, falling back: Operation not permitted
PING 192.168.2.122 (192.168.2.122) from 192.168.1.7 eth2: 56(84) bytes of data
--> This tells us we have a problem with our CI!

[root@hract21 network-scripts]# ifconfig eth2
eth2 Link encap:Ethernet HWaddr 08:00:27:4E:C9:BF
     inet addr:192.168.7.121 Bcast:192.168.7.255 Mask:255.255.255.0
     inet6 addr: fe80::a00:27ff:fe4e:c9bf/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
--> eth2 is up and running, but it is configured on the wrong network (192.168.7.0 instead of 192.168.2.0)

Fix : Change the address of eth2 back to inet addr:192.168.2.121, then restart the network and CW
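
The manual comparison between profile.xml and the interface configuration can also be scripted. The following is only a minimal sketch, not an Oracle tool: it takes the two gpnp:Network lines from the gpnptool output and the interface addresses from the ifconfig/ioctl output as hard-coded sample data, and reports every adapter whose actual network address differs from the one stored in the profile. The function names (profile_networks, find_mismatches) are hypothetical, and the netmask for eth1 is an assumption, because only the eth2 mask appears in the output above.

```python
import ipaddress
import re

# Sample data copied from the gpnptool output (profile.xml network entries).
PROFILE_LINES = '''
<gpnp:Network id="net1" IP="192.168.5.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
'''

# Adapter -> (address, netmask). The eth2 values come from the ifconfig output;
# the eth1 netmask is an ASSUMPTION (it is not shown in the original output).
INTERFACES = {
    "eth1": ("192.168.5.121", "255.255.255.0"),
    "eth2": ("192.168.7.121", "255.255.255.0"),
}

# Matches the attribute order used in the gpnptool output shown above.
NETWORK_RE = re.compile(
    r'<gpnp:Network\s+id="([^"]+)"\s+IP="([^"]+)"\s+Adapter="([^"]+)"\s+Use="([^"]+)"'
)


def profile_networks(profile_text):
    """Return (adapter, subnet, use) tuples found in the profile text."""
    return [(m.group(3), m.group(2), m.group(4))
            for m in NETWORK_RE.finditer(profile_text)]


def find_mismatches(profile_text, interfaces):
    """Return (adapter, profile_subnet, actual_subnet) for every mismatch.

    actual_subnet is None when the adapter is not configured at all.
    """
    mismatches = []
    for adapter, subnet, _use in profile_networks(profile_text):
        if adapter not in interfaces:
            mismatches.append((adapter, subnet, None))
            continue
        addr, mask = interfaces[adapter]
        # Derive the network address from the interface address and netmask.
        actual = str(ipaddress.ip_interface(f"{addr}/{mask}").network.network_address)
        if actual != subnet:
            mismatches.append((adapter, subnet, actual))
    return mismatches


if __name__ == "__main__":
    for adapter, expected, actual in find_mismatches(PROFILE_LINES, INTERFACES):
        print(f"{adapter}: profile.xml expects subnet {expected}, interface is on {actual}")
```

With the sample data, only eth2 is reported, which corresponds to the interface named in the PRVG-11050 message. On a real system the two inputs would have to be collected from gpnptool and from the interface configuration first; that part is deliberately left out of the sketch.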