- For details, read: List of gipc defects that prevent GI from starting/joining after private network is restored or node rebooted (Doc ID 1488378.1)
Bug 9593552 – fixed in 11.2.0.2 GI PSU3, 11.2.0.3 and above, crsd fails to join, refer to note 1337730.1 for details
Note : CRSD Fails to Start due to GIPC Communication Failure with Master (Doc ID 1337730.1)
BUG : Bug 9593552 : GIPCCONNECT IS NOT ASYNC 11.2.0.2GIBTWO
gipcd.log : gipchaLowerProcessNode: no valid interfaces found
crsd.log : gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound
Invoking member kill
Root Cause : BUG 9593552 is fixed in 11.2.0.2 PSU3, 11.2.0.3 and above
Bug 12720728 – fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU3, 11.2.0.4 and above, cssd fails to join, refer to note 1352887.1 for details
Note : 11gR2 Grid Infrastructure Node May not Join the Cluster After Evicted With Error sgipcnUdpSend "No buffer space available (74)" (Doc ID 1352887.1)
BUG : 12720728 : GIPCHALOWERPROCESSNODE: NO VALID INTERFACES FOUND TO NODE
ocssd.log :
[ GIPCNET][1543] gipcmodNetworkProcessSend: slos op : sgipcnUdpSend
[ GIPCNET][1543] gipcmodNetworkProcessSend: slos dep : No buffer space available (74) ==>> key rediscovery error
[GIPCHALO][1543] gipchaLowerProcessNode: no valid interfaces found to node for 595773 ms, node 111c093d0 { host 'racnode2', haName 'CSS_fcrprd', srcLuid 9f9bc4e8-26101e05, dstLuid 3559ec4f-06cd6c73 numInf 0, contigSeq 124472, lastAck 124393, lastValidAck 124471, sendSeq [125035 : 125035], createTime 1729131589, flags 0x2408 }
[GIPCHALO][1543] gipchaLowerProcessNode: bootstrap node considered dead because of idle connection time 600001 ms, node 111c093d0 { host 'racnode2', haName 'CSS_fcrprd', srcLuid 9f9bc4e8-26101e05, dstLuid 3559ec4f-06cd6c73 numInf 0, contigSeq 124472, lastAck 124393, lastValidAck 124471, sendSeq [125038 : 125038], createTime 1729131589, flags 0x2408 }
Bug Descr. : CSSD may report the following errors if a sendto() system call fails due to some underlying UDP issues at the OS level:
gipcmodNetworkProcessSend: slos op : sgipcnUdpSend
gipcmodNetworkProcessSend: slos dep : No buffer space available (74)
This fix enables the CSSD to handle this error and retry the sendto() operation.
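The two ocssd.log signatures above can be searched for mechanically. The following is a minimal sketch, assuming the log text has already been read into a string; the function names are hypothetical helpers, not part of any Oracle tool, and the patterns are derived only from the sample messages shown in this note, so spacing in other releases may differ.

```python
import re

# Pattern built from the sample "no valid interfaces found" message above.
NO_IF_RE = re.compile(
    r"gipchaLowerProcessNode: no valid interfaces found to node for (\d+) ms, "
    r"node \w+ \{ host '([^']*)'"
)

# Pattern built from the sample sgipcnUdpSend failure; whitespace is kept loose.
UDP_BUF_RE = re.compile(r"slos dep\s*:\s*No buffer space available \(74\)")


def find_no_valid_interfaces(log_text):
    """Return (host, milliseconds) for each 'no valid interfaces' message."""
    return [(m.group(2), int(m.group(1))) for m in NO_IF_RE.finditer(log_text)]


def has_udp_buffer_error(log_text):
    """True if the 'No buffer space available (74)' signature is present."""
    return UDP_BUF_RE.search(log_text) is not None
```

A hit from both functions in the same ocssd.log matches the symptom described for bug 12720728; a hit from the first function alone is not specific to it, since the same message appears for several other bugs in this list.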
Bug 13334158 – fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4 and above, cssd fails to join, refer to note 1456977.1 for details
Note : 11gR2 GI CSS is not Coming up After Private Network Related Problem Recovered due to gipc Issue (Doc ID 1456977.1)
BUG : Bug 13334158 : REBOOT OF ONE OF THE SWITCH EVICTS INSTANCES
ocssd.log :
2012-03-20 21:04:45.369: [GIPCHGEN][1102465344] gipchaInterfaceFail: marking interface failing 0x1ac10730 { host '', haName 'CSS_crsrtmpdrdbdm', local (nil), ip '192.168.224.1', subnet '192.168.224.0', mask '255.255.255.0', mac '00-17-a4-77-88-48', ifname 'bond1', numRef 4, numFail 0, idxBoot 0, flags 0x184d }
[GIPCHGEN][1102465344] gipchaInterfaceFail: marking interface failing 0x1ac10730 { host '', haName 'CSS_crsrtmpdrdbdm', local (nil), ip '192.168.224.1', subnet '192.168.224.0', mask '255.255.255.0', mac '00-17-a4-77-88-48', ifname 'bond1', numRef 4, numFail 0, idxBoot 0, flags 0x184d }
or
[GIPCHGEN][8] gipchaInterfaceDisable: disabling interface 102352750 { host 'racnode2', haName 'CSS_crs-webyours', local 101525550, ip '192.168.104.131:21974', subnet '192.168.104.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0xa6 }
[GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 25839 ms, node 101463b90 { host 'nxswdd02', haName 'CSS_crs-webyours', srcLuid af93e29f-a31d34d8, dstLuid 6e41cc16-5dd93ae6 numInf 1, contigSeq 75317, lastAck 75301, lastValidAck 75317, sendSeq [75302 : 75360], createTime 111734562, sentRegister 1, localMonitor 1, flags 0x408 }
Bug Descr. : This problem was introduced in 11.2.0.2 GI PSU3 and 11.2.0.3 by the fix for bug 10231906. This fix supersedes that fix; for interim patches, use this fix instead of that one. After a network problem, the network information in gipc is not restored, causing communication problems between the clusterware processes.
Rediscovery Notes :
1. Processes like CRSD should show that the network endpoint is closed or the interface is invalidated:
[GIPCHDEM][1112189248] gipchaDaemonProcessHAInvalidate: completed ha name invalidate for node 0x2aaaac01fd80 { host 'node1', haName '9ef5-c63e-d216-3b7f', srcLuid 259e2eb0-52aca06a, dstLuid cc2582a4-bce1999e numInf 1, contigSeq 290425, lastAck 281019, lastValidAck 290424, sendSeq [281019 : 281019], createTime 4294577560, sentRegister 1, localMonitor 0, flags 0x28 }
2. The gipcd log shows that a problem was found and the interface was disabled:
[ GIPCNET][1109211456] gipcmodNetworkProcessSend: [network] failed send attempt endp 0x17d49a20 [00000000000002e0] { gipcEndpoint : localAddr 'udp://172.16.30.101:13707', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, req 0x17f66090 [00000000053ad8fb] { gipcSendRequest : addr 'udp://IP address:16195', data 0x17f68648, len 80, olen 0, parentEndp 0x17d49a20, ret gipcretFail (1), objFlags 0x0, reqFlags 0x2 }
...
[GIPCHGEN][1109211456] gipchaInterfaceDisable: disabling interface 0x2aaaac231420 { host 'esemdmdb1', haName 'gipcd_ha_name', local (nil), ip '172.16.30.100:16195', subnet 'IP address', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
[GIPCHALO][1109211456] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaaac231420 { host 'esemdmdb1', haName 'gipcd_ha_name', local (nil), ip 'IP address:16195', subnet 'IP address', mask 'IP address0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x226 }
[GIPCDCLT][1075992896] gipcdDeleteAllInterfaces: interface (ip: IP address:56120, mask: 255.255.255.0, subnet: IP address, mac: , ifname: ) deleted
[GIPCDCLT][1075992896] gipcdDeleteAllInterfaces: interface (ip: IP address:47081, mask: 255.255.255.0, subnet: IP address, mac: , ifname: ) deleted
but the network interface is not restored.
Workaround : Reboot the machine
Bug 13811209 – fixed in 11.2.0.3 GI PSU3, 11.2.0.4 and above, cssd fails to join, refer to note 1456977.1 for details
Note : 11gR2 Grid Infrastructure CSS fails to start after recovered from cluster_interconnect (network adapter, cable, switch etc) related problems
ocssd.log : from surviving node
[GIPCHGEN][1102465344] gipchaInterfaceFail: marking interface failing 0x1ac10730 { host '', haName 'CSS_crsrtmpdrdbdm', local (nil), ip '192.168.224.1', subnet '192.168.224.0', mask '255.255.255.0', mac '00-17-a4-77-88-48', ifname 'bond1', numRef 4, numFail 0, idxBoot 0, flags 0x184d }
OR
[GIPCHGEN][8] gipchaInterfaceDisable: disabling interface 102352750 { host 'racnode2', haName 'CSS_crs-webyours', local 101525550, ip '192.168.104.131:21974', subnet '192.168.104.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0xa6 }
[GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 25839 ms, node 101463b90 { host 'nxswdd02', haName 'CSS_crs-webyours', srcLuid af93e29f-a31d34d8, dstLuid 6e41cc16-5dd93ae6 numInf 1, contigSeq 75317, lastAck 75301, lastValidAck 75317, sendSeq [75302 : 75360], createTime 111734562, sentRegister 1, localMonitor 1, flags 0x408 }
Related :
Bug 13334158 is fixed in 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4 and above (see the bug description above)
Bug 13811209 is fixed in 11.2.0.3 GI PSU3, 11.2.0.4 and above (this is the continuation of Bug 13334158; the fix in Bug 13334158 is incomplete)
Bug 13653178 – fixed in 11.2.0.3 GI PSU5, 11.2.0.4 and above, cssd fails to join, refer to note 1479380.1 for details.
The fix caused a regression and has been superseded by bug 16547309; refer to note 1564555.1 for details.
Bug : 16547309 : GIPC SHOWS RANK 0 OR -1 AFTER APPLIED PSU 11.2.0.3.5
Multicast is not working for private network for 11.2.0.2.x (expected behavior) or 11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 (due to Bug 16547309)
Note : 11gR2 Grid Infrastructure, cssd fails to join the cluster (GI fails to start as a result) after recovered from private network failure caused by pulling cluster interconnect cables etc
11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 CSSD Fails to Start if Multicast Fails on Private Network (Doc ID 1564555.1)
Bug desc : After applying 11.2.0.3.5, cssd cannot establish a connection with the peer cssd using broadcast because the broadcast address is not created with the correct multicast port number. This issue does not happen where multicast is enabled.
OS /var/log/messages - node1
Jul 16 16:12:54 <0.6> racnode1 kernel: e1000e: pci1p2 NIC Link is Down
Jul 16 16:12:55 <0.6> racnode1 kernel: bonding: bond0: link status definitely down for interface pci1p2, disabling it
Jul 16 16:12:55 <0.6> racnode1 kernel: bonding: bond0: now running without any active interface ! ##>> private network failed
--..
Jul 16 16:15:00 <0.6> racnode1 kernel: bnx2 0000:02:00.1: em2: NIC Copper Link is Up, 1000 Mbps full duplex ##>> private network recovered
gipcd.log :
[GIPCDCLT][1086585152] gipcdRawInterfaceUpdates: ([update(ip: 192.168.44.5, mask: 255.255.255.0, subnet: 192.168.44.0, mac: 00-26-55-52-75-32, ifname: bond0), state(gipcdadapterstateUp)])
[GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0] bond0 - rank 0, avgms 30000000000.000000 [ 4 / 0 / 0 ]
##>> rank stayed 0 after network is restored
[GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0] bond0 - rank 0, avgms 30000000000.000000 [ 19 / 0 / 0 ]
[ CLSINET][1106753856] Returning NETDATA: 1 interfaces
[ CLSINET][1106753856] # 0 Interface 'bond0',ip='192.168.44.5',mac='00-26-55-52-75-32',mask='255.255.255.0',net='192.168.44.0',use='cluster_interconnect'
..
[GIPCDMON][1106753856] gipcdMonitorSaveInfMetrics: inf[ 0] bond0 - rank -1, avgms 30000000000.000000 [ 0 / 0 / 0 ]
##>> rank changed to "-1" 10 minutes after the failure although it was restored a few minutes earlier
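The rank check above can be scripted against gipcd.log. This is a minimal sketch, assuming the gipcdMonitorSaveInfMetrics lines have the layout shown in the excerpt; the function names are hypothetical, and treating any rank of 0 or below as "stuck" is an assumption drawn only from the symptom described here (rank 0 or -1), not a documented threshold.

```python
import re

# Matches e.g. "gipcdMonitorSaveInfMetrics: inf[ 0] bond0 - rank -1, avgms ..."
RANK_RE = re.compile(
    r"gipcdMonitorSaveInfMetrics: inf\[\s*\d+\]\s+(\S+) - rank\s+(-?\d+),"
)


def interface_ranks(log_text):
    """Return (ifname, rank) tuples in the order they appear in the log."""
    return [(m.group(1), int(m.group(2))) for m in RANK_RE.finditer(log_text)]


def stuck_interfaces(log_text):
    """Return {ifname: rank} for interfaces whose latest rank is 0 or -1."""
    latest = {}
    for name, rank in interface_ranks(log_text):
        latest[name] = rank  # later lines overwrite earlier ones
    return {name: rank for name, rank in latest.items() if rank <= 0}
```

If the operating system log shows the link is back up but `stuck_interfaces` still reports the private interface, that is consistent with the behavior described for bug 16547309.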
REDISCOVERY INFORMATION:
Enable GIPC_TRACE_LEVEL=3 and look at the ocssd log. Check whether a message from gipcInternalAddress shows an address like the one below, with two port numbers:
{ gipcAddress : name 'udp://192.10.100.255:58375:42424', objFlags 0x0, addrFlags 0x0 }
The correct address should be 'udp://192.10.100.255:42424'
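The address check can be automated once the trace is enabled. Below is a minimal sketch, assuming IPv4 addresses and the `gipcAddress : name '...'` layout shown above; the function name is a hypothetical helper. It flags any udp address that carries more than one port field.

```python
import re

# Captures the quoted udp:// name from a gipcAddress trace entry.
ADDR_RE = re.compile(r"gipcAddress : name '(udp://[^']+)'")


def malformed_udp_addresses(log_text):
    """Return udp names with more than one ':port' part (IPv4 assumed)."""
    bad = []
    for m in ADDR_RE.finditer(log_text):
        name = m.group(1)
        host_and_ports = name[len("udp://"):]
        # A well-formed IPv4 address has exactly one colon: host:port.
        if host_and_ports.count(":") > 1:
            bad.append(name)
    return bad
```

An empty result does not rule out the bug; it only means no doubled port was seen in the text that was scanned.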
WORKAROUND: enable MULTICAST
Bug 16867451 : SOLX64-11.2.0.4-CSS: CSSD DID NOT COME BACK AFTER RESUME ONE OF PRIVATE NETWORKS
Fixed in 11.2.0.4, 12.1.0.2 onward, GI does not start after recovery of private network
Duplicate bug : 17831538
gipcd.log :
[GIPCDMON][7] gipcdMonitorUpdate: interface DOWN - [ ip 192.168.1.101, subnet 192.168.1.0, mask 255.255.255.0, mac 00-21-28-25-a2-09-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname nge1 ]
[GIPCDMON][7] gipcdMonitorUpdate: interface DOWN - [ ip 192.168.2.101, subnet 192.168.2.0, mask 255.255.255.0, mac 00-21-28-25-a2-0a-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname e1000g2 ]
[GIPCDMON][7] gipcdMonitorUpdate: interface UP - [ ip 192.168.2.101, subnet 192.168.2.0, mask 255.255.255.0, mac 00-21-28-25-a2-0a-00-00-00-00-00-00-00-00-2f-00-00-00-00-00, ifname e1000g2 ]
cssd.log :
[GIPCHALO][8] gipchaLowerProcessNode: no valid interfaces found to node for 3924690957 ms, node 18cab50 { host 'popen1', haName 'CSS_popen-c13', srcLuid 60abf35b-e660fd80, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 3924690956, sentRegister 0, localMonitor 1, flags 0x4 }
[ CSSD][18]clssnmvDHBValidateNcopy: node 1, popen1, has a disk HB, but no network HB, DHB has rcfg 265080551, wrtcnt, 9452, LATS 3924691956, lastSeqNo 9451, uniqueness 1369643316,
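The "disk HB, but no network HB" message in the cssd log is a quick way to see which peers are reachable through the voting disk but not over the private network. A minimal sketch, assuming the clssnmvDHBValidateNcopy layout shown above; the function name is a hypothetical helper.

```python
import re

# Matches "clssnmvDHBValidateNcopy: node 1, popen1, has a disk HB, but no network HB"
DHB_RE = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)


def nodes_without_network_hb(log_text):
    """Return {(node_number, host): message_count} for affected peers."""
    seen = {}
    for m in DHB_RE.finditer(log_text):
        key = (int(m.group(1)), m.group(2))
        seen[key] = seen.get(key, 0) + 1
    return seen
```

A count that keeps growing after gipcd has reported the interface UP again is consistent with the symptom described for this bug.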
Bug 14693336 : GI does not start after recovery of private network (duplicate bugs 19125577, 18667717)
Bug 14693336 : THE CONNECTION IN GM LAYER FAILS IN GIPC AFTER NIC RESUME
Test Env : Two nodes rwsbc03/04 are involved in the test and 2 private NICs (eth4/eth5) are configured. Bring down eth4 and eth5 on rwsbc04 from the console; after CSSD has aborted on rwsbc04, enable eth4 on rwsbc04.
$ oifcfg getif
eth0 10.209.0.0 global public
eth2 10.196.108.0 global asm
eth4 192.168.4.0 global cluster_interconnect
eth5 192.168.5.0 global cluster_interconnect
Timestamps to disable/enable private network on rwsbc04
Dec 23 16:40:17 rwsbc04 eth4: NIC Copper Link is Down
Dec 23 16:43:18 rwsbc04 eth5: NIC Copper Link is Down
Dec 23 16:49:35 rwsbc04 eth4: NIC Copper Link is Up
Fixed in : 11.2.0.4 GI PSU2, 12.1.0.2
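When comparing clusterware logs with the operating system log, it helps to know exactly how long each private NIC was down. The sketch below pairs "Link is Down" and "Link is Up" messages per interface, assuming one syslog entry per line. The function name is hypothetical, the pattern only covers the message styles quoted in this note, and because syslog timestamps carry no year, the `year` argument is an assumption the caller must supply.

```python
import re
from datetime import datetime

# Covers "eth4: NIC Copper Link is Down" and "pci1p2 NIC Link is Down".
LINK_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*?(\S+?):? NIC (?:Copper )?Link is (Up|Down)",
    re.MULTILINE,
)


def link_outages(syslog_text, year=2000):
    """Return ([(ifname, seconds_down), ...], [ifnames still down])."""
    down_at = {}
    outages = []
    for m in LINK_RE.finditer(syslog_text):
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        ifname, state = m.group(2), m.group(3)
        if state == "Down":
            # Keep the first Down time if the message repeats.
            down_at.setdefault(ifname, stamp)
        elif ifname in down_at:
            outages.append((ifname, (stamp - down_at.pop(ifname)).total_seconds()))
    return outages, sorted(down_at)
```

For the three timestamps in the test above, this reports eth4 as down for 558 seconds and eth5 as still down.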