Overview
- the private network should be configured with non-routable IP addresses (per RFC 1918: 10.0.0.0 – 10.255.255.255, 172.16.0.0 – 172.31.255.255, 192.168.0.0 – 192.168.255.255)
- use multiple network interfaces on different subnets to avoid a single point of failure (e.g. eth2 on 192.168.2.0 and eth3 on 192.168.3.0)
- oifcfg updates profile.xml – this means the cluster must be up and running before adding a new HAIP network
- before running oifcfg, shut down the interfaces with: ifconfig ethX down
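The RFC 1918 check above can be automated with a small helper; this is an illustrative sketch (the is_rfc1918 function name is my own, not part of any Oracle tooling):

```shell
#!/bin/sh
# is_rfc1918 IP - return 0 if IP falls into one of the RFC 1918 private
# ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); a sketch only.
is_rfc1918() {
  case "$1" in
    10.*)      return 0 ;;                 # 10.0.0.0/8
    192.168.*) return 0 ;;                 # 192.168.0.0/16
    172.*)
      # 172.16.0.0/12 covers second octets 16-31 only
      oct2=${1#172.}; oct2=${oct2%%.*}
      [ "$oct2" -ge 16 ] 2>/dev/null && [ "$oct2" -le 31 ] && return 0
      return 1 ;;
  esac
  return 1
}

# Example: the interconnect subnets used in this article are private
is_rfc1918 192.168.2.1 && echo "192.168.2.1 is private"
is_rfc1918 8.8.8.8     || echo "8.8.8.8 is routable"
```

The case statement catches the /8 and /16 ranges directly; only the 172.16.0.0/12 range needs an explicit second-octet comparison.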
Add a new network interface to the cluster_interconnect
Verify the networks currently used by RAC
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
--> Only the eth2 interface is used for the cluster_interconnect

Configure the eth3 interface and verify that oifcfg iflist can list this device (verify this on all nodes).

Verify the network using ping and oifcfg
# ping 192.168.3.121
# ping 192.168.3.122
# ping 192.168.3.123

Verify the new network with oifcfg iflist
$ oifcfg iflist
eth0  10.0.2.0     <-- local router
eth1  192.168.1.0  <-- public interface
eth2  192.168.2.0  <-- RAC cluster_interconnect
eth2  169.254.0.0  <-- used by RAC (HAIP)
eth3  192.168.3.0  <-- the new device we want to add to the cluster_interconnect
--> Run the above commands on the remaining cluster nodes

Shut down network interface eth3 on all nodes
# ssh grac31 ifconfig eth3 down
# ssh grac32 ifconfig eth3 down
# ssh grac33 ifconfig eth3 down

Add interface eth3 to the private RAC cluster_interconnect
$ oifcfg setif -global eth3/192.168.3.0:cluster_interconnect

Stop the clusterware on all nodes and verify that the current HAIPs are shut down cleanly
# /u01/app/11203/grid/bin/crsctl stop cluster -all
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac31'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac31' succeeded
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac32'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac32' succeeded
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac33'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac33' succeeded

Enable network interface eth3 on all nodes
# ssh grac31 ifconfig eth3 up
# ssh grac32 ifconfig eth3 up
# ssh grac33 ifconfig eth3 up

Restart the cluster
# /u01/app/11203/grid/bin/crsctl start cluster -all

Verify the cluster_interconnect status
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
eth3  192.168.3.0  global  cluster_interconnect

Verify profile.xml (on all nodes)
/u01/app/11203/grid/gpnp/grac31/profiles/peer/profile.xml
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.0" Use="cluster_interconnect"/>

SQL> SELECT * FROM V$CLUSTER_INTERCONNECTS;
NAME            IP_ADDRESS       IS_ SOURCE
--------------- ---------------- --- -------------------------------
eth2:1          169.254.21.215   NO
eth3:1          169.254.139.240  NO
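The per-node ifconfig down/up steps above can be scripted in a small loop; a sketch, assuming password-less ssh between the nodes (the NODES list and the SSH override for dry runs are my own conventions, not Oracle's):

```shell
#!/bin/sh
# Run an ifconfig action (down/up) for one interface on every cluster node.
# SSH can be overridden (e.g. SSH="echo ssh") to preview the commands first.
NODES="grac31 grac32 grac33"
SSH=${SSH:-ssh}

for_each_node() {   # usage: for_each_node <iface> <down|up>
  iface=$1
  action=$2
  for node in $NODES; do
    $SSH "$node" ifconfig "$iface" "$action"
  done
}

SSH="echo ssh"      # dry run: just print what would be executed
for_each_node eth3 down
# prints:
# ssh grac31 ifconfig eth3 down
# ssh grac32 ifconfig eth3 down
# ssh grac33 ifconfig eth3 down
```

Dropping the SSH override runs the real commands, so the same function covers the "down" step before oifcfg setif and the "up" step before restarting the cluster.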
Test I: Shut down and restart one of the two HAIP interfaces
Stop eth2 at OS level and verify eth3
# ifconfig eth2 down
# ifconfig eth3
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth3:2    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--> The cluster_interconnect failed over from eth2:1 to eth3:2 – eth2 is no longer used

Re-enable eth2 at OS level
# ifconfig eth2 up
--> Wait a few seconds to see the failed-over cluster_interconnect come back on eth2
eth2:1    Link encap:Ethernet  HWaddr 08:00:27:69:87:B1
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
--> Cluster interconnect eth3:2 has failed back to eth2:1

Verify the current network load using dstat
# dstat -Neth2,eth3
----total-cpu-usage---- -dsk/total- --net/eth2----net/eth3- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send: recv  send|  in   out | int   csw
  5   5  59  31   0   0| 180k 2574k|   0     0 :   0     0 | 478B 2450B|2093  4957
  8   5  85   3   0   0|  16k   98k|3420B 7794B:3244B 4252B|   0     0 |2471  5774
  4   5  91   0   0   0|  48k 1536B|1501B 1086B: 965B 1276B|   0     0 |2023  4923
  2   4  94   0   0   0|  32k 1536B|2418B 1371B:1904B 3225B|   0     0 |2060  5179
  5   4  91   0   0   0|  16k   50k|  72k   34k: 114k   18k|   0     0 |2115  4695
  6   7  86   0   0   1|  32k 1536B|  94k   36k:  12k   43k|   0     0 |2113  4961
  8   5  85   3   0   0|  32k  146k|2640B 7284B:1614B 2472B|   0     0 |2374  5716
  5   5  90   0   0   0|  96k   50k|3554B 3881B:  20k 5483B|   0     0 |2046  5064
  4   5  92   0   0   0|  32k 1536B|2094B 1146B:1406B 1824B|   0     0 |1996  4965
  5   5  90   0   0   0|  16k 9728B|2666B 5113B:1470B 2759B|   0     0 |1836  4554
  7   6  87   0   0   0|  32k   50k|3826B 3918B:2862B 2546B|   0     0 |1838  4435
--> Here we see that the cluster_interconnect is using both interfaces again: eth2 and eth3
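Checking which alias interfaces currently carry the link-local HAIP addresses can be automated by filtering ifconfig output for 169.254.* addresses; a minimal sketch (the haip_locations name is mine, and the awk pattern assumes the classic Linux ifconfig output format shown above):

```shell
#!/bin/sh
# haip_locations: read `ifconfig` output on stdin and print one line per
# alias interface that carries a HAIP (169.254.x.x) address.
haip_locations() {
  awk '
    /^[a-z]/ { iface = $1 }                # remember the current interface
    /inet addr:169\.254\./ {
      split($2, a, ":")                    # a[2] = the IP after "addr:"
      print iface, a[2]
    }'
}

# Example against the failover state captured above:
haip_locations <<'EOF'
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
eth3:2    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
EOF
# prints:
# eth3:1 169.254.139.240
# eth3:2 169.254.21.215
```

Piping live output through it (`ifconfig | haip_locations`) shows at a glance whether both HAIPs sit on one surviving interface, as in the failover test above.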
Test II: Shut down all HAIP interfaces, but restart them before a node eviction takes place
Verify the current cluster interconnect setup
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
eth3  192.168.3.0  global  cluster_interconnect

Shut down eth2 and eth3 – monitor the network connection and the CRS alert.log
# ifconfig eth2 down
# ifconfig eth3 down
# dstat -Neth2,eth3
----total-cpu-usage---- -dsk/total- --net/eth2----net/eth3- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send: recv  send|  in   out | int   csw
  4   3  46  46   0   0|   0  1536B|   0     0 :   0     0 |   0     0 |1649  4216
  4   6  45  45   0   0| 692k  366k|   0     0 :   0     0 |   0     0 |2058  4805
  2   4  49  45   0   0|1136k  862k|   0     0 :   0     0 |   0     0 |1797  4746
  4   5  46  44   0   1|   0    38M|   0     0 :   0     0 |   0     0 |1821  4221

--> alertgrac33.log
2014-01-25 17:40:43.781
[cssd(4283)]CRS-1612:Network communication with node grac31 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.440 seconds
2014-01-25 17:40:43.781
[cssd(4283)]CRS-1612:Network communication with node grac32 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.700 seconds
2014-01-25 17:40:50.784
[cssd(4283)]CRS-1611:Network communication with node grac31 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.440 seconds
2014-01-25 17:40:51.784
[cssd(4283)]CRS-1611:Network communication with node grac32 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.700 seconds

--> Now immediately restart both interfaces before the node eviction takes place
# ifconfig eth2 up
# ifconfig eth3 up
--> The CRS stack remains up and running and survived a short network outage of both cluster_interconnect interfaces
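The CRS-1611/CRS-1612 warnings above follow a fixed message pattern, so the remaining seconds before eviction can be scraped from the alert log while you race to bring the interfaces back up. A hedged sketch (the eviction_countdown name is my own; the sed pattern matches the 11.2 alert log excerpts shown above):

```shell
#!/bin/sh
# eviction_countdown: read CRS alert log lines on stdin and print, for each
# CRS-1611/CRS-1612 warning, the peer node and the seconds left to eviction.
eviction_countdown() {
  sed -n 's/.*CRS-161[12]:Network communication with node \([^ ]*\).*in \([0-9.]*\) seconds.*/\1 \2/p'
}

# Example using log lines like those captured in the test above:
eviction_countdown <<'EOF'
[cssd(4283)]CRS-1612:Network communication with node grac31 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.440 seconds
[cssd(4283)]CRS-1611:Network communication with node grac32 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.700 seconds
EOF
# prints:
# grac31 14.440
# grac32 6.700
```

Combined with `tail -f` on the alert log, this gives a live countdown and makes it obvious how little margin is left once the 75% warnings appear.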
Reference
- How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
- Expert Oracle RAC 12c – Chapter 9 – Network Practices