Overview
- the private network should be configured with non-routable IP addresses (per RFC 1918: 10.0.0.0 – 10.255.255.255, 172.16.0.0 – 172.31.255.255, 192.168.0.0 – 192.168.255.255)
- use multiple network interfaces on different subnets to avoid a single point of failure (e.g. eth2 on 192.168.2.0 and eth3 on 192.168.3.0)
- oifcfg updates profile.xml – this means the cluster must be up and running before adding a new HAIP network
- before running oifcfg, shut down the interfaces with: ifconfig ethX down
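The RFC 1918 check above can be automated with a small helper; this is an illustrative sketch (the is_rfc1918 function name is my own, not part of any Oracle tooling):

```shell
#!/bin/sh
# is_rfc1918 IP - return 0 if IP falls into one of the RFC 1918 private
# ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); a sketch only.
is_rfc1918() {
  case "$1" in
    10.*)      return 0 ;;                 # 10.0.0.0/8
    192.168.*) return 0 ;;                 # 192.168.0.0/16
    172.*)
      # 172.16.0.0/12 covers second octets 16-31 only
      oct2=${1#172.}; oct2=${oct2%%.*}
      [ "$oct2" -ge 16 ] 2>/dev/null && [ "$oct2" -le 31 ] && return 0
      return 1 ;;
  esac
  return 1
}

# Example: the interconnect subnets used in this article are private
is_rfc1918 192.168.2.1 && echo "192.168.2.1 is private"
is_rfc1918 8.8.8.8     || echo "8.8.8.8 is routable"
```

The case statement catches the /8 and /16 ranges directly; only the 172.16.0.0/12 range needs an explicit second-octet comparison.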
Add a new network interface to the cluster_interconnect
Verify the networks currently used by RAC
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
--> Only the eth2 interface is used for the cluster_interconnect

Configure the eth3 interface and verify that oifcfg iflist can list this device (verify this on all nodes).

Verify the network using ping and oifcfg
# ping 192.168.3.121
# ping 192.168.3.122
# ping 192.168.3.123

Verify the new network with oifcfg iflist
$ oifcfg iflist
eth0  10.0.2.0     <-- local router
eth1  192.168.1.0  <-- public interface
eth2  192.168.2.0  <-- RAC cluster_interconnect
eth2  169.254.0.0  <-- used by RAC (HAIP)
eth3  192.168.3.0  <-- the new device we want to add to the cluster_interconnect
--> Run the above commands on the remaining cluster nodes

Shut down network interface eth3 on all nodes
# ssh grac31 ifconfig eth3 down
# ssh grac32 ifconfig eth3 down
# ssh grac33 ifconfig eth3 down

Add interface eth3 to the private RAC cluster_interconnect
$ oifcfg setif -global eth3/192.168.3.0:cluster_interconnect

Stop the clusterware on all nodes and verify that the current HAIPs are shut down cleanly
# /u01/app/11203/grid/bin/crsctl stop cluster -all
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac31'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac31' succeeded
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac32'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac32' succeeded
..
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'grac33'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'grac33' succeeded

Enable network interface eth3 on all nodes
# ssh grac31 ifconfig eth3 up
# ssh grac32 ifconfig eth3 up
# ssh grac33 ifconfig eth3 up

Restart the cluster
# /u01/app/11203/grid/bin/crsctl start cluster -all

Verify the cluster_interconnect status
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
eth3  192.168.3.0  global  cluster_interconnect

Verify profile.xml (on all nodes)
/u01/app/11203/grid/gpnp/grac31/profiles/peer/profile.xml
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
<gpnp:Network id="net3" Adapter="eth3" IP="192.168.3.0" Use="cluster_interconnect"/>

SQL> SELECT * FROM V$CLUSTER_INTERCONNECTS;
NAME            IP_ADDRESS       IS_ SOURCE
--------------- ---------------- --- -------------------------------
eth2:1          169.254.21.215   NO
eth3:1          169.254.139.240  NO
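The per-node ifconfig down/up steps above can be scripted in a small loop; a sketch, assuming password-less ssh between the nodes (the NODES list and the SSH override for dry runs are my own conventions, not Oracle's):

```shell
#!/bin/sh
# Run an ifconfig action (down/up) for one interface on every cluster node.
# SSH can be overridden (e.g. SSH="echo ssh") to preview the commands first.
NODES="grac31 grac32 grac33"
SSH=${SSH:-ssh}

for_each_node() {   # usage: for_each_node <iface> <down|up>
  iface=$1
  action=$2
  for node in $NODES; do
    $SSH "$node" ifconfig "$iface" "$action"
  done
}

SSH="echo ssh"      # dry run: just print what would be executed
for_each_node eth3 down
# prints:
# ssh grac31 ifconfig eth3 down
# ssh grac32 ifconfig eth3 down
# ssh grac33 ifconfig eth3 down
```

Dropping the SSH override runs the real commands, so the same function covers the "down" step before oifcfg setif and the "up" step before restarting the cluster.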
Test I: Shut down and restart one of the two HAIP interfaces
Stop eth2 at OS level and verify eth3
# ifconfig eth2 down
# ifconfig eth3
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth3:2    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--> The cluster_interconnect failed over from eth2:1 to eth3:2 – eth2 is no longer used

Re-enable eth2 at OS level
# ifconfig eth2 up
--> Wait a few seconds to see the failed-over cluster_interconnect come back on eth2
eth2:1    Link encap:Ethernet  HWaddr 08:00:27:69:87:B1
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
--> Cluster interconnect eth3:2 has failed back to eth2:1

Verify the current network load using dstat
# dstat -Neth2,eth3
----total-cpu-usage---- -dsk/total- --net/eth2----net/eth3- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send: recv  send|  in   out | int   csw
  5   5  59  31   0   0| 180k 2574k|   0     0 :   0     0 | 478B 2450B|2093  4957
  8   5  85   3   0   0|  16k   98k|3420B 7794B:3244B 4252B|   0     0 |2471  5774
  4   5  91   0   0   0|  48k 1536B|1501B 1086B: 965B 1276B|   0     0 |2023  4923
  2   4  94   0   0   0|  32k 1536B|2418B 1371B:1904B 3225B|   0     0 |2060  5179
  5   4  91   0   0   0|  16k   50k|  72k   34k: 114k   18k|   0     0 |2115  4695
  6   7  86   0   0   1|  32k 1536B|  94k   36k:  12k   43k|   0     0 |2113  4961
  8   5  85   3   0   0|  32k  146k|2640B 7284B:1614B 2472B|   0     0 |2374  5716
  5   5  90   0   0   0|  96k   50k|3554B 3881B:  20k 5483B|   0     0 |2046  5064
  4   5  92   0   0   0|  32k 1536B|2094B 1146B:1406B 1824B|   0     0 |1996  4965
  5   5  90   0   0   0|  16k 9728B|2666B 5113B:1470B 2759B|   0     0 |1836  4554
  7   6  87   0   0   0|  32k   50k|3826B 3918B:2862B 2546B|   0     0 |1838  4435
--> Here we see that the cluster_interconnect is using both interfaces again: eth2 and eth3
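Checking which alias interfaces currently carry the link-local HAIP addresses can be automated by filtering ifconfig output for 169.254.* addresses; a minimal sketch (the haip_locations name is mine, and the awk pattern assumes the classic Linux ifconfig output format shown above):

```shell
#!/bin/sh
# haip_locations: read `ifconfig` output on stdin and print one line per
# alias interface that carries a HAIP (169.254.x.x) address.
haip_locations() {
  awk '
    /^[a-z]/ { iface = $1 }                # remember the current interface
    /inet addr:169\.254\./ {
      split($2, a, ":")                    # a[2] = the IP after "addr:"
      print iface, a[2]
    }'
}

# Example against the failover state captured above:
haip_locations <<'EOF'
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.139.240  Bcast:169.254.255.255  Mask:255.255.128.0
eth3:2    Link encap:Ethernet  HWaddr 08:00:27:26:60:C8
          inet addr:169.254.21.215  Bcast:169.254.127.255  Mask:255.255.128.0
EOF
# prints:
# eth3:1 169.254.139.240
# eth3:2 169.254.21.215
```

Piping live output through it (`ifconfig | haip_locations`) shows at a glance whether both HAIPs sit on one surviving interface, as in the failover test above.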
Test II: Shut down all HAIP interfaces, but restart them before a node eviction takes place
Verify the current cluster interconnect setup
$ oifcfg getif
eth1  192.168.1.0  global  public
eth2  192.168.2.0  global  cluster_interconnect
eth3  192.168.3.0  global  cluster_interconnect

Shut down eth2 and eth3 – monitor the network connection and the CRS alert.log
# ifconfig eth2 down
# ifconfig eth3 down
# dstat -Neth2,eth3
----total-cpu-usage---- -dsk/total- --net/eth2----net/eth3- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send: recv  send|  in   out | int   csw
  4   3  46  46   0   0|   0  1536B|   0     0 :   0     0 |   0     0 |1649  4216
  4   6  45  45   0   0| 692k  366k|   0     0 :   0     0 |   0     0 |2058  4805
  2   4  49  45   0   0|1136k  862k|   0     0 :   0     0 |   0     0 |1797  4746
  4   5  46  44   0   1|   0    38M|   0     0 :   0     0 |   0     0 |1821  4221

--> alertgrac33.log
2014-01-25 17:40:43.781
[cssd(4283)]CRS-1612:Network communication with node grac31 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.440 seconds
2014-01-25 17:40:43.781
[cssd(4283)]CRS-1612:Network communication with node grac32 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.700 seconds
2014-01-25 17:40:50.784
[cssd(4283)]CRS-1611:Network communication with node grac31 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.440 seconds
2014-01-25 17:40:51.784
[cssd(4283)]CRS-1611:Network communication with node grac32 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.700 seconds

--> Now immediately restart both interfaces before the node eviction takes place
# ifconfig eth2 up
# ifconfig eth3 up
--> The CRS stack remains up and running and survived a short network outage of both cluster_interconnect interfaces
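The CRS-1611/CRS-1612 warnings above follow a fixed message pattern, so the remaining seconds before eviction can be scraped from the alert log while you race to bring the interfaces back up. A hedged sketch (the eviction_countdown name is my own; the sed pattern matches the 11.2 alert log excerpts shown above):

```shell
#!/bin/sh
# eviction_countdown: read CRS alert log lines on stdin and print, for each
# CRS-1611/CRS-1612 warning, the peer node and the seconds left to eviction.
eviction_countdown() {
  sed -n 's/.*CRS-161[12]:Network communication with node \([^ ]*\).*in \([0-9.]*\) seconds.*/\1 \2/p'
}

# Example using log lines like those captured in the test above:
eviction_countdown <<'EOF'
[cssd(4283)]CRS-1612:Network communication with node grac31 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.440 seconds
[cssd(4283)]CRS-1611:Network communication with node grac32 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.700 seconds
EOF
# prints:
# grac31 14.440
# grac32 6.700
```

Combined with `tail -f` on the alert log, this gives a live countdown and makes it obvious how little margin is left once the 75% warnings appear.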
Reference
- How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)
- Expert Oracle RAC 12c – Chapter 9 – Network Practices