Table of Contents
Overview
- The bash script works for 11.2 RAC systems with a valid SSH configuration
- The script may have bugs and needs modification to run on a 2-node or 4-node cluster
- It’s a good idea to have this script configured and tested before any network problem comes up
- Local listener ora.LISTENER.lsnr on grac43 depends on VIP resource ora.grac43.vip
- Script ./rac_net_testing.sh: download location
- Run this script as user root or grid – ssh must work, and tools like srvctl and olsnodes must be in your PATH and work via ssh ( short test: ssh grac42 $GRID_HOME/bin/olsnodes )
- The root or grid user must use the bash shell ( with csh, ssh commands may return an "Ambiguous output redirect" error )
- If ping or nslookup commands fail intermittently, check whether the Linux firewall is disabled. For details check the article referenced below.
- Configure this script by running
- Stage I: ./rac_net_testing.sh -precheck_rac
- Stage II: ./rac_net_testing.sh -mtu
- Stage III: ./rac_net_testing.sh -ipaddr
- Stage IV: ./rac_net_testing.sh -gns
- Collect static network data: ./rac_net_testing.sh -precheck_perf
- Run specific Networking tests
- Ping public nodenames: ./rac_net_testing.sh -pingpubip
- Ping private IP addresses: ./rac_net_testing.sh -pingprivip
- Traceroute PRIVATE network: ./rac_net_testing.sh -traceroute
- Testing Name Resolution: ./rac_net_testing.sh -nslookup
- Testing VIP status: ./rac_net_testing.sh -vip
- Test SCAN VIP status: ./rac_net_testing.sh -scan
- Finally, configure and run the script with the -netall option: ./rac_net_testing.sh -netall
- -netall runs all specific networking tests and can be configured to run multiple times
Linux Command Reference
- /bin/netstat -in
- /sbin/ifconfig
- /bin/ping -s <MTU> -c 2 -I source_IP nodename
- /bin/traceroute -s source_IP -r -F nodename-priv <MTU-28>
- /usr/bin/nslookup
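All tests in the script follow the same pattern: run one of the commands above on each node over ssh and classify the result as SUCCESS or ERROR::. A minimal sketch of such a wrapper, assuming working ssh equivalence (the function name run_remote and its layout are illustrative, not the script's actual internals):

#!/bin/bash
# Sketch: run a command on a remote node via ssh and print the
# EXECUTE/SUCCESS/ERROR:: lines seen in the test output below
run_remote()
{
  local node=$1
  local cmd=$2
  echo "EXECUTE Command - ssh $node \"$cmd\""
  ssh "$node" "$cmd"
  local status=$?
  if [ $status -eq 0 ]; then
    echo "SUCCESS Command - ssh $node \"$cmd\" - : Status $status"
  else
    echo "ERROR:: Command - ssh $node \"$cmd\" - failed: Status $status"
  fi
}

# Example: ping grac42 from grac41's public IP with an MTU-sized payload
run_remote grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 grac42"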
Preparing and configuring script ./rac_net_testing.sh
- Note: you must collect the needed information by following Stage I – Stage IV
- We will collect all the data needed to configure the script parameters in a step-by-step approach
- You only need to know your RAC hostnames
Stage I: Run ./rac_net_testing.sh and gather further parameters we need to run this script
Explore script parameters: PUB_IF PRIV_IF host1 host2 host3 priv1 priv2 priv3 scan scan1 scan2 scan3 fullscan

# ./rac_net_testing.sh -precheck_rac
*************************************************
***            Generic RAC check              ***
*************************************************
*** Cluster-Name: grac4
*** Nodeapps Info: GNS/ONS/VIP/Network device
Network exists: 1/192.168.1.0/255.255.255.0/eth1, type dhcp
VIP exists: /192.168.1.167/192.168.1.167/192.168.1.0/255.255.255.0/eth1, hosting node grac41
VIP exists: /192.168.1.178/192.168.1.178/192.168.1.0/255.255.255.0/eth1, hosting node grac42
VIP exists: /192.168.1.177/192.168.1.177/192.168.1.0/255.255.255.0/eth1, hosting node grac43
GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016
*** SCAN Info:
SCAN name: grac4-scan.grid4.example.com, Network: 1/192.168.1.0/255.255.255.0/eth1
SCAN VIP name: scan1, IP: /grac4-scan.grid4.example.com/192.168.1.171
SCAN VIP name: scan2, IP: /grac4-scan.grid4.example.com/192.168.1.251
SCAN VIP name: scan3, IP: /grac4-scan.grid4.example.com/192.168.1.173
*** Cluster INFO:
Host     Cluster-No   Private-Interc.   VIP
grac41   1            192.168.2.101     192.168.1.167
grac42   2            192.168.2.102     192.168.1.178
grac43   3            192.168.2.103     192.168.1.177
*** GPnP Info - Verify profile.xml on all nodes
---- Is GPNPD daemon running? If not, a CLSGPNP_NO_DAEMON error should be reported: grac41.example.com
---- Is GPNPD daemon running? If not, a CLSGPNP_NO_DAEMON error should be reported: grac42.example.com
Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
Error CLSGPNP_NO_DAEMON getting profile.
--> GPnPD not running - only the local profile is available - check whether CW is up on grac42
---- Is GPNPD daemon running? If not, a CLSGPNP_NO_DAEMON error should be reported: grac43.example.com
----
--> Check ProfileSequence: grac41.example.com
ProfileSequence="11" ClusterName="grac4"
----
--> Check ProfileSequence: grac42.example.com
ProfileSequence="11" ClusterName="grac4"
----
--> Check ProfileSequence: grac43.example.com
ProfileSequence="11" ClusterName="grac4"
----
--> Profile.xml extract grac41.example.com
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*,/dev/oracleasm/disks/*" SPFile="+OCR/grac4/asmparameterfile/spfileCopyASM.ora"/>
----
--> Profile.xml extract grac42.example.com
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*,/dev/oracleasm/disks/*" SPFile="+OCR/grac4/asmparameterfile/spfileCopyASM.ora"/>
----
--> Profile.xml extract grac43.example.com
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm*,/dev/oracleasm/disks/*" SPFile="+OCR/grac4/asmparameterfile/spfileCopyASM.ora"/>
----
--> Note: GPnP data from all nodes should be identical

--> Parameters explored by running Stage I: Variable settings
Adapter="eth1" Use="public"                  PUB_IF=eth1
Adapter="eth2" Use="cluster_interconnect"    PRIV_IF=eth2
Host     Cluster-No   Private-Interc.   VIP
grac41   1            192.168.2.101     192.168.1.167   host1=grac41  priv1=192.168.2.101  vip1=192.168.1.167
grac42   2            192.168.2.102     192.168.1.178   host2=grac42  priv2=192.168.2.102  vip2=192.168.1.178
grac43   3            192.168.2.103     192.168.1.177   host3=grac43  priv3=192.168.2.103  vip3=192.168.1.177
SCAN VIP name: scan1, IP: /grac4-scan.grid4.example.com/192.168.1.171   scan1=192.168.1.171
SCAN VIP name: scan2, IP: /grac4-scan.grid4.example.com/192.168.1.251   scan2=192.168.1.251
SCAN VIP name: scan3, IP: /grac4-scan.grid4.example.com/192.168.1.173   scan3=192.168.1.173
SCAN name: grac4-scan.grid4.example.com                                 scan=grac4-scan ( short name )
                                                                        fullscan=grac4-scan.grid4.example.com ( FQDN )

Stage II: Explore MTU size
Explore script parameters: MTU MTU28

# ./rac_net_testing.sh -mtu
TESTING MTU Size
grac41.example.com
Iface    MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1    1500   0   2296946      1      0      0   2177438      0      0      0 BMRU
eth1:1  1500   0       - no statistics available -                             BMRU
eth1:3  1500   0       - no statistics available -                             BMRU
eth1:4  1500   0       - no statistics available -                             BMRU
eth2    1500   0  19155395   2055      0      0  13978212      0      0      0 BMRU
eth2:1  1500   0       - no statistics available -                             BMRU
grac42.example.com
Iface    MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1    1500   0     93245      0      0      0     78976      0      0      0 BMRU
eth1:1  1500   0       - no statistics available -                             BMRU
eth1:2  1500   0       - no statistics available -                             BMRU
eth2    1500   0   4622591      0      0      0   4648030      0      0      0 BMRU
eth2:1  1500   0       - no statistics available -                             BMRU
grac43.example.com
Iface    MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1    1500   0      4566      0      0      0      3023      0      0      0 BMRU
eth1:1  1500   0       - no statistics available -                             BMRU
eth1:2  1500   0       - no statistics available -                             BMRU
eth2    1500   0    206817      0      0      0    150402      0      0      0 BMRU
eth2:1  1500   0       - no statistics available -                             BMRU
--- TESTING MTU Size done ---
--> MTU size is 1500 ( MTU28 = MTU - 28 = 1500 - 28 = 1472 )
Now change the variables MTU and MTU28 in ./rac_net_testing.sh
MTU=1500
MTU28=1472

Stage III: Explore IP addresses, Broadcast address, Netmask and Device status
Verify script parameters: priv1 priv2 priv3
Explore script parameters: pub1 pub2 pub3

[root@grac41 NET]# ./rac_net_testing.sh -ipaddr
TESTING - Info Public Interfaces
grac41.example.com
eth1  Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
      inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
grac42.example.com
eth1  Link encap:Ethernet  HWaddr 08:00:27:63:08:07
      inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
grac43.example.com
eth1  Link encap:Ethernet  HWaddr 08:00:27:F6:18:43
      inet addr:192.168.1.103  Bcast:192.168.1.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--- TESTING Public Interfaces done ---
TESTING - Info Private Interfaces
grac41.example.com
eth2  Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
      inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
grac42.example.com
eth2  Link encap:Ethernet  HWaddr 08:00:27:DF:79:B9
      inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
grac43.example.com
eth2  Link encap:Ethernet  HWaddr 08:00:27:1C:30:DD
      inet addr:192.168.2.103  Bcast:192.168.2.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--- TESTING Private Interfaces done ---
--> The PRIVATE interfaces should already be configured, but we can confirm the settings in rac_net_testing.sh again
priv1=192.168.2.101
priv2=192.168.2.102
priv3=192.168.2.103
For the PUBLIC interface the above output translates to
pub1=192.168.1.101
pub2=192.168.1.102
pub3=192.168.1.103

Stage IV: Explore GNS and retrieve VIP hostnames and SCAN VIP addresses
Verify script parameters: scan1 scan2 scan3
Explore script parameters: vip1 vip2 vip3

[root@grac41 NET]# ./rac_net_testing.sh -gns
TESTING GNS
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: grid4.example.com
GNS version: 11.2.0.4.0
GNS VIP network: ora.net1.network
Name              Type   Value             Parameters set in ./rac_net_testing.sh
grac4-scan        A      192.168.1.171
grac4-scan        A      192.168.1.173
grac4-scan        A      192.168.1.251
grac4-scan1-vip   A      192.168.1.171     --> scan1=192.168.1.171
grac4-scan2-vip   A      192.168.1.251     --> scan2=192.168.1.251
grac4-scan3-vip   A      192.168.1.173     --> scan3=192.168.1.173
grac41-vip        A      192.168.1.167     --> vip1=grac41-vip
grac42-vip        A      192.168.1.178     --> vip2=grac42-vip
grac43-vip        A      192.168.1.177     --> vip3=grac43-vip
--- TESTING GNS done ---
--> Now script rac_net_testing.sh is configured and we can start network testing
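To summarize the configuration step, here is a sketch of what the filled-in parameter section of rac_net_testing.sh could look like with the values explored in Stage I – Stage IV (the variable names follow this article; the exact layout inside the script may differ):

# Sketch of the rac_net_testing.sh parameter section, filled with the
# values explored above (illustrative layout, not the script's exact code)
PUB_IF=eth1                               # public interface      (Stage I)
PRIV_IF=eth2                              # private interconnect  (Stage I)
host1=grac41; priv1=192.168.2.101; pub1=192.168.1.101; vip1=grac41-vip
host2=grac42; priv2=192.168.2.102; pub2=192.168.1.102; vip2=grac42-vip
host3=grac43; priv3=192.168.2.103; pub3=192.168.1.103; vip3=grac43-vip
scan1=192.168.1.171                       # SCAN VIPs             (Stage IV)
scan2=192.168.1.251
scan3=192.168.1.173
scan=grac4-scan                           # short SCAN name
fullscan=grac4-scan.grid4.example.com     # FQDN
MTU=1500                                  # Stage II
MTU28=1472                                # MTU - 28 bytes of IP/ICMP headers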
Collect static Network data
Usage
# ./rac_net_testing.sh -precheck_perf
# ./rac_net_testing.sh -precheck_perf 2>&1 | tee rac_pre_perf.TRC
[root@grac41 NET]# ./rac_net_testing.sh -precheck_perf 2>&1 | tee rac_pre_perf.TRC
*************************************************
*** Firewall should be disabled on all nodes ***
*************************************************
grac41.example.com
iptables: Firewall is not running.
grac42.example.com
iptables: Firewall is not running.
grac43.example.com
iptables: Firewall is not running.
--> Status ok
*******************************************************************************
*** netstat should report the following ***
*** - MTU size should be equal on all nodes ***
*** - Network Devices should be up and running ( Flg: RU ) ***
*** - Check statistics for RX/TX packets ( RX-ERR RX-DRP RX-OVR,... ) ***
*** - Compare Broadcast and Netmask on all nodes ***
*******************************************************************************
grac41.example.com
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 2378871 1 0 0 2254772 0 0 0 BMRU
eth1:1 1500 0 - no statistics available - BMRU
eth1:3 1500 0 - no statistics available - BMRU
eth1:4 1500 0 - no statistics available - BMRU
eth2 1500 0 22782549 2431 0 0 17100522 0 0 0 BMRU
eth2:1 1500 0 - no statistics available - BMRU
--> Not too many errors - looks good - Flg RU means RUNNING and UP
grac42.example.com
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 155655 0 0 0 142486 0 0 0 BMRU
eth1:1 1500 0 - no statistics available - BMRU
eth1:2 1500 0 - no statistics available - BMRU
eth2 1500 0 8488235 0 0 0 8962759 0 0 0 BMRU
eth2:1 1500 0 - no statistics available - BMRU
grac43.example.com
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 72237 0 0 0 53788 0 0 0 BMRU
eth1:1 1500 0 - no statistics available - BMRU
eth1:2 1500 0 - no statistics available - BMRU
eth2 1500 0 2839288 0 0 0 2781127 0 0 0 BMRU
eth2:1 1500 0 - no statistics available - BMRU
*************************************************************************
*** 11.2 RAC manual suggests the following search order: hosts: dns files ***
*************************************************************************
grac41.example.com
hosts: files dns
grac42.example.com
hosts: files dns
grac43.example.com
hosts: files dns
--> Order should be changed: dns files
*********************************************************
*** /etc/hosts should be consistent on all nodes ***
*********************************************************
grac41.example.com
127.0.0.1 localhost localhost.localdomain
192.168.1.101 grac41.example.com grac41
grac42.example.com
127.0.0.1 localhost localhost.localdomain
192.168.1.102 grac42.example.com grac42
grac43.example.com
127.0.0.1 localhost localhost.localdomain
192.168.1.103 grac43.example.com grac43
--> Even if you are using DNS, Oracle recommends that you add lines to the /etc/hosts file on each node, specifying the public IP addresses.
127.0.0.1 should not map to the SCAN name or to the public, private and VIP hostnames
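A quick way to spot drift between the node files is to compare checksums over ssh; a minimal sketch (node names are this cluster's, adapt them to yours):

# Compare /etc/hosts across all nodes; differing checksums indicate drift.
# The same loop works for /etc/resolv.conf and /etc/nsswitch.conf.
for node in grac41 grac42 grac43; do
    echo -n "$node : "
    ssh "$node" "md5sum /etc/hosts"
done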
****************************************************************
*** /etc/resolv.conf should be consistent on all nodes ***
****************************************************************
# Generated by NetworkManager
search example.com grid4.example.com de.oracle.com
nameserver 192.168.1.50
# Generated by NetworkManager
search example.com grid4.example.com de.oracle.com
nameserver 192.168.1.50
nameserver 192.135.82.44
nameserver 192.168.1.1
# Generated by NetworkManager
search example.com grid4.example.com de.oracle.com
nameserver 192.168.1.50
nameserver 192.135.82.44
nameserver 192.168.1.1
--> /etc/resolv.conf not consistent - needs to be fixed
**********************************************************************************
*** SCAN listener, SCAN VIPs and nslookup SCAN info should be consistent ***
*** - all SCAN VIPs should be ONLINE ***
*** - for each IP address returned from nslookup for the SCAN name there ***
*** should be a SCAN VIP in status ONLINE ***
*** - as a first test, ping all IP addresses returned from nslookup SCAN NAME ***
**********************************************************************************
SCAN name: grac4-scan.grid4.example.com, Network: 1/192.168.1.0/255.255.255.0/eth1
SCAN VIP name: scan1, IP: /grac4-scan.grid4.example.com/192.168.1.171
SCAN VIP name: scan2, IP: /grac4-scan.grid4.example.com/192.168.1.251
SCAN VIP name: scan3, IP: /grac4-scan.grid4.example.com/192.168.1.173
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node grac43
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node grac41
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node grac42
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.23.rc1.el6_5.1 <<>> grac4-scan.grid4.example.com +noall +answer
;; global options: +cmd
grac4-scan.grid4.example.com. 11 IN A 192.168.1.251
grac4-scan.grid4.example.com. 11 IN A 192.168.1.171
grac4-scan.grid4.example.com. 11 IN A 192.168.1.173
--> DNS zone delegation is working
$ nslookup grac4-scan
Server: 192.168.1.50
Address: 192.168.1.50#53
Non-authoritative answer:
Name: grac4-scan.grid4.example.com
Address: 192.168.1.171
Name: grac4-scan.grid4.example.com
Address: 192.168.1.173
Name: grac4-scan.grid4.example.com
Address: 192.168.1.251
--> SCAN address resolved by DNS and GNS using zone delegation
**********************************************************************************************************
*** For further Info please read: ***
*** How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1) ***
**********************************************************************************************************
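Since each IP address returned by nslookup for the SCAN name must be backed by an ONLINE SCAN VIP, a first automated cross-check can simply ping every returned address. A minimal sketch, assuming nslookup and ping are in the PATH and grac4-scan.grid4.example.com is your SCAN name:

# Ping every IP the DNS returns for the SCAN name; an unreachable IP
# usually means the matching ora.scanN.vip resource is OFFLINE
SCAN_NAME=grac4-scan.grid4.example.com
for ip in $(nslookup "$SCAN_NAME" | awk '/^Address:/ {print $2}' | grep -v '#'); do
    if ping -c 2 "$ip" > /dev/null 2>&1; then
        echo "SUCCESS : SCAN IP $ip is reachable"
    else
        echo "ERROR:: SCAN IP $ip is NOT reachable - check: srvctl status scan"
    fi
done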
Ping all public nodenames from the local public IP with packet size of MTU
# ./rac_net_testing.sh -pingpubip | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'
TESTING : Ping all public nodenames from the local public IP with packet size of 1500 bytes on node: grac41
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 grac41"
SUCCESS Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 grac41" - : Status 0
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 grac41"
SUCCESS Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 grac41" - : Status 0
....
SUCCESS Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 grac43" - : Status 0
EXECUTE Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 grac43"
SUCCESS Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 grac43" - : Status 0
--- TESTING public nodenames from the local public IP on node grac43 done ---
Ping all private IPs from the local private IP with packet size of MTU
[root@grac41 NET]# ./rac_net_testing.sh -pingprivip | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'
TESTING Ping all private IP(s) from all local private IP(s) with packet size of 1500 bytes: 192.168.2.101
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.2.101 192.168.2.101"
SUCCESS Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.2.101 192.168.2.101" - : Status 0
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.2.101 192.168.2.101"
SUCCESS Command - ....
SUCCESS Command - ssh grac42 "/bin/ping -s 1500 -c 2 -I 192.168.2.102 192.168.2.103" - : Status 0
EXECUTE Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.2.103 192.168.2.103"
SUCCESS Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.2.103 192.168.2.103" - : Status 0
EXECUTE Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.2.103 192.168.2.103"
SUCCESS Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.2.103 192.168.2.103" - : Status 0
--- TESTING Private IP done ---
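Note that ping -s 1500 sends a 1500-byte payload plus 28 bytes of IP and ICMP headers, so on a 1500-byte MTU these packets travel fragmented. As an additional, stricter check you can forbid fragmentation and send an MTU-28 payload, which must pass unfragmented on a healthy network; a sketch using the Linux ping option -M do (IPs are this cluster's private addresses):

# Send the largest unfragmentable payload (MTU - 28 header bytes) with
# fragmentation prohibited; failures point to an MTU mismatch on the path
MTU28=1472
for target in 192.168.2.101 192.168.2.102 192.168.2.103; do
    ping -M do -s $MTU28 -c 2 "$target" > /dev/null 2>&1 \
        && echo "SUCCESS : $target accepts ${MTU28}-byte unfragmented packets" \
        || echo "ERROR:: $target - possible MTU mismatch"
done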
Traceroute PRIVATE network : Size MTU-28 = 1472 bytes ( 28 bytes = 20-byte IP header + 8-byte UDP header )
# ./rac_net_testing.sh -traceroute
***********************************************************************************************
*** TESTING Traceroute PRIVATE network ***
*** - an MTU-size packet traceroute should complete in 1 hop without going through the routing table ***
*** - For MTU size 1500 traceroute packets should be MTU-28=1472 bytes ***
***********************************************************************************************
EXECUTE Command - ssh grac41 "/bin/traceroute -s 192.168.2.101 -r -F 192.168.2.101 1472"
traceroute to 192.168.2.101 (192.168.2.101), 30 hops max, 1472 byte packets
1 grac41int.example.com (192.168.2.101) 0.016 ms 0.005 ms 0.009 ms
SUCCESS Command - ssh grac41 "/bin/traceroute -s 192.168.2.101 -r -F 192.168.2.101 1472" - : Status 0
EXECUTE Command - ssh grac42 "/bin/traceroute -s 192.168.2.102 -r -F 192.168.2.101 1472"
traceroute to 192.168.2.101 (192.168.2.101), 30 hops max, 1472 byte packets
1 grac41int.example.com (192.168.2.101) 0.523 ms 0.271 ms 0.192 ms
SUCCESS Command - ssh grac42 "/bin/traceroute -s 192.168.2.102 -r -F 192.168.2.101 1472" - : Status 0
EXECUTE Command - ssh grac43 "/bin/traceroute -s 192.168.2.103 -r -F 192.168.2.101 1472"
traceroute to 192.168.2.101 (192.168.2.101), 30 hops max, 1472 byte packets
1 grac41int.example.com (192.168.2.101) 3.616 ms 3.529 ms 3.477 ms
SUCCESS Command - ssh grac43 "/bin/traceroute -s 192.168.2.103 -r -F 192.168.2.101 1472" - : Status 0
...
EXECUTE Command - ssh grac43 "/bin/traceroute -s 192.168.2.103 -r -F 192.168.2.103 1472"
traceroute to 192.168.2.103 (192.168.2.103), 30 hops max, 1472 byte packets
1 grac43int.example.com (192.168.2.103) 0.017 ms 0.004 ms 0.004 ms
SUCCESS Command - ssh grac43 "/bin/traceroute -s 192.168.2.103 -r -F 192.168.2.103 1472" - : Status 0
--- TESTING Traceroute PRIVATE network done ---
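Because a healthy, non-routed interconnect must reach every private IP in exactly one hop, the traceroute output can also be checked automatically; a sketch (private IPs are this cluster's, traceroute options as used above):

# Flag private-network targets that need more than one hop: the last
# output line of traceroute starts with the final hop number
MTU28=1472
for target in 192.168.2.101 192.168.2.102 192.168.2.103; do
    hops=$(/bin/traceroute -r -F "$target" "$MTU28" 2>/dev/null | tail -1 | awk '{print $1}')
    if [ "$hops" = "1" ]; then
        echo "SUCCESS : $target reached in 1 hop"
    else
        echo "ERROR:: $target needed $hops hops - interconnect traffic may be routed"
    fi
done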
Testing Name Resolution
[root@grac41 NET]# ./rac_net_testing.sh -nslookup
TESTING Name Resolution
EXECUTE Command - ssh grac41 "/usr/bin/nslookup grac4-scan "
Server: 192.168.1.50
Address: 192.168.1.50#53
Non-authoritative answer:
Name: grac4-scan.grid4.example.com
Address: 192.168.1.173
Name: grac4-scan.grid4.example.com
Address: 192.168.1.251
Name: grac4-scan.grid4.example.com
Address: 192.168.1.171
SUCCESS Command - ssh grac41 "/usr/bin/nslookup grac4-scan" - : Status 0
EXECUTE Command - ssh grac41 "/usr/bin/nslookup grac41-vip "
Server: 192.168.1.50
Address: 192.168.1.50#53
Non-authoritative answer:
Name: grac41-vip.grid4.example.com
Address: 192.168.1.167
SUCCESS Command - ssh grac41 "/usr/bin/nslookup grac41-vip" - : Status 0
...
EXECUTE Command - ssh grac43 "/usr/bin/nslookup grac43-vip "
Server: 192.168.1.50
Address: 192.168.1.50#53
Non-authoritative answer:
Name: grac43-vip.grid4.example.com
Address: 192.168.1.177
SUCCESS Command - ssh grac43 "/usr/bin/nslookup grac43-vip" - : Status 0
--- TESTING Name Resolution done ---
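The same resolution test can be scripted outside rac_net_testing.sh if you want a fast standalone check; a sketch looping nslookup over all cluster names from all nodes (names are this cluster's):

# Verify from every node that every cluster name resolves; failures
# mirror the ERROR:: lines rac_net_testing.sh would print
NAMES="grac4-scan grac41-vip grac42-vip grac43-vip"
for node in grac41 grac42 grac43; do
    for name in $NAMES; do
        if ssh "$node" "/usr/bin/nslookup $name" > /dev/null 2>&1; then
            echo "SUCCESS : $node resolves $name"
        else
            echo "ERROR:: $node cannot resolve $name"
        fi
    done
done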
Testing VIP connectivity – create and solve a VIP-related problem
Create Test Scenario - Stop a VIP on node grac43
[root@grac41 Desktop]# srvctl stop vip -n grac43 -f

Test VIP status using ./rac_net_testing.sh
[root@grac41 NET]# ./rac_net_testing.sh -vip | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'
TESTING VIP
EXECUTE Command - ssh grac41 "/bin/ping -c 2 grac41-vip "
SUCCESS Command - ssh grac41 "/bin/ping -c 2 grac41-vip" - : Status 0
EXECUTE Command - ssh grac41 "/bin/ping -c 2 grac42-vip "
SUCCESS Command - ssh grac41 "/bin/ping -c 2 grac42-vip" - : Status 0
EXECUTE Command - ssh grac41 "/bin/ping -c 2 grac43-vip "
ERROR:: Command - ssh grac41 "/bin/ping -c 2 grac43-vip " - failed: Status 1
EXECUTE Command - ssh grac42 "/bin/ping -c 2 grac41-vip "
SUCCESS Command - ssh grac42 "/bin/ping -c 2 grac41-vip" - : Status 0
EXECUTE Command - ssh grac42 "/bin/ping -c 2 grac42-vip "
SUCCESS Command - ssh grac42 "/bin/ping -c 2 grac42-vip" - : Status 0
EXECUTE Command - ssh grac42 "/bin/ping -c 2 grac43-vip "
ERROR:: Command - ssh grac42 "/bin/ping -c 2 grac43-vip " - failed: Status 1
EXECUTE Command - ssh grac43 "/bin/ping -c 2 grac41-vip "
SUCCESS Command - ssh grac43 "/bin/ping -c 2 grac41-vip" - : Status 0
EXECUTE Command - ssh grac43 "/bin/ping -c 2 grac42-vip "
SUCCESS Command - ssh grac43 "/bin/ping -c 2 grac42-vip" - : Status 0
EXECUTE Command - ssh grac43 "/bin/ping -c 2 grac43-vip "
ERROR:: Command - ssh grac43 "/bin/ping -c 2 grac43-vip " - failed: Status 1
--> From all nodes grac43-vip is not reachable

Verify CW status
[root@grac41 NET]# crs
NAME                      TARGET     STATE      SERVER       STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.grac41.vip            ONLINE     ONLINE     grac41
ora.grac42.vip            ONLINE     ONLINE     grac42
ora.grac43.vip            OFFLINE    OFFLINE
..
ora.LISTENER.lsnr         ONLINE     ONLINE     grac41
ora.LISTENER.lsnr         ONLINE     ONLINE     grac42
ora.LISTENER.lsnr         OFFLINE    OFFLINE    grac43
--> ora.grac43.vip OFFLINE and ora.LISTENER.lsnr on grac43 OFFLINE

FIX: Start the ora.grac43.vip resource and the local listener ora.LISTENER.lsnr
# srvctl start vip -n grac43
# srvctl start listener -n grac43
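After the fix you should see both resources ONLINE again; a minimal re-check, reusing the commands from this article:

# Confirm the fix: VIP and local listener on grac43 back ONLINE,
# and the -vip test free of ERROR:: lines
srvctl status vip -n grac43
srvctl status listener -n grac43
./rac_net_testing.sh -vip | egrep 'TESTING|ERROR'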
Testing SCAN VIP connectivity – create and solve a SCAN VIP-related problem
Create Test Scenario - Stop SCAN VIP on node grac43
[root@grac41 Desktop]# srvctl stop scan -i 3 -f

Test SCAN VIP status running ./rac_net_testing.sh
[root@grac41 NET]# ./rac_net_testing.sh -scan | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'
TESTING SCAN
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.171"
SUCCESS Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.171" - : Status 0
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.171"
SUCCESS Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.171" - : Status 0
..
ERROR:: Command failed - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.173" - failed: Status 1
EXECUTE Command - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.173"
ERROR:: Command failed - ssh grac41 "/bin/ping -s 1500 -c 2 -I 192.168.1.101 192.168.1.173" - failed: Status 1
EXECUTE Command - ssh grac42 "/bin/ping -s 1500 -c 2 -I 192.168.1.102 192.168.1.171"
SUCCESS Command - ssh grac42 "/bin/ping -s 1500 -c 2 -I 192.168.1.102 192.168.1.171" - : Status 0
EXECUTE Command - ssh grac42 "/bin/ping -s 1500 -c 2 -I 192.168.1.102 192.168.1.171"
SUCCESS Command - ssh grac42 "/bin/ping -s 1500 -c 2 -I 192.168.1.102 192.168.1.171" - : Status 0
..
ERROR:: Command failed - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 192.168.1.173" - failed: Status 1
EXECUTE Command - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 192.168.1.173"
ERROR:: Command failed - ssh grac43 "/bin/ping -s 1500 -c 2 -I 192.168.1.103 192.168.1.173" - failed: Status 1
--- TESTING SCAN done ---
--> SCAN VIP 192.168.1.173 has problems !

Verify CW status
[root@grac41 NET]# crs
NAME                      TARGET     STATE      SERVER       STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
..
ora.scan1.vip             ONLINE     ONLINE     grac41
ora.scan2.vip             ONLINE     ONLINE     grac43
ora.scan3.vip             OFFLINE    OFFLINE
..
ora.LISTENER_SCAN1.lsnr   ONLINE     ONLINE     grac43
ora.LISTENER_SCAN2.lsnr   ONLINE     ONLINE     grac41
ora.LISTENER_SCAN3.lsnr   ONLINE     OFFLINE
--> ora.scan3.vip and ora.LISTENER_SCAN3.lsnr are OFFLINE

[root@grac41 NET]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node grac41
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node grac43
SCAN VIP scan3 is enabled
SCAN VIP scan3 is not running
[root@grac41 NET]# srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node grac41
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node grac43
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is not running

Identify the failing IP address in OCR
[root@grac43 ~]# crsctl status resource ora.scan3.vip -f
NAME=ora.scan3.vip
TYPE=ora.scan_vip.type
STATE=OFFLINE
TARGET=OFFLINE
..
SCAN_NAME=grac4-scan.grid4.example.com
USR_ORA_VIP=192.168.1.173

FIX: Start the SCAN VIP and SCAN Listener - starting the SCAN VIP also starts the SCAN Listener
# srvctl start scan -i 3
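As with the node VIP above, the fix can be verified immediately; starting the SCAN VIP brings the SCAN listener along:

# Confirm the fix: SCAN VIP 3 and LISTENER_SCAN3 should report "running"
srvctl status scan -i 3
srvctl status scan_listener -i 3
./rac_net_testing.sh -scan | egrep 'TESTING|ERROR'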
Run script ./rac_net_testing.sh with the -netall option for repeated network connectivity testing
- The -netall section is the heart of the script
After you have configured the script ( Stage I - Stage IV ) you may configure the -netall section. This section runs ping, traceroute and nslookup commands and can be used to rerun these tests.
- you can add/remove options from the -netall section
- you can add/remove grep options to limit/expand the output
- the $runcount and $sleeptime parameters control how often the -netall section runs
  runcount=3
  sleeptime=1

Usage
# ./rac_net_testing.sh -netall
# ./rac_net_testing.sh -netall 2>&1 | tee rac_net_testing.TRC

For a quick review you can check rac_net_testing.TRC - but problems like too many hops need a manual review
# grep ERROR rac_net_testing.TRC

Script Details:
...
elif [ "$arg" == "-netall" ]; then
    for (( i=1; i<=$runcount; i++ ))
    do
        echo "***** RUN : $i ( Run count: $runcount ) ***** "
        run_test_ipaddr
        run_test_pingpubip  | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'   # Use egrep to limit ping output
        run_test_pingprivip | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'   # Use egrep to limit ping output
        run_test_traceroute                                           # Don't use egrep here as we need to check the hops
        run_test_vip        | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'   # Use egrep to limit ping output
        run_test_gns
        run_test_scan       | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'   # Use egrep to limit ping output
        run_test_nslookup   | egrep 'TESTING|EXECUTE|SUCCESS|ERROR'   # Use egrep to limit nslookup output
        # echo "***** DONE RUN : $i ***** "
        sleep $sleeptime
    done
else
..

Output from a successful run of ./rac_net_testing.sh -netall on a 3-node cluster.
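For a longer observation window ( e.g. while waiting for an intermittent network problem to reappear ) you can raise runcount and sleeptime inside the script and detach the run; an example invocation (values are illustrative):

# With runcount=60 and sleeptime=60 set in the script: one run per minute
# for an hour, logged for a later 'grep ERROR' review
nohup ./rac_net_testing.sh -netall > rac_net_testing.TRC 2>&1 &
tail -f rac_net_testing.TRC | grep --line-buffered ERROR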
Error Handling
The script should return ERROR:: for most failed commands and also print the failed command
ERROR:: Command - ssh grac43 "/usr/bin/nslookup grac43-vip " - failed: Status 1

After getting an error, run the printed command standalone to get more error details:
# ssh grac43 "/usr/bin/nslookup grac43-vip "
;; Got SERVFAIL reply from 192.168.1.50, trying next server
;; connection timed out; trying next origin
Server:     192.168.1.50
Address:    192.168.1.50#53
** server can't find grac43-vip: NXDOMAIN

To get a quick overview of all potential errors you may run
# ./rac_net_testing.sh -netall 2>&1 | tee rac_net_testing.TRC
# grep ERROR rac_net_testing.TRC
ERROR:: Command - ssh grac41 "/usr/bin/nslookup grac43-vip " - failed: Status 1
ERROR:: Command - ssh grac42 "/usr/bin/nslookup grac43-vip " - failed: Status 1
ERROR:: Command - ssh grac43 "/usr/bin/nslookup grac43-vip " - failed: Status 1
--> As said, for further debugging run the commands printed after the ERROR:: label
Multicast requirements
Reference
- How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1)
- http://maleshg.wordpress.com/2013/11/03/identify-your-rac-newtork-details/
- Multicasting Requirement for RAC