Case VII : MDNSD doesn’t start as Port 5353 is already in use
Status of the lower Clusterware resources after startup:
***** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE - STABLE
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE OFFLINE - STABLE
ora.gpnpd 1 ONLINE ONLINE hract21 STABLE
ora.mdnsd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> MDNSD doesn't start
GREP command:
[grid@hract21 trace]$ grep "2015-02-17 12:4" * | egrep 'Address already in use'
mdnsd.trc:
2015-02-17 12:43:26.211079 : CLSDMT:2281699072: PID for the Process [19764], connkey 9
2015-02-17 12:43:27.193282 : MDNS:2353129024: mdnsd interface eth0 (0x2 AF=2 f=0x1043 mcast=-1) 192.168.1.7 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.194932 : MDNS:2353129024: mdnsd interface eth1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.194986 : MDNS:2353129024: mdnsd interface eth2 (0x4 AF=2 f=0x1043 mcast=-1) 192.168.2.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198670 : MDNS:2353129024: mdnsd interface eth3 (0x5 AF=2 f=0x1043 mcast=-1) 192.168.3.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198723 : MDNS:2353129024: mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198726 : MDNS:2353129024: Error! No valid netowrk interfaces found to setup mDNS.
2015-02-17 12:43:27.198729 : MDNS:2353129024: Oracle mDNSResponder ver. mDNSResponder-1076 (Jun 30 2014 19:39:45) , init_rv=-65537
2015-02-17 12:43:27.198818 : MDNS:2353129024: stopping
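The trace shows one "Error 98" record per interface. On a node with many interfaces, the records can be condensed into a short list of interface and address pairs. The following is a minimal sketch, not part of the original investigation: the function name `mdns_bind_failures` is hypothetical, and it assumes each trace record sits on a single line in the format shown above.

```shell
# List "interface address" for every mdnsd bind that failed with
# Error 98 (EADDRINUSE). Reads trace lines from stdin.
# Assumption: one trace record per line, in the format shown above.
mdns_bind_failures() {
    awk '/mdnsd interface/ && /FAILED\. Error 98/ {
        iface = ""; addr = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "interface") { iface = $(i + 1) }   # word after "interface"
            if ($i == "mask")      { addr  = $(i - 1) }   # word before "mask"
        }
        print iface, addr
    }'
}
```

It could be run as `mdns_bind_failures < mdnsd.trc` from the trace directory.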
CLUVFY :
The following cluvfy command does not detect the problem:
[grid@hract21 CLUVFY]$ ssh hract22 cluvfy stage -pre crsinst -n hract21,hract22 -networks eth1:192.168.5.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect
DTRACE SCRIPT :
/*
Generic DTRACE script tracking failed bind() system calls:
*/
syscall::bind:entry
{
self->fd = arg0;
self->sockaddr = arg1;
/* Copy the sockaddr from user space; unsigned char avoids sign extension of bytes > 127 */
this->s = (unsigned char *)copyin(self->sockaddr, sizeof(struct sockaddr));
/* Bytes 2-3 hold the port in network byte order, bytes 4-7 the IPv4 address */
self->port = (*(this->s + 2) * 256) + *(this->s + 3);
self->ip1 = *(this->s + 4);
self->ip2 = *(this->s + 5);
self->ip3 = *(this->s + 6);
self->ip4 = *(this->s + 7);
}
/* Report every failed bind() except those issued by crsctl.bin */
syscall::bind:return
/arg0<0 && execname != "crsctl.bin"/
{
printf("- Exec: %s - PID: %d bind() failed with error : %d - fd : %d - IP: %d.%d.%d.%d - Port: %d " , execname, pid, arg0, self->fd,
self->ip1, self->ip2, self->ip3, self->ip4, self->port );
}
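The script reads the port from bytes 2 and 3 of the socket address, where an IPv4 `sockaddr_in` stores it in network byte order (high byte first). The arithmetic can be checked by hand with a small shell sketch; `decode_port` is a hypothetical helper name used only for illustration.

```shell
# Rebuild a port number from its two bytes in network byte order
# (high byte first), mirroring the port calculation in the D script.
decode_port() {
    echo $(( $1 * 256 + $2 ))
}

decode_port 0x14 0xE9   # the two bytes of port 5353
```

Here 0x14 is 20 and 0xE9 is 233, so the result is 20 * 256 + 233 = 5353, the mDNS port reported in the DTrace output.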
DTRACE OUTPUT :
[root@hract21 DTRACE]# dtrace -s check_rac.d
dtrace: script 'check_rac.d' matched 19 probes
CPU ID FUNCTION:NAME
0 1 :BEGIN GRIDHOME: /u01/app/121/grid - GRIDHOME/bin: /u01/app/121/grid/bin - Temp Loc: /var/tmp/.oracle - PIDFILE: hract21.pid - Port for bind: 53
0 93 sendto:return - Exec: ohasd.bin - PID: 17321 sendto() failed with error : -32 - fd : 173
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 33 - IP: 0.0.0.0 - Port: 5353
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0 103 bind:return - Exec: mdnsd.bin - PID: 18943 bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
Investigate & Fix :
[root@hract21 network-scripts]# netstat -taupen | egrep ":53 |:5353 |:42424"
udp 0 0 0.0.0.0:5353 0 36230279 18804/ohasd.bin
udp 0 0 230.0.1.0:42424 0 36230278 18804/ohasd.bin
udp 0 0 224.0.0.251:42424 0 35356639 12631/java
--> Clusterware port 5353 is in use by a Java program with PID 17263
FIX : Kill that process and restart Clusterware
# kill -9 17263
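Since cluvfy does not flag this conflict, the port check can be scripted and run before starting Clusterware. The following is a sketch under stated assumptions: `port_owner` is a hypothetical helper, and it expects netstat-style lines on stdin with the local address in column 4 and `PID/Program` in the last column.

```shell
# Print the "PID/program" entry of every UDP socket bound to the given port.
# Reads netstat-style lines from stdin so it can be checked offline.
# Assumption: column 4 is the local address, the last column is PID/Program.
port_owner() {
    awk -v p="$1" '$1 ~ /^udp/ {
        n = split($4, a, ":")          # port is the part after the last ":"
        if (a[n] == p) print $NF
    }'
}
```

A possible invocation is `netstat -anup | port_owner 5353`, run as root so that the PID column is filled in for processes owned by other users. Any output other than a Clusterware process points to a conflicting program.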