Case VI : CSSD doesn’t start as Port 42424 is already in use
Force that error and monitor Clusterware Resource status after startup:
[root@hract21 JAVA]# java JavaUDPServer 42424
Listening on UDP Port: 42424
..
**** Local Resources: *****
Resource NAME INST TARGET STATE SERVER STATE_DETAILS
--------------------------- ---- ------------ ------------ --------------- -----------------------------------------
ora.asm 1 ONLINE OFFLINE - STABLE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE - STABLE
ora.crf 1 ONLINE ONLINE hract21 STABLE
ora.crsd 1 ONLINE OFFLINE - STABLE
ora.cssd 1 ONLINE OFFLINE hract21 STARTING
ora.cssdmonitor 1 ONLINE ONLINE hract21 STABLE
ora.ctssd 1 ONLINE OFFLINE - STABLE
ora.diskmon 1 ONLINE OFFLINE - STABLE
ora.drivers.acfs 1 ONLINE ONLINE hract21 STABLE
ora.evmd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.gipcd 1 ONLINE ONLINE hract21 STABLE
ora.gpnpd 1 ONLINE INTERMEDIATE hract21 STABLE
ora.mdnsd 1 ONLINE ONLINE hract21 STABLE
ora.storage 1 ONLINE OFFLINE - STABLE
--> GPNPD, CSSD daemon doen't start
CLUVFY:
Note: Was not able to to identify the problem with following cluvfy command:
[grid@hract21 CLUVFY]$ ssh hract22 cluvfy stage -pre crsinst -n hract21 -networks eth1:192.168.5.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect -verbose
TRACEFILE review :
ocssd.trc:
2015-02-17 12:23:48.485840 :GIPCXCPT:884037376: gipcmodNetworkProcessBind: failed to bind endp 0x7f4f28237090 [0000000000002bcf] { gipcEndpoint : localAddr 'mcast://224.0.0.251:42424/192.168.2.121', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp 0x7f4f282391b0 status 13flags 0x20000000, flags-2 0x0, usrFlags 0xc000 }, addr 0x7f4f28237b50 [0000000000002bd1] { gipcAddress : name 'mcast://224.0.0.251:42424/192.168.2.121', objFlags 0x0, addrFlags 0x5 }
2015-02-17 12:23:48.485840 :GIPCXCPT:884037376: gipcmodNetworkProcessBind: slos op : sgipcnMctBind
2015-02-17 12:23:48.485840 :GIPCXCPT:884037376: gipcmodNetworkProcessBind: slos dep : Address already in use (98)
2015-02-17 12:23:48.485840 :GIPCXCPT:884037376: gipcmodNetworkProcessBind: slos loc : bind
2015-02-17 12:23:48.485840 :GIPCXCPT:884037376: gipcmodNetworkProcessBind: slos info: Invalid argument
GREP command :
$ grep "2015-02-17 12:" * | egrep 'gipcmodNetworkProcessBind'
DTRACE SCRIPT :
syscall::bind:entry
{
self->fd = arg0;
self->sockaddr = arg1;
sockaddrp =(struct sockaddr *)copyin(self->sockaddr, sizeof(struct sockaddr));
s = (char * )sockaddrp;
self->port = ( unsigned short )(*(s+3)) + ( unsigned short ) ((*(s+2)*256));
self->ip1=*(s+4);
self->ip2=*(s+5);
self->ip3=*(s+6);
self->ip4=*(s+7);
}
/*
Generic DTRACE script tracking failed bind() system calls:
*/
syscall::bind:return
/arg0<0 && execname != "crsctl.bin"/
{
printf("- Exec: %s - PID: %d bind() failed with error : %d - fd : %d - IP: %d.%d.%d.%d - Port: %d " , execname, pid, arg0, self->fd,
self->ip1, self->ip2, self->ip3, self->ip4, self->port );
}
DTRACE OUTPUT :
[root@hract21 DTRACE]# dtrace -s check_rac.d
dtrace: script 'check_rac.d' matched 19 probes
dtrace: buffer size lowered to 1m
dtrace: dynamic variable size lowered to 1m
CPU ID FUNCTION:NAME
0 1 :BEGIN GRIDHOME: /u01/app/121/grid - GRIDHOME/bin: /u01/app/121/grid/bin - Temp Loc: /var/tmp/.oracle - PIDFILE: hract21.pid - Port for bind: 53
0 103 bind:return - Exec: ocssd.bin - PID: 13064 bind() failed with error : -98 - fd : 211 - IP: 224.0.0.251 - Port: 42424
..
0 89 connect:return - Exec: gipcd.bin - PID: 12957 connect() - fd : 27 - IP: 192.168.5.50 - Port: 53
0 89 connect:return - Exec: gipcd.bin - PID: 12957 connect() - fd : 87 - IP: 192.168.5.50 - Port: 53
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 76 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 114 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 124 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 125 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: ocssd.bin - PID: 13064 bind() failed with error : -98 - fd : 133 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 126 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: ocssd.bin - PID: 13064 bind() failed with error : -98 - fd : 136 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 150 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: ocssd.bin - PID: 13064 bind() failed with error : -98 - fd : 137 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: gipcd.bin - PID: 12957 bind() failed with error : -98 - fd : 151 - IP: 224.0.0.251 - Port: 42424
0 103 bind:return - Exec: ocssd.bin - PID: 13064 bind() failed with error : -98 - fd : 138 - IP: 224.0.0.251 - Port: 42424
Investigate & Fix :
[root@hract21 Desktop]# netstat -taupen | egrep ":53 |:5353 |:42424"
udp 0 0 0.0.0.0:5353 501 35360021 12828/mdnsd.bin
udp 0 0 0.0.0.0:5353 501 35360016 12828/mdnsd.bin
udp 0 0 0.0.0.0:5353 501 35360015 12828/mdnsd.bin
udp 0 0 0.0.0.0:5353 501 35360014 12828/mdnsd.bin
udp 264064 0 0.0.0.0:42424 0 35356639 12631/java
--> The clusterware port 42422 is used by a java program with PID 12631
FIX --> kill that process with
# kill -9 12631
After that Clusterware should start immediate
Pages: Page 1, Page 2, Page 3, Page 4, Page 5, Page 6, Page 7, Page 8, Page 9, Page 10, Page 11, Page 12, Page 13, Page 14, Page 15, Page 16
Many thx
This is very helpful
I really like looking through an article that can make people think.
Also, thank you for permitting me to comment!