Case II: GIPCD daemon doesn't start because the HOSTNAME.pid file is not readable
Clusterware 12.1.0.2 uses the following HOSTNAME.pid files to record the PIDs of its daemons.
If Clusterware (CW) cannot access one of these PID files, the related CW component may not start:
/u01/app/121/grid/ohasd/init/hract21.pid
/u01/app/121/grid/osysmond/init/hract21.pid
/u01/app/121/grid/gpnp/init/hract21.pid
/u01/app/121/grid/gipc/init/hract21.pid
/u01/app/121/grid/log/hract21/gpnpd/hract21.pid
/u01/app/121/grid/ctss/init/hract21.pid
/u01/app/121/grid/gnsd/init/hract21.pid
/u01/app/121/grid/crs/init/hract21.pid
/u01/app/121/grid/crf/admin/run/crflogd/lhract21.pid
/u01/app/121/grid/crf/admin/run/crfmond/shract21.pid
/u01/app/121/grid/evm/init/hract21.pid
/u01/app/121/grid/mdns/init/hract21.pid
/u01/app/121/grid/ologgerd/init/hract21.pid
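Because a single inaccessible PID file can keep a daemon from starting, a quick permission check over these files may be worth running before the stack is started. The following is only a minimal sketch, not an Oracle-supplied tool: the function name check_pid_files is made up for illustration, it assumes GNU find (for -perm and -printf), and it only tests whether the file owner has read and write permission.

```shell
#!/bin/bash
# check_pid_files GRID_HOME HOST
# Hypothetical helper: lists every *HOST.pid file below GRID_HOME whose owner
# lacks read or write permission. Prints "octal-mode path" per problem file.
check_pid_files() {
    local grid_home="$1" host="$2"
    # The pattern "*HOST.pid" also matches the crf files lHOST.pid and sHOST.pid.
    # "! -perm -600" selects files where owner read or owner write is missing.
    find "$grid_home" -type f -name "*${host}.pid" ! -perm -600 -printf '%m %p\n' 2>/dev/null
}

# Example call with the values from this case (adjust to your installation):
# check_pid_files /u01/app/121/grid hract21
```

An empty result means that every PID file found is readable and writable by its owner; it does not check ownership or directory permissions.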
Create the error and monitor the Clusterware resource status after startup:
[root@hract21 DTRACE]# ls -l /u01/app/121/grid/gipc/init/hract21.pid
-rw-r--r-- 1 grid oinstall 6 Feb 16 10:30 /u01/app/121/grid/gipc/init/hract21.pid
[root@hract21 DTRACE]# chmod 000 /u01/app/121/grid/gipc/init/hract21.pid
***** Local Resources: *****
Resource NAME                  INST TARGET       STATE        SERVER          STATE_DETAILS
------------------------------ ---- ------------ ------------ --------------- -----------------------------------------
ora.asm                        1    ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1    ONLINE       OFFLINE      -               STABLE
ora.crf                        1    ONLINE       OFFLINE      -               STABLE
ora.crsd                       1    ONLINE       OFFLINE      -               STABLE
ora.cssd                       1    ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1    OFFLINE      OFFLINE      -               STABLE
ora.ctssd                      1    ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1    OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs               1    ONLINE       ONLINE       hract21         STABLE
ora.evmd                       1    ONLINE       INTERMEDIATE hract21         STABLE
ora.gipcd                      1    ONLINE       OFFLINE      hract21         STARTING
ora.gpnpd                      1    ONLINE       INTERMEDIATE hract21         STABLE
ora.mdnsd                      1    ONLINE       ONLINE       hract21         STABLE
ora.storage                    1    ONLINE       OFFLINE      -               STABLE
--> GIPCD doesn't start
CLUVFY:
No cluvfy command was found that detects this error
TRACEFILE review:
alert.log:
Mon Feb 16 12:09:03 2015
Errors in file /u01/app/grid/diag/crs/hract21/crs/trace/gipcd.trc (incident=2921):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/grid/diag/crs/hract21/crs/incident/incdir_2921/gipcd_i2921.trc
2015-02-16 12:09:03.181 [GIPCD(14763)]CRS-8503: Oracle Clusterware GIPCD process with operating system process ID 14763 experienced fatal signal or exception code 6
Sweep [inc][2921]: completed
--> The logs give no further indication that the file permissions on /u01/app/121/grid/gipc/init/hract21.pid are the root cause
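The alert.log excerpt above points to an incident trace file through its "Incident details in:" line. To collect those paths for review, something like the following sketch could be used. The function name incident_traces is hypothetical, and the sketch assumes the line format shown in the excerpt, with the marker at the start of the line.

```shell
#!/bin/bash
# incident_traces ALERT_LOG
# Hypothetical helper: prints the trace file path from every
# "Incident details in:" line of the given alert log.
incident_traces() {
    sed -n 's/^Incident details in: *//p' "$1"
}

# Example call (the alert log location depends on your installation):
# incident_traces /path/to/alert.log
```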
DTRACE SCRIPT:
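/* Remember the path argument of every open() call. */
syscall::open:entry
{
    self->path = copyinstr(arg0);
}
/*
 * Report failed open() calls (negative return value) on the PID file below the grid home.
 * grid_loc, grid_len and pid_file are set elsewhere in check_rac.d (not shown here).
 */
syscall::open:return
/arg0 < 0 && execname != "crsctl.bin" && substr(self->path, 0, grid_len) == grid_loc && strstr(self->path, pid_file) == pid_file/
{
    printf("- Exec: %s - open() %s failed with error: %d - scan_dir: %s - PID-File : %s ", execname, self->path, arg0, substr(self->path, 0, grid_len), pid_file);
}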
DTRACE OUTPUT :
DTrace helps us find the problem very quickly:
[root@hract21 DTRACE]# dtrace -s check_rac.d
dtrace: script 'check_rac.d' matched 21 probes
CPU ID FUNCTION:NAME
0 1 :BEGIN GRIDHOME: /u01/app/121/grid - GRIDHOME/bin: /u01/app/121/grid/bin - Temp Loc: /var/tmp/.oracle - PIDFILE: hract21.pid - Port for bind: 53
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 9 open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir: /var/tmp/.oracle
0 9 open:return - Exec: oraagent.bin - open() /u01/app/121/grid/gipc/init/hract21.pid failed with error: -13 - scan_dir: /u01/app/121/grid - PID-File : hract21.pid
0 89 connect:return - Exec: mdnsd.bin - PID: 19658 connect() failed with error : -101 - fd : 39 - IP: 17.17.17.17 - Port: 256
0 89 connect:return - Exec: gipcd.bin - PID: 19702 connect() to Nameserver - fd : 27 - IP: 192.168.5.50 - Port: 53
0 9 open:return - Exec: gipcd.bin - open() /u01/app/121/grid/gipc/init/hract21.pid failed with error: -13 - scan_dir: /u01/app/121/grid - PID-File : hract21.pid
0 9 open:return - Exec: gipcd.bin - open() /u01/app/121/grid/gipc/init/hract21.pid failed with error: -13 - scan_dir: /u01/app/121/grid - PID-File : hract21.pid
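The negative values in the output are errno codes returned by the failing system calls. A minimal lookup sketch for the three values that appear above, assuming Linux errno numbering; the function name errno_name is hypothetical and the table is deliberately incomplete:

```shell
#!/bin/bash
# errno_name VALUE
# Hypothetical helper: maps the syscall return values seen in the DTrace
# output to Linux errno names. Accepts -13 as well as 13. Only the three
# codes from this case are covered.
errno_name() {
    local n="${1#-}"          # strip a leading minus sign
    case "$n" in
        6)   echo "ENXIO (No such device or address)" ;;
        13)  echo "EACCES (Permission denied)" ;;
        101) echo "ENETUNREACH (Network is unreachable)" ;;
        *)   echo "unknown errno $n" ;;
    esac
}

# Example:
# errno_name -13
```

Error -13 (EACCES) on /u01/app/121/grid/gipc/init/hract21.pid matches the chmod 000 applied earlier.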
FIX:
Restore the permissions and restart Clusterware:
[root@hract21 DTRACE]# chmod 644 /u01/app/121/grid/gipc/init/hract21.pid
[root@hract21 DTRACE]# ls -l /u01/app/121/grid/gipc/init/hract21.pid
-rw-r--r-- 1 grid oinstall 5 Feb 16 11:37 /u01/app/121/grid/gipc/init/hract21.pid