Overview:
- This article provides some background info how to get some info what process stops a RAC service
- If you are lucky you can use OSWatcher output or any OS monitoring tool to track the status change
- CLIENT_PID: TextMessage[9176] not always reports a still running OS process
Tracing a srvctl stop service command
Verify our service status
[root@grac41 Desktop]# crs | egrep "NAME|---|grac41.svc"
NAME TARGET STATE SERVER STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.grac4.grac41.svc ONLINE ONLINE grac41
Find process leader running the srvctl stop service operation
[oracle@grac41 ~]$ ps
PID TTY TIME CMD
12495 pts/5 00:00:00 bash
Stop the service
[oracle@grac41 ~]$ strace -f -o s.trc srvctl stop service -d grac4 -s grac41
--> crsd.log:
2014-11-08 08:21:03.373: [UiServer][1505679104]{3:51776:993} Container [ Name: UI_STOP
API_HDR_VER:
TextMessage[2]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[/usr/bin/java]
CLIENT_PID:
TextMessage[15151]
CLIENT_PRIMARY_GROUP:
TextMessage[oinstall]
..
What is going on here ?
Why is /usr/bin/java reported to stop the service ?
We expect the shell/srvctl command as CLIENT_PID
Lets monitor with pstree the operation
# pstree -p 12495
bash(12495)───strace(15119)───srvctl(15124)───java(15151)─┬─{java}(15157)
├─{java}(15158)
├─{java}(15159)
├─{java}(15160)
├─{java}(15161)
├─{java}(15162)
├─{java}(15163)
├─{java}(15164)
├─{java}(15165)
└─{java}(15300)
During the service start we have the following process stack: bash --> svrct --> java program
The first java process/thread is reported as CLIENT_PID : java(15151)
Lets have a look at our strace output
15124 execve("/u01/app/oracle/product/11204/racdb/bin/srvctl", ["srvctl", "stop", "service", "-d", "grac4", "-s", "grac41"], [/* 26 vars */]) = 0
..
15124 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd8713d69d0) = 15151
..
15151 execve("/u01/app/oracle/product/11204/racdb/jdk/bin/java", ["/u01/app/oracle/product/11204/ra"..., "-DORACLE_HOME=/u01/app/oracle/pr"..., "-classpath", "/u01/app/oracle/product/11204/ra"..., "-Djava.net.preferIPv4Stack=true", "-Djava.util.logging.config.file="..., "oracle.ops.opsctl.OPSCTLDriver", "stop", "service", "-d", "grac4", "-s", "grac41"], [/* 28 vars */]) = 0
--> Here we can see that this java program is stopping the service
Note it will be not easy to track the bash shell/srvctl/java process chain with OSWatcher.
crsd.log will show us the java PID but getting a link vi PPID to the bash shell will not be
that easy as the srvctl command very like has terminated and you will not be able to track down
to ths root process. Only if you're lucky and the OSWachter ps program tracks the java PID
as reported by crsd-log ( CLIENT_PID: TextMessage[15151] ) you will be able to track down
to the related OS process by using PPID .
Tacking an resource stop command via Oracle process ( tnslsnr )
If an Oracle process stops a resource things are reported much easier listener resource
2014-11-07 19:52:53.277: [UiServer][1505679104]{3:51776:81} Container [ Name: UI_START
API_HDR_VER:
TextMessage[2]
ASYNC_TAG:
TextMessage[1]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[tnslsnr]
CLIENT_PID:
TextMessage[9176]
CLIENT_PRIMARY_GROUP:
TextMessage[oinstall]
EVENT_TAG:
TextMessage[1]
HOST:
TextMessage[grac41]
HOST_TAG:
TextMessage[1]
LOCALE:
TextMessage[AMERICAN_AMERICA.AL32UTF8]
NO_WAIT_TAG:
TextMessage[1]
RESOURCE:
TextMessage[ora.LISTENER.lsnr^AUSR_ORA_OPI=true]
]
--> tnslsnr is start the ora.LISTENER.lsnr the group is oinstall - the PID is 9176
[root@grac41 grac41]# pstree -p 9176
tnslsnr(9176)─┬─{tnslsnr}(9177)
└─{tnslsnr}(9178)
[root@grac41 grac41]# ps -elf |grep 9176
0 S grid 9176 1 0 80 0 - 54090 poll_s Nov07 ? 00:00:00 /u01/app/11204/grid/bin/tnslsnr LISTENER -inherit
Tracking a stop service request by using TINTs
Hosts:
grac41 : running srvctl command, resource home
grac42 : PE Master ,
Generic flow
grac41:
UiServer reports UI_STOP on grac41 ( this is the first message for our TINT )
UiServer Sending message to PE - Client PID: 15151 ( Here we know the client PID )
[ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658
--> Sending message to the PE master
grac42:
CRSPE][4141836032]{3:51776:993} Processing PE command id=1049. Description: [Stop Resource : ..
[ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new target state: [OFFLINE] old value: [ONLINE]
[ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new internal state: [STOPPING] old value: [STABLE]
--> Changing internal state
[ CRSPE][4141836032]{3:51776:993} Sending message to agfw: id = 47658
[ CRSPE][4141836032]{3:51776:993} CRS-2673: Attempting to stop 'ora.grac4.grac41.svc' on 'grac41'
grac41:
[ AGFW][1518286592]{3:51776:993} Received the reply to the message: RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:10786 from the agent /u01/app/11204/grid/bin/oraagent_oracle
[ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658
[ AGFW][1518286592]{3:51776:993} Received the reply to the message: RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:10786 from the agent /u01/app/11204/grid/bin/oraagent_oracle
[ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658
--> Agent on grac41 stops the resource
grac42:
[ CRSPE][4141836032]{3:51776:993} Received reply to action [Stop] message ID: 47658
[ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new internal state: [STABLE] old value: [STOPPING]
[ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new external state [OFFLINE] old value: [ONLINE] label = []
--> PE changes its internal state
[ CRSRPT][4139734784]{3:51776:993} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.grac4.grac41.svc
--> PE publish the new service state
[ CRSPE][4141836032]{3:51776:993} CRS-2677: Stop of 'ora.grac4.grac41.svc' on 'grac41' succeeded
[ CRSPE][4141836032]{3:51776:993} PE Command [ Stop Resource : 0x7fa008168b70 ] has completed
[ CRSPE][4141836032]{3:51776:993} UI Command [Stop Resource : 0x7fa008168b70] is replying to sender.
--> Informs UIServer about successfull status change
grac41:
[ AGFW][1518286592]{3:51776:993} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:47662
[ AGFW][1518286592]{3:51776:993} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:47662
[UiServer][1505679104]{3:51776:993} Container [ Name: UI_DATA
[UiServer][1505679104]{3:51776:993} Done for ctx=0x7f346c0276f0
--> UIServer get confirmation that its STOP service command is completed