Table of Contents
Overview:
- This article provides some background info how to get some info what process stops a RAC service
- If you are lucky you can use OSWatcher output or any OS monitoring tool to track the status change
- CLIENT_PID: TextMessage[9176] not always reports a still running OS process
Tracing a srvctl stop service command
Verify our service status [root@grac41 Desktop]# crs | egrep "NAME|---|grac41.svc" NAME TARGET STATE SERVER STATE_DETAILS ------------------------- ---------- ---------- ------------ ------------------ ora.grac4.grac41.svc ONLINE ONLINE grac41 Find process leader running the srvctl stop service operation [oracle@grac41 ~]$ ps PID TTY TIME CMD 12495 pts/5 00:00:00 bash Stop the service [oracle@grac41 ~]$ strace -f -o s.trc srvctl stop service -d grac4 -s grac41 --> crsd.log: 2014-11-08 08:21:03.373: [UiServer][1505679104]{3:51776:993} Container [ Name: UI_STOP API_HDR_VER: TextMessage[2] CLIENT: TextMessage[] CLIENT_NAME: TextMessage[/usr/bin/java] CLIENT_PID: TextMessage[15151] CLIENT_PRIMARY_GROUP: TextMessage[oinstall] .. What is going on here ? Why is /usr/bin/java reported to stop the service ? We expect the shell/srvctl command as CLIENT_PID Lets monitor with pstree the operation # pstree -p 12495 bash(12495)───strace(15119)───srvctl(15124)───java(15151)─┬─{java}(15157) ├─{java}(15158) ├─{java}(15159) ├─{java}(15160) ├─{java}(15161) ├─{java}(15162) ├─{java}(15163) ├─{java}(15164) ├─{java}(15165) └─{java}(15300) During the service start we have the following process stack: bash --> svrct --> java program The first java process/thread is reported as CLIENT_PID : java(15151) Lets have a look at our strace output 15124 execve("/u01/app/oracle/product/11204/racdb/bin/srvctl", ["srvctl", "stop", "service", "-d", "grac4", "-s", "grac41"], [/* 26 vars */]) = 0 .. 15124 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd8713d69d0) = 15151 .. 15151 execve("/u01/app/oracle/product/11204/racdb/jdk/bin/java", ["/u01/app/oracle/product/11204/ra"..., "-DORACLE_HOME=/u01/app/oracle/pr"..., "-classpath", "/u01/app/oracle/product/11204/ra"..., "-Djava.net.preferIPv4Stack=true", "-Djava.util.logging.config.file="..., "oracle.ops.opsctl.OPSCTLDriver", "stop", "service", "-d", "grac4", "-s", "grac41"], [/* 28 vars */]) = 0 --> Here we can see that this java program is stopping the service Note it will be not easy to track the bash shell/srvctl/java process chain with OSWatcher. crsd.log will show us the java PID but getting a link vi PPID to the bash shell will not be that easy as the srvctl command very like has terminated and you will not be able to track down to ths root process. Only if you're lucky and the OSWachter ps program tracks the java PID as reported by crsd-log ( CLIENT_PID: TextMessage[15151] ) you will be able to track down to the related OS process by using PPID .
Tacking an resource stop command via Oracle process ( tnslsnr )
If an Oracle process stops a resource things are reported much easier listener resource 2014-11-07 19:52:53.277: [UiServer][1505679104]{3:51776:81} Container [ Name: UI_START API_HDR_VER: TextMessage[2] ASYNC_TAG: TextMessage[1] CLIENT: TextMessage[] CLIENT_NAME: TextMessage[tnslsnr] CLIENT_PID: TextMessage[9176] CLIENT_PRIMARY_GROUP: TextMessage[oinstall] EVENT_TAG: TextMessage[1] HOST: TextMessage[grac41] HOST_TAG: TextMessage[1] LOCALE: TextMessage[AMERICAN_AMERICA.AL32UTF8] NO_WAIT_TAG: TextMessage[1] RESOURCE: TextMessage[ora.LISTENER.lsnr^AUSR_ORA_OPI=true] ] --> tnslsnr is start the ora.LISTENER.lsnr the group is oinstall - the PID is 9176 [root@grac41 grac41]# pstree -p 9176 tnslsnr(9176)─┬─{tnslsnr}(9177) └─{tnslsnr}(9178) [root@grac41 grac41]# ps -elf |grep 9176 0 S grid 9176 1 0 80 0 - 54090 poll_s Nov07 ? 00:00:00 /u01/app/11204/grid/bin/tnslsnr LISTENER -inherit
Tracking a stop service request by using TINTs
Hosts: grac41 : running srvctl command, resource home grac42 : PE Master , Generic flow grac41: UiServer reports UI_STOP on grac41 ( this is the first message for our TINT ) UiServer Sending message to PE - Client PID: 15151 ( Here we know the client PID ) [ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658 --> Sending message to the PE master grac42: CRSPE][4141836032]{3:51776:993} Processing PE command id=1049. Description: [Stop Resource : .. [ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new target state: [OFFLINE] old value: [ONLINE] [ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new internal state: [STOPPING] old value: [STABLE] --> Changing internal state [ CRSPE][4141836032]{3:51776:993} Sending message to agfw: id = 47658 [ CRSPE][4141836032]{3:51776:993} CRS-2673: Attempting to stop 'ora.grac4.grac41.svc' on 'grac41' grac41: [ AGFW][1518286592]{3:51776:993} Received the reply to the message: RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:10786 from the agent /u01/app/11204/grid/bin/oraagent_oracle [ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658 [ AGFW][1518286592]{3:51776:993} Received the reply to the message: RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:10786 from the agent /u01/app/11204/grid/bin/oraagent_oracle [ AGFW][1518286592]{3:51776:993} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.grac4.grac41.svc 1 1] ID 4099:47658 --> Agent on grac41 stops the resource grac42: [ CRSPE][4141836032]{3:51776:993} Received reply to action [Stop] message ID: 47658 [ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new internal state: [STABLE] old value: [STOPPING] [ CRSPE][4141836032]{3:51776:993} RI [ora.grac4.grac41.svc 1 1] new external state [OFFLINE] old value: [ONLINE] label = [] --> PE changes its internal state [ CRSRPT][4139734784]{3:51776:993} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.grac4.grac41.svc --> PE publish the new service state [ CRSPE][4141836032]{3:51776:993} CRS-2677: Stop of 'ora.grac4.grac41.svc' on 'grac41' succeeded [ CRSPE][4141836032]{3:51776:993} PE Command [ Stop Resource : 0x7fa008168b70 ] has completed [ CRSPE][4141836032]{3:51776:993} UI Command [Stop Resource : 0x7fa008168b70] is replying to sender. --> Informs UIServer about successfull status change grac41: [ AGFW][1518286592]{3:51776:993} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:47662 [ AGFW][1518286592]{3:51776:993} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:47662 [UiServer][1505679104]{3:51776:993} Container [ Name: UI_DATA [UiServer][1505679104]{3:51776:993} Done for ctx=0x7f346c0276f0 --> UIServer get confirmation that its STOP service command is completed