ERROR: PRVF-9802 : Attempt to get udev information from node "hract21" failed No UDEV rule found for device(s) specified Checking: cv/log/cvutrace.log.0 ERRORMSG(hract21): PRVF-9802 : Attempt to get udev information from node "hract21" failed No UDEV rule found for device(s) specified [Thread-757] [ 2015-01-29 15:56:44.157 CET ] [StreamReader.run:65] OUTPUT><CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP> </SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT> <SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO> </CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG> <CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G </CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> .. [Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:144] runCommand: process returns 0 [Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:161] RunTimeExec: output> Run the exectask from OS prompt : [root@hract21 ~]# /tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G <CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP></SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT><SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO></CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG> <CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G </CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> <CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g' </CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' | awk '{if ("asmdisk2_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g' </CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT> Test the exectask in detail: [root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' | awk ' {if ("asmdisk1_10G" ~ $1) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' --> Here awk returns nothing ! 
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' |awk ' { print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid"
--> The above sed script adds sd?1 as parameter $1 and @ as parameter $2. Later awk searches for "asmdisk1_10G" in parameter $1
      if ("asmdisk1_10G" ~ $1) ...
    but the string "asmdisk1_10G" is found in parameter $3, not in parameter $1 !
Potential fix: modify the search pattern so we get a record back:
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' |awk ' /asmdisk1_10G/ { print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid", ..
--> It seems the way Oracle extracts UDEV data does not work for OEL 6, where UDEV records can look like:
NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid", GROUP="asmadmin", MODE="0660"
As the ASM disks have the proper permissions I decided to ignore the warnings:
[root@hract21 rules.d]# ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 17 Jan 29 09:33 /dev/asmdisk1_10G
brw-rw---- 1 grid asmadmin 8, 33 Jan 29 09:33 /dev/asmdisk2_10G
brw-rw---- 1 grid asmadmin 8, 49 Jan 29 09:33 /dev/asmdisk3_10G
brw-rw---- 1 grid asmadmin 8, 65 Jan 29 09:33 /dev/asmdisk4_10G
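To double-check such a warning outside of cluvfy, a small shell sketch like the one below can help. The script name and the default disk name asmdisk1_10G are just example values, and it assumes your rules follow the OEL 6 style record shown above: it searches the whole udev record for the ASM disk name instead of matching it against the KERNEL pattern in field $1, then cross-checks the resulting device permissions.
#!/bin/bash
# check_udev_rule.sh - sketch: look up the udev rule for an ASM disk the robust way
DISK=${1:-asmdisk1_10G}              # ASM disk name to check (example default)
RULES_DIR=/etc/udev/rules.d
# search the complete rule record for the disk name instead of only field $1
MATCH=$(grep -h "GROUP" ${RULES_DIR}/*.rules | grep "MODE" | sed -e '/^#/d' | grep "${DISK}")
if [ -n "${MATCH}" ]; then
    echo "UDEV rule found for ${DISK}:"
    echo "${MATCH}"
    ls -l /dev/${DISK}               # verify owner/group/mode of the resulting device
else
    echo "No UDEV rule found for ${DISK} in ${RULES_DIR}"
fi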
Using datapatch in a RAC env
Overview
- Datapatch is the new tool that enables automation of post-patch SQL actions for RDBMS patches.
- If we have a 3-node RAC cluster, datapatch runs 3 jobs named LOAD_OPATCH_INVENTORY_1, LOAD_OPATCH_INVENTORY_2 and LOAD_OPATCH_INVENTORY_3
- This inventory update requires that all RAC nodes are available ( even for a policy-managed database )
- Install the helper package from Note 1585814.1 : [ demo1.sql + demo2.sql ]
- With 12c we have a SQL interface for querying patches ( by reading the lsinventory via PL/SQL ) - see the sketch after this list
- For patches that do not have post-patch SQL actions to be performed, calling datapatch is a no-op.
- For patches that do have post-patch SQL instructions to be invoked on the database instance, datapatch will automatically detect ALL pending actions (from one installed patch or multiple installed patches) and complete the actions as appropriate.
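A minimal sketch of this SQL interface, based on the DBMS_QOPATCH calls used later in this post; it assumes you are connected as SYSDBA on a 12c database with the environment set:
#!/bin/bash
# qopatch_lsinv.sh - sketch: read the OPatch inventory through the 12c SQL interface
sqlplus -s / as sysdba <<'EOF'
set pagesize 20000 long 200000
-- render the lsinventory XML with the stylesheet shipped with DBMS_QOPATCH
select xmltransform(dbms_qopatch.get_opatch_lsinventory(),
                    dbms_qopatch.get_opatch_xslt())
  from dual;
EOF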
What should I do when the datapatch command throws an error or warning ?
Rollable vs. Non-Rollable Patches: ( from the Oracle docs )
- Patches are designed to be applied in either rolling mode or non-rolling mode.
- If a patch is rollable, the patch has no dependency on the SQL script. The database can be brought up without issue. opatchauto succeeds with a warning on datapatch/sqlpatch.
-> For rollable patches: Ignore datapatch errors on node 1 - node (n-1). On the last node (node n), run datapatch again. You can cut and paste this command from the log file. If you still encounter datapatch errors on the last node, call Oracle Support or open a Service Request.
-> For non-rollable patches: Bring down all databases and stacks manually for all nodes. Run opatchauto apply on every node. Bring up the stack and databases. Note that the databases must be up in order for datapatch to connect and apply the SQL. Manually run datapatch on the last node. Note that if you do not run datapatch, the SQL for the patch will not be applied and you will not benefit from the bug fix. In addition, you may encounter incorrect system behavior depending on the changes the SQL is intended to implement. If datapatch continues to fail, you must roll back the patch. Call Oracle Support for assistance or open a Service Request.
How to check the current patch level and reinstall a SQL patch ?
[oracle@gract1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 08:55:31 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state... Currently installed SQL Patches: 19121550 Currently installed C Patches: 19121550 Adding patches to installation queue and performing prereq checks... Installation queue: Nothing to roll back Nothing to apply Patch installation complete. Total patches installed: 0 SQL Patching tool complete on Sun Jan 25 08:57:14 2015 --> Patch 19121550 is installed ( both parts C layer and SQL layer are installed ) Rollback the patch [oracle@gract1 OPatch]$ ./datapatch -rollback 19121550 SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:03:03 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state...done Adding patches to installation queue and performing prereq checks...done Installation queue: The following patches will be rolled back: 19121550 Nothing to apply Installing patches... Patch installation complete. Total patches installed: 1 Validating logfiles...done SQL Patching tool complete on Sun Jan 25 09:04:51 2015 Reapply the patch oracle@gract1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.1.0 on Sun Jan 25 09:06:55 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state... Currently installed SQL Patches: <-- Here we can see that SQL patch is not yet installed ! Currently installed C Patches: 19121550 Adding patches to installation queue and performing prereq checks... Installation queue: Nothing to roll back The following patches will be applied: 19121550 Installing patches... Patch installation complete. Total patches installed: 1 Validating logfiles... Patch 19121550 apply: SUCCESS logfile: /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log (no errors) catbundle generate logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_GENERATE_2015Jan25_09_08_51.log (no errors) catbundle apply logfile: /u01/app/oracle/cfgtoollogs/catbundle/catbundle_PSU_DW_dw_APPLY_2015Jan25_09_08_53.log (no errors) SQL Patching tool complete on Sun Jan 25 09:10:31 2015 Verify the current patch status SQL> select * from dba_registry_sqlpatch; PATCH_ID ACTION STATUS ACTION_TIME DESCRIPTION ---------- --------------- --------------- ------------------------------ -------------------- LOGFILE ------------------------------------------------------------------------------------------------------------------------ 19121550 APPLY SUCCESS 26-OCT-14 12.13.19.575484 PM bundle:PSU /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log 19121550 ROLLBACK SUCCESS 25-JAN-15 09.04.51.585648 AM bundle:PSU /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_rollback_DW_2015Jan25_09_04_43.log 19121550 APPLY SUCCESS 25-JAN-15 09.10.31.872019 AM bundle:PSU /u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2015Jan25_09_08_51.log --> Here we can identify that we re-applied the SQL part of patch 19121550 at : 25-JAN-15 09.10.31
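To keep an eye on what datapatch has recorded over time, a simple query against dba_registry_sqlpatch ( columns as shown above ) is usually enough - a sketch:
#!/bin/bash
# sqlpatch_history.sh - sketch: list all recorded SQL patch actions in chronological order
sqlplus -s / as sysdba <<'EOF'
set linesize 200
col action format a10
col status format a10
col description format a20
col action_time format a30
select patch_id, action, status, action_time, description
  from dba_registry_sqlpatch
 order by action_time;
EOF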
Using Queryable Patch Inventory [ DEMOQP helper package ]
Overview DEMOQP helper package Install Helper package from Node 1585814.1 : [ demo1.sql + demo2.sql ] Have a short look on these package details: SQL> desc DEMOQP PROCEDURE CHECK_PATCH_INSTALLED Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- BUGS QOPATCH_LIST IN PROCEDURE COMPARE_CURRENT_DB Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- BUGS QOPATCH_LIST IN PROCEDURE COMPARE_RAC_NODE Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- NODE VARCHAR2 IN INST VARCHAR2 IN FUNCTION GET_BUG_DETAILS RETURNS XMLTYPE Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- PATCH VARCHAR2 IN FUNCTION GET_DEMO_XSLT RETURNS XMLTYPE Script to test Queryable Patch Inventory : check_patch.sql /* For details see : Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1) */ set echo on set pagesize 20000 set long 200000 /* Is patch 19849140 installed ? */ set serveroutput on exec DEMOQP.check_patch_installed (qopatch_list('19849140')); /* Return details about pacht 19849140 */ select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual; /* As we are running on a PM managed db let's have look on host_names and instance names */ col HOST_NAME format A30 select host_name, instance_name from gv$instance; select host_name, instance_name from v$instance; /* check Instance ERP_1 on gract2.example.com */ exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1'); select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual; /* Compare RAC nodes - this is not working in my env ! --> Getting ORA-06502: PL/SQL: numeric or value error */ set serveroutput on exec demoqp.compare_rac_node('gract2.example.com','ERP_1'); 1) Check whether a certain patch ins installed SQL> /* Is patch 19849140 installed ? */ SQL> set serveroutput on SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140')); ----------Patch Report---------- 19849140 : INSTALLED 2) Check patch details for patch 19849140 SQL> /* Return details about pacht 19849140 */ SQL> select xmltransform(DEMOQP.get_bug_details('19849140'), dbms_qopatch.get_opatch_xslt()) from dual; XMLTRANSFORM(DEMOQP.GET_BUG_DETAILS('19849140'),DBMS_QOPATCH.GET_OPATCH_XSLT()) -------------------------------------------------------------------------------- Patch 19849140: applied on 2015-01-23T16:31:09+01:00 Unique Patch ID: 18183131 Patch Description: Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Comp onent) Created on : 23 Oct 2014, 08:32:20 hrs PST8PDT Bugs fixed: 16505840 16505255 16505717 16505617 16399322 16390989 17486244 1 6168869 16444109 16505361 13866165 16505763 16208257 16904822 17299876 1 6246222 16505540 16505214 15936039 16580269 16838292 16505449 16801843 1 6309853 16505395 17507349 17475155 16493242 17039197 16196609 18045611 1 7463260 17263488 16505667 15970176 16488665 16670327 17551223 Files Touched: cluvfyrac.sh crsdiag.pl lsnodes .. 
3) Read in the inventory stuff from a gract2.example.com running instance ERP_1 SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */ SQL> col HOST_NAME format A30 SQL> select host_name, instance_name from gv$instance; HOST_NAME INSTANCE_NAME ------------------------------ ---------------- gract1.example.com ERP_2 gract2.example.com ERP_1 gract3.example.com ERP_3 SQL> select host_name, instance_name from v$instance; HOST_NAME INSTANCE_NAME ------------------------------ ---------------- gract1.example.com ERP_2 SQL> SQL> /* check Instance ERP_1 on gract2.example.com */ SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1'); SQL> select xmltransform (dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual; XMLTRANSFORM(DBMS_QOPATCH.GET_OPATCH_LSINVENTORY(),DBMS_QOPATCH.GET_OPATCH_XSLT( -------------------------------------------------------------------------------- Oracle Querayable Patch Interface 1.0 -------------------------------------------------------------------------------- Oracle Home : /u01/app/oracle/product/121/racdb Inventory : /u01/app/oraInventory -------------------------------------------------------------------------------- Installed Top-level Products (1): Oracle Database 12c 12.1.0.1.0 Installed Products ( 131) .. 4) Compare RAC nodes This very exiting feature doesn't work - sorry not time for debugging ! SQL> /* Compare RAC nodes - this is not working in my env ! --> Getting ORA-06502: PL/SQL: numeric or value error */ SQL> set serveroutput on SQL> exec demoqp.compare_rac_node('gract2.example.com','ERP_1'); BEGIN demoqp.compare_rac_node('gract2.example.com','ERP_1'); END; * ERROR at line 1: ORA-06502: PL/SQL: numeric or value error: NULL index table key value ORA-06512: at "SYS.DEMOQP", line 40 ORA-06512: at line 1 gract2.example.com ERP_1
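If you want to run the patch check against every instance in one go, the calls from this section can be wrapped in a small script. This is only a sketch: it assumes the DEMOQP helper package from Note 1585814.1 is installed, and the node and instance names are the example values from this cluster.
#!/bin/bash
# check_patch_all_instances.sh - sketch: verify a patch via DEMOQP on several instances
PATCH=19849140
sqlplus -s / as sysdba <<EOF
set serveroutput on
-- local instance
exec DEMOQP.check_patch_installed (qopatch_list('${PATCH}'));
-- remote instances (example node/instance names)
exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1');
exec DEMOQP.check_patch_installed (qopatch_list('${PATCH}'));
exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract3.example.com','ERP_3');
exec DEMOQP.check_patch_installed (qopatch_list('${PATCH}'));
EOF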
Why rollback and reapply SQL patch results in a NO-OP operation ?
[oracle@gract1 OPatch]$ ./datapatch -rollback 19849140 -force
SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 19:39:29 2015
Copyright (c) 2014, Oracle.  All rights reserved.
Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following patches will be rolled back: 19849140
  Nothing to apply
Error: prereq checks failed!
  patch 19849140: rollback script /u01/app/oracle/product/121/racdb/sqlpatch/19849140/19849140_rollback.sql does not exist
Prereq check failed!  Exiting without installing any patches
See support note 1609718.1 for information on how to resolve the above errors
SQL Patching tool complete on Sat Jan 24 19:39:29 2015

What is this ? Let's check in dba_registry_sqlpatch whether patch 19849140 comes with any SQL changes:
SQL> col action_time format A30
SQL> col DESCRIPTION format A20
SQL> select * from dba_registry_sqlpatch ;
  PATCH_ID ACTION          STATUS          ACTION_TIME                    DESCRIPTION
---------- --------------- --------------- ------------------------------ --------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------
  19121550 APPLY           SUCCESS         26-OCT-14 12.13.19.575484 PM   bundle:PSU
/u01/app/oracle/product/121/racdb/sqlpatch/19121550/19121550_apply_DW_2014Oct26_12_01_54.log
--> The patch doesn't provide any SQL changes - so the above error is nothing more than an informational message.
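A quick way to see in advance whether a patch ships a SQL rollback script at all is to look into its sqlpatch directory. The directory layout and the script name pattern <patch_id>_rollback.sql are taken from the error message above - a sketch:
#!/bin/bash
# has_rollback_sql.sh - sketch: check whether a patch provides a SQL rollback script
PATCH=${1:-19849140}
DIR=$ORACLE_HOME/sqlpatch/$PATCH
if [ -f "$DIR/${PATCH}_rollback.sql" ]; then
    echo "Patch $PATCH provides a SQL rollback script: $DIR/${PATCH}_rollback.sql"
else
    echo "Patch $PATCH has no ${PATCH}_rollback.sql under $DIR -"
    echo "rolling back the SQL part with datapatch is effectively a no-op."
fi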
What is the root cause of ORA-20006 in a RAC env?
Stop an instance [oracle@gract2 ~]$ srvctl stop instance -d dw -i dw_3 Resource NAME INST TARGET STATE SERVER STATE_DETAILS --------------------------- ---- ------------ ------------ --------------- ----------------------------------------- ora.dw.db 1 ONLINE ONLINE gract1 Open,STABLE ora.dw.db 2 ONLINE ONLINE gract3 Open,STABLE ora.dw.db 3 OFFLINE OFFLINE - Instance Shutdown,ST ABLE [oracle@gract1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:03:22 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state... Currently installed SQL Patches: 19121550 DBD::Oracle::st execute failed: ORA-20006: Number of RAC active instances and opatch jobs configured are not same ORA-06512: at "SYS.DBMS_QOPATCH", line 1007 ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE x XMLType; BEGIN x := dbms_qopatch.get_pending_activity; ? := x.getStringVal(); END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293. Note even for policy managed database we need all instances up running on all servers to apply the patch ! Start the instance and and rerun ./datapatch command [oracle@gract1 OPatch]$ srvctl start instance -d dw -i dw_3 [oracle@gract1 OPatch]$ vi check_it.sql [oracle@gract1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.1.0 on Sat Jan 24 20:17:33 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state... Currently installed SQL Patches: 19121550 ...................
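Before running datapatch you can compare the number of running instances with the number of configured inventory jobs - if they differ you will run into ORA-20006. A sketch using the objects queried elsewhere in this post ( gv$instance and opatch_inst_job ):
#!/bin/bash
# datapatch_precheck.sh - sketch: verify all instances are up before running datapatch
sqlplus -s / as sysdba <<'EOF'
select (select count(*) from gv$instance)     running_instances,
       (select count(*) from opatch_inst_job) configured_opatch_jobs
  from dual;
-- if the counts differ: srvctl start instance -d <db> -i <inst> and rerun datapatch
EOF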
ORA-20008 during datapatch installation on a RAC env
You get ORA-20008 during running datapatch tool or during quering the patch status SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140')); ----------Patch Report---------- BEGIN DEMOQP.check_patch_installed (qopatch_list('19849140')); END; * ERROR at line 1: ORA-20008: Timed out, Job Load_opatch_inventory_3execution time is more than 120Secs ORA-06512: at "SYS.DBMS_QOPATCH", line 1428 ORA-06512: at "SYS.DBMS_QOPATCH", line 182 ORA-06512: at "SYS.DEMOQP", line 157 ORA-06512: at line 1 SQL> set linesize 120 SQL> col NODE_NAME format A20 SQL> col JOB_NAME format A30 SQL> col START_DATE format A35 SQL> col INST_JOB format A30 SQL> select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job; NODE_NAME INST_ID INST_JOB -------------------- ---------- ------------------------------ gract1.example.com 1 Load_opatch_inventory_1 gract3.example.com 2 Load_opatch_inventory_2 gract2.example.com 3 Load_opatch_inventory_3 SQL> SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%'; JOB_NAME STATE START_DATE ------------------------------ --------------- ----------------------------------- LOAD_OPATCH_INVENTORY_2 SUCCEEDED 24-JAN-15 11.35.41.629308 AM +01:00 LOAD_OPATCH_INVENTORY_3 SCHEDULED 24-JAN-15 11.35.41.683097 AM +01:00 LOAD_OPATCH_INVENTORY_1 SUCCEEDED 24-JAN-15 11.35.41.156565 AM +01:00 JOB was scheduled but was never succeeded ! --> After fixing the the connections problem to gract2.example.com the job runs to completion SQL> select job_name,state, start_date from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%'; JOB_NAME STATE START_DATE ------------------------------ --------------- ----------------------------------- LOAD_OPATCH_INVENTORY_2 SUCCEEDED 24-JAN-15 11.59.29.078730 AM +01:00 LOAD_OPATCH_INVENTORY_3 SUCCEEDED 24-JAN-15 11.59.29.148714 AM +01:00 LOAD_OPATCH_INVENTORY_1 SUCCEEDED 24-JAN-15 11.59.29.025652 AM +01:00 Verify the patch install on all cluster nodes SQL> set echo on SQL> set pagesize 20000 SQL> set long 200000 SQL> SQL> /* As we are running on a PM managed db let's have look on host_names and instance names */ SQL> col HOST_NAME format A30 SQL> select host_name, instance_name from gv$instance; HOST_NAME INSTANCE_NAME ------------------------------ ---------------- gract1.example.com dw_1 gract2.example.com dw_3 gract3.example.com dw_2 SQL> select host_name, instance_name from v$instance; HOST_NAME INSTANCE_NAME ------------------------------ ---------------- gract1.example.com dw_1 SQL> /* exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','ERP_1'); */ SQL> set serveroutput on SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140')); ----------Patch Report---------- 19849140 : INSTALLED SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract2.example.com','dw_3'); SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140')); ----------Patch Report---------- 19849140 : INSTALLED SQL> exec DBMS_QOPATCH.SET_CURRENT_OPINST ('gract3.example.com','dw_2'); SQL> exec DEMOQP.check_patch_installed (qopatch_list('19849140')); ----------Patch Report---------- 19849140 : INSTALLED
Monitor Script to track dba_scheduler_jobs and opatch_inst_job tables
[oracle@gract1 ~/DATAPATCH]$ cat check_it.sql
connect / as sysdba
alter session set NLS_TIMESTAMP_TZ_FORMAT = 'dd-MON-yyyy HH24:mi:ss';
set linesize 120
col NODE_NAME format A20
col JOB_NAME format A30
col START_DATE format A25
col LAST_START_DATE format A25
col INST_JOB format A30
select NODE_NAME, INST_ID, INST_JOB from opatch_inst_job;
select job_name,state, start_date, LAST_START_DATE from dba_scheduler_jobs where job_name like 'LOAD_OPATCH%';
How to cleanup after ORA-27477 errors ?
oracle@gract1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.1.0 on Fri Jan 23 20:44:48 2015 Copyright (c) 2014, Oracle. All rights reserved. Connecting to database...OK Determining current state... Currently installed SQL Patches: 19121550 DBD::Oracle::st execute failed: ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_3" already exists ORA-06512: at "SYS.DBMS_QOPATCH", line 1011 ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute) [for Statement "DECLARE x XMLType; BEGIN x := dbms_qopatch.get_pending_activity; ? := x.getStringVal(); END;" with ParamValues: :p1=undef] at /u01/app/oracle/product/121/racdb/sqlpatch/sqlpatch.pm line 1293. sqlplus /nolog @check_it NODE_NAME INST_ID INST_JOB -------------------- ---------- ------------------------------ gract2.example.com 1 Load_opatch_inventory_1 gract1.example.com 2 Load_opatch_inventory_2 JOB_NAME STATE START_DATE ------------------------------ --------------- ----------------------------------- LOAD_OPATCH_INVENTORY_1 DISABLED 23-JAN-15 08.38.11.746811 PM +01:00 LOAD_OPATCH_INVENTORY_3 DISABLED 23-JAN-15 08.36.18.506279 PM +01:00 LOAD_OPATCH_INVENTORY_2 DISABLED 23-JAN-15 08.38.11.891360 PM +01:00 Drop the jobs and cleanup the opatch_inst_job table SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_1'); SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_2'); SQL> exec DBMS_SCHEDULER.DROP_JOB('LOAD_OPATCH_INVENTORY_3'); SQL> delete from opatch_inst_job; 2 rows deleted. SQL> commit; Now rerun ./datapatch verbose command and monitor progress SQL> @check_it Connected. NODE_NAME INST_ID INST_JOB -------------------- ---------- ------------------------------ gract2.example.com 1 Load_opatch_inventory_1 gract1.example.com 2 Load_opatch_inventory_2 gract3.example.com 3 Load_opatch_inventory_3 --> All our cluster nodes are ONLINE and the required JOBS are SCHEDULED ! JOB_NAME STATE START_DATE ------------------------------ --------------- ----------------------------------- LOAD_OPATCH_INVENTORY_1 SUCCEEDED 23-JAN-15 08.46.08.885038 PM +01:00 LOAD_OPATCH_INVENTORY_2 SUCCEEDED 23-JAN-15 08.46.08.933665 PM +01:00 LOAD_OPATCH_INVENTORY_3 RUNNING 23-JAN-15 08.46.09.014492 PM +01:00
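Instead of dropping the jobs one by one, the cleanup can also be scripted - a sketch that drops whatever LOAD_OPATCH_INVENTORY jobs exist and clears opatch_inst_job, exactly as done manually above:
#!/bin/bash
# cleanup_opatch_jobs.sh - sketch: generic cleanup after ORA-27477
sqlplus -s / as sysdba <<'EOF'
begin
  for j in ( select job_name from dba_scheduler_jobs
              where job_name like 'LOAD_OPATCH_INVENTORY%' ) loop
    dbms_scheduler.drop_job(j.job_name);
  end loop;
end;
/
delete from opatch_inst_job;
commit;
EOF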
Reference
- 12.1.0.1 datapatch issue : ORA-27477: "SYS"."LOAD_OPATCH_INVENTORY_1" already exists (Doc ID 1934882.1)
- Oracle Database 12.1 : FAQ on Queryable Patch Inventory (Doc ID 1530108.1)
- Datapatch errors at "SYS.DBMS_QOPATCH" (Doc ID 1599479.1)
- Queryable Patch Inventory -- SQL Interface to view, compare, validate database patches (Doc ID 1585814.1)
Manually applying CW Patch ( 12.1.0.1.5 )
Overview
- In this tutorial we will manually apply a CW patch [ 19849140 ] without using opatchauto.
- For that we closely follow the patch README – chapter 5 [ patches/12105/19849140/README.html ] -> Manual Steps for Apply/Rollback Patch
Check for conflicts
[root@gract1 CLUVFY-JAN-2015]# $GRID_HOME/OPatch/opatchauto apply /media/sf_kits/patches/12105/19849140 -analyze OPatch Automation Tool Copyright (c) 2015, Oracle Corporation. All rights reserved. OPatchauto version : 12.1.0.1.5 OUI version : 12.1.0.1.0 Running from : /u01/app/121/grid opatchauto log file: /u01/app/121/grid/cfgtoollogs/opatchauto/19849140/opatch_gi_2015-01-22_18-25-48_analyze.log NOTE: opatchauto is running in ANALYZE mode. There will be no change to your system. Parameter Validation: Successful Grid Infrastructure home: /u01/app/121/grid RAC home(s): /u01/app/oracle/product/121/racdb Configuration Validation: Successful Patch Location: /media/sf_kits/patches/12105/19849140 Grid Infrastructure Patch(es): 19849140 RAC Patch(es): 19849140 Patch Validation: Successful Analyzing patch(es) on "/u01/app/oracle/product/121/racdb" ... [WARNING] The local database instance 'dw_2' from '/u01/app/oracle/product/121/racdb' is not running. SQL changes, if any, will not be analyzed. Please refer to the log file for more details. [WARNING] SQL changes, if any, could not be analyzed on the following database(s): ERP ... Please refer to the log file for more details. Apply Summary: opatchauto ran into some warnings during analyze (Please see log file for details): GI Home: /u01/app/121/grid: 19849140 RAC Home: /u01/app/oracle/product/121/racdb: 19849140 opatchauto completed with warnings. You have new mail in /var/spool/mail/root If this is a GI Home, as the root user execute: Oracle Clusterware active version on the cluster is [12.1.0.1.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [482231859]. .. --> As this is a clusterware patch ONLY ignore the WARNINGs
- Note: during the analyze step we get a first hint that all instances must be running on all servers in order to apply the patch !
Run pre root script and apply the GRID patch
1) Stop all databases rununing out of this ORACLE_HOME and unmount ACFS filesystem 2) Run the pre root script [grid@gract1 gract1]$ $GRID_HOME>/crs/install/rootcrs.pl -prepatch 3) Apply the CRS patch [grid@gract1 gract1]$ $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local /media/sf_kits/patches/12105/19849140/19849140 Oracle Interim Patch Installer version 12.1.0.1.5 Copyright (c) 2015, Oracle Corporation. All rights reserved. Oracle Home : /u01/app/121/grid Central Inventory : /u01/app/oraInventory from : /u01/app/121/grid/oraInst.loc OPatch version : 12.1.0.1.5 OUI version : 12.1.0.1.0 Log file location : /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log Applying interim patch '19849140' to OH '/u01/app/121/grid' Verifying environment and performing prerequisite checks... Interim patch 19849140 is a superset of the patch(es) [ 17077442 ] in the Oracle Home OPatch will roll back the subset patches and apply the given patch. All checks passed. Provide your email address to be informed of security issues, install and initiate Oracle Configuration Manager. Easier for you if you use your My Oracle Support Email address/User Name. Visit http://www.oracle.com/support/policies.html for details. Email address/User Name: You have not provided an email address for notification of security issues. Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y Please shutdown Oracle instances running out of this ORACLE_HOME on the local system. (Oracle Home = '/u01/app/121/grid') Is the local system ready for patching? [y|n] y User Responded with: Y Backing up files... Rolling back interim patch '17077442' from OH '/u01/app/121/grid' Patching component oracle.crs, 12.1.0.1.0... Patching component oracle.has.db, 12.1.0.1.0... Patching component oracle.has.common, 12.1.0.1.0... RollbackSession removing interim patch '17077442' from inventory OPatch back to application of the patch '19849140' after auto-rollback. Patching component oracle.crs, 12.1.0.1.0... Patching component oracle.has.db, 12.1.0.1.0... Patching component oracle.has.common, 12.1.0.1.0... Verifying the update... Patch 19849140 successfully applied Log file location: /u01/app/121/grid/cfgtoollogs/opatch/opatch2015-01-23_12-25-48PM_1.log OPatch succeeded. Verify OUI inventory [grid@gract2 ~]$ $GRID_HOME//OPatch/opatch lsinventory -------------------------------------------------------------------------------- Installed Top-level Products (1): Oracle Grid Infrastructure 12c 12.1.0.1.0 There are 1 products installed in this Oracle Home. Interim patches (3) : Patch 19849140 : applied on Fri Jan 23 15:52:12 CET 2015 Unique Patch ID: 18183131 Patch description: "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)" Created on 23 Oct 2014, 08:32:20 hrs PST8PDT Bugs fixed: 16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244 16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822 17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292 16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242 17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176 16488665, 16670327, 17551223 ... Patch level status of Cluster nodes : Patch level status of Cluster nodes : Patching Level Nodes -------------- ----- 3174741718 gract2,gract1 482231859 gract3 --> Here Node gract1 and gract2 are ready patched where gract3 still need to be patched !
Apply the DB patch
[oracle@gract2 ~]$ $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME
-local /media/sf_kits/patches/12105/19849140/19849140
Oracle Interim Patch Installer version 12.1.0.1.5
Copyright (c) 2015, Oracle Corporation. All rights reserved.
Oracle Home : /u01/app/oracle/product/121/racdb
Central Inventory : /u01/app/oraInventory
from : /u01/app/oracle/product/121/racdb/oraInst.loc
OPatch version : 12.1.0.1.5
OUI version : 12.1.0.1.0
Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log
Applying interim patch '19849140' to OH '/u01/app/oracle/product/121/racdb'
Verifying environment and performing prerequisite checks...
Patch 19849140: Optional component(s) missing : [ oracle.crs, 12.1.0.1.0 ]
Interim patch 19849140 is a superset of the patch(es) [ 17077442 ] in the Oracle Home
OPatch will roll back the subset patches and apply the given patch.
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/121/racdb')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Rolling back interim patch '17077442' from OH '/u01/app/oracle/product/121/racdb'
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
RollbackSession removing interim patch '17077442' from inventory
OPatch back to application of the patch '19849140' after auto-rollback.
Patching component oracle.has.db, 12.1.0.1.0...
Patching component oracle.has.common, 12.1.0.1.0...
Verifying the update...
Patch 19849140 successfully applied
Log file location: /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_16-30-11PM_1.log
OPatch succeeded.
Run the post script for GRID
As root user execute: # $GRID_HOME/rdbms/install/rootadd_rdbms.sh # $GRID_HOME/crs/install/rootcrs.pl -postpatch Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params .. Verify the RAC Node patch level [oracle@gract3 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory Oracle Interim Patch Installer version 12.1.0.1.5 Copyright (c) 2015, Oracle Corporation. All rights reserved. Oracle Home : /u01/app/oracle/product/121/racdb Central Inventory : /u01/app/oraInventory from : /u01/app/oracle/product/121/racdb/oraInst.loc OPatch version : 12.1.0.1.5 OUI version : 12.1.0.1.0 Log file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/opatch2015-01-23_17-59-49PM_1.log Lsinventory Output file location : /u01/app/oracle/product/121/racdb/cfgtoollogs/opatch/lsinv/lsinventory2015-01-23_17-59-49PM.txt -------------------------------------------------------------------------------- Installed Top-level Products (1): Oracle Database 12c 12.1.0.1.0 There are 1 products installed in this Oracle Home. Interim patches (2) : Patch 19849140 : applied on Fri Jan 23 17:41:28 CET 2015 Unique Patch ID: 18183131 Patch description: "Grid Infrastructure Patch Set Update : 12.1.0.1.1 (HAS Component)" Created on 23 Oct 2014, 08:32:20 hrs PST8PDT Bugs fixed: 16505840, 16505255, 16505717, 16505617, 16399322, 16390989, 17486244 16168869, 16444109, 16505361, 13866165, 16505763, 16208257, 16904822 17299876, 16246222, 16505540, 16505214, 15936039, 16580269, 16838292 16505449, 16801843, 16309853, 16505395, 17507349, 17475155, 16493242 17039197, 16196609, 18045611, 17463260, 17263488, 16505667, 15970176 16488665, 16670327, 17551223 Using configuration parameter file: /u01/app/121/grid/crs/install/crsconfig_params .... Rac system comprising of multiple nodes Local node = gract3 Remote node = gract1 Remote node = gract2 Restart the CRS / database and login into the local instance root@gract2 Desktop]# su - oracle -> Active ORACLE_SID: ERP_1 [oracle@gract2 ~]$ [oracle@gract2 ~]$ sqlplus / as sysdba SQL> select host_name, instance_name from v$instance; HOST_NAME INSTANCE_NAME ------------------------------ ---------------- gract2.example.com ERP_1 Repeat now all above steps for each RAC node !!
Run the datapatch tool for each Oracle Database
ORACLE_SID=ERP_1
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose
ORACLE_SID=dw_1
[oracle@gract2 OPatch]$ cd $ORACLE_HOME/OPatch
[oracle@gract2 OPatch]$ ./datapatch -verbose
For potential problems running datapatch you may read the following article.
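If several databases run out of the patched ORACLE_HOME, the per-database calls above can also be wrapped in a small loop - a sketch; the SID list is an example and must match the instances running on the local node:
#!/bin/bash
# run_datapatch_all.sh - sketch: run datapatch once per database of this ORACLE_HOME
for SID in ERP_1 dw_1; do            # example SIDs - adjust to your local instances
    export ORACLE_SID=$SID
    cd $ORACLE_HOME/OPatch && ./datapatch -verbose
done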
Reference
Cluvfy Usage
Download location for 12c cluvfy
http://www.oracle.com/technetwork/database/options/clustering/downloads/index.html
- Cluster Verification Utility Download for Oracle Grid Infrastructure 12c
- Always download the newest cluvfy version from the above link
- The latest CVU version (July 2013) can be used with all currently supported Oracle RAC versions, including Oracle RAC 10g, Oracle RAC 11g and Oracle RAC 12c.
Impact of latest Cluvfy version
There is nothing more annoying than debugging a RAC problem which turns out to be a cluvfy BUG. The latest download from January 2015 reports the following version:
[grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version
12.1.0.1.0 Build 112713x8664
whereas my current 12.1 installation reports the following version:
[grid@gract1 ~/CLUVFY-JAN-2015]$ cluvfy -version
12.1.0.1.0 Build 100213x866
Cluvfy trace Location
If you have installed cluvfy in /home/grid/CLUVFY-JAN-2015 the related cluvfy traces can be found in the cv/log subdirectory:
[root@gract1 CLUVFY-JAN-2015]# ls /home/grid/CLUVFY-JAN-2015/cv/log
cvutrace.log.0  cvutrace.log.0.lck
Note some cluvfy commands like
# cluvfy comp dhcp -clustername gract -verbose
must be run as root ! In that case the default trace location may not have the correct permissions. In that case use the script below to set the trace level and trace location.
Setting Cluvfy trace File Location and Trace Level in a bash script
The following bash script sets the cluvfy trace location and the cluvfy trace level
#!/bin/bash
# Set the cluvfy trace location and trace level.
# Source this script ( . ./<scriptname> ) so the exported variables are
# visible to the cluvfy command you run afterwards.
rm -rf /tmp/cvutrace
mkdir /tmp/cvutrace
export CV_TRACELOC=/tmp/cvutrace       # writable trace directory (also usable when running cluvfy as root)
export SRVM_TRACE=true                 # enable SRVM tracing
export SRVM_TRACE_LEVEL=2              # trace level
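A usage sketch ( the file name set_cv_trace.sh is just an example ): source the script so the exports survive, run the cluvfy command, then look into the trace directory.
[root@gract1 CLUVFY-JAN-2015]# . ./set_cv_trace.sh
[root@gract1 CLUVFY-JAN-2015]# bin/cluvfy comp dhcp -clustername gract -verbose
[root@gract1 CLUVFY-JAN-2015]# ls /tmp/cvutrace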
Why cluvfy version matters ?
Yesterday I debugged a DHCP problem starting with cluvfy : [grid@gract1 ~]$ cluvfy -version 12.1.0.1.0 Build 100213x8664 [root@gract1 network-scripts]# cluvfy comp dhcp -clustername gract -verbose Verifying DHCP Check Checking if any DHCP server exists on the network... <null> At least one DHCP server exists on the network and is listening on port 67 Checking if DHCP server has sufficient free IP addresses for all VIPs... Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip" <null> Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip" <null> .. DHCP server was able to provide sufficient number of IP addresses The DHCP server response time is within acceptable limits Verification of DHCP Check was unsuccessful on all the specified nodes. --> As verification was unsuccessful I started Network Tracing using tcpdump. But Network tracing looks good and I get a bad feeling about cluvfy ! What to do next ? Install the newest cluvfy version and rerun the test ! [grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version 12.1.0.1.0 Build 112713x8664 Now rerun test : [root@gract1 CLUVFY-JAN-2015]# bin/cluvfy comp dhcp -clustername gract -verbose Verifying DHCP Check Checking if any DHCP server exists on the network... DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600 At least one DHCP server exists on the network and is listening on port 67 Checking if DHCP server has sufficient free IP addresses for all VIPs... Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip" Checking if DHCP server has sufficient free IP addresses for all VIPs... Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip" DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600 Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip" .. released DHCP server lease for client ID "gract-gract1-vip" on port "67" DHCP server was able to provide sufficient number of IP addresses The DHCP server response time is within acceptable limits Verification of DHCP Check was successful.
Why you should always review your cluvfy logs ?
Per default cluvfy logs are under CV_HOME/cv/logs [grid@gract1 ~/CLUVFY-JAN-2015]$ cluvfy stage -pre crsinst -n gract1 Performing pre-checks for cluster services setup Checking node reachability... Node reachability check passed from node "gract1" Checking user equivalence... User equivalence check passed for user "grid" ERROR: An error occurred in creating a TaskFactory object or in generating a task list PRCT-1011 : Failed to run "oifcfg". Detailed error: [] PRCT-1011 : Failed to run "oifcfg". Detailed error: [] This error is not very helpful at all ! Reviewing cluvfy logfiles for details: [root@gract1 log]# cd $GRID_HOME/cv/log Cluvfy log cvutrace.log.0 : [Thread-49] [ 2015-01-22 08:51:25.283 CET ] [StreamReader.run:65] OUTPUT>PRIF-10: failed to initialize the cluster registry [main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:144] runCommand: process returns 1 [main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:161] RunTimeExec: output> [main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:164] PRIF-10: failed to initialize the cluster registry [main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:170] RunTimeExec: error> [main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:192] Returning from RunTimeExec.runCommand [main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:884] retval = 1 [main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:885] exitval = 1 [main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:886] rtErrLength = 0 [main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:892] Failed to execute command. Command = [/u01/app/121/grid/bin/oifcfg, getif, -from, gpnp] env = null error = [] [main] [ 2015-01-22 08:51:25.287 CET ] [ClusterNetworkInfo.getNetworkInfoFromOifcfg:152] INSTALLEXCEPTION: occured while getting cluster network info. messagePRCT-1011 : Failed to run "oifcfg". Detailed error: [] [main] [ 2015-01-22 08:51:25.287 CET ] [TaskFactory.getNetIfFromOifcfg:4352] Exception occured while getting network information. msg=PRCT-1011 : Failed to run "oifcfg". Detailed error: [] Here we get a better error message : PRIF-10: failed to initialize the cluster registry and we extract the failing command : /u01/app/121/grid/bin/oifcfg getif Now we can retry the OS command as OS level [grid@gract1 ~/CLUVFY-JAN-2015]$ /u01/app/121/grid/bin/oifcfg getif PRIF-10: failed to initialize the cluster registry Btw, if you have uploaded the new cluvfy command you get a much better error output [grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy stage -pre crsinst -n gract1 ERROR: PRVG-1060 : Failed to retrieve the network interface classification information from an existing CRS home at path "/u01/app/121/grid" on the local node PRCT-1011 : Failed to run "oifcfg". Detailed error: PRIF-10: failed to initialize the cluster registry For Fixing PRVG-1060,PRCT-1011,PRIF-10 runnung above cluvfy commnads please read following article: Common cluvfy errors and warnings
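When a cluvfy run fails with a generic PRCT/PRVF error, grepping the trace for the failing OS command is usually the fastest way in - a sketch; the default trace location and the "Failed to execute command" marker are taken from the trace output shown above:
#!/bin/bash
# scan_cvutrace.sh - sketch: pull errors and failing OS commands out of a cluvfy trace
TRACE=${1:-/home/grid/CLUVFY-JAN-2015/cv/log/cvutrace.log.0}   # example trace location
grep -n "ERRORMSG"                  $TRACE
grep -n "Failed to execute command" $TRACE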
Run cluvfy before CRS installation by passing network connections for PUBLIC and CLUSTER_INTERCONNECT
$ ./bin/cluvfy stage -pre crsinst -n grac121,grac122 -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect
Run cluvfy before doing an UPGRADE
grid@grac41 /]$ cluvfy stage -pre crsinst -upgrade -n grac41,grac42,grac43 -rolling -src_crshome $GRID_HOME
-dest_crshome /u01/app/grid_new -dest_version 12.1.0.1.0 -fixup -fixupdir /tmp -verbose
Run cluvfy 12.1 for preparing a 10gR2 CRS installation
Always install newest cluvfy version even for 10gR2 CRS validations! [root@ract1 ~]$ ./bin/cluvfy -version 12.1.0.1.0 Build 112713x8664 Verify OS setup on ract1 [root@ract1 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup --> Run required scripts [root@ract1 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh All Fix-up operations were completed successfully. Repeat this step on ract2 [root@ract2 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract2 -verbose -fixup --> Run required scripts [root@ract2 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh All Fix-up operations were completed successfully. Now verify System requirements on both nodes [oracle@ract1 cluvfy12]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup Verifying system requirement .. NOTE: No fixable verification failures to fix Finally run cluvfy to test CRS installation readiness $ cluvfy12/bin/cluvfy stage -pre crsinst -r 10gR2 \ -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect \ -n ract1,ract2 -verbose .. Pre-check for cluster services setup was successful.
Run cluvfy comp software to check file protections for GRID and RDBMS installations
- Note : Not all files are checked ( SHELL scripts like ohasd are missing ) – Bug 18407533 – CLUVFY DOES NOT VERIFY ALL FILES
- Config File : $GRID_HOME/cv/cvdata/ora_software_cfg.xml
Run cluvfy comp software to verify GRID stack [grid@grac41 ~]$ cluvfy comp software -r 11gR2 -n grac41 -verbose Verifying software Check: Software 1178 files verified Software check passed Verification of software was successful. Run cluvfy comp software to verify RDBMS stack [oracle@grac43 ~]$ cluvfy comp software -d $ORACLE_HOME -r 11gR2 -verbose Verifying software Check: Software 1780 files verified Software check passed Verification of software was successful.
Run cluvfy before CRS installation on a single node and create a script for fixable errors
$ ./bin/cluvfy comp sys -p crs -n grac121 -verbose -fixup Verifying system requirement Check: Total memory Node Name Available Required Status ------------ ------------------------ ------------------------ ---------- grac121 3.7426GB (3924412.0KB) 4GB (4194304.0KB) failed Result: Total memory check failed ... ***************************************************************************************** Following is the list of fixable prerequisites selected to fix in this session ****************************************************************************************** -------------- --------------- ---------------- Check failed. Failed on nodes Reboot required? -------------- --------------- ---------------- Hard Limit: maximum open grac121 no file descriptors Execute "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" as root user on nodes "grac121" to perform the fix up operations manually --> Now run runfixup.sh" as root on nodes "grac121" Press ENTER key to continue after execution of "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" has completed on nodes "grac121" Fix: Hard Limit: maximum open file descriptors Node Name Status ------------------------------------ ------------------------ grac121 successful Result: "Hard Limit: maximum open file descriptors" was successfully fixed on all the applicable nodes Fix up operations were successfully completed on all the applicable nodes Verification of system requirement was unsuccessful on all the specified nodes. Note errrors like to low memory/swap needs manual intervention: Check: Total memory Node Name Available Required Status ------------ ------------------------ ------------------------ ---------- grac121 3.7426GB (3924412.0KB) 4GB (4194304.0KB) failed Result: Total memory check failed Fix that error at OS level and rerun the above cluvfy command
Performing post-checks for hardware and operating system setup
- cluvfy stage -post hwos tests multicast communication with multicast group "230.0.1.0"
[grid@grac42 ~]$ cluvfy stage -post hwos -n grac42,grac43 -verbose Performing post-checks for hardware and operating system setup Checking node reachability... Check: Node reachability from node "grac42" Destination Node Reachable? ------------------------------------ ------------------------ grac42 yes grac43 yes Result: Node reachability check passed from node "grac42" Checking user equivalence... Check: User equivalence for user "grid" Node Name Status ------------------------------------ ------------------------ grac43 passed grac42 passed Result: User equivalence check passed for user "grid" Checking node connectivity... Checking hosts config file... Node Name Status ------------------------------------ ------------------------ grac43 passed grac42 passed Verification of the hosts config file successful Interface information for node "grac43" Name IP Address Subnet Gateway Def. Gateway HW Address MTU ------ --------------- --------------- --------------- --------------- ----------------- ------ eth0 10.0.2.15 10.0.2.0 0.0.0.0 10.0.2.2 08:00:27:38:10:76 1500 eth1 192.168.1.103 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:F6:18:43 1500 eth1 192.168.1.59 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:F6:18:43 1500 eth1 192.168.1.170 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:F6:18:43 1500 eth1 192.168.1.177 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:F6:18:43 1500 eth2 192.168.2.103 192.168.2.0 0.0.0.0 10.0.2.2 08:00:27:1C:30:DD 1500 eth2 169.254.125.13 169.254.0.0 0.0.0.0 10.0.2.2 08:00:27:1C:30:DD 1500 virbr0 192.168.122.1 192.168.122.0 0.0.0.0 10.0.2.2 52:54:00:ED:19:7C 1500 Interface information for node "grac42" Name IP Address Subnet Gateway Def. Gateway HW Address MTU ------ --------------- --------------- --------------- --------------- ----------------- ------ eth0 10.0.2.15 10.0.2.0 0.0.0.0 10.0.2.2 08:00:27:6C:89:27 1500 eth1 192.168.1.102 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.165 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.178 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.167 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth2 192.168.2.102 192.168.2.0 0.0.0.0 10.0.2.2 08:00:27:DF:79:B9 1500 eth2 169.254.96.101 169.254.0.0 0.0.0.0 10.0.2.2 08:00:27:DF:79:B9 1500 virbr0 192.168.122.1 192.168.122.0 0.0.0.0 10.0.2.2 52:54:00:ED:19:7C 1500 Check: Node connectivity for interface "eth1" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac43[192.168.1.103] grac43[192.168.1.59] yes grac43[192.168.1.103] grac43[192.168.1.170] yes .. grac42[192.168.1.165] grac42[192.168.1.167] yes grac42[192.168.1.178] grac42[192.168.1.167] yes Result: Node connectivity passed for interface "eth1" Check: TCP connectivity of subnet "192.168.1.0" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac42:192.168.1.102 grac43:192.168.1.103 passed grac42:192.168.1.102 grac43:192.168.1.59 passed grac42:192.168.1.102 grac43:192.168.1.170 passed grac42:192.168.1.102 grac43:192.168.1.177 passed grac42:192.168.1.102 grac42:192.168.1.165 passed grac42:192.168.1.102 grac42:192.168.1.178 passed grac42:192.168.1.102 grac42:192.168.1.167 passed Result: TCP connectivity check passed for subnet "192.168.1.0" Check: Node connectivity for interface "eth2" Source Destination Connected? 
------------------------------ ------------------------------ ---------------- grac43[192.168.2.103] grac42[192.168.2.102] yes Result: Node connectivity passed for interface "eth2" Check: TCP connectivity of subnet "192.168.2.0" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac42:192.168.2.102 grac43:192.168.2.103 passed Result: TCP connectivity check passed for subnet "192.168.2.0" Checking subnet mask consistency... Subnet mask consistency check passed for subnet "192.168.1.0". Subnet mask consistency check passed for subnet "192.168.2.0". Subnet mask consistency check passed. Result: Node connectivity check passed Checking multicast communication... Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed. Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed. Check of multicast communication passed. Checking for multiple users with UID value 0 Result: Check for multiple users with UID value 0 passed Check: Time zone consistency Result: Time zone consistency check passed Checking shared storage accessibility... Disk Sharing Nodes (2 in count) ------------------------------------ ------------------------ /dev/sdb grac43 /dev/sdk grac42 .. Disk Sharing Nodes (2 in count) ------------------------------------ ------------------------ /dev/sdp grac43 grac42 Shared storage check was successful on nodes "grac43,grac42" Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ... Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes... Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf" Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed Post-check for hardware and operating system setup was successful.
Debugging Voting disk problems with: cluvfy comp vdisk
As your CRS stack may not be up run these commands from a node which is up and running [grid@grac42 ~]$ cluvfy comp ocr -n grac41 Verifying OCR integrity Checking OCR integrity... Checking the absence of a non-clustered configuration... All nodes free of non-clustered, local-only configurations ERROR: PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed. OCR integrity check failed Verification of OCR integrity was unsuccessful on all the specified nodes. [grid@grac42 ~]$ cluvfy comp vdisk -n grac41 Verifying Voting Disk: Checking Oracle Cluster Voting Disk configuration... ERROR: PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed. ERROR: PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdf1" ERROR: PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdg1" ERROR: PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdh1" PRVF-5431 : Oracle Cluster Voting Disk configuration check failed UDev attributes check for Voting Disk locations started... UDev attributes check passed for Voting Disk locations Verification of Voting Disk was unsuccessful on all the specified nodes. Debugging steps at OS level Verify disk protections and use kfed to read disk header [grid@grac41 ~/cluvfy]$ ls -l /dev/asmdisk1_udev_sdf1 /dev/asmdisk1_udev_sdg1 /dev/asmdisk1_udev_sdh1 b---------. 1 grid asmadmin 8, 81 May 14 09:51 /dev/asmdisk1_udev_sdf1 b---------. 1 grid asmadmin 8, 97 May 14 09:51 /dev/asmdisk1_udev_sdg1 b---------. 1 grid asmadmin 8, 113 May 14 09:51 /dev/asmdisk1_udev_sdh1 [grid@grac41 ~/cluvfy]$ kfed read /dev/asmdisk1_udev_sdf1 KFED-00303: unable to open file '/dev/asmdisk1_udev_sdf1'
Debugging file protection problems with: cluvfy comp software
- Related BUG: 18350484 : 112042GIPSU: "CLUVFY COMP SOFTWARE" FAILED IN 112042GIPSU IN HPUX
Investigate file protection problems with cluvfy comp software. Cluvfy checks file protections against ora_software_cfg.xml:
[grid@grac41 cvdata]$ cd /u01/app/11204/grid/cv/cvdata
[grid@grac41 cvdata]$ grep gpnp ora_software_cfg.xml
<File Path="bin/" Name="gpnpd.bin" Permissions="0755"/>
<File Path="bin/" Name="gpnptool.bin" Permissions="0755"/>
Change protections and verify with cluvfy:
[grid@grac41 cvdata]$ chmod 444 /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
  /u01/app/11204/grid/bin/gpnpd.bin..."Permissions" did not match reference
  Permissions of file "/u01/app/11204/grid/bin/gpnpd.bin" did not match the expected value. [Expected = "0755" ; Found = "0444"]
Now correct the problem and verify again:
[grid@grac41 cvdata]$ chmod 755 /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
--> No errors were reported anymore
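To see which permissions cluvfy expects for a given file before changing anything, you can compare the entry in ora_software_cfg.xml with the current mode at OS level - a sketch; the file name gpnpd.bin and the GRID_HOME path follow the example above:
#!/bin/bash
# check_expected_perm.sh - sketch: expected vs. actual permissions of a GRID_HOME file
FILE=${1:-gpnpd.bin}
GRID_HOME=${GRID_HOME:-/u01/app/11204/grid}
grep "Name=\"$FILE\"" $GRID_HOME/cv/cvdata/ora_software_cfg.xml   # expected permissions
stat -c '%a %U %G %n' $GRID_HOME/bin/$FILE                        # actual permissions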
Debugging CTSSD/NTP problems with: cluvfy comp clocksync
[grid@grac41 ctssd]$ cluvfy comp clocksync -n grac41,grac42,grac43 -verbose Verifying Clock Synchronization across the cluster nodes Checking if Clusterware is installed on all nodes... Check of Clusterware install passed Checking if CTSS Resource is running on all nodes... Check: CTSS Resource running on all nodes Node Name Status ------------------------------------ ------------------------ grac43 passed grac42 passed grac41 passed Result: CTSS resource check passed Querying CTSS for time offset on all nodes... Result: Query of CTSS for time offset passed Check CTSS state started... Check: CTSS state Node Name State ------------------------------------ ------------------------ grac43 Observer grac42 Observer grac41 Observer CTSS is in Observer state. Switching over to clock synchronization checks using NTP Starting Clock synchronization checks using Network Time Protocol(NTP)... NTP Configuration file check started... The NTP configuration file "/etc/ntp.conf" is available on all nodes NTP Configuration file check passed Checking daemon liveness... Check: Liveness for "ntpd" Node Name Running? ------------------------------------ ------------------------ grac43 yes grac42 yes grac41 yes Result: Liveness check passed for "ntpd" Check for NTP daemon or service alive passed on all nodes Checking NTP daemon command line for slewing option "-x" Check: NTP daemon command line Node Name Slewing Option Set? ------------------------------------ ------------------------ grac43 yes grac42 yes grac41 yes Result: NTP daemon slewing option check passed Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x" Check: NTP daemon's boot time configuration Node Name Slewing Option Set? ------------------------------------ ------------------------ grac43 yes grac42 yes grac41 yes Result: NTP daemon's boot time configuration check for slewing option passed Checking whether NTP daemon or service is using UDP port 123 on all nodes Check for NTP daemon or service using UDP port 123 Node Name Port Open? ------------------------------------ ------------------------ grac43 yes grac42 yes grac41 yes NTP common Time Server Check started... NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running Check of common NTP Time Server passed Clock time offset check from NTP Time Server started... Checking on nodes "[grac43, grac42, grac41]"... Check: Clock time offset from NTP Time Server Time Server: .LOCL. Time Offset Limit: 1000.0 msecs Node Name Time Offset Status ------------ ------------------------ ------------------------ grac43 0.0 passed grac42 0.0 passed grac41 0.0 passed Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac43, grac42, grac41]". Clock time offset check passed Result: Clock synchronization check using Network Time Protocol(NTP) passed Oracle Cluster Time Synchronization Services check passed Verification of Clock Synchronization across the cluster nodes was successful. At OS level you can run ntpq -p [root@grac41 dev]# ntpq -p remote refid st t when poll reach delay offset jitter ============================================================================== *ns1.example.com LOCAL(0) 10 u 90 256 377 0.072 -238.49 205.610 LOCAL(0) .LOCL. 12 l 15h 64 0 0.000 0.000 0.000
Running cluvfy stage -post crsinst after a failed Clusterware startup
- Note: you should run cluvfy from a node which is up and running to get the best results
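A quick way to see which node still has a working CRS stack is a loop like the one below (a sketch; node names and Grid home as used in this setup):

GRID_HOME=/u01/app/11204/grid
for n in grac41 grac42 grac43
do
    echo "### $n"
    ssh $n "$GRID_HOME/bin/crsctl check crs"
done
--> Run cluvfy from a node where CRS, CSS and the Event Manager all report online.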
CRS resource status [grid@grac41 ~]$ my_crs_stat_init NAME TARGET STATE SERVER STATE_DETAILS ------------------------- ---------- ---------- ------------ ------------------ ora.asm ONLINE OFFLINE Instance Shutdown ora.cluster_interconnect.haip ONLINE OFFLINE ora.crf ONLINE ONLINE grac41 ora.crsd ONLINE OFFLINE ora.cssd ONLINE OFFLINE STARTING ora.cssdmonitor ONLINE ONLINE grac41 ora.ctssd ONLINE OFFLINE ora.diskmon OFFLINE OFFLINE ora.drivers.acfs ONLINE OFFLINE ora.evmd ONLINE OFFLINE ora.gipcd ONLINE ONLINE grac41 ora.gpnpd ONLINE ONLINE grac41 ora.mdnsd ONLINE ONLINE grac41 Verify CRS status with cluvfy ( CRS on grac42 is up and running ) [grid@grac42 ~]$ cluvfy stage -post crsinst -n grac41,grac42 -verbose Performing post-checks for cluster services setup Checking node reachability... Check: Node reachability from node "grac42" Destination Node Reachable? ------------------------------------ ------------------------ grac42 yes grac41 yes Result: Node reachability check passed from node "grac42" Checking user equivalence... Check: User equivalence for user "grid" Node Name Status ------------------------------------ ------------------------ grac42 passed grac41 passed Result: User equivalence check passed for user "grid" Checking node connectivity... Checking hosts config file... Node Name Status ------------------------------------ ------------------------ grac42 passed grac41 passed Verification of the hosts config file successful Interface information for node "grac42" Name IP Address Subnet Gateway Def. Gateway HW Address MTU ------ --------------- --------------- --------------- --------------- ----------------- ------ eth0 10.0.2.15 10.0.2.0 0.0.0.0 10.0.2.2 08:00:27:6C:89:27 1500 eth1 192.168.1.102 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.59 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.178 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth1 192.168.1.170 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:63:08:07 1500 eth2 192.168.2.102 192.168.2.0 0.0.0.0 10.0.2.2 08:00:27:DF:79:B9 1500 eth2 169.254.96.101 169.254.0.0 0.0.0.0 10.0.2.2 08:00:27:DF:79:B9 1500 virbr0 192.168.122.1 192.168.122.0 0.0.0.0 10.0.2.2 52:54:00:ED:19:7C 1500 Interface information for node "grac41" Name IP Address Subnet Gateway Def. Gateway HW Address MTU ------ --------------- --------------- --------------- --------------- ----------------- ------ eth0 10.0.2.15 10.0.2.0 0.0.0.0 10.0.2.2 08:00:27:82:47:3F 1500 eth1 192.168.1.101 192.168.1.0 0.0.0.0 10.0.2.2 08:00:27:89:E9:A2 1500 eth2 192.168.2.101 192.168.2.0 0.0.0.0 10.0.2.2 08:00:27:6B:E2:BD 1500 virbr0 192.168.122.1 192.168.122.0 0.0.0.0 10.0.2.2 52:54:00:ED:19:7C 1500 Check: Node connectivity for interface "eth1" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac42[192.168.1.102] grac42[192.168.1.59] yes grac42[192.168.1.102] grac42[192.168.1.178] yes grac42[192.168.1.102] grac42[192.168.1.170] yes grac42[192.168.1.102] grac41[192.168.1.101] yes grac42[192.168.1.59] grac42[192.168.1.178] yes grac42[192.168.1.59] grac42[192.168.1.170] yes grac42[192.168.1.59] grac41[192.168.1.101] yes grac42[192.168.1.178] grac42[192.168.1.170] yes grac42[192.168.1.178] grac41[192.168.1.101] yes grac42[192.168.1.170] grac41[192.168.1.101] yes Result: Node connectivity passed for interface "eth1" Check: TCP connectivity of subnet "192.168.1.0" Source Destination Connected? 
------------------------------ ------------------------------ ---------------- grac42:192.168.1.102 grac42:192.168.1.59 passed grac42:192.168.1.102 grac42:192.168.1.178 passed grac42:192.168.1.102 grac42:192.168.1.170 passed grac42:192.168.1.102 grac41:192.168.1.101 passed Result: TCP connectivity check passed for subnet "192.168.1.0" Check: Node connectivity for interface "eth2" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac42[192.168.2.102] grac41[192.168.2.101] yes Result: Node connectivity passed for interface "eth2" Check: TCP connectivity of subnet "192.168.2.0" Source Destination Connected? ------------------------------ ------------------------------ ---------------- grac42:192.168.2.102 grac41:192.168.2.101 passed Result: TCP connectivity check passed for subnet "192.168.2.0" Checking subnet mask consistency... Subnet mask consistency check passed for subnet "192.168.1.0". Subnet mask consistency check passed for subnet "192.168.2.0". Subnet mask consistency check passed. Result: Node connectivity check passed Checking multicast communication... Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed. Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed. Check of multicast communication passed. Check: Time zone consistency Result: Time zone consistency check passed Checking Oracle Cluster Voting Disk configuration... ERROR: PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes. --> Expected error as lower CRS stack is not completly up and running grac41 Oracle Cluster Voting Disk configuration check passed Checking Cluster manager integrity... Checking CSS daemon... Node Name Status ------------------------------------ ------------------------ grac42 running grac41 not running ERROR: PRVF-5319 : Oracle Cluster Synchronization Services do not appear to be online. Cluster manager integrity check failed --> Expected error as lower CRS stack is not completely up and running UDev attributes check for OCR locations started... Result: UDev attributes check passed for OCR locations UDev attributes check for Voting Disk locations started... Result: UDev attributes check passed for Voting Disk locations Check default user file creation mask Node Name Available Required Comment ------------ ------------------------ ------------------------ ---------- grac42 22 0022 passed grac41 22 0022 passed Result: Default user file creation mask check passed Checking cluster integrity... Node Name ------------------------------------ grac41 grac42 grac43 Cluster integrity check failed This check did not run on the following node(s): grac41 Checking OCR integrity... Checking the absence of a non-clustered configuration... All nodes free of non-clustered, local-only configurations ERROR: PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes. grac41 --> Expected error as lower CRS stack is not completely up and running Checking OCR config file "/etc/oracle/ocr.loc"... 
OCR config file "/etc/oracle/ocr.loc" check successful ERROR: PRVF-4195 : Disk group for ocr location "+OCR" not available on the following nodes: grac41 --> Expected error as lower CRS stack is not completly up and running NOTE: This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR. OCR integrity check failed Checking CRS integrity... Clusterware version consistency passed The Oracle Clusterware is healthy on node "grac42" ERROR: PRVF-5305 : The Oracle Clusterware is not healthy on node "grac41" CRS-4535: Cannot communicate with Cluster Ready Services CRS-4530: Communications failure contacting Cluster Synchronization Services daemon CRS-4534: Cannot communicate with Event Manager CRS integrity check failed --> Expected error as lower CRS stack is not completly up and running Checking node application existence... Checking existence of VIP node application (required) Node Name Required Running? Comment ------------ ------------------------ ------------------------ ---------- grac42 yes yes passed grac41 yes no exists VIP node application is offline on nodes "grac41" Checking existence of NETWORK node application (required) Node Name Required Running? Comment ------------ ------------------------ ------------------------ ---------- grac42 yes yes passed grac41 yes no failed PRVF-4570 : Failed to check existence of NETWORK node application on nodes "grac41" --> Expected error as lower CRS stack is not completly up and running Checking existence of GSD node application (optional) Node Name Required Running? Comment ------------ ------------------------ ------------------------ ---------- grac42 no no exists grac41 no no exists GSD node application is offline on nodes "grac42,grac41" Checking existence of ONS node application (optional) Node Name Required Running? Comment ------------ ------------------------ ------------------------ ---------- grac42 no yes passed grac41 no no failed PRVF-4576 : Failed to check existence of ONS node application on nodes "grac41" --> Expected error as lower CRS stack is not completly up and running Checking Single Client Access Name (SCAN)... SCAN Name Node Running? ListenerName Port Running? ---------------- ------------ ------------ ------------ ------------ ------------ grac4-scan.grid4.example.com grac43 true LISTENER_SCAN1 1521 true grac4-scan.grid4.example.com grac42 true LISTENER_SCAN2 1521 true Checking TCP connectivity to SCAN Listeners... Node ListenerName TCP connectivity? ------------ ------------------------ ------------------------ grac42 LISTENER_SCAN1 yes grac42 LISTENER_SCAN2 yes TCP connectivity to SCAN Listeners exists on all cluster nodes Checking name resolution setup for "grac4-scan.grid4.example.com"... Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ... Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes... 
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf" Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed SCAN Name IP Address Status Comment ------------ ------------------------ ------------------------ ---------- grac4-scan.grid4.example.com 192.168.1.165 passed grac4-scan.grid4.example.com 192.168.1.168 passed grac4-scan.grid4.example.com 192.168.1.170 passed Verification of SCAN VIP and Listener setup passed Checking OLR integrity... Checking OLR config file... ERROR: PRVF-4184 : OLR config file check failed on the following nodes: grac41 grac41:Group of file "/etc/oracle/olr.loc" did not match the expected value. [Expected = "oinstall" ; Found = "root"] Fix : [grid@grac41 ~]$ ls -l /etc/oracle/olr.loc -rw-r--r--. 1 root root 81 May 11 14:02 /etc/oracle/olr.loc root@grac41 Desktop]# chown root:oinstall /etc/oracle/olr.loc Checking OLR file attributes... OLR file check successful OLR integrity check failed Checking GNS integrity... Checking if the GNS subdomain name is valid... The GNS subdomain name "grid4.example.com" is a valid domain name Checking if the GNS VIP belongs to same subnet as the public network... Public network subnets "192.168.1.0" match with the GNS VIP "192.168.1.0" Checking if the GNS VIP is a valid address... GNS VIP "192.168.1.59" resolves to a valid IP address Checking the status of GNS VIP... Checking if FDQN names for domain "grid4.example.com" are reachable PRVF-5216 : The following GNS resolved IP addresses for "grac4-scan.grid4.example.com" are not reachable: "192.168.1.168" PRKN-1035 : Host "192.168.1.168" is unreachable --> GNS resolved IP addresses are reachable GNS resolved IP addresses are reachable GNS resolved IP addresses are reachable GNS resolved IP addresses are reachable Checking status of GNS resource... Node Running? Enabled? ------------ ------------------------ ------------------------ grac42 yes yes grac41 no yes GNS resource configuration check passed Checking status of GNS VIP resource... Node Running? Enabled? ------------ ------------------------ ------------------------ grac42 yes yes grac41 no yes GNS VIP resource configuration check passed. GNS integrity check passed OCR detected on ASM. Running ACFS Integrity checks... Starting check to see if ASM is running on all cluster nodes... PRVF-5110 : ASM is not running on nodes: "grac41," --> Expected error as lower CRS stack is not completly up and running Starting Disk Groups check to see if at least one Disk Group configured... Disk Group Check passed. At least one Disk Group configured Task ACFS Integrity check failed Checking to make sure user "grid" is not in "root" group Node Name Status Comment ------------ ------------------------ ------------------------ grac42 passed does not exist grac41 passed does not exist Result: User "grid" is not part of "root" group. Check passed Checking if Clusterware is installed on all nodes... Check of Clusterware install passed Checking if CTSS Resource is running on all nodes... 
Check: CTSS Resource running on all nodes Node Name Status ------------------------------------ ------------------------ grac42 passed grac41 failed PRVF-9671 : CTSS on node "grac41" is not in ONLINE state, when checked with command "/u01/app/11204/grid/bin/crsctl stat resource ora.ctssd -init" --> Expected error as lower CRS stack is not completly up and running Result: Check of CTSS resource passed on all nodes Querying CTSS for time offset on all nodes... Result: Query of CTSS for time offset passed Check CTSS state started... Check: CTSS state Node Name State ------------------------------------ ------------------------ grac42 Observer CTSS is in Observer state. Switching over to clock synchronization checks using NTP Starting Clock synchronization checks using Network Time Protocol(NTP)... NTP Configuration file check started... The NTP configuration file "/etc/ntp.conf" is available on all nodes NTP Configuration file check passed Checking daemon liveness... Check: Liveness for "ntpd" Node Name Running? ------------------------------------ ------------------------ grac42 yes Result: Liveness check passed for "ntpd" Check for NTP daemon or service alive passed on all nodes Checking NTP daemon command line for slewing option "-x" Check: NTP daemon command line Node Name Slewing Option Set? ------------------------------------ ------------------------ grac42 yes Result: NTP daemon slewing option check passed Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x" Check: NTP daemon's boot time configuration Node Name Slewing Option Set? ------------------------------------ ------------------------ grac42 yes Result: NTP daemon's boot time configuration check for slewing option passed Checking whether NTP daemon or service is using UDP port 123 on all nodes Check for NTP daemon or service using UDP port 123 Node Name Port Open? ------------------------------------ ------------------------ grac42 yes NTP common Time Server Check started... NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running Check of common NTP Time Server passed Clock time offset check from NTP Time Server started... Checking on nodes "[grac42]"... Check: Clock time offset from NTP Time Server Time Server: .LOCL. Time Offset Limit: 1000.0 msecs Node Name Time Offset Status ------------ ------------------------ ------------------------ grac42 0.0 passed Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac42]". Clock time offset check passed Result: Clock synchronization check using Network Time Protocol(NTP) passed PRVF-9652 : Cluster Time Synchronization Services check failed --> Expected error as lower CRS stack is not completly up and running Checking VIP configuration. Checking VIP Subnet configuration. Check for VIP Subnet configuration passed. Checking VIP reachability Check for VIP reachability passed. Post-check for cluster services setup was unsuccessful. Checks did not pass for the following node(s): grac41
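To map the cluvfy errors above to the daemons that are actually down, you can dump the lower-stack (-init) resources on each node; a minimal sketch, assuming the node names and Grid home of this setup:

GRID_HOME=/u01/app/11204/grid
for n in grac41 grac42
do
    echo "### $n"
    ssh $n "$GRID_HOME/bin/crsctl stat res -t -init"
done
--> On grac41 this shows ora.cssd, ora.crsd, ora.ctssd, ora.evmd and ora.asm as OFFLINE, which matches the PRVF/CRS errors reported by cluvfy above.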
Verify your DHCP setup ( only if using GNS )
[root@gract1 Desktop]# cluvfy comp dhcp -clustername gract -verbose
Checking if any DHCP server exists on the network...
PRVG-5723 : Network CRS resource is configured to use DHCP provided IP addresses
Verification of DHCP Check was unsuccessful on all the specified nodes.
--> If the network CRS resource is ONLINE you are not allowed to run this command

DESCRIPTION:
Checks if DHCP server exists on the network and is capable of providing required number of IP addresses. This check also verifies the response time for the DHCP server. The checks are all done on the local node. For port values less than 1024 CVU needs to be run as root user. If -networks is specified and it contains a PUBLIC network then DHCP packets are sent on the public network. By default the network on which the host IP is specified is used. This check must not be done while default network CRS resource configured to use DHCP provided IP address is online.

In my case even stopping nodeapps didn't help; only after a full cluster shutdown did the command actually query the DHCP server!

[root@gract1 Desktop]# cluvfy comp dhcp -clustername gract -verbose
Verifying DHCP Check
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
CRS-10012: released DHCP server lease for client ID gract-scan1-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan2-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan3-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-gract1-vip on port 67
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful.
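Putting the above together, one possible sequence to run the check (a sketch, executed as root; it assumes that "crsctl stop cluster -all" is an acceptable full shutdown in your environment):

/u01/app/11204/grid/bin/crsctl stop cluster -all       # take down the CRS stack (and the network resource) on all nodes
/u01/app/11204/grid/bin/cluvfy comp dhcp -clustername gract -verbose
/u01/app/11204/grid/bin/crsctl start cluster -all      # bring the cluster back up afterwards

--> Plan a maintenance window for this; the whole cluster, including all databases, is down while the check runs.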
On the nameserver, /var/log/messages shows the matching dhcpd activity:
Jan 21 14:42:53 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: Wrote 6 leases to leases file.
Jan 21 14:42:55 ns1 dhcpd: DHCPREQUEST for 192.168.1.170 (192.168.1.50) from 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPACK on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPOFFER on 192.168.1.169 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
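To watch that conversation live on the DHCP server while cluvfy runs, a simple sketch (assumptions: the dhcpd host ns1 and syslog target /var/log/messages shown above, and the default dhcpd leases file location on OEL 6):

[root@ns1 ~]# tail -f /var/log/messages | grep dhcpd        # live DISCOVER/OFFER/REQUEST/ACK trace
[root@ns1 ~]# cat /var/lib/dhcpd/dhcpd.leases               # leases handed out, including the client IDs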
Using tcpdump
Tracing the PUBLIC RAC device for DHCP requests – our DHCP server is listening on port 67
[root@gract1 cvutrace]# tcpdump -i eth1 -vvv -s 1500 port 67 .. gract1.example.com.bootpc > 255.255.255.255.bootps: [bad udp cksum 473!] BOOTP/DHCP, Request from 00:00:00:00:00:00 (oui Ethernet), length 368, xid 0xab536e31, Flags [Broadcast] (0x8000) Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet) sname "gract-scan1-vip" Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Discover MSZ Option 57, length 2: 8 Client-ID Option 61, length 16: "gract-scan1-vip" END Option 255, length 0 PAD Option 0, length 0, occurs 102 11:25:25.480234 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 335) ns1.example.com.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 307, xid 0xab536e31, Flags [Broadcast] (0x8000) Your-IP 192.168.5.150 Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet) Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Offer Server-ID Option 54, length 4: ns1.example.com Lease-Time Option 51, length 4: 21600 Subnet-Mask Option 1, length 4: 255.255.255.0 Default-Gateway Option 3, length 4: 192.168.5.1 Domain-Name-Server Option 6, length 4: ns1.example.com Time-Zone Option 2, length 4: -19000 IPF Option 19, length 1: N RN Option 58, length 4: 10800 RB Option 59, length 4: 18900 NTP Option 42, length 4: ns1.example.com BR Option 28, length 4: 192.168.5.255 END Option 255, length 0 11:25:25.481129 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.153 tell ns1.example.com, length 46 11:25:25.484070 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 396) gract1.example.com.bootpc > ns1.example.com.bootps: [bad udp cksum 8780!] BOOTP/DHCP, Request from 00:00:00:00:00:00 (oui Ethernet), length 368, xid 0x7f90997b, Flags [Broadcast] (0x8000) Client-IP 192.168.5.150 Your-IP 192.168.5.150 Client-Ethernet-Address 00:00:00:00:00:00 (oui Ethernet) sname "gract-scan1-vip" Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Release Server-ID Option 54, length 4: ns1.example.com Client-ID Option 61, length 16: "gract-scan1-vip" END Option 255, length 0 PAD Option 0, length 0, occurs 100
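Instead of reading the packets live you can write them to a capture file and inspect them later (a sketch; the file name is only an example):

[root@gract1 ~]# tcpdump -i eth1 -s 1500 -w /tmp/dhcp_eth1.pcap port 67 or port 68
   ... reproduce the cluvfy comp dhcp run, then stop tcpdump with Ctrl-C ...
[root@gract1 ~]# tcpdump -vvv -r /tmp/dhcp_eth1.pcap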
Using route command
Assume you want to route the traffic for network 192.168.5.0 through interface eth1, which serves the 192.168.1.0 network.

Verify the current routing info:
[root@gract1 Desktop]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:29:54:EF
          inet addr:192.168.1.111  Bcast:192.168.1.255  Mask:255.255.255.0
[root@gract1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eth3
[root@gract1 ~]# ping 192.168.5.50
PING 192.168.5.50 (192.168.5.50) 56(84) bytes of data.
From 192.168.1.111 icmp_seq=2 Destination Host Unreachable
From 192.168.1.111 icmp_seq=3 Destination Host Unreachable
From 192.168.1.111 icmp_seq=4 Destination Host Unreachable

Add the routing info:
[root@gract1 ~]# ip route add 192.168.5.0/24 dev eth1
[root@gract1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eth3
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1

Verify that ping and nslookup are working:
[root@gract1 ~]# ping 192.168.5.50
PING 192.168.5.50 (192.168.5.50) 56(84) bytes of data.
64 bytes from 192.168.5.50: icmp_seq=1 ttl=64 time=0.929 ms
64 bytes from 192.168.5.50: icmp_seq=2 ttl=64 time=0.264 ms
--- 192.168.5.50 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.264/0.596/0.929/0.333 ms
[root@gract1 ~]# nslookup ns1
Server:     192.168.1.50
Address:    192.168.1.50#53
Name:    ns1.example.com
Address: 192.168.5.50

To delete the route created above, run:
[root@gract1 ~]# ip route del 192.168.5.0/24 dev eth1
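The route added with "ip route add" above only lasts until the next reboot. To make it persistent, a sketch assuming OEL/RHEL 6 style network scripts (the per-interface route file accepts the same arguments as "ip route add"):

[root@gract1 ~]# cat > /etc/sysconfig/network-scripts/route-eth1 <<'EOF'
192.168.5.0/24 dev eth1
EOF
--> The file is applied the next time eth1 is brought up (ifup eth1 or a reboot); verify afterwards with: netstat -rn | grep 192.168.5.0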