8/28/11

"root.sh" failed on the second node during the installation of 11.2.0.2 Grid Infrastructure

While installing 11.2.0.2 Grid Infrastructure, root.sh succeeded on the first node but failed on the second node.
On node2, root.sh failed with the following errors:

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node prittoprfdb1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/crs/11.2.0.2/crs/install/crsconfig_lib.pm line 1016.
/u01/app/crs/11.2.0.2/perl/bin/perl -I/u01/app/crs/11.2.0.2/perl/lib -I/u01/app/crs/11.2.0.2/crs/install /u01/app/crs/11.2.0.2/crs/install/rootcrs.pl execution failed



root.sh can fail on the second node for a number of different reasons, but in my case it failed because of a "MULTICASTING BUG".


What is multicasting, and why do we need it?


Oracle introduces a new feature called "Redundant Interconnect Usage", which provides redundancy for the interconnect without any external NIC bonding. Oracle provides the redundancy internally if you specify two private interfaces at the time of installing 11.2.0.2 Grid Infrastructure.


To use this new feature, multicasting must be enabled on the private interfaces and also on the switches they connect through.


Multicast-based communication on the private interconnect is used to establish communication with peers in the cluster each time the stack starts on a node. Once the connection with the peers in the cluster has been established, communication switches back to unicast.
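To see for yourself whether multicast packets actually travel between the private interfaces, a small test script helps. Below is a minimal Python sketch of such a check, not an Oracle tool. The group address 230.0.1.0 and the port 42424 are my assumptions for illustration, and the function names are made up for this example; replace them with whatever applies to your environment.

```python
import ipaddress
import socket

# Assumptions for illustration only: neither value comes from the error output above.
GROUP = "230.0.1.0"   # multicast group to test
PORT = 42424          # arbitrary UDP port that is free on both nodes


def is_multicast(addr: str) -> bool:
    """Return True if addr lies in the IPv4/IPv6 multicast range."""
    return ipaddress.ip_address(addr).is_multicast


def listen(group: str, port: int, local_ip: str, timeout: float = 30.0) -> bytes:
    """Join the group on the interface that owns local_ip and wait for one packet.

    Raises socket.timeout if nothing arrives, which suggests that multicast
    is blocked somewhere between the two nodes.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # ip_mreq structure: group address followed by the local interface address
        mreq = socket.inet_aton(group) + socket.inet_aton(local_ip)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.settimeout(timeout)
        data, _sender = sock.recvfrom(1024)
        return data
    finally:
        sock.close()


def send(group: str, port: int, local_ip: str, payload: bytes = b"mcast-test") -> None:
    """Send one packet to the group through the interface that owns local_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(local_ip))
        # TTL 1 keeps the packet on the local subnet, like the private interconnect
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Usage would be to call listen(GROUP, PORT, "<private IP of node2>") on node2 first, then send(GROUP, PORT, "<private IP of node1>") on node1, and repeat in the other direction. If listen() times out both ways while a normal ping over the private network works, the interfaces or the switch are probably not passing multicast.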


In the CSSD log on the failing node you will find the following messages:


2010-09-16 23:13:14.862: [GIPCHGEN][1107937600] gipchaNodeCreate: adding new node 0x2aaab408d4a0 { host 'node1', haName 'CSS_ttoprf10cluster', srcLuid 54d7bb0e-ef4a0c7e, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 9563084, flags 0x0 }


2010-09-16 23:13:15.839: [ CSSD][1087465792]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 180134562, wrtcnt, 8627, LATS 9564064, lastSeqNo 8624, uniqueness 1284701023, timestamp 1284703995/10564774
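The important part of the second message is "has a disk HB, but no network HB": the node sees the other node's heartbeat on the voting disk but not over the network. As a rough sketch, the CSSD log can be scanned for that pattern with a few lines of Python. The regular expression is based only on the sample line above, and the function name is my own, so adjust both if your log lines look different.

```python
import re

# Pattern derived from the sample log line in this post (an assumption about the
# general format): captures the node number and node name.
NO_NETWORK_HB = re.compile(
    r"clssnmvDHBValidateNCopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)


def nodes_without_network_hb(lines):
    """Return sorted (node_number, node_name) pairs reported without a network heartbeat."""
    found = set()
    for line in lines:
        match = NO_NETWORK_HB.search(line)
        if match:
            found.add((int(match.group(1)), match.group(2)))
    return sorted(found)


sample = [
    "2010-09-16 23:13:15.839: [ CSSD][1087465792]clssnmvDHBValidateNCopy: "
    "node 1, node1, has a disk HB, but no network HB, DHB has rcfg 180134562, "
    "wrtcnt, 8627, LATS 9564064, lastSeqNo 8624, uniqueness 1284701023, "
    "timestamp 1284703995/10564774",
]
print(nodes_without_network_hb(sample))  # [(1, 'node1')]
```

For a real log, pass an open file object instead of the sample list, for example nodes_without_network_hb(open(path_to_cssd_log)), where path_to_cssd_log is the location of the CSSD log on your system.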




Solution:


Apply patch 9974223 on both nodes and run root.sh again on node2.













2 comments:

  1. Hi,

    I am facing the same multicast IP issue, but I am using 11.2.0.4, not 11.2.0.2.

    Do I still need to apply this patch?

  2. Hi Rakesh,

    Any update on my question?
