Resolving a RAC Node That Fails to Start Because of an HAIP Fault

A reader asked me about a problem with his 11.2.0.2 RAC on AIX, which had no patches or PSUs applied. After a reboot, one of the nodes failed to start normally. The ocssd log showed the following:

2014-08-09 14:21:46.094: [ CSSD][5414]clssnmSendingThread: sent 4 join msgs to all nodes
2014-08-09 14:21:46.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0s
2014-08-09 14:21:47.042: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958157, LATS 1518247992, lastSeqNo 255958154, uniqueness 1406064021, timestamp 1407565306/1501758072
2014-08-09 14:21:47.051: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958158, LATS 1518248002, lastSeqNo 255958155, uniqueness 1406064021, timestamp 1407565306/1501758190
2014-08-09 14:21:47.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-09 14:21:48.042: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958160, LATS 1518248993, lastSeqNo 255958157, uniqueness 1406064021, timestamp 1407565307/1501759080
2014-08-09 14:21:48.052: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958161, LATS 1518249002, lastSeqNo 255958158, uniqueness 1406064021, timestamp 1407565307/1501759191
2014-08-09 14:21:48.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-09 14:21:49.043: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958163, LATS 1518249993, lastSeqNo 255958160, uniqueness 1406064021, timestamp 1407565308/1501760082
2014-08-09 14:21:49.056: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958164, LATS 1518250007, lastSeqNo 255958161, uniqueness 1406064021, timestamp 1407565308/1501760193
2014-08-09 14:21:49.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-09 14:21:50.044: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958166, LATS 1518250994, lastSeqNo 255958163, uniqueness 1406064021, timestamp 1407565309/1501761090
2014-08-09 14:21:50.057: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958167, LATS 1518251007, lastSeqNo 255958164, uniqueness 1406064021, timestamp 1407565309/1501761195
2014-08-09 14:21:50.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-09 14:21:51.046: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958169, LATS 1518251996, lastSeqNo 255958166, uniqueness 1406064021, timestamp 1407565310/1501762100
2014-08-09 14:21:51.057: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958170, LATS 1518252008, lastSeqNo 255958167, uniqueness 1406064021, timestamp 1407565310/1501762205
2014-08-09 14:21:51.102: [ CSSD][5414]clssnmSendingThread: sending join msg to all nodes
2014-08-09 14:21:51.102: [ CSSD][5414]clssnmSendingThread: sent 5 join msgs to all nodes
2014-08-09 14:21:51.421: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2014-08-09 14:21:52.050: [ CSSD][4129]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958172, LATS 1518253000, lastSeqNo 255958169, uniqueness 1406064021, timestamp 1407565311/1501763110
2014-08-09 14:21:52.058: [ CSSD][3358]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033, wrtcnt, 255958173, LATS 1518253008, lastSeqNo 255958170, uniqueness 1406064021, timestamp 1407565311/1501763230
2014-08-09 14:21:52.089: [ CSSD][5671]clssnmRcfgMgrThread: Local Join
2014-08-09 14:21:52.089: [ CSSD][5671]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2014-08-09 14:21:52.089: [ CSSD][5671]clssnmLocalJoinEvent: set curtime (1518253039) for my node
2014-08-09 14:21:52.089: [ CSSD][5671]clssnmLocalJoinEvent: scanning 32 nodes
2014-08-09 14:21:52.089: [ CSSD][5671]clssnmLocalJoinEvent: Node rac01, number 1, is in an existing cluster with disk state 3
2014-08-09 14:21:52.090: [ CSSD][5671]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2014-08-09 14:21:52.431: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

From this output it is easy to conclude that the heartbeat is at fault. That reading is not wrong, but the heartbeat involved here is not the traditional heartbeat network as we usually understand it. I asked him to run the following query on a node where CRS was running normally, and the result makes the cause clear:

SQL> select name,ip_address from v$cluster_interconnects;

NAME            IP_ADDRESS
--------------- ----------------
en0             169.254.116.242

As you can see, the interconnect IP is on the 169 network. Why is that?
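
The two observations above can be checked mechanically. The following is a minimal Python sketch, not part of the original troubleshooting session. It assumes the ocssd log lines and the IP_ADDRESS value from v$cluster_interconnects have been copied into the script by hand. The helper names is_link_local and nodes_missing_network_hb are hypothetical. The address 169.254.116.242 falls in 169.254.0.0/16, the IPv4 link-local block, which is consistent with an HAIP-assigned address rather than the private network address configured on en0.

```python
import ipaddress
import re

# 169.254.0.0/16 is the IPv4 link-local block (RFC 3927).
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

# Matches the ocssd message quoted in the log; captures node number and name.
NO_NETWORK_HB = re.compile(
    r"clssnmvDHBValidateNCopy: node (\d+), (\S+), has a disk HB, but no network HB"
)


def is_link_local(ip_text):
    """Return True if ip_text is an IPv4 address inside 169.254.0.0/16."""
    return ipaddress.ip_address(ip_text) in LINK_LOCAL


def nodes_missing_network_hb(log_lines):
    """Count 'disk HB, but no network HB' messages per (node number, name)."""
    counts = {}
    for line in log_lines:
        match = NO_NETWORK_HB.search(line)
        if match:
            key = (int(match.group(1)), match.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Two lines taken from the log excerpt, shortened after the matched part.
    sample = [
        "2014-08-09 14:21:47.042: [ CSSD][4129]clssnmvDHBValidateNCopy: "
        "node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 217016033",
        "2014-08-09 14:21:47.421: [ CSSD][4900]clssgmWaitOnEventValue: "
        "after CmInfo State val 3, eval 1 waited 0",
    ]
    print(nodes_missing_network_hb(sample))   # {(1, 'rac01'): 1}
    print(is_link_local("169.254.116.242"))   # True
```

A non-empty result from nodes_missing_network_hb together with a link-local interconnect address points toward the interconnect layer rather than the voting disks, since the disk heartbeat from rac01 is still being seen.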