Last edited by Ling_wwl on 2010-08-12 19:36
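Hello everyone, let me describe the symptoms directly:
Environment: two HP DL 380 G6 servers, Red Hat 5.3, Oracle 10.2.0.4, and the cluster software bundled with Red Hat 5.3.
Primary node: dz-oracle1, 134.36.139.221
Standby node: dz-oracle2, 134.36.139.222
Cluster IP: 134.36.139.220
Other: because NIC bonding conflicts with the cluster, the cluster is started from /etc/rc.d/rc.local (that is, "service cman start" and "service rgmanager start" were added to that file).
Network connections: on both machines, eth1 and eth2 are bonded as bond0; the primary's iLO is connected to the standby's eth3, and the standby's iLO is connected to the primary's eth3.
What works so far: the two machines can see each other, the fence_ilo command works, and the cluster starts. When the Oracle process on the primary fails, the service fails over to the standby successfully. However... (see below)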
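Fault: the cluster fails the network-disconnect test. Whether it is the primary or the standby, as soon as one machine loses its network (for example, when the cables are unplugged from the primary's eth1 and eth2), both machines power off at the same time.
/var/log/messages shows "gnome-power-manager: (root) GNOME 交互式注銷,原因是 按下了電源按鈕" (in English: "GNOME interactive logout, reason: the power button was pressed").
Log from the disconnect test: the primary was managing the cluster at the time. I unplugged both the eth1 and eth2 cables on the primary, and both machines powered off simultaneously. The relevant /var/log/messages entries are below.
I have tried many different tests without solving this. Any input is welcome; I will be watching this thread and hope we can get this resolved. Thanks!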
/var/log/messages on the primary:
- Aug 12 16:20:52 dz-oracle1 scim-bridge: The lockfile is destroied
- Aug 12 16:20:52 dz-oracle1 scim-bridge: Cleanup, done. Exitting...
- Aug 12 16:21:44 dz-oracle1 kernel: bnx2: eth1 NIC Copper Link is Down
- Aug 12 16:21:44 dz-oracle1 kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
- Aug 12 16:21:44 dz-oracle1 kernel: bonding: bond0: making interface eth2 the new active one.
- Aug 12 16:21:48 dz-oracle1 kernel: bnx2: eth2 NIC Copper Link is Down
- Aug 12 16:21:48 dz-oracle1 kernel: bonding: bond0: link status definitely down for interface eth2, disabling it
- Aug 12 16:21:48 dz-oracle1 kernel: bonding: bond0: now running without any active interface !
- Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] The token was lost in the OPERATIONAL state.
- Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
- Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
- Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] entering GATHER state from 2.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering GATHER state from 0.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Creating commit token because I am the rep.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Saving state aru 43 high seq received 43
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Storing new sequence id for ring 430
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering COMMIT state.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering RECOVERY state.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] position [0] member 134.36.139.221:
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] previous ring seq 1068 rep 134.36.139.221
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] aru 43 high delivered 43 received flag 1
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Did not need to originate any messages in recovery.
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Sending initial ORF token
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] CLM CONFIGURATION CHANGE
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] New Configuration:
- Aug 12 16:22:02 dz-oracle1 fenced[6516]: dz-oracle2 not a cluster member after 0 sec post_fail_delay
- Aug 12 16:22:02 dz-oracle1 kernel: dlm: closing connection to node 2
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] r(0) ip(134.36.139.221)
- Aug 12 16:22:02 dz-oracle1 fenced[6516]: fencing node "dz-oracle2"
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] Members Left:
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] r(0) ip(134.36.139.222)
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] Members Joined:
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] CLM CONFIGURATION CHANGE
- Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM ] New Configuration:
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM ] r(0) ip(134.36.139.221)
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM ] Members Left:
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM ] Members Joined:
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [SYNC ] This node is within the primary component and will provide service.
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [TOTEM] entering OPERATIONAL state.
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM ] got nodejoin message 134.36.139.221
- Aug 12 16:22:03 dz-oracle1 openais[6494]: [CPG ] got joinlist message from node 1
- Aug 12 16:22:07 dz-oracle1 gnome-power-manager: (root) GNOME 交互式注銷,原因是 按下了電源按鈕
/var/log/messages on the standby:
- Aug 12 16:20:38 dz-oracle2 scim-bridge: The lockfile is destroied
- Aug 12 16:20:38 dz-oracle2 scim-bridge: Cleanup, done. Exitting...
- Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] The token was lost in the OPERATIONAL state.
- Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
- Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
- Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] entering GATHER state from 2.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering GATHER state from 0.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Creating commit token because I am the rep.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Saving state aru 43 high seq received 43
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Storing new sequence id for ring 430
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering COMMIT state.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering RECOVERY state.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] position [0] member 134.36.139.222:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] previous ring seq 1068 rep 134.36.139.221
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] aru 43 high delivered 43 received flag 1
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Did not need to originate any messages in recovery.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Sending initial ORF token
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] CLM CONFIGURATION CHANGE
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] New Configuration:
- Aug 12 16:21:24 dz-oracle2 kernel: dlm: closing connection to node 1
- Aug 12 16:21:24 dz-oracle2 fenced[6518]: dz-oracle1 not a cluster member after 0 sec post_fail_delay
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] r(0) ip(134.36.139.222)
- Aug 12 16:21:24 dz-oracle2 fenced[6518]: fencing node "dz-oracle1"
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] Members Left:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] r(0) ip(134.36.139.221)
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] Members Joined:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] CLM CONFIGURATION CHANGE
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] New Configuration:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] r(0) ip(134.36.139.222)
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] Members Left:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] Members Joined:
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [SYNC ] This node is within the primary component and will provide service.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering OPERATIONAL state.
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM ] got nodejoin message 134.36.139.222
- Aug 12 16:21:24 dz-oracle2 openais[6496]: [CPG ] got joinlist message from node 2
- Aug 12 16:21:29 dz-oracle2 gnome-power-manager: (root) GNOME 交互式注銷,原因是 按下了電源按鈕
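To make the two logs easier to compare, here is a minimal sketch that pulls out three events from each excerpt (token loss, start of fencing, and the power-button message) and reports them as offsets in seconds from the token loss. The function names key_events and offsets, the MARKERS table, and the year 2010 are my own assumptions for illustration (syslog lines carry no year); the sample lines are copied from the logs above. The clocks on the two hosts are not assumed to be synchronized, so only per-host offsets are compared.

```python
import re
from datetime import datetime

# Month abbreviations as written by syslog (parsed by hand to avoid locale issues).
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
#   - Aug 12 16:22:02 dz-oracle1 fenced[6516]: fencing node "dz-oracle2"
# The leading "- " comes from the forum paste and is optional.
LINE_RE = re.compile(
    r"^(?:-\s*)?(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^\s:\[]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)

# Hypothetical labels mapped to substrings that appear in the logs above.
MARKERS = {
    "token_lost": "The token was lost",
    "fence_start": "fencing node",
    "power_button": "gnome-power-manager",
}

def key_events(log_text, year=2010):
    """Return {label: datetime} for the first occurrence of each marker."""
    events = {}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None or m["mon"] not in MONTHS:
            continue
        ts = datetime(year, MONTHS[m["mon"]], int(m["day"]),
                      int(m["h"]), int(m["m"]), int(m["s"]))
        text = m["proc"] + ": " + m["msg"]
        for name, needle in MARKERS.items():
            if name not in events and needle in text:
                events[name] = ts
    return events

def offsets(events, base="token_lost"):
    """Express each event as whole seconds after the base event."""
    start = events[base]
    return {name: int((ts - start).total_seconds())
            for name, ts in events.items()}

# Excerpts copied from the two logs in this post.
PRIMARY_LOG = '''
- Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] The token was lost in the OPERATIONAL state.
- Aug 12 16:22:02 dz-oracle1 fenced[6516]: fencing node "dz-oracle2"
- Aug 12 16:22:07 dz-oracle1 gnome-power-manager: (root) GNOME 交互式注銷,原因是 按下了電源按鈕
'''

STANDBY_LOG = '''
- Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] The token was lost in the OPERATIONAL state.
- Aug 12 16:21:24 dz-oracle2 fenced[6518]: fencing node "dz-oracle1"
- Aug 12 16:21:29 dz-oracle2 gnome-power-manager: (root) GNOME 交互式注銷,原因是 按下了電源按鈕
'''

for label, log in (("primary", PRIMARY_LOG), ("standby", STANDBY_LOG)):
    print(label, offsets(key_events(log)))
```

On these excerpts both hosts show the same pattern: fencing of the peer starts 5 seconds after the token loss, and the power-button message appears 5 seconds after that. In other words, each node starts fencing the other, and each node then logs a power-button press.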