
Chinaunix

Title: In a Linux cluster, if one machine loses its network, both machines power off

Author: Ling_wwl    Posted: 2010-08-12 18:03
Title: In a Linux cluster, if one machine loses its network, both machines power off
This post was last edited by Ling_wwl on 2010-08-12 19:36

Hi all, let me describe the symptom directly:

Environment: two HP DL 380 G6 servers, Red Hat Enterprise Linux 5.3, Oracle 10.2.0.4, and the cluster suite (RHCS) bundled with RHEL 5.3.
      Primary node: dz-oracle1, 134.36.139.221
      Standby node: dz-oracle2, 134.36.139.222
      Cluster IP:   134.36.139.220

Other notes: because the dual-NIC bonding conflicts with the cluster startup, the cluster is started from /etc/rc.d/rc.local (i.e. "service cman start" and "service rgmanager start" were added to that file).
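For reference, a minimal sketch of what the tail of /etc/rc.d/rc.local looks like with this workaround; the chkconfig lines are an assumption about how the init scripts were taken out of the normal boot order, not something stated in the post:

    # /etc/rc.d/rc.local -- runs after all init scripts, so bond0 is already up
    # Assumed: the services were disabled in the runlevels beforehand, e.g.
    #   chkconfig cman off
    #   chkconfig rgmanager off
    service cman start
    service rgmanager start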

Network wiring: on both machines, eth1 and eth2 are bonded into bond0; the primary node's iLO is connected to the standby node's eth3, and the standby node's iLO is connected to the primary node's eth3.
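For context, an active-backup bond on RHEL 5 is typically configured along the following lines; the mode, miimon, and netmask values are assumptions, not taken from the poster's setup:

    # /etc/modprobe.conf -- load the bonding driver for bond0
    alias bond0 bonding
    options bond0 mode=1 miimon=100        # mode=1 = active-backup (assumed)

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (on dz-oracle1)
    DEVICE=bond0
    IPADDR=134.36.139.221                  # 134.36.139.222 on dz-oracle2
    NETMASK=255.255.255.0                  # assumed
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (ifcfg-eth2 follows the same pattern)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none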

What works: the two machines can see each other, the fence_ilo command reaches the peer, and the cluster comes up. When the Oracle processes on the primary node fail, the service fails over to the standby successfully. But... (see below)
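A manual fence check of the kind described above would look roughly like this; the iLO address and credentials are placeholders:

    # From dz-oracle1, query the power status of dz-oracle2 through its iLO
    fence_ilo -a <ilo-address-of-dz-oracle2> -l <ilo-login> -p <ilo-password> -o status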

Failure symptom: the network-disconnect test fails. Whether it is the primary or the standby, as soon as one machine is cut off from the network (e.g. pulling the cables of the primary's eth1 and eth2), both machines power off at the same time.
          /var/log/messages shows "gnome-power-manager: (root) GNOME interactive logout, reason: the power button was pressed".

Log of the disconnect test: with the primary node currently running the cluster service, I pulled both of the primary's eth1 and eth2 cables; both machines then powered off at the same time. The corresponding /var/log/messages entries are below.

I have tried many things without solving this problem. Any suggestions are welcome; I really hope to get this fixed. Thanks!



The primary node's /var/log/messages:

  Aug 12 16:20:52 dz-oracle1 scim-bridge: The lockfile is destroied
  Aug 12 16:20:52 dz-oracle1 scim-bridge: Cleanup, done. Exitting...
  Aug 12 16:21:44 dz-oracle1 kernel: bnx2: eth1 NIC Copper Link is Down
  Aug 12 16:21:44 dz-oracle1 kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
  Aug 12 16:21:44 dz-oracle1 kernel: bonding: bond0: making interface eth2 the new active one.
  Aug 12 16:21:48 dz-oracle1 kernel: bnx2: eth2 NIC Copper Link is Down
  Aug 12 16:21:48 dz-oracle1 kernel: bonding: bond0: link status definitely down for interface eth2, disabling it
  Aug 12 16:21:48 dz-oracle1 kernel: bonding: bond0: now running without any active interface !
  Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] The token was lost in the OPERATIONAL state.
  Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
  Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
  Aug 12 16:21:57 dz-oracle1 openais[6494]: [TOTEM] entering GATHER state from 2.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering GATHER state from 0.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Creating commit token because I am the rep.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Saving state aru 43 high seq received 43
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Storing new sequence id for ring 430
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering COMMIT state.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] entering RECOVERY state.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] position [0] member 134.36.139.221:
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] previous ring seq 1068 rep 134.36.139.221
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] aru 43 high delivered 43 received flag 1
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Did not need to originate any messages in recovery.
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [TOTEM] Sending initial ORF token
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] CLM CONFIGURATION CHANGE
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] New Configuration:
  Aug 12 16:22:02 dz-oracle1 fenced[6516]: dz-oracle2 not a cluster member after 0 sec post_fail_delay
  Aug 12 16:22:02 dz-oracle1 kernel: dlm: closing connection to node 2
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ]         r(0) ip(134.36.139.221)
  Aug 12 16:22:02 dz-oracle1 fenced[6516]: fencing node "dz-oracle2"
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] Members Left:
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ]         r(0) ip(134.36.139.222)
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] Members Joined:
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] CLM CONFIGURATION CHANGE
  Aug 12 16:22:02 dz-oracle1 openais[6494]: [CLM  ] New Configuration:
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM  ]         r(0) ip(134.36.139.221)
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM  ] Members Left:
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM  ] Members Joined:
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [SYNC ] This node is within the primary component and will provide service.
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [TOTEM] entering OPERATIONAL state.
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [CLM  ] got nodejoin message 134.36.139.221
  Aug 12 16:22:03 dz-oracle1 openais[6494]: [CPG  ] got joinlist message from node 1
  Aug 12 16:22:07 dz-oracle1 gnome-power-manager: (root) GNOME interactive logout, reason: the power button was pressed
The standby node's /var/log/messages:
  Aug 12 16:20:38 dz-oracle2 scim-bridge: The lockfile is destroied
  Aug 12 16:20:38 dz-oracle2 scim-bridge: Cleanup, done. Exitting...
  Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] The token was lost in the OPERATIONAL state.
  Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
  Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
  Aug 12 16:21:19 dz-oracle2 openais[6496]: [TOTEM] entering GATHER state from 2.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering GATHER state from 0.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Creating commit token because I am the rep.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Saving state aru 43 high seq received 43
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Storing new sequence id for ring 430
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering COMMIT state.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering RECOVERY state.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] position [0] member 134.36.139.222:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] previous ring seq 1068 rep 134.36.139.221
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] aru 43 high delivered 43 received flag 1
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Did not need to originate any messages in recovery.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] Sending initial ORF token
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] CLM CONFIGURATION CHANGE
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] New Configuration:
  Aug 12 16:21:24 dz-oracle2 kernel: dlm: closing connection to node 1
  Aug 12 16:21:24 dz-oracle2 fenced[6518]: dz-oracle1 not a cluster member after 0 sec post_fail_delay
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ]         r(0) ip(134.36.139.222)
  Aug 12 16:21:24 dz-oracle2 fenced[6518]: fencing node "dz-oracle1"
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] Members Left:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ]         r(0) ip(134.36.139.221)
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] Members Joined:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] CLM CONFIGURATION CHANGE
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] New Configuration:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ]         r(0) ip(134.36.139.222)
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] Members Left:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] Members Joined:
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [SYNC ] This node is within the primary component and will provide service.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [TOTEM] entering OPERATIONAL state.
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CLM  ] got nodejoin message 134.36.139.222
  Aug 12 16:21:24 dz-oracle2 openais[6496]: [CPG  ] got joinlist message from node 2
  Aug 12 16:21:29 dz-oracle2 gnome-power-manager: (root) GNOME interactive logout, reason: the power button was pressed

Author: Ling_wwl    Posted: 2010-08-12 18:10
Aug 12 16:21:29 dz-oracle2 gnome-power-manager: (root) GNOME interactive logout, reason: the power button was pressed

So the question is: what caused the power button to be "pressed", and how do I fix it? I certainly did not press it, and I have observed that both machines power off at the same time.
Author: a_carlus    Posted: 2010-08-13 09:44
This looks like an rgmanager bug; I don't know of a good workaround.
Author: Ling_wwl    Posted: 2010-08-16 20:44
Does anyone have a solution? Even if you don't know the exact cause, please help me analyze where the problem might be. The system goes live soon and I'm worried this will affect acceptance testing!
Author: fanjiefa    Posted: 2010-08-17 10:22
Have you configured the IP in the HP iLO settings?
Author: fanjiefa    Posted: 2010-08-17 10:35
It should be a configuration error. I have configured an IBM x3650 pair before, also with dual-NIC bonding, and had no problems.
Author: Ling_wwl    Posted: 2010-08-18 21:11
Reply to fanjiefa (post #5)


    It is already configured; both machines show up as cluster members, and fence_ilo can reach the other side from each node.
    I suspect it is related to the wiring: the primary node's iLO is connected to the standby's eth3, and vice versa. For some reason, when I connected the primary's iLO directly to the standby's iLO, the two machines could not see each other (fence_ilo could not reach the peer), so I fell back to this wiring.
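For what it is worth, with this wiring each node's eth3 needs a static address in the same subnet as the peer's iLO for fence_ilo to reach it; a rough sketch with made-up addresses:

    # /etc/sysconfig/network-scripts/ifcfg-eth3 on dz-oracle1
    # (the 192.168.100.x addresses are hypothetical; the peer's iLO is
    #  assumed to sit at 192.168.100.2 on the same /24)
    DEVICE=eth3
    IPADDR=192.168.100.1
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
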
Author: yjs_sh    Posted: 2010-08-20 14:15
This is expected behavior. With your network layout, you need to configure qdisk to get failover when a cable is pulled. If you don't configure qdisk, then don't connect the iLO directly to a NIC; connect it to a switch on the same subnet as bond0. The iLO is a device independent of the operating system that basically just passively accepts commands, so wiring the iLOs directly to each other achieves nothing.
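For reference, a minimal qdisk setup on RHEL 5 looks roughly like the sketch below; the shared LUN, the label, and the ping heuristic target (an assumed gateway address) are placeholders that have to match the real environment:

    # Label a small shared LUN as the quorum disk (device path is a placeholder)
    mkqdisk -c /dev/<shared-lun> -l rac_qdisk

    # Then add a <quorumd> section to /etc/cluster/cluster.conf, roughly:
    #   <quorumd interval="1" tko="10" votes="1" label="rac_qdisk">
    #     <heuristic program="ping -c1 -w1 134.36.139.1" score="1" interval="2" tko="3"/>
    #   </quorumd>
    # (134.36.139.1 is an assumed gateway; the heuristic is what lets the
    #  cluster decide which node still has working network connectivity)

    # Start the quorum-disk daemon on both nodes
    service qdiskd start
    chkconfig qdiskd on
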
Author: Ling_wwl    Posted: 2010-08-24 17:13
Reply to yjs_sh (post #8)


   
OK, so this looks like the classic "split brain". But there is one odd thing: when the primary node loses power, the standby can take over; when the Oracle resource on the primary fails, the standby can also take over. It is only when the primary loses its network that the standby cannot take over and both machines power off instead.
If that is the case, "split brain" alone does not seem to explain it!

We are using a layer-3 switch here, and the heartbeat traffic has TTL=1, so the heartbeat cannot get across it!
Author: yjs_sh    Posted: 2010-08-25 20:44
Can the standby really take over when the primary loses power? The iLO is the fence device; if the primary loses power, its iLO no longer works. The other RHCS node must successfully fence (reboot) the failed node before it can take over the resources. With the primary powered off, fencing cannot succeed, so the takeover should not happen. I have tried this many times in real production environments, and the log keeps reporting "fence fail".
Without qdisk, when the network drops, each side believes the other has failed and tries to fence it. If both fence paths are still reachable and working, both nodes end up shut down at the same time. Fencing powers the node off first and then powers it back on.
In your environment, if you put the fence devices and the heartbeat on the same network, this problem will not occur: the side that lost its network cannot get its fence command out, so it gets fenced and rebooted by the healthy side, and the resources switch over.
Bonding does not conflict with RHCS at all; in fact, Red Hat recommends bonding.
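A quick way to verify the fence path by hand, assuming the iLOs have been moved onto the same switch/subnet as bond0 as suggested above (fence_node actually power-cycles the target, so only run this in a test window):

    # Show the membership as cman currently sees it
    cman_tool nodes

    # Manually trigger the configured fence action against the peer
    # (this power-cycles dz-oracle2 -- test window only)
    fence_node dz-oracle2
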
Author: Ling_wwl    Posted: 2010-09-06 23:05
On the G6 with RHEL 5.3, bonding does conflict with RHCS: with bonding configured, the two nodes in the cluster keep rebooting each other.
Author: duolanshizhe    Posted: 2010-09-09 09:34
It seems that on RHEL 5.4, if node A loses its network, the two-node cluster still runs into trouble even with qdiskd configured.



