
ReliantHA frequently restarts for no apparent reason!

Posted by answer (honorary moderator) on 2007-10-09 15:32

Original post:
http://72891.cn/viewthread.php?tid=546470&highlight=ReliantHA%BE%AD%B3%A3%CE%DE%B9%CA%D6%D8%D0%C2%C6%F4%B6%AF%B5%C4%CE%CA%CC%E2

I get the error "GAB: Port h halting system" when using ReliantHA on UnixWare 7.

Problem
I have installed ReliantHA, and a few seconds after I run "hvstart" one or more servers shut down, displaying the message:
"GAB: Port h halting system".
and/or:
"System has halted and may be powered off (Press any key to reboot)."
This is a generic ReliantHA error message indicating that a ReliantHA node has been shut down for some reason, often because of a communications failure of some kind.
Solution
Reboot the servers in the cluster first, then use the following steps to help diagnose the problem.
1. Disconnect the public network, then ping SYSA and ping SYSB. NOTE: These are the private network names that ReliantHA uses, and they are case sensitive.
2. Make sure that when ReliantHA was configured with "mkcluster", the external uname (or public name) was used for the node names and NOT SYSA or SYSB.
3. Check the ReliantHA Release Notes for the timeout values in the S99gab script.
            These release notes are located at:
            http://www.sco.com/products/clustering/notes/harelnot.html
4. Check the logs in /usr/opt/reliant/log for any errors.
            This is a directory; the most useful file in it is switchlog.
            NOTE: It is normal to see errors such as:
            dynamic linker: commds: warning: copy relocation size mismatch
            for symbol svc_fdset
5. If using Compaq Netflex3 series Network Interface Cards (NICs), consider using the OU8 eeE8 (DDI driver rather than Compaq's own "n100c" driver, because these cards are rebadged Intel Pro100B cards.
            The latest "nd" package is available from:
            ftp://ftp.sco.com/pub/unixware7/drivers/storage
            ftp://ftp.sco.com/pub/openunix8/drivers/storage
            ftp://ftp.sco.com/pub/unixware7/713/
            If the Compaq Insight Manager agents are installed for NIC monitoring, they need to be removed.
            Basically, ensure that the NIC supports a programmable MAC address and that cross-over cables are used to directly connect the nodes on the private LAN.
6. Check that the latest operating system patches are installed. They are available from:
            ftp://ftp.sco.com/pub/
7. Check the output of "mswconfig -l" and "llstat -a", and the contents of "/etc/mswtab", for any errors.
8. If no specific config files are defined, hvstart uses a simple default set of scripts for basic testing between the nodes.
9. Once "hvstart" has run, "ipcs -a" should show an allocated message queue. You can also see the status of ReliantHA with "hvdisp -a".
10. Use the "truss" command to trace the "hvstart" command and get an indication of when the failure occurs:
            truss -f -o /hvstart.truss hvstart
11. If the system is swapping excessively, this can cause enough latency at the heartbeat communication layer for a heartbeat to be missed, and so for a node to be killed with a gab halt. Use the standard system tools "sar" and "rtpm" to monitor swapping behaviour.
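The checks in steps 4, 7 and 9 above can be gathered into a single report for later comparison between nodes. The following is only a rough sketch, assuming a POSIX shell; run_diag and scan_log are hypothetical helper names, not part of ReliantHA, and the guess that interesting log lines contain "error" or "warning" is an assumption about the log format.

```sh
#!/bin/sh
# Sketch only: run_diag and scan_log are hypothetical helpers, not ReliantHA tools.

# Run one diagnostic command and append its output to the file named in
# $REPORT. If the command is not installed, record that instead.
run_diag() {
    printf '=== %s ===\n' "$*" >> "$REPORT"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$REPORT" 2>&1 || printf '(exit status %s)\n' "$?" >> "$REPORT"
    else
        printf '(command not found: %s)\n' "$1" >> "$REPORT"
    fi
}

# Print log lines that mention "error" or "warning" (assumed log format),
# skipping the dynamic linker warning that the article describes as normal.
scan_log() {
    grep -i -E 'error|warning' "$1" |
        grep -v -e 'copy relocation size mismatch' -e 'for symbol svc_fdset'
}

# Possible use on a cluster node, with the commands named in the steps above:
#   REPORT=/tmp/rha-diag.txt
#   run_diag mswconfig -l
#   run_diag llstat -a
#   run_diag cat /etc/mswtab
#   run_diag ipcs -a
#   run_diag hvdisp -a
#   scan_log /usr/opt/reliant/log/switchlog
```

Defining the helpers without calling them keeps the sketch safe to source on any machine; nothing runs until the commented calls are issued by hand.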
In addition:
Check /etc/conf/cf.d/stune for tuning that may conflict with the shared message queues that ReliantHA needs to operate, such as:
               MSGSEG
               STRTHRESH
Both of these values should be set to the default operating system values, even if database vendors such as Oracle say that these values need to be set.
NOTE: MSGSSZ, MSGMNB and MSGTQL should be raised from their default values to at least 524288, 65536 and 1000 respectively (add any further application-related tuning on top of these values).
NOTE: The minimum requirement for ReliantHA is 2 private LAN connections.
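The stune guidance above (MSGSEG and STRTHRESH left at their defaults; MSGSSZ, MSGMNB and MSGTQL at or above the stated minimums) could be checked mechanically. This is a minimal sketch, assuming a POSIX shell and awk, and assuming that each stune line has the form "NAME value" separated by whitespace; check_stune is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: check_stune is a hypothetical helper. It assumes every line of
# the stune file looks like "NAME value" separated by whitespace.
check_stune() {
    awk '
        BEGIN {
            # Minimums quoted in the NOTE above.
            min["MSGSSZ"] = 524288
            min["MSGMNB"] = 65536
            min["MSGTQL"] = 1000
        }
        # The article says these two should stay at the operating system
        # defaults, so any explicit entry is flagged for review.
        $1 == "MSGSEG" || $1 == "STRTHRESH" {
            printf "REVIEW %s=%s (article recommends the default)\n", $1, $2
        }
        ($1 in min) {
            seen[$1] = 1
            if ($2 + 0 < min[$1])
                printf "LOW %s=%s (minimum %d)\n", $1, $2, min[$1]
        }
        END {
            # A tunable with no entry is still at its default value.
            for (name in min)
                if (!(name in seen))
                    printf "UNSET %s (minimum %d)\n", name, min[name]
        }
    ' "$1"
}

# Possible use:
#   check_stune /etc/conf/cf.d/stune
```

The function only reads the file and prints findings; changing the tunables is left to the normal system tuning procedure.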
NOTE: Instead of a "real" NIC you could also use a (null modem) serial cable as the second interface.
               For Unisys: CBL6099-10M Null Modem Cable
               For Compaq/HP: BC29Q-02M Null Modem Cable
NOTE: In general, if shared memory or disk buffering is in use when a node fails, the buffered data is lost when the second node takes over. This is important for databases that use this technology. Ensure that RAID controllers are configured as WRITE-THRU and not cached.
NOTE: When you run "hvstart" manually, you will need to press RETURN to get back to the prompt.
NOTE: ReliantHA 1.1.3a added a new "gabconfig" option called -P.
               The -P option was added as a standalone "debug" option for use
               after the gab driver is already configured. It generates
               a PANIC should "gab" halt. By default it is turned off. To
               turn it on, set the value to -P 1.
               It is not recommended to use this feature within /etc/rc2.d.
               Create an S92gab file in /etc/init.d to execute this
               command at the end of the reboot, after entering multiuser
               mode, in the following format:
               /sbin/gabconfig -S 4000 -c
               /sbin/gabconfig -P 1
               For more debug output, also add -D 63 to the first line:
               /sbin/gabconfig -S 4000 -c -D 63
               /sbin/gabconfig -P 1
NOTE: When replacing a private NIC, first remove the mswtab and clustertab, then recreate them after the new card is installed.
NOTE: For RHA 1.1.4, please also run "rdu" for the Reliant Diags Utility.

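This kind of problem is simple; there are only two possibilities.
If the standby node goes down, the heartbeat link is the problem. You may be using an unreliable network cable, or one of the heartbeat links may be a serial cable. When the serial port blocks, the system goes down. Replacing the serial link with a NIC generally solves it.
If the primary node goes down, it is usually because the CPU load is too high and the system responds too slowly. ReliantHA's design is rather dogmatic and idealistic: it assumes that if CPU idle time falls below 10%, something must be wrong with the system, so it forces a switchover.
To fix it, add CPUs or run one fewer database engine.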

This article is from the ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u/22/showart_397281.html
