Posted by Solaris12 on 2005-04-15 11:27

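CPU Study Notes (2)
Author: Badcoffee
Email: blog.oliver@gmail.com
April 2005
Original source:
http://blog.csdn.net/yayong
Copyright: when reposting, please be sure to indicate the original source, the author
information, and this notice by hyperlink.
These are notes the author took while learning the basics of hardware. Having had little
exposure to this area before, and no systematic study of it, I have probably made mistakes;
corrections are welcome.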
I. Cache Coherence
In an article written in 2004,
X86 Assembly Language Study Notes (1),
I touched on the fact that code compiled by gcc aligns the stack to 16 bytes by default.
The main reason for doing so is performance.
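Most modern CPUs have on-die L1 and L2 caches. The L1 cache is in most cases write-through;
the L2 cache is write-back and does not write to memory immediately, so the contents of the
cache and of memory can disagree. In addition, in an MP (multi-processor) environment each
cache is private to its CPU, so the caches of different CPUs can also disagree with one
another. For this reason many MP architectures, whether ccNUMA or SMP, implement a Cache
Coherence mechanism, that is, a mechanism that keeps the caches of different CPUs consistent.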
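One way to implement Cache Coherence is a cache-snooping protocol: each CPU snoops the bus
to monitor the cache reads and writes of the other CPUs.
First, note that the cache line is the smallest unit of data transfer between cache and memory.
1. When CPU1 writes to its cache, the other CPUs check the corresponding cache line in their
own caches. If it is dirty, they write it back to memory and CPU1's copy of the line is
refreshed (in write-invalidate protocols the written-back copy is then invalidated as well);
if it is not dirty, they invalidate the cache line.
2. When CPU1 reads from its cache, the other CPUs write back to memory whatever parts of the
corresponding cache line are marked dirty in their own caches, and CPU1's copy of the line
is refreshed.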
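The two rules above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not a model of any real CPU: it tracks a single cache line, uses a hypothetical three-state scheme (`INVALID`, `CLEAN`, `DIRTY`), and assumes a write-invalidate policy in which a dirty copy is written back and then invalidated when another CPU writes. All names (`snoop`, `cpu_read`, `cpu_write`) are made up for this sketch.

```c
#include <assert.h>

#define NCPU 2

/* Hypothetical per-CPU state of ONE cache line. */
enum line_state { INVALID, CLEAN, DIRTY };

struct cache_line {
    enum line_state state;
    int value;                          /* cached copy of the data */
};

static struct cache_line cache[NCPU];   /* zero-initialized: all INVALID */
static int memory_value;                /* backing memory for that line */

/* The other CPUs react to a bus transaction issued by `requester`. */
static void snoop(int requester, int is_write)
{
    for (int cpu = 0; cpu < NCPU; cpu++) {
        if (cpu == requester)
            continue;
        if (cache[cpu].state == DIRTY) {
            memory_value = cache[cpu].value;   /* write back to memory */
            cache[cpu].state = CLEAN;
        }
        if (is_write && cache[cpu].state != INVALID)
            cache[cpu].state = INVALID;        /* drop the stale copy */
    }
}

/* Read: on a miss, let the others write back, then refresh from memory. */
static int cpu_read(int cpu)
{
    if (cache[cpu].state == INVALID) {
        snoop(cpu, 0);
        cache[cpu].value = memory_value;
        cache[cpu].state = CLEAN;
    }
    return cache[cpu].value;
}

/* Write: the others write back and invalidate; the local copy becomes dirty. */
static void cpu_write(int cpu, int value)
{
    snoop(cpu, 1);
    cache[cpu].value = value;
    cache[cpu].state = DIRTY;
}
```

For example, after `cpu_write(0, 42)` memory still holds the old value; a following `cpu_read(1)` forces CPU 0 to write back, and only then does memory hold 42. Every such forced write-back or invalidation is bus traffic, which is what the rest of this section tries to avoid.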
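So raising the CPU's cache hit rate and reducing data transfer between cache and memory
improves system performance.
It is therefore important to keep the memory allocated for programs and binary objects
cache line aligned. Without cache-line alignment, it becomes much more likely that processes
or threads running in parallel on several CPUs read and write the same cache line at the
same time. The CPUs' caches and memory then go through repeated write-backs and refreshes.
This situation is called cache thrashing (when the threads are actually touching different
data that merely share a line, it is also widely known as false sharing).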
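As a rough illustration of the alignment point, the sketch below compares two counters that sit next to each other with two counters that are each forced onto their own cache line. The 64-byte line size is an assumption (common on x86, but it should be checked for the target CPU), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption: typical x86 line size */

/* Unpadded: both counters are closer together than one cache line,
 * so two CPUs updating a and b will usually fight over the same line. */
struct counters_shared {
    long a;
    long b;
};

/* Padded: alignas forces each counter to start on its own line,
 * which also rounds the struct size up to CACHE_LINE_SIZE. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) long value;
};

struct counters_separate {
    struct padded_counter a;
    struct padded_counter b;
};

/* Returns 1 if both addresses fall into the same cache line. */
static int same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / CACHE_LINE_SIZE == (uintptr_t)q / CACHE_LINE_SIZE;
}
```

With this layout, a thread that only updates `a.value` and another that only updates `b.value` never write to the same line, at the cost of 64 bytes per counter instead of `sizeof(long)`.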
There are usually two ways to avoid cache thrashing effectively:
1. For heap allocation, many systems enforce alignment inside malloc.
2. For stack allocation, many compilers provide a stack alignment option.
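For the heap case, when the alignment that malloc happens to provide is not enough, a program can request it explicitly. The following is a minimal sketch that assumes a C11 library providing `aligned_alloc` and, again, a 64-byte cache line; `alloc_cache_aligned` and `is_aligned` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64   /* assumption: typical x86 line size */

/* Allocate `size` bytes starting on a cache-line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the request is rounded up. Returns NULL on failure. */
static void *alloc_cache_aligned(size_t size)
{
    if (size == 0)
        size = 1;                               /* avoid a zero-size request */
    if (size > SIZE_MAX - (CACHE_LINE_SIZE - 1))
        return NULL;                            /* rounding would overflow */
    size_t rounded = (size + CACHE_LINE_SIZE - 1)
                     / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
    return aligned_alloc(CACHE_LINE_SIZE, rounded);
}

/* Returns 1 if p is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A caller would use `alloc_cache_aligned(n)` in place of `malloc(n)` and release the block with `free` as usual. Rounding the size up to a whole number of lines also means no other allocation shares the last line of the block.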
Of course, if stack alignment is requested from the compiler, the program gets larger and
uses more memory, so the trade-off needs careful thought. Here is a discussion I found
through Google:
One of our customers complained about the additional code generated to
maintain the stack aligned to 16-byte boundaries, and suggested us to
default to the minimum alignment when optimizing for code size. This
has the caveat that, when you link code optimized for size with code
optimized for speed, if a function optimized for size calls a
performance-critical function with the stack misaligned, the
performance-critical function may perform poorly.
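II. gcc alignment options
-mpreferred-stack-boundary was already mentioned in
X86 Assembly Language Study Notes (1).
In addition, I found through Google an email that discusses stack alignment, which I would
like to share: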
----- Original Message -----
From: "Andreas Jaeger"
To: gcc@gcc.gnu.org
Cc: "Jens Wallner" wallner@ims.uni-hannover.de
Sent: Saturday, February 03, 2001 2:37 AM
Subject: Question about -mpreferred-stack-boundary
>
> We (glibc team) got a bug report that the stack is not aligned
> properly - and I'm a bit confused by the documentation of
> -mpreferred-stack-boundary which is:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> @item -mpreferred-stack-boundary=@var{num}
> Attempt to keep the stack boundary aligned to a 2 raised to @var{num}
> byte boundary. If @samp{-mpreferred-stack-boundary} is not specified,
> the default is 4 (16 bytes or 128 bits).
>
> The stack is required to be aligned on a 4 byte boundary. On Pentium
> and PentiumPro, @code{double} and @code{long double} values should be
> aligned to an 8 byte boundary (see @samp{-malign-double}) or suffer
> significant run time performance penalties. On Pentium III, the
> Streaming SIMD Extension (SSE) data type @code{__m128} suffers similar
> penalties if it is not 16 byte aligned.
>
> To ensure proper alignment of this values on the stack, the stack boundary
> must be as aligned as that required by any value stored on the stack.
> Further, every function must be generated such that it keeps the stack
> aligned. Thus calling a function compiled with a higher preferred
> stack boundary from a function compiled with a lower preferred stack
> boundary will most likely misalign the stack. It is recommended that
> libraries that use callbacks always use the default setting.
>
> This extra alignment does consume extra stack space. Code that is sensitive
> to stack space usage, such as embedded systems and operating system kernels,
> may want to reduce the preferred alignment to
> @samp{-mpreferred-stack-boundary=2}.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Who has to align the stack for calls to a function - the caller or the
> callee? In other words: Does this mean that the stack has to be
> aligned before calling a function? Or does it have to be aligned when
> entering a function?
>
> Andreas
> --
> Andreas Jaeger
> SuSE Labs aj@suse.de
> private aj@arthur.inka.de
> http://www.suse.de/~aj
I believe the preferred alignment for long double is a 16 byte boundary, and
the stack (and instruction) alignments must be so set before entering a function.
Pentium 4 increases preferred data alignments to 32 bytes in some situations,
as well as increasing the number of situations (SSE2 instructions) where 16 byte
alignment is needed.
From this we can see that stack alignment has to be guaranteed before the function is called:
the stack (and instruction) alignments must be so set before entering a function
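To make the numbers concrete: the documentation quoted above says `-mpreferred-stack-boundary=num` means a boundary of 2 raised to `num` bytes, so the default of 4 gives 16 bytes and 2 gives 4 bytes. The sketch below shows that arithmetic and a simple runtime check on a local variable. It assumes a C11 compiler; the function names are hypothetical, and the check only shows that the compiler honors an explicit `alignas` request for one object, not how a particular gcc version arranges the frame.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Boundary in bytes for a given -mpreferred-stack-boundary=num value. */
static unsigned boundary_bytes(unsigned num)
{
    return 1u << num;
}

/* Returns 1 if a local that asks for 16-byte alignment really sits
 * on a 16-byte boundary inside this function's stack frame. */
static int local_is_16_byte_aligned(void)
{
    alignas(16) unsigned char buf[16] = {0};
    return ((uintptr_t)buf % 16) == 0;
}
```

Here `boundary_bytes(4)` is 16 and `boundary_bytes(2)` is 4, matching the two values named in the quoted documentation.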
Related documents:
X86 Assembly Language Study Notes (1)
CPU Study Notes (1)
Cache Coherence with Multi-Processor

This article comes from the ChinaUnix blog; to read the original, see: http://blog.chinaunix.net/u/768/showart_21467.html
