CSC/ECE 506 Spring 2011/ch6a jp: Difference between revisions
Revision as of 23:14, 26 February 2011
Cache Hierarchy
In a simple computer model, the processor reads data and instructions from memory and operates on the data. CPU operating frequencies have increased faster than the speed of memory and memory interconnects. For example, cores in first-generation Intel i7 processors run at 3.2 GHz, while the memory runs at only 1.3 GHz. Multi-core architectures also put more demand on memory bandwidth. Together these increase the effective latency of memory accesses, so the CPU can spend much of its time idle, waiting for data. As a result, memory became a performance bottleneck.
To solve this problem, the cache was introduced. A cache is temporary, volatile storage like primary memory, but it runs at a speed close to the core frequency. The CPU can access data and instructions from the cache in a few clock cycles, while accessing main memory can take more than 50 cycles. In the early days of computing, the cache was implemented as a standalone chip outside the processor. In today's processors, the cache is implemented on the same die as the core.
There can be multiple levels of cache, each successive level farther from the core and larger in size. L1 is closest to the CPU and, as a result, the fastest to access. Next to L1 is the L2 cache, and then L3. The L1 cache is divided into an instruction cache and a data cache. This is preferable to a single larger combined cache because the instruction cache, being read-only, is simpler to implement, while the data cache must support both reads and writes.
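The gap between a cache hit and a main-memory access can be made concrete with a small calculation. The following is a minimal sketch, not taken from the article: the function name `average_access_cycles`, the 3-cycle hit time, and the hit rates are illustrative assumptions; only the 50-cycle memory figure echoes the text above.

```python
def average_access_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per access.

    Every access pays the cache hit time; the fraction of accesses
    that miss additionally pays the main-memory penalty.
    """
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Assumed numbers: 3-cycle cache hit, 50-cycle main-memory penalty.
for hit_rate in (0.0, 0.90, 0.99):
    cycles = average_access_cycles(hit_rate, hit_cycles=3, miss_penalty_cycles=50)
    print(f"hit rate {hit_rate:.2f}: about {cycles:.1f} cycles per access")
```

Under these assumed numbers, even a modest hit rate brings the average access cost much closer to the cache latency than to the memory latency.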
Cache Write Policies
Write hit policies
Write-through
Also known as store-through, this policy writes to main memory every time a write is performed to the cache.
Write-back
Also known as store-in or copy-back, this policy writes to main memory only when a modified block of data is purged (evicted) from the cache.
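The difference between the two write hit policies shows up in how many writes reach main memory. The sketch below is a toy model under stated assumptions: a hypothetical `TinyCache` that holds a single block, allocates on every write miss, and is flushed at the end of the trace. None of these names come from the article.

```python
class TinyCache:
    """One-block cache model that counts writes reaching main memory."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.block = None        # address of the block currently cached
        self.dirty = False       # only meaningful for write-back
        self.memory_writes = 0

    def write(self, block_addr):
        if self.block != block_addr:
            self._evict()
            self.block = block_addr    # assumption: allocate on a write miss
        if self.write_back:
            self.dirty = True          # defer the memory update
        else:
            self.memory_writes += 1    # write-through: update memory on every write

    def _evict(self):
        if self.write_back and self.dirty:
            self.memory_writes += 1    # write-back: update memory when the block is purged
        self.dirty = False

    def flush(self):
        self._evict()
        self.block = None


def count_memory_writes(trace, write_back):
    cache = TinyCache(write_back)
    for addr in trace:
        cache.write(addr)
    cache.flush()                      # push out any remaining dirty block
    return cache.memory_writes


trace = [0, 0, 0, 1, 1, 0]             # block addresses written, in order
print(count_memory_writes(trace, write_back=False))  # 6: one per write
print(count_memory_writes(trace, write_back=True))   # 3: one per dirty eviction
```

Repeated writes to the same block are where write-back saves memory traffic; a trace that never rewrites a block would show no difference in this model.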
Write miss policies
Write-allocate vs no-write-allocate
When a write misses in the cache, a cache line may or may not be allocated for the block. With write-allocate, a line is allocated in the cache for the written data; this policy is typically associated with write-back caches. With no-write-allocate, no line is allocated, and the write goes only to the next lower level of the memory hierarchy.
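The effect of allocating on a write miss can be seen by counting hits on a short trace. This is a simplified sketch: the function `count_hits` is hypothetical, and the cache is modeled as an unbounded set of block addresses, so no evictions occur.

```python
def count_hits(trace, write_allocate):
    """Return the number of hits for a trace of ('r' | 'w', block) accesses."""
    cached = set()
    hits = 0
    for op, block in trace:
        if block in cached:
            hits += 1
        elif op == 'r' or write_allocate:
            # Read misses always allocate; write misses allocate
            # only under the write-allocate policy.
            cached.add(block)
    return hits


trace = [('w', 7), ('r', 7), ('w', 7)]
print(count_hits(trace, write_allocate=True))   # 2: the read and second write hit
print(count_hits(trace, write_allocate=False))  # 1: only the final write hits
```

With no-write-allocate, the first write leaves no line behind, so the following read misses and is what brings the block into the cache.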
Fetch-on-write vs no-fetch-on-write
With fetch-on-write, a write miss causes the block of data to be fetched from the next lower level of the memory hierarchy. The policy fetches a block on every write miss; with no-fetch-on-write, the block is not fetched.
Write-before-hit vs no-write-before-hit
With write-before-hit, data is written to the cache before the cache tags are checked for a match. On a miss, the written data displaces the block already in the cache.
Combination Policies
Write-validate
Write-validate combines no-fetch-on-write and write-allocate. The policy allows partial lines to be written to the cache on a miss. It provides better performance, works with machines that have various line sizes, and adds no instruction execution overhead to the program being run.
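One way to picture a partially written line is a cache line that tracks validity per word. The sketch below assumes per-word valid bits and a hypothetical `ValidateLine` class with four words per line; the article does not specify the mechanism, and writing back displaced dirty data is left out for brevity.

```python
class ValidateLine:
    """One cache line with a valid bit per word (write-validate sketch)."""

    def __init__(self, words_per_line=4):
        self.tag = None
        self.data = [None] * words_per_line
        self.valid = [False] * words_per_line

    def write(self, tag, word, value):
        if self.tag != tag:
            # Write miss: allocate the line without fetching it,
            # so every word starts out invalid.
            self.tag = tag
            self.valid = [False] * len(self.valid)
        self.data[word] = value
        self.valid[word] = True        # only the written word becomes valid

    def read(self, tag, word):
        """Return the value on a hit, or None if the word must be fetched."""
        if self.tag == tag and self.valid[word]:
            return self.data[word]
        return None


line = ValidateLine()
line.write(tag=5, word=2, value=99)
print(line.read(5, 2))   # 99: the written word is valid
print(line.read(5, 0))   # None: the rest of the line was never fetched
```

A read of a word that was never written still misses, which is how the line can be allocated without paying for a fetch on the write miss.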
Write-invalidate
Write-invalidate combines write-before-hit, no-fetch-on-write, and no-write-allocate. On a write miss, the policy invalidates the cache line.
Write-around
Write-around combines no-fetch-on-write, no-write-allocate, and no-write-before-hit. This policy uses a non-blocking write scheme: on a write miss, the data is written to the next lower level of the memory hierarchy without modifying the data of the cache line.
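The three combination policies can be summarized as settings of the write miss options described earlier. The lookup table below is a sketch that restates only what this section says; the names `WriteMissPolicy`, `COMBINATIONS`, and `describe` are hypothetical, and `None` marks an option the text leaves unspecified.

```python
from typing import NamedTuple, Optional


class WriteMissPolicy(NamedTuple):
    fetch_on_write: bool
    write_allocate: bool
    write_before_hit: Optional[bool]   # None: not specified in the text


COMBINATIONS = {
    "write-validate":   WriteMissPolicy(False, True, None),
    "write-invalidate": WriteMissPolicy(False, False, True),
    "write-around":     WriteMissPolicy(False, False, False),
}


def describe(name):
    """Spell out a combination policy as its component options."""
    policy = COMBINATIONS[name]
    parts = [
        "fetch-on-write" if policy.fetch_on_write else "no-fetch-on-write",
        "write-allocate" if policy.write_allocate else "no-write-allocate",
    ]
    if policy.write_before_hit is not None:
        parts.append("write-before-hit" if policy.write_before_hit
                     else "no-write-before-hit")
    return ", ".join(parts)


for name in COMBINATIONS:
    print(f"{name}: {describe(name)}")
```

All three share no-fetch-on-write; they differ in whether a line is allocated and whether the write happens before the tag check.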
Prefetching
Advantages
Disadvantages
Effectiveness
Stream Buffer Prefetching
Prefetching in Parallel Computing
References
1. http://www.real-knowledge.com/memory.htm
2. Computer Design & Technology, Lecture slides by Prof. Eric Rotenberg
3. Fundamentals of Parallel Computer Architecture by Prof. Yan Solihin
4. “Cache write policies and performance,” Norman Jouppi, Proc. 20th International Symposium on Computer Architecture (ACM Computer Architecture News 21:2), May 1993, pp. 191–201.
5. Architecture of Parallel Computers, Lecture slides by Prof. Edward Gehringer