CSC/ECE 506 Spring 2011/ch6a jp

From Expertiza_Wiki

Cache Hierarchy

In a simple computer model, the processor reads data and instructions from memory and operates on the data. CPU operating frequencies have increased faster than the speed of memory and memory interconnects. For example, cores in Intel's first-generation i7 processors run at 3.2 GHz, while the memory runs at only 1.3 GHz. Multi-core architectures have also placed more demand on memory bandwidth. Together, these effects increase memory access latency relative to the speed of the core, so the CPU can sit idle for much of the time while it waits for data. As a result, memory became a performance bottleneck.

To solve this problem, the “cache” was invented. A cache is temporary, volatile storage like primary memory, but it runs at a speed close to the core frequency. The CPU can access data and instructions from the cache in a few clock cycles, while accessing data from main memory can take more than 50 cycles. In the early days of computing, the cache was implemented as a standalone chip outside the processor. In today’s processors, the cache is implemented on the same die as the core.
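The benefit of a cache can be estimated with the usual average-access-time relation: hit time plus miss rate times miss penalty. The sketch below is illustrative only. The 3-cycle hit time stands in for "a few clock cycles", the 50-cycle penalty is the lower bound quoted above, and the hit rates and the function name `average_access_cycles` are assumptions made for this example.

```python
def average_access_cycles(hit_cycles, miss_penalty_cycles, hit_rate):
    """Average cycles per access for one cache in front of main memory.

    Every access pays the cache hit time; the fraction that misses
    additionally pays the main-memory penalty.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed, illustrative latencies (not measurements of any real processor).
CACHE_HIT_CYCLES = 3    # "a few clock cycles"
MEMORY_CYCLES = 50      # "more than 50 cycles" (lower bound)

for hit_rate in (0.0, 0.90, 0.99):
    cycles = average_access_cycles(CACHE_HIT_CYCLES, MEMORY_CYCLES, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {cycles:.1f} cycles per access")
```

Under these assumed numbers, a 90% hit rate brings the average access down to about 8 cycles, compared with 53 cycles when every access misses.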

There can be multiple levels of cache, each successive level farther from the core and larger in size. L1 is closest to the CPU and, as a result, the fastest to access. Next to L1 is the L2 cache, and then L3. The L1 cache is divided into an instruction cache and a data cache. This is preferable to a single, larger combined cache because the instruction cache is read-only and therefore easy to implement, while the data cache must support both reads and writes.
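The lookup order through the levels can be sketched as a small simulation. This is a simplified model, not a description of any real processor: it follows a single path (for example, the data side of L1), tracks whole blocks with LRU replacement, and copies a block into every level that missed. The class `CacheLevel`, the function `access`, and all capacities and latencies are hypothetical values chosen for illustration.

```python
from collections import OrderedDict


class CacheLevel:
    """One cache level holding a fixed number of blocks, evicted in LRU order."""

    def __init__(self, name, capacity_blocks, latency_cycles):
        self.name = name
        self.capacity = capacity_blocks
        self.latency = latency_cycles
        self.blocks = OrderedDict()  # block address -> None, oldest first

    def lookup(self, block):
        """Return True on a hit and mark the block as most recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def fill(self, block):
        """Insert a block, evicting the least recently used one if full."""
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


def access(levels, block, memory_latency):
    """Search L1 outward; return (name of level that supplied the block, cycles)."""
    cycles = 0
    missed = []
    for level in levels:
        cycles += level.latency
        if level.lookup(block):
            source = level.name
            break
        missed.append(level)
    else:
        # No level held the block, so it comes from main memory.
        cycles += memory_latency
        source = "memory"
    for level in missed:
        level.fill(block)  # bring the block closer to the core
    return source, cycles


# Assumed sizes (in blocks) and latencies (in cycles), smallest and fastest first.
hierarchy = [CacheLevel("L1", 2, 2), CacheLevel("L2", 4, 10), CacheLevel("L3", 8, 30)]
for block in (0, 1, 2, 0, 0):
    print(block, access(hierarchy, block, memory_latency=100))
```

In this run the first three accesses come from memory, the fourth access to block 0 is served by L2 because L1 has already evicted it, and the fifth is served by L1. This shows how the larger outer levels catch blocks that no longer fit in the small, fast level next to the core.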

Cache Write Policies

Write hit policies

Write miss policies

Generic Policies

Prefetching

Advantages

Disadvantages

Effectiveness

Stream Buffer Prefetching

Prefetching in Parallel Computing

References
