CSC/ECE 506 Fall 2007/wiki2 7 amassg
Cache to Cache sharing
Introduction :
Caches are small , fast memories which are placed in-between the CPU and the main memory in order to speed up the access of frequently used instructions and data . Caches are designed to exploit spatial and temporal locality.However, since fast memory is expensive, the memory hierarchy is organised into several levels , each smaller, faster and more expensive per byte than the next lower level . The highest level in the memory hierarchy is the L1 cache which is usually accommodated on-chip ( to reduce the latency involved in communicating over the shared bus ) . The L1 cache is usually constructed out of an SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 - 64 KB . L2 caches use less advanced memory technology and they typically have a capacity of around 512 KB .
The need for Cache-to-Cache Sharing :
The concept of cache sharing surfaces in CMP ( Chip Multi-Processor ) systems which typically use a shared - bus architecture where multiple processors ( each with its own local cache ) are connected to the controller corresponding to the main memory through a shared bus . The bus communication protocols in such systems are designed to ensure memory consistency and cache coherence . An interesting lower - level design decision that a computer architect is faced with while implementing such a design is whether to implement cache-to-cache sharing or not . An example for a bus based communication protocol which supports cache to cache transfer of blocks is the original version of the MESI (Illinois) protocol. The reason for considering cache - to - cache sharing mechanisms is as follows:
Consider a situation where a particular processor has initiated a 'Bus Read' transaction in order to read a block from memory. If this memory block has already been cached earlier by another processor, then it would be faster to read the block from the other processor's cache rather than allowing the memory access to propagate all the way down to the main memory.The main advantage is that caches can supply the requested block faster than the main memory can. The cache-to-cache sharing technique is particularly useful for large multiprcessor architectures where the memory is widely distributed and the latency involved in accessing main memory is significant.
Additionally, cache to cache transfer have been used for bandwidth reduction. This involves keeping a track of address of data transferred. The future transfers are predicted using the record of previous transfers and only the required caches are involved.
Current use of cache to cache sharing:
Web Cache Sharing was first proposed in the context of the Harvest Group, which designed the Internet Cache Protocol ( ICP ), which supported discovery and retrieval of documents from nearby caches.Today, various countries and institutions have established hierarchies of proxy caches that co-operate via ICP to reduce traffic over the Internet.
Cache-to-cache sharing is particularly useful in Web-based applications to reduce latency and serve incoming requests as efficiently as possible. The sharing of information among web proxies is an important technique to reduce traffic on the web and alleviate network bottlenecks. A new protocol called 'Summary Cache' has been proposed where each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these summaries for potential hits before sending any queries. Trace - driven simulations have shown that compared to the existing Internet Cache Protocol ( ICP ), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30% to 95% of the CPU overhead, while maintaining an acceptable hit ratio.
When a cache miss occurs , the cache probes the summaries of all the proxies to check for a potential cache hit and also sends out messages to all caches whose summaries show promising results. Summaries need not be accurate at all times.If a request is not a hit , as falsely indicated by the summary ( a false hit ) , the only penalty incurred is a wasted query , which is acceptable , considering the significant improvement in hit ratio and access speed obtained by using this scheme. Two of the most important design decisions involved are : - frequency of summary updates - representation of the summary to reduce the memory requirements
The cache-cache communication protocols can be further enhanced by having some sort of message passing mechanism between the caches. Rather than awaiting a request for a particular object and then querying each neighbor cache to determine whether a copy of the requested object is stored thereon, and then downloading the requested object if it is found, information about the contents of the neighbor caches is exchanged between these caches so that when a request for an object is received, the object can be retrieved from the cache in which it is stored.
Bloom Filters have been suggested as a method to share Web Cache information. A Bloom filter represents a simple , space-efficient data structure which can be used to represent data sets on the individual caches. Proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.
Patent Link : [1]
Disadvantages :
1> Cache-cache sharing adds a lot of complexity to the bus-based protocol since the main memory must make sure that no cache is capable of supplying the data before driving the bus line.
2> The main memory can transfer an entire block in a single transfer operation but caches can transfer data only in smaller chunks and hence it takes longer to accomplish the data transfer. Hence, cache-to-cache sharing is feasible only if the amount of data to be transferred among caches is small. ( Otherwise, the overheads are too large.)
3> There is a possibility that the requested block may be present in multiple caches, in which case a selection algorithm is required in order to determine who will provide the data.
4> Existing web servers cannot be modified easily to support these protocols and optimizations.
Conclusion
Although cache to cache communication and sharing seems to be a fairly good proposition, it has quite of a few drawbacks. Additional hardware, software and communication overheads make the system slower and expensive for implementation. It does not seem to be feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth for the communication is limited. Also, conversion of environments with older architectures to the newer ones supporting cache to cache sharing is difficult and often the whole set up needs to be reimplemented.
Further references
- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol Li Fan, Pei Cao, and Jussara Almeida Andrei Z. Broder Department of Computer Science Systems Research Center University of Wisconsin-Madison Digital Equipment Corporation [2]
- Information on web cache communication
- Bandwidth reduction using cache to cache transfer prediction