Expertiza_Wiki - User contributions [en]

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-28T20:37:33Z

Ssghatta: /* Further references */

=Cache to Cache sharing=

==Introduction :==

Caches are small, fast memories placed between the CPU and main memory to speed up access to frequently used instructions and data. They are designed to exploit spatial and temporal locality. However, since fast memory is expensive, the memory hierarchy is organised into several levels, each smaller, faster and more expensive per byte than the next lower level. The highest cache level in the hierarchy is the L1 cache, which is usually placed on-chip (to avoid the latency of communicating over the shared bus). The L1 cache is usually built from SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 to 64 KB. L2 caches are larger and slower, and typically have a capacity of around 512 KB.

'''The need for Cache-to-Cache Sharing :'''

The concept of cache sharing arises in CMP (Chip Multi-Processor) systems, which typically use a shared-bus architecture in which multiple processors (each with its own local cache) are connected to the main memory controller through a shared bus. The bus protocols in such systems are designed to ensure memory consistency and cache coherence. One lower-level design decision that a computer architect faces when implementing such a design is whether or not to support cache-to-cache sharing. An example of a bus-based protocol that supports cache-to-cache transfer of blocks is the original version of the [http://en.wikipedia.org/wiki/MESI_protocol MESI] (Illinois) protocol. The reason for considering cache-to-cache sharing is as follows:

Consider a processor that has initiated a 'Bus Read' transaction in order to read a block from memory. If this block has already been cached by another processor, it would be faster to read the block from that processor's cache than to let the access propagate all the way down to main memory. The main advantage is that a cache can supply the requested block faster than main memory can. Cache-to-cache sharing is particularly useful in large multiprocessor architectures where memory is widely distributed and the latency of accessing main memory is significant.
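The 'Bus Read' scenario above can be sketched as a small simulation. This is only an illustrative model, not an implementation of MESI: the class and function names are hypothetical, the latency figures are assumed values chosen for the example, and coherence states are not modelled at all.

```python
from dataclasses import dataclass, field

MEMORY_LATENCY = 100   # assumed cost (in cycles) of a main-memory access
CACHE_LATENCY = 10     # assumed cost (in cycles) of a cache-to-cache transfer


@dataclass
class Cache:
    name: str
    blocks: dict = field(default_factory=dict)  # block address -> data

    def snoop(self, address):
        """Return the block if this cache holds it, else None."""
        return self.blocks.get(address)


def bus_read(requester, caches, memory, address):
    """Serve a Bus Read, preferring another cache over main memory.

    Returns (data, name of the supplier, latency in assumed cycles).
    """
    for cache in caches:
        if cache is requester:
            continue
        data = cache.snoop(address)
        if data is not None:
            requester.blocks[address] = data
            return data, cache.name, CACHE_LATENCY
    # No other cache holds the block, so main memory supplies it.
    data = memory[address]
    requester.blocks[address] = data
    return data, "memory", MEMORY_LATENCY


memory = {0x40: "block-A", 0x80: "block-B"}
p0, p1 = Cache("P0"), Cache("P1")
caches = [p0, p1]

first = bus_read(p0, caches, memory, 0x40)   # no cache holds the block yet
second = bus_read(p1, caches, memory, 0x40)  # P0 can now supply it
```

In this sketch the first read is served by memory and the second by P0's cache, which is the case where cache-to-cache sharing pays off.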

Additionally, cache-to-cache transfers have been used for bandwidth reduction. This involves keeping track of the addresses of the data transferred. Future transfers are predicted from the record of previous transfers, so that only the caches that are actually required take part in a transaction.
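One simple way to picture such prediction is a table that remembers which cache supplied each address last time. The sketch below is an assumption made for illustration (the `TransferPredictor` class and its last-supplier policy are hypothetical); it does not reproduce the mechanism of any particular patent or processor.

```python
class TransferPredictor:
    """Remember which cache last supplied each block address."""

    def __init__(self):
        self.last_supplier = {}  # block address -> cache name

    def predict(self, address, all_caches):
        """Return the caches that should be involved in the next transfer."""
        supplier = self.last_supplier.get(address)
        if supplier is None:
            # No history for this address: fall back to asking every cache.
            return list(all_caches)
        return [supplier]

    def record(self, address, supplier):
        """Store the outcome of a completed transfer."""
        self.last_supplier[address] = supplier


all_caches = ["P0", "P1", "P2"]
predictor = TransferPredictor()

before = predictor.predict(0x40, all_caches)  # no history yet
predictor.record(0x40, "P2")
after = predictor.predict(0x40, all_caches)   # only P2 is involved
```

A real predictor would also need a recovery path for wrong predictions, which this sketch leaves out.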

==Current use of cache to cache sharing: ==

Web cache sharing was first proposed in the context of the Harvest project, whose group designed the Internet Cache Protocol (ICP) to support discovery and retrieval of documents from nearby caches. Today, various countries and institutions have established hierarchies of proxy caches that co-operate via ICP to reduce traffic over the Internet.

Cache-to-cache sharing is particularly useful in Web-based applications, where it reduces latency and helps serve incoming requests efficiently. Sharing information among web proxies is an important technique for reducing traffic on the web and alleviating network bottlenecks. A protocol called 'Summary Cache' has been proposed in which each proxy keeps a summary of the URLs of the documents cached at every other participating proxy and checks these summaries for potential hits before sending any queries. Trace-driven simulations have shown that, compared to the existing Internet Cache Protocol ([http://en.wikipedia.org/wiki/Internet_Cache_Protocol ICP]), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces bandwidth consumption by over 50%, and eliminates between 30% and 95% of the CPU overhead, while maintaining an acceptable hit ratio.

When a cache miss occurs, the proxy probes the summaries of all the other proxies for a potential hit and sends queries only to those caches whose summaries indicate that they may hold the document. Summaries need not be accurate at all times. If a summary wrongly indicates that a document is cached (a false hit), the only penalty incurred is a wasted query, which is acceptable considering the significant improvement in hit ratio and access speed obtained by using this scheme. Two of the most important design decisions involved are:
- the frequency of summary updates
- the representation of the summary, chosen to reduce memory requirements
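The miss-handling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: summaries are modelled as plain Python sets of URLs that may be stale, and the function and variable names are hypothetical rather than taken from the Summary Cache paper.

```python
def handle_miss(url, peer_summaries, peer_contents):
    """Probe local copies of the peer summaries, then query promising peers.

    peer_summaries: peer name -> set of URLs the peer advertised (may be stale)
    peer_contents:  peer name -> set of URLs the peer really holds right now
    Returns (peers that really had the URL, number of queries sent).
    """
    # Step 1: probe the summaries locally; this costs no network traffic.
    candidates = [peer for peer, summary in peer_summaries.items() if url in summary]
    # Step 2: query only the candidates; a query to a peer that no longer
    # holds the document is a "false hit" and is simply wasted.
    hits = [peer for peer in candidates if url in peer_contents[peer]]
    return hits, len(candidates)


peer_summaries = {"A": {"/a", "/b"}, "B": {"/b"}}
peer_contents = {"A": {"/a"}, "B": {"/b"}}  # A has evicted /b; its summary is stale

result_b = handle_miss("/b", peer_summaries, peer_contents)  # one false hit on A
result_c = handle_miss("/c", peer_summaries, peer_contents)  # no queries sent
```

In this example the request for "/b" costs two queries, one of them wasted on the stale summary of A, while the request for "/c" costs none. Under ICP-style querying, every peer would have been contacted in both cases.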

The cache-to-cache communication protocols can be further enhanced by a message-passing mechanism between the caches. Instead of waiting for a request for a particular object, querying each neighbor cache to find out whether it stores a copy, and then downloading the object if it is found, the caches exchange information about their contents in advance. When a request for an object is received, the object can then be retrieved directly from the cache in which it is stored.

Bloom filters have been suggested as a method for sharing Web cache information. A Bloom filter is a simple, space-efficient probabilistic data structure that represents a set and answers membership queries, at the cost of occasional false positives. Proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing those contents. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.
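A minimal Bloom filter can be sketched in a few lines. The sizes chosen here and the use of salted SHA-256 digests as hash functions are assumptions made for brevity, not the parameters or hash scheme of any particular proxy implementation.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: supports add and membership test, not deletion."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


summary = BloomFilter()
summary.add("http://example.com/a")
cached = "http://example.com/a" in summary  # always True for an added item
```

A proxy would broadcast `summary.bits` (128 bytes here) instead of its full URL list. An added URL always tests as present, whereas a URL that was never added may occasionally test as present too, which is the source of the false hits mentioned earlier. Because a plain Bloom filter cannot remove items, handling evicted documents needs an extension such as per-position counters.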

Patent Link : [http://www.google.com/patents?id=3sUkAAAAEBAJ&dq=Cache-+Cache+communication]

==Disadvantages :==

1> Cache-to-cache sharing adds complexity to the bus-based protocol, since the main memory must make sure that no cache is able to supply the data before it drives the bus.

2> The main memory can transfer an entire block in a single transfer operation, whereas caches can transfer data only in smaller chunks, so the transfer takes longer to complete. Hence, cache-to-cache sharing is feasible only if the amount of data to be transferred among caches is small. (Otherwise, the overheads are too large.)

3> The requested block may be present in multiple caches, in which case a selection algorithm is required to determine which cache will supply the data.

4> Existing web servers cannot be modified easily to support these protocols and optimizations.
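For drawback 3 above, one simple selection algorithm is a fixed priority order among the caches. The sketch below is a hypothetical illustration of that idea; real bus protocols may resolve the choice differently, for example in hardware arbitration.

```python
def select_supplier(holders, priority):
    """Pick the holder that appears earliest in the fixed priority order.

    holders:  set of cache names that currently hold the requested block
    priority: list of all cache names, highest priority first
    Returns the chosen cache name, or None if no cache holds the block.
    """
    for cache in priority:
        if cache in holders:
            return cache
    return None


priority = ["P0", "P1", "P2"]
chosen = select_supplier({"P2", "P1"}, priority)  # P1 outranks P2
```

When `select_supplier` returns `None`, no cache can supply the block and the request has to be served by main memory.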

==Conclusion==

Although cache-to-cache communication and sharing seems to be a fairly good proposition, it has quite a few drawbacks. Additional hardware, software and communication overheads make the system slower and more expensive to implement. It does not seem feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth available for the communication is limited. Also, converting environments with older architectures to newer ones that support cache-to-cache sharing is difficult, and often the whole setup needs to be reimplemented.

==Further references==

- Li Fan, Pei Cao, and Jussara Almeida (Department of Computer Science, University of Wisconsin-Madison) and Andrei Z. Broder (Systems Research Center, Digital Equipment Corporation): [http://64.233.169.104/search?q=cache:8-At8udlW3oJ:pages.cs.wisc.edu/~cao/papers/summarycache.ps+cache+to+cache+sharing&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol]

- [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node16.html More on Web caching]

- [http://www.web-cache.com/Writings/papers.html Information on web cache communication]

- [http://www.freepatentsonline.com/7174430.html Bandwidth reduction using cache to cache transfer prediction]

- [http://tibrewala.net/papers/mesi98/ Cache to cache transfer in MESI protocol]

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-28T20:30:40Z

Ssghatta: /* Introduction : */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-28T20:17:47Z

Ssghatta: /* Current use of cache to cache sharing: */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-28T20:15:27Z

Ssghatta: /* Introduction : */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:51:36Z

Ssghatta: /* Current use of cache to cache sharing: */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:50:57Z

Ssghatta: /* Further references */

=Cache to Cache sharing=

==Introduction :==

Caches are small , fast memories which are placed in-between the CPU and the main memory in order to speed up the access of frequently used instructions and data . Caches are designed to exploit spatial and temporal locality.However, since fast memory is expensive, the memory hierarchy is organised into several levels , each smaller, faster and more expensive per byte than the next lower level . The highest level in the memory hierarchy is the L1 cache which is usually accommodated on-chip ( to reduce the latency involved in communicating over the shared bus ) . The L1 cache is usually constructed out of an SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 - 64 KB . L2 caches use less advanced memory technology and they typically have a capacity of around 512 KB .

'''The need for Cache-to-Cache Sharing :'''

The concept of cache sharing surfaces in CMP ( Chip Multi-Processor ) systems which typically use a shared - bus architecture where multiple processors ( each with its own local cache ) are connected to the controller corresponding to the main memory through a shared bus . The bus communication protocols in such systems are designed to ensure memory consistency and cache coherence . An interesting lower - level design decision that a computer architect is faced with while implementing such a design is whether to implement cache-to-cache sharing or not . The reason for considering cache - to - cache sharing mechanisms is as follows:

Consider a situation where a processor has initiated a 'Bus Read' transaction in order to read a block from memory. If this block has already been cached by another processor, it would be faster to read the block from that processor's cache than to let the access propagate all the way down to the main memory. The main advantage is that a cache can supply the requested block faster than the main memory can. Cache-to-cache sharing is particularly useful in large multiprocessor architectures where the memory is widely distributed and the latency of accessing main memory is significant.
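
The Bus Read scenario described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class <code>Cache</code>, the function <code>bus_read</code>, and the two latency constants are hypothetical names and values chosen for the example, not figures from any real machine or protocol specification.

```python
# Hypothetical latencies in bus cycles; real values depend on the machine.
CACHE_SUPPLY_CYCLES = 20
MEMORY_SUPPLY_CYCLES = 100


class Cache:
    """A private cache modelled as a dict from block address to data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def snoop(self, address):
        """Return the block if this cache holds it, else None."""
        return self.blocks.get(address)


def bus_read(requester, address, caches, memory):
    """Service a Bus Read; return (data, supplier name, latency in cycles)."""
    for cache in caches:
        if cache is requester:
            continue
        data = cache.snoop(address)
        if data is not None:
            # Another cache holds the block: cache-to-cache transfer.
            requester.blocks[address] = data
            return data, cache.name, CACHE_SUPPLY_CYCLES
    # No cache can supply the block, so main memory responds.
    data = memory[address]
    requester.blocks[address] = data
    return data, "memory", MEMORY_SUPPLY_CYCLES


memory = {0x40: "block-A", 0x80: "block-B"}
p0, p1 = Cache("P0"), Cache("P1")
caches = [p0, p1]

first = bus_read(p0, 0x40, caches, memory)   # no cache has it: memory supplies
second = bus_read(p1, 0x40, caches, memory)  # P0 now has it: P0 supplies
```

In this sketch the second read is served by P0's cache at the lower (assumed) latency, which is the benefit the paragraph above describes.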

==Current use of cache-to-cache sharing==

Web cache sharing was first proposed in the context of the Harvest group, which designed the Internet Cache Protocol (ICP) to support discovery and retrieval of documents from nearby caches. Today, various countries and institutions have established hierarchies of proxy caches that co-operate via ICP to reduce traffic over the Internet.

Cache-to-cache sharing is particularly useful in web-based applications to reduce latency and serve incoming requests as efficiently as possible. The sharing of information among web proxies is an important technique to reduce traffic on the web and alleviate network bottlenecks. A protocol called 'Summary Cache' has been proposed in which each proxy keeps a summary of the URLs of the documents cached at each participating proxy and checks these summaries for potential hits before sending any queries. Trace-driven simulations have shown that, compared to the existing Internet Cache Protocol (ICP), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30% and 95% of the CPU overhead, while maintaining an acceptable hit ratio.

When a cache miss occurs, the proxy probes the summaries of all the other proxies to check for a potential hit and sends queries only to those caches whose summaries show promising results. Summaries need not be accurate at all times. If the summary falsely indicates that a document is cached (a false hit), the only penalty incurred is a wasted query, which is acceptable considering the significant improvement in hit ratio and access speed obtained by using this scheme.

Two of the most important design decisions involved are:

- frequency of summary updates
- representation of the summary to reduce the memory requirements
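
The lookup path described above (probe the summaries on a miss, query only promising peers, tolerate false hits) can be sketched as follows. The names <code>Proxy</code>, <code>publish_summary</code> and <code>lookup</code> are hypothetical, and each summary is modelled as a plain set of URLs for readability; this is an assumption of the sketch, not the representation used by the Summary Cache protocol itself.

```python
class Proxy:
    """A web proxy with a local cache and (possibly stale) peer summaries."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # URL -> document
        self.summaries = {}  # peer name -> set of URLs the peer reported

    def publish_summary(self, peers):
        """Send a snapshot of this proxy's cached URLs to every peer."""
        for peer in peers:
            peer.summaries[self.name] = set(self.cache)


def lookup(proxy, url, peers_by_name):
    """Return (document, source, wasted_queries) for a request at proxy."""
    if url in proxy.cache:
        return proxy.cache[url], proxy.name, 0
    wasted = 0
    for peer_name, summary in proxy.summaries.items():
        if url not in summary:
            continue  # the summary rules this peer out: no query is sent
        peer = peers_by_name[peer_name]
        if url in peer.cache:
            return peer.cache[url], peer_name, wasted
        wasted += 1  # false hit: the summary was stale
    return None, "origin", wasted


a, b, c = Proxy("A"), Proxy("B"), Proxy("C")
b.cache["http://example.com/x"] = "doc-x"
c.cache["http://example.com/y"] = "doc-y"
b.publish_summary([a])
c.publish_summary([a])
del c.cache["http://example.com/y"]  # evicted after the summary was sent

peers = {"B": b, "C": c}
hit = lookup(a, "http://example.com/x", peers)
false_hit = lookup(a, "http://example.com/y", peers)
miss = lookup(a, "http://example.com/z", peers)
```

The stale entry for proxy C costs exactly one wasted query before the request falls through to the origin server, which illustrates why the update frequency of the summaries is a design decision.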

Cache-to-cache communication protocols can be further enhanced with a message-passing mechanism between the caches. Rather than waiting for a request for a particular object, querying each neighbor cache to determine whether it holds a copy, and then downloading the object if it is found, the caches exchange information about their contents in advance, so that when a request is received, the object can be retrieved directly from the cache in which it is stored.

Bloom filters have been suggested as a method to share web cache information. A Bloom filter is a simple, space-efficient data structure that can be used to represent the set of documents held in an individual cache. Proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their caches. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.
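
A minimal, uncompressed Bloom filter can be sketched as below to show how a proxy might summarise its cached URLs in a fixed number of bits. The class name, the default sizes, and the use of salted SHA-256 to derive the bit positions are all assumptions made for this example; they are not taken from the Summary Cache or compressed Bloom filter proposals.

```python
import hashlib


class BloomFilter:
    """A fixed-size bit array with k hash functions; no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


summary = BloomFilter()
for url in ("http://example.com/a", "http://example.com/b"):
    summary.add(url)

cached = "http://example.com/a" in summary  # always True once added
summary_size_bytes = len(summary.bits)      # what a proxy would broadcast
```

A membership test on a URL that was never added usually returns False but may occasionally return True; that false positive corresponds to the 'false hit' and its wasted query described earlier.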

Patent link: [http://www.google.com/patents?id=3sUkAAAAEBAJ&dq=Cache-+Cache+communication]

==Disadvantages==

1> Cache-to-cache sharing adds considerable complexity to the bus-based protocol, since the main memory must make sure that no cache is able to supply the data before it drives the bus.

2> The main memory can transfer an entire block in a single transfer operation, but caches can transfer data only in smaller chunks, so the transfer takes longer. Hence, cache-to-cache sharing is feasible only if the amount of data to be transferred among caches is small (otherwise, the overheads are too large).

3> The requested block may be present in multiple caches, in which case a selection algorithm is required to determine which cache will provide the data.

4> Existing web servers cannot be modified easily to support these protocols and optimizations.
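
As an illustration of the selection problem in item 3, one very simple rule is fixed priority: among the caches that hold the block, the one with the lowest identifier supplies it. The function name <code>select_supplier</code> and the fixed-priority rule are assumptions for this sketch; real protocols may use other arbitration schemes.

```python
def select_supplier(block, cache_contents):
    """Return the id of the cache that should supply block, or None.

    cache_contents maps a cache id to the set of block addresses it holds.
    """
    holders = [cid for cid, blocks in cache_contents.items() if block in blocks]
    # Fixed-priority rule (an assumption): the lowest id wins.
    # None means no cache holds the block, so main memory must supply it.
    return min(holders) if holders else None


contents = {2: {0x40, 0x80}, 0: {0x80}, 1: {0x40}}
supplier_a = select_supplier(0x40, contents)
supplier_b = select_supplier(0x80, contents)
supplier_none = select_supplier(0xC0, contents)
```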

==Conclusion==

Although cache-to-cache communication and sharing seems to be a fairly good proposition, it has quite a few drawbacks. Additional hardware, software and communication overheads make the system slower and more expensive to implement. It does not seem feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth available for the communication is limited. Also, converting environments with older architectures to newer ones that support cache-to-cache sharing is difficult, and often the whole setup needs to be reimplemented.

==Further references==

- Li Fan, Pei Cao, and Jussara Almeida (Department of Computer Science, University of Wisconsin-Madison) and Andrei Z. Broder (Systems Research Center, Digital Equipment Corporation), Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol [http://64.233.169.104/search?q=cache:8-At8udlW3oJ:pages.cs.wisc.edu/~cao/papers/summarycache.ps+cache+to+cache+sharing&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a]

- [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node16.html More on Web Caching]

- [http://www.web-cache.com/Writings/papers.html Information on web cache communication]

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:50:42Z

Ssghatta: /* Further references */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:50:28Z

Ssghatta: /* Disadvantages : */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:50:16Z

Ssghatta: /* Disadvantages : */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:49:14Z

Ssghatta: /* Further references */

=Cache to Cache sharing=

==Introduction :==

Caches are small , fast memories which are placed in-between the CPU and the main memory in order to speed up the access of frequently used instructions and data . Caches are designed to exploit spatial and temporal locality.However, since fast memory is expensive, the memory hierarchy is organised into several levels , each smaller, faster and more expensive per byte than the next lower level . The highest level in the memory hierarchy is the L1 cache which is usually accommodated on-chip ( to reduce the latency involved in communicating over the shared bus ) . The L1 cache is usually constructed out of an SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 - 64 KB . L2 caches use less advanced memory technology and they typically have a capacity of around 512 KB .

'''The need for Cache-to-Cache Sharing :'''

The concept of cache sharing surfaces in CMP ( Chip Multi-Processor ) systems which typically use a shared - bus architecture where multiple processors ( each with its own local cache ) are connected to the controller corresponding to the main memory through a shared bus . The bus communication protocols in such systems are designed to ensure memory consistency and cache coherence . An interesting lower - level design decision that a computer architect is faced with while implementing such a design is whether to implement cache-to-cache sharing or not . The reason for considering cache - to - cache sharing mechanisms is as follows:

Consider a situation where a particular processor has initiated a 'Bus Read' transaction in order to read a block from memory. If this memory block has already been cached earlier by another processor, then it would be faster to read the block from the other processor's cache rather than allowing the memory access to propagate all the way down to the main memory.The main advantage is that caches can supply the requested block faster than the main memory can. The cache-to-cache sharing technique is particularly useful for large multiprcessor architectures
where the memory is widely distributed and the latency involved in accessing main memory is significant.

==Current use of cache to cache sharing: ==

Web Cache Sharing was first proposed in the context of the Harvest Group, which designed the Internet Cache
Protocol ( ICP ), which supported discovery and retrieval of documents from nearby caches.Today, various
countries and institutions have established hierarchies of proxy caches that co-operate via ICP to
reduce traffic over the Internet.

Cache-to-cache sharing is particularly useful in Web-based applications to reduce latency and serve incoming requests as efficiently as possible. The sharing of information among web proxies is an important technique to reduce traffic on the web and alleviate network bottlenecks. A new protocol called 'Summary Cache' has been proposed where each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these
summaries for potential hits before sending any queries. Trace - driven simulations have shown that compared to the
existing Internet Cache Protocol ( ICP ), Summary Cache reduces the number of inter-cache messages by a factor
of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30% to 95% of the CPU overhead,
while maintaining an acceptable hit ratio.

When a cache miss occurs , the cache probes the summaries of all the proxies to check for a potential cache hit and
also sends out messages to all caches whose summaries show promising results. Summaries need not be accurate
at all times.If a request is not a hit , as falsely indicated by the summary ( a false hit ) , the only penalty
incurred is a wasted query , which is acceptable , considering the significant improvement in hit ratio and
access speed obtained by using this scheme.
Two of the most important design decisions involved are :

- frequency of summary updates
- representation of the summary to reduce the memory requirements

The cache-to-cache communication protocols can be further enhanced with a message-passing mechanism between the caches. Rather than waiting for a request for a particular object, querying each neighbor cache to determine whether it stores a copy, and then downloading the object if it is found, the caches exchange information about their contents in advance, so that when a request for an object is received, the object can be retrieved directly from the cache in which it is stored.

Bloom filters have been suggested as a method to share web cache information. A Bloom filter is a simple, space-efficient data structure for representing a set, in this case the set of objects held in an individual cache. Proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.
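
As a rough illustration of why a Bloom filter suits this purpose, the sketch below shows an uncompressed filter over URL strings. The class name, the default sizes and the use of salted SHA-256 to derive bit positions are assumptions made for the example, not details taken from the proposals above.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every position is set; unrelated items can collide.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A proxy would call `add` for each cached URL and broadcast `bits` (128 bytes at the default size here) instead of the URL list. A peer testing `url in summary` may occasionally get a false positive, which corresponds to the false hit discussed earlier, but a URL that was added is always reported as present.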

Patent Link : [http://www.google.com/patents?id=3sUkAAAAEBAJ&dq=Cache-+Cache+communication]

==Disadvantages :==

1> Cache-to-cache sharing adds considerable complexity to the bus-based protocol, since main memory must
make sure that no cache can supply the data before it drives the bus.

2> Main memory can transfer an entire block in a single transfer operation, whereas caches can transfer data only
in smaller chunks, so the transfer takes longer to complete. Cache-to-cache sharing is therefore feasible
only if the amount of data to be transferred among caches is small; otherwise the overheads are too large.

3> The requested block may be present in multiple caches, in which case a selection
algorithm is required to determine which cache will provide the data.

4> Existing web servers cannot be modified easily to support these protocols and optimizations.
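
Points 1 and 3 can be made concrete with a small simulation of a Bus Read on a snooping bus. This is only a sketch: the class and function names are hypothetical, and the rule that the holder with the lowest id supplies the block is an assumed selection policy chosen for simplicity, not one prescribed by any particular protocol.

```python
class SnoopingCache:
    """Hypothetical per-processor cache attached to a shared bus."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.blocks = {}  # block address -> data

    def snoop(self, address):
        """Report whether this cache holds a copy of the block."""
        return address in self.blocks


def bus_read(requester, caches, memory, address):
    """Serve a Bus Read; return (data, source) where source names the supplier."""
    # Every other cache must finish snooping before memory may respond.
    holders = [c for c in caches if c is not requester and c.snoop(address)]
    if holders:
        # Assumed selection rule: the holder with the lowest id supplies the block.
        supplier = min(holders, key=lambda c: c.cache_id)
        data = supplier.blocks[address]
        source = f"cache {supplier.cache_id}"
    else:
        # No cache can supply the block, so main memory drives the bus.
        data = memory[address]
        source = "memory"
    requester.blocks[address] = data
    return data, source
```

Memory is consulted only after the full list of holders is known, which is the waiting that point 1 describes, and the `min` call stands in for the selection algorithm of point 3.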

==Conclusion==

Although cache-to-cache communication and sharing seems to be a fairly good proposition, it has quite a few drawbacks. The additional hardware, software and communication overheads can make the system slower and more expensive to implement. It does not seem feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth available for the communication is limited. Also, converting environments with older architectures to newer ones that support cache-to-cache sharing is difficult, and often the whole setup needs to be reimplemented.

==Further references==

- [http://64.233.169.104/search?q=cache:8-At8udlW3oJ:pages.cs.wisc.edu/~cao/papers/summarycache.ps+cache+to+cache+sharing&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol]. Li Fan, Pei Cao, and Jussara Almeida (Department of Computer Science, University of Wisconsin-Madison); Andrei Z. Broder (Systems Research Center, Digital Equipment Corporation).

- [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node16.html More on Web Caching]

- [http://www.web-cache.com/Writings/papers.html Information on web cache communication]

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:48:19Z

Ssghatta: /* Further references */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:47:46Z

Ssghatta: /* Further references */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:44:42Z

Ssghatta: /* Current use of cache to cache sharing: */

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:43:43Z

Ssghatta: /* Current use of cache to cache sharing: */

=Cache to Cache sharing=

==Introduction :==

Caches are small , fast memories which are placed in-between the CPU and the main memory in order to speed up the access of frequently used instructions and data . Caches are designed to exploit spatial and temporal locality.However, since fast memory is expensive, the memory hierarchy is organised into several levels , each smaller, faster and more expensive per byte than the next lower level . The highest level in the memory hierarchy is the L1 cache which is usually accommodated on-chip ( to reduce the latency involved in communicating over the shared bus ) . The L1 cache is usually constructed out of an SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 - 64 KB . L2 caches use less advanced memory technology and they typically have a capacity of around 512 KB .

'''The need for Cache-to-Cache Sharing :'''

The concept of cache sharing surfaces in CMP ( Chip Multi-Processor ) systems which typically use a shared - bus architecture where multiple processors ( each with its own local cache ) are connected to the controller corresponding to the main memory through a shared bus . The bus communication protocols in such systems are designed to ensure memory consistency and cache coherence . An interesting lower - level design decision that a computer architect is faced with while implementing such a design is whether to implement cache-to-cache sharing or not . The reason for considering cache - to - cache sharing mechanisms is as follows:

Consider a situation where a particular processor has initiated a 'Bus Read' transaction in order to read a block from memory. If this memory block has already been cached earlier by another processor, then it would be faster to read the block from the other processor's cache rather than allowing the memory access to propagate all the way down to the main memory.The main advantage is that caches can supply the requested block faster than the main memory can. The cache-to-cache sharing technique is particularly useful for large multiprcessor architectures
where the memory is widely distributed and the latency involved in accessing main memory is significant.

==Current use of cache to cache sharing: ==

Web Cache Sharing was first proposed in the context of the Harvest Group, which designed the Internet Cache
Protocol ( ICP ), which supported discovery and retrieval of documents from nearby caches.Today, various
countries and institutions have established hierarchies of proxy caches that co-operate via ICP to
reduce traffic over the Internet.

Cache-to-cache sharing is particularly useful in Web-based applications to reduce latency and serve incoming requests as efficiently as possible. The sharing of information among web proxies is an important technique to reduce traffic on the web and alleviate network bottlenecks. A new protocol called 'Summary Cache' has been proposed where each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these
summaries for potential hits before sending any queries. Trace - driven simulations have shown that compared to the
existing Internet Cache Protocol ( ICP ), Summary Cache reduces the number of inter-cache messages by a factor
of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30% to 95% of the CPU overhead,
while maintaining an acceptable hit ratio.

When a cache miss occurs , the cache probes the summaries of all the proxies to check for a potential cache hit and
also sends out messages to all caches whose summaries show promising results. Summaries need not be accurate
at all times.If a request is not a hit , as falsely indicated by the summary ( a false hit ) , the only penalty
incurred is a wasted query , which is acceptable , considering the significant improvement in hit ratio and
access speed obtained by using this scheme.
Two of the most important design decisions involved are :
- frequency of summary updates
- representation of the summary to reduce the memory requirements

The cache-cache communication protocols can be further enhanced by having some sort of message passing mechanism
between the caches. Rather than awaiting a request for a particular object and then querying each neighbor cache to determine whether a copy of the requested object is stored thereon, and then downloading the requested object if it is found, information about the contents of the neighbor caches is exchanged between these caches so that when a request for an object is received, the object can be retrieved from the cache in which it is stored.

Bloom Filters have been suggested as a method to share Web Cache information. A Bloom filter represents a simple ,
space-efficient data structure which can be used to represent data sets on the individual caches. Proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.

Patent Link : [http://www.google.com/patents?id=3sUkAAAAEBAJ&dq=Cache-+Cache+communication]

==Disadvantages :==

1> Cache-cache sharing adds a lot of complexity to the bus-based protocol since the main memory must
make sure that no cache is capable of supplying the data before driving the bus line.

2> The main memory can transfer an entire block in a single transfer operation but caches can transfer data only
in smaller chunks and hence it takes longer to accomplish the data transfer. Hence , cache-to-cache sharing is feasible
only if the amount of data to be transferred among caches is small. ( Otherwise , the overheads are too large .)

3> There is a possibility that the requested block may be present in multiple caches, in which case a selection
algorithm is required in order to determine who will provide the data.

4> Existing web servers cannot be modified easily to support these protocols and optimizations.

==Conclusion==

Although cache to cache communication and sharing seems to be a fairly good proposition, it has quite of a few drawbacks. Additional hardware, software and communication overheads make the system slower and expensive for implementation. It does not seem to be feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth for the communication is limited. Also, conversion of environments with older architectures to the newer ones supporting cache to cache sharing is difficult and often the whole set up needs to be reimplemented.

==Further references==

- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Li Fan, Pei Cao, and Jussara Almeida (Department of Computer Science, University of Wisconsin-Madison); Andrei Z. Broder (Systems Research Center, Digital Equipment Corporation)

Link :
- [http://64.233.169.104/search?q=cache:8-At8udlW3oJ:pages.cs.wisc.edu/~cao/papers/summarycache.ps+cache+to+cache+sharing&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a]

- More on Web Caching : [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node16.html]

- [http://www.web-cache.com/Writings/papers.html]

CSC/ECE 506 Fall 2007/wiki2 7 amassg

2007-09-24T22:42:48Z

Ssghatta:

=Cache to Cache sharing=

==Introduction :==

Caches are small, fast memories placed between the CPU and the main memory to speed up access to frequently used instructions and data. Caches are designed to exploit spatial and temporal locality. However, since fast memory is expensive, the memory hierarchy is organised into several levels, each smaller, faster and more expensive per byte than the next lower level. The highest level in the memory hierarchy is the L1 cache, which is usually placed on-chip (to reduce the latency involved in communicating over the shared bus). The L1 cache is usually built from SRAM and is hence very fast and expensive. Typical L1 cache sizes range from 8 to 64 KB. L2 caches use less advanced memory technology and typically have a capacity of around 512 KB.

'''The need for Cache-to-Cache Sharing :'''

The concept of cache sharing arises in CMP (Chip Multi-Processor) systems, which typically use a shared-bus architecture in which multiple processors (each with its own local cache) are connected to the main memory controller through a shared bus. The bus communication protocols in such systems are designed to ensure memory consistency and cache coherence. An interesting lower-level design decision that a computer architect faces while implementing such a design is whether or not to implement cache-to-cache sharing. The reason for considering cache-to-cache sharing mechanisms is as follows:

Consider a situation where a particular processor has initiated a 'Bus Read' transaction in order to read a block from memory. If this memory block has already been cached by another processor, it would be faster to read the block from the other processor's cache than to let the memory access propagate all the way down to the main memory. The main advantage is that caches can supply the requested block faster than the main memory can. The cache-to-cache sharing technique is particularly useful for large multiprocessor architectures where the memory is widely distributed and the latency involved in accessing main memory is significant.
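
The 'Bus Read' scenario above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the names <code>Cache</code>, <code>snoop</code> and <code>bus_read</code> are hypothetical, caches are modelled as dictionaries, and no coherence states, write handling or bus arbitration are modelled.

```python
class Cache:
    """A processor's private cache, modelled as address -> data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def snoop(self, address):
        # Return the block if this cache holds it, otherwise None.
        return self.blocks.get(address)


def bus_read(requester, address, caches, main_memory):
    """Serve a Bus Read; return (data, name of the supplier)."""
    for cache in caches:
        if cache is requester:
            continue
        data = cache.snoop(address)
        if data is not None:
            # Cache-to-cache transfer: another cache supplies the block.
            requester.blocks[address] = data
            return data, cache.name
    # No cache holds the block, so main memory supplies it.
    data = main_memory[address]
    requester.blocks[address] = data
    return data, "memory"


main_memory = {0x40: "block-A", 0x80: "block-B"}
p0, p1 = Cache("P0"), Cache("P1")
caches = [p0, p1]

first = bus_read(p0, 0x40, caches, main_memory)   # supplied by main memory
second = bus_read(p1, 0x40, caches, main_memory)  # supplied by P0's cache
```

In this run the first read is served by main memory and the second by P0's cache, which is the case where cache-to-cache sharing pays off.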

==Current use of cache to cache sharing: ==

Web cache sharing was first proposed in the context of the Harvest Group, which designed the Internet Cache Protocol (ICP) to support discovery and retrieval of documents from nearby caches. Today, various countries and institutions have established hierarchies of proxy caches that co-operate via ICP to reduce traffic over the Internet.

Cache-to-cache sharing is particularly useful in Web-based applications to reduce latency and serve incoming requests as efficiently as possible. The sharing of information among web proxies is an important technique to reduce traffic on the web and alleviate network bottlenecks. A protocol called 'Summary Cache' has been proposed in which each proxy keeps a summary of the URLs of the cached documents of every participating proxy and checks these summaries for potential hits before sending any queries. Trace-driven simulations have shown that, compared to the existing Internet Cache Protocol (ICP), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces bandwidth consumption by over 50%, and eliminates between 30% and 95% of the CPU overhead, while maintaining an acceptable hit ratio.

When a cache miss occurs, the cache probes the summaries of all the proxies to check for a potential cache hit and sends out queries to all caches whose summaries show promising results. Summaries need not be accurate at all times. If a summary falsely indicates a hit (a false hit), the only penalty incurred is a wasted query, which is acceptable considering the significant improvement in hit ratio and access speed obtained by using this scheme.

Two of the most important design decisions involved are:
# frequency of summary updates
# representation of the summary to reduce the memory requirements
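
The miss-handling procedure described above can be sketched as follows. This is an illustration only: the function name <code>handle_miss</code> and the proxy names are hypothetical, and each summary is modelled as a plain Python set that may be out of date, rather than the compact representation the protocol would actually use.

```python
def handle_miss(url, summaries, peer_contents):
    """Probe peer summaries on a local miss.

    Returns (peer that holds the document or None, number of queries sent).
    """
    # Step 1: probe every locally stored summary; no network traffic yet.
    candidates = [peer for peer, summary in summaries.items() if url in summary]
    # Step 2: query every peer whose summary looks promising.
    holders = [peer for peer in candidates if url in peer_contents[peer]]
    # Queries to peers that no longer hold the document are false hits.
    return (holders[0] if holders else None), len(candidates)


# Summaries as last received by this proxy (possibly stale).
summaries = {"proxyA": {"/x", "/y"}, "proxyB": {"/y", "/z"}}
# What the peers actually hold right now: proxyA has since evicted /y.
peer_contents = {"proxyA": {"/x"}, "proxyB": {"/y", "/z"}}

hit = handle_miss("/y", summaries, peer_contents)
miss = handle_miss("/w", summaries, peer_contents)
```

For <code>/y</code>, two queries are sent and one of them (to proxyA) is wasted because its summary is stale; for <code>/w</code>, no summary matches, so no query is sent at all. How often summaries are refreshed determines how many such wasted queries occur.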

The cache-to-cache communication protocols can be further enhanced by a message passing mechanism between the caches. Instead of waiting for a request for a particular object, querying each neighbor cache to determine whether it stores a copy, and then downloading the object if it is found, the caches exchange information about their contents in advance. When a request for an object is received, the object can then be retrieved directly from the cache in which it is stored.

Bloom filters have been suggested as a method for sharing Web cache information. A Bloom filter is a simple, space-efficient data structure that can represent the set of documents held in an individual cache. Proxies do not share the exact contents of their caches; instead they periodically broadcast Bloom filters representing those contents. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup.

Patent Link : [http://www.google.com/patents?id=3sUkAAAAEBAJ&dq=Cache-+Cache+communication]

==Disadvantages :==

1> Cache-to-cache sharing adds considerable complexity to the bus-based protocol, since the main memory must make sure that no cache is able to supply the data before it drives the bus.

2> The main memory can transfer an entire block in a single transfer operation, but caches can transfer data only in smaller chunks, so the transfer takes longer to complete. Hence, cache-to-cache sharing is feasible only if the amount of data to be transferred among caches is small (otherwise the overheads are too large).

3> There is a possibility that the requested block may be present in multiple caches, in which case a selection
algorithm is required in order to determine who will provide the data.

4> Existing web servers cannot be modified easily to support these protocols and optimizations.

==Conclusion==

Although cache-to-cache communication and sharing seems to be a fairly good proposition, it has quite a few drawbacks. Additional hardware, software and communication overheads make the system slower and more expensive to implement. It does not seem to be feasible for data transfers that occur in bulk rather than in blocks or words, since the bandwidth available for the communication is limited. Also, converting environments with older architectures to newer ones that support cache-to-cache sharing is difficult, and often the whole setup needs to be reimplemented.

==Further references==

- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Li Fan, Pei Cao, and Jussara Almeida (Department of Computer Science, University of Wisconsin-Madison); Andrei Z. Broder (Systems Research Center, Digital Equipment Corporation)

Link :
- [http://64.233.169.104/search?q=cache:8-At8udlW3oJ:pages.cs.wisc.edu/~cao/papers/summarycache.ps+cache+to+cache+sharing&hl=en&ct=clnk&cd=2&gl=us&client=firefox-a]

- More on Web Caching : [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node16.html]

- [http://www.web-cache.com/Writings/papers.html]

CSC/ECE 506 Fall 2007/wiki1 8 s5

2007-09-05T22:05:58Z

Ssghatta: /* Message Passing and Blade Servers */

== Message Passing and Blade Servers ==

'''Introduction:'''

When we have multiple processors, there needs to be a way for those processors to communicate. Message Passing forms a part of this communication architecture. There are other methods of communication, such as Shared Address Space and Data Parallel Processing, which along with Message Passing contribute to the communication abstraction. The communication abstraction is essentially a layer between the application software and the communication hardware, in which the programmer uses available libraries to initiate communication between processors through programs.

'''Message Passing:'''

Message Passing Model is defined as:

1. Set of Processes having only local memory

2. Processes communicate by sending and receiving messages

3. Transfer of data between processes requires cooperative operations to be performed by each process (a send operation must have a matching receive)
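
The three properties above can be illustrated with a small sketch. It is not MPI: Python threads stand in for processes and <code>queue.Queue</code> objects stand in for the communication channels. Threads really do share an address space, so the "local memory only" rule is kept here purely by convention (the two sides communicate only through the queues). The names <code>worker</code>, <code>to_worker</code> and <code>to_main</code> are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    # Receive: blocks until the matching send has happened.
    numbers = inbox.get()
    # Compute on the worker's own copy of the data, then send the result back.
    outbox.put(sum(numbers))


to_worker = queue.Queue()   # channel: main -> worker
to_main = queue.Queue()     # channel: worker -> main

t = threading.Thread(target=worker, args=(to_worker, to_main))
t.start()

to_worker.put([1, 2, 3, 4])   # send (matched by inbox.get() in the worker)
result = to_main.get()        # receive (matched by outbox.put() in the worker)
t.join()
```

Each <code>put</code> is paired with exactly one <code>get</code> on the other side. If either side omitted its half of the exchange, the other would block forever, which is the kind of deadlock the programmer is responsible for avoiding in this model.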

The message passing model has gained wide use in the field of parallel computing due to advantages that include:

1. Hardware match - The message passing model fits well on parallel supercomputers and clusters of workstations which are composed of separate processors connected by a communications network.

2. Functionality - Message passing offers a full set of functions for expressing parallel algorithms, providing the control not found in data-parallel and compiler-based models.

3. Performance - Effective use of modern CPUs requires management of their memory hierarchy, especially their caches. Message passing achieves this by giving the programmer explicit control of data locality.

The principal drawback of message passing is the responsibility it places on the programmer. The programmer must explicitly implement a data distribution scheme and all interprocess communication and synchronization. In so doing, it is the programmer's responsibility to resolve data dependencies and avoid deadlock and race conditions.

Latest Developments in Message Passing:

Although the message passing model as a whole has not changed over time, the Message Passing Interface (MPI) has undergone continuous change. MPI is a communications protocol used to program parallel computers. MPI is not sanctioned by any major standards body; nevertheless, it has become the de facto standard for communication among the processes that make up a parallel program running on a distributed memory system.

'''Message Passing Interface (MPI)'''

It is a specification for message passing libraries, designed to be a standard for distributed memory, message passing and parallel computing. The goal of the Message Passing Interface simply stated is to provide a widely used standard for writing message-passing programs. The interface attempts to establish a practical, portable, efficient, and flexible standard for message passing.

History:

MPI resulted from the efforts of numerous individuals and groups over the course of about two years, building on work that dates back to the 1980s. Given below is a chronology of developments in MPI according to documentation provided by the Maui High Performance Computing Center (www.mhpcc.edu).

1. 1980s - early 1990s: Distributed memory, parallel computing develops, as do a number of incompatible software tools for writing such programs - usually with tradeoffs between portability, performance, functionality and price. Recognition of the need for a standard arose.

2. April, 1992: Workshop on Standards for Message Passing in a Distributed Memory Environment, sponsored by the Center for Research on Parallel Computing, Williamsburg, Virginia. The basic features essential to a standard message passing interface were discussed, and a working group established to continue the standardization process. Preliminary draft proposal developed subsequently.

3. November 1992: Working group meets in Minneapolis. MPI draft proposal (MPI1) from ORNL presented. Group adopts procedures and organization to form the MPI Forum. MPIF eventually comprised about 175 individuals from 40 organizations including parallel computer vendors, software writers, academia and application scientists.

4. November 1993: Supercomputing 93 conference - draft MPI standard presented.

5. Final version of draft released in May, 1994 - available on the WWW at: http://www.mcs.anl.gov/Projects/mpi/standard.html

Advantages:

MPI is preferred over other message passing libraries for several reasons, including:

1. Standardization - MPI is the only message passing library which can be considered a standard. It is supported on virtually all High Performance Computing (HPC) platforms.

2. Portability - Source code does not need to be modified when the application is ported to a different platform that supports MPI.

3. Performance - vendor implementations should be able to exploit native hardware features to optimize performance.

4. Functionality (over 115 routines)

5. Availability - a variety of implementations are available, both vendor and public domain.

MPI Implementations:

Some of the implementations of MPI include:

1. Classical Cluster and Supercomputer implementations

2. Python

3. OCaml

4. Java

5. Microsoft Windows

6. MATLAB

7. Hardware implementations

'''Blade Servers'''

A blade server is a server chassis housing multiple thin, modular electronic circuit boards, known as server blades. Each blade is a server in its own right, often dedicated to a single application. The blades are literally servers on a card, containing processors, memory, integrated network controllers, an optional fiber channel host bus adaptor (HBA) and other input/output (IO) ports.

Blade servers allow more processing power in less rack space, simplifying cabling and reducing power consumption. According to a SearchWinSystems.com article on server technology, enterprises moving to blade servers can experience as much as an 85% reduction in cabling for blade installations over conventional 1U or tower servers. With so much less cabling, IT administrators can spend less time managing the infrastructure and more time ensuring high availability.

A blade server is sometimes referred to as a high-density server and is typically used in a clustering of servers that are dedicated to a single task, such as:

1. File sharing

2. Web page serving and caching

3. SSL encrypting of Web communication

4. The transcoding of Web page content for smaller displays

5. Streaming audio and video content

Architecture:

A general blade server architecture is shown in the figure below. The hardware components of a blade server are the switch blade, chassis (with fans, temperature sensors, etc), and multiple compute blades. Some vendors offer, partner, or plan to partner with companies that provide application specific blades that provide traffic conditioning, protection, or network processing prior to the traffic reaching the compute blades. Often, these application specific blades may be functionally positioned between the switch blade and compute blades. However, these blades reside in a standard compute blade slot.
The outside world connects through the rear of the chassis to a switch card in the blade server. The switch card is provisioned to distribute packets to blades within the blade server. All these components are wrapped together with network management system software provided by the blade server vendor. The network management could be done through Message Passing which essentially makes blade servers an extension of message passing.

Evolution:

The name blade server appeared when a card included the processor, memory, I/O and non-volatile program storage (flash memory or small hard disk(s)). This allowed a complete server, with its operating system and applications, to be packaged on a single card / board / blade. These blades could then operate independently within a common chassis, doing the work of multiple separate server boxes more efficiently. Less space consumption is the most obvious benefit of this packaging, but additional efficiency benefits have become clear in power, cooling, management, and networking, due to the pooling or sharing of common infrastructure to support the entire chassis rather than providing each of these on a per-server-box basis.

Blade servers date back to the 1970s. The evolution chronology as provided by Wikipedia (Article: Blade Servers) is given below:

Complete microcomputers were placed on cards and packaged in standard 19-inch racks in the 1970s soon after the introduction of 8-bit microprocessors. This architecture was used in the industrial process control industry as an alternative to minicomputer control systems. Programs were stored in EPROM on early models and were limited to a single function with a small realtime executive.

The VMEBus architecture (ca. 1981) defined a computer interface which included implementation of a board-level computer that was installed in a chassis backplane with multiple slots for pluggable boards that provide I/O, memory, or additional computing. The PCI Industrial Computer Manufacturers Group (PICMG) developed a chassis/blade structure, called CompactPCI, for the then emerging Peripheral Component Interconnect (PCI) bus. Common among these chassis-based computers was the fact that the entire chassis was a single system. While a chassis might include multiple computing elements to provide the desired level of performance and redundancy, there was always one board in charge, one master board coordinating the operation of the entire system. PICMG expanded the CompactPCI specification with the use of standard Ethernet connectivity between boards across the backplane. The PICMG 2.16 CompactPCI Packet Switching Backplane specification was adopted in Sept 2001 (PICMG specifications). This provided the first open architecture for a multi-server chassis. PICMG followed with the larger and more feature-rich AdvancedTCA specification, targeting the telecom industry's need for a high-availability, dense computing platform with extended product life (10+ years). While AdvancedTCA system and board pricing is typically higher than that of blade servers, AdvancedTCA suppliers claim that low operating expenses and total cost of ownership can make AdvancedTCA-based solutions a cost-effective alternative for many building blocks of the next generation telecom network.

Future:

Early versions of server blades will be primarily high-density, low-power devices with relatively low performance. This type of blade is suited for first-tier applications such as static Web servers, security, network services, and streaming media because the applications can be easily and inexpensively load balanced. The performance of an application depends on the aggregate performance of the servers rather than the performance of an individual server.
Higher performance, less dense blade designs will help drive blade usage into more mainstream applications in the corporate data center. These designs can offer the individual performance characteristics and features available in today's rack-dense servers along with the cost, deployment, serviceability, and density benefits of server blades. The blades will be well suited to high-performance Web servers, dedicated application servers, server-based or thin-client computing, and high-performance computing (HPC) clusters.

The introduction of server blades and associated technology like InfiniBand (IB) will usher in a new IT infrastructure. IT managers should start planning now for server blade installations by evaluating IP-based storage solutions, remote software provisioning and management solutions, scale-out architectures, and load-balancing technologies.

CSC/ECE 506 Fall 2007/wiki1 8 s5

2007-09-05T21:42:27Z

Ssghatta: /* Message Passing and Blade Servers */

== Message Passing and Blade Servers ==

'''Introduction:'''

When we have multiple processors, there needs to be a way for those processors to communicate. Message Passing forms a part of this communication architecture. There are other methods of communication, such as Shared Address Space and Data Parallel Processing, which along with Message Passing contribute to the communication abstraction. The communication abstraction is essentially a layer between the application software and the communication hardware, in which the programmer uses available libraries to initiate communication between processors through programs.

'''Message Passing:'''

Message Passing Model is defined as:

1. Set of Processes having only local memory

2. Processes communicate by sending and receiving messages

3. Transfer of data between processes requires cooperative operations to be performed by each process (a send operation must have a matching receive)

The message passing model has gained wide use in the field of parallel computing due to advantages that include:

1. Hardware match - The message passing model fits well on parallel supercomputers and clusters of workstations which are composed of separate processors connected by a communications network.

2. Functionality - Message passing offers a full set of functions for expressing parallel algorithms, providing the control not found in data-parallel and compiler-based models.

3. Performance - Effective use of modern CPUs requires management of their memory hierarchy, especially their caches. Message passing achieves this by giving the programmer explicit control of data locality.

The principal drawback of message passing is the responsibility it places on the programmer. The programmer must explicitly implement a data distribution scheme and all interprocess communication and synchronization. In so doing, it is the programmer's responsibility to resolve data dependencies and avoid deadlock and race conditions.

Latest Developments in Message Passing:

Although the message passing model as a whole has not changed over time, the Message Passing Interface (MPI) has undergone continuous change. MPI is a communications protocol used to program parallel computers. MPI is not sanctioned by any major standards body; nevertheless, it has become the de facto standard for communication among the processes that make up a parallel program running on a distributed memory system.

'''Message Passing Interface (MPI)'''

It is a specification for message passing libraries, designed to be a standard for distributed memory, message passing and parallel computing. The goal of the Message Passing Interface simply stated is to provide a widely used standard for writing message-passing programs. The interface attempts to establish a practical, portable, efficient, and flexible standard for message passing.

History:

MPI resulted from the efforts of numerous individuals and groups over the course of about two years, building on work that dates back to the 1980s. Given below is a chronology of developments in MPI according to documentation provided by the Maui High Performance Computing Center (www.mhpcc.edu).

1. 1980s - early 1990s: Distributed memory, parallel computing develops, as do a number of incompatible software tools for writing such programs - usually with tradeoffs between portability, performance, functionality and price. Recognition of the need for a standard arose.

2. April, 1992: Workshop on Standards for Message Passing in a Distributed Memory Environment, sponsored by the Center for Research on Parallel Computing, Williamsburg, Virginia. The basic features essential to a standard message passing interface were discussed, and a working group established to continue the standardization process. Preliminary draft proposal developed subsequently.

3. November 1992: Working group meets in Minneapolis. MPI draft proposal (MPI1) from ORNL presented. Group adopts procedures and organization to form the MPI Forum. MPIF eventually comprised about 175 individuals from 40 organizations including parallel computer vendors, software writers, academia and application scientists.

4. November 1993: Supercomputing 93 conference - draft MPI standard presented.

5. Final version of draft released in May, 1994 - available on the WWW at: http://www.mcs.anl.gov/Projects/mpi/standard.html

Advantages:

MPI is preferred over other message passing libraries for several reasons, including:

1. Standardization - MPI is the only message passing library which can be considered a standard. It is supported on virtually all High Performance Computing (HPC) platforms.

2. Portability - Source code does not need to be modified when the application is ported to a different platform that supports MPI.

3. Performance - vendor implementations should be able to exploit native hardware features to optimize performance.

4. Functionality (over 115 routines)

5. Availability - a variety of implementations are available, both vendor and public domain.

MPI Implementations:

Some of the implementations of MPI include:

1. Classical Cluster and Supercomputer implementations

2. Python

3. OCaml

4. Java

5. Microsoft Windows

6. MATLAB

7. Hardware implementations

'''Blade Servers'''

A blade server is a server chassis housing multiple thin, modular electronic circuit boards, known as server blades. Each blade is a server in its own right, often dedicated to a single application. The blades are literally servers on a card, containing processors, memory, integrated network controllers, an optional fiber channel host bus adaptor (HBA) and other input/output (IO) ports.

Blade servers allow more processing power in less rack space, simplifying cabling and reducing power consumption. According to a SearchWinSystems.com article on server technology, enterprises moving to blade servers can experience as much as an 85% reduction in cabling for blade installations over conventional 1U or tower servers. With so much less cabling, IT administrators can spend less time managing the infrastructure and more time ensuring high availability.

A blade server is sometimes referred to as a high-density server and is typically used in a clustering of servers that are dedicated to a single task, such as:

1. File sharing

2. Web page serving and caching

3. SSL encrypting of Web communication

4. The transcoding of Web page content for smaller displays

5. Streaming audio and video content

Architecture:

A general blade server architecture is shown in the figure below. The hardware components of a blade server are the switch blade, chassis (with fans, temperature sensors, etc), and multiple compute blades. Some vendors offer, partner, or plan to partner with companies that provide application specific blades that provide traffic conditioning, protection, or network processing prior to the traffic reaching the compute blades. Often, these application specific blades may be functionally positioned between the switch blade and compute blades. However, these blades reside in a standard compute blade slot.
The outside world connects through the rear of the chassis to a switch card in the blade server. The switch card is provisioned to distribute packets to blades within the blade server. All these components are wrapped together with network management system software provided by the blade server vendor. The network management could be done through Message Passing which essentially makes blade servers an extension of message passing.

[[Image:K:\personal\ECE_506\BladeServer.JPG]]

Fig 1: Blade Server Architecture (Courtesy: Blade Servers: Evolution and revolution by Curtis A Schwaderer)

Evolution:

The name blade server appeared when a card included the processor, memory, I/O and non-volatile program storage (flash memory or small hard disk(s)). This allowed a complete server, with its operating system and applications, to be packaged on a single card / board / blade. These blades could then operate independently within a common chassis, doing the work of multiple separate server boxes more efficiently. Less space consumption is the most obvious benefit of this packaging, but additional efficiency benefits have become clear in power, cooling, management, and networking, due to the pooling or sharing of common infrastructure to support the entire chassis rather than providing each of these on a per-server-box basis.

Blade servers date back to the 1970s. The evolution chronology as provided by Wikipedia (Article: Blade Servers) is given below:

Complete microcomputers were placed on cards and packaged in standard 19-inch racks in the 1970s soon after the introduction of 8-bit microprocessors. This architecture was used in the industrial process control industry as an alternative to minicomputer control systems. Programs were stored in EPROM on early models and were limited to a single function with a small realtime executive.

The VMEBus architecture (ca. 1981) defined a computer interface which included implementation of a board-level computer that was installed in a chassis backplane with multiple slots for pluggable boards that provide I/O, memory, or additional computing. The PCI Industrial Computer Manufacturers Group (PICMG) developed a chassis/blade structure, called CompactPCI, for the then emerging Peripheral Component Interconnect (PCI) bus. Common among these chassis-based computers was the fact that the entire chassis was a single system. While a chassis might include multiple computing elements to provide the desired level of performance and redundancy, there was always one board in charge, one master board coordinating the operation of the entire system. PICMG expanded the CompactPCI specification with the use of standard Ethernet connectivity between boards across the backplane. The PICMG 2.16 CompactPCI Packet Switching Backplane specification was adopted in Sept 2001 (PICMG specifications). This provided the first open architecture for a multi-server chassis. PICMG followed with the larger and more feature-rich AdvancedTCA specification, targeting the telecom industry's need for a high-availability, dense computing platform with extended product life (10+ years). While AdvancedTCA system and board pricing is typically higher than that of blade servers, AdvancedTCA suppliers claim that low operating expenses and total cost of ownership can make AdvancedTCA-based solutions a cost-effective alternative for many building blocks of the next generation telecom network.

Future:

Early versions of server blades will be primarily high-density, low-power devices with relatively low performance. This type of blade is suited for first-tier applications such as static Web servers, security, network services, and streaming media because the applications can be easily and inexpensively load balanced. The performance of an application depends on the aggregate performance of the servers rather than the performance of an individual server.
Higher performance, less dense blade designs will help drive blade usage into more mainstream applications in the corporate data center. These designs can offer the individual performance characteristics and features available in today's rack-dense servers along with the cost, deployment, serviceability, and density benefits of server blades. The blades will be well suited to high-performance Web servers, dedicated application servers, server-based or thin-client computing, and high-performance computing (HPC) clusters.

The introduction of server blades and associated technology like InfiniBand (IB) will usher in a new IT infrastructure. IT managers should start planning now for server blade installations by evaluating IP-based storage solutions, remote software provisioning and management solutions, scale-out architectures, and load-balancing technologies.