CSC/ECE 506 Spring 2012/7b pk
TLB (Translation Lookaside Buffer) Coherence in Multiprocessing
Overview
Background - Virtual Memory, Paging and TLB
In this section we introduce basic terminology and the set-up where TLBs are used.
A process running on a CPU has its own view of memory - "Virtual Memory" (one-single space) that is mapped to actual physical memory. Virtual memory management scheme allows programs to exceed the size of physical memory space. Virtual memory operates by executing programs only partially resident in memory while relying on hardware and the operating system to bring the missing items into main memory when needed.
Paging is a memory-management scheme that permits the physical address space of a process to be non-contiguous. The basic method involves breaking physical memory into fixed-size blocks called frames and breaking the virtual memory into blocks of the same size called pages. When a process is to be executed its pages are loaded into any available memory frames from the backing storage (for example-a hard drive).
Each process has a page table that maintains a mapping of virtual pages to physical pages. A page table is a software construct managed by operating system that is kept in main memory. For a process, a pointer to the page table (Page-Table Base Register) is stored in a register. Changing page table requires changing only this one register, substantially reducing the context-switch time. (For a page table, with virtual page number that is 4 bytes long (32 bits) can point to 2^32 physical page frames. If frame size is 4Kb then a system with 4-byte entries can address 2^44 bytes (or 16 TB) of physical memory).
The hardware support for paging is shown in figure. Every address is divided into two parts – a page number (p) and a page offset (d). The page number is used as an index into the page table. The page table contains the base address of each page in physical memory. The base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The problem with this approach is the time required to access a user memory location. If we want to access location, we must first index into the page table (that requires a memory access of PTBR) and then find the frame number from the page table and thereafter access the required frame number. The memory access is slowed down by a factor of 2.
The standard solution to this problem is to use a special, small fast lookup hardware cache called a Translation Look-Aside Buffer (TLB). The TLB is fully associate, high-speed memory. Each entry in TLB consists of two parts –a key (tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned. The search is fast; the hardware, however is expensive. Typically, the number of entries in a TLB is small, often numbering between 64 to 1024.
The TLB is used with page tables in the following way. The TLB contains only a few of the page-table entries. When a logical address is generated by the CPU, its page number is presented to the TLB. If the page number is found its frame number is immediately availble and is used to access memory.
If the page number is not in the TLB (known as a TLB miss), a memory reference to the page table must be made. When the frame is obtained, we can use it to access memory. In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference. If the TLB is already full of entries, the operating system must select one for replacement. Replacement policies range from least recently used (LRU) to random.
A page table typically has an additional bit per entry to indicate whether the frame number is read only or both read-write (protection level). Another bit valid-invalid is also added to indicate whether the frame is part of that particular process.
Mutilevel pages tables are often used rather than single page table per process. The first level page table points to the entry in the next level page table and so on until the mapping of virtual to physical page is obtained from the last level page table.
It may be noted that caches typically use "Virtual Indexed Physically Tagged" solution that enables simultaneous look-up of both L-1 cache and TLB. If the page block is not found in L-1 cache, a TLB look-up of address is used to refresh the page block. Simultanous look-up of both L-1 cache and TLB makes the process efficient.Solihin
TLB coherence problem in Multiprocessing
TLB coherence through Shootdown
TLB coherence through invalidation
Other TLB coherence solutions
Unified Cache and TLB coherence solution
Links
References
1. Operating System Concepts - Silberschatz, Galvin, Gagne. Seventh Edition, Wiley publication
2. Parallel Compute Architecture - A Hardware software approach - David E. Culler, Jaswinder Pal Singh, Anoop Gupta
3. Fundamentals of Parallel Computer Architecture, Yan Solihin