1.3 Typical Structure
Early message passing machines provided hardware primitives that were very close to the simple send/receive user-level communication abstraction, with some additional restrictions. A node was connected to a fixed set of neighbors in a regular pattern by point-to-point links that behaved as simple FIFOs. Most early machines were hypercubes, where each node is connected to n other nodes differing by one bit in the binary address, for a total of 2^n nodes, or meshes, where the nodes are connect to neighbors on two or three dimensions. The network topology was especially important in the early message passing machines, because only the neighboring processors could be named in a send or receive operation. The data transfer involved the sender writing into a link and then writing the message until the receiver started reading it, so the send would block until the receive occurred. In modern terms this is called synchronous message passing because the two events coincide in time. The details of moving data were hidden from the programmer in a message passing library, forming a layer of software between send and receive calls and the actual hardware.
The direct FIFO design was soon replaced by more versatile and more robust designs which provided direct memory access (DMA) transfers on either end of the communication event. The use of DMA allowed non-blocking sends, where the sender is able to initiate a send and continue with useful computation (or even perform a receive) while the send completes. On the receiving end, the transfer is accepted via a DMA transfer by the message layer into a buffer and queued until the target process performs a matching receive, at which point the data is copying into the address space of the receiving process. The physical topology of the communication network dominated the programming model of these early machines and parallel algorithms were often stated in terms of a specific interconnection topology, e.g., a ring, a grid, or a hypercube. However, to make the machines more generally useful, the designers of the message layers provided support for communication between arbitrary processors, rather than only between physical neighbors. This was originally supported by forwarding the data within the message layer along links in the network. Soon this routing function was moved into the hardware, so each node consisted of a processor with memory, and a switch that could forward messages, called a router. However, in this store and forward approach the time to transfer a message is proportional to the number of hops it takes through the network, so there remained an emphasis on interconnection topology.
The emphasis on network topology was significantly reduced with the introduction of more general purpose networks, which pipelined the message transfer through each of the routers forming the interconnection network. In most modern message passing machines, the incremental delay introduced by each router is small enough that the transfer time is dominated by the time to simply move that data between the processor and the network, not how far it travels.This greatly simplifies the programming model; typically the processors are viewed as simply forming a linear sequence with uniform communication costs.A processor in a message passing machine can name only the locations in its local memory, and it can name each of the processors, perhaps by number or by route. A user process can only name private addresses and other processes; it can transfer data using the send/receive calls.