CSC/ECE 506 Fall 2007/wiki2 3 pa
Revision as of 04:03, 24 September 2007
Topic: Parallelizing an application
Pick another parallel application, not covered in the text, and less than 7 years old, and describe the various steps in parallelizing it (decomposition, assignment, orchestration, and mapping). You may use an example from the peer-reviewed literature, or a Web page. You do not have to go into great detail, but you should describe enough about these four stages to make the algorithm interesting.
LAMMPS Algorithm:
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code developed at Sandia National Laboratories, New Mexico. It models an ensemble of particles in a solid, liquid, or gaseous state.
Sequential Algorithm:
The initialization step sets up the parameters of the simulation, such as the number of particles, their initial velocities, and the temperature. The algorithm is then performed for every atom.
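The initialization step can be illustrated with a minimal Python sketch. This is not LAMMPS code (LAMMPS itself is written in C++); the function names `initialize` and `kinetic_temperature`, the random placement of atoms, and the use of reduced units (mass = 1, k_B = 1) are all assumptions made for illustration.

```python
import random

def initialize(n_atoms, temperature, box=10.0, seed=0):
    """Return positions and velocities for n_atoms.

    Hypothetical helper: reduced units (mass = 1, k_B = 1) and random
    placement inside a cubic box are assumptions of this sketch.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_atoms)]
    vel = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]

    # Remove the net momentum so the system as a whole does not drift.
    for d in range(3):
        mean = sum(v[d] for v in vel) / n_atoms
        for v in vel:
            v[d] -= mean

    # Rescale velocities so the kinetic temperature matches the target.
    scale = (temperature / kinetic_temperature(vel)) ** 0.5
    for v in vel:
        for d in range(3):
            v[d] *= scale
    return pos, vel

def kinetic_temperature(vel):
    """Simplified kinetic temperature: T = sum(v^2) / (3 N)."""
    return sum(c * c for v in vel for c in v) / (3 * len(vel))
```

Calling `initialize(100, 1.5)` returns 100 positions and 100 velocities whose kinetic temperature, as defined by the simplified formula above, equals the requested value up to rounding.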
Decomposition & Assignment
The LAMMPS algorithm provides two levels of concurrency within a single time step, just like the ocean problem. Function parallelism is exploited across the grid, where quantities such as the force, energy, temperature, and pressure of the atoms are computed. Data parallelism is exploited within each function, which is applied to different data sets.
The LAMMPS algorithm decomposes the domain into a set of equal-sized boxes. Since nearby atoms are placed on the same processor, only atoms that neighbor each other across processor boundaries need to be communicated. Communication is kept to a minimum by replicating the force computations for boundary atoms. To increase computational efficiency, the algorithm uses different timescales for different force computations.
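The decomposition into equal-sized boxes can be sketched as follows. This is an illustrative Python sketch, not the LAMMPS implementation: the helper names (`owner_box`, `decompose`, `is_boundary_atom`), the cubic domain starting at the origin, and the periodic boundaries are assumptions.

```python
def owner_box(position, box_length, grid):
    """Return the (i, j, k) index of the sub-domain that owns an atom.

    Assumes a cubic domain [0, box_length] split into grid = (nx, ny, nz)
    equal-sized boxes.
    """
    return tuple(
        min(int(x / box_length * n), n - 1)  # clamp atoms on the upper edge
        for x, n in zip(position, grid)
    )

def decompose(positions, box_length, grid):
    """Group atom indices by the sub-domain (processor) that owns them."""
    owned = {}
    for idx, pos in enumerate(positions):
        owned.setdefault(owner_box(pos, box_length, grid), []).append(idx)
    return owned

def is_boundary_atom(position, box_length, grid, cutoff):
    """True if the atom lies within `cutoff` of a face of its sub-domain.

    Such atoms interact with atoms owned by a neighboring processor, so
    they are the ones that must be communicated (periodic domain assumed).
    """
    for x, n in zip(position, grid):
        width = box_length / n
        offset = x % width
        if offset < cutoff or width - offset < cutoff:
            return True
    return False
```

With a 10.0-unit domain split into a 2 x 2 x 2 grid, an atom at (2.5, 2.5, 2.5) sits in the interior of box (0, 0, 0), while an atom at (4.8, 2.5, 2.5) belongs to the same box but lies within a cutoff of 1.0 of the face shared with box (1, 0, 0), so only the second atom needs to be exchanged.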
Orchestration
For computational efficiency, LAMMPS uses neighbor lists to keep track of neighboring particles. The lists are optimized for systems with particles that are repulsive at short distances, so that the local density of particles never becomes too large. On parallel machines, LAMMPS uses spatial-decomposition techniques to partition the simulation domain into small 3D sub-domains, one of which is assigned to each processor.
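The idea of a neighbor list can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the scheme LAMMPS uses internally: the function name `build_neighbor_list`, the default skin distance of 0.3, and the absence of periodic boundaries are all choices made for this example.

```python
def build_neighbor_list(positions, cutoff, skin=0.3):
    """Return, for each atom, the indices of atoms within cutoff + skin.

    The extra `skin` distance lets the list be reused for several time
    steps before it has to be rebuilt. Non-periodic, O(N^2) sketch.
    """
    reach_sq = (cutoff + skin) ** 2
    neighbors = [[] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist_sq = sum((a - b) ** 2
                          for a, b in zip(positions[i], positions[j]))
            if dist_sq <= reach_sq:
                # Store the pair in both lists so each atom sees all neighbors.
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

Force computations then loop only over the stored neighbors of each atom instead of over all pairs. Because short-range repulsion keeps the local density bounded, each list stays short, which is what makes the approach efficient.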
Mapping
The LAMMPS suite uses the message-passing parallel computing model. This implies that each processor has its own private copy of the data it works on. It performs its operations and sends, receives, and broadcasts data as necessary. The LAMMPS suite defines a Universe to which all the processors belong. It can also define a number of Worlds within the Universe, in case different, unrelated simulations are to be run at the same time. If all the available processors are used to tackle a single problem, the Universe is said to contain one World. Each processor has its own copy of the LAMMPS suite and knows some information about the Universe: its processor ID, the number of processors in the Universe, the World it belongs to, the number of processors in its World, and the total number of Worlds. In each World, one processor is designated the Root processor. In addition, the message-passing interface is set up on each processor so that it can communicate with its six neighboring processors in its three-dimensional World.
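The Universe/World bookkeeping and the six-neighbor layout can be sketched in plain Python, without an actual message-passing library. Everything here is an assumption made for illustration: the helper names, the equal-sized Worlds, the row-major numbering of processors in the 3D grid, the periodic wrap-around, and the choice of rank 0 in each World as its Root processor.

```python
def split_universe(universe_size, n_worlds):
    """Map each universe rank to (world id, rank within that world).

    Assumes equal-sized worlds; world rank 0 plays the role of Root.
    """
    if universe_size % n_worlds != 0:
        raise ValueError("universe size must be a multiple of n_worlds")
    world_size = universe_size // n_worlds
    return {rank: (rank // world_size, rank % world_size)
            for rank in range(universe_size)}

def rank_to_coords(rank, grid):
    """Convert a world rank to (i, j, k) in a row-major processor grid."""
    _, py, pz = grid
    return (rank // (py * pz), (rank // pz) % py, rank % pz)

def coords_to_rank(coords, grid):
    """Inverse of rank_to_coords."""
    _, py, pz = grid
    i, j, k = coords
    return (i * py + j) * pz + k

def six_neighbors(rank, grid):
    """Ranks of the -x, +x, -y, +y, -z, +z neighbors (periodic wrap)."""
    coords = rank_to_coords(rank, grid)
    result = []
    for axis in range(3):
        for step in (-1, 1):
            shifted = list(coords)
            shifted[axis] = (shifted[axis] + step) % grid[axis]
            result.append(coords_to_rank(shifted, grid))
    return result
```

For a World of 27 processors arranged as a 3 x 3 x 3 grid, `six_neighbors(13, (3, 3, 3))` returns `[4, 22, 10, 16, 12, 14]`, the six processors that share a face with the processor at grid position (1, 1, 1). In a real message-passing program, these are the ranks a processor would exchange boundary atoms with.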