We evaluate the idea of virtualization and its usefulness in helping programmers write applications with a high degree of parallelism. We demonstrate this by developing several mini-applications with million-way parallelism. We show that Charm++ and AMPI (an extension of MPI), with their migratable objects and support for load balancing, provide a suitable programming model for programming very large machines.
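To illustrate the virtualization idea, the following is a minimal sketch in plain C++ (not the Charm++ API): the application is decomposed into many more migratable work objects than physical processors, and a runtime owns the object-to-processor mapping so that a load balancer can move objects. All names here are hypothetical.

#include <cstdio>
#include <vector>

struct WorkObject { int id; double load; /* application state would live here */ };

class Runtime {
  std::vector<int> home;                         // object id -> processor
public:
  Runtime(int objects, int procs) : home(objects) {
    for (int i = 0; i < objects; ++i) home[i] = i % procs;   // initial round-robin mapping
  }
  void migrate(int obj, int toProc) { home[obj] = toProc; }  // hook used by a load balancer
  int  location(int obj) const { return home[obj]; }
};

int main() {
  Runtime rt(1000000, 4096);                     // million-way virtual parallelism on 4K processors
  std::printf("object 42 currently lives on processor %d\n", rt.location(42));
  rt.migrate(42, 7);                             // a load balancer could move it elsewhere
  std::printf("after migration it lives on processor %d\n", rt.location(42));
}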
It is important to understand the performance of parallel applications on very large parallel machines. This thesis explores Parallel Discrete Event Simulation (PDES) techniques with an optimistic synchronization protocol to simulate parallel applications running on a very large number of processors. We optimize the synchronization protocol by exploiting the inherent determinacy commonly found in parallel applications, which reduces synchronization overhead significantly.
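The following is a minimal, illustrative sketch (not the actual simulator implementation) of one logical process under an optimistic protocol: state is checkpointed before each event so that a straggler event can trigger a rollback, and the deterministic flag marks where an application's inherent determinacy would let the simulator skip checkpointing and rollback altogether. Anti-messages and global virtual time computation are omitted, and all names are hypothetical.

#include <queue>
#include <utility>
#include <vector>

struct Event { double timestamp; int payload; };
struct LaterFirst {
  bool operator()(const Event& a, const Event& b) { return a.timestamp > b.timestamp; }
};

class LogicalProcess {
  double now = 0.0;                                  // local virtual time
  long   state = 0;                                  // toy stand-in for LP state
  bool   deterministic;                              // assumed declared by the application
  std::priority_queue<Event, std::vector<Event>, LaterFirst> pending;
  std::vector<std::pair<Event, long>> processed;     // each event with the state saved before it
public:
  explicit LogicalProcess(bool det) : deterministic(det) {}

  void receive(const Event& e) {
    // A straggler (an event in this LP's past) forces a rollback unless the
    // application is deterministic, in which case no rollback is needed.
    if (e.timestamp < now && !deterministic) rollback(e.timestamp);
    pending.push(e);
  }

  void step() {                                      // process the earliest pending event
    if (pending.empty()) return;
    Event e = pending.top(); pending.pop();
    if (!deterministic) processed.push_back({e, state});  // checkpoint before executing
    now = e.timestamp;
    state += e.payload;                              // stand-in for real event handling
  }

private:
  void rollback(double t) {                          // undo all events at or after time t
    while (!processed.empty() && processed.back().first.timestamp >= t) {
      state = processed.back().second;               // restore the pre-event state
      pending.push(processed.back().first);          // re-schedule the undone event
      processed.pop_back();
    }
    now = t;
  }
};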
Load imbalance presents a significant challenge to achieving application scalability on very large machines. We study load balancing techniques and develop a spectrum of load balancing strategies based on studies of application characteristics. These strategies are motivated by applications such as LeanMD and NAMD (both classical molecular dynamics applications) and Fractography3D (a dynamic 3D crack propagation simulation program). We optimize our load balancing strategies along multiple dimensions, including load balancing for improved communication locality, sub-step load balancing, and computation phase-aware load balancing.
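As a simple illustration of measurement-based load balancing (a sketch only, not one of the strategies developed in this thesis), the following greedy assignment places the heaviest measured objects on the currently least loaded processors; a communication-aware variant would add an affinity term favoring processors that already host an object's frequent communication partners.

#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Obj { int id; double load; };                 // ids assumed to be 0..N-1

std::vector<int> greedyAssign(std::vector<Obj> objs, int numProcs) {
  std::vector<int> assignment(objs.size(), -1);      // object id -> processor
  using P = std::pair<double, int>;                  // (current load, processor id)
  std::priority_queue<P, std::vector<P>, std::greater<P>> procs;
  for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

  // Place the heaviest objects first so large items do not unbalance the tail.
  std::sort(objs.begin(), objs.end(),
            [](const Obj& a, const Obj& b) { return a.load > b.load; });

  for (const Obj& o : objs) {
    P least = procs.top(); procs.pop();              // least loaded processor so far
    assignment[o.id] = least.second;
    procs.push({least.first + o.load, least.second});
  }
  return assignment;
}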
We further study the performance of existing load balancing strategies in the context of very large parallel machines using the parallel simulator we developed. We demonstrate the weaknesses of centralized and fully distributed load balancing schemes via large-scale simulation, and design a new scalable hierarchical load balancing scheme suitable for such large machines. This hierarchical scheme builds its load database fully automatically by instrumenting the application's computation and communication patterns at run time, and it explicitly takes the communication pattern into account when making load balancing decisions. It also incorporates an explicit memory-cost control mechanism, allowing it to adapt to extremely large numbers of processors with a small memory footprint.
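The hierarchical idea can be sketched as follows, under the simplifying assumption that processors are organized into a fixed tree of groups: each group reports only a compact summary upward, and a parent decides how much load each child subtree should shed or absorb, so no single processor ever holds per-object data for the whole machine. The code is illustrative only; all names are hypothetical.

#include <numeric>
#include <vector>

struct GroupSummary { double totalLoad; int numProcs; };   // compact, constant-size per group

// Leaf level: summarize per-processor loads within one group.
GroupSummary summarize(const std::vector<double>& procLoads) {
  double total = std::accumulate(procLoads.begin(), procLoads.end(), 0.0);
  return {total, static_cast<int>(procLoads.size())};
}

// Parent level: using only the compact child summaries, compute how much load
// each child subtree should shed (positive) or absorb (negative) to reach the
// global average; per-object details never leave the child subtrees.
std::vector<double> computeTransfers(const std::vector<GroupSummary>& children) {
  double totalLoad = 0.0;
  int totalProcs = 0;
  for (const GroupSummary& g : children) { totalLoad += g.totalLoad; totalProcs += g.numProcs; }
  double avg = totalLoad / totalProcs;
  std::vector<double> delta;
  for (const GroupSummary& g : children)
    delta.push_back(g.totalLoad - avg * g.numProcs);
  return delta;
}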