Programming Methodologies beyond petascale, based on adaptive runtime systems
Joint Laboratory for Petascale Computing Workshop (JLPC) 2009
Publication Type: Talk
Summary
Several PetaFLOPS-class machines have appeared during the past year, and many multi-PetaFLOPS machines are in the works. It will be a substantial challenge to make existing parallel CSE applications run efficiently on them, and even more challenging to design new applications that can effectively leverage their large computational power. Multicore chips and SMP nodes are becoming popular and pose challenges of their own. Further, a new set of productivity challenges arises, especially if we wish to have a broader set of applications and people use these machines. I will review a set of techniques, incorporated in the Charm++ system, that have proved useful in my group’s work on multiple parallel applications that have scaled to tens of thousands of processors on machines such as Blue Gene/L, Blue Gene/P, Cray XT3, and XT4. These techniques were developed in the context of our experience with several applications, ranging from quantum chemistry and biomolecular simulations to the simulation of solid propellant rockets and computational astronomy. I will identify new challenges and potential solutions for the performance issues, and address the issues presented by multicore chips and SMP nodes. Finally, I will review some new and old ideas for substantially increasing productivity in parallel programming.