Overview

Computational scientists working on behalf of the Department of Energy’s Office of Science (DOE SC) are exploiting a new generation of petascale computing resources to make previously inaccessible discoveries in a broad range of disciplines including physics, chemistry, and materials science. The computational systems underpinning this work will increase in performance potential from tens to hundreds of PFlop/s, but in the process will evolve significantly from those in use today. Although Moore’s law continues unabated, the end of Dennard scaling has necessitated a fundamental shift in computer architecture toward power efficiency. To that end, processors are increasingly varied as they strive to meet performance, productivity, reliability, and energy-efficiency goals in the face of divergent computational requirements. Today, we see three major classes of systems: those built from commodity processors; those built from processors specialized for energy-efficient HPC (e.g., IBM Blue Gene/Q); and those built with accelerators (e.g., GPUs). The diversity among these machines presents a number of challenges to merely porting today’s scientific applications, much less achieving good performance.

Extrapolating five years, we anticipate that vastly increased scale (e.g., more chips, 4-8x the cores per chip, wider SIMD) and heterogeneity will exacerbate performance optimization challenges while simultaneously pushing the issues of energy consumption and resilience to the forefront. Just as today’s DOE computing centers incentivize performance optimization through finite computing allocations, they may incentivize energy efficiency by reducing the charges (in terms of CPU hours) for reduced-power jobs. Moreover, as DRAM replacements (e.g., phase change, resistive, spin-transfer torque) appear in DOE’s leadership-class systems, computational scientists must learn to exploit the resultant asymmetric read/write bandwidths and latencies. Thus, it is imperative that application scientists be provided with solutions to productively maximize performance, conserve energy, and attain resilience.

To ensure that DOE’s computational scientists can successfully exploit the emerging generation of high-performance computing (HPC) systems, the University of Southern California (USC) is leading the Institute for Sustained Performance, Energy, and Resilience (SUPER). We have organized a broadly based project with expertise in compilers and other system tools, performance engineering, energy management, and resilience. Our accomplishments during the first two years of the project include an integrated autotuning tool framework, performance optimization of SciDAC applications, and strategies for reducing energy consumption and mitigating faults.