Introduction for Special Issue on Autotuning
Leonid Oliker and Richard Vuduc
Power constraints have been driving new generations of machine architectures with vast degrees of concurrency, unprecedentedly deep memory hierarchies, wide data parallelism, and heterogeneous acceleration technologies, among other features. Because the overall architectural design space is large and complex, emerging systems in the power-wall era show considerable diversity.
This increasing architectural diversity is already visible in today’s petascale systems, and will certainly be exacerbated as we move towards tomorrow’s exascale platforms. These state-of-the-art systems, like all computing platforms, are increasingly relying on software-controlled on-chip parallelism to manage the complex tradeoffs between performance, power-efficiency, and reliability. Consequently, the programming challenges to attain the necessary productivity, performance, energy-efficiency, and portability have become much more daunting. Progress in high-end computing will suffer without productive programming models and tools that allow efficient utilization of these systems.
Thus, this special issue focuses on state-of-the-art methods for addressing these challenges via automated performance tuning (autotuning), which holds great promise for increasing both programming productivity and performance on emerging architectural designs. The basic idea behind autotuning is to view a computation or program as having many possible implementations, from which the autotuning system selects the best one, guided by a combination of models and empirical search (i.e., running candidate implementations). The major research problems center on two themes: (i) how to identify and enumerate the space of candidate implementations, and (ii) how to search this space effectively. Examples of research in these two thrusts include generalizing a compiler infrastructure to generate many candidates, and using machine learning to identify the fastest or most energy-efficient implementation.
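The select-by-empirical-search idea can be sketched in a few lines of Python. The candidate implementations below are illustrative stand-ins (not taken from any framework in this issue); a real autotuner would generate its candidates automatically and combine timing with performance models:

```python
import timeit

# Two hypothetical candidate implementations of the same computation
# (sum of squares); a real autotuner would generate many variants.
def candidate_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def candidate_generator(n):
    return sum(i * i for i in range(n))

def autotune(candidates, n, repeats=3):
    """Empirical search: time each candidate and keep the fastest."""
    best, best_time = None, float("inf")
    for impl in candidates:
        t = min(timeit.repeat(lambda: impl(n), number=5, repeat=repeats))
        if t < best_time:
            best, best_time = impl, t
    return best

best = autotune([candidate_loop, candidate_generator], n=10_000)
```

All candidates must compute the same result; only their cost differs, which is what makes the search purely a performance decision.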
This special issue begins with several articles that facilitate the autotuning process. Basu et al. review the state of the art, exploring the barriers to autotuning adoption and the future research directions needed to overcome them. The authors discuss specific challenges and opportunities in integrating both compiler and programming intelligence into the autotuning framework. A related challenge is explored in the paper by Chen et al., which considers how to reduce the cost of autotuning integration. The authors’ framework analyzes the target application at the source level and infers information about potential tunable parameters, thus greatly reducing the burden of producing a tunable parameter space. Chaimov et al. then discuss tool technology for these autotuning challenges, which enables reuse of generated performance data. Their results show that this knowledge can be leveraged for runtime selection of specialized function variants and for reducing the number of autotuning evaluation phases.
Our special issue then discusses novel approaches for improving important components of the autotuning infrastructure. Tavarageri et al. examine adaptive parametrically tiled code generation for parallel contexts, which allows tile sizes to be changed on-the-fly during execution. This methodology enables several tile sizes to be evaluated within an individual run, thus enhancing the potential of autotuning for numerous important calculations.
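To illustrate the parametric-tiling idea (this is a minimal sketch, not the authors’ code generator), making the tile size a runtime parameter rather than a compile-time constant lets several tile sizes be evaluated within a single execution:

```python
import time

def tiled_sum(matrix, n, tile):
    """Sum an n-by-n matrix in tile-by-tile blocks. Because `tile` is a
    runtime parameter, different sizes can be tried without recompiling."""
    total = 0.0
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    total += matrix[i][j]
    return total

n = 256
matrix = [[1.0] * n for _ in range(n)]
timings = {}
for tile in (16, 32, 64):  # several tile sizes evaluated in one run
    t0 = time.perf_counter()
    tiled_sum(matrix, n, tile)
    timings[tile] = time.perf_counter() - t0
best_tile = min(timings, key=timings.get)
```

In a generated parametrically tiled code the same principle applies to multi-level loop nests, where the savings from avoiding one compile-run cycle per tile size are far larger.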
The next two articles consider ‘‘domain-specific’’ autotuning, in the specific context of dense linear algebra. Fabregat-Traver et al. address the problem of automatically generating algorithms for linear algebra operations by leveraging problem-specific knowledge. This work presents the design of a domain-specific compiler that uses input information to perform data-dependency analysis, identify redundant computation, and reuse intermediate results, significantly improving the performance of certain classes of algorithms. The second article, by Marker et al., takes a similarly top-down approach, using transformations to encode expert domain knowledge in a form that allows automatic derivation of dense linear algebra implementations. This methodology generates a space of semantically equivalent versions from a high-level understanding of the underlying algorithms and prunes suboptimal algorithms from the resulting search space.
Finally, a critical step in enabling autotuning framework development is gaining a detailed understanding of the benefits and tradeoffs of key optimization methods. The case study by Ibrahim et al. analyzes a gyrokinetic fusion code and explores the impact of multi- and manycore-centric optimization schemes across a broad spectrum of modern processor technologies. The results demonstrate the significant challenges of effective thread parallelization, data locality, and fine-grained data synchronization, while exploring memory-footprint reduction and novel performance optimization techniques.
By way of historical note, this special issue represents a snapshot of current research as presented at the last of a series of technical meetings sponsored by the US Department of Energy (DOE). That series, organized through the Center for Scalable Application Development Software (CScADS) as a set of annual summer workshops, served in part to build and expand the community of autotuning (among several other topic areas) through common software infrastructures and standards. Several of the infrastructures covered by this special issue aim to act in this role as community-driven frameworks. Numerous research and funding activities are represented by the studies described in this special issue. In particular, several of the efforts are supported under the DOE SciDAC Institute for Sustained Performance, Energy, and Resilience (SUPER).
As computing systems evolve towards more complex and diverse designs on the road to exascale, autotuning will no doubt play an increasingly critical role. The approaches described in these articles constitute key technology areas of great promise for improving programmer productivity and system utilization on next-generation platforms. We hope you enjoy this special issue.