Khan, M M., P. Basu, G. Rudy, M. W. Hall, C. Chen, and J. Chame, "A script-based autotuning compiler system to generate high-performance CUDA code", ACM Transactions on Architecture and Code Optimization (TACO), January 2013, vol. 9, issue 4, 2012.
Balaprakash, P., S. M. Wild, and B. Norris, "SPAPT: Search Problems in Automatic Performance Tuning", In Proceedings of the International Conference on Computational Science (ICCS 2012), vol. Procedia Computer Science, no. ANL/MCS-P1872-0411, pp. 1959--1968, 2012.
Ermler, W., J. Tilson, and R. J. Fowler, "Spin-orbit configuration interaction calculations of electronic spectra of RuO2+ and OsO2+ catalytic cores", Southwest Regional Meeting of the American Chemical Society (SWRMACS 2012), Baton Rouge, LA, 2012.
Rahman, S M F., J. Guo, A. Bhat, C. Garcia, M. Haque Sujon, Q. Yi, C. Liao, and D. J. Quinlan, "Studying the impact of application-level optimizations on the power consumption of multi-core architectures", Computing Frontiers Conference, 15-17 May 2012, Cagliari, Italy, Association for Computing Machinery, 2012.
Rahman, S M F., J. Guo, A. Bhat, C. Garcia, M. Haque Sujon, Q. Yi, C. Liao, and D. J. Quinlan, "Studying The Impact Of Application-level Optimizations On The Power Consumption Of Multi-Core Architectures", ACM International Conference on Computing Frontiers 2012 (CF'12), Cagliari, Italy, May 15th - 17th, 2012.
Mandal, A., R. Fowler, and A. Porterfield, "System-wide Introspection for Accurate Attribution of Performance Bottlenecks", Workshop on High-performance Infrastructure for Scalable Tools (WHIST), Venice, Italy, 06/2012.
Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, "On Scalability for MPI Runtime Systems", no. ICL-UT-11-05: Innovative Computing Laboratory, University of Tennessee, May, 2011.
Olivier, S., A. Porterfield, K. Wheeler, and J. Prins, "Scheduling Task Parallelism on Multi-Socket Multicore Systems", International Workshop on Runtime and Operating Systems for Supercomputers, Tucson, AZ, USA, ACM, June, 2011.
Vo, A., S. Aananthakrishnan, G. Gopalakrishnan, B. R. de Supinski, M. Schulz, and G. Bronevetsky, "A Scalable and Distributed Dynamic Formal Verifier for MPI Programs", Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis SC '10: IEEE Computer Society, Washington, DC, pp. 1-10, November, 2010.
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, "Self-Healing Network for Scalable Fault-Tolerant Runtime Environments", Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March, 2010.
Lim, M Y., A. Porterfield, and R. Fowler, "SoftPower: Fine-Grain Power Estimations Using Performance Counters", The ACM International Symposium on High Performance Distributed Computing (HPDC), Chicago, IL, ACM, pp. 308-311, 2010.