Publications

Filters: Author is J. Dongarra
2012
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI", 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012), Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds., Springer-Verlag, Rhodes, Greece, August 27-31, 2012.
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, "An Evaluation of User-Level Failure Mitigation Support in MPI", Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI, Springer, Vienna, Austria, September 23-26, 2012.
2011
Ltaief, H., P. Luszczek, and J. Dongarra, "Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency", International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.
Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, "On Scalability for MPI Runtime Systems", no. ICL-UT-11-05: Innovative Computing Laboratory, University of Tennessee, May 2011.
2010
Bosilca, G., C. Coti, T. Herault, P. P. Lemarinier, and J. Dongarra, "Constructing Resilient Communication Infrastructure for Runtime Environments", Parallel Computing: From Multicores and GPU's to Petascale: IOS Press, pp. 441-451, 2010.
Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, "Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs", no. UT-CS-10-663: Innovative Computing Laboratory, University of Tennessee, November 2010.
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, "Self-Healing Network for Scalable Fault-Tolerant Runtime Environments", Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.
2009
Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, "Algorithmic Based Fault Tolerance Applied to High Performance Computing", Journal of Parallel and Distributed Computing, vol. 69, no. 4, pp. 410-416, April 2009.
2008
Chen, Z., and J. Dongarra, "Algorithm-Based Fault Tolerance for Fail-Stop Failures", IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, 2008.
2007
Langou, J., Z. Chen, G. Bosilca, and J. Dongarra, "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment", SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 102-116, 2007.
2006
Chen, Z., and J. Dongarra, "Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources", 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, 2006.