Matrices Over Runtime Systems @ Exascale

Authors: Emmanuel Agullo (French Institute for Research in Computer Science and Automation), Olivier Aumage (French Institute for Research in Computer Science and Automation), George Bosilca (University of Tennessee, Knoxville), Bérenger Bramas (French Institute for Research in Computer Science and Automation), Alfredo Buttari (French National Center for Scientific Research and Toulouse Institute of Computer Science Research), Olivier Coulaud (French Institute for Research in Computer Science and Automation), Eric Darve (Stanford University), Jack Dongarra (University of Tennessee, Knoxville), Mathieu Faverge (French Institute for Research in Computer Science and Automation), Nathalie Furmento (French National Center for Scientific Research and Toulouse Institute of Computer Science Research), Luc Giraud (French Institute for Research in Computer Science and Automation), Abdou Guermouche (University of Bordeaux and French Institute for Research in Computer Science and Automation), Julien Langou (University of Colorado Denver), Florent Lopez (French National Center for Scientific Research and Toulouse Institute of Computer Science Research), Hatem Ltaief (King Abdullah University of Science & Technology), Samuel Pitoiset (French Institute for Research in Computer Science and Automation), Florent Pruvost (French Institute for Research in Computer Science and Automation), Marc Sergent (French Institute for Research in Computer Science and Automation), Samuel Thibault (University of Bordeaux and French Institute for Research in Computer Science and Automation), Stanimire Tomov (University of Tennessee, Knoxville)

Abstract: The goal of the Matrices Over Runtime Systems @ Exascale (MORSE) project is to design dense and sparse linear algebra methods that achieve the fastest possible time to an accurate solution on large-scale multicore systems with GPU accelerators, using all the processing power that future high-end systems can make available. We propose a framework for describing matrix algorithms with a sequential expression at a high level of abstraction and delegating the actual execution to a runtime system. In this poster, we show that this model makes it possible to (1) achieve excellent scalability on heterogeneous clusters, (2) design advanced numerical algorithms, and (3) comply with standards such as OpenMP 4.0 or possible extensions of this standard. We illustrate our methodology on three classes of problems: dense linear algebra, sparse direct methods, and fast multipole methods. The resulting codes have been incorporated into the Chameleon, qr_mumps, and ScalFMM solvers, respectively.
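To make the programming model concrete, the following is a minimal sketch of a tiled Cholesky factorization written as plain sequential loops, with each tile kernel submitted as a task that declares which tiles it reads and writes. It is not the Chameleon, qr_mumps, or ScalFMM code and does not use any real runtime API: the `TaskFlow` class, its `submit` and `wait_all` methods, the kernel names, and the test matrix are all hypothetical and chosen for illustration. The toy runtime infers dependencies from the declared accesses, which is the same kind of information that OpenMP 4.0 task `depend` clauses convey.

```python
import math
from concurrent.futures import ThreadPoolExecutor


class TaskFlow:
    """Toy task runtime (hypothetical): infers dependencies from tile accesses."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.last_writer = {}  # tile id -> future of the last task writing it
        self.readers = {}      # tile id -> futures reading it since that write
        self.all = []

    def submit(self, fn, reads=(), writes=()):
        deps = [self.last_writer[h] for h in reads if h in self.last_writer]
        for h in writes:  # a write waits for the previous writer and readers
            if h in self.last_writer:
                deps.append(self.last_writer[h])
            deps.extend(self.readers.get(h, []))

        def run():
            # Blocking is safe here: dependencies are always earlier tasks,
            # and the pool starts tasks in submission order.
            for d in deps:
                d.result()
            fn()

        fut = self.pool.submit(run)
        for h in reads:
            self.readers.setdefault(h, []).append(fut)
        for h in writes:
            self.last_writer[h] = fut
            self.readers[h] = []
        self.all.append(fut)

    def wait_all(self):
        for f in self.all:
            f.result()  # re-raises any kernel exception
        self.pool.shutdown()


def potrf(a):
    """In-place Cholesky of a diagonal tile; L ends up in the lower part."""
    n = len(a)
    for j in range(n):
        d = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(j)))
        a[j][j] = d
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(j))) / d


def trsm(l, x):
    """Overwrite x with the solution X of X * L^T = x (L lower triangular)."""
    for row in x:
        for j in range(len(l)):
            row[j] = (row[j] - sum(row[p] * l[j][p] for p in range(j))) / l[j][j]


def gemm(a, b, c):
    """c -= a * b^T."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            c[i][j] -= sum(a[i][p] * b[j][p] for p in range(n))


def tiled_cholesky(tiles, nt, flow):
    """Sequential expression of the algorithm; execution is left to `flow`."""
    for k in range(nt):
        flow.submit(lambda k=k: potrf(tiles[k, k]), writes=[(k, k)])
        for i in range(k + 1, nt):
            flow.submit(lambda i=i, k=k: trsm(tiles[k, k], tiles[i, k]),
                        reads=[(k, k)], writes=[(i, k)])
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                flow.submit(
                    lambda i=i, j=j, k=k: gemm(tiles[i, k], tiles[j, k], tiles[i, j]),
                    reads=[(i, k), (j, k)], writes=[(i, j)])
    flow.wait_all()


def demo(nt=3, b=4):
    """Factor a small diagonally dominant test matrix; return max |A - L L^T|."""
    n = nt * b
    entry = lambda r, c: 1.0 / (1 + abs(r - c)) + (n if r == c else 0.0)
    tiles = {(i, j): [[entry(i * b + r, j * b + c) for c in range(b)]
                      for r in range(b)]
             for i in range(nt) for j in range(i + 1)}
    tiled_cholesky(tiles, nt, TaskFlow())
    L = lambda r, c: tiles[r // b, c // b][r % b][c % b] if c <= r else 0.0
    return max(abs(entry(r, c) - sum(L(r, p) * L(c, p) for p in range(c + 1)))
               for r in range(n) for c in range(r + 1))
```

Calling `demo()` returns the largest entrywise difference between the test matrix and the product of the computed factor with its transpose, which should be at the level of floating-point rounding. Note that `tiled_cholesky` contains no synchronization of its own: the loop nest reads like the sequential algorithm, and the ordering constraints come entirely from the declared tile accesses.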

Poster: pdf
Two-page extended abstract: pdf
