Wednesday, February 06, 2013

Communication-Avoiding Parallel Algorithms for Dense Linear Algebra and Tensor Computations: Scientific Computing and Matrix Computations Seminar

Seminar: Departmental | February 6 | 12:10-1 p.m. | 380 Soda Hall

Edgar Solomonik, UC Berkeley

Electrical Engineering and Computer Sciences (EECS)

This work is motivated by two electronic structure calculation methods: Density Functional Theory (DFT), which employs dense linear algebra, and Coupled Cluster, a method for highly correlated systems that relies heavily on contractions of symmetric tensors. I will introduce 2.5D algorithms, an extension of 3D algorithms, which are designed to minimize communication between processors. These parallel algorithms employ limited data replication to asymptotically lower communication costs relative to standard 2D algorithms (as in ScaLAPACK and Elemental). In particular, we can reduce the amount of data sent along the critical path of execution in matrix multiplication, in LU, Cholesky, and QR factorizations, in triangular solve, and in the symmetric eigenvalue problem. The number of messages sent is reduced for some of these algorithms but increased for others. This discrepancy will be justified by lower-bound proofs that show the interdependence of latency and bandwidth costs. The algorithms are practical, as we demonstrate with large-scale parallel results for a subset of them.
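To make the bandwidth/latency trade-off concrete, here is a minimal sketch of the leading-order cost models usually quoted for 2.5D algorithms. The formulas are not taken from this abstract; they are assumptions drawn from the published 2.5D analysis, with constants and lower-order terms dropped. The function names (`matmul_cost`, `lu_cost`) and the symbols `n` (matrix dimension), `p` (processor count), and `c` (number of data replicas) are illustrative only.

```python
import math


def check_replication(p, c):
    # Assumption: the replication factor c ranges from 1 (2D) to p^(1/3) (3D).
    if c < 1 or c ** 3 > p:
        raise ValueError("replication factor c must satisfy 1 <= c <= p^(1/3)")


def matmul_cost(n, p, c=1):
    """Return (words, messages) per processor for 2.5D matrix multiplication.

    Leading-order model only: words ~ n^2 / sqrt(c*p), messages ~ sqrt(p / c^3).
    """
    check_replication(p, c)
    words = n * n / math.sqrt(c * p)
    messages = math.sqrt(p / c ** 3)
    return words, messages


def lu_cost(n, p, c=1):
    """Return (words, messages) along the critical path for 2.5D LU.

    Leading-order model only: words ~ n^2 / sqrt(c*p), messages ~ sqrt(c*p),
    so words * messages stays at n^2 regardless of c.
    """
    check_replication(p, c)
    words = n * n / math.sqrt(c * p)
    messages = math.sqrt(c * p)
    return words, messages


if __name__ == "__main__":
    n, p = 1024, 64
    for c in (1, 4):  # c = 1 is the 2D case, c = 4 = p^(1/3) is the 3D case
        print("c =", c, "matmul:", matmul_cost(n, p, c), "LU:", lu_cost(n, p, c))
```

Under this model, raising `c` lowers the words sent for both kernels by a factor of `sqrt(c)`. The message count falls for matrix multiplication but rises for LU, whose product of words and messages cannot drop below roughly `n^2`.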

Time permitting, I will go on to discuss the extension of these methods to tensor contractions and considerations for multi-dimensional symmetries. These new tensor contraction algorithms are being implemented in a new parallel software library, Cyclops Tensor Framework, which already supports Coupled Cluster with single and double excitations (CCSD). This software scales on the new BlueGene/Q architecture and outperforms the CCSD implementation of NWChem on Cray XE6.