Generalizing of a high performance parallel Strassen implementation on
Strassen's algorithm to multiply two n $\times n$ matrices reduces the asymptotic operation count from $O(n3)$ of the traditional algorithm to $O(n2.81)$, thus designing efficient parallelizing for this algorithm becomes essential. In this paper, we present our generalizing of a parallel Strassen implementation which obtained a very nice performance on an Intel Paragon: faster 20\% for n $\approx 1000$ and more than 100\% for n $\approx 5000$ in comparison to the parallel traditional algorithms (as Fox, Cannon). Our method can be applied to all the matrix multiplication algorithms on distributed memory computers that use Strassen's algorithm at the system level, hence it gives us compatibility to find better parallel implementations of Strassen's algorithm.
Full Text: PDF