Fox Matrix Multiply Algorithm


The fox algorithm is a matrix multiply algorithm that uses a submatrix block cyclic data distribution.

The communication pattern is asymmetrical: rows broadcast, columns rotate.
(Note: The Cannon algorithm is similar to Fox, but uniformly does rotation).

Fox (and Cannon) treatments make the following assumptions:

(The square matrix requirement can be relaxed).

The fox algorithm:

    q = sqrt(p) // number of rows, cols in processor grid

    // A is operand 1, B is operand 2 in A * B
    // C is result

    // i,j = process row, column

    // src, dest rows for rotating 'up'
    src  = i+1 mod q;
    dest = i-1 mod q;

    for (stage = 0; stage < q; stage++) {
        k_bar = (i+stage) mod q;
        broadcast(A[i,k_bar]) to row i;
	C[i,j] = C[i,j] + A[i,k_bar]*B[k_bar,j]
        sendrecv(B[k_bar,j],src,dest);
    }   
Here is Pacheco's code