2D Discrete Convolution



I'm working on a program that is heavily dependent on the 2-D discrete
convolution.  The complete program needs to perform the convolution many
times.

I decided to work in maxima to take advantage of bfloats(I'm implementing an
iterative algorithm that is sensitive to round off error).   I implemented
the convolution function using for loops as shown below, but this turns out
to be extremely slow...especially for the size matrix I'd like to work with
(40x40).  In addition to running compile(conv2), is there anything I can do
to speed the computation up in maxima?

conv2(A,B) :=block([C,SA,SB,SPM,SPN,PA,PB],
SA : msize(A),  <--computes the size of matrix A
SB : msize(B),
SPM : SA[1]+SB[1]-1,
SPN : SA[2]+SB[2]-1,
PA : PAD(SPM,SPN,A),    <--resizes A
PB : PAD(SPM,SPN,B),    <--resizes B
C : zeromatrix(SPM,SPN),
for i : 1 thru SPM do(
for j : 1 thru SPN do(
for m : 1 thru i do(
for n : 1 thru j do(
C[i,j] : C[i,j]+PA[m,n]*PB[i-m+1,j-n+1])))),
return(C));


Thanks,
Chris