I am trying to benchmark the blas routines dgemv and dgemm in Fortran. For that I have written this simple codes:
matmul.f90:
program test
implicit none
integer :: i, s
real(8), allocatable :: a(:,:), b(:,:), c(:,:), x(:), y(:)
real(8) :: t1, t2
print *, "Matrix-Matrix multiplication"
do i=1,9
s = 2**i
if (allocated(a)) deallocate(a)
if (allocated(b)) deallocate(b)
if (allocated(c)) deallocate(c)
allocate(a(s,s), b(s,s), c(s,s))
call random_number(a)
call random_number(b)
call cpu_time(t1)
c = matmul(a, b)
call cpu_time(t2)
write(*,*) s, t2-t1
end do
print *, "Matrix-vector multiplication"
do i=1,9
s = 2**i
if (allocated(a)) deallocate(a)
if (allocated(x)) deallocate(x)
if (allocated(y)) deallocate(y)
allocate(a(s,s), x(s), y(s))
call random_number(a)
call random_number(x)
call cpu_time(t1)
y = matmul(a, x)
call cpu_time(t2)
write(*,*) s, t2-t1
end do
end program
blas.f90:
program test
implicit none
integer :: i, s
real(8), allocatable :: a(:,:), b(:,:), c(:,:), x(:), y(:)
real(8) :: t1, t2
print *, "Matrix-Matrix multiplication"
do i=1,9
s = 2**i
if (allocated(a)) deallocate(a)
if (allocated(b)) deallocate(b)
if (allocated(c)) deallocate(c)
allocate(a(s,s), b(s,s), c(s,s))
call random_number(a)
call random_number(b)
call cpu_time(t1)
c = matmul(a, b)
call cpu_time(t2)
print *, s, t2-t1
end do
print *, "Matrix-vector multiplication"
do i=1,9
s = 2**i
if (allocated(a)) deallocate(a)
if (allocated(x)) deallocate(x)
if (allocated(y)) deallocate(y)
allocate(a(s,s), x(s), y(s))
call random_number(a)
call random_number(x)
call cpu_time(t1)
call dgemv('n', s, s, 1.d0, a, s, x, 1, 0.d0, y, 1)
call cpu_time(t2)
print *, s, t2-t1
end do
end program
Note that in the blas program i am still using matmul and i have only changed the matrix-vector multiplication
I compile both codes with:
gfortran -o matmul matmul.f90
gfortran -o blas blas.f90 -lblas
I would expect the times of the matrix-matrix multiplication to be the same, but the results I get are (i write them side-by-side):
Matrix size matmul.f90 blas.f90
2 4.0000000000000105E-005 1.7969999999999930E-003
4 1.0999999999999725E-005 1.2780000000000014E-003
8 2.3000000000000017E-005 8.0599999999997340E-004
16 4.8000000000000299E-005 2.4819999999999842E-003
32 1.1799999999999962E-004 5.3250000000000242E-003
64 3.4200000000000029E-004 1.7175999999999969E-002
128 1.2079999999999999E-003 5.3132000000000013E-002
256 4.7889999999999999E-003 0.20821500000000004
512 1.2796999999999999E-002 1.1222690000000000
Apparently the behavior of the matmul subroutine has changed in the second program and i am not really sure why. Could someone clarify this behavior to me?
Thanks!
c - matmul(a,b)to see which one given the incorrect result.-fexternal-blasoption? It will map a few Fortran subprograms such asmatmulto a BLAS routines. There is also the-finline-matmul-limit=Noption that controls the size of matrices/arrays for which gfortran will do inlining.matmulwith or without-lblasmatmulorblasas you are finding. However, in the third computer where i was testing i keep getting the result i posted above. It is definitely something on my side and i will have to investigate what could have gone wrong... Thanks!