0

I would like to optimize for speed the following block of code:

DO i=1, dim1
  DO j=1, dim2
    DO k=1, dim3
      IF (A(k,j,i)>0) &
        B(k,j,i) = exp(C(k))/A(k,j,i)
    ENDDO
  ENDDO
ENDDO

Very importantly, A is an INTEGER and B and C are COMPLEX!

There are two issues: 1) How to replace this by a BLAS/LAPACK call? The issue is the condition. 2) Calculation of the exp is slow. How to speed that up?

4
  • Please ask one question per post. Commented Sep 19, 2018 at 11:20
  • 2
    Do you have any reason to think that the code can be sped up? One limit to consider is the memory bandwidth. How dense is the condition on A (proportion of positive elements here)? exp is slower than simpler arithmetics, I see no way around it. Also, you can use the where-end where construct to express the conditional. Commented Sep 19, 2018 at 11:25
  • 5
    Given you only need to evaluate exp(c(k)) at most dim3 times, you could look at not doing it dim1*dim2*dim3 times. Commented Sep 19, 2018 at 11:32
  • [Assuming C(k) is independent of the necessary things of course.] Commented Sep 19, 2018 at 13:44

2 Answers 2

3

I ran a couple of tests with idim[1-3] being various permutation of [40,40,1000] and found that using a temporary array for the exponential and keeping the original loop ordering to be fastest by a factor of 2 or more than the other answer supplied. You milage may vary with compiler etc.

d=exp(c)
DO i=1, dim1
  DO j=1, dim2
    DO k=1, dim3
      IF (A(k,j,i)>0) &
        B(k,j,i) = d(k)/A(k,j,i)
    ENDDO
  ENDDO
ENDDO
Sign up to request clarification or add additional context in comments.

1 Comment

I see. It makes sense to pre-compute the exp and store the result. Changing the loop order is not efficient because then accessing A is very slow. Thanks !
2
DO k=1, dim3
  expCk= exp(C(k))
  DO i=1, dim1
    DO j=1, dim2
      IF (A(k,j,i)>0) &
        B(k,j,i) = expCk/A(k,j,i)
    ENDDO
  ENDDO
ENDDO

I don't think that any BLAS/LAPACK function can be of help here. Inversion of matrix elements is not an operation encountered in linear algebra problems.

8 Comments

@francescalus: right, a diagonal matrix is a special case of banded. I would expect that adapting the matrix storage to this case will require twists. Also need to check if pure diagonal matrices are supported, and what the behavior is in case of zero denominators.
Might be worth noting that this is only faster if A is 'typically' nonzero. If it's almost exclusively 0, then some expCk values will never be needed.
And one has to be careful that c isn't a function or if it is, behaves nicely. (Question simply said complex, not complex array.)
@francescalus: that would make no difference.
@Ross: faster is highly unlikely, it only occurs if there is less than one nonzero per index k. But even so, the single evaluation of expCk is probably negligible in front of the dim1 * dim2 tests for zero.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.