1

The function below is used in my program the most.

    CONTAINS
     SUBROUTINE Delta4(R,Del)
      REAL(DP), DIMENSION(:), INTENT(IN) :: R
      REAL(DP), DIMENSION(:), INTENT(OUT) ::  Del
      INTEGER(I4B)  :: i, j, r_n, c_n
      REAL(DP) :: ar
      !r_n=size(R,1);
      Del=0.0_dp
      do i=1,4; ar=ABS(R(i))
          if (ar <= 1.0_dp) then
            Del(i)=(3.0_dp-2.0_dp*ar+  &
              sqrt(1.0_dp+4.0_dp*ar-4.0_dp*ar*ar))*0.125_dp
          else !! if (1.0_dp < ar .and. ar <= 2.0_dp)  then
            Del(i)=(5.0_dp-2.0_dp*ar-  &
              sqrt(-7.0_dp+12.0_dp*ar-4.0_dp*ar*ar))*0.125_dp
          end if
      end do

R, Del : length 4 vector

So I want to improve the speed of this function.
I as far as know, the if-else branch is slow. Furthermore, it is in a do loop. How can I optimize it?

3
  • I have added an interesting improvement. Commented Mar 13, 2019 at 8:55
  • You say that if-else branch is slow : what measurement did you do ? Optimizing without measuring is non-sense ! Commented Mar 13, 2019 at 17:26
  • Sorry Francois, you're right. I think if-else is slow, because it is Conditional statement. Commented Mar 14, 2019 at 1:58

1 Answer 1

2

IMO there is really little that you can gain on this function, which is essentially a bunch of arithmetic operations.

You can absorb the *0.125_dp in the other constants.

Better, you can rewrite the computation as (pseudocode)

ar= 1 + 2 * ((ar > 1) - ar)
Del= (2 + ar + sqrt(1 - ar * ar)) * 0.125

This assumes an implicit conversion of .false. to 0 and .true. to 1, which may not hold with your compiler. Hopefully, it is compiled as branchless.

As the vector length is just four, you can completely unroll the loop (but maybe the compiler already does it).

My bet is that won't make a visible difference.

To improve performance, you should review the code from a more global perspective, and maybe consider parallelization.


Update:

As pointed by @kvantour, I dropped the change of sign. We can fix with

i= 2 * (ar > 1) - 1
ar= i + 2 * (1 - ar)
Del= (2 + ar + i * sqrt(1 - ar * ar)) * 0.125

Alternatively,

i= SIGN(1, ar - 1)

if that turns out to be efficient.

Sign up to request clarification or add additional context in comments.

2 Comments

Be aware that there is a minus sign in one of the sqrt's.
@kvantour: ouch, yes. Fixing.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.