
I want to find powers of a relatively small matrix, but this matrix consists of rational numbers of type Rational{BigInt}. By default, Julia utilizes only a single thread for such computations. I want to check if using multithreaded matrix multiplication would yield performance gains. How do I do this?

Below is an example of raising a 32×32 matrix to the power of four. If I run it on an i7-12700K, it uses only one thread:

using Random
using LinearAlgebra

Random.seed!(42)

M = BigInt.(rand(Int128, 32, 32)) .// BigInt.(rand(Int128, 32, 32));

BLAS.set_num_threads(8)

@time M^4;

The output is:

19.976082 seconds (1.24 M allocations: 910.103 MiB, 0.19% gc time)

With plain Float64 and big matrices, I can see that Julia correctly uses multiple threads:

N = rand(2^14,2^14)

@time N^4;

32.764584 seconds (1.71 M allocations: 4.113 GiB, 0.08% gc time, 1.14% compilation time)
Comments:

  • You need to check that the BLAS library you use is optimized and multithreaded (not all implementations are). Which one do you use? On Linux, you can track loaded libraries with strace. (Feb 10, 2024)
  • I don't think BLAS does anything for Rational{BigInt}; this will be Julia's fallback routine, which is single-threaded. Tullio.jl can do this multi-threaded, or a simple loop with Threads.@threads is the same speed. (Feb 11, 2024)
  • BLAS only supports floating point (single/double precision and complex); it doesn't even support native integers. Rational{BigInt}, being a native Julia number format, is out of the question. (Feb 11, 2024)
  • Maybe write the matrix multiplication yourself. On a 32×32 matrix you're probably not losing much with a simple algorithm, and you'll gain because you can parallelize it. (Feb 11, 2024)

1 Answer


As noted in comments above, BLAS isn't involved in this at all.
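One quick way to confirm this (a minimal check, not from the original answer): BLAS kernels exist only for the four `LinearAlgebra.BlasFloat` element types, so anything else falls back to Julia's generic, single-threaded matmul.

```julia
using LinearAlgebra

# BLAS routines are only defined for Float32/Float64 and their complex
# counterparts; Rational{BigInt} is not among them, so BLAS.set_num_threads
# has no effect on M^4 here.
println(Float64 <: LinearAlgebra.BlasFloat)           # true
println(Rational{BigInt} <: LinearAlgebra.BlasFloat)  # false
```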

Since I have it, here's a very simple multi-threaded function:

julia> M3 = @time M^3;
  8.113582 seconds (1.24 M allocations: 644.222 MiB, 0.60% gc time)

julia> function mul_th(A::AbstractMatrix, B::AbstractMatrix)
         size(A,2) == size(B,1) || error("sizes don't match up")
         C = similar(A, size(A,1), size(B,2))
         Threads.@threads for i in axes(A,1)  # parallelise over rows of the output
           for j in axes(B,2)
             acc = zero(eltype(C))
             for k in axes(A,2)
               acc += A[i,k] * B[k,j]
             end
             C[i,j] = acc
           end
         end
         C
       end;

julia> M3 == @time mul_th(mul_th(M, M), M)
  2.313267 seconds (1.24 M allocations: 639.237 MiB, 2.29% gc time, 5.94% compilation time)
true

julia> Threads.nthreads()  # running e.g. julia -t4
4

Various packages can write this loop for you, e.g. using Einsum; mul(A,B) = @vielsum C[i,k] := A[i,j] * B[j,k], or else using Tullio; mul(A,B) = @tullio C[i,k] := A[i,j] * B[j,k] threads=10.
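A minimal sketch of the Tullio route (Tullio.jl is a third-party package and must be installed first; the 8×8 matrix here is just a small stand-in for the question's 32×32 one):

```julia
using Tullio   # install with: import Pkg; Pkg.add("Tullio")

# Einstein-notation matmul; Tullio multi-threads over the output indices.
mul(A, B) = @tullio C[i, k] := A[i, j] * B[j, k]

# Nonzero denominators guaranteed by drawing them from 1:100.
M = BigInt.(rand(-100:100, 8, 8)) .// BigInt.(rand(1:100, 8, 8))
mul(M, M) == M * M   # true (rational arithmetic is exact, so order doesn't matter)
```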

Higher powers are much slower, because the numbers involved are larger:

julia> M2 = @time M^2;
  0.133534 seconds (621.57 k allocations: 51.243 MiB, 3.22% gc time)

julia> M3 = @time M^3;
  8.084701 seconds (1.24 M allocations: 644.222 MiB, 0.64% gc time)

julia> M4 = @time M^4;  # uses Base.power_by_squaring
 20.915199 seconds (1.24 M allocations: 910.664 MiB, 0.84% gc time)

julia> @time M2 * M2;  # all the time is here:
 20.659935 seconds (621.57 k allocations: 859.421 MiB, 0.69% gc time)

julia> using Statistics  # for mean

julia> mean(x -> abs(x.den), M)
6.27823462640995725259881990669421930274423828125e+37

julia> mean(x -> abs(x.den), M2)
4.845326324048551470412760353413448348641588891008324543404627136353750441508056e+2349
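The same growth can be seen directly in the bit length of the denominators; a small sketch (shrunk to 8×8 so it runs quickly, otherwise following the question's setup):

```julia
using Random, Statistics

Random.seed!(42)
M = BigInt.(rand(Int128, 8, 8)) .// BigInt.(rand(Int128, 8, 8))

# Each extra power multiplies exact rationals with ever-longer numerators
# and denominators, so the cost per multiplication grows with the power.
for p in 1:4
    bits = mean(x -> ndigits(x.den, base = 2), M^p)
    println("M^$p: mean denominator size ≈ ", round(Int, bits), " bits")
end
```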
