Newest 'blas' Questions

0 votes

0 answers

56 views

JAX script fails with INTERNAL: No BLAS support for stream when running multiple processes in parallel on a shared server

I'm facing a frustrating JAX runtime error on a multi-GPU server. My script works fine for a simple test but fails with a No BLAS support for stream error when I try to run multiple instances of it in ...

PowerPoint Trenton

115

asked Nov 16 at 13:31

Advice

0 votes

1 reply

73 views

BLAS speed much worse on one (supposedly homogeneous) compute node

We have a small local compute cluster consisting of 5 compute nodes (all supposedly having the same hardware and software) and a login/storage node. I'm running an in-house Fortran software that uses ...

Jonatan Öström

2,649

asked Nov 3 at 14:34

4 votes

1 answer

276 views

Why is Eigen C++ int matrix multiplication 10x slower than float multiplication (even slower than naive n^3 algorithm) when compiled with AVX512

I'm testing int matrix multiplication, but I found that it's extremely slow everywhere (python numpy using BLAS backend is also just as slow). Int matmul being slower than float matmul is ...

Huy Le

1,989

asked Nov 1 at 10:53

3 votes

1 answer

147 views

Build numpy 2.3+ without accelerated libraries

Related post: Compile numpy WITHOUT Intel MKL/BLAS/ATLAS/LAPACK. Recent versions of numpy use meson for build configuration. I can build numpy from source but failed to exclude the BLAS/LAPACK/... deps. ...

nochenon

376

asked Oct 26 at 11:52

1 vote

1 answer

76 views

OpenBLAS gemm 2x slower in Lisp CFFI compared to direct C calls with same BLAS library

I'm experiencing a significant performance difference where OpenBLAS matrix multiplication runs 2x slower when called through Lisp CFFI compared to direct C calls, despite using the exact same ...

user31676144

11

asked Oct 14 at 20:52

0 votes

0 answers

63 views

Undefined reference to BLAS

I'm trying to install the HurdleNormal R package as a dependency for another package (COZINE), and I'm getting the following error: C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ...

Miranda Green

1

asked May 16 at 15:58

0 votes

0 answers

18 views

BLAS/LAPACK compatibility

I've been trying to figure out whether the newer versions of BLAS/LAPACK are backward compatible with the older releases, but I can't find anything on the netlib website or in the docs. Are they compatible ...

lll

19

asked Feb 5 at 11:34

1 vote

1 answer

70 views

"Invalid read of size 8" warning from Valgrind when calling zhemv blas function in C++

I'm computing the product of a Hermitian (self-adjoint) matrix and a complex vector by means of ZHEMV in BLAS, calling the function from a C++ interface. The problem I see is getting an "...

Dimorga

11

asked Jan 17 at 0:44

1 vote

0 answers

145 views

Ifx cannot find modern generic MKL routines like GEMM_F95

I am compiling Fortran code with the ifx compiler (version 2025.0.4) on Windows. I have the Intel MKL library downloaded as well and I am trying to compile a program using it, like this: ifx test.f90 ...

FusRoDah

149

asked Jan 10 at 9:22

1 vote

1 answer

311 views

MKL and openBLAS interactions - a question about linking

I'm using a binary (R) that dynamically links to a generic version of BLAS, for instance (and in a lot of cases) this is openBLAS. Now, inside R, I'm dynamically loading another shared library (...

Daniel Falbel

1,723

asked Nov 8, 2024 at 22:56

1 vote

2 answers

127 views

Undefined reference to cblas_* with cmake on windows

I'm working on a project that uses SAF (Spatial Audio Framework), which has OpenBLAS and LAPACK as dependencies. (The project includes a lot of libraries, so I only show the code that relates to my problem:...

TheBaum

164

asked Nov 8, 2024 at 22:06

1 vote

0 answers

47 views

Confused about cblas_dgemm arguments

Say I want to calculate x^T * Y, x is an n by 1 matrix and Y is an n by n matrix: cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const ...

hansoko

389

asked Oct 28, 2024 at 15:52
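
For the cblas_dgemm question above, the dimension arguments can be sanity-checked with a pure-Python reference of row-major GEMM semantics. This is only a sketch, not the CBLAS API: `ref_dgemm` is a hypothetical helper whose argument order merely mimics `cblas_dgemm`, and the 3-element data is made up. Assuming x is stored as an n-by-1 row-major array, x^T * Y would correspond to TransA = transpose, TransB = no transpose, M = 1, N = n, K = n, lda = 1, ldb = n, ldc = n.

```python
def ref_dgemm(trans_a, trans_b, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Hypothetical reference: row-major C = alpha * op(A) * op(B) + beta * C.

    op(A) is m-by-k and op(B) is k-by-n; a, b, c are flat Python lists.
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # With a transpose flag set, the stored array is indexed as (p, i).
                a_ip = a[p * lda + i] if trans_a else a[i * lda + p]
                b_pj = b[j * ldb + p] if trans_b else b[p * ldb + j]
                acc += a_ip * b_pj
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j]
    return c


n = 3
x = [1.0, 2.0, 3.0]                 # n-by-1, row-major, so lda = 1
Y = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0,
     7.0, 8.0, 9.0]                 # n-by-n, row-major, so ldb = n
out = [0.0] * n                     # 1-by-n result, ldc = n

# x^T * Y: transpose A (x), leave B (Y) as is; M = 1, N = n, K = n.
result = ref_dgemm(True, False, 1, n, n, 1.0, x, 1, Y, n, 0.0, out, n)
print(result)
```

The same M, N, K and leading-dimension values are what one would then pass to the real routine with row-major ordering; since x is a vector, a level-2 routine such as `cblas_dgemv` with a transposed Y may be the more natural choice.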

5 votes

2 answers

204 views

crossprod(m1, m2) is running slower than t(m1) %*% m2 on my machine

Why does t(mat1) %*% mat2 run faster than crossprod(mat1, mat2)? Isn't the whole point of the latter that it calls a more efficient low-level routine? r$> mat1 <- array(rnorm(100 * 600), dim = ...

Turdle

53

asked Oct 3, 2024 at 5:24

5 votes

1 answer

159 views

How to control (BLAS?) parallelization when using mgcv::gam

I am running some fairly large gam models and don't want to parallelize the computations, or at least want to be able to control the degree of parallelization. (Besides not wanting to fry my machine ...

Ben Bolker

230k

asked Sep 8, 2024 at 15:38

2 votes

1 answer

86 views

Parallelize operations on arrays and merge results into one array using OpenMP

I am trying to speed up a function that, given a complex-valued array arr with n entries, calculates the sum of m operations on that array using BLAS routines. Finally, it replaces the values of arr. ...

Timo59

33

asked Sep 2, 2024 at 10:39

0 votes

0 answers

98 views

Unexpected behaviour of matmul when compiled with blas in Fortran

I am trying to benchmark the BLAS routines dgemv and dgemm in Fortran. For that I have written these simple programs: matmul.f90: program test ...

pablo

69

asked Aug 9, 2024 at 11:16

1 vote

0 answers

125 views

How to use BLAS in C, using gcc on Linux?

On Linux, in the file a.c, I do #include <cblas.h> and later I do cblas_sgemm(...). Compiling with gcc -O2 -march=native -fopenmp a.c or with gcc -O2 -march=native -lblas -fopenmp a.c results in ...

Sasha

371

asked Aug 8, 2024 at 6:17

0 votes

1 answer

161 views

Problems evaluating CUDNN for SGEMM

I used cudnn to test sgemm for C[stride x stride] = A[stride x stride] x B[stride x stride] below, Configuration GPU: T1000/SM_75 cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...

sof

9,767

asked Jul 17, 2024 at 10:08

0 votes

0 answers

140 views

How can I use multithreaded BLAS from a single threaded Eigen C++ application?

I'm trying to speed up Eigen dense matrix * matrix operation by using multithreaded BLAS library calls. I've achieved a 100% speed increase using the AMD AOCL-BLAS library from within Eigen. But I seem ...

Pavel Fantys

1

asked Jun 27, 2024 at 20:43

0 votes

1 answer

1k views

Numpy/Scipy BLAS/LAPACK Linking on macOS (with Apple Accelerate)

Question I am trying to find out if the latest version of NumPy (2.0.0) is taking advantage of the updated Accelerate BLAS/LAPACK library, including ILP64. Numpy Numpy in their 2.0.0 release added ...

Wasserwaage

262

asked Jun 20, 2024 at 14:28

0 votes

0 answers

153 views

What is the time-complexity of BLAS level 2 and 3 functions from a vendor which optimized the operations?

BLAS level 2 performs Matrix-vector multiplications, and I know this is O(n^2) in time, when the matrix is shaped (n, n) and the vector is shaped (n, 1). BLAS level 3 performs Matrix-Matrix ...

velenos14

586

asked May 14, 2024 at 20:10
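
As a rough illustration for the complexity question above, the classical operation counts can be written down directly. This sketch assumes the textbook algorithms (one multiply and one add per inner-product term); `gemv_flops` and `gemm_flops` are hypothetical helper names, and nothing here measures what a particular vendor library actually does internally.

```python
def gemv_flops(m, n):
    """Classical flop count for y = A @ x with A of shape (m, n)."""
    return 2 * m * n


def gemm_flops(m, n, k):
    """Classical flop count for C = A @ B with A (m, k) and B (k, n)."""
    return 2 * m * n * k


for size in (1000, 2000):
    print(size, gemv_flops(size, size), gemm_flops(size, size, size))
```

Doubling n multiplies the level-2 count by 4 and the level-3 count by 8, which is the O(n^2) versus O(n^3) scaling the question refers to. Under this assumption, vendor tuning would change the constant factor (cache blocking, vectorization, threading) rather than the exponent.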

1 vote

0 answers

619 views

How can I select the AOCL BLIS/Lapack libraries for building Numpy on Windows 10?

I have an AMD Ryzen 7 2700X and I’m trying to compile Numpy in an Anaconda virtual environment using the BLIS/Lapack libraries of AMD AOCL 4.2, which I installed locally. I tried to compile through pip in ...

user25005585

11

asked May 12, 2024 at 23:29

0 votes

0 answers

57 views

Installation of C++ libraries 'Boost' and 'BLAS' for Python project fail on Windows

I'm working on a Python project. After cloning a remote git repository I followed the instructions in the README file, executing multiple pip install commands in my VSCode PowerShell terminal to set ...

Ozymandias

13

asked May 12, 2024 at 1:10

1 vote

0 answers

321 views

Change the BLAS version used by R

I am currently having issues with the 'Eigen()' function in R. It was mentioned that I should try using 'OpenBLAS'. How should I go about installing it and making R use this version of BLAS? I ...

Dylan Dijk

297

asked Apr 30, 2024 at 14:56

5 votes

0 answers

523 views

CMake Error: Could NOT Find BLAS Using VCPKG and CMake on Windows

As a follow-up to this question, I am trying to set up a project using CMake with VCPKG on Windows to link the BLAS library. Despite following the instructions from the official VCPKG guide, I'm ...

Foad S. Farimani

14.4k

asked Apr 27, 2024 at 22:30

0 votes

1 answer

1k views

CMake cannot find BLAS libraries after installing OpenBLAS via Conan

I am working on a project that uses Fortran and requires BLAS libraries. I've decided to use OpenBLAS, which I installed via Conan. However, I'm encountering an issue where CMake cannot find the BLAS ...

Foad S. Farimani

14.4k

asked Apr 27, 2024 at 19:22

2 votes

1 answer

168 views

Why is libopenblas from numpy so big?

We are deploying an open source application based on numpy that includes libopenblas.{cryptic string}.gfortran-win32.dll. It is part of the Python numpy package. This dll is over 27MB in size. I'm ...

Max Yaffe

1,358

asked Apr 2, 2024 at 16:06

0 votes

1 answer

245 views

arithmetic intensity of zgemv versus dgemv/sgemv?

The arithmetic intensity of sgemv (or dgemv) is derived in this set of exercises (https://florian.world/wp-content/uploads/FM-High-Performance-Computing-I-Assignment-1.pdf) to be: 0.5 / (1+c), where c ...

velenos14

586

asked Mar 16, 2024 at 22:11

1 vote

1 answer

294 views

How to force Julia to use multiple threads for matrix multiplication?

I want to find powers of a relatively small matrix, but this matrix consists of rational numbers of type Rational{BigInt}. By default, Julia utilizes only a single thread for such computations. I want ...

Yrogirg

2,403

asked Feb 10, 2024 at 23:07

1 vote

1 answer

193 views

Can I multiply the real parts of two complex matrices using dgemm?

I have two complex matrices A and B, with matching shapes. Is there a way to cleverly setup the dgemm arguments so as to get the result of the matrix multiplications of the real parts of these ...

G. Fougeron

501

asked Feb 7, 2024 at 15:02

2 votes

1 answer

1k views

In Xcode, how do you set compiler flags for standalone module (framework)?

I was writing my own standalone module and wanted to use cblas_dasum for efficient calculation of the sum of absolute values of a double array. Though a message pops up saying that I have to specify ...

Baffo rasta

135

asked Jan 24, 2024 at 16:49

1 vote

0 answers

125 views

Why is BLAS cblas_sgemm in C slower than np.dot?

I made a simple benchmark between Python NumPy and C OpenBLAS to multiply two 500x500 matrices. It seems that np.dot performs almost 9 times faster than cblas_sgemm. Is there anything I'm doing wrong? ...

Momo

990

asked Dec 26, 2023 at 16:54

0 votes

1 answer

154 views

How to properly link mkl interfaces with fortls

In my project I make heavy use of the BLAS subroutines from the MKL implementation. I have no problems compiling the project thanks to the Intel Advisor, but I can't get fortls to recognize ...

VINCENZO BISOGNO

1

asked Dec 20, 2023 at 8:59

0 votes

1 answer

419 views

Installing scipy on CentOS 6 (OpenBLAS problem)

I'm trying to install scipy on CentOS 6 with python 3.9.18 and get the error: ../scipy/meson.build:159:9: ERROR: Dependency "OpenBLAS" not found, tried pkgconfig The problem is that CentOS 6 ...

Kerim

221

asked Dec 14, 2023 at 18:05

0 votes

1 answer

213 views

Fortran with Sparse BLAS not flushing memory

I have a subroutine that builds sparse matrices, and I need to call it several times. However, it seems that if I call this subroutine a lot of times (and/or if the sparse matrices are very large), ...

Looper

364

asked Dec 12, 2023 at 5:37

0 votes

1 answer

120 views

Why is multiplying wide matrices slower than multiplying square matrices?

I have noticed the following while trying to increase the performance of my code: >>> a, b = torch.randn(1000,1000), torch.randn(1000,1000) >>> c, d = torch.randn(10000, 100), torch....

Fırat Kıyak

509

asked Dec 11, 2023 at 22:50

0 votes

1 answer

210 views

How do I make np.multiply use more than one core?

The title says it already. I am currently parallelizing my code and a major bottleneck is posed by element-wise multiplication of two three-dimensional ndarrays. My system monitor reveals that only ...

ArtPe

13

asked Dec 6, 2023 at 13:46

4 votes

1 answer

4k views

No GPU support while running llama-cpp-python inside a docker container

I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container. Following this repo for installation of llama_cpp_python==0.2.6. DOCKERFILE # Use the ...

Pratyush

39

asked Nov 23, 2023 at 6:09

1 vote

1 answer

203 views

How Does NumPy Internally Handle Matrix Multiplication with Non-continuous Slices?

Hello Stack Overflow community, I'm working with NumPy for matrix operations and I have a question regarding how NumPy handles matrix multiplication, especially when dealing with non-continuous slices ...

musako

1,357

asked Nov 22, 2023 at 1:55

0 votes

1 answer

158 views

Repeated single precision complex matrix vector multiplication (speed and accuracy improvement)

I've boiled a long running function down to a "simple" series of matrix vector multiplications. The matrix does not change, but there are a lot of vectors. I have put together a test ...

js1

3

asked Oct 26, 2023 at 2:36

0 votes

0 answers

81 views

cannot call SASUM by itself as in x=SASUM without fortran 'call'

I compiled a program with 'call segesv()' to solve a system of 3 equations in 3 variables, and that works fine, so I know I'm linked to BLAS and LAPACK. However, 'call SASUM' also compiles (I pass a vector of ...

Broglie

1

asked Oct 12, 2023 at 23:33

0 votes

0 answers

171 views

Is BLIS suitable for cross-platform development, including Apple Silicon?

I am currently re-working a scientific C++ project that makes heavy use of matrix-vector operations like multiplying a (skew)-symmetric matrix with a vector, adding or multiplying two vectors or ...

Lutzimilian

1

asked Sep 17, 2023 at 23:00

0 votes

0 answers

274 views

Linker errors with BLAS/LAPACK symbols (snrm2_, sdot_, etc) when building Fortran project with gfortran on Windows

I'm trying to build the Elmer finite element software (version 9.0) using gfortran 10.2.0 and OpenBLAS 0.3.15 libraries on Windows 10. I'm running into linker errors when creating the shared libraries,...

Foad S. Farimani

14.4k

asked Aug 23, 2023 at 8:28

3 votes

2 answers

262 views

snrm2 calculation instability for single-precision floats on Accelerate

I'm trying to use snrm2 to perform a single precision float calculation in Rust. I'm linking to the Accelerate framework on OSX and using the blas crate for the C-bridge. Regardless of the randomly ...

icyfox

479

asked Aug 21, 2023 at 17:04

0 votes

1 answer

187 views

What is wrong with my sparse matrix-multiple vectors (SpMM) product function for CSR?

I have the following code for the sparse matrix-vector (SpMV) product in C assuming a CSR storage format: void dcsrmv(SparseMatrixCSR *A, double *x, double *y) { for (int i=0; i<A->m; i++) { ...

Nicolas Venkovic

5

asked Aug 15, 2023 at 10:08
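
For the CSR question above, the usual SpMV loop and its extension to several right-hand-side vectors can be sketched in pure Python. The question's own `SparseMatrixCSR` fields are truncated in the excerpt, so the names here (`row_ptr`, `col_idx`, `vals`) and the row-major layout of the dense block `X` are assumptions made for illustration only.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x for A in CSR form (assumed field names)."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def csr_spmm(row_ptr, col_idx, vals, X, s):
    """Y = A @ X where X is a dense n-by-s block stored row-major in a flat list."""
    m = len(row_ptr) - 1
    Y = [0.0] * (m * s)
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a_ij = vals[k]
            j = col_idx[k]
            # Each nonzero scales row j of X and accumulates into row i of Y.
            for c in range(s):
                Y[i * s + c] += a_ij * X[j * s + c]
    return Y


# A = [[1, 0, 2],
#      [0, 3, 0]]
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
Y = csr_spmm(row_ptr, col_idx, vals, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2)
print(y, Y)
```

A frequent source of wrong SpMM results is mixing up the stride of the dense block: with a column-major `X`, the index would be `j + c * n` instead of `j * s + c`.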

1 vote

0 answers

115 views

numpy built with locally built blis does not use multithreading

I'm looking for help with an issue I'm having building Numpy against locally built blis for zen3. I've configured blis to enable threading using openmp. (it is installed and working on my machine, ...

Crispy Holiday

472

asked Aug 10, 2023 at 15:06

-1 votes

1 answer

206 views

Why does the magma_dgemm function not use tensor cores on the V100 GPU?

I ran the MAGMA testing_dgemm code on both V100 and H100 GPUs. With Nsight Systems, I found that on the V100 the code doesn't use tensor cores, but on the H100 it does. V100 result: H100 result: The ...

ingridli

5

asked Aug 9, 2023 at 13:31

0 votes

0 answers

219 views

`np.dot` yields a different result when computed in two pieces

import numpy as np N = 4 m = 2 mat = np.random.rand(N,2*m) vec = np.random.rand(N) dot1 = np.dot(vec,mat) dot2 = np.concatenate([np.dot(vec,mat[:,:m]), np.dot(vec,mat[:,m:])]) print('Max difference:',...

Junyan Xu

103

asked Jul 17, 2023 at 19:53
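
One common explanation for small discrepancies like the one in the np.dot question above is that floating-point addition is not associative, so a different blocking or summation order can change the last bits of a result. The snippet below is a minimal pure-Python sketch of that effect; it does not reproduce NumPy's or any BLAS library's actual summation order, which is an assumption left open here.

```python
# The same three numbers, summed with two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)       # the two groupings do not agree exactly
print(abs(left - right))   # the gap is on the order of machine epsilon
```

Differences of this size are expected when comparing two mathematically equivalent computations, which is why such comparisons are usually done with a tolerance rather than exact equality.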

1 vote

1 answer

509 views

How to see details behind CPU-only Libtorch Matrix-Matrix multiplication routines?

I have downloaded the libtorch CPU-only version from the website and unzipped it. Inside my .cpp application which uses libtorch, I write (I am using intel-mkl for other parts of the application, and ...

velenos14

586

asked Jun 22, 2023 at 17:49

0 votes

1 answer

185 views

"undefined reference to" error during linking process

I am building my application using OpenMPI (built with LLVM) and a few other external libraries including netcdf-fortran, BLAS and LAPACK. The files compile without any problem, but in the last stage ...

Redshoe

333

asked Jun 3, 2023 at 21:39
