0 votes
0 answers
56 views

I'm facing a frustrating JAX runtime error on a multi-GPU server. My script works fine for a simple test but fails with a "No BLAS support for stream" error when I try to run multiple instances of it in ...
PowerPoint Trenton
Advice
0 votes
1 reply
73 views

We have a small local compute cluster consisting of 5 compute nodes (all supposedly having the same hardware and software) and a login/storage node. I'm running in-house Fortran software that uses ...
Jonatan Öström
4 votes
1 answer
276 views

I'm testing int matrix multiplication, but I found that it's extremely slow everywhere (Python NumPy using a BLAS backend is just as slow). Int matmul being slower than float matmul is ...
Huy Le
  • 1,989
3 votes
1 answer
147 views

Related post: Compile numpy WITHOUT Intel MKL/BLAS/ATLAS/LAPACK. Recent versions of numpy use meson for build configuration. I can build numpy from source but failed to exclude the BLAS/LAPACK/... deps. ...
nochenon
  • 376
1 vote
1 answer
76 views

I'm experiencing a significant performance difference where OpenBLAS matrix multiplication runs 2x slower when called through Lisp CFFI compared to direct C calls, despite using the exact same ...
user31676144
0 votes
0 answers
63 views

I'm trying to install the HurdleNormal R package as a dependency for another package (COZINE), and I'm getting the following error: C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ...
Miranda Green
0 votes
0 answers
18 views

I've been trying to figure out whether newer versions of BLAS/LAPACK are backward compatible with older releases, but I can't find anything on the netlib website or in the docs. Are they compatible ...
lll
  • 19
1 vote
1 answer
70 views

I'm computing the product of a Hermitian (self-adjoint) matrix and a complex vector with ZHEMV in BLAS, calling the function from a C++ interface. The problem I see is getting an "...
Dimorga
  • 11
1 vote
0 answers
145 views

I am compiling Fortran code with the ifx compiler (version 2025.0.4) on Windows. I have the Intel MKL library downloaded as well and I am trying to compile a program using it, like this: ifx test.f90 ...
FusRoDah
  • 149
1 vote
1 answer
311 views

I'm using a binary (R) that dynamically links to a generic version of BLAS; in many cases this is OpenBLAS. Now, inside R, I'm dynamically loading another shared library (...
Daniel Falbel
1 vote
2 answers
127 views

I'm working on a project that uses SAF (Spatial Audio Framework), which has OpenBLAS and LAPACK as dependencies. (The project includes a lot of libraries, so I only show the code that relates to my problem:...
TheBaum
  • 164
1 vote
0 answers
47 views

Say I want to calculate x^T * Y, x is an n by 1 matrix and Y is an n by n matrix: cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const ...
hansoko
  • 389
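
A note on the x^T * Y product in the excerpt above: because x is n by 1, x^T * Y is a 1 by n row vector whose entries equal those of Y^T * x, so a matrix-vector routine with the transpose flag set is one possible alternative to a full matrix-matrix call. The sketch below only checks that identity in plain Python; the function names are hypothetical and it does not call any BLAS routine.

```python
def xt_times_y(x, Y):
    """Row vector x^T Y, computed directly (GEMM-style view)."""
    n = len(x)
    return [sum(x[i] * Y[i][j] for i in range(n)) for j in range(n)]


def yt_times_x(x, Y):
    """Column vector Y^T x, computed via an explicit transpose (GEMV-style view)."""
    n = len(x)
    Yt = [[Y[i][j] for i in range(n)] for j in range(n)]
    return [sum(Yt[j][i] * x[i] for i in range(n)) for j in range(n)]


x = [1.0, 2.0]
Y = [[1.0, 2.0],
     [3.0, 4.0]]

# Both formulations give the same n entries.
print(xt_times_y(x, Y) == yt_times_x(x, Y))
```

In an actual BLAS call the explicit transpose would not be materialized; the transpose argument of the routine plays that role.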
5 votes
2 answers
204 views

Why does t(mat1) %*% mat2 run faster than crossprod(mat1, mat2)? Isn't the whole point of the latter that it calls a more efficient low-level routine? r$> mat1 <- array(rnorm(100 * 600), dim = ...
Turdle
  • 53
5 votes
1 answer
159 views

I am running some fairly large gam models and don't want to parallelize the computations, or at least want to be able to control the degree of parallelization. (Besides not wanting to fry my machine ...
Ben Bolker
  • 230k
2 votes
1 answer
86 views

I am trying to speed up a function that, given a complex-valued array arr with n entries, calculates the sum of m operations on that array using BLAS routines. Finally, it replaces the values of arr. ...
Timo59
  • 33
0 votes
0 answers
98 views

I am trying to benchmark the BLAS routines dgemv and dgemm in Fortran. For that I have written these simple programs: matmul.f90: program test ...
pablo
  • 69
1 vote
0 answers
125 views

On Linux, in the file a.c, I do #include <cblas.h> and later I do cblas_sgemm(...). Compiling with gcc -O2 -march=native -fopenmp a.c or with gcc -O2 -march=native -lblas -fopenmp a.c results in ...
Sasha
  • 371
0 votes
1 answer
161 views

I used cudnn to test sgemm for C[stride x stride] = A[stride x stride] x B[stride x stride] below, Configuration GPU: T1000/SM_75 cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
sof
  • 9,767
0 votes
0 answers
140 views

I'm trying to speed up Eigen dense matrix * matrix operations by using multithreaded BLAS library calls. I've achieved a 100% speed increase using the AMD AOCL-BLAS library from within Eigen. But I seem ...
Pavel Fantys
0 votes
1 answer
1k views

Question: I am trying to find out if the latest version of NumPy (2.0.0) is taking advantage of the updated Accelerate BLAS/LAPACK library, including ILP64. NumPy: In their 2.0.0 release, NumPy added ...
Wasserwaage
0 votes
0 answers
153 views

BLAS level 2 performs matrix-vector multiplications, and I know this is O(n^2) in time when the matrix has shape (n, n) and the vector has shape (n, 1). BLAS level 3 performs matrix-matrix ...
velenos14
  • 586
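
To make the O(n^2) statement in the excerpt above concrete, here is a rough counting sketch for square operands. It assumes the common convention of one multiply plus one add per inner-product term (about 2n^2 flops for a matrix-vector product, about 2n^3 for a matrix-matrix product) and counts each operand once as data touched. Other conventions (fused multiply-add counted once, output counted as read and written) shift the constants, so treat the numbers as illustrative; the function names are hypothetical.

```python
def gemv_counts(n):
    """Approximate (flops, elements touched) for y = A x with A of shape (n, n)."""
    flops = 2 * n * n
    elements = n * n + 2 * n  # A, x, and y
    return flops, elements


def gemm_counts(n):
    """Approximate (flops, elements touched) for C = A B with all shapes (n, n)."""
    flops = 2 * n ** 3
    elements = 3 * n * n  # A, B, and C
    return flops, elements


for n in (10, 100, 1000):
    fv, ev = gemv_counts(n)
    fm, em = gemm_counts(n)
    # Flops per element: roughly constant for level 2, growing with n for level 3.
    print(n, fv / ev, fm / em)
```

Under these assumptions the matrix-vector ratio stays below 2 flops per element for any n, while the matrix-matrix ratio grows as 2n/3.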
1 vote
0 answers
619 views

I have an AMD Ryzen 7 2700X and I'm trying to compile NumPy in an Anaconda virtual environment using the BLIS/LAPACK libraries of AMD AOCL 4.2, which I installed locally. I tried to compile through pip in ...
user25005585
0 votes
0 answers
57 views

I'm working on a Python project. After cloning a remote git repository I followed the instructions in the README file, executing multiple pip install commands in my VSCode PowerShell terminal to set ...
Ozymandias
1 vote
0 answers
321 views

I am currently having issues with the 'Eigen()' function in R. It was mentioned that I should try using 'OpenBLAS'. How should I go about installing this and making R use this version of BLAS? I ...
Dylan Dijk
5 votes
0 answers
523 views

As a follow-up to this question, I am trying to set up a project using CMake with VCPKG on Windows to link the BLAS library. Despite following the instructions from the official VCPKG guide, I'm ...
Foad S. Farimani
0 votes
1 answer
1k views

I am working on a project that uses Fortran and requires BLAS libraries. I've decided to use OpenBLAS, which I installed via Conan. However, I'm encountering an issue where CMake cannot find the BLAS ...
Foad S. Farimani
2 votes
1 answer
168 views

We are deploying an open source application based on numpy that includes libopenblas.{cryptic string}.gfortran-win32.dll. It is part of the Python numpy package. This dll is over 27MB in size. I'm ...
Max Yaffe
  • 1,358
0 votes
1 answer
245 views

The arithmetic intensity of sgemv (or dgemv) is derived in this set of exercises (https://florian.world/wp-content/uploads/FM-High-Performance-Computing-I-Assignment-1.pdf) to be: 0.5 / (1+c), where c ...
velenos14
  • 586
1 vote
1 answer
294 views

I want to find powers of a relatively small matrix, but this matrix consists of rational numbers of type Rational{BigInt}. By default, Julia utilizes only a single thread for such computations. I want ...
Yrogirg
  • 2,403
1 vote
1 answer
193 views

I have two complex matrices A and B, with matching shapes. Is there a way to cleverly set up the dgemm arguments so as to get the result of the matrix multiplications of the real parts of these ...
G. Fougeron
2 votes
1 answer
1k views

I was writing my own standalone module and wanted to use cblas_dasum for efficient calculation of the sum of absolute values of a double array. Though a message pops up saying that I have to specify ...
Baffo rasta
1 vote
0 answers
125 views

I made a simple benchmark between Python NumPy and C OpenBLAS to multiply two 500x500 matrices. It seems that np.dot performs almost 9 times faster than cblas_sgemm. Is there anything I'm doing wrong? ...
Momo
  • 990
0 votes
1 answer
154 views

In my project I'm making heavy use of the BLAS subroutines from the MKL implementation. I have no problems compiling the project thanks to the Intel Advisor, but I can't get fortls to recognize ...
VINCENZO BISOGNO
0 votes
1 answer
419 views

I'm trying to install scipy on CentOS 6 with Python 3.9.18 and get the error: ../scipy/meson.build:159:9: ERROR: Dependency "OpenBLAS" not found, tried pkgconfig The problem is that CentOS 6 ...
Kerim
  • 221
0 votes
1 answer
213 views

I have a subroutine that builds sparse matrices, and I need to call it several times. However, it seems that if I call this subroutine a lot of times (and/or if the sparse matrices are very large), ...
Looper
  • 364
0 votes
1 answer
120 views

I have noticed the following while trying to increase the performance of my code: >>> a, b = torch.randn(1000,1000), torch.randn(1000,1000) >>> c, d = torch.randn(10000, 100), torch....
Fırat Kıyak
0 votes
1 answer
210 views

The title says it already. I am currently parallelizing my code and a major bottleneck is posed by element-wise multiplication of two three-dimensional ndarrays. My system monitor reveals that only ...
ArtPe
  • 13
4 votes
1 answer
4k views

I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container. Following this repo for installation of llama_cpp_python==0.2.6. DOCKERFILE # Use the ...
Pratyush
1 vote
1 answer
203 views

Hello Stack Overflow community, I'm working with NumPy for matrix operations and I have a question regarding how NumPy handles matrix multiplication, especially when dealing with non-contiguous slices ...
musako
  • 1,357
0 votes
1 answer
158 views

I've boiled a long-running function down to a "simple" series of matrix-vector multiplications. The matrix does not change, but there are a lot of vectors. I have put together a test ...
js1
  • 3
0 votes
0 answers
81 views

I compiled a program with 'call segesv()' to solve a system of 3 equations in 3 variables, and that works fine, so I know I'm linked to BLAS and LAPACK. However, 'call SASUM' also compiles (I pass a vector of ...
Broglie
0 votes
0 answers
171 views

I am currently re-working a scientific C++ project that makes heavy use of matrix-vector operations like multiplying a (skew)-symmetric matrix with a vector, adding or multiplying two vectors or ...
Lutzimilian
0 votes
0 answers
274 views

I'm trying to build the Elmer finite element software (version 9.0) using gfortran 10.2.0 and OpenBLAS 0.3.15 libraries on Windows 10. I'm running into linker errors when creating the shared libraries,...
Foad S. Farimani
3 votes
2 answers
262 views

I'm trying to use snrm2 to perform a single precision float calculation in Rust. I'm linking to the Accelerate framework on OSX and using the blas crate for the C-bridge. Regardless of the randomly ...
icyfox
  • 479
0 votes
1 answer
187 views

I have the following code for the sparse matrix-vector (SpMV) product in C assuming a CSR storage format: void dcsrmv(SparseMatrixCSR *A, double *x, double *y) { for (int i=0; i<A->m; i++) { ...
Nicolas Venkovic
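
For readers unfamiliar with the CSR layout mentioned in the excerpt above, the following is a minimal Python sketch of a row-wise sparse matrix-vector product. It is not the asker's C routine, whose body is truncated in the listing; the array names row_ptr, col_idx, and values are assumptions made for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form.

    row_ptr has one entry per row plus one; the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]] with columns col_idx[...] in the same range.
    """
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

print(csr_matvec(row_ptr, col_idx, values, x))  # [7.0, 6.0, 17.0]
```

Each output entry depends only on its own row, which is why the outer loop is the usual target for parallelization in this layout.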
1 vote
0 answers
115 views

I'm looking for help with an issue I'm having building Numpy against locally built blis for zen3. I've configured blis to enable threading using openmp. (it is installed and working on my machine, ...
Crispy Holiday
-1 votes
1 answer
206 views

I ran the MAGMA testing_dgemm code on both V100 and H100 GPUs. With Nsight Systems, I found that on the V100 the code doesn't use tensor cores, but on the H100 it does. V100 result: H100 result: The ...
ingridli
0 votes
0 answers
219 views

import numpy as np N = 4 m = 2 mat = np.random.rand(N,2*m) vec = np.random.rand(N) dot1 = np.dot(vec,mat) dot2 = np.concatenate([np.dot(vec,mat[:,:m]), np.dot(vec,mat[:,m:])]) print('Max difference:',...
Junyan Xu
  • 103
1 vote
1 answer
509 views

I have downloaded the libtorch CPU-only version from the website and unzipped it. Inside my .cpp application which uses libtorch, I write (I am using intel-mkl for other parts of the application, and ...
velenos14
  • 586
0 votes
1 answer
185 views

I am building my application using OpenMPI (built with LLVM) and a few other external libraries including netcdf-fortran, BLAS and LAPACK. The files compile without any problem, but in the last stage ...
Redshoe
  • 333