I'm using a binary (R) that dynamically links to a generic BLAS implementation; in many cases this is OpenBLAS.
Now, inside R, I'm dynamically loading another shared library (libtorch.so) essentially using dlopen(). Turns out libtorch statically links to MKL BLAS.
My understanding of static and dynamic linking is that this shouldn't be a problem: since libtorch is statically linked to MKL, libtorch's code should always prefer its own symbols over similarly named symbols that happen to be dynamically loaded.
Indeed, this seems to be the usual behavior. For instance, if I take BLAS and LibTorch out of the picture, I can compile an executable that links to a shared library libA, implementing e.g. print(), and to another shared library libB that is statically linked to libA. When calling code from libB, it correctly calls the definitions from its own copy of libA.
But that doesn't happen with libtorch/MKL and OpenBLAS. If I compile an executable that dynamically links to both libtorch and OpenBLAS, then libtorch starts using the OpenBLAS routines instead of the statically linked MKL ones.
For instance:
#0 0x00007ffff5537da0 in sgemm_ () from /lib/x86_64-linux-gnu/libopenblas.so.0
#1 0x00007fffde5385d6 in at::native::cpublas::gemm(at::native::TransposeType, at::native::TransposeType, long, long, long, float, float const*, long, float const*, long, float, float*, long) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#2 0x00007fffde67c139 in at::native::addmm_impl_cpu_(at::Tensor&, at::Tensor const&, at::Tensor, at::Tensor, c10::Scalar const&, c10::Scalar const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#3 0x00007fffde67d475 in at::native::structured_mm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#4 0x00007fffdf42309b in at::(anonymous namespace)::wrapper_CPU_mm(at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#5 0x00007fffdf423123 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::wrapper_CPU_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#6 0x00007fffdf1eaa70 in at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
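The same check can be made without a debugger by asking the dynamic loader which object supplies a symbol. The following is a minimal sketch, assuming Linux with glibc (`dlsym` with `RTLD_DEFAULT` and `dladdr` are GNU extensions, and older glibc versions need `-ldl`). The helper name `provider_of` is hypothetical and not part of LibTorch or any BLAS.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_DEFAULT and dladdr; g++ normally defines it already
#endif
#include <dlfcn.h>
#include <cassert>
#include <string>

// Hypothetical helper: returns the path of the shared object whose definition
// of `symbol` is found first in the global lookup scope, or "" if none is found.
std::string provider_of(const char* symbol) {
    assert(symbol != nullptr);
    void* addr = dlsym(RTLD_DEFAULT, symbol);  // global-scope lookup
    if (addr == nullptr) return "";
    Dl_info info{};
    // dladdr maps an address back to the object that contains it.
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return "";
    return info.dli_fname;
}
```

Printing `provider_of("sgemm_")` from an executable linked like the one in this question should name the libopenblas.so.0 path if the global lookup is what libtorch_cpu.so ends up using. It only reflects the global scope, so a library opened with dlopen() and RTLD_LOCAL is not covered.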
This happens even though libtorch_cpu.so includes its own version of sgemm_, e.g.:
nm libtorch/lib/libtorch_cpu.so | grep "T sgemm_"
0000000006c531b0 T sgemm_
0000000006c53870 T sgemm_64
0000000006c53870 T sgemm_64_
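On ELF systems a global symbol with default visibility can, in principle, be resolved to a definition in another object that comes earlier in the lookup order, so seeing `T sgemm_` in the nm output does not by itself settle which definition is used. A hedged sketch for comparing the two lookups, assuming Linux with glibc; `is_preempted` is a hypothetical helper name:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_DEFAULT; g++ normally defines it already
#endif
#include <dlfcn.h>
#include <cassert>

// Hypothetical helper: true if `library` (already loaded) provides `symbol`,
// but the global lookup resolves `symbol` to a different address.
// Returns false if the library is not loaded or either lookup fails.
bool is_preempted(const char* library, const char* symbol) {
    assert(library != nullptr && symbol != nullptr);
    // RTLD_NOLOAD: only return a handle if the library is already in the process.
    void* handle = dlopen(library, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == nullptr) return false;
    void* local = dlsym(handle, symbol);         // searches the library and its dependencies
    void* global = dlsym(RTLD_DEFAULT, symbol);  // searches the global scope in load order
    dlclose(handle);                             // drop the extra reference taken above
    return local != nullptr && global != nullptr && local != global;
}
```

In the setup described here, `is_preempted("libtorch_cpu.so", "sgemm_")` returning true would be consistent with the backtrace; the library name may need to be the full path that the loader recorded.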
My question is: under what circumstances can symbols from a dynamically loaded library take precedence over those of a statically linked library? I'm surely missing something important here, and any advice would be extremely helpful.
Reproducible example:
#include <torch/torch.h>
#include <iostream>
#include <cblas.h>
extern "C" void execute() {
    for (auto i = 1; i < 10; i++) {
        torch::Tensor tensor = torch::randn({2000, 2000});
        auto k = tensor.mm(tensor);
    }
}

int main() {
    int m = 3; // rows of A
    int n = 3; // cols of A
    // Matrix A (m x n) in row-major order
    double A[] = {1.0, 2.0, 3.0,
                  4.0, 5.0, 6.0,
                  7.0, 8.0, 9.0};
    // Vector x (size n)
    double x[] = {1.0, 1.0, 1.0};
    // Result vector y (size m), initially zero
    double y[] = {0.0, 0.0, 0.0};
    // Scalar multipliers
    double alpha = 1.0, beta = 0.0;
    // Perform y = alpha * A * x + beta * y
    cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, A, n, x, 1, beta, y, 1);
    execute();
    return 0;
}
With a CMakeLists.txt
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(example)
find_package(Torch REQUIRED)
find_package(BLAS)
add_executable(example example.cpp)
target_link_libraries(example "${TORCH_LIBRARIES}" "${BLAS_LIBRARIES}")
set_property(TARGET example PROPERTY CXX_STANDARD 17)
LibTorch can be obtained from the PyTorch website (direct download link).
To run
mkdir build && cd build
cmake .. -DCMAKE_PREFIX_PATH=<path to libtorch>
cmake --build .
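Once the example is built, it may help to see which shared objects end up in the process and in what order, since the loader's global lookup walks them in sequence. A minimal sketch, assuming Linux with glibc, where `dl_iterate_phdr` typically reports objects in load order; the helper names `loaded_objects` and `first_index_of` are hypothetical:

```cpp
#include <link.h>   // dl_iterate_phdr, dl_phdr_info (glibc)
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: records the path of each loaded object.
static int collect_object(dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // The main executable is reported with an empty name.
    names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning 0 continues the iteration
}

// Hypothetical helper: paths of all objects currently loaded in the process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}

// Hypothetical helper: index of the first object whose path contains `needle`, or -1.
int first_index_of(const std::vector<std::string>& names, const std::string& needle) {
    assert(!needle.empty());
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(needle) != std::string::npos) return static_cast<int>(i);
    }
    return -1;
}
```

Calling `first_index_of(loaded_objects(), "libopenblas")` and the same for `"libtorch_cpu"` from inside main() would show which of the two appears earlier. Whether that order fully explains the observed behavior is an assumption to verify, not a conclusion.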
[...] libtorch hide all symbols which belong to MKL BLAS? If your binary sees a BLAS symbol then it assumes it belongs to your own BLAS library.
[...] nm libtorch/lib/libtorch_cpu.so | grep "T sgemv_"
0000000006c4d610 T sgemv_
0000000006c4db70 T sgemv_64
0000000006c4db70 T sgemv_64_
[...] libtorch library are safe and must reach the correct library (libtorch did not know your BLAS library when libtorch was built). Problems would arise if your executable and libtorch exchanged instances of seemingly similar BLAS data types. Your executable could clean up/release instances which were initialized/allocated by the other library, or the other way around.
[...] .so. What do ldd <path to libtorch.so> and ldd <your executable> report?