I am struggling with the (de)serialization of PyTorch data. I would like to save my model to a PT(H) file after training it with PyTorch (using a GPU). Next, I would like to load that serialized model in a C++ context (using libtorch). For now I am just experimenting with basic export/import functionality to get the hang of it.

The code is provided below. I am getting the following error:

Error loading model
Unrecognized data format
Exception raised from load at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\serialization\import.cpp:449 (most recent call first):
00007FFBB1FFDA2200007FFBB1FFD9C0 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFBB1FFD43E00007FFBB1FFD3F0 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FFB4B87B54700007FFB4B87B4E0 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]
00007FFB4B87B42A00007FFB4B87B380 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]
00007FF6089A737A00007FF6089A7210 pytroch_load_model.exe!main [c:\users\USER\projects\cmake dx cuda pytorch\cmake_integration_examples\pytorch\src\pytroch_load_model.cpp @ 19]
00007FF6089D8A9400007FF6089D8A60 pytroch_load_model.exe!invoke_main [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 79]
00007FF6089D893E00007FF6089D8810 pytroch_load_model.exe!__scrt_common_main_seh [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
00007FF6089D87FE00007FF6089D87F0 pytroch_load_model.exe!__scrt_common_main [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 331]
00007FF6089D8B2900007FF6089D8B20 pytroch_load_model.exe!mainCRTStartup [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp @ 17]
00007FFBDF8C703400007FFBDF8C7020 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFBDFBA265100007FFBDFBA2630 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]

Here is the code:

Python (PyTorch):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        self.x = 2

    def forward(self):
        return self.x

test_net = torch.jit.script(Net())
test_module = torch.jit.script(TestModel())
torch.jit.save(test_net, 'test_net.pt')
torch.jit.save(test_module, 'test_module.pt')

C++ (libtorch)

#include <torch/script.h>
#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    torch::jit::script::Module module;
    try {
        std::cout << "Trying to load model..." << std::endl;
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "Loading failed" << std::endl;
        std::cerr << e.what() << std::endl;
        return -1;
    }

    std::cout << "Loading successful" << std::endl;
}

I am using the shared distribution of libtorch 1.12.1. I have tried both the GPU and CPU versions (release, not debug builds) on Windows 10. The TestModel is even taken straight from the Torch JIT documentation...

CMakeLists.txt

cmake_minimum_required (VERSION 3.12 FATAL_ERROR)

project(pytroch
  DESCRIPTION "CMake example for PyTorch (libtorch C++) integration"
  LANGUAGES CXX
)

set(CMAKE_CXX_STANDARD 14)

set(SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src")
set(CMAKE_PREFIX_PATH "${CMAKE_SOURCE_DIR}/deps/libtorch/1.12.1/release/cpu/share/cmake/Torch")
find_package(Torch REQUIRED)
if(TORCH_FOUND)
    message(STATUS "Found Torch")
else()
    message(FATAL_ERROR "Unable to find Torch")
endif(TORCH_FOUND)

add_executable(pytroch_load_model
    "${SRC_DIR}/pytroch_load_model.cpp"
)
target_include_directories(pytroch_load_model PUBLIC ${TORCH_INCLUDE_DIRS})
target_link_libraries(pytroch_load_model PRIVATE ${TORCH_LIBRARIES})
message("${TORCH_LIBRARIES}")
file(GLOB LIBTORCH_DLLS
  "${CMAKE_SOURCE_DIR}/deps/libtorch/1.12.1/release/cpu/lib/*.dll"
)
file(COPY
    ${LIBTORCH_DLLS}
    DESTINATION "${CMAKE_BINARY_DIR}/bin/"
)

The CMakeLists.txt above is part of a larger project. I am posting it here to demonstrate how I am linking against the libraries required to run my code.

Since the PT file consists mostly of non-readable characters (it is serialized, after all), I cannot really check what is going on in there. I can see, though, that Net as well as cpu are present as words (such a file is only partially readable).

  • Just checking. Are you 100% absolutely sure the file path you are loading is correct? And does the path contain any Unicode characters? I ask because I have seen that error message when I had a wrong path. By the way, the PT file is actually a standard ZIP file, so it is possible to unzip it and see what's in it (but I don't think that would help much). Commented Oct 13, 2022 at 19:23
  • @ken Thanks for the feedback. I tried both a relative path (my_binary.exe test.pt) and an absolute path (my_binary.exe C:\Users\USER\CMakeBuilds\38f3e235-7163-5330-8115-6d75a7c66e5a\build\x64-Debug (default)\bin\test.pt). Neither worked. The binary (where libtorch is used) is stored in the typical folder when using VS (in my case 2017) and the integrated CMake. No Unicode is involved (yes, this thought also crossed my mind, given that on Windows e.g. PowerShell is not on good terms with it without explicit configuration). Commented Oct 14, 2022 at 6:27
  • I will try to get the debug version of libtorch since trying to solve this with just a silly message from a thrown exception is anything but ok. Commented Oct 14, 2022 at 6:31
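
The two checks suggested in the first comment (is the path right, and is the file really a ZIP archive?) can be done before calling torch::jit::load. The following is only a minimal sketch using the C++ standard library. The helper name looks_like_zip is hypothetical, and it assumes the archive begins with the usual ZIP local-file-header signature "PK\x03\x04", which is how ZIP files normally start. It does not validate the rest of the archive.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper (not part of libtorch): returns true if the file at
// `path` can be opened and its first four bytes are the ZIP local-file-header
// signature "PK\x03\x04". Returns false for a missing or too-short file.
bool looks_like_zip(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;  // file could not be opened, e.g. wrong path
    }
    std::array<char, 4> magic{};
    in.read(magic.data(), static_cast<std::streamsize>(magic.size()));
    if (in.gcount() != 4) {
        return false;  // fewer than four bytes available
    }
    static const char kZipMagic[4] = {'P', 'K', 0x03, 0x04};
    return std::memcmp(magic.data(), kZipMagic, 4) == 0;
}
```

Calling looks_like_zip(argv[1]) at the top of main separates "the file cannot be found or read" from "the file is there but the loader rejects it". If it returns true and loading still fails with "Unrecognized data format", the path and the file are probably not the problem.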

1 Answer

I have created an issue on the PyTorch GitHub page. It appears that one cannot combine a release build of the libtorch library with a debug build of the software that links against it.

The issue is gone once I switch to a release build. I will check with the debug build at some point but currently the code I have that uses libtorch is very tiny so no need for extensive debugging.

I see two problems with this:

  • The developer is forced to use the huge (especially the CUDA version) debug build of libtorch
  • The developer may not want to use a debug build especially if they don't want to debug libtorch itself.
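
To make such a mismatch easier to spot, the application can report its own build configuration at startup. This is a rough sketch, not something taken from libtorch: the helper name build_flavor is hypothetical, and it relies on MSVC defining the _DEBUG macro when the debug runtime (/MDd or /MTd) is selected. On other compilers the macro is usually absent, so "release" here only means "not known to be a debug build".

```cpp
#include <string>

// Hypothetical helper: reports the configuration this translation unit was
// compiled in. MSVC defines _DEBUG when the debug C runtime is selected, so
// "debug" here means the executable itself is a debug build.
std::string build_flavor() {
#if defined(_DEBUG)
    return "debug";
#else
    return "release";
#endif
}
```

Printing build_flavor() before the call to torch::jit::load shows immediately whether a debug executable is being run against the release libtorch DLLs, which is the combination described above.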