I have the following program, which uses the Python C API. It creates a number of threads (NUM_THREADS constant). Each thread runs an infinite loop that does a very simple operation: it creates a Python dictionary whose id key is set to the thread id, then dumps that dictionary into a string and prints it (using the dumps function from the json Python module). After that the thread waits WAIT_TIME seconds and does the same again.

// g++ -g -o multithread multithread.cpp -I/usr/include/python3.11/ -lpython3.11 -lpthread

#include <Python.h>
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>   // for sleep()

// WAIT_TIME is in seconds
#define NUM_THREADS 20
#define WAIT_TIME   1

// Global semaphore declaration
sem_t semaphore;

// Global JSON module object, to be accessed in every thread
PyObject* jsonModule;

// Function to be executed by each thread
void* thread_function(void* arg) {
    long thread_id = (long)arg;
    while(true) {
        sem_wait(&semaphore); // mark 1
        PyObject* myDict = Py_BuildValue("{s:l}", "id", thread_id); // "l" matches the long argument
        PyObject* result = PyObject_CallMethod(jsonModule, "dumps", "O", myDict);
        PyObject* repr = PyObject_Repr(result);
        const char* result_str = PyUnicode_AsUTF8(repr);
        printf("Thread %ld result: %s\n", thread_id, result_str);
        Py_XDECREF(result);
        Py_XDECREF(myDict);
        Py_XDECREF(repr);
        sem_post(&semaphore); // mark 2
        sleep(WAIT_TIME);
    }        
    pthread_exit(NULL);
}

int main() {
    pthread_t threads[NUM_THREADS];
    int i;

    // Initialize the Python interpreter
    Py_Initialize();

    // Import json module
    jsonModule = PyImport_ImportModule("json");

    // Initialize the semaphore
    sem_init(&semaphore, 0, 1);

    // Create threads
    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&threads[i], NULL, thread_function, (void*)(long)i) != 0) {
            fprintf(stderr, "Error creating thread\n");
            return 1;
        }
    }

    // Join threads
    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_join(threads[i], NULL) != 0) {
            fprintf(stderr, "Error joining thread\n");
            return 1;
        }
    }

    // Free resources (never reached in practice, but added for symmetry)
    Py_XDECREF(jsonModule);

    // Finalize the Python interpreter
    Py_Finalize();

    // Destroy the semaphore
    sem_destroy(&semaphore);

    printf("All threads have completed\n");
    return 0;
}

As far as I have checked empirically, the program works as long as the semaphore is taken before invoking any Py* functions, i.e. as long as the lines at mark 1 and mark 2 are in place.

If I remove the mark 1 and mark 2 statements (thus removing the semaphore-based exclusion), the program eventually crashes, usually very soon after starting. Looking at the backtrace of the generated core file, the problem seems to be in the call to PyObject_CallMethod().

(gdb) bt
#0  0x00007fb315289c19 in ?? () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#1  0x00007fb31526aac6 in ?? () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#2  0x00007fb31517d80b in ?? () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#3  0x00007fb31517ddd9 in PyObject_CallMethod () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#4  0x000055e1a763f2ef in thread_function (arg=0x11) at multithread.cpp:24
#5  0x00007fb314ea8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fb314f287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is a bit surprising, as all the PyObject* variables are local to the thread function (myDict, result and repr). The only PyObject* variable that is not local to the thread is the one for the module itself (jsonModule). Is that the one causing the problem?

Does this mean that the Python C library is not thread-safe, so that no more than one Py* function can run at the same time? Is there any alternative to the approach I have used (i.e. a semaphore implemented in my own code)? Any good implementation pattern for this kind of program (i.e. multi-threaded programs using the Python C API)?

Thanks in advance!

2 Answers

The CPython interpreter's data structures are not thread-safe. This is why the Global Interpreter Lock (GIL) exists, and why multi-threaded CPython programs tend to be slow: only the parts that run with the GIL released can actually execute in parallel, and most libraries cannot release the GIL because it is not safe to do so while operating on Python objects. To quote the documentation:

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety. A nice explanation of how the Python GIL helps in these areas can be found here. In short, this mutex is necessary mainly because CPython's memory management is not thread-safe.

You can release the GIL when you are not working on CPython objects (e.g. for pure numerical computations or I/O operations). Multiprocessing is often used instead to circumvent this strong limitation. This means running multiple interpreter processes and generally communicating via inter-process communication (IPC), which is often quite expensive (shared memory is possible, though not for interpreter-managed Python objects).
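For example, a C or C++ function that is entered while holding the GIL can release it around a pure-C computation with the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros. A minimal sketch (heavy_c_work() is a hypothetical routine that touches no Python objects):

// Called with the GIL held, e.g. as a method of an extension module.
static PyObject* compute(PyObject* self, PyObject* args) {
    long n;
    if (!PyArg_ParseTuple(args, "l", &n))
        return NULL;

    long result;
    Py_BEGIN_ALLOW_THREADS        // release the GIL...
    result = heavy_c_work(n);     // ...other Python threads can run during this call
    Py_END_ALLOW_THREADS          // re-acquire the GIL

    return PyLong_FromLong(result);  // back to Python objects, GIL held again
}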

Python is not thread-safe:

The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.

Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions. In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see sys.setswitchinterval()). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime.

The Python interpreter keeps some thread-specific bookkeeping information inside a data structure called PyThreadState. There’s also one global variable pointing to the current PyThreadState: it can be retrieved using PyThreadState_Get().

...

Your use of a semaphore is effectively replicating the functionality of Python's "global interpreter lock".
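For reference, here is a minimal sketch of how the program in the question could rely on the GIL itself instead of a custom semaphore, using the documented PyGILState API (variable names reuse those from the question; this is an outline, not a drop-in replacement):

// In main(), after Py_Initialize() and the json import,
// release the GIL so that worker threads can acquire it:
PyThreadState* mainState = PyEval_SaveThread();

// In thread_function(), instead of sem_wait()/sem_post():
while (true) {
    PyGILState_STATE gstate = PyGILState_Ensure();   // acquire the GIL
    PyObject* myDict = Py_BuildValue("{s:l}", "id", thread_id);
    PyObject* result = PyObject_CallMethod(jsonModule, "dumps", "O", myDict);
    PyObject* repr = PyObject_Repr(result);
    printf("Thread %ld result: %s\n", thread_id, PyUnicode_AsUTF8(repr));
    Py_XDECREF(repr);
    Py_XDECREF(result);
    Py_XDECREF(myDict);
    PyGILState_Release(gstate);                      // release the GIL
    sleep(WAIT_TIME);                                // sleep without holding the GIL
}

// Back in main(), after joining the threads and before Py_Finalize():
PyEval_RestoreThread(mainState);

The threads still serialize on the GIL exactly as they did on the semaphore, but reference counting and the interpreter's internal state are also protected, and no thread holds the lock while sleeping.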
