path: root/src/port/pg_numa.c
Age | Commit message | Author
2025-11-20 | Handle EPERM in pg_numa_init | Tomas Vondra

When running in Docker, the container may not have privileges needed by get_mempolicy(). This is called by numa_available() in libnuma, but versions prior to 2.0.19 did not expect that. The numa_available() call seemingly succeeds, but then we get unexpected failures when trying to query status of pages:

    postgres=# select * from pg_shmem_allocations_numa;
    ERROR:  XX000: failed NUMA pages inquiry status: Operation not permitted
    LOCATION:  pg_get_shmem_allocations_numa, shmem.c:691

The best solution is to call get_mempolicy() first, and proceed to numa_available() only when it does not fail with EPERM.

Otherwise we'd need to treat older libnuma versions as insufficient, which seems a bit too harsh, as this only affects containerized systems.

Fix by me, based on suggestions by Christoph. Backpatch to 18, where the NUMA functions were introduced.

Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aPDZOxjrmEo_1JRG@msg.df7cb.de
Backpatch-through: 18
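The ordering described in this commit message (probe with get_mempolicy() first, and consult numa_available() only if the probe did not fail with EPERM) can be sketched as follows. This is a minimal illustration and not the code from pg_numa.c: the function `numa_init_ok()` and the two callback types are hypothetical stand-ins so the logic can be exercised without libnuma. The only assumptions about the real calls are their documented conventions: get_mempolicy() returns -1 and sets errno on failure, and numa_available() returns -1 when NUMA is unavailable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for get_mempolicy() and numa_available(). */
typedef int (*probe_fn) (void);		/* returns 0, or -1 with errno set */
typedef int (*available_fn) (void);	/* returns -1 when NUMA is unavailable */

/*
 * Report NUMA as usable only if the probe is not rejected with EPERM and
 * the library itself reports NUMA as available. Any other probe failure
 * still falls through to the library check.
 */
static bool
numa_init_ok(probe_fn probe, available_fn available)
{
	assert(probe != NULL && available != NULL);

	errno = 0;
	if (probe() < 0 && errno == EPERM)
		return false;			/* no privileges: treat NUMA as unavailable */

	return available() != -1;
}

/* Stubs used only to demonstrate the decision logic. */
static int	probe_ok(void) { return 0; }
static int	probe_eperm(void) { errno = EPERM; return -1; }
static int	probe_enosys(void) { errno = ENOSYS; return -1; }
static int	avail_yes(void) { return 0; }
static int	avail_no(void) { return -1; }
```

With these stubs, `numa_init_ok(probe_eperm, avail_yes)` is false even though the library check alone would have passed, which is the containerized case the commit describes.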
2025-07-01 | Fix indentation in pg_numa code | Tomas Vondra
Broken by commits 7fe2f67c7c9f, 81f287dc923f and bf1119d74a79. Backpatch to 18, same as the offending commits. Backpatch-through: 18
2025-07-01 | Add CHECK_FOR_INTERRUPTS into pg_numa_query_pages | Tomas Vondra

Querying the NUMA status can be quite time consuming, especially with large shared buffers. 8cc139bec34a called numa_move_pages() once, for all buffers, and we had to wait for the syscall to complete.

But with the chunking, introduced by 7fe2f67c7c to work around a kernel bug, we can do CHECK_FOR_INTERRUPTS() after each chunk, allowing users to abort the execution.

Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Backpatch-through: 18
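The idea of checking for a cancel request after each chunk, rather than waiting on one long call, can be illustrated roughly like this. The sketch does not use PostgreSQL's CHECK_FOR_INTERRUPTS(); the flag `cancel_requested`, the function `process_with_cancel()` and the worker callback are all hypothetical names, and a plain flag stands in for the interrupt machinery.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cancel flag; a signal handler would normally set it. */
static volatile sig_atomic_t cancel_requested = 0;

/* Hypothetical per-chunk worker: returns 0 on success. */
typedef int (*chunk_fn) (size_t start, size_t count, void *arg);

/*
 * Process `total` items in chunks of at most `chunk_size`, looking at the
 * cancel flag after every chunk. Returns the number of items processed and
 * sets *cancelled when the loop stopped because of the flag.
 */
static size_t
process_with_cancel(size_t total, size_t chunk_size,
					chunk_fn fn, void *arg, bool *cancelled)
{
	size_t		done = 0;

	assert(chunk_size > 0 && fn != NULL && cancelled != NULL);
	*cancelled = false;

	while (done < total)
	{
		size_t		n = total - done;

		if (n > chunk_size)
			n = chunk_size;

		if (fn(done, n, arg) != 0)
			break;				/* worker failed; stop early */
		done += n;

		if (cancel_requested)
		{
			*cancelled = true;	/* abort between chunks */
			break;
		}
	}
	return done;
}

/* Demo worker: counts its calls and requests cancellation on the second. */
static int
cancel_after_two(size_t start, size_t count, void *arg)
{
	int		   *calls = arg;

	(void) start;
	(void) count;
	if (++(*calls) == 2)
		cancel_requested = 1;
	return 0;
}
```

With 100 items and a chunk size of 16, the demo worker stops the loop after two chunks, so only 32 items are processed. A single unchunked call would offer no such exit point.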
2025-07-01 | Limit the size of numa_move_pages requests | Tomas Vondra

There's a kernel bug in do_pages_stat(), affecting systems combining 64-bit kernel and 32-bit user space. The function splits the request into chunks of 16 pointers, but forgets the pointers are 32-bit when advancing to the next chunk. Some of the pointers get skipped, and memory after the array is interpreted as pointers. The result is that the produced status of memory pages is mostly bogus.

Systems combining 64-bit and 32-bit environments like this might seem rare, but that's not the case - all 32-bit Debian packages are built in a 32-bit chroot on a system with a 64-bit kernel.

This is a long-standing kernel bug (since 2010), affecting pretty much all kernels, so it'll take time until all systems get a fixed kernel.

Luckily, we can work around the issue by chunking the requests the same way do_pages_stat() does, at least on affected systems. We don't know what kernel a 32-bit build will run on, so all 32-bit builds use chunks of 16 elements (the largest chunk before hitting the issue).

64-bit builds are not affected by this issue, and so could work without the chunking. But chunking has other advantages, so we apply chunking even for 64-bit builds, with chunks of 1024 elements.

Reported-by: Christoph Berg <myon@debian.org>
Author: Christoph Berg <myon@debian.org>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Context: https://marc.info/?l=linux-mm&m=175077821909222&w=2
Backpatch-through: 18
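The workaround described here, splitting one large request into chunks of 16 elements on 32-bit builds and 1024 on 64-bit builds, might look roughly like the sketch below. It is an illustration under stated assumptions, not the pg_numa.c implementation: `chunk_size_for()`, `query_pages_chunked()` and the `query_fn` callback are hypothetical names, and the callback stands in for the real page-status call so the example needs no libnuma.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Chunk sizes taken from the commit message: 16 elements when pointers are
 * 32-bit (4 bytes), 1024 elements otherwise.
 */
static size_t
chunk_size_for(size_t pointer_bytes)
{
	return (pointer_bytes == 4) ? 16 : 1024;
}

/* Hypothetical stand-in for the page-status query: returns 0 on success. */
typedef int (*query_fn) (size_t count, void **pages, int *status);

/*
 * Issue the query in chunks of at most `chunk_size` pages, passing matching
 * slices of the `pages` and `status` arrays. Stops at the first failing
 * chunk and returns its result; *ncalls reports how many queries were made.
 */
static int
query_pages_chunked(size_t count, void **pages, int *status,
					size_t chunk_size, query_fn query, size_t *ncalls)
{
	size_t		next = 0;

	assert(chunk_size > 0 && query != NULL && ncalls != NULL);
	*ncalls = 0;

	while (next < count)
	{
		size_t		n = count - next;
		int			rc;

		if (n > chunk_size)
			n = chunk_size;

		rc = query(n, &pages[next], &status[next]);
		(*ncalls)++;
		if (rc != 0)
			return rc;

		next += n;
	}
	return 0;
}

/* Demo query: pretends every page lives on node 0. */
static int
fake_query(size_t count, void **pages, int *status)
{
	(void) pages;
	for (size_t i = 0; i < count; i++)
		status[i] = 0;
	return 0;
}
```

A caller would pick the chunk size with `chunk_size_for(sizeof(void *))`. With 40 pages and a chunk size of 16, the loop issues three queries (16, 16 and 8 pages), so no single request exceeds the size at which the kernel bug appears.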
2025-04-09 | Cleanup of pg_numa.c | Tomas Vondra

This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and moved to src/backend/storage/ipc/shmem.c. The new name better reflects that the page size is not related to NUMA, and it's specifically about the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into the backend (which is more appropriate for functions callable from SQL). While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c, which added unnecessary dependencies (src/port should be suitable for frontend). These were either leftovers or unnecessary thanks to the other changes in this commit.

This eliminates unnecessary dependencies on backend symbols, which we don't want in src/port.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-07 | Add support for basic NUMA awareness | Tomas Vondra

Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c portability wrapper and an optional build dependency, enabled by --with-libnuma configure option. For now this is Linux-only, other platforms may be supported later.

A built-in SQL function pg_numa_available() allows checking NUMA support, i.e. that the server was built/linked with the NUMA library.

The main function introduced is pg_numa_query_pages(), which allows determining the NUMA node for individual memory pages. Internally the function uses move_pages(2) syscall, as it allows batching, and is more efficient than get_mempolicy(2).

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com