Commit a12248f

erthalion authored and Commitfest Bot committed
Memory and address space management for buffer resizing
This has three changes:

1. Allow to use multiple shared memory mappings
============================================

Currently all work with shared memory is done via a single anonymous memory mapping, which limits how the shared memory can be organized.

Introduce the possibility to allocate multiple shared memory mappings, where each mapping is associated with a specified shared memory segment. A new shared memory API is introduced, extended with a segment as a new parameter. As the path of least resistance, the original API is kept in place and uses the main shared memory segment.

Modifies pg_shmem_allocations to report the shared memory segment as well.

Adds pg_shmem_segments to report shared memory segment information.

2. Address space reservation for shared memory
============================================

Currently the shared memory layout is designed to pack everything tightly together, leaving no space between mappings for resizing. Here is how it looks for one mapping in /proc/$PID/maps, where /dev/zero represents the anonymous shared memory in question:

    00400000-00490000         /path/bin/postgres
    ...
    012d9000-0133e000         [heap]
    7f443a800000-7f470a800000 /dev/zero (deleted)
    7f470a800000-7f471831d000 /usr/lib/locale/locale-archive
    7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34
    ...

Make the layout more dynamic by splitting every shared memory segment into two parts:

* An anonymous file, which actually contains the shared memory content. Such an anonymous file is created via memfd_create; it lives in memory, behaves like a regular file and is semantically equivalent to anonymous memory allocated via mmap with MAP_ANONYMOUS.

* A reservation mapping, whose size is much larger than the required shared segment size. This mapping is created with the flag MAP_NORESERVE (so that the reserved space is not counted against memory limits). The anonymous file is mapped into this reservation mapping.

If the address maps have to change while resizing the shared buffer pool, the change must be made in Postmaster too, so that new backends inherit the resized address space from Postmaster. However, Postmaster is not involved in the ProcSignalBarrier mechanism, and we don't want it to spend time on things other than its core functionality. To achieve that, the maximum required address space maps are set up upfront with read and write access when starting the server. When resizing the buffer pool, only the backing file object is resized, by the coordinator. This also keeps the ProcSignalBarrier handling code light for backends other than the coordinator.

The resulting layout looks like this:

    00400000-00490000              /path/bin/postgres
    ...
    3f526000-3f590000         rw-p [heap]
    7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) -- anon file
    7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) -- reservation
    7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive
    7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34

To resize a shared memory segment in this layout, it's possible to use ftruncate on the memory mapped file.

This approach also does not impact the actual memory usage as reported by the kernel.

TODO: Verify that Cgroup v2 doesn't have any problems with that as well.
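
The reservation-plus-anonymous-file layout and the ftruncate-based resize described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the patch's code: DemoSegment and the demo_* functions are hypothetical names, the sketch maps the whole reservation read-write in a single mmap call (the patch may set up protections differently), and it is Linux specific because of memfd_create.

```c
#define _GNU_SOURCE             /* for memfd_create() */
#include <assert.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one segment; not the patch's struct. */
typedef struct DemoSegment
{
    int     fd;         /* anonymous file holding the content */
    char   *base;       /* start of the reserved address range */
    size_t  reserved;   /* size of the address space reservation */
} DemoSegment;

/* Create the anonymous file and map it over a larger reservation. */
static int
demo_segment_create(DemoSegment *seg, const char *name,
                    size_t initial, size_t reserved)
{
    assert(seg != NULL && initial <= reserved);

    seg->fd = memfd_create(name, 0);
    if (seg->fd < 0)
        return -1;
    if (ftruncate(seg->fd, (off_t) initial) != 0)
    {
        close(seg->fd);
        return -1;
    }

    /*
     * Map the full reservation once. Pages past the current end of the
     * file are part of the mapping but raise SIGBUS if touched.
     */
    seg->base = mmap(NULL, reserved, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_NORESERVE, seg->fd, 0);
    if (seg->base == MAP_FAILED)
    {
        close(seg->fd);
        return -1;
    }
    seg->reserved = reserved;
    return 0;
}

/* Resize by changing only the backing file; the mapping stays put. */
static int
demo_segment_resize(DemoSegment *seg, size_t new_size)
{
    if (new_size > seg->reserved)
        return -1;
    return ftruncate(seg->fd, (off_t) new_size);
}

/* Exercise the sketch: grow from 1 page to 4 pages inside 64 reserved. */
static int
demo_run(void)
{
    DemoSegment seg;
    struct stat st;
    size_t      page = (size_t) sysconf(_SC_PAGESIZE);
    char        probe = 0;

    if (demo_segment_create(&seg, "demo", page, 64 * page) != 0)
        return -1;
    seg.base[0] = 'a';

    if (demo_segment_resize(&seg, 4 * page) != 0)
        return -1;
    seg.base[3 * page] = 'b';   /* valid only after the resize */

    if (fstat(seg.fd, &st) != 0 || (size_t) st.st_size != 4 * page)
        return -1;
    if (pread(seg.fd, &probe, 1, (off_t) (3 * page)) != 1 || probe != 'b')
        return -1;

    munmap(seg.base, seg.reserved);
    close(seg.fd);
    return 0;
}
```

Because the address range never moves, a process that inherited the mapping before the resize sees the new pages at the same addresses, which is the property the text relies on for Postmaster and newly forked backends.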

To verify, a new cgroup was created with a memory limit of 256 MB, then PostgreSQL was launched within this cgroup with shared_buffers = 128 MB:

    $ cd /sys/fs/cgroup
    $ mkdir postgres
    $ cd postgres
    $ echo 268435456 > memory.max

    $ echo $MASTER_PID_SHELL > cgroup.procs
    # postgres from the master branch has been successfully launched
    # from that shell
    $ cat memory.current
    17465344 (~16.6 MB)
    # stop postgres

    $ echo $PATCH_PID_SHELL > cgroup.procs
    # postgres from the patch has been successfully launched from that shell
    $ cat memory.current
    20770816 (~19.8 MB)

There are also a few unrelated advantages of using memory mapped files:

* We've got a file descriptor, which could be used for regular file operations (modification, truncation, you name it).

* The file could be given a name, which improves readability when it comes to process maps.

* By default, Linux will not add file-backed shared mappings into a core dump, making it more convenient to work with them in PostgreSQL: no more huge dumps to process.
  - Some hackers have expressed concerns over it.

The downside is that memfd_create is Linux specific.

3. Refactor CalculateShmemSize()
=============================

This function calls many functions which return the amount of shared memory required for different shared memory data structures. Up until now, the returned total of these sizes was used to create a single shared memory segment. With this change, CalculateShmemSize() needs to estimate memory requirements for each of the segments. It now takes an array of MemoryMappingSizes, containing as many elements as the number of segments, as an argument. The sizes returned by all the functions it calls, except BufferManagerShmemSize(), are added and saved in the first element (index 0) of the array. BufferManagerShmemSize() is modified to save the amount of memory required for buffer manager related segments in the corresponding array element. Additionally it also saves the amount of reserved space.
For now, the amount of reserved address space is the same as the amount of required memory, but that is expected to change with the next commit, which implements buffer pool resize. CalculateShmemSize() now returns the total of the sizes corresponding to all the segments.

Author: Dmitrii Dolgov and Ashutosh Bapat
Reviewed-by: Tomas Vondra
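
The per-segment accounting described in section 3 can be illustrated with a small, self-contained C sketch. Everything here is an assumption made for illustration: DemoMappingSizes stands in for MemoryMappingSizes (whose real fields are not shown in this message), the segment indexes, the two "other subsystem" size functions and the byte counts are invented, and setting the main segment's reservation equal to its requirement is a guess at the interim behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment indexes; index 0 is the main segment. */
#define DEMO_MAIN_SEGMENT          0
#define DEMO_BUFFERS_SEGMENT       1
#define DEMO_BUFFER_DESCS_SEGMENT  2
#define DEMO_NUM_SEGMENTS          3

/* Hypothetical stand-in for one element of the sizes array. */
typedef struct DemoMappingSizes
{
    size_t  required;   /* memory the segment needs right now */
    size_t  reserved;   /* address space set aside for the segment */
} DemoMappingSizes;

/* Invented sizes for two subsystems that live in the main segment. */
static size_t demo_lock_shmem_size(void) { return 4096; }
static size_t demo_proc_shmem_size(void) { return 8192; }

/* The buffer manager fills in its own segments, including reservations. */
static void
demo_buffer_manager_shmem_size(DemoMappingSizes *sizes, size_t nbuffers)
{
    sizes[DEMO_BUFFERS_SEGMENT].required = nbuffers * 8192;
    sizes[DEMO_BUFFER_DESCS_SEGMENT].required = nbuffers * 64;

    /* For now the reservation equals the requirement. */
    sizes[DEMO_BUFFERS_SEGMENT].reserved =
        sizes[DEMO_BUFFERS_SEGMENT].required;
    sizes[DEMO_BUFFER_DESCS_SEGMENT].reserved =
        sizes[DEMO_BUFFER_DESCS_SEGMENT].required;
}

/* Fill the per-segment array and return the total required size. */
static size_t
demo_calculate_shmem_size(DemoMappingSizes *sizes, size_t nbuffers)
{
    size_t  total = 0;

    assert(sizes != NULL);
    for (int i = 0; i < DEMO_NUM_SEGMENTS; i++)
        sizes[i].required = sizes[i].reserved = 0;

    /* Everything except the buffer manager goes into element 0. */
    sizes[DEMO_MAIN_SEGMENT].required += demo_lock_shmem_size();
    sizes[DEMO_MAIN_SEGMENT].required += demo_proc_shmem_size();
    sizes[DEMO_MAIN_SEGMENT].reserved = sizes[DEMO_MAIN_SEGMENT].required;

    demo_buffer_manager_shmem_size(sizes, nbuffers);

    for (int i = 0; i < DEMO_NUM_SEGMENTS; i++)
        total += sizes[i].required;
    return total;
}
```

Keeping the required and reserved amounts as separate fields is what would let a later change enlarge only the reservation of the buffer-related segments without touching the callers that sum the required sizes.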
1 parent 81da8b3 commit a12248f

19 files changed: +755 -233 lines changed

doc/src/sgml/system-views.sgml

Lines changed: 9 additions & 0 deletions
@@ -4233,6 +4233,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 </para></entry>
 </row>
 
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>segment</structfield> <type>text</type>
+</para>
+<para>
+The name of the shared memory segment concerning the allocation.
+</para></entry>
+</row>
+
 <row>
 <entry role="catalog_table_entry"><para role="column_definition">
 <structfield>off</structfield> <type>int8</type>

src/backend/catalog/system_views.sql

Lines changed: 7 additions & 0 deletions
@@ -668,6 +668,13 @@ GRANT SELECT ON pg_shmem_allocations TO pg_read_all_stats;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations() TO pg_read_all_stats;
 
+CREATE VIEW pg_shmem_segments AS
+SELECT * FROM pg_get_shmem_segments();
+
+REVOKE ALL ON pg_shmem_segments FROM PUBLIC;
+GRANT SELECT ON pg_shmem_segments TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION pg_get_shmem_segments() FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_shmem_segments() TO pg_read_all_stats;
 CREATE VIEW pg_shmem_allocations_numa AS
 SELECT * FROM pg_get_shmem_allocations_numa();
 