Commit a12248f
Memory and address space management for buffer resizing
This commit contains three changes:
1. Allow the use of multiple shared memory mappings
============================================
Currently all work with shared memory goes through a single anonymous
memory mapping, which limits how the shared memory can be organized.
Introduce the possibility of allocating multiple shared memory mappings,
where each mapping is associated with a specified shared memory segment.
A new shared memory API is introduced that takes the segment as an
additional parameter. As the path of least resistance, the original API
is kept in place and operates on the main shared memory segment.
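The relationship between the segment-aware API and the original one can be sketched as follows. This is a minimal illustration, not the patch's code: the names shmem_alloc_in_segment, shmem_alloc, MAIN_SEGMENT and BUFFERS_SEGMENT are hypothetical, and each segment is modelled as a static array with a bump allocator rather than a real shared mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment identifiers; the names are illustrative only. */
enum { MAIN_SEGMENT = 0, BUFFERS_SEGMENT = 1, NUM_SEGMENTS = 2 };

#define SEGMENT_BYTES 4096

/* One bump allocator per segment, standing in for one mapping each. */
typedef struct Segment
{
    uint64_t mem[SEGMENT_BYTES / 8];    /* 8-byte aligned backing store */
    size_t   used;                      /* bytes handed out so far */
} Segment;

static Segment segments[NUM_SEGMENTS];

/* New-style call: the caller names the segment to allocate from. */
static void *
shmem_alloc_in_segment(size_t size, int segment)
{
    Segment *seg;
    size_t   aligned;
    void    *result;

    if (segment < 0 || segment >= NUM_SEGMENTS || size > SEGMENT_BYTES)
        return NULL;

    seg = &segments[segment];
    aligned = (size + 7) & ~(size_t) 7;     /* round up to 8 bytes */
    if (aligned > SEGMENT_BYTES - seg->used)
        return NULL;                        /* segment is full */

    result = (unsigned char *) seg->mem + seg->used;
    seg->used += aligned;
    return result;
}

/* Old-style call: same signature as before, delegates to the main segment. */
static void *
shmem_alloc(size_t size)
{
    return shmem_alloc_in_segment(size, MAIN_SEGMENT);
}
```

Keeping the old entry point as a thin wrapper means existing callers do not need to change, while new callers can target a specific segment.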
Modifies pg_shmem_allocations to report the shared memory segment as well.
Adds pg_shmem_segments to report shared memory segment information.
2. Address space reservation for shared memory
============================================
Currently the shared memory layout is designed to pack everything tightly
together, leaving no space between mappings for resizing. Here is what it
looks like for one mapping in /proc/$PID/maps; /dev/zero represents the
anonymous shared memory in question:
00400000-00490000 /path/bin/postgres
...
012d9000-0133e000 [heap]
7f443a800000-7f470a800000 /dev/zero (deleted)
7f470a800000-7f471831d000 /usr/lib/locale/locale-archive
7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34
...
Make the layout more dynamic by splitting every shared memory segment
into two parts:
* An anonymous file, which actually contains the shared memory content.
The anonymous file is created via memfd_create; it lives in memory,
behaves like a regular file, and is semantically equivalent to
anonymous memory allocated via mmap with MAP_ANONYMOUS.
* A reservation mapping, whose size is much larger than the required
shared segment size. This mapping is created with the MAP_NORESERVE
flag (so that the reserved space is not counted against memory
limits). The anonymous file is mapped into this reservation mapping.
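The two-part layout described above can be sketched as follows on Linux. This is a minimal illustration under stated assumptions, not the patch's code: the ReservedSegment struct and reserved_segment_create function are hypothetical names, and the reservation is made here as an anonymous PROT_NONE mapping that the file is then mapped over with MAP_FIXED, which may differ from the exact mmap calls the patch uses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one segment; not a PostgreSQL struct. */
typedef struct ReservedSegment
{
    int     fd;         /* anonymous file from memfd_create */
    void   *base;       /* start of the reserved address range */
    size_t  reserved;   /* bytes of address space reserved */
    size_t  size;       /* bytes currently backed by the file */
} ReservedSegment;

/*
 * Create an anonymous file of "size" bytes and map it at the start of a
 * larger address space reservation of "reserved" bytes. Both sizes are
 * assumed to be multiples of the page size. Returns 0 on success, -1 on
 * failure.
 */
static int
reserved_segment_create(ReservedSegment *seg, const char *name,
                        size_t size, size_t reserved)
{
    assert(size <= reserved);

    seg->fd = memfd_create(name, 0);
    if (seg->fd < 0)
        return -1;
    if (ftruncate(seg->fd, (off_t) size) != 0)
        goto fail;

    /* Reserve address space only: no access rights, no commit charge. */
    seg->base = mmap(NULL, reserved, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (seg->base == MAP_FAILED)
        goto fail;

    /* Map the anonymous file over the beginning of the reservation. */
    if (mmap(seg->base, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, seg->fd, 0) == MAP_FAILED)
    {
        munmap(seg->base, reserved);
        goto fail;
    }

    seg->size = size;
    seg->reserved = reserved;
    return 0;

fail:
    close(seg->fd);
    return -1;
}
```

Because the remainder of the reservation stays inaccessible, nothing else can be mapped directly after the segment, which leaves room for it to grow in place.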
If the address maps had to change while resizing the shared buffer
pool, the change would have to be made in the Postmaster too, so that
new backends inherit the resized address space from the Postmaster.
However, the Postmaster is not involved in the ProcSignalBarrier
mechanism, and we don't want it to spend time on anything other than its
core functionality. To achieve that, the maximum required address space
maps are set up upfront with read and write access when starting the
server. When resizing the buffer pool, only the backing file object is
resized, and only by the coordinator. This also keeps the
ProcSignalBarrier handling code light for backends other than the
coordinator.
The resulting layout looks like this:
00400000-00490000 /path/bin/postgres
...
3f526000-3f590000 rw-p [heap]
7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) -- anon file
7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) -- reservation
7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive
7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34
To resize a shared memory segment in this layout, it is possible to use
ftruncate on the memory-mapped file.
This approach also does not impact the actual memory usage as reported by
the kernel.
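Resizing through the backing file can be sketched as follows. This is a minimal illustration, assuming the mapping is created once over the maximum size with read and write access and only the file length changes afterwards; the ResizableSegment struct and the resizable_segment_* functions are hypothetical names, not the patch's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one resizable segment. */
typedef struct ResizableSegment
{
    int     fd;         /* anonymous file backing the segment */
    char   *base;       /* start of the mapping */
    size_t  max_size;   /* address space mapped upfront */
} ResizableSegment;

/* Map max_size bytes upfront; only "initial" bytes are backed by the file. */
static int
resizable_segment_create(ResizableSegment *seg, size_t initial, size_t max_size)
{
    void *p;

    assert(initial <= max_size);
    seg->fd = memfd_create("resizable", 0);
    if (seg->fd < 0)
        return -1;
    if (ftruncate(seg->fd, (off_t) initial) != 0)
    {
        close(seg->fd);
        return -1;
    }
    p = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_NORESERVE, seg->fd, 0);
    if (p == MAP_FAILED)
    {
        close(seg->fd);
        return -1;
    }
    seg->base = p;
    seg->max_size = max_size;
    return 0;
}

/* Grow or shrink the segment: the address map itself is left untouched. */
static int
resizable_segment_resize(ResizableSegment *seg, size_t new_size)
{
    if (new_size > seg->max_size)
        return -1;
    return ftruncate(seg->fd, (off_t) new_size);
}

/* Current size of the backing file in bytes, or -1 on error. */
static long
resizable_segment_size(const ResizableSegment *seg)
{
    struct stat st;

    if (fstat(seg->fd, &st) != 0)
        return -1;
    return (long) st.st_size;
}
```

In this sketch, touching pages of the mapping that lie beyond the current end of the file raises SIGBUS, so callers must only use the range covered by the file's current size.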
TODO: Verify that cgroup v2 doesn't have any problems with that as well.
To check this, a new cgroup was created with a memory limit of 256 MB,
then PostgreSQL was launched within this cgroup with
shared_buffers = 128 MB:
$ cd /sys/fs/cgroup
$ mkdir postgres
$ cd postgres
$ echo 268435456 > memory.max
$ echo $MASTER_PID_SHELL > cgroup.procs
# postgres from the master branch has been successfully launched
# from that shell
$ cat memory.current
17465344 (~16.6 MB)
# stop postgres
$ echo $PATCH_PID_SHELL > cgroup.procs
# postgres from the patch has been successfully launched from that shell
$ cat memory.current
20770816 (~19.8 MB)
There are also a few unrelated advantages of using memory-mapped files:
* We've got a file descriptor, which could be used for regular file
operations (modification, truncation, you name it).
* The file could be given a name, which improves readability when it
comes to process maps.
* By default, Linux will not add file-backed shared mappings to a core dump,
making it more convenient to work with them in PostgreSQL: no more huge dumps
to process. Some hackers have expressed concerns about this, however.
The downside is that memfd_create is Linux-specific.
3. Refactor CalculateShmemSize()
=============================
This function calls many functions which return the amount of shared
memory required for different shared memory data structures. Up until
now, the returned total of these sizes was used to create a single
shared memory segment. With this change, CalculateShmemSize() needs to
estimate memory requirements for each of the segments. It now takes an
array of MemoryMappingSizes, containing as many elements as the number
of segments, as an argument. The sizes returned by all the functions it
calls, except BufferManagerShmemSize(), are added up and saved in the
first element (index 0) of the array. BufferManagerShmemSize() is
modified to save the amount of memory required for buffer manager
related segments in the corresponding array elements. It also saves the
amount of reserved address space. For now, the amount of reserved
address space is the same as the amount of required memory, but that is
expected to change with the next commit, which implements buffer pool
resizing.
CalculateShmemSize() now returns the total of the sizes corresponding to
all the segments.
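The per-segment accounting described above can be sketched as follows. This is a minimal illustration, not the patch's code: the field names of MemoryMappingSizes, the helper functions, the segment indexes and all the sizes are hypothetical, and the buffer manager is given a single segment here for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment indexes: main segment first, then buffers. */
enum { MAIN_SEG = 0, BUFFERS_SEG = 1, NUM_SEGS = 2 };

/* Field names are assumptions made for this sketch. */
typedef struct MemoryMappingSizes
{
    size_t shmem_req_size;  /* memory the segment needs right now */
    size_t shmem_reserved;  /* address space reserved for the segment */
} MemoryMappingSizes;

/* Stand-ins for the per-subsystem size estimators. */
static size_t lock_shmem_size(void) { return 4096; }
static size_t proc_shmem_size(void) { return 8192; }

/* The buffer manager fills in its own array element. */
static void
buffer_manager_shmem_size(MemoryMappingSizes *sizes, size_t nbuffers)
{
    sizes[BUFFERS_SEG].shmem_req_size = nbuffers * 8192;
    /* For now the reservation equals the required size. */
    sizes[BUFFERS_SEG].shmem_reserved = sizes[BUFFERS_SEG].shmem_req_size;
}

/* Fill sizes[] for every segment and return the overall total. */
static size_t
calculate_shmem_size(MemoryMappingSizes *sizes, size_t nbuffers)
{
    size_t main_size = 0;
    size_t total = 0;
    int    i;

    /* Everything except the buffer manager lands in the main segment. */
    main_size += lock_shmem_size();
    main_size += proc_shmem_size();
    sizes[MAIN_SEG].shmem_req_size = main_size;
    sizes[MAIN_SEG].shmem_reserved = main_size;

    buffer_manager_shmem_size(sizes, nbuffers);

    for (i = 0; i < NUM_SEGS; i++)
        total += sizes[i].shmem_req_size;
    return total;
}
```

Keeping the required and reserved sizes as separate fields lets a later change enlarge the reservation without touching the callers that only need the required size.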
Author: Dmitrii Dolgov and Ashutosh Bapat
Reviewed-by: Tomas Vondra
1 parent 81da8b3 commit a12248f
File tree
19 files changed, +755 -233 lines changed
- doc/src/sgml
- src
  - backend
    - catalog
    - port
    - storage
      - buffer
      - ipc
      - lmgr
  - include
    - catalog
    - portability
    - storage
  - test/regress/expected