
I just made a pg_dump backup of my database and its size is about 95 GB, but the size of the directory /pgsql/data is about 38 GB.

I ran a VACUUM FULL and the size of the dump did not change. My PostgreSQL version is 9.3.4, on a CentOS release 6.3 server.

Is it weird that the dump is so much bigger than the physical size, or can I consider this normal?
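For reference, this is roughly how the two sizes can be compared (the dump path and database name below are just examples):

    # Size of the dump file
    ls -lh /backups/mydb.dump

    # Size of the database as Postgres reports it
    psql -d mydb -c "SELECT pg_size_pretty(pg_database_size('mydb'));"

    # Physical size of the data directory
    du -sh /pgsql/data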

Thanks in advance!

Regards.

Neme.

    This can happen if you have a lot of (not NULLable, high-valued) numeric fields. The dump is basically ASCII, and a maximum-value 4-byte integer field takes about 10 bytes in ASCII (plus one byte for the \t or \n separator). Apparently you don't have many indexes on your tables, since indexes are not included in the dump, only the DDL to reconstruct them. Commented May 16, 2016 at 14:50
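To illustrate the point about the text representation being wider, a quick sketch (the database name is a placeholder):

    # A maximum-value 4-byte integer is 4 bytes on disk but 10 characters as text
    psql -d mydb -c "SELECT pg_column_size(2147483647::int4) AS bytes_on_disk,
                            length(2147483647::text)        AS bytes_as_text;"
    #  bytes_on_disk | bytes_as_text
    # ---------------+---------------
    #              4 |            10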

2 Answers


The size of pg_dump output and the size of a Postgres cluster (aka 'instance') on disk have very, very little correlation. Consider:

  • pg_dump has four output formats (plain, custom, directory, and tar); all but tar can compress on the fly (a short sketch of the common invocations follows this list)
  • pg_dump output contains only the schema definition and the raw table data in a text (or "binary") representation. It contains no index data.
  • The text/"binary" representation of a data type can be larger or smaller than the data actually stored in the database. For example, the number 1 stored in a bigint column takes 8 bytes in the cluster, but only 1 byte (plus a delimiter) in a plain-text dump.
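A minimal sketch of the common pg_dump invocations, assuming a database named mydb (the file names are placeholders):

    # Plain SQL text: uncompressed by default, usually the largest output
    pg_dump -Fp mydb > mydb.sql

    # Custom format: compressed by default, restored with pg_restore
    pg_dump -Fc mydb > mydb.dump

    # Directory format: one compressed file per table, allows parallel restore
    pg_dump -Fd mydb -f mydb_dir

    # The compression level can be set explicitly (0-9)
    pg_dump -Fc -Z 9 mydb > mydb_small.dump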

This is also why VACUUM FULL had no effect on the size of the backup.

Note that a Point In Time Recovery (PITR) based backup is entirely different from a pg_dump backup. PITR backups are essentially copies of the data on disk.
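As a rough illustration of the difference, a file-level base backup (the starting point for PITR) is taken with something like pg_basebackup rather than pg_dump; the target path below is a placeholder, and WAL archiving must also be configured for actual point-in-time recovery:

    # Copies the whole data directory, indexes and all, as compressed tar files
    pg_basebackup -D /backups/base -Ft -z -P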


1 Comment

Jim, your answer describes the opposite of what the original question reported: the backup/dump is twice the size of the actual database on disk. This seems unlikely, but I'm seeing exactly the same thing.

Postgres does compress its data in certain situations, using a technique called TOAST:

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread").
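A quick way to see TOAST compression at work, sketched against an assumed database named mydb with a throwaway temp table:

    psql -d mydb -c "
      CREATE TEMP TABLE toast_demo (payload text);
      -- 800,000 characters of highly compressible text
      INSERT INTO toast_demo VALUES (repeat('abcdefgh', 100000));
      SELECT octet_length(payload)   AS logical_bytes,
             pg_column_size(payload) AS stored_bytes
      FROM toast_demo;"

    # stored_bytes comes out far smaller than logical_bytes because the value
    # is compressed before being stored.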

