Check postgres replication status

Question

Can someone suggest the steps to check pgsql replication status and how to identify if the replication is not happening properly?

We use streaming replication with pgsql9.0 and pgsql9.4

stackoverflow.com/questions/28323355/…

Vao Tsun
– Vao Tsun

2017-04-13 09:25:11 +00:00
Commented Apr 13, 2017 at 9:25 — Vao Tsun
– Vao Tsun, Commented Apr 13, 2017 at 9:25

Alexey · Accepted Answer · 2019-01-12 22:31:32Z

89

I use following SQL queries to check status on Postgres v11 usually.

On master:

select * from pg_stat_replication;

On replica (streaming replication in my case):

select * from pg_stat_wal_receiver;

edited Jan 12, 2019 at 22:31

answered Jan 12, 2019 at 22:21

Alexey

2,2141 gold badge18 silver badges14 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

x-yuri Over a year ago

For vertical output replace ; with \x\g\x.

Cyril Duchon-Doris Over a year ago

Could you explain what we are supposed to see if everything is going well or if something is wrong ? sync_state : sync on the primary is a good sign ?

gerardw Over a year ago

sync_state refers to how the replication was configured. state is the field to check in pg_stat_replication.

truth · Accepted Answer · 2018-07-10 09:29:17Z

31

Show replication status in PostgreSQL

on server

postgres=# select usename,application_name,client_addr,backend_start,state,sync_state from pg_stat_replication ;

usename   | application_name |  client_addr   |         backend_start         |   state   | sync_state 
------------+------------------+----------------+-------------------------------+-----------+------------
replicator | walreceiver      | 192.168.10.132 | 2018-07-06 06:12:20.786918+03 | streaming | async
(1 row)

on client

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
 (1 row)


postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location 
-------------------------------
 0/540C1DB8


postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location 
------------------------------
 0/540C1DB8
 (1 row)

postgres=#    SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                  THEN 0
                ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
              END AS log_delay;
 log_delay 
-----------
 0
 (1 row)

edited Jul 10, 2018 at 9:29

answered Jul 10, 2018 at 9:14

truth

4294 silver badges5 bronze badges

5 Comments

user39950 Over a year ago

on postgres 10, this is not working entirely but SELECT pg_last_xact_replay_timestamp(); gives useful information!

Sergey Nemchinov Over a year ago

Compare master and slave statuses in a single query:

select pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp(),        CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0             ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;

GypsyCosmonaut Over a year ago

To check master IP from slave, dba.stackexchange.com/a/45062/123063

Marcos Dione Over a year ago

@SergeyNemchinov i guess that query works only on the backup, not on the primary? ``` cloudmaster=# select pg_last_xlog_receive_location(); ERROR: function pg_last_xlog_receive_location() does not exist ```

Sergey Nemchinov Over a year ago

@MarcosDione It seems that the pg_last_xlog_receive_location() function does not exist in newer versions after 9.x

Reinsbrain · Accepted Answer · 2019-01-15 01:49:04Z

31

On your master, pg_stat_replication provides data about ongoing replication:

select client_addr, state, sent_location, write_location,
        flush_location, replay_location from pg_stat_replication;

On postgresql v10:

select client_addr, state, sent_lsn, write_lsn,
    flush_lsn, replay_lsn from pg_stat_replication;

edited Jan 15, 2019 at 1:49

answered Apr 27, 2017 at 15:59

Reinsbrain

2,6512 gold badges27 silver badges40 bronze badges

2 Comments

gvasquez Over a year ago

Ok, what's wrong then if the above query returns 0 rows?

Reinsbrain Over a year ago

0 rows means you have no replication configured at your server. When configured correctly should get a row(s) but that doesn't mean you have replication either as the slave may not be connecting (check 'state' column)

Tombart · Accepted Answer · 2025-09-09 20:59:55Z

After starting backup, check from primary:

psql -c "select TO_CHAR(backup_streamed::decimal/backup_total*100,'fm00D00%'),phase from pg_stat_progress_basebackup;" -t

For larger databases (>> 1TB) each replication phase might take significant time, thus it might be important realize in which stage of replication you are at.

When performing full DB replication there are multiple stages:

A) full backup - pg_basebackup cloning database from primary to a standby server (standby is not running)
B) recovery - standby server starting up (not accepting connections yet)
C) WAL streaming - Both primary and standby(s) in consistent state and accepting connections

A) Full backup

Since PostgreSQL 13 you can use pg_stat_progress_basebackup table, where you can get progress for each replication phase (from primary):

psql -c "select TO_CHAR(backup_streamed::decimal/backup_total*100,'fm00D00%'),phase from pg_stat_progress_basebackup;" -t
 31.79%  | streaming database files

The replication process has multiple phases, progress is reported per each phase:

initializing
starting file transfer
streaming database files
finishing file transfer
transferring WAL

On earlier PostgreSQL version you're left with plain disk usage:

$ primary=$(du -s /var/lib/postgresql/14/main/base | awk '{print $1}')
$ replica=4648394012 # same command as above only from replica
$ echo ${replica}/${primary}*100 | bc -l
34.90793993287965597200

pg_stat_replication gives you e.g. replay_lag which doesn't help much with overall progress. You can only estimate how many WAL you might actually need to keep.

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"
   state   | client_hostname |    write_lag    |   replay_lag    |    flush_lag    
-----------+-----------------+-----------------+-----------------+-----------------
 backup    |                 |                 |                 | 
 streaming |                 | 00:00:00.000516 | 12:23:15.991732 | 00:00:00.310261
(2 rows)

B) recovery

On primary server, use pg_stat_replication table:

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"

  state  | client_hostname |    write_lag    |   replay_lag    |    flush_lag    
---------+-----------------+-----------------+-----------------+-----------------
 catchup |                 | 00:04:50.006556 | 00:04:50.006556 | 00:04:50.006556

Standby server has finished the first stage, assuming the server was able to start (correct configuration of critical parameters). Now the postgres process has to replay accumulated WALs on the standby or copy them from archive (e.g. using archive_command) in order to reach consistent recovery state. You can obtain the LSN using pg_controldata:

$ /usr/lib/postgresql/14/bin/pg_controldata /var/lib/postgresql/14/main | grep "Minimum recovery"
Minimum recovery ending location:     4034A/1BFFFB58

NOTE: There's no record in pg_stat_replication table on the primary, at this stage.

You'll notice in logs something like:

FATAL:  the database system is not yet accepting connections
DETAIL:  Consistent recovery state has not been yet reached.
LOG:  restored log file "000000020004022E00000029" from archive

The HEX number points to a WAL file, that can be dumped (typically single line is enough)

/usr/lib/postgresql/14/bin/pg_waldump 14/main/pg_wal/00000002000402340000003D -n 1
rmgr: Heap        len (rec/tot):     48/    48, tx: 1350166609, lsn: 40234/F4001C08, prev 40234/F3FFFDB0, desc: HEAP_CONFIRM off 52, blkref #0: rel 1663/16421/2324630632 blk 85514

we're looking for the LSN part, e.g. lsn: 40234/F4001C08. Now, when we get back to the primary server, we can obtain the lag in file size units:

 psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '40234/F4001C08'));" -t
 1813 GB

1813 GB is the diff between current WAL and the one being replayed.

Another option is to use pg_controldata

$ /usr/lib/postgresql/14/bin/pg_controldata /var/lib/postgresql/14/main | grep location
Latest checkpoint's REDO location:    40335/C001D040

where you can obtain following LSN pointers:

Backup start location
Latest checkpoint's REDO location - current recovery position
Minimum recovery ending location

C) WAL streaming

Primary and standby are in almost the same state, both servers are accepting connections. Except there might be some lag between those two servers, which might be caused by frequency of writes to WAL, network transfer speed, etc.

On standby:

psql -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp());"

On primary:

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"

This will give you time difference between two servers.

Collectives™ on Stack Overflow

Check postgres replication status

4 Answers 4

3 Comments

on server

on client

5 Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

on server

on client

5 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related