53

Can someone suggest the steps to check pgsql replication status and how to identify if the replication is not happening properly?

We use streaming replication with pgsql9.0 and pgsql9.4

1

4 Answers 4

89

I use following SQL queries to check status on Postgres v11 usually.

On master:

select * from pg_stat_replication;

On replica (streaming replication in my case):

select * from pg_stat_wal_receiver;
Sign up to request clarification or add additional context in comments.

3 Comments

For vertical output replace ; with \x\g\x.
Could you explain what we are supposed to see if everything is going well or if something is wrong ? sync_state : sync on the primary is a good sign ?
sync_state refers to how the replication was configured. state is the field to check in pg_stat_replication.
31

Show replication status in PostgreSQL

on server

postgres=# select usename,application_name,client_addr,backend_start,state,sync_state from pg_stat_replication ;

usename   | application_name |  client_addr   |         backend_start         |   state   | sync_state 
------------+------------------+----------------+-------------------------------+-----------+------------
replicator | walreceiver      | 192.168.10.132 | 2018-07-06 06:12:20.786918+03 | streaming | async
(1 row)

on client

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
 (1 row)


postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location 
-------------------------------
 0/540C1DB8


postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location 
------------------------------
 0/540C1DB8
 (1 row)

postgres=#    SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                  THEN 0
                ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
              END AS log_delay;
 log_delay 
-----------
 0
 (1 row)

5 Comments

on postgres 10, this is not working entirely but SELECT pg_last_xact_replay_timestamp(); gives useful information!
Compare master and slave statuses in a single query: select pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp(), CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay;
To check master IP from slave, dba.stackexchange.com/a/45062/123063
@SergeyNemchinov i guess that query works only on the backup, not on the primary? ``` cloudmaster=# select pg_last_xlog_receive_location(); ERROR: function pg_last_xlog_receive_location() does not exist ```
@MarcosDione It seems that the pg_last_xlog_receive_location() function does not exist in newer versions after 9.x
31

On your master, pg_stat_replication provides data about ongoing replication:

select client_addr, state, sent_location, write_location,
        flush_location, replay_location from pg_stat_replication;

On postgresql v10:

select client_addr, state, sent_lsn, write_lsn,
    flush_lsn, replay_lsn from pg_stat_replication;

2 Comments

Ok, what's wrong then if the above query returns 0 rows?
0 rows means you have no replication configured at your server. When configured correctly should get a row(s) but that doesn't mean you have replication either as the slave may not be connecting (check 'state' column)
-1

After starting backup, check from primary:

psql -c "select TO_CHAR(backup_streamed::decimal/backup_total*100,'fm00D00%'),phase from pg_stat_progress_basebackup;" -t

For larger databases (>> 1TB) each replication phase might take significant time, thus it might be important realize in which stage of replication you are at.

When performing full DB replication there are multiple stages:

  • A) full backup - pg_basebackup cloning database from primary to a standby server (standby is not running)
  • B) recovery - standby server starting up (not accepting connections yet)
  • C) WAL streaming - Both primary and standby(s) in consistent state and accepting connections

A) Full backup

Since PostgreSQL 13 you can use pg_stat_progress_basebackup table, where you can get progress for each replication phase (from primary):

psql -c "select TO_CHAR(backup_streamed::decimal/backup_total*100,'fm00D00%'),phase from pg_stat_progress_basebackup;" -t
 31.79%  | streaming database files

The replication process has multiple phases, progress is reported per each phase:

  1. initializing
  2. starting file transfer
  3. streaming database files
  4. finishing file transfer
  5. transferring WAL

On earlier PostgreSQL version you're left with plain disk usage:

$ primary=$(du -s /var/lib/postgresql/14/main/base | awk '{print $1}')
$ replica=4648394012 # same command as above only from replica
$ echo ${replica}/${primary}*100 | bc -l
34.90793993287965597200

pg_stat_replication gives you e.g. replay_lag which doesn't help much with overall progress. You can only estimate how many WAL you might actually need to keep.

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"
   state   | client_hostname |    write_lag    |   replay_lag    |    flush_lag    
-----------+-----------------+-----------------+-----------------+-----------------
 backup    |                 |                 |                 | 
 streaming |                 | 00:00:00.000516 | 12:23:15.991732 | 00:00:00.310261
(2 rows)

B) recovery

On primary server, use pg_stat_replication table:

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"

  state  | client_hostname |    write_lag    |   replay_lag    |    flush_lag    
---------+-----------------+-----------------+-----------------+-----------------
 catchup |                 | 00:04:50.006556 | 00:04:50.006556 | 00:04:50.006556

Standby server has finished the first stage, assuming the server was able to start (correct configuration of critical parameters). Now the postgres process has to replay accumulated WALs on the standby or copy them from archive (e.g. using archive_command) in order to reach consistent recovery state. You can obtain the LSN using pg_controldata:

$ /usr/lib/postgresql/14/bin/pg_controldata /var/lib/postgresql/14/main | grep "Minimum recovery"
Minimum recovery ending location:     4034A/1BFFFB58

NOTE: There's no record in pg_stat_replication table on the primary, at this stage.

You'll notice in logs something like:

FATAL:  the database system is not yet accepting connections
DETAIL:  Consistent recovery state has not been yet reached.
LOG:  restored log file "000000020004022E00000029" from archive

The HEX number points to a WAL file, that can be dumped (typically single line is enough)

/usr/lib/postgresql/14/bin/pg_waldump 14/main/pg_wal/00000002000402340000003D -n 1
rmgr: Heap        len (rec/tot):     48/    48, tx: 1350166609, lsn: 40234/F4001C08, prev 40234/F3FFFDB0, desc: HEAP_CONFIRM off 52, blkref #0: rel 1663/16421/2324630632 blk 85514

we're looking for the LSN part, e.g. lsn: 40234/F4001C08. Now, when we get back to the primary server, we can obtain the lag in file size units:

 psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '40234/F4001C08'));" -t
 1813 GB

1813 GB is the diff between current WAL and the one being replayed.

Another option is to use pg_controldata

$ /usr/lib/postgresql/14/bin/pg_controldata /var/lib/postgresql/14/main | grep location
Latest checkpoint's REDO location:    40335/C001D040

where you can obtain following LSN pointers:

  • Backup start location
  • Latest checkpoint's REDO location - current recovery position
  • Minimum recovery ending location

C) WAL streaming

Primary and standby are in almost the same state, both servers are accepting connections. Except there might be some lag between those two servers, which might be caused by frequency of writes to WAL, network transfer speed, etc.

On standby:

psql -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp());"

On primary:

psql -c "select state, client_hostname, write_lag, replay_lag, flush_lag from pg_stat_replication;"

This will give you time difference between two servers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.