合 PostgreSQL高可用之repmgr（1主2从+1witness）+Pgpool-II实现主从切换+读写分离+负载均衡

发布日期 2021年6月21日 · 已更新 2022年1月20日

Tags： PG 高可用 Pgpool-II pgpoolAdmin 负载均衡读写分离 repmgr 主从流复制主从切换 witness

👉 本文共约2967个字，系统预计阅读时间或需12分钟。

6.1、switchover正常主从切换
6.1.1、把备库62变为主库
6.1.2、把新备库61切换为主库
6.2、failover切换
6.2.1、failover异常手工主从切换
6.2.2、failover异常自动主从切换
7.1、安装Pgpool-II
7.2、配置pgpool.conf
7.3、在主库创建用户
7.4、启动Pgpool-II
7.5、查询
7.6、switchover故障转移后，查询读写分离

五、检查repmgr集群

repmgr -f /pg13/pg13/repmgr.conf  cluster matrix 
repmgr -f /pg13/pg13/repmgr.conf  cluster crosscheck 

repmgr -f /pg13/pg13/repmgr.conf node status
repmgr -f /pg13/pg13/repmgr.conf node check

repmgr -f /pg13/pg13/repmgr.conf cluster matrix

repmgr -f /pg13/pg13/repmgr.conf cluster crosscheck

repmgr -f /pg13/pg13/repmgr.conf node status

repmgr -f /pg13/pg13/repmgr.conf node check

一些结果：

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf  cluster matrix 
INFO: connecting to database
 Name           | ID | 1 | 2 | 3 | 4
----------------+----+---+---+---+---
 lhrrepmgr64361 | 1  | * | * | * | * 
 lhrrepmgr64362 | 2  | * | * | * | * 
 lhrrepmgr64363 | 3  | * | * | * | * 
 lhrrepmgr64364 | 4  | * | * | * | * 

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status
Node "lhrrepmgr64361":
        PostgreSQL version: 13.2
        Total data size: 37 MB
        Conninfo: host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: enabled
        Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
        WALs pending archiving: 0 pending files
        Replication connections: 2 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Replication lag: n/a

[pg13@lhrrepmgr64361 ~]$ 
[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node check
Node "lhrrepmgr64361":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (N/A - node is primary)
        Downstream servers: OK (2 of 2 downstream nodes attached)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/pg13/pgdata")

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status
Node "lhrrepmgr64362":
        PostgreSQL version: 13.2
        Total data size: 37 MB
        Conninfo: host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
        Role: standby
        WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
        Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
        WALs pending archiving: 0 pending files
        Replication connections: 0 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Upstream node: lhrrepmgr64361 (ID: 1)
        Replication lag: 0 seconds
        Last received LSN: 0/8001480
        Last replayed LSN: 0/8001480

[pg13@lhrrepmgr64364 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status
Node "lhrrepmgr64364":
        PostgreSQL version: 13.2
        Total data size: 29 MB
        Conninfo: host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2
        Role: witness
        WAL archiving: enabled
        Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
        WALs pending archiving: 0 pending files
        Replication connections: 0 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Replication lag: n/a

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster matrix

INFO: connecting to database

Name | ID | 1 | 2 | 3 | 4

----------------+----+---+---+---+---

lhrrepmgr64361 | 1 | * | * | * | *

lhrrepmgr64362 | 2 | * | * | * | *

lhrrepmgr64363 | 3 | * | * | * | *

lhrrepmgr64364 | 4 | * | * | * | *

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status

Node "lhrrepmgr64361":

PostgreSQL version: 13.2

Total data size: 37 MB

Conninfo: host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2

Role: primary

WAL archiving: enabled

Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f

WALs pending archiving: 0 pending files

Replication connections: 2 (of maximal 10)

Replication slots: 0 physical (of maximal 10; 0 missing)

Replication lag: n/a

[pg13@lhrrepmgr64361 ~]$

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node check

Node "lhrrepmgr64361":

Server role: OK (node is primary)

Replication lag: OK (N/A - node is primary)

WAL archiving: OK (0 pending archive ready files)

Upstream connection: OK (N/A - node is primary)

Downstream servers: OK (2 of 2 downstream nodes attached)

Replication slots: OK (node has no physical replication slots)

Missing physical replication slots: OK (node has no missing physical replication slots)

Configured data directory: OK (configured "data_directory" is "/pg13/pgdata")

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status

Node "lhrrepmgr64362":

PostgreSQL version: 13.2

Total data size: 37 MB

Conninfo: host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2

Role: standby

WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)

Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f

WALs pending archiving: 0 pending files

Replication connections: 0 (of maximal 10)

Replication slots: 0 physical (of maximal 10; 0 missing)

Upstream node: lhrrepmgr64361 (ID: 1)

Replication lag: 0 seconds

Last received LSN: 0/8001480

Last replayed LSN: 0/8001480

[pg13@lhrrepmgr64364 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status

Node "lhrrepmgr64364":

PostgreSQL version: 13.2

Total data size: 29 MB

Conninfo: host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

Role: witness

WAL archiving: enabled

Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f

WALs pending archiving: 0 pending files

Replication connections: 0 (of maximal 10)

Replication slots: 0 physical (of maximal 10; 0 missing)

Replication lag: n/a

六、主从切换

6.1、switchover正常主从切换

switchover切换即主变备，备变主。

官网：https://repmgr.org/docs/current/repmgr-standby-switchover.html

6.1.1、把备库62变为主库

在62上操作：

repmgr -f /pg13/pg13/repmgr.conf cluster show
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
repmgr -f /pg13/pg13/repmgr.conf cluster show

-- 可以debug打印详细的切换过程
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose

repmgr -f /pg13/pg13/repmgr.conf cluster show

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind

repmgr -f /pg13/pg13/repmgr.conf cluster show

-- 可以debug打印详细的切换过程

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose

--siblings-follow 表示所有的从库同步源自动改成最新的主库节点
--force-rewind 如果repmgr检测到需要执行pg_rewind(同步)的时候，在执行pg_rewind之前，在新主节点执行checkpoint

执行过程：

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | primary | * running |                | default  | 100      | 3        | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | standby |   running | lhrrepmgr64361 | default  | 100      | 3        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | standby |   running | lhrrepmgr64361 | default  | 100      | 3        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64361 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2
[pg13@lhrrepmgr64362 ~]$ 
[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "lhrrepmgr64362" (ID: 2) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "172.72.6.61" succeeded
INFO: able to execute "repmgr" on remote host "172.72.6.61"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "lhrrepmgr64362" (ID: 2) would be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "lhrrepmgr64361":
  "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[pg13@lhrrepmgr64362 ~]$ 
[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "lhrrepmgr64362" (ID: 2)
NOTICE: local node "lhrrepmgr64362" (ID: 2) will be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "lhrrepmgr64361" (ID: 1)
NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1) 
DETAIL: executing server command "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/9000028
NOTICE: promoting standby to primary
DETAIL: promoting server "lhrrepmgr64362" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "lhrrepmgr64362" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2) 
ERROR: unable to execute CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0
NOTICE: setting node 1  upstream to node 2
WARNING: unable to ping "host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/pg13/pg13/bin/pg_ctl  -w -D '/pg13/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: node  "lhrrepmgr64362" (ID: 2) promoted to primary, node "lhrrepmgr64361" (ID: 1) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "lhrrepmgr64362" is now primary and node "lhrrepmgr64361" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[pg13@lhrrepmgr64362 ~]$ 
[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | standby |   running | lhrrepmgr64362 | default  | 100      | 3        | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | primary | * running |                | default  | 100      | 4        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | standby |   running | lhrrepmgr64362 | default  | 100      | 3        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64362 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

C:\Users\lhrxxt>psql -U repmgr -h192.168.66.35 -p64361 -d repmgr
Password for user repmgr:
psql (13.2)
Type "help" for help.

repmgr=# select * from nodes;
 node_id | upstream_node_id | active |   node_name    |  type   | location | priority |                                 conninfo                                  | repluser | slot_name |      config_file
---------+------------------+--------+----------------+---------+----------+----------+---------------------------------------------------------------------------+----------+-----------+------------------------
       2 |                  | t      | lhrrepmgr64362 | primary | default  |      100 | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr   |           | /pg13/pg13/repmgr.conf
       1 |                2 | t      | lhrrepmgr64361 | standby | default  |      100 | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr   |           | /pg13/pg13/repmgr.conf
       3 |                2 | t      | lhrrepmgr64363 | standby | default  |      100 | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr   |           | /pg13/pg13/repmgr.conf
       4 |                2 | t      | lhrrepmgr64364 | witness | default  |        0 | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2 | repmgr   |           | /pg13/pg13/repmgr.conf
(4 rows)

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

[pg13@lhrrepmgr64362 ~]$

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind

NOTICE: checking switchover on node "lhrrepmgr64362" (ID: 2) in --dry-run mode

INFO: prerequisites for using pg_rewind are met

INFO: SSH connection to host "172.72.6.61" succeeded

INFO: able to execute "repmgr" on remote host "172.72.6.61"

INFO: all sibling nodes are reachable via SSH

INFO: 3 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: local node "lhrrepmgr64362" (ID: 2) would be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) would be demoted to standby

INFO: following shutdown command would be run on node "lhrrepmgr64361":

"/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

[pg13@lhrrepmgr64362 ~]$

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind

NOTICE: executing switchover on node "lhrrepmgr64362" (ID: 2)

NOTICE: local node "lhrrepmgr64362" (ID: 2) will be promoted to primary; current primary "lhrrepmgr64361" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "lhrrepmgr64361" (ID: 1)

NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1)

DETAIL: executing server command "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/9000028

NOTICE: promoting standby to primary

DETAIL: promoting server "lhrrepmgr64362" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "lhrrepmgr64362" (ID: 2) was successfully promoted to primary

NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2)

ERROR: unable to execute CHECKPOINT

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0

NOTICE: setting node 1 upstream to node 2

WARNING: unable to ping "host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/pg13/pg13/bin/pg_ctl -w -D '/pg13/pgdata' start"

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

NOTICE: node "lhrrepmgr64362" (ID: 2) promoted to primary, node "lhrrepmgr64361" (ID: 1) demoted to standby

NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "lhrrepmgr64362" is now primary and node "lhrrepmgr64361" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

[pg13@lhrrepmgr64362 ~]$

[pg13@lhrrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

C:\Users\lhrxxt>psql -U repmgr -h192.168.66.35 -p64361 -d repmgr

Password for user repmgr:

psql (13.2)

Type "help" for help.

repmgr=# select * from nodes;

---------+------------------+--------+----------------+---------+----------+----------+---------------------------------------------------------------------------+----------+-----------+------------------------

(4 rows)

可以看到，主库变为lhrrepmgr64362，而lhrrepmgr64361变为了从库。

6.1.2、把新备库61切换为主库

-- 在61上做操作
repmgr -f /pg13/pg13/repmgr.conf cluster show
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind

-- 可以debug打印详细的切换过程
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose

-- 在61上做操作

repmgr -f /pg13/pg13/repmgr.conf cluster show

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind

-- 可以debug打印详细的切换过程

repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose

执行过程：

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | standby |   running | lhrrepmgr64362 | default  | 100      | 4        | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | primary | * running |                | default  | 100      | 4        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | standby |   running | lhrrepmgr64362 | default  | 100      | 4        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64362 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2
[pg13@lhrrepmgr64361 ~]$ 
[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "lhrrepmgr64361" (ID: 1) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "172.72.6.62" succeeded
INFO: able to execute "repmgr" on remote host "172.72.6.62"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "lhrrepmgr64361" (ID: 1) would be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "lhrrepmgr64362":
  "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[pg13@lhrrepmgr64361 ~]$ 
[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "lhrrepmgr64361" (ID: 1)
NOTICE: local node "lhrrepmgr64361" (ID: 1) will be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "lhrrepmgr64362" (ID: 2)
NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2) 
DETAIL: executing server command "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/A000028
NOTICE: promoting standby to primary
DETAIL: promoting server "lhrrepmgr64361" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "lhrrepmgr64361" (ID: 1) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1) 
ERROR: unable to execute CHECKPOINT
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 0/A000028; rejoin target node's fork point: 0/A0000A0
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/pg13/pg13/bin/pg_ctl  -w -D '/pg13/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
NOTICE: node  "lhrrepmgr64361" (ID: 1) promoted to primary, node "lhrrepmgr64362" (ID: 2) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "lhrrepmgr64361" is now primary and node "lhrrepmgr64362" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | primary | * running |                | default  | 100      | 5        | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | standby |   running | lhrrepmgr64361 | default  | 100      | 4        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | standby |   running | lhrrepmgr64361 | default  | 100      | 4        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64361 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

[pg13@lhrrepmgr64361 ~]$

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind

NOTICE: checking switchover on node "lhrrepmgr64361" (ID: 1) in --dry-run mode

INFO: prerequisites for using pg_rewind are met

INFO: SSH connection to host "172.72.6.62" succeeded

INFO: able to execute "repmgr" on remote host "172.72.6.62"

INFO: all sibling nodes are reachable via SSH

INFO: 3 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: local node "lhrrepmgr64361" (ID: 1) would be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) would be demoted to standby

INFO: following shutdown command would be run on node "lhrrepmgr64362":

"/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

[pg13@lhrrepmgr64361 ~]$

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind

NOTICE: executing switchover on node "lhrrepmgr64361" (ID: 1)

NOTICE: local node "lhrrepmgr64361" (ID: 1) will be promoted to primary; current primary "lhrrepmgr64362" (ID: 2) will be demoted to standby

NOTICE: stopping current primary node "lhrrepmgr64362" (ID: 2)

NOTICE: issuing CHECKPOINT on node "lhrrepmgr64362" (ID: 2)

DETAIL: executing server command "/pg13/pg13/bin/pg_ctl -D '/pg13/pgdata' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/A000028

NOTICE: promoting standby to primary

DETAIL: promoting server "lhrrepmgr64361" (ID: 1) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "lhrrepmgr64361" (ID: 1) was successfully promoted to primary

NOTICE: issuing CHECKPOINT on node "lhrrepmgr64361" (ID: 1)

ERROR: unable to execute CHECKPOINT

INFO: local node 2 can attach to rejoin target node 1

DETAIL: local node's recovery point: 0/A000028; rejoin target node's fork point: 0/A0000A0

NOTICE: setting node 2's upstream to node 1

WARNING: unable to ping "host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/pg13/pg13/bin/pg_ctl -w -D '/pg13/pgdata' start"

NOTICE: NODE REJOIN successful

DETAIL: node 2 is now attached to node 1

NOTICE: node "lhrrepmgr64361" (ID: 1) promoted to primary, node "lhrrepmgr64362" (ID: 2) demoted to standby

NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "lhrrepmgr64361" is now primary and node "lhrrepmgr64362" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

6.2、failover切换

6.2.1、failover异常手工主从切换

主库出现故障，然后直接将从库提升为主库：

-- 主库61宕机
pg_ctl -m fast stop

-- 从库63提升为主库
repmgr -f /pg13/pg13/repmgr.conf cluster show
repmgr -f /pg13/pg13/repmgr.conf --siblings-follow standby promote
repmgr -f /pg13/pg13/repmgr.conf cluster show

-- 等原主库61修复之后，由于新提升的主库63数据已经很新了，所以只能把61作为备库来运行
pg_ctl stop
repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

-- 主库61宕机

pg_ctl -m fast stop

-- 从库63提升为主库

repmgr -f /pg13/pg13/repmgr.conf cluster show

repmgr -f /pg13/pg13/repmgr.conf --siblings-follow standby promote

repmgr -f /pg13/pg13/repmgr.conf cluster show

-- 等原主库61修复之后，由于新提升的主库63数据已经很新了，所以只能把61作为备库来运行

pg_ctl stop

repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose

repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

执行过程：

-- 主库61宕机
[pg13@lhrrepmgr64361 ~]$ pg_ctl -m fast stop
waiting for server to shut down.... done
server stopped
[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refused
        Is the server running on host "172.72.6.61" and accepting
        TCP/IP connections on port 5432?

DETAIL: attempted to connect using:
  user=repmgr password=lhr connect_timeout=2 dbname=repmgr host=172.72.6.61 fallback_application_name=repmgr options=-csearch_path=

-- 从库63提升为主库  
[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status        | Upstream         | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+---------------+------------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | primary | ? unreachable | ?                | default  | 100      |          | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | standby |   running     | ? lhrrepmgr64361 | default  | 100      | 5        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | standby |   running     | ? lhrrepmgr64361 | default  | 100      | 5        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running     | ? lhrrepmgr64361 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "lhrrepmgr64361" (ID: 1)
  - node "lhrrepmgr64361" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "lhrrepmgr64362" (ID: 2)'s upstream node "lhrrepmgr64361" (ID: 1)
  - unable to determine if node "lhrrepmgr64362" (ID: 2) is attached to its upstream node "lhrrepmgr64361" (ID: 1)
  - unable to connect to node "lhrrepmgr64363" (ID: 3)'s upstream node "lhrrepmgr64361" (ID: 1)
  - unable to determine if node "lhrrepmgr64363" (ID: 3) is attached to its upstream node "lhrrepmgr64361" (ID: 1)
  - unable to connect to node "lhrrepmgr64364" (ID: 4)'s upstream node "lhrrepmgr64361" (ID: 1)'

HINT: execute with --verbose option to see connection error messages
[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf --siblings-follow standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "lhrrepmgr64363" (ID: 3) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "lhrrepmgr64363" (ID: 3) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show                     
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | primary | - failed  | ?              | default  | 100      |          | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | standby |   running | lhrrepmgr64363 | default  | 100      | 5        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | primary | * running |                | default  | 100      | 6        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64363 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "lhrrepmgr64361" (ID: 1)

HINT: execute with --verbose option to see connection error messages

-- 主库61宕机

[pg13@lhrrepmgr64361 ~]$ pg_ctl -m fast stop

waiting for server to shut down.... done

server stopped

[pg13@lhrrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

ERROR: connection to database failed

DETAIL:

could not connect to server: Connection refused

Is the server running on host "172.72.6.61" and accepting

TCP/IP connections on port 5432?

DETAIL: attempted to connect using:

user=repmgr password=lhr connect_timeout=2 dbname=repmgr host=172.72.6.61 fallback_application_name=repmgr options=-csearch_path=

-- 从库63提升为主库

[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+---------------+------------------+----------+----------+----------+---------------------------------------------------------------------------

WARNING: following issues were detected

- unable to connect to node "lhrrepmgr64361" (ID: 1)

- node "lhrrepmgr64361" (ID: 1) is registered as an active primary but is unreachable

- unable to connect to node "lhrrepmgr64362" (ID: 2)'s upstream node "lhrrepmgr64361" (ID: 1)

- unable to determine if node "lhrrepmgr64362" (ID: 2) is attached to its upstream node "lhrrepmgr64361" (ID: 1)

- unable to connect to node "lhrrepmgr64363" (ID: 3)'s upstream node "lhrrepmgr64361" (ID: 1)

- unable to determine if node "lhrrepmgr64363" (ID: 3) is attached to its upstream node "lhrrepmgr64361" (ID: 1)

- unable to connect to node "lhrrepmgr64364" (ID: 4)'s upstream node "lhrrepmgr64361" (ID: 1)'

HINT: execute with --verbose option to see connection error messages

[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf --siblings-follow standby promote

NOTICE: promoting standby to primary

DETAIL: promoting server "lhrrepmgr64363" (ID: 3) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "lhrrepmgr64363" (ID: 3) was successfully promoted to primary

NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

[pg13@lhrrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

WARNING: following issues were detected

- unable to connect to node "lhrrepmgr64361" (ID: 1)

HINT: execute with --verbose option to see connection error messages

修复好61后，再启动61数据库，作为63的备库：

[pg13@lhrrepmgr64361 pg13]$  repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind  --verbose 
NOTICE: using provided configuration file "/pg13/pg13/repmgr.conf"
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server  timeline 6 forked off current database system timeline 5 before current recovery point 1/1C000028
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-lhrrepmgr64361"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/pg13/pg13/bin/pg_rewind -D '/pg13/pgdata' --source-server='host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2'"
ERROR: pg_rewind execution failed
DETAIL: pg_rewind: servers diverged at WAL location 1/190000A0 on timeline 5
pg_rewind: rewinding from last common checkpoint at 1/19000028 on timeline 5
pg_rewind: error: could not open file "/pg13/pgdata/pg_wal/00000005000000010000001A": No such file or directory
pg_rewind: fatal: could not read WAL record at 1/1A000000

[pg13@lhrrepmgr64361 pg13]$ cp ./archive/00000005000000010000001A /pg13/pgdata/pg_wal/00000005000000010000001A
[pg13@lhrrepmgr64361 pg13]$ repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind  --verbose 
NOTICE: using provided configuration file "/pg13/pg13/repmgr.conf"
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server timeline 6 forked off current database system timeline 5 before current recovery point 1/1C000028
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-lhrrepmgr64361"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/pg13/pg13/bin/pg_rewind -D '/pg13/pgdata' --source-server='host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /pg13/pgdata
INFO: directory "/tmp/repmgr-config-archive-lhrrepmgr64361" deleted
NOTICE: setting node 1's upstream to node 3'
WARNING: unable to ping "host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/pg13/pg13/bin/pg_ctl  -w -D '/pg13/pgdata' start"
INFO: node "lhrrepmgr64361" (ID: 1) is pingable
INFO: node "lhrrepmgr64361" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 3
[pg13@lhrrepmgr64361 pg13]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | lhrrepmgr64361 | standby |   running | lhrrepmgr64363 | default  | 100      | 5        | host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 2  | lhrrepmgr64362 | standby |   running | lhrrepmgr64363 | default  | 100      | 6        | host=172.72.6.62 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 3  | lhrrepmgr64363 | primary | * running |                | default  | 100      | 6        | host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2
 4  | lhrrepmgr64364 | witness | * running | lhrrepmgr64363 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=lhr dbname=repmgr connect_timeout=2

[pg13@lhrrepmgr64361 pg13]$ repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

NOTICE: using provided configuration file "/pg13/pg13/repmgr.conf"

NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3

DETAIL: rejoin target server timeline 6 forked off current database system timeline 5 before current recovery point 1/1C000028

INFO: prerequisites for using pg_rewind are met

INFO: 0 files copied to "/tmp/repmgr-config-archive-lhrrepmgr64361"

NOTICE: executing pg_rewind

DETAIL: pg_rewind command is "/pg13/pg13/bin/pg_rewind -D '/pg13/pgdata' --source-server='host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2'"

ERROR: pg_rewind execution failed

DETAIL: pg_rewind: servers diverged at WAL location 1/190000A0 on timeline 5

pg_rewind: rewinding from last common checkpoint at 1/19000028 on timeline 5

pg_rewind: error: could not open file "/pg13/pgdata/pg_wal/00000005000000010000001A": No such file or directory

pg_rewind: fatal: could not read WAL record at 1/1A000000

[pg13@lhrrepmgr64361 pg13]$ cp ./archive/00000005000000010000001A /pg13/pgdata/pg_wal/00000005000000010000001A

[pg13@lhrrepmgr64361 pg13]$ repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

NOTICE: using provided configuration file "/pg13/pg13/repmgr.conf"

NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3

DETAIL: rejoin target server timeline 6 forked off current database system timeline 5 before current recovery point 1/1C000028

INFO: prerequisites for using pg_rewind are met

INFO: 0 files copied to "/tmp/repmgr-config-archive-lhrrepmgr64361"

NOTICE: executing pg_rewind

DETAIL: pg_rewind command is "/pg13/pg13/bin/pg_rewind -D '/pg13/pgdata' --source-server='host=172.72.6.63 user=repmgr password=lhr dbname=repmgr connect_timeout=2'"

NOTICE: 0 files copied to /pg13/pgdata

INFO: directory "/tmp/repmgr-config-archive-lhrrepmgr64361" deleted

NOTICE: setting node 1's upstream to node 3'

WARNING: unable to ping "host=172.72.6.61 user=repmgr password=lhr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/pg13/pg13/bin/pg_ctl -w -D '/pg13/pgdata' start"

INFO: node "lhrrepmgr64361" (ID: 1) is pingable

INFO: node "lhrrepmgr64361" (ID: 1) has attached to its upstream node

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 3

[pg13@lhrrepmgr64361 pg13]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------

6.2.2、failover异常自动主从切换

生产建议配置自动failover切换，要配置自动failover，则需要做以下事情，在所有节点均配置：

1、在所有节点都配置：echo "shared_preload_libraries='repmgr' " >>/pg13/pgdata/postgresq.conf，配置后重启PG。该步骤我之前已配置过。

2、在所有节点都配置文件/pg13/pg13/repmgr.conf，增加内容包括：

cat >> /pg13/pg13/repmgr.conf << "EOF"

monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /pg13/pg13/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /pg13/pg13/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file='/pg13/pg13/repmgr.log'
EOF

cat >> /etc/logrotate.conf <<"EOF"
/pg13/pg13/repmgr.log {
missingok
compress
rotate 30
daily
dateext
create 0600 pg13 pg13
}
EOF

cat >> /pg13/pg13/repmgr.conf << "EOF"

monitoring_history=yes

monitor_interval_secs=5

failover=automatic

reconnect_attempts=6

reconnect_interval=5

promote_command='repmgr standby promote -f /pg13/pg13/repmgr.conf --log-to-file'

follow_command='repmgr standby follow -f /pg13/pg13/repmgr.conf --log-to-file --upstream-node-id=%n'

log_level=INFO

log_status_interval=10

log_file='/pg13/pg13/repmgr.log'

EOF

cat >> /etc/logrotate.conf <<"EOF"

/pg13/pg13/repmgr.log {

missingok

compress

rotate 30

daily

dateext

create 0600 pg13 pg13

}

EOF

3、在所有节点都启动repmgrd进程

-- 启动
repmgrd -f /pg13/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid  --daemonize

-- 建议加到开机自动启动:/etc/rc.local
echo "repmgrd -f /pg13/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize" >> /etc/rc.local
chmod +x /etc/rc.d/rc.local

-- 停止
kill -9 `cat /tmp/repmgrd.pid`

-- 启动

repmgrd -f /pg13/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize

-- 建议加到开机自动启动:/etc/rc.local

echo "repmgrd -f /pg13/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize" >> /etc/rc.local

chmod +x /etc/rc.d/rc.local

-- 停止

kill -9 `cat /tmp/repmgrd.pid`

日志：

本人提供Oracle(OCP、OCM)、MySQL(OCP)、PostgreSQL(PGCA、PGCE、PGCM)等数据库的培训和考证业务，私聊QQ646634621或微信dbaup66，谢谢！

后续精彩内容已被站长无情隐藏，请输入验证码解锁本文！

获取验证码：请先关注本站微信公众号，然后回复“验证码”，获取验证码。在微信里搜索“AiDBA”或者“dbaup6”或者微信扫描右侧二维码都可以关注本站微信公众号。

打赏赞(14)

标签： PG 高可用 Pgpool-II pgpoolAdmin 负载均衡读写分离 repmgr 主从流复制主从切换 witness

小麦苗

学习或考证，均可联系麦老师，请加微信db_bao或QQ646634621

发表回复取消回复

要发表评论，您必须先登录。

1、声明：本网站有部分文章整合或转载自网络，文章著作权归属原作者或原团队；若有侵权，深表歉意，请联系本站站长进行处理，谢谢！！！
2、本站提供数据库的培训和考证业务，培训包含但不限于Oracle、MySQL、PostgreSQL、SQL Server、国产数据库等，考证包括但不限于Oracle OCP、Oracle OCM、MySQL OCP、PGCA、PGCE、PGCM等，私聊QQ646634621或微信dbaup66，非诚勿扰，谢谢！！！
3、麦老师新建了《西安DBA数据架构师》微信群，有志同道合的朋友可以私聊我入群，我们一起聊技术，聊生活，空了可以喝酒约饭。
4、为了方便大家更有针对性的讨论学习数据库，故新建数据库分类群，包括Oracle群、MySQL群、SQL Server群、PG&GP群、国产&信创群。志同道合的朋友可以入群聊技术聊人生，捣乱的勿进，喷子勿进，极端人士勿进，谢谢。

合 PostgreSQL高可用之repmgr（1主2从+1witness）+Pgpool-II实现主从切换+读写分离+负载均衡

五、检查repmgr集群

六、主从切换

6.1、switchover正常主从切换

6.1.1、把备库62变为主库

6.1.2、把新备库61切换为主库

6.2、failover切换

6.2.1、failover异常手工主从切换

6.2.2、failover异常自动主从切换

相关文章

您可能还喜欢...

发表回复取消回复

网站公告

网站寄语

本站其它工具

搜索本网站

标签云☁

网站日历

网站归档

网站分类

2024 年 9 月
一	二	三	四	五	六	日
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

合 PostgreSQL高可用之repmgr（1主2从+1witness）+Pgpool-II实现主从切换+读写分离+负载均衡

五、检查repmgr集群

六、主从切换

6.1、switchover正常主从切换

6.1.1、把备库62变为主库

6.1.2、把新备库61切换为主库

6.2、failover切换

6.2.1、failover异常手工主从切换

6.2.2、failover异常自动主从切换

相关文章

您可能还喜欢...

分别在MySQL5.7和8.0中测试主从复制中主库表缺失主键会导致主从延迟的情况

Greenplum数据库高可用性概述

PG不能启动，报错PANIC replication checkpoint has wrong magic 1714437491 instead of 307747550

发表回复 取消回复

网站公告

网站寄语

本站其它工具

搜索本网站

标签云☁

网站日历

网站归档

网站分类

发表回复取消回复