使用Patroni构建PostgreSQL HA集群实践

关山初度尘未洗,策马扬鞭再奋蹄!这篇文章主要讲述使用Patroni构建PostgreSQL HA集群实践相关的知识,希望能为你提供帮助。
使用Patroni构建PostgreSQL HA集群实践 前言?PostgreSQL是一款功能,性能,可靠性都可以和高端的国外商业数据库相媲美的开源数据库。而且PostgreSQL的许可和生态完全开放,不被任何一个单一的公司或国家所操控,保证了使用者没有后顾之忧。国内越来越多的企业开始用PostgreSQL代替原来昂贵的国外商业数据库。
?在部署PostgreSQL到生产环境中时,选择适合的高可用方案是一项必不可少的工作。本文介绍基于Patroni的PostgreSQL高可用的部署方法,供大家参考。
PostgreSQL的开源HA工具有很多种,下面几种算是比较常用的

  • PAF(PostgreSQL Automatic Failomianver)
  • repmgr
  • Patroni
它们的比较可以参考: https://scalegrid.io/blog/managing-high-availability-in-postgresql-part-1/
其中Patroni不仅简单易用而且功能非常强大。
  • 支持自动failover和按需switchover
    可以定时执行switchover
  • 支持一个和多个备节点
  • 支持级联复制
  • 支持同步复制,异步复制
  • 支持同步复制下备库故障时自动降级为异步复制(功效类似于mysql的半同步,但是更加智能)
  • 支持控制指定节点是否参与选主,是否参与负载均衡以及是否可以成为同步备机
  • 【使用Patroni构建PostgreSQL HA集群实践】支持通过pg_rewind自动修复旧主
  • 支持多种方式初始化集群和重建备机,包括pg_basebackup和支持wal_e,pgBackRest,barman等备份工具的自定义脚本
  • 支持自定义外部callback脚本
  • 支持REST API
  • 支持通过watchdog防止脑裂
  • 支持k8s,docker等容器化环境部署
  • 支持多种常见DCS(Distributed Configuration Store)存储元数据,包括etcd,ZooKeeper,Consul,Kubernetes
因此,Patroni是一款非常值得推荐的PostgreSQL高可用工具。下面将详细介绍基于Patroni搭建PostgreSQL高可用环境的步骤。
使用Patroni构建PostgreSQL HA集群实践

文章图片

环境节点说明
主机名 IP地址 角色 操作系统版本
HQ-SIT-kafka013 10.1.122.55 postgresql、patroni、ectd CentOS Linux release 7.5.1804
HQ-SIT-kafka024 10.1.122.56 postgresql、patroni、ectd CentOS Linux release 7.5.1804
HQ-SIT-kafka035 10.1.122.57 postgresql、patroni、ectd CentOS Linux release 7.5.1804
软件及下载地址
  • postgresql12-server-12.5-1PGDG.rhel7.x86_64.rpm、postgresql12-libs-12.5-1PGDG.rhel7.x86_64.rpm、postgresql12-12.5-1PGDG.rhel7.x86_64.rpm、postgresql12-contrib-12.5-1PGDG.rhel7.x86_64.rpm、python3-psycopg2-2.8.5-2.rhel7.x86_64.rpm
    https://mirrors.cloud.tencent.com/postgresql/repos/yum/12/redhat/rhel-7.5-x86_64/
  • patroni-1.6.5-1.rhel7.x86_64.rpm
    https://github.com/cybertec-postgresql/patroni-packaging/releases/download/1.6.5-1/patroni-1.6.5-1.rhel7.x86_64.rpm
  • etcd 3.3.11-2
    centos 系统自带extras repo里面有,直接yum install etcd即可。
VIP
10.1.122.222
操作系统参数设置 配置OS内核参数
编辑setup_sysctl.sh
touch /root/setup_sysctl.sh chmod +x /root/setup_sysctl.sh vi /root/setup_sysctl.sh

添加如下内存
#!/bin/bash MEMORY_KB=`grep MemTotal /proc/meminfo |awk \'{print $2}\'` MEMORY_BYTE=$((MEMORY_KB*1024)) PAGE_SIZE=`getconf PAGE_SIZE` #kernel.shmall设置为内存的80%,单位PAGE SHMALL=` expr $MEMORY_BYTE \\* 4 / 5 / $PAGE_SIZE` #kernel.shmmax 设置单个共享内存段大小,建议内存的一半,单位byte SHMMAX=` expr $MEMORY_BYTE / 2` sed -i \'/#---\\[begin postgresql\\]---#/,/#---\\[end postgresql\\]---#/d\' /etc/sysctl.conf cat > > /etc/sysctl.conf < < EOF #---[begin postgresql]---# fs.aio-max-nr = 1048576 fs.file-max = 76724600 kernel.sem = 4096 2147483647 2147483646 128 kernel.shmall = $SHMALL kernel.shmmax = $SHMMAX kernel.shmmni = 4096 net.core.netdev_max_backlog = 10000 net.core.rmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_default = 262144 net.core.wmem_max = 4194304 net.core.somaxconn = 4096 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_keepalive_intvl = 20 #在tcp_keepalive_time之后,没有接收到对方确认,继续发送保活探测包次数,默认值为9(次) net.ipv4.tcp_keepalive_probes = 3 #KeepAlive的空闲时长,或者说每次正常发送心跳的周期,默认值为7200s(2小时) net.ipv4.tcp_keepalive_time = 60 net.ipv4.tcp_mem = 8388608 12582912 16777216 net.ipv4.tcp_fin_timeout = 5 net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_timestamps = 1 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_max_tw_buckets = 262144 net.ipv4.tcp_rmem = 8192 87380 16777216 net.ipv4.tcp_wmem = 8192 65536 16777216 vm.dirty_background_bytes = 409600000 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 95 vm.dirty_writeback_centisecs = 100 vm.mmap_min_addr = 65536 vm.overcommit_memory = 0 vm.overcommit_ratio = 90 vm.swappiness = 0 vm.zone_reclaim_mode = 0 net.ipv4.ip_local_port_range = 40000 65535 fs.nr_open=20480000 net.ipv4.tcp_max_syn_backlog = 16384 net.core.somaxconn = 16384 #---[end postgresql]---# EOF sysctl -p

执行脚本
/root/setup_sysctl.conf

配置OS资源限制
#!/bin/bash # nofile超过1048576的话,一定要先将sysctl的fs.nr_open设置为更大的值,并生效后才能继续设置nofile. sed -i \'/#\\[begin postgresql\\]/,/#\\[end postgresql\\]/d\' /etc/security/limits.conf cat > > /etc/security/limits.conf < < EOF #[begin postgresql] * softnofile1024000 * hardnofile1024000 * softnprocunlimited * hardnprocunlimited * softcoreunlimited * hardcoreunlimited * softmemlock unlimited * hardmemlock unlimited #[end postgresql] EOF

Selinux
如果没有这方面的需求,建议禁用
# vi /etc/sysconfig/selinux SELINUX=disabled SELINUXTYPE=targeted

Iptables
/sbin/chkconfig iptables off service iptables stop

firewalld
# systemctl stop firewalld.service # systemctl disable firewalld.service

部署文件系统
注意SSD对齐,延长寿命,避免写放大。
parted -s /dev/sda mklabel gpt parted -s /dev/sda mkpart primary 1MiB 100%

格式化(如果你选择ext4的话)
mkfs.ext4 /dev/sda1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L pgdata

建议使用的ext4 mount选项
# vi /etc/fstabLABEL=pgdata /pgdataext4defaults,noatime,nodiratime,nodelalloc,barrier=0,data=https://www.songbingjia.com/android/writeback0 0# mkdir /pgdata # mount -a

为什么需要data=https://www.songbingjia.com/android/writeback?
![writeback](使用Patroni构建PostgreSQL HA集群实践.assets/writeback.png)
设置磁盘调度
如果不是SSD的话,还是使用CFQ,否则建议使用DEADLINE。
临时设置(比如sda盘)
echo deadline > /sys/block/sda/queue/scheduler

注意,如果既有机械盘,又有SSD,那么可以使用/etc/rc.local,对指定磁盘修改为对应的调度策略。
关闭透明大页
# 及时生效 echo \'never\' > /sys/kernel/mm/transparent_hugepage/defrag echo \'never\' > /sys/kernel/mm/transparent_hugepage/enabled # 重启后自动生效 echo "echo \'never\' > /sys/kernel/mm/transparent_hugepage/defrag" > > /etc/rc.local echo "echo \'never\' > /sys/kernel/mm/transparent_hugepage/enabled" > > /etc/rc.local

关闭Numa
vim /etc/default/grub #GRUB_CMDLINE_LINUX最后添加numa=off GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=RootVG/lv_root rd.lvm.lv=RootVG/lv_swap net.ifnames=0 biosdevname=0 rhgb quiet numa=off" #重启服务器

安装postgresql 安装软件
yum install postgresql12-libs-12.5-1PGDG.rhel7.x86_64.rpm -y yum install postgresql12-12.5-1PGDG.rhel7.x86_64.rpm -y yum install postgresql12-server-12.5-1PGDG.rhel7.x86_64.rpm -y yum install postgresql12-contrib-12.5-1PGDG.rhel7.x86_64.rpm -y

postgres用户添加sudo权限
vi /etc/sudoers #添加下面内容 postgresALL=(ALL)NOPASSWD: ALL

安装配置etcd 安装软件
yum install etcd

修改配置文件
  • node1
vi /etc/etcd/ectd.conf ETCD_NAME="node1" ETCD_DATA_DIR="/var/lib/etcd/data.etcd" ETCD_LISTEN_PEER_URLS="http://10.1.122.55:2380" ETCD_LISTEN_CLIENT_URLS="http://10.1.122.55:2379,http://127.0.0.1:2379" ETCD_ADVERTISE_CLIENT_URLS="http://10.1.122.55:2379,http://127.0.0.1:2379" ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.1.122.55:2380" ETCD_INITIAL_CLUSTER="node1=http://10.1.122.55:2380,node2=http://10.1.122.56:2380,node3=http://10.1.122.57:2380" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_ENABLE_V2="true"

  • node2
vi /etc/etcd/ectd.conf ETCD_NAME="node2" ETCD_DATA_DIR="/var/lib/etcd/data.etcd" ETCD_LISTEN_PEER_URLS="http://10.1.122.56:2380" ETCD_LISTEN_CLIENT_URLS="http://10.1.122.56:2379,http://127.0.0.1:2379" ETCD_ADVERTISE_CLIENT_URLS="http://10.1.122.56:2379,http://127.0.0.1:2379" ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.1.122.56:2380" ETCD_INITIAL_CLUSTER="node1=http://10.1.122.55:2380,node2=http://10.1.122.56:2380,node3=http://10.1.122.57:2380" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_ENABLE_V2="true"

  • node3
ETCD_NAME="node3" ETCD_DATA_DIR="/var/lib/etcd/data.etcd" ETCD_LISTEN_PEER_URLS="http://10.1.122.57:2380" ETCD_LISTEN_CLIENT_URLS="http://10.1.122.57:2379,http://127.0.0.1:2379" ETCD_ADVERTISE_CLIENT_URLS="http://10.1.122.57:2379,http://127.0.0.1:2379" ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.1.122.57:2380" ETCD_INITIAL_CLUSTER="node1=http://10.1.122.55:2380,node2=http://10.1.122.56:2380,node3=http://10.1.122.57:2380" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_ENABLE_V2="true"

启动ectd集群
systemctl daemon-reload systemctl enable etcd systemctl start etcd systemctl restart etcd

检查集群状态
选择任意节点执行如下命令
# etcdctl member list 168bd0d9329d6f57: name=node1 peerURLs=http://10.1.122.55:2380 clientURLs=http://10.1.122.55:2379,http://127.0.0.1:2379 isLeader=true 38381e9e86077014: name=node3 peerURLs=http://10.1.122.57:2380 clientURLs=http://10.1.122.57:2379,http://127.0.0.1:2379 isLeader=false f3c394c890a1a8c9: name=node2 peerURLs=http://10.1.122.56:2380 clientURLs=http://10.1.122.56:2379,http://127.0.0.1:2379 isLeader=false # etcdctl cluster-health member 168bd0d9329d6f57 is healthy: got healthy result from http://10.1.122.55:2379 member 38381e9e86077014 is healthy: got healthy result from http://10.1.122.57:2379 member f3c394c890a1a8c9 is healthy: got healthy result from http://10.1.122.56:2379 cluster is healthy

安装Watchdog适用场景
  • 主节点 patroni 进程被 kill
  • 主节点 patroni 因内存资源超出而照成的崩溃或者是高负载系统下 patroni 被卡死这样的 单点故障
  • 网络故障
处理方式
当遇到上述情景时,watch 会触发主节点系统重启,启动后用开机服务来自动开启 etcd和 patroni,postgres。从库在主库 down 掉后提升为主,原主在重启完毕恢复后降级为备机
安装软件
yum install watchdog -y

开启并授权
modprobe softdog chown postgres /dev/watchdog # 临时生效

回调脚本patroni可以通过回调脚本实现VIP自动漂移
vi /opt/app/patroni/bin/loadvip.sh

添加如下内容,根据环境修改前三行
#!/bin/bashVIP=10.1.122.222# VIP地址 GATEWAY=10.1.122.1 #网关,使用ip route命令查看 DEV=eth0#网卡设备action=$1 role=$2 cluster=$3log() { echo "loadvip: $*"|logger }load_vip() { ip a|grep -w ${DEV}|grep -w ${VIP} > /dev/null if [ $? -eq 0 ] ; then log "vip exists, skip load vip" else sudo ip addr add ${VIP}/32 dev ${DEV} > /dev/null rc=$? if [ $rc -ne 0 ] ; then log "fail to add vip ${VIP} at dev ${DEV} rc=$rc" exit 1 filog "added vip ${VIP} at dev ${DEV}"arping -U -I ${DEV} -s ${VIP} ${GATEWAY} -c 5 > /dev/null rc=$? if [ $rc -ne 0 ] ; then log "fail to call arping to gateway ${GATEWAY} rc=$rc" exit 1 filog "called arping to gateway ${GATEWAY}" fi }unload_vip() { ip a|grep -w ${DEV}|grep -w ${VIP} > /dev/null if [ $? -eq 0 ] ; then sudo ip addr del ${VIP}/32 dev ${DEV} > /dev/null rc=$? if [ $rc -ne 0 ] ; then log "fail to delete vip ${VIP} at dev ${DEV} rc=$rc" exit 1 filog "deleted vip ${VIP} at dev ${DEV}" else log "vip not exists, skip delete vip" fi }log "loadvip start args:\'$*\'"case $action in on_start|on_restart|on_role_change) case $role in master) load_vip ; ; replica) unload_vip ; ; *) log "wrong role \'$role\'" exit 1 ; ; esac ; ; on_stop) unload_vip ; ; *) log "wrong action \'$action\'" exit 1 ; ; esac

#赋予可执行权限 chmod +x /opt/app/patroni/bin/loadvip.sh

安装配置patroni 安装软件
yum install python3 -y yum install python3-psycopg2-2.8.5-2.rhel7.x86_64.rpm -y yum install patroni-1.6.5-1.rhel7.x86_64.rpm -y

配置环境变量
echo \'export PATRONICTL_CONFIG_FILE=/opt/app/patroni/etc/postgresql.yml\' > /etc/profile.d/patroni.sh source /etc/profile.d/patroni.sh

修改patroni启动服务
watchdog随patroni服务启动自动生效
vim /usr/lib/systemd/system/patroni.service #在[Service]节内添加如下两行 ExecStartPre=-/usr/bin/sudo /sbin/modprobe softdog ExecStartPre=-/usr/bin/sudo /bin/chown postgres /dev/watchdog

创建数据目录
mkdir -p /opt/postgres/12.5/data chown postgres.postgres -R /opt/postgres ln -s /opt/postgres/12.5/ /opt/postgres/pgsql

设置环境变量
su - postgres vi env_pg.sh export PGPORT=5432 export PGDATA=https://www.songbingjia.com/pgdata/postgresql12 export LANG=en_US.utf8 export PGHOME=/usr/pgsql-12 export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export PATH=$PATH:$PGHOME/bin export DATE=`date +"%Y%m%d%H%M"` export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=$PGDATA export PGUSER=postgres export PGDATABASE=postgres alias rm=\'rm -i\' alias ll=\'ls -lh\'vi .bash_profile source env_pg.sh

配置Master节点
配置postgresql.yml
# 集群名称 scope: pgcluster namespace: /service/ #节点名称 name: pgnode1 restapi: #填本节点IP地址 listen: 10.1.122.55:8008 connect_address: 10.1.122.55:8008 etcd: hosts: - 10.1.122.55:2379 - 10.1.122.56:2379 - 10.1.122.57:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 master_start_timeout: 300 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true recovery_conf: restore_command: cp ../wal_archive/%f %p parameters: wal_level: replica hot_standby: "on" max_connections: 2000 superuser_reserved_connections: 10 max_prepared_transactions: 2000 max_wal_senders: 10 max_replication_slots: 10 wal_log_hints: "on" track_commit_timestamp: "off" archive_mode: "on" #wal日志归档目录为共享存储目录,只有master节点会归档 archive_command: mkdir -p ../wal_archive & & test ! -f ../wal_archive/%f & & cp %p ../wal_archive/%f unix_socket_directories: \'.\' tcp_keepalives_idle: 60 tcp_keepalives_interval: 10 tcp_keepalives_count: 10 vacuum_cost_delay: 0 bgwriter_delay: 10ms bgwriter_lru_maxpages: 1000 bgwriter_lru_multiplier: 10.0 bgwriter_flush_after: 256 max_parallel_workers_per_gather: 0 old_snapshot_threshold: -1 backend_flush_after: 256 synchronous_commit: off full_page_writes: on wal_buffers: 512MB wal_writer_delay: 10ms wal_writer_flush_after: 0 checkpoint_timeout: 30min max_wal_size: 16GB min_wal_size: 2GB checkpoint_completion_target: 0.5 checkpoint_flush_after: 256 random_page_cost: 1.3 parallel_tuple_cost: 0 parallel_setup_cost: 0 effective_cache_size: 4GB force_parallel_mode: off #------------log---------------------# logging_collector: on log_destination: \'stderr\' log_truncate_on_rotation: on log_checkpoints: on log_connections: on log_disconnections: on log_error_verbosity: default log_lock_waits: on log_temp_files: 0 log_autovacuum_min_duration: 0 log_min_duration_statement: 250 log_timezone: \'PRC\' log_filename: postgresql-%Y-%m-%d_%H.log log_line_prefix: \'%t [%p]: db=%d,user=%u,app=%a,client=%h \' #-----------auto explain------------# auto_explain.log_analyze: true auto_explain.log_buffers: true auto_explain.log_min_duration: 1000 auto_explain.log_nested_statements: true auto_explain.log_verbose: true vacuum_defer_cleanup_age: 0 hot_standby_feedback: off max_standby_archive_delay: 300s max_standby_streaming_delay: 300s autovacuum: on autovacuum_max_workers: 4 autovacuum_naptime: 45s autovacuum_vacuum_scale_factor: 0.1 autovacuum_analyze_scale_factor: 0.1 autovacuum_freeze_max_age: 1600000000 autovacuum_multixact_freeze_max_age: 1600000000 vacuum_freeze_table_age: 1500000000 vacuum_multixact_freeze_table_age: 1500000000 datestyle: \'iso, mdy\' timezone: \'PRC\' lc_messages: \'C\' lc_monetary: \'C\' lc_numeric: \'C\' lc_time: \'C\' default_text_search_config: \'pg_catalog.english\' shared_preload_libraries: \'pg_stat_statements,auto_explain\' autovacuum_work_mem: -1 shared_buffers: 2GB maintenance_work_mem: 512MB dynamic_shared_memory_type: posix huge_pages: off shared_memory_type: mmap temp_buffers: 8MB track_activity_query_size: 1024 work_mem: 4MB initdb: - encoding: UTF8 - locale: C - lc-ctype: zh_CN.UTF-8 - data-checksums pg_hba: - host replication replicator 10.1.122.0/24 md5 - host all all 0.0.0.0/0 md5 postgresql: #监听IP地址及端口 listen: 0.0.0.0:5432 #本节点IP地址,及监听端口 connect_address: 10.1.122.55:5432 data_dir: /opt/postgres/pgsql/data bin_dir: /usr/pgsql-12/bin pgpass: /tmp/pgpass0 authentication: replication: username: replicator password: rep-pass superuser: username: postgres password: postgres rewind: username: rewind_user password: rewind_user parameters: null callbacks: on_start: /bin/bash /opt/app/patroni/bin/loadvip.sh on_restart: /bin/bash /opt/app/patroni/bin/loadvip.sh on_role_change: /bin/bash /opt/app/patroni/bin/loadvip.sh on_stop: /bin/bash /opt/app/patroni/bin/loadvip.sh watchdog: mode: automatic device: /dev/watchdog safety_margin: 5 tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

修改权限配置文件
chown postgres.postgres /opt/app/patroni/etc/postgresql.yml

启动patroni
systemctl enable patroni systemctl start patroni systemctl status patroni

使用Patroni构建PostgreSQL HA集群实践

文章图片
< img src=https://www.songbingjia.com/android/" 使用Patroni构建PostgreSQL HA集群实践.assets/image-20210111170305128.png" image-20210111170305128" style=" zoom:80%; " />
配置Slave节点
配置postgresql.yml
# 集群名称 scope: pgcluster namespace: /service/ #节点名称 name: pgnode2 restapi: #填本节点IP地址 listen: 10.1.122.56:8008 connect_address: 10.1.122.56:8008 etcd: hosts: - 10.1.122.55:2379 - 10.1.122.56:2379 - 10.1.122.57:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 master_start_timeout: 300 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true recovery_conf: restore_command: cp ../wal_archive/%f %p parameters: wal_level: replica hot_standby: "on" max_connections: 2000 superuser_reserved_connections: 10 max_prepared_transactions: 2000 max_wal_senders: 10 max_replication_slots: 10 wal_log_hints: "on" track_commit_timestamp: "off" archive_mode: "on" #wal日志归档目录为共享存储目录,只有master节点会归档 archive_command: mkdir -p ../wal_archive & & test ! -f ../wal_archive/%f & & cp %p ../wal_archive/%f unix_socket_directories: \'.\' tcp_keepalives_idle: 60 tcp_keepalives_interval: 10 tcp_keepalives_count: 10 vacuum_cost_delay: 0 bgwriter_delay: 10ms bgwriter_lru_maxpages: 1000 bgwriter_lru_multiplier: 10.0 bgwriter_flush_after: 256 max_parallel_workers_per_gather: 0 old_snapshot_threshold: -1 backend_flush_after: 256 synchronous_commit: off full_page_writes: on wal_buffers: 512MB wal_writer_delay: 10ms wal_writer_flush_after: 0 checkpoint_timeout: 30min max_wal_size: 16GB min_wal_size: 2GB checkpoint_completion_target: 0.5 checkpoint_flush_after: 256 random_page_cost: 1.3 parallel_tuple_cost: 0 parallel_setup_cost: 0 effective_cache_size: 4GB force_parallel_mode: off #------------log---------------------# logging_collector: on log_destination: \'stderr\' log_truncate_on_rotation: on log_checkpoints: on log_connections: on log_disconnections: on log_error_verbosity: default log_lock_waits: on log_temp_files: 0 log_autovacuum_min_duration: 0 log_min_duration_statement: 50 log_timezone: \'PRC\' log_filename: postgresql-%Y-%m-%d_%H.log log_line_prefix: \'%t [%p]: db=%d,user=%u,app=%a,client=%h \' #-----------auto explain------------# auto_explain.log_analyze: true auto_explain.log_buffers: true auto_explain.log_min_duration: 1000 auto_explain.log_nested_statements: true auto_explain.log_verbose: true vacuum_defer_cleanup_age: 0 hot_standby_feedback: off max_standby_archive_delay: 300s max_standby_streaming_delay: 300s autovacuum: on log_autovacuum_min_duration: 0 autovacuum_max_workers: 4 autovacuum_naptime: 45s autovacuum_vacuum_scale_factor: 0.1 autovacuum_analyze_scale_factor: 0.1 autovacuum_freeze_max_age: 1600000000 autovacuum_multixact_freeze_max_age: 1600000000 vacuum_freeze_table_age: 1500000000 vacuum_multixact_freeze_table_age: 1500000000 datestyle: \'iso, mdy\' timezone: \'PRC\' lc_messages: \'C\' lc_monetary: \'C\' lc_numeric: \'C\' lc_time: \'C\' default_text_search_config: \'pg_catalog.english\' shared_preload_libraries: \'pg_stat_statements,auto_explain\' autovacuum_work_mem: -1 shared_buffers: 2GB maintenance_work_mem: 512MB dynamic_shared_memory_type: posix huge_pages: off shared_memory_type: mmap temp_buffers: 8MB track_activity_query_size: 1024 work_mem: 4MB initdb: - encoding: UTF8 - locale: C - lc-ctype: zh_CN.UTF-8 - data-checksums pg_hba: - host replication replicator 10.1.122.0/24 md5 - host all all 0.0.0.0/0 md5 postgresql: #监听IP地址及端口 listen: 0.0.0.0:5432 #本节点IP地址,及监听端口 connect_address: 10.1.122.56:5432 data_dir: /opt/postgres/pgsql/data bin_dir: /usr/pgsql-12/bin pgpass: /tmp/pgpass0 authentication: replication: username: replicator password: rep-pass superuser: username: postgres password: postgres rewind: username: rewind_user password: rewind_user parameters: null callbacks: on_start: /bin/bash /opt/app/patroni/bin/loadvip.sh on_restart: /bin/bash /opt/app/patroni/bin/loadvip.sh on_role_change: /bin/bash /opt/app/patroni/bin/loadvip.sh on_stop: /bin/bash /opt/app/patroni/bin/loadvip.sh watchdog: mode: automatic device: /dev/watchdog safety_margin: 5 tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

修改权限配置文件
chown postgres.postgres /opt/app/patroni/etc/postgresql.yml

启动patroni
systemctl enable patroni systemctl start patroni systemctl status patroni

使用Patroni构建PostgreSQL HA集群实践

文章图片

standby 数据库初始化成功
配置第二个salve节点
配置postgresql.yml
# 集群名称 scope: pgcluster namespace: /service/ #节点名称 name: pgnode3 restapi: #填本节点IP地址 listen: 10.1.122.57:8008 connect_address: 10.1.122.57:8008 etcd: hosts: - 10.1.122.55:2379 - 10.1.122.56:2379 - 10.1.122.57:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 master_start_timeout: 300 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true recovery_conf: restore_command: cp ../wal_archive/%f %p parameters: wal_level: replica hot_standby: "on" max_connections: 2000 superuser_reserved_connections: 10 max_prepared_transactions: 2000 max_wal_senders: 10 max_replication_slots: 10 wal_log_hints: "on" track_commit_timestamp: "off" archive_mode: "on" #wal日志归档目录为共享存储目录,只有master节点会归档 archive_command: mkdir -p ../wal_archive & & test ! -f ../wal_archive/%f & & cp %p ../wal_archive/%f unix_socket_directories: \'.\' tcp_keepalives_idle: 60 tcp_keepalives_interval: 10 tcp_keepalives_count: 10 vacuum_cost_delay: 0 bgwriter_delay: 10ms bgwriter_lru_maxpages: 1000 bgwriter_lru_multiplier: 10.0 bgwriter_flush_after: 256 max_parallel_workers_per_gather: 0 old_snapshot_threshold: -1 backend_flush_after: 256 synchronous_commit: off full_page_writes: on wal_buffers: 512MB wal_writer_delay: 10ms wal_writer_flush_after: 0 checkpoint_timeout: 30min max_wal_size: 16GB min_wal_size: 2GB checkpoint_completion_target: 0.5 checkpoint_flush_after: 256 random_page_cost: 1.3 parallel_tuple_cost: 0 parallel_setup_cost: 0 effective_cache_size: 4GB force_parallel_mode: off #------------log---------------------# logging_collector: on log_destination: \'stderr\' log_truncate_on_rotation: on log_checkpoints: on log_connections: on log_disconnections: on log_error_verbosity: default log_lock_waits: on log_temp_files: 0 log_autovacuum_min_duration: 0 log_min_duration_statement: 50 log_timezone: \'PRC\' log_filename: postgresql-%Y-%m-%d_%H.log log_line_prefix: \'%t [%p]: db=%d,user=%u,app=%a,client=%h \' #-----------auto explain------------# auto_explain.log_analyze: true auto_explain.log_buffers: true auto_explain.log_min_duration: 1000 auto_explain.log_nested_statements: true auto_explain.log_verbose: true vacuum_defer_cleanup_age: 0 hot_standby_feedback: off max_standby_archive_delay: 300s max_standby_streaming_delay: 300s autovacuum: on log_autovacuum_min_duration: 0 autovacuum_max_workers: 4 autovacuum_naptime: 45s autovacuum_vacuum_scale_factor: 0.1 autovacuum_analyze_scale_factor: 0.1 autovacuum_freeze_max_age: 1600000000 autovacuum_multixact_freeze_max_age: 1600000000 vacuum_freeze_table_age: 1500000000 vacuum_multixact_freeze_table_age: 1500000000 datestyle: \'iso, mdy\' timezone: \'PRC\' lc_messages: \'C\' lc_monetary: \'C\' lc_numeric: \'C\' lc_time: \'C\' default_text_search_config: \'pg_catalog.english\' shared_preload_libraries: \'pg_stat_statements,auto_explain\' autovacuum_work_mem: -1 shared_buffers: 2GB maintenance_work_mem: 512MB dynamic_shared_memory_type: posix huge_pages: off shared_memory_type: mmap temp_buffers: 8MB track_activity_query_size: 1024 work_mem: 4MB initdb: - encoding: UTF8 - locale: C - lc-ctype: zh_CN.UTF-8 - data-checksums pg_hba: - host replication replicator 10.1.122.0/24 md5 - host all all 0.0.0.0/0 md5 postgresql: #监听IP地址及端口 listen: 0.0.0.0:5432 #本节点IP地址,及监听端口 connect_address: 10.1.122.57:5432 data_dir: /opt/postgres/pgsql/data bin_dir: /usr/pgsql-12/bin pgpass: /tmp/pgpass0 authentication: replication: username: replicator password: rep-pass superuser: username: postgres password: postgres rewind: username: rewind_user password: rewind_user parameters: null callbacks: on_start: /bin/bash /opt/app/patroni/bin/loadvip.sh on_restart: /bin/bash /opt/app/patroni/bin/loadvip.sh on_role_change: /bin/bash /opt/app/patroni/bin/loadvip.sh on_stop: /bin/bash /opt/app/patroni/bin/loadvip.sh watchdog: mode: automatic device: /dev/watchdog safety_margin: 5 tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

修改权限配置文件
chown postgres.postgres /opt/app/patroni/etc/postgresql.yml

启动patroni
systemctl enable patroni systemctl start patroni systemctl status patroni

使用Patroni构建PostgreSQL HA集群实践

文章图片

验证故障转移 关闭数据库
[root@HQ-SIT-kafka013 ~]# su - postgres postgres@HQ-SIT-kafka013-> pg_ctl stop waiting for server to shut down.... done server stopped

关闭patroni
systemctl stop patroni

在节点二上查看patroni的状态
使用Patroni构建PostgreSQL HA集群实践

文章图片

此时,节点3已经变为从库
启动patroni
再次启动节点1的patroni
systemctl stop patroni
使用Patroni构建PostgreSQL HA集群实践

文章图片

执行switchover
patronictl -c /opt/app/patroni/etc/postgresql.yml switchover

使用Patroni构建PostgreSQL HA集群实践

文章图片

当前master为pgnode3,将master切换到pgnode1上
pgnode1上查看ip地址
# ip a 1: lo: < LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: < BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000 link/ether 00:50:56:a9:28:dc brd ff:ff:ff:ff:ff:ff inet 10.1.122.55/24 brd 10.1.122.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.1.122.222/32 scope global eth0 # vip valid_lft forever preferred_lft forever inet6 fe80::250:56ff:fea9:28dc/64 scope link valid_lft forever preferred_lft forever You have mail in /var/spool/mail/root

查看/var/log/messages日志
使用Patroni构建PostgreSQL HA集群实践

文章图片

虚拟IP被添加到此节点
验证watchdog
[root@HQ-SIT-kafka013 bin]# patronictl -c /opt/app/patroni/etc/postgresql.yml list + Cluster: pgcluster (6916412917299165047) +----+-----------+ |Member |Host| Role| State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.55 || running | 27 |0| | pgnode2 | 10.1.122.56 || running | 27 |0| | pgnode3 | 10.1.122.57 | Leader | running | 27 || +---------+-------------+--------+---------+----+-----------+ # 杀死节点pgnode3上的patroni进程 [root@HQ-SIT-kafka035 bin]# ps -ef |grep patroni postgres366010 11:45 ?00:00:15 python3.6 /opt/app/patroni/bin/patroni /opt/app/patroni/etc/postgresql.yml root4799 247810 13:58 pts/000:00:00 grep --color=auto patroni [root@HQ-SIT-kafka035 bin]# kill -9 3660#稍等一会儿 [root@HQ-SIT-kafka013 bin]# patronictl -c /opt/app/patroni/etc/postgresql.yml list + Cluster: pgcluster (6916412917299165047) +----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.55 | Leader | running | 28 || | pgnode2 | 10.1.122.56 || running | 28 |0| +---------+-------------+--------+---------+----+-----------+ #现在master切换到pgnode1上#pgnode3已经被重启 [root@HQ-SIT-kafka035 ~]# uptime 14:00:33 up 0 min,1 user,load average: 1.15, 0.38, 0.14 #再次查看集群状态[root@HQ-SIT-kafka013 bin]# patronictl -c /opt/app/patroni/etc/postgresql.yml list + Cluster: pgcluster (6916412917299165047) +----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.55 | Leader | running | 28 || | pgnode2 | 10.1.122.56 || running | 28 |0| | pgnode3 | 10.1.122.57 || running | 27 |0| +---------+-------------+--------+---------+----+-----------+

patronictl运维集群常用集群运维命令
检查集群状态
# patronictl list + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+

重启集群
命令使用说明
Usage: patronictl restart [OPTIONS] CLUSTER_NAME [MEMBER_NAMES]...

重启指定节点
[root@HQ-SIT-kafka013 soft]# patronictl restart pgcls pgnode1 + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+ When should the restart take place (e.g. 2021-01-27T17:59)[now]: Are you sure you want to restart members pgnode1? [y/N]: y Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)[]: Success: restart on member pgnode1

重启集群
[root@HQ-SIT-kafka013 soft]# patronictl restart pgcls + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+ When should the restart take place (e.g. 2021-01-27T18:04)[now]: Are you sure you want to restart members pgnode3, pgnode1, pgnode2? [y/N]: y Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)[]: Success: restart on member pgnode3 Success: restart on member pgnode1 Success: restart on member pgnode2

定时重启
[root@HQ-SIT-kafka013 soft]# patronictl restart pgcls + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+ When should the restart take place (e.g. 2021-01-27T18:05)[now]: 2021-01-27T17:10 # 这里输入重启时间 Are you sure you want to schedule restart of members pgnode3, pgnode1, pgnode2 at 2021-01-27 17:10:00+08:00? [y/N]: y Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)[]: Success: restart scheduled on member pgnode3 Success: restart scheduled on member pgnode1 Success: restart scheduled on member pgnode2 [root@HQ-SIT-kafka013 soft]# patronictl list + Cluster: pgcls (6919659663012391932) ----+----+-----------+---------------------------+ |Member |Host|Role|State| TL | Lag in MB |Scheduled restart| +---------+-------------+--------+---------+----+-----------+---------------------------+ | pgnode1 | 10.1.122.57 || running |6 |0 | 2021-01-27T17:10:00+08:00 | | pgnode2 | 10.1.122.56 || running |6 |0 | 2021-01-27T17:10:00+08:00 | | pgnode3 | 10.1.122.55 | Leader | running |6 || 2021-01-27T17:10:00+08:00 | +---------+-------------+--------+---------+----+-----------+---------------------------+

取消定时重启
[root@HQ-SIT-kafka013 soft]# patronictl flush pgcls restart + Cluster: pgcls (6919659663012391932) ----+----+-----------+---------------------------+ |Member |Host|Role|State| TL | Lag in MB |Scheduled restart| +---------+-------------+--------+---------+----+-----------+---------------------------+ | pgnode1 | 10.1.122.57 || running |6 |0 | 2021-01-27T18:00:00+08:00 | | pgnode2 | 10.1.122.56 || running |6 |0 | 2021-01-27T18:00:00+08:00 | | pgnode3 | 10.1.122.55 | Leader | running |6 || 2021-01-27T18:00:00+08:00 | +---------+-------------+--------+---------+----+-----------+---------------------------+ Are you sure you want to flush members pgnode3, pgnode1, pgnode2? [y/N]: y Success: flush scheduled restart for member pgnode3 Success: flush scheduled restart for member pgnode1 Success: flush scheduled restart for member pgnode2 [root@HQ-SIT-kafka013 soft]# patronictl list + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+

switchover
[root@HQ-SIT-kafka013 soft]# patronictl switchover pgcls Master [pgnode3]: pgnode3 Candidate [\'pgnode1\', \'pgnode2\'] []: pgnode1 When should the switchover take place (e.g. 2021-01-27T18:14 )[now]:#这里输入时间,也可以定时switchover Current cluster topology + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 || running |6 |0 | | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 | Leader | running |6 || +---------+-------------+--------+---------+----+-----------+ Are you sure you want to switchover cluster pgcls, demoting current master pgnode3? [y/N]: y 2021-01-27 17:14:43.68997 Successfully switched over to "pgnode1" + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 | Leader | running |6 || | pgnode2 | 10.1.122.56 || running |6 |0 | | pgnode3 | 10.1.122.55 || stopped ||unknown| +---------+-------------+--------+---------+----+-----------+ #稍等一会 [root@HQ-SIT-kafka013 soft]# patronictl list + Cluster: pgcls (6919659663012391932) ----+----+-----------+ |Member |Host|Role|State| TL | Lag in MB | +---------+-------------+--------+---------+----+-----------+ | pgnode1 | 10.1.122.57 | Leader | running |7 || | pgnode2 | 10.1.122.56 || running |7 |0 | | pgnode3 | 10.1.122.55 || running |7 |0 | +---------+-------------+--------+---------+----+-----------+

修改数据库参数
Postgresql数据库交由patroni管理,不需直接修改数据库的postgresql.conf配置文件。即便修改了,也会被patroni再同步回旧值。
patroni的参数保存在两个地方
  1. DCS(ectd或zookeeper等分布式存储中)
    配置文件的bootstrap.dcs章节指定的参数,会在集群初始化的时候写入DCS中,后续配置文件中的bootstrap.dcs这部分参数不再生效。
    DCS中的配置为全局配置,对整个集群生效。
    修改DCS中的参数,通过patronictl edit-config命令实现
    [root@HQ-SIT-kafka013 soft]# patronictl edit-config --- +++ @@ -44,7 +44,7 @@ log_timezone: PRC log_truncate_on_rotation: true logging_collector: true -max_connections: 2000 +max_connections: 1000 max_parallel_workers_per_gather: 0 max_prepared_transactions: 2000 max_replication_slots: 10Apply these changes? [y/N]:y#保存时,会提示变更的内容 Configuration changed

    如修改的参数需要重启数据库才能生效
    [root@HQ-SIT-kafka013 soft]# patronictl list + Cluster: pgcls (6919659663012391932) ----+----+-----------+-----------------+ |Member |Host|Role|State| TL | Lag in MB | Pending restart |#Pending restart 为* 的节点表示有些参数修改完后需要重启数据库生效 +---------+-------------+--------+---------+----+-----------+-----------------+#否则修改完毕即生效 | pgnode1 | 10.1.122.57 | Leader | running |7 ||*| | pgnode2 | 10.1.122.56 || running |7 |0|*| | pgnode3 | 10.1.122.55 || running |7 |0|*| +---------+-------------+--------+---------+----+-----------+-----------------+

  2. patroni的配置文件中
    如需要对本地节点做定制化的配置,则修改配置文件中的postgresql章节内容。
    修改完毕后使用
    patronictl reload pgcls

    重新加载参数
附录 sysctl.conf参数说明
fs.aio-max-nr = 1048576 fs.file-max = 76724600 # 信号量, ipcs -l 或 -u 查看,每16个进程一组,每组信号量需要17个信号量。 kernel.sem = 4096 2147483647 2147483646 512000 # 所有共享内存段相加大小限制(建议内存的80%) kernel.shmall = 107374182 # 最大单个共享内存段大小(建议为内存一半), > 9.2的版本已大幅降低共享内存的使用 kernel.shmmax = 274877906944 # 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段 kernel.shmmni = 819200 net.core.netdev_max_backlog = 10000 net.core.rmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_default = 262144 net.core.wmem_max = 4194304 net.core.somaxconn = 4096 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_keepalive_intvl = 20 net.ipv4.tcp_keepalive_probes = 3 net.ipv4.tcp_keepalive_time = 60 net.ipv4.tcp_mem = 8388608 12582912 16777216 net.ipv4.tcp_fin_timeout = 5 net.ipv4.tcp_synack_retries = 2 # 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击 net.ipv4.tcp_syncookies = 1 # 减少time_wait net.ipv4.tcp_timestamps = 1 # 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它 net.ipv4.tcp_tw_recycle = 0 # 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_max_tw_buckets = 262144 net.ipv4.tcp_rmem = 8192 87380 16777216 net.ipv4.tcp_wmem = 8192 65536 16777216 #系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘 vm.dirty_background_bytes = 409600000 #比这个值老的脏页,将被刷到磁盘。3000表示30秒。 vm.dirty_expire_centisecs = 3000 #如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。 #有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。 vm.dirty_ratio = 95 #pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。 vm.dirty_writeback_centisecs = 100vm.mmap_min_addr = 65536 #在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 . vm.overcommit_memory = 0 #当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。 vm.overcommit_ratio = 90 #关闭交换分区 vm.swappiness = 0 # 禁用 numa, 或者在vmlinux中禁止. vm.zone_reclaim_mode = 0 # 本地自动分配的TCP, UDP端口号范围 net.ipv4.ip_local_port_range = 40000 65535 # 单个进程允许打开的文件句柄上限 fs.nr_open=20480000 net.ipv4.tcp_max_syn_backlog = 16384 net.core.somaxconn = 16384# 以下参数请注意 # vm.extra_free_kbytes = 4096000 # vm.min_free_kbytes = 2097152 # vm.min_free_kbytes 建议每32G内存分配1G vm.min_free_kbytes # 如果是小内存机器,以上两个值不建议设置 # vm.nr_hugepages = 66536 #建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize # vm.lowmem_reserve_ratio = 1 1 1 # 对于内存大于64G时,建议设置,否则建议默认值 256 256 32

patroni配置文件说明
#集群名,二级目录名Etcd:/< namespace> /< scope> /config scope: pgcluster #配置存储中的路径,Patroni将在其中保存关于集群的信息。##一级目录名Etcd:/< namespace> /< scope> /config namespace: /service/ #主机的名称,对于集群来说必须是唯一的 name: pgnode3restapi: # Patroni将监听REST API的IP地址(或主机名)和端口 listen: 10.1.122.57:8008 # 访问Patroni的REST API的IP地址(或主机名)和端口 # 集群的所有成员都必须能够连接到这个地址,所以除非Patroni设置是为了在本地主机中进行演示, # 否则这个地址必须是非“本地主机”或环回地址(即“localhost”或“127.0.0.1”)。 connect_address: 10.1.122.57:8008etcd: #提供主机来进行集群拓扑的初始发现: hosts: - 10.1.122.55:2379 - 10.1.122.56:2379 - 10.1.122.57:2379 #raft: #data_dir: . #self_addr: 127.0.0.1:2222 #partner_addrs: #- 127.0.0.1:2223 #- 127.0.0.1:2224# 本节集群初始化的时候会用到,一旦初始化完成后,这部分配置就不再被使用 bootstrap: # dcs配置: #1. 配置内容在集群初始化后被写入到分布式存储Etcd或者zookeeper中 #2. 分布式存储的路径/< namespace> /< scope> /config #3. 全局动态配置,集群共享 #4. 一旦分布式存储配置中有了dcs信息,后续的bootstrap.dcs的修改都是无效的,必须使用patronictl edit-config修改参数。 dcs: #获取leader锁的TTL(以秒为单位)。可以将其视为启动自动故障转移流程之前的时间长度。默认值:30 ttl: 30 #每次循环后休眠多久,单位秒,默认值为:10 loop_wait: 10 #DCS和PostgreSQL操作重试的超时时间(秒)。少于此长度的DCS或网络问题不会导致patroni将leader降级。默认值:10 retry_timeout: 10 #参与master选举的节点log最大延迟字节数 maximum_lag_on_failover: 1048576 #在触发故障转移之前,允许主机从故障中恢复的时间(以秒为单位)。默认为300秒。 #如果设置为0,则在检测到崩溃后立即进行故障转移。使用异步复制时,故障转移可能导致丢失事务。 #最坏情况下,主故障的故障转移时间是:loop_wait + master_start_timeout + loop_wait,除非master_start_timeout为0,在这种情况下,它只是loop_wait。 #根据您的耐久性/可用性权衡来设置值。 master_start_timeout: 300 #打开同步复制模式。在这种模式下,一个副本将被选择为同步副本,只有最新的leader和同步副本才能参与leader的选举。同步模式确保成功提交的事务不会在故障转移时丢失 synchronous_mode: false #当停止Postgres时允许Patroni等待的秒数,并且只有当synchronous_mode启用时生效。当设置为> 0并且启用了synchronous_mode时, #如果停止操作的运行时间超过了master_stop_timeout设置的值,那么Patroni将向postmaster发送SIGKILL。如果该参数没有设置或设置< = 0,则master_stop_timeout不生效。 master_stop_timeout: 10 #standby_cluster: #host: 127.0.0.1 #port: 1111 #primary_slot_name: patroni postgresql: use_pg_rewind: true use_slots: true recovery_conf: restore_command: cp ../wal_archive/%f %p # postgresql 参数配置,此处设置的参数是集群共享的 # 1. 有几个参数必须在此设置,并且本地配置文件和环境变量无法覆盖,启动时,patroni将其添加到命令行启动参数中 #wal_level、hot_standby、max_connections、max_wal_senders、max_prepared_transactions、 #max_locks_per_transaction、track_commit_timestamp、max_replication_slots、max_worker_processes、wal_log_hints # 2. 所有的参数都会转储到postgresql.conf中,并在postgresql.conf中设置一个include包含使用的基础配置文件 #(postgresql.base.conf或custom_conf中的配置) parameters: # wal_level决定多少信息写入到 WAL 中。默认值是replica,它会写入足够的数据以支持WAL归档和复制,包括在后备服务器上运行只读查询。 # minimal会去掉除从崩溃或者立即关机中进行恢复所需的信息之外的所有记录。 # 最后,logical会增加支持逻辑解码所需的信息。每个层次包括所有更低层次记录的信息。这个参数只能在服务器启动时设置。 wal_level: replica #从库恢复期间,是否能够连接并运行查询 hot_standby: "on" #最大链接数 max_connections: 2000 #为超级用户预留的链接数,防止用户链接达到max_connections superuser_reserved_connections: 10 # 设置可以同时处于prepared状态的事务的最大数目.如果你正在使用预备事务你将希望把max_prepared_transactions至少设置为max_connections一样大, # 因此每一个会话可以有一个预备事务待处理。 max_prepared_transactions: 2000 #同时运行 WAL 发送进程的最大数 max_wal_senders: 10 #指定服务器可以支持的复制槽最大数量。默认值为10。这个参数只能在服务器启动时设置。 #将它设置为一个比当前已有复制槽要少的值会阻碍服务器启动。此外,要允许使用复制槽, wal_level必须被设置为replica或 更高。 max_replication_slots: 10 #当这个参数为on时,PostgreSQL服务器一个检查点之后页面被第一次修改期间把该磁盘页面的整个内容都写入 WAL,即使对所谓的提示位做非关键修改也会这样做。 #如果启用了数据校验和,提示位更新总是会被 WAL 记录并且这个设置会被忽略。你可以使用这个 设置测试如果你的数据库启用了数据校验和,会有多少额外的 WAL 记录发生。 wal_log_hints: "on" # 记录事务的提交时间。 track_commit_timestamp: "off" archive_mode: "on" archive_timeout: 1800s #规定命令,"%p"表示将要归档的WAL文件的包含完整路径信息的文件名,用"%f"代表不包含路径信息的WAL文件的文件名。 archive_command: mkdir -p ../wal_archive & & test ! -f ../wal_archive/%f & & cp %p ../wal_archive/%f unix_socket_directories: \'.\' tcp_keepalives_idle: 60 tcp_keepalives_interval: 10 tcp_keepalives_count: 10 vacuum_cost_delay: 0 bgwriter_delay: 10ms bgwriter_lru_maxpages: 1000 bgwriter_lru_multiplier: 10.0 #IO很好的机器,不需要考虑平滑调度 bgwriter_flush_after: 256 #如果需要使用并行查询,设置为大于1 ,不建议超过 主机cores-2 max_parallel_workers_per_gather: 0 old_snapshot_threshold: -1 #IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB backend_flush_after: 256 synchronous_commit: off # 支持原子写超过BLOCK_SIZE的块设备,在对齐后可以关闭。或者支持cow的文件系统可以关闭。 full_page_writes: on # min( 2047MB, shared_buffers/32 ): 512MB wal_buffers: 512MB wal_writer_delay: 10ms wal_writer_flush_after: 0 # 不建议频繁做检查点,否则XLOG会产生很多的FULL PAGE WRITE(when full_page_writes=on)。 checkpoint_timeout: 30min # 建议是SHARED BUFFER的2倍 max_wal_size: 16GB # max_wal_size/4 min_wal_size: 2GB # 硬盘好的情况下,可以让检查点快速结束,恢复时也可以快速达到一致状态。否则建议0.5~0.9 checkpoint_completion_target: 0.5 # IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB checkpoint_flush_after: 256 random_page_cost: 1.3 parallel_tuple_cost: 0 parallel_setup_cost: 0 # 看着办,扣掉会话连接RSS,shared buffer, autovacuum worker, 剩下的都是OS可用的CACHE。 effective_cache_size: 4GB force_parallel_mode: off log_destination: \'csvlog\' # logging_collector 启用日志收集器,捕捉被发送到stderr的日志消息,并且将这些消息重定向到日志文件中。设置为off时,并不是不产生日志 logging_collector: on # 指定日志存放文件名称,当logging_collector开启,log_destination设置stderr时,日志写入log_filename设置的文件中,如果log_destination设置csvlog # 那么还会生成一个.csv的文件,文件名与log_filename设置一样,只是后缀名由.log改为.csv,并且在.log文件中记录csvlog重定向的位置。 log_filename: \'postgresql-%Y-%m-%d_%H%M%S.log\' #这个参数决定使用一个单个日志文件的最大时间量,之后将创立一个新的日志文件。 如果指定值时没有单位,则以分钟为单位。默认为24小时。 #将这个参数设置为零将禁用基于时间的新日志文件创建 log_rotation_age = 1440 #kb,文件多大后创建新的文件记录日志。0 表示禁扩展。 log_rotation_size = 10240 #当日志文件重名时,覆盖原有日志 log_truncate_on_rotation: on #控制哪些 SQL 语句被记录。有效值是 none (off)、ddl、mod和 all(所有语句)。ddl记录所有数据定义语句,例如CREATE、ALTER和 DROP语句。 #mod记录所有ddl语句,外加数据修改语句例如INSERT, UPDATE、DELETE、TRUNCATE, 和COPY FROM。 如果PREPARE、EXECUTE和 EXPLAIN ANALYZE包含合适类型的命令, #它们也会被记录。对于使用扩展查询协议的客户端,当收到一个执行消息时会产生日志并且会包括绑定参数的值(任何内嵌的单引号会被双写)。 log_statement = all #milliseconds,记录执行5秒及以上的语句,跟踪慢查询语句,单位为毫秒。如设置 5000,表示日志将记录执行5秒以上的SQL语句 log_min_duration_statement = 5000 log_checkpoints: on log_connections: on log_disconnections: on log_error_verbosity: verbose log_timezone: \'PRC\' vacuum_defer_cleanup_age: 0 # 建议关闭,以免备库长事务导致 主库无法回收垃圾而膨胀。 hot_standby_feedback: off max_standby_archive_delay: 300s max_standby_streaming_delay: 300s #------------------------------------AUTOVACUUM----------------------------------------------------# autovacuum: on log_autovacuum_min_duration: 0 # CPU核多,并且IO好的情况下,可多点,但是注意16*autovacuum mem,会消耗较多内存,所以内存也要有基础。 autovacuum_max_workers: 4 # 建议不要太高频率,否则会因为vacuum产生较多的XLOG。 autovacuum_naptime: 45s autovacuum_vacuum_scale_factor: 0.1 autovacuum_analyze_scale_factor: 0.1 autovacuum_freeze_max_age: 1600000000 autovacuum_multixact_freeze_max_age: 1600000000 vacuum_freeze_table_age: 1500000000 vacuum_multixact_freeze_table_age: 1500000000 datestyle: \'iso, mdy\' timezone: \'PRC\' lc_messages: \'C\' lc_monetary: \'C\' lc_numeric: \'C\' lc_time: \'C\' default_text_search_config: \'pg_catalog.english\' shared_preload_libraries: \'pg_stat_statements\' #------------------------------------RESOURCE USAGE MEMORY---------------------------------------------# #+V9.4 指定autovacuum_worker进程能使用的最大内存量。如果指定值时没有单位,则以千字节为单位。 #其默认值为 -1,表示转而使用 maintenance_work_mem的值。当运行在其他上下文环境中时, 这个设置对VACUUM的行为没有影响。 autovacuum_work_mem: -1 # 1/4 主机内存 shared_buffers: 2GB # min( 2G, (1/4 主机内存)/autovacuum_max_workers ) maintenance_work_mem: 512MB #+V9.4 dynamic_shared_memory_type: posix#hash_mem_multiplier+V13 # +V9.4 控制是否为主共享内存区域请求大页。有效值是try(默认)、on以及off。如果huge_pages被设置为try,则服务器将尝

    推荐阅读