Linux|MySQL-MHA数据库高可用

MySQL-MHA数据库高可用 简介 MHA介绍

  • MHA:Master High Availability,目前在MySQL高可用方面是一个相对成熟的解决方案
  • 由日本DeNA公司youshimaton(现就职于Facebook公司)开发
  • 主要实现故障切换和主从提升
  • 在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在较大程度上保证数据的一致性,以达到真正意义上的高可用。
工作原理
Linux|MySQL-MHA数据库高可用
文章图片

  • 当master出现故障时,通过对比slave之间I/O线程读取master上binlog的位置,选取最接近的slave做为最新的slave(latest slave)。
  • 其它slave通过与latest slave对比生成差异中继日志,并应用。
  • 在latest slave上应用从master保存的binlog,同时将latest slave提升为master。
  • 最后在其它slave上应用相应的差异中继日志并开始从新的master开始复制.
组件介绍
MHA Manager(管理节点)
  • MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上
  • MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明
  • Manager工具
工具 说明
masterha_check_ssh 检查MHA的SSH配置
masterha_check_repl 检查MySQL复制
masterha_manager 启动MHA
masterha_check_status 检测当前MHA运行状态
masterha_master_monitor 监测master是否宕机
masterha_master_switch 控制故障转移(自动或手动)
masterha_conf_host 添加或删除配置的server信息
MHA Node(数据节点)
  • MHA Node运行在每台MySQL服务器上
  • 拷贝并保存master的二进制日志
  • 从拥有最新数据的slave上生成差异中继日志,应用差异中继日志
  • 在不停止SQL线程的情况下,清除中继日志
  • Node工具(系统调用,不需手动触发)
工具 说明
save_binary_logs 保存和复制master的二进制日志
apply_diff_relay_logs 识别差异的中继日志事件并应用于其它slave
filter_mysqlbinlog 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs 清除中继日志(不会阻塞SQL线程)
MHA架构搭建 说明
主机名 IP地址 server-id 功能 备注
mha 192.168.226.10 管理节点
master 192.168.226.20 20 数据节点
slave1 192.168.226.30 30 数据节点
slave2 192.168.226.40 40 数据节点
环境准备
[root@server1 ~]# hostnamectl set-hostname mha [root@server1 ~]# su [root@server2 ~]# hostnamectl set-hostname master [root@server2 ~]# su [root@server3 ~]# hostnamectl set-hostname slave1 [root@server3 ~]# su [root@server4 ~]# hostnamectl set-hostname slave2 [root@server4 ~]# susystemctl stop NetworkManager systemctl disable NetworkManagersystemctl stop firewalld systemctl disable firewalldsetenforce 0 sed -i '/SELINUX=enforcing/cSELINUX=disabled' /etc/selinux/configcat >> /etc/hosts <

搭建MySQL主从复制
  • 在master上配置
[root@master ~]# vim mysql.sh #!/bin/bash#Desc='此脚本用于安装MySQL软件' #Usage=source mysql.shyum install -y libaio rm -rf /usr/local/mysql mv mysql-5.7.31-linux-glibc2.12-x86_64 /usr/local/mysql useradd -r -s /sbin/nologin mysql rm -rf /etc/my.cnf cd /usr/local/mysql mkdir mysql-files chown mysql:mysql mysql-files chmod 750 mysql-files bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql &> /root/password.txt bin/mysql_ssl_rsa_setup --datadir=/usr/local/mysql/data cp support-files/mysql.server /etc/init.d/mysqld service mysqld start echo 'export PATH=$PATH:/usr/local/mysql/bin' >> /etc/profile source /etc/profile [root@master ~]# source mysql.sh[root@master mysql]# cat /root/password.txt |grep password 2021-09-30T12:46:56.088480Z 1 [Note] A temporary password is generated for root@localhost: z(Avfsdgd1/3 [root@master mysql]# mysql_secure_installation[root@master mysql]# cd /usr/local/mysql/ [root@master mysql]# vim my.cnf [mysqld] basedir=/usr/local/mysql datadir=/usr/local/mysql/data socket=/tmp/mysql.sock port=3306 log-error=/usr/local/mysql/data/mysql.err log-bin=/usr/local/mysql/data/binlog server-id=20 character_set_server=utf8mb4 gtid-mode=on log-slave-updates=1 enforce-gtid-consistency[root@master mysql]# service mysqld start

  • 在slave1和slave2上配置(以slave1为例)
[root@slave1 ~]# vim mysql.sh #!/bin/bash#Desc='此脚本用于slave上安装MySQL软件' #Usage=source mysql.shyum install -y libaio rm -rf /usr/local/mysql mv mysql-5.7.31-linux-glibc2.12-x86_64 /usr/local/mysql useradd -r -s /sbin/nologin mysql rm -rf /etc/my.cnf cd /usr/local/mysql mkdir mysql-files chown mysql:mysql mysql-files chmod 750 mysql-files #bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql &> /root/password.txt #bin/mysql_ssl_rsa_setup --datadir=/usr/local/mysql/data cp support-files/mysql.server /etc/init.d/mysqld #service mysqld start echo 'export PATH=$PATH:/usr/local/mysql/bin' >> /etc/profile source /etc/profile [root@slave1 ~]# source mysql.sh[root@slave1 mysql]# cd /usr/local/mysql/ [root@slave1 mysql]# vim my.cnf [mysqld] basedir=/usr/local/mysql datadir=/usr/local/mysql/data socket=/tmp/mysql.sock port=3306 log-error=/usr/local/mysql/data/mysql.err #log-bin=/usr/local/mysql/data/binlog relay-log=/usr/local/mysql/data/relaylog server-id=30 character_set_server=utf8mb4 gtid-mode=on log-slave-updates=1 enforce-gtid-consistency skip-slave-start log-bin=/usr/local/mysql/data/binlog在master上操作 [root@master mysql]# service mysqld stop [root@master mysql]#rm -rf /usr/local/mysql/data/auto.cnf [root@master mysql]# rsync -av /usr/local/mysql/data root@192.168.226.30:/usr/local/mysql/service mysqld start

  • 配置主从(基于GTIDs)
在master中操作 [root@master mysql]# mysql -uroot -p123 mysql> create user 'slave'@'192.168.226.%' identified by '123'; mysql> grant replication slave on *.* to 'slave'@'192.168.226.%'; mysql> flush privileges; 创建一个mha账号,用于后期MHA监控主从同步状态 mysql> create user 'mha'@'192.168.226.%' identified by '123'; mysql> grant all privileges on *.* to 'mha'@'192.168.226.%'; mysql> flush privileges; 在slave中操作 [root@slave1 mysql]# mysql -uroot -p123 mysql> change master to master_host='192.168.226.20', master_user='slave', master_password='123', master_port=3306, master_auto_position=1; mysql> start slave; mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.226.20 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000003 Read_Master_Log_Pos: 1387 Relay_Log_File: relaylog.000002 Relay_Log_Pos: 1594 Relay_Master_Log_File: binlog.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes

安装MHA
  • 安装(在所有节点安装mha-node软件包,在mha管理端再安装mha-manager软件包)
rz #传入提前下载的mha4mysql-node-0.57-0.el7.noarch.rpm yum install -y mha4mysql-node-0.57-0.el7.noarch.rpm [root@mha mha]# rz #传入提前下载的mha4mysql-manager-0.57-0.el7.noarch.rpm [root@mha mha]# yum install -y mha4mysql-manager-0.57-0.el7.noarch.rpm

  • 配置ssh免密登录(所有主机互信)
创建admin账号 useradd admin echo 123|passwd --stdin admin su - admin [admin@mha ~]$ ssh-keygen [admin@mha ~]$ for i in 20 30 40; do ssh-copy-id admin@192.168.226.$i; done [admin@master ~]$ ssh-keygen [admin@master ~]$ for i in 10 30 40; do ssh-copy-id admin@192.168.226.$i; done [admin@slave1 ~]$ ssh-keygen [admin@slave1 ~]$ for i in 10 20 40; do ssh-copy-id admin@192.168.226.$i; done [admin@slave2 ~]$ ssh-keygen [admin@slave2 ~]$ for i in 10 20 30; do ssh-copy-id admin@192.168.226.$i; done

上述操作有些麻烦,可直接拷贝.ssh目录(公钥和私钥),就能实现互信的目的
[admin@mha ~]$ ssh-keygen [admin@mha ~]$ cd .ssh/ [admin@mha .ssh]$ mv id_rsa.pub authorized_keys [admin@mha .ssh]$ for i in 10 20 30; do scp -r ../.ssh/ 192.168.226.$i:~/; done

  • 配置admin账号的sudo权限(master上)
[admin@master ~]$ su - root [root@master ~]# vim /etc/sudoers.d/admin User_Alias MYSQL_USERS = admin Runas_Alias MYSQL_RUNAS = root Cmnd_Alias MYSQL_CMNDS = /sbin/ifconfig,/sbin/arping MYSQL_USERS ALL = (MYSQL_RUNAS) NOPASSWD: MYSQL_CMNDS #选项说明: User_Alias 表示具有sudo权限的用户列表; Host_Alias表示主机的列表 Runas_Alias 表示用户以什么身份登录 Cmnd_Alias表示允许执行的命令列表(命令需要使用完整路径)把该权限分发给从服务器,当故障发生时,从服务器也可以自己设置VIP [root@master ~]# for i in 30 40; do scp /etc/sudoers.d/admin 192.168.226.$i:/etc/sudoers.d/; done测试master上的admin是否可以挂载VIP [root@master ~]# su - admin [admin@master ~]$ sudo /sbin/ifconfig ens33:1 192.168.226.100 broadcast 192.168.226.255 netmask 255.255.255.0 [admin@master ~]$ ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens33: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:60:de:7a brd ff:ff:ff:ff:ff:ff inet 192.168.226.20/24 brd 192.168.226.255 scope global noprefixroute ens33 valid_lft forever preferred_lft forever inet 192.168.226.100/24 brd 192.168.226.255 scope global secondary ens33:1 valid_lft forever preferred_lft forever inet6 fe80::1b78:bfb3:4567:b45c/64 scope link noprefixroute valid_lft forever preferred_lft forever

  • 创建mha相关配置文件
创建工作目录 [admin@mha ~]$ su - root [root@mha ~]# mkdir /etc/mha [root@mha ~]# mkdir -p /data/mha/masterha/app1 [root@mha ~]# chown -R admin.admin /data/mha编写配置文件 [root@mha ~]# vim /etc/mha/app1.conf[server default]# 设置监控用户和密码 user=mha password=123# 设置复制环境中的复制用户和密码 repl_user=slave repl_password=123# 设置ssh的登录用户名 ssh_user=admin# 设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行failover ping_interval=3# 设置mha的工作目录 manager_workdir=/data/mha/masterha/app1# 设置mysql master保存binlog的目录,以便MHA可以找到master的二进制日志 master_binlog_dir=/usr/local/mysql/data# 设置master的pid文件 master_pid_file=/usr/local/mysql/data/master.pid# 设置mysql master在发生切换时保存binlog的目录(在mysql master上创建这个目录) remote_workdir=/data/mysql/mha# 设置mha日志文件 manager_log=/data/mha/masterha/app1/app1-3306.log# MHA到master的监控之间出现问题,MHA Manager将会尝试从slave1和slave2登录到master上 secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.226.30 -s 192.168.226.40 --user=admin --port=22 --master_host=192.168.226.20 --master_port=3306# 设置自动failover时候的切换脚本(故障发生时,自动挂载VIP到SLAVE1或SLAVE2) master_ip_failover_script="/etc/mha/master_ip_failover.sh 192.168.226.100 1"# 设置手动切换时候的切换脚本 #master_ip_online_change_script="/etc/mha/master_ip_online_change.sh 192.168.226.100 1" # 设置故障发生后关闭故障主机脚本 # shutdown_script="/etc/mha/power_manager"[server1] hostname=192.168.226.20 port= 3306 candidate_master=1 [server2] hostname=192.168.226.30 port= 3306 candidate_master=1 [server3] hostname=192.168.226.40 port= 3306 candidate_master=1上传脚本 [root@mha mha]# rz #传入准备好的脚本 [root@mha mha]# ls app1.confmaster_ip_failover.sh [root@mha mha]# vim master_ip_failover.sh 注意:脚本内容中要修改网卡和连接用户为 my $interface = 'ens33'; 网卡名(38行) my $ssh_user = "admin"; 用户名(110行) [root@mha mha]# chmod +x /etc/mha/master_ip_failover.sh

  • 检查
检查ssh互信 [root@mha mha]# su - admin [admin@mha ~]$ masterha_check_ssh --conf=/etc/mha/app1.conf Fri Oct1 00:05:11 2021 - [info] All SSH connection tests passed successfully.检测MySQL主从集群状态 [admin@mha ~]$masterha_check_repl --conf=/etc/mha/app1.conf Fri Oct1 00:46:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln265] Checking slave sta on 192.168.226.20(192.168.226.20:3306). err=Got error when executing SHOW SLAVE STATUS. Access denied; at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operationMySQL Replication Health is NOT OK!

报错: Checking slave sta on 192.168.226.20(192.168.226.20:3306). err=Got error when executing SHOW SLAVE STATUS. Access denied; at least one of) the SUPER, REPLICATION
【Linux|MySQL-MHA数据库高可用】原因:经排查知,给mha授权时出现了问题
解决:grant all privileges on . to ‘mha’@‘192.168.226.%’;
[admin@mha ~]$ masterha_check_repl --conf=/etc/mha/app1.conf MySQL Replication Health is OK.检查mha状态 [admin@mha ~]$ masterha_check_status --conf=/etc/mha/app1.conf app1 is stopped(2:NOT_RUNNING). #将mha在后台启动 [admin@mha ~]$ nohup masterha_manager --conf=/etc/mha/app1.conf --remove_dead_master_conf --ignore_last_failover & [admin@mha ~]$ jobs [1]+运行中nohup masterha_manager --conf=/etc/mha/app1.conf --remove_dead_master_conf --ignore_last_failover & [admin@mha ~]$ masterha_check_status --conf=/etc/mha/app1.conf app1 (pid:12758) is running(0:PING_OK), master:192.168.226.20

自动Failover测试
在master上安装测试工具 [admin@master ~]$ su - root [root@master ~]# yum -y install sysbench插入测试数据 [root@master ~]# mysql -uroot -p123 mysql> create database test charset utf8mb4; mysql> grant all on *.* to 'mha'@'localhost' identified by '123'; mysql> flush privileges; mysql> quit [root@master ~]# sysbench /usr/share/sysbench/oltp_read_only.lua \ --mysql-host=192.168.226.20--mysql-port=3306--mysql-user=mha \ --mysql-password=123--mysql-socket=/tmp/mysql.sock \ --mysql-db=test--db-driver=mysql--tables=1 \ --table-size=100000 --report-interval=10 --threads=128 --time=120 preparemysql> use test; mysql> select count(*) from sbtest1; +----------+ | count(*) | +----------+ |100000 | +----------+在mha上监控日志 [root@mha ~]# tail -f /data/mha/masterha/app1/app1-3306.log模拟故障 [root@master ~]# service mysqld stopmha上的监控显示 #####VIP漂移 Enabling the VIP - 192.168.226.100 on the new master - 192.168.226.30 我们信任您已经从系统管理员那里了解了日常注意事项。 总结起来无外乎这三点:#1) 尊重别人的隐私。 #2) 输入前要先考虑(后果和风险)。 #3) 权力越大,责任越大。sudo: 没有终端存在,且未指定 askpass 程序 =======sudo /sbin/ifconfig ens33:1 192.168.226.100 broadcastnetmask&& sudo /sbin/arping -f -q -c 5 -w 5 -I ens33 -s 192.168.226.100-U =================#####故障切换 Started automated(non-interactive) failover. Invalidated master IP address on 192.168.226.20(192.168.226.20:3306) Selected 192.168.226.30(192.168.226.30:3306) as a new master. 192.168.226.30(192.168.226.30:3306): OK: Applying all logs succeeded. 192.168.226.30(192.168.226.30:3306): OK: Activated master IP address. 192.168.226.40(192.168.226.40:3306): OK: Slave started, replicating from 192.168.226.30(192.168.226.30:3306) 192.168.226.30(192.168.226.30:3306): Resetting slave info succeeded. Master failover to 192.168.226.30(192.168.226.30:3306) completed successfully. #自动选出192.168.226.30作为主服务器#查看192.168.226.40(slave2)Master_Host: 192.168.226.30 mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.226.30 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000002 Read_Master_Log_Pos: 19092701 Relay_Log_File: relaylog.000002 Relay_Log_Pos: 405 Relay_Master_Log_File: binlog.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes

    推荐阅读