On Data Loss During Failover Under PostgreSQL Synchronous Replication

Background

MySQL 5.7 strengthened semi-synchronous replication by adding a parameter, rpl_semi_sync_master_wait_point, which controls when the master waits for the slave's acknowledgement.
The default is AFTER_SYNC, meaning that at commit time the master executes the following steps in order:
1. InnoDB prepare
2. Write the transaction to the binlog and send it to the slave
3. Flush the binlog to disk
4. Wait for the slave's acknowledgement
5. InnoDB commit
6. Return to the client

The other value of rpl_semi_sync_master_wait_point is AFTER_COMMIT, which is also the behavior of MySQL 5.6 and earlier:
1. InnoDB prepare
2. Write the transaction to the binlog and send it to the slave
3. Flush the binlog to disk
4. InnoDB commit
5. Wait for the slave's acknowledgement
6. Return to the client

In short, the only difference between AFTER_SYNC and AFTER_COMMIT is that steps 4 and 5 are swapped.
AFTER_SYNC mainly avoids a problem that exists in MySQL 5.6 and earlier:
during the window in which the transaction has already been committed to the storage engine but the slave's acknowledgement has not yet arrived, other sessions can see the update. If the master crashes at this point
and a slave is promoted to the new master, the transaction may be lost, which for the sessions that had already seen its changes amounts to "data loss".
For details see http://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
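
The two orderings are easier to compare side by side as code. The following is a purely illustrative Python sketch; the step descriptions simply restate the lists above and the functions are not real MySQL internals.

#!/usr/bin/python
# Illustrative sketch only: each call to step() stands in for a MySQL-internal action.

def step(action):
    print action

def commit_after_sync():
    step("InnoDB prepare")
    step("write transaction to binlog and send to slave")
    step("flush binlog to disk")
    step("wait for slave ACK")     # a crash here loses nothing visible:
    step("InnoDB commit")          # the change only becomes visible after the ACK
    step("return to client")

def commit_after_commit():
    step("InnoDB prepare")
    step("write transaction to binlog and send to slave")
    step("flush binlog to disk")
    step("InnoDB commit")          # the change becomes visible here...
    step("wait for slave ACK")     # ...so a crash in this window can lose a
    step("return to client")       # transaction that other sessions already saw

if __name__ == "__main__":
    commit_after_sync()
    commit_after_commit()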

So, does PostgreSQL have this problem?

PostgreSQL's synchronous replication

The commit path under PostgreSQL synchronous replication is roughly as follows:
1. Write the WAL records to the local WAL file and send them to the slave
2. Flush the WAL records to the local WAL file
3. Update the CLOG
4. Wait for the slave's acknowledgement
5. Return to the client

This is similar to MySQL 5.6, or to 5.7 with AFTER_COMMIT (see the function RecordTransactionCommit()), so the same data-loss problem should exist. Is that actually the case?
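
For reference, here is the same ordering as a simplified Python sketch, annotated with the PostgreSQL 9.4 functions that, as far as I can tell from reading RecordTransactionCommit() in src/backend/access/transam/xact.c, perform each step. This is a reading aid, not real backend code.

#!/usr/bin/python
# Simplified reading aid, not real backend code.

def step(action):
    print action

def record_transaction_commit():
    step("insert commit WAL record")      # XLogInsert(); the walsender streams the
                                          # record on to the standby
    step("flush WAL to local disk")       # XLogFlush()
    step("mark transaction committed")    # TransactionIdCommitTree() updates the CLOG,
                                          # so other sessions can already see the change
    step("wait for the standby's ACK")    # SyncRepWaitForLSN(); a crash in this window
                                          # mirrors MySQL's AFTER_COMMIT behavior
    step("return success to the client")

if __name__ == "__main__":
    record_transaction_commit()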

Experimental verification

Testing confirms that the problem described above really exists. The test procedure was as follows:
1. Set up a 1-master, 2-slave cluster based on Pacemaker + Corosync
Setup steps: http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
(A quick way to confirm that synchronous replication is actually in effect is sketched below.)
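
Before starting the workload it is worth confirming that a standby really is attached synchronously. A minimal psycopg2 check might look like the following (the connection string is the cluster's virtual IP used by the test scripts below; the columns are those of pg_stat_replication in 9.4):

#!/usr/bin/python
import psycopg2

# Same connection string as the test scripts below.
url = "host=192.168.1.246 dbname=postgres user=postgres"
conn = psycopg2.connect(url)
cur = conn.cursor()

# synchronous_standby_names must be non-empty for synchronous replication.
cur.execute("show synchronous_standby_names;")
print "synchronous_standby_names:", cur.fetchone()[0]

# At least one standby should report sync_state = 'sync'.
cur.execute("select application_name, state, sync_state from pg_stat_replication;")
for application_name, state, sync_state in cur.fetchall():
    print application_name, state, sync_state

cur.close()
conn.close()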

2. Run the following 2 test scripts at the same time and check whether any data is lost during the failover
failover_test.py
Updates a column in a loop, adding 1 each time. When a failure occurs, it reconnects, and once the connection succeeds it compares the value on the new master with the last update. If the value equals either the last successful update or the last failed update, no data has been lost.

#!/usr/bin/python
import psycopg2
import time

url="host=192.168.1.246 dbname=postgres user=postgres"
conn = psycopg2.connect(url)
conn.autocommit=True

cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS pgsql_ha_test(id serial PRIMARY KEY, num integer); ")
cur.execute("truncate pgsql_ha_test; ")
cur.execute("insert into pgsql_ha_test values(1,0); ")

print "Update process had started,please kill the master..."
i=0
num=-1
try:
    while True:
        i+=1
        cur.execute("UPDATE pgsql_ha_test set num = %s where id=1",(i,))
        num=i
        if i % 1000 == 0:
            print time.time()," current num=",num

except psycopg2.Error as e:
    print time.time(),"The master has down, last num:",num
    print e.pgerror
    conn.close()

time1= time.time()
connect_success = False
while connect_success == False:
    try:
        conn = psycopg2.connect(url)
        conn.autocommit=True
        connect_success=True
    except psycopg2.Error as e:
        pass

time2= time.time()
print time.time()," connect success after %f second"%(time2-time1)

cur = conn.cursor()
cur.execute("select num from pgsql_ha_test where id=1; ")
newnum = cur.fetchone()[0]
print "current num:",newnum

if not (newnum==num or newnum==num+1):
    print "NG"
    exit(1)

cur.close()
conn.close()
print "OK"

failover_test2.py
Reads the value updated by failover_test.py in a loop. When a failure occurs, it reconnects, and once the connection succeeds it compares the value on the new master with the last value it successfully read. If the new value is greater than or equal to the last successfully read value, no data has been lost.
#!/usr/bin/python
import psycopg2
import time

url="host=192.168.1.246 dbname=postgres user=postgres"
conn = psycopg2.connect(url)
conn.autocommit=True

cur = conn.cursor()

print "Read process had started..."
i=0
num=-1
try:
    while True:
        i+=1
        cur.execute("select num from pgsql_ha_test where id=1; ")
        num = cur.fetchone()[0]
        if i % 10000 == 0:
            print time.time()," current num=",num

except psycopg2.Error as e:
    print time.time()," The master has down, last num=",num
    print e.pgerror
    conn.close()

time1= time.time()
connect_success = False
while connect_success == False:
    try:
        conn = psycopg2.connect(url)
        conn.autocommit=True
        connect_success=True
    except psycopg2.Error as e:
        time.sleep(1)

time2= time.time()
print time.time()," connect success after %f second"%(time2-time1)

cur = conn.cursor()
cur.execute("select num from pgsql_ha_test where id=1; ")
newnum = cur.fetchone()[0]
print time.time()," current num:",newnum

# Data is lost if the value on the new master is smaller than the last value read.
if newnum < num:
    print "NG: Data Lost!"
    exit(1)

cur.close()
conn.close()
print "OK"

3. Kill the postgres processes on the master to simulate a failure and trigger the failover
[root@node73 ~]# ps -ef|grep postgres
root 1228 25775 0 14:09 pts/2 00:00:00 su - postgres
postgres 1229 1228 0 14:09 pts/2 00:00:00 -bash
postgres 3339 31673 17 14:10 ? 00:00:07 postgres: postgres postgres 192.168.1.245(41991) UPDATE waiting for 1/FE556260
postgres 4105 31673 41 14:10 ? 00:00:11 postgres: postgres postgres 192.168.1.245(41994) idle
root 5612 25754 0 14:11 pts/1 00:00:00 grep postgres
postgres 9168 31673 0 13:49 ? 00:00:05 postgres: wal sender process postgres 192.168.1.248(53883) streaming 1/FE556260
postgres 24589 31673 0 13:57 ? 00:00:03 postgres: wal sender process postgres 192.168.1.245(41904) streaming 1/FE5562E8
postgres 31673 1 0 13:45 ? 00:00:00 /opt/PostgreSQL/9.4.5/bin/postgres -D /data/pgsql -c config_file=/data/pgsql//postgresql.conf -p 5432
postgres 31694 31673 0 13:45 ? 00:00:00 postgres: logger process
postgres 31706 31673 0 13:45 ? 00:00:00 postgres: checkpointer process
postgres 31707 31673 0 13:45 ? 00:00:00 postgres: writer process
postgres 31708 31673 0 13:45 ? 00:00:00 postgres: wal writer process
postgres 31709 31673 0 13:45 ? 00:00:00 postgres: autovacuum launcher process
postgres 31710 31673 0 13:45 ? 00:00:00 postgres: stats collector process
[root@n73 ~]# kill -9 31673
[root@node73 ~]# ps -ef|grep postgres
root 1228 25775 0 14:09 pts/2 00:00:00 su - postgres
postgres 1229 1228 0 14:09 pts/2 00:00:00 -bash
postgres 4105 1 38 14:10 ? 00:00:18 postgres: postgres postgres 192.168.1.245(41994) idle
root 7017 25754 0 14:11 pts/1 00:00:00 grep postgres
postgres 31694 1 0 13:45 ? 00:00:00 postgres: logger process

Some postgres processes may be left behind after kill -9; this is a long-standing PostgreSQL quirk. A plain kill (SIGTERM) leaves nothing behind. (A small helper for scripting the crash is sketched below.)
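
If you prefer to script the crash rather than pick the postmaster PID out of ps, note that the first line of postmaster.pid in the data directory holds the postmaster's PID. A small, hypothetical helper along these lines (the data directory path is the one visible in the ps output above) would do:

#!/usr/bin/python
import os
import signal
import sys

# Data directory of this test cluster (see the -D option in the ps output above).
PGDATA = "/data/pgsql"

# The first line of postmaster.pid holds the postmaster's PID.
with open(os.path.join(PGDATA, "postmaster.pid")) as f:
    pid = int(f.readline().strip())

# kill -9 (SIGKILL) reproduces the hard crash used in this test and may leave
# stray backends behind; a plain kill (SIGTERM) is the cleaner variant noted above.
sig = signal.SIGKILL if "-9" in sys.argv else signal.SIGTERM
os.kill(pid, sig)
print "sent signal %d to postmaster %d" % (sig, pid)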

4. Check the output of the 2 test scripts
Result of failover_test.py: from the perspective of the session performing the updates, no data was lost after the failover.
[postgres@node69 pgsql_ha]$ ./failover_test.py
Update process had started,please kill the master...
1447049434.72 current num= 1000
1447049437.25 current num= 2000
1447049439.38 current num= 3000
1447049441.58 current num= 4000
1447049443.71 current num= 5000
1447049445.76 current num= 6000
1447049447.7 current num= 7000
1447049449.69 current num= 8000
1447049451.42 current num= 9000
1447049453.41 current num= 10000
1447049455.31 current num= 11000
1447049457.32 current num= 12000
1447049459.04 current num= 13000
1447049461.02 current num= 14000
1447049463.05 current num= 15000
1447049464.82 current num= 16000
1447049466.71 current num= 17000
1447049469.49 current num= 18000
1447049471.5 current num= 19000
1447049473.68 current num= 20000
1447049475.87 current num= 21000
1447049478.19 current num= 22000
1447049480.2 current num= 23000
1447049482.18 current num= 24000
1447049484.01 current num= 25000
1447049485.23 The master has down, last num: 25529
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
1447049504.15 connect success after 18.918158 second
current num: 25529
OK

The result of failover_test2.py is NG. That is, from the perspective of the bystander session, an update it had already seen disappeared; in other words, data loss occurred.
[postgres@node69 pgsql_ha]$ ./failover_test2.py
Read process had started...
1447049449.34 current num= 7828
1447049453.46 current num= 10040
1447049457.62 current num= 12162
1447049461.85 current num= 14456
1447049465.87 current num= 16492
1447049469.8 current num= 18165
1447049473.84 current num= 20056
1447049477.83 current num= 21867
1447049481.9 current num= 23860
1447049485.89 current num= 25530
1447049501.8 The master has down, last num= 25530
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
1447049501.81 connect success after 0.006506 second
1447049501.81 current num: 25529
NG: Data Lost!

How serious is the problem?

How much harm this causes depends on the situation.
If the session that saw the lost update merely looked at it, there is no problem at all. If it used the value it saw to update other tables, that is also fine, because that update is bound to fail as well: its WAL records will not be passed to the slave (WAL is shipped in order, so a record written after the lost commit cannot have reached the slave either). But if the session cached the value or passed it on to another data system, data inconsistency can result. (A minimal sketch of the risky pattern follows.)
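
As a concrete illustration of the risky case, the fragment below is a hypothetical sketch: the pgsql_ha_test_log table and the dict standing in for an external cache are made up for illustration; only the pattern matters.

#!/usr/bin/python
import psycopg2

url = "host=192.168.1.246 dbname=postgres user=postgres"
external_cache = {}   # stands in for memcached, another database, a message queue, ...

conn = psycopg2.connect(url)
conn.autocommit = True
cur = conn.cursor()

cur.execute("select num from pgsql_ha_test where id=1;")
num = cur.fetchone()[0]

# Relatively safe: a dependent write in the same database sits after the observed
# transaction in the WAL stream, so it cannot survive a failover that loses it.
cur.execute("create table if not exists pgsql_ha_test_log(num integer);")
cur.execute("insert into pgsql_ha_test_log values(%s);", (num,))

# Risky: the cached copy lives outside the database and survives the failover
# even if the value it was derived from is gone on the promoted standby.
external_cache["num"] = num

cur.close()
conn.close()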
Why does PostgreSQL implement it this way? Perhaps to release locks as early as possible, perhaps for some other reason; I plan to ask on the community mailing list.
Also, the probability of hitting this problem is fairly low. It shows up more often in testing when the processes are killed with kill -9 (possibly related to the leftover postgres processes); a real database crash does not trigger it nearly as often.
