postgres-xl分布式数据库测试1 postgres-xl分布式数据库测试1

分布式架构 Postgresql-xl的官方主页在http://www.postgres-xl.org/。
Features:
Fully ACID
Open Source
Cluster-wide Consistency
Multi-tenant Security
PostgreSQL-based
Workloads:
OLAP with MPP Parallelism
Online Transaction Processing
Mixed
Operational Data Store
Key-value including JSON
Postgres-XL is an all-purpose fully ACID open source scale-out SQL database solution.

文章图片
image.png 测试环境 4台以下配置的虚拟机

CPU：4核
内存：8GB
硬盘：200GB
系统：CentOS7.6 minimal

主机规划

server0 192.168.80.100 gtm
server1 192.168.80.101 gtm-proxy1,coordinator1,datanode1
server2 192.168.80.102 gtm-proxy2,coordinator2,datanode2
server3 192.168.80.103 gtm-proxy3,coordinator3,datanode3

--测试对比机器

安装postgresql 10.5 官方版本
server99 192.168.80.99 postgresql

查看PGXL的节点信息

postgres=# select * from pgxc_node; node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |node_id -----------+-----------+-----------+-----------+----------------+------------------+------------- coord1| C|20004 | server1| f| f|1885696643 coord2| C|20005 | server2| f| f| -1197102633 coord3| C|20004 | server3| f| f|1638403545 datanode1 | D|20008 | server1| t| t|888802358 datanode2 | D|20009 | server2| f| f|-905831925 datanode3 | D|20008 | server3| f| f| -1894792127 (6 rows)

测试 1、创建postgresql的测试数据

postgres=# DROP TABLE IF EXISTS t_gender; postgres=# DROP TABLE IF EXISTS t_class; postgres=# DROP TABLE IF EXISTS t_course; postgres=# DROP TABLE IF EXISTS t_student; postgres=# DROP TABLE IF EXISTS t_student_course; postgres=# DROP TABLE IF EXISTS t_grades; # 创建表 postgres=# CREATE TABLE t_gender(gender_id int,gender_name varchar(20) not null) ; CREATE TABLE postgres=# CREATE TABLE t_class(class_id int,class_name varchar(20) not null,Headmaster varchar(20)) ; CREATE TABLE postgres=# CREATE TABLE t_course(course_id int primary key,course_name varchar(20) not null,sessions int not null,credit numeric(5,2)) ; CREATE TABLE postgres=# CREATE TABLE t_student(student_id int primary key,name varchar(20) not null,gender varchar(10),age int,class_id int, comment varchar(2000)) ; CREATE TABLE postgres=# CREATE TABLE t_student_course(id serial8,student_id int,course_id int) ; CREATE TABLE postgres=# CREATE TABLE t_grades(grade_id int,record numeric(5,2)) ; CREATE TABLE# 初始化数据 postgres=# INSERT INTO t_gender values(1,'男'),(2,'女'); INSERT 0 2 postgres=# INSERT INTO t_class(class_id,class_name,Headmaster) select n,'班级_'||n,'Headmaster_'||n from generate_series(1,500) n; INSERT 0 500 postgres=# INSERT INTO t_course(course_id,course_name,sessions,credit) select n,'课程_'||n,floor(random()*11+10),(case when random()<0.5 then 100 else 150 end ) from generate_series(1,500) n; INSERT 0 500 postgres=# INSERT INTO t_student(student_id,name,gender,age,class_id) select n,'学生_'||n,(case when random()<0.5 then 1 else 2 end ),floor(random()*4+10),floor(random()*500)+1 from generate_series(1,50000) n; INSERT 0 50000 postgres=# INSERT INTO t_student_course(student_id,course_id) select n/10,floor(random()*500)+1 from generate_series(1,500000) n; INSERT 0 500000 postgres=# INSERT INTO t_grades(grade_id,record) select sc.id,random()*c.credit from t_student_course sc,t_course c where c.course_id=sc.course_id; INSERT 0 500000查看统计信息 postgres=# select count(*) from t_gender; count ------- 2 (1 row)postgres=# select count(*) from t_class; count ------- 500 (1 row)postgres=# select count(*) from t_course; count ------- 500 (1 row)postgres=# select count(*) from t_student; count ------- 50000 (1 row)postgres=# select count(*) from t_student_course; select count(*) from t_grades; count -------- 500000 (1 row)postgres=# select count(*) from t_grades; count -------- 500000 (1 row)

导出测试数据

-bash-4.2$ pg_dump -a > /data/postgres.bak

创建pgxl测试表

DROP TABLE IF EXISTS t_gender; DROP TABLE IF EXISTS t_class; DROP TABLE IF EXISTS t_course; DROP TABLE IF EXISTS t_student; DROP TABLE IF EXISTS t_student_course; DROP TABLE IF EXISTS t_grades; CREATE TABLE t_gender(gender_id int,gender_name varchar(20) not null) DISTRIBUTE BY REPLICATION; COMMENT ON TABLE t_gender IS '性别表'; COMMENT ON COLUMN t_gender.gender_id IS '性别id'; COMMENT ON COLUMN t_gender.gender_name IS '性别'; CREATE TABLE t_class(class_id int,class_name varchar(20) not null,Headmaster varchar(20)) DISTRIBUTE BY REPLICATION; COMMENT ON TABLE t_class IS '班级表'; COMMENT ON COLUMN t_class.class_id IS '班级id'; COMMENT ON COLUMN t_class.class_name IS '班级名'; COMMENT ON COLUMN t_class.Headmaster IS '班主任'; CREATE TABLE t_course(course_id int primary key,course_name varchar(20) not null,sessions int not null,credit numeric(5,2)) DISTRIBUTE BY REPLICATION; COMMENT ON TABLE t_course IS '课程表'; COMMENT ON COLUMN t_course.course_id IS '课程id'; COMMENT ON COLUMN t_course.course_name IS '课程名'; COMMENT ON COLUMN t_course.sessions IS '课时'; COMMENT ON COLUMN t_course.credit IS '学分'; CREATE TABLE t_student(student_id int primary key,name varchar(20) not null,gender varchar(10),age int,class_id int, comment varchar(2000)) DISTRIBUTE BY MODULO(student_id); COMMENT ON TABLE t_student IS '学生信息表'; COMMENT ON COLUMN t_student.student_id IS '学生id'; COMMENT ON COLUMN t_student.name IS '学生姓名'; COMMENT ON COLUMN t_student.gender IS '性别'; COMMENT ON COLUMN t_student.age IS '年龄'; COMMENT ON COLUMN t_student.class_id IS '学生所在班级'; COMMENT ON COLUMN t_student.comment IS '备注'; CREATE TABLE t_student_course(id serial8,student_id int,course_id int) DISTRIBUTE BY HASH(id); COMMENT ON TABLE t_student IS '学生选修课程表'; COMMENT ON COLUMN t_student_course.student_id IS '学生id'; COMMENT ON COLUMN t_student_course.course_id IS '课程id'; CREATE TABLE t_grades(grade_id int,record numeric(5,2)) DISTRIBUTE BY ROUNDROBIN; COMMENT ON TABLE t_student IS '学生课程成绩表'; COMMENT ON COLUMN t_grades.record IS '成绩';

导入测试数据

[postgres@xCloud ~]$ psql -h server1 -p 20004 -d postgres < /data/postgres.bak SET SET SET SET SET set_config ------------ (1 row)SET SET SET COPY 500 COPY 500 COPY 2 COPY 500000 COPY 50000 COPY 500000 setval -------- 500000 (1 row)

查看表数据的存储分布

postgres=# SELECT xc_node_id, count(*) FROM t_gender GROUP BY xc_node_id; xc_node_id | count ------------+------- 888802358 |2 (1 row)postgres=# SELECT xc_node_id, count(*) FROM t_class GROUP BY xc_node_id; xc_node_id | count ------------+------- 888802358 |500 (1 row)postgres=# SELECT xc_node_id, count(*) FROM t_course GROUP BY xc_node_id; xc_node_id | count ------------+------- 888802358 |500 (1 row)postgres=# SELECT xc_node_id, count(*) FROM t_student GROUP BY xc_node_id; SELECT xc_node_id, count(*) FROM t_student_course GROUP BY xc_node_id; SELECT xc_node_id, count(*) FROM t_grades GROUP BY xc_node_id; xc_node_id| count -------------+------- -1894792127 | 16667 -905831925 | 16667 888802358 | 16666 (3 rows)postgres=# SELECT xc_node_id, count(*) FROM t_student_course GROUP BY xc_node_id; xc_node_id| count -------------+-------- -1894792127 | 166613 -905831925 | 167552 888802358 | 165835 (3 rows)postgres=# SELECT xc_node_id, count(*) FROM t_grades GROUP BY xc_node_id; xc_node_id| count -------------+-------- -1894792127 | 166666 -905831925 | 166667 888802358 | 166667 (3 rows)postgres=# execute direct on (datanode1) 'select count(*) from t_student_course'; count -------- 165835 (1 row)postgres=# execute direct on (datanode2) 'select count(*) from t_student_course'; execute direct on (datanode3) 'select count(*) from t_student_course'; count -------- 167552 (1 row)postgres=# execute direct on (datanode3) 'select count(*) from t_student_course'; count -------- 166613 (1 row)

分布式 join 测试

MODULO分片表 join 复制表

postgres=# select s.*,c.class_name from t_student s,t_class c where s.class_id=c.class_id; student_id |name| gender | age | class_id | comment | class_name ------------+------------+--------+-----+----------+---------+------------ 3 | 学生_3| 2|12 |421 || 班级_421 6 | 学生_6| 2|11 |460 || 班级_460 9 | 学生_9| 2|12 |93 || 班级_93 12 | 学生_12| 1|13 |157 || 班级_157 15 | 学生_15| 2|10 |372 || 班级_372 18 | 学生_18| 1|13 |362 || 班级_362 21 | 学生_21| 1|13 |216 || 班级_216 24 | 学生_24| 1|12 |154 || 班级_154 27 | 学生_27| 2|13 |191 || 班级_191 30 | 学生_30| 2|11 |164 || 班级_164 33 | 学生_33| 2|13 |308 || 班级_308 36 | 学生_36| 1|13 |473 || 班级_473 39 | 学生_39| 1|11 |159 || 班级_159 42 | 学生_42| 2|12 |289 || 班级_289 45 | 学生_45| 2|13 |345 || 班级_345 48 | 学生_48| 2|10 |110 || 班级_110 51 | 学生_51| 2|11 |85 || 班级_85 54 | 学生_54| 1|13 |45 || 班级_45 57 | 学生_57| 2|10 |342 || 班级_342 60 | 学生_60| 1|11 |498 || 班级_498 63 | 学生_63| 2|10 |232 || 班级_232 66 | 学生_66| 2|10 |431 || 班级_431 69 | 学生_69| 1|13 |443 || 班级_443 72 | 学生_72| 2|11 |469 || 班级_469 75 | 学生_75| 2|13 |431 || 班级_431 78 | 学生_78| 1|12 |29 || 班级_29 81 | 学生_81| 1|12 |485 || 班级_485 84 | 学生_84| 2|12 |351 || 班级_351 --More--postgres=# select count(*) from t_student s,t_class c where s.class_id=c.class_id; count ------- 50000 (1 row)postgres=# execute direct on (datanode1) 'select count(*) from t_student s,t_class c where s.class_id=c.class_id'; count ------- 16666 (1 row)postgres=# execute direct on (datanode2) 'select count(*) from t_student s,t_class c where s.class_id=c.class_id'; execute direct on (datanode3) 'select count(*) from t_student s,t_class c where s.class_id=c.class_id'; count ------- 16667 (1 row)postgres=# execute direct on (datanode3) 'select count(*) from t_student s,t_class c where s.class_id=c.class_id'; count ------- 16667 (1 row)

MODULO 分片表 join HASH 分片表

postgres=# select s.xc_node_id s_xc_node_id,sc.xc_node_id sc_xc_node_id,s.*,sc.course_id from t_student s,t_student_course sc where s.student_id=sc.student_id; s_xc_node_id | sc_xc_node_id | student_id |name| gender | age | class_id | comment | course_id --------------+---------------+------------+------------+--------+-----+----------+---------+----------- 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||396 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||414 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||486 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||12 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||435 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||461 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||39 888802358 |888802358 |3 | 学生_3| 2|12 |421 ||62 888802358 |888802358 |6 | 学生_6| 2|11 |460 ||437 888802358 |888802358 |6 | 学生_6| 2|11 |460 ||423 888802358 |888802358 |6 | 学生_6| 2|11 |460 ||383 888802358 |888802358 |6 | 学生_6| 2|11 |460 ||196 888802358 |888802358 |9 | 学生_9| 2|12 |93 ||334 888802358 |888802358 |9 | 学生_9| 2|12 |93 ||256 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||92 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||222 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||425 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||199 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||408 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||181 888802358 |888802358 |12 | 学生_12| 1|13 |157 ||245 888802358 |888802358 |15 | 学生_15| 2|10 |372 ||404 888802358 |888802358 |15 | 学生_15| 2|10 |372 ||9 888802358 |888802358 |18 | 学生_18| 1|13 |362 ||386 888802358 |888802358 |18 | 学生_18| 1|13 |362 ||120 888802358 |888802358 |21 | 学生_21| 1|13 |216 ||183 888802358 |888802358 |24 | 学生_24| 1|12 |154 ||21 888802358 |888802358 |24 | 学生_24| 1|12 |154 ||97 --More--postgres=# select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id; count -------- 499991 (1 row)

这里不足50W了，用postgresql 10.5检查数据，也是这个结果，检查数据发现造的数据有以下9条有问题的student_id设置错误，pgxl的执行结果没有错误。

postgres=# select sc.* from t_student_course sc left join t_student s on s.student_id=sc.student_id where s.student_id is null; id | student_id | course_id ----+------------+----------- 3 |0 |79 5 |0 |476 2 |0 |90 6 |0 |27 7 |0 |127 1 |0 |443 4 |0 |22 8 |0 |311 9 |0 |484 (9 rows)

统计下面查询语句造成的同库join的记录数
select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id

# datanode1 的同库join的记录数 postgres=# execute direct on (datanode1) 'select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id '; count ------- 55166 (1 row)# datanode2 的同库join的记录数 postgres=# execute direct on (datanode2) 'select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id '; count ------- 55599 (1 row)# datanode3 的同库join的记录数 postgres=# execute direct on (datanode3) 'select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id '; count ------- 55725 (1 row)postgres=# select 55166 + 55599 +55725; ?column? ---------- 166490 (1 row)

统计下面查询语句造成的跨库join的记录数
select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id

postgres=# select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id and s.xc_node_id != sc.xc_node_id; count -------- 333501 (1 row)postgres=# select 166490 + 333501; ?column? ---------- 499991 (1 row)

分析跨分片执行耗时
select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id;

#在postgresql中执行分析 postgres=# EXPLAIN ANALYZE select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id; QUERY PLAN----------------------------------------------------------------------------------------------------------------------------- ------------------------------- Finalize Aggregate(cost=9622.08..9622.09 rows=1 width=8) (actual time=136.351..136.351 rows=1 loops=1) ->Gather(cost=9621.96..9622.07 rows=1 width=8) (actual time=136.048..138.046 rows=2 loops=1) Workers Planned: 1 Workers Launched: 1 ->Partial Aggregate(cost=8621.96..8621.97 rows=1 width=8) (actual time=130.480..130.480 rows=1 loops=2) ->Hash Join(cost=1493.00..7909.30 rows=285064 width=0) (actual time=24.067..111.177 rows=249996 loops=2) Hash Cond: (sc.student_id = s.student_id) ->Parallel Seq Scan on t_student_course sc(cost=0.00..5644.18 rows=294118 width=4) (actual time=0.02 4..30.479 rows=250000 loops=2) ->Hash(cost=868.00..868.00 rows=50000 width=4) (actual time=23.554..23.555 rows=50000 loops=2) Buckets: 65536Batches: 1Memory Usage: 2270kB ->Seq Scan on t_student s(cost=0.00..868.00 rows=50000 width=4) (actual time=0.032..11.347 row s=50000 loops=2) Planning time: 0.462 ms Execution time: 138.164 ms (13 rows)#在pgxl中执行分析 postgres=# EXPLAIN ANALYZE select count(*) from t_student s,t_student_course sc where s.student_id=sc.student_id; QUERY PLAN----------------------------------------------------------------------------------------------------------------------------- -------------------------- Finalize Aggregate(cost=148.95..148.96 rows=1 width=8) (actual time=194.225..194.225 rows=1 loops=1) ->Remote Subquery Scan on all (datanode1,datanode2,datanode3)(cost=148.93..148.95 rows=1 width=8) (actual time=194.192 ..194.206 rows=3 loops=1) Planning time: 0.211 ms Execution time: 198.193 ms (4 rows)

检查跨库join的结果是否一致
方法：用聚合函数，从多个维度检查班级id为(31,92,163,194,235,296,337,368,439,490)的班级总分、平均分、最高分、最低分...。对比postgresql和pgxl的结果是否一致。

统计各班级人数

#在postgresql中执行查询 postgres=# select class_id,count(*) from t_student s where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id | count ----------+------- 31 |95 92 |113 163 |111 194 |113 235 |111 296 |101 337 |111 368 |102 439 |99 490 |97 (10 rows)#在pgxl中执行查询 postgres=# select class_id,count(*) from t_student s where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id | count ----------+------- 31 |95 92 |113 163 |111 194 |113 235 |111 296 |101 337 |111 368 |102 439 |99 490 |97 (10 rows)

统计各班级选修课程人次

#在postgresql中执行查询 postgres=# select class_id,count(*) from t_student s left join t_student_course sc on sc.student_id=s.student_id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id | count ----------+------- 31 |950 92 |1130 163 |1110 194 |1130 235 |1110 296 |1010 337 |1110 368 |1020 439 |990 490 |970 (10 rows)#在pgxl中执行查询 postgres=# select class_id,count(*) from t_student s left join t_student_course sc on sc.student_id=s.student_id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id | count ----------+------- 31 |950 92 |1130 163 |1110 194 |1130 235 |1110 296 |1010 337 |1110 368 |1020 439 |990 490 |970 (10 rows)

统计各班级选修课程总分、最高分、最低分、平均分

#在postgresql中执行查询 postgres=# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows)#在pgxl中执行查询 postgres=# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows)

分析以上查询的耗时

#在postgresql中执行分析 postgres=# EXPLAIN ANALYZEselect class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; QUERY PLAN----------------------------------------------------------------------------------------------------------------------------- ----------------------------- Sort(cost=20469.61..20470.70 rows=436 width=136) (actual time=222.341..222.341 rows=10 loops=1) Sort Key: s.class_id Sort Method: quicksortMemory: 25kB ->HashAggregate(cost=20443.96..20450.50 rows=436 width=136) (actual time=222.318..222.326 rows=10 loops=1) Group Key: s.class_id ->Hash Right Join(cost=10644.40..20320.87 rows=9847 width=10) (actual time=141.592..218.740 rows=10530 loops=1) Hash Cond: (g.grade_id = sc.id) ->Seq Scan on t_grades g(cost=0.00..7703.00 rows=500000 width=10) (actual time=0.009..32.495 rows=500000 l oops=1) ->Hash(cost=10521.31..10521.31 rows=9847 width=12) (actual time=141.383..141.383 rows=10530 loops=1) Buckets: 16384Batches: 1Memory Usage: 622kB ->Hash Right Join(cost=1505.70..10521.31 rows=9847 width=12) (actual time=22.026..139.204 rows=10530 loops=1) Hash Cond: (sc.student_id = s.student_id) ->Seq Scan on t_student_course sc(cost=0.00..7703.00 rows=500000 width=12) (actual time=0.016. .53.253 rows=500000 loops=1) ->Hash(cost=1493.00..1493.00 rows=1016 width=8) (actual time=21.722..21.722 rows=1053 loops=1) Buckets: 2048 (originally 1024)Batches: 1 (originally 1)Memory Usage: 58kB ->Seq Scan on t_student s(cost=0.00..1493.00 rows=1016 width=8) (actual time=0.062..21.3 27 rows=1053 loops=1) Filter: (class_id = ANY ('{31,92,163,194,235,296,337,368,439,490}'::integer[])) Rows Removed by Filter: 48947 Planning time: 0.707 ms Execution time: 222.461 ms (20 rows)#在pgxl中执行分析 postgres=# EXPLAIN ANALYZEselect class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; QUERY PLAN----------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------- Sort(cost=120.83..120.86 rows=10 width=136) (actual time=587.154..587.155 rows=10 loops=1) Sort Key: s.class_id Sort Method: quicksortMemory: 25kB ->HashAggregate(cost=120.52..120.67 rows=10 width=136) (actual time=587.134..587.142 rows=10 loops=1) Group Key: s.class_id ->Hash Right Join(cost=47.43..109.88 rows=851 width=16) (actual time=228.328..582.249 rows=10530 loops=1) Hash Cond: (g.grade_id = sc.id) ->Remote Subquery Scan on all (datanode1,datanode2,datanode3)(cost=100.00..167.35 rows=1850 width=16) (act ual time=0.330..114.087 rows=500000 loops=1) ->Hash(cost=147.85..147.85 rows=92 width=12) (actual time=227.794..227.794 rows=10530 loops=1) Buckets: 16384 (originally 1024)Batches: 1 (originally 1)Memory Usage: 622kB ->Remote Subquery Scan on all (datanode1,datanode2,datanode3)(cost=112.83..147.85 rows=92 width=12) (actual time=30.138..224.211 rows=10530 loops=1) Planning time: 0.335 ms Execution time: 591.233 ms (13 rows)

小结：在pgxl做分布式跨库join 查询，统计分析，postgresql库的一致。
PS：目前5台虚拟机配置相同，但是放在同一个服务器上，虚拟机之间会有CPU、I/O影响，性能对比只能作为稍微参考，不能认真。O(∩_∩)O哈哈~
分布式事务测试

事务回滚

测试脚本如下：

/* 事务开始 */ begin; /* 更新前的数据 */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; /* 更新操作 */ update t_student set class_id=class_id+1; update t_grades set record=record+5 where grade_id between 20000 and 50000; /* 更新后的数据（未提交） */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; /* 事务回滚 */ rollback; /* 回滚后的数据 */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id;

在pgxl中的执行过程

postgres=# /* 事务开始 */ postgres-# begin; BEGIN postgres=# postgres=# /* 更新前的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows)postgres=# postgres=# /* 更新操作 */ postgres-# update t_student set class_id=class_id+1; UPDATE 50000 postgres=# update t_grades set record=record+5 where grade_id between 20000 and 50000; UPDATE 30001 postgres=# postgres=# /* 更新后的数据（未提交） */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 58952.04 | 154.56 | 0.50 | 64.7824615384615385 92 | 68456.71 | 153.52 | 0.16 | 63.9782336448598131 163 | 67983.32 | 149.90 | 0.20 | 62.9475185185185185 194 | 62714.90 | 151.45 | 0.00 | 64.6545360824742268 235 | 73089.31 | 149.80 | 0.06 | 64.6808053097345133 296 | 63664.81 | 151.19 | 0.21 | 65.6338247422680412 337 | 59510.79 | 149.64 | 0.21 | 62.6429368421052632 368 | 54241.03 | 149.83 | 0.18 | 63.0709651162790698 439 | 58996.90 | 152.02 | 0.08 | 60.2009183673469388 490 | 73640.25 | 154.75 | 0.17 | 62.9403846153846154 (10 rows)postgres=# postgres=# /* 事务回滚 */ postgres-# rollback; ROLLBACK postgres=# postgres=# /* 回滚后的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows)

对比【更新操作回滚后的数据】与【更新前的数据】，数据保持一致，事务回滚操作成功。

事务提交

测试脚本如下：

/* 事务开始 */ begin; /* 更新前的数据 */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; /* 更新操作 */ update t_student set class_id=class_id+1; update t_grades set record=record+5 where grade_id between 20000 and 50000; /* 更新后的数据（未提交） */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; /* 事务提交 */ commit; /* 提交后的数据 */ select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id;

在pgxl中的执行过程

/* 事务开始 */ postgres-# begin; BEGIN/* 更新前的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows)/* 更新操作 */ postgres-# update t_student set class_id=class_id+1; UPDATE 50000 postgres=# update t_grades set record=record+5 where grade_id between 20000 and 50000; UPDATE 30001 /* 更新后的数据（未提交） */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 58952.04 | 154.56 | 0.50 | 64.7824615384615385 92 | 68456.71 | 153.52 | 0.16 | 63.9782336448598131 163 | 67983.32 | 149.90 | 0.20 | 62.9475185185185185 194 | 62714.90 | 151.45 | 0.00 | 64.6545360824742268 235 | 73089.31 | 149.80 | 0.06 | 64.6808053097345133 296 | 63664.81 | 151.19 | 0.21 | 65.6338247422680412 337 | 59510.79 | 149.64 | 0.21 | 62.6429368421052632 368 | 54241.03 | 149.83 | 0.18 | 63.0709651162790698 439 | 58996.90 | 152.02 | 0.08 | 60.2009183673469388 490 | 73640.25 | 154.75 | 0.17 | 62.9403846153846154 (10 rows)/* 事务提交 */ postgres-# commit; COMMIT/* 提交后的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 58952.04 | 154.56 | 0.50 | 64.7824615384615385 92 | 68456.71 | 153.52 | 0.16 | 63.9782336448598131 163 | 67983.32 | 149.90 | 0.20 | 62.9475185185185185 194 | 62714.90 | 151.45 | 0.00 | 64.6545360824742268 235 | 73089.31 | 149.80 | 0.06 | 64.6808053097345133 296 | 63664.81 | 151.19 | 0.21 | 65.6338247422680412 337 | 59510.79 | 149.64 | 0.21 | 62.6429368421052632 368 | 54241.03 | 149.83 | 0.18 | 63.0709651162790698 439 | 58996.90 | 152.02 | 0.08 | 60.2009183673469388 490 | 73640.25 | 154.75 | 0.17 | 62.9403846153846154 (10 rows)

在数据相同的postgresql库中执行过程

/* 事务开始 */ postgres-# begin; BEGIN /* 更新前的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 60040.40 | 149.58 | 0.02 | 63.2004210526315789 92 | 72492.28 | 149.74 | 0.28 | 64.1524601769911504 163 | 69587.54 | 149.64 | 0.00 | 62.6914774774774775 194 | 72458.70 | 149.93 | 0.03 | 64.1227433628318584 235 | 70926.73 | 149.94 | 0.05 | 63.8979549549549550 296 | 62764.50 | 149.76 | 0.02 | 62.1430693069306931 337 | 68505.75 | 149.41 | 0.24 | 61.7168918918918919 368 | 65079.73 | 149.85 | 0.24 | 63.8036568627450980 439 | 61362.66 | 149.76 | 0.04 | 61.9824848484848485 490 | 60821.36 | 149.91 | 0.24 | 62.7024329896907216 (10 rows) /* 更新操作 */ postgres-# update t_student set class_id=class_id+1; UPDATE 50000 postgres=# update t_grades set record=record+5 where grade_id between 20000 and 50000; UPDATE 30001 /* 更新后的数据（未提交） */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 58952.04 | 154.56 | 0.50 | 64.7824615384615385 92 | 68456.71 | 153.52 | 0.16 | 63.9782336448598131 163 | 67983.32 | 149.90 | 0.20 | 62.9475185185185185 194 | 62714.90 | 151.45 | 0.00 | 64.6545360824742268 235 | 73089.31 | 149.80 | 0.06 | 64.6808053097345133 296 | 63664.81 | 151.19 | 0.21 | 65.6338247422680412 337 | 59510.79 | 149.64 | 0.21 | 62.6429368421052632 368 | 54241.03 | 149.83 | 0.18 | 63.0709651162790698 439 | 58996.90 | 152.02 | 0.08 | 60.2009183673469388 490 | 73640.25 | 154.75 | 0.17 | 62.9403846153846154 (10 rows) /* 事务提交 */ postgres-# commit; COMMIT/* 提交后的数据 */ postgres-# select class_id,sum(record),max(record),min(record),avg(record) from t_student s left join t_student_course sc on sc.student_id=s.student_id left join t_grades g on g.grade_id=sc.id where class_id in (31,92,163,194,235,296,337,368,439,490) group by s.class_id order by s.class_id; class_id |sum|max| min|avg ----------+----------+--------+------+--------------------- 31 | 58952.04 | 154.56 | 0.50 | 64.7824615384615385 92 | 68456.71 | 153.52 | 0.16 | 63.9782336448598131 163 | 67983.32 | 149.90 | 0.20 | 62.9475185185185185 194 | 62714.90 | 151.45 | 0.00 | 64.6545360824742268 235 | 73089.31 | 149.80 | 0.06 | 64.6808053097345133 296 | 63664.81 | 151.19 | 0.21 | 65.6338247422680412 337 | 59510.79 | 149.64 | 0.21 | 62.6429368421052632 368 | 54241.03 | 149.83 | 0.18 | 63.0709651162790698 439 | 58996.90 | 152.02 | 0.08 | 60.2009183673469388 490 | 73640.25 | 154.75 | 0.17 | 62.9403846153846154 (10 rows)

对比 pgxl 与 postgresql的执行过程中输出的数据一致，分布式事务提交操作成功。

事务间更新冲突
操作过程：

文章图片
image.png

执行脚本如下：

begin; select * from t_student where student_id=20000; update t_student set class_id=class_id+1 where student_id=20000; select * from t_student where student_id=20000; commit; select * from t_student where student_id=20000;

postgresql执行以上过程中，当记录1 被 sessionB 更新但未提交时，此时 sessionB 为记录1 加上了write lock，sessionA可以读取记录A的更新前状态，如果执行修改操作，则会等待 sessionB 提交或者回滚后，才会继续操作。
下面看看pgxl的执行情况：
sessionA 连接 coordinator1
sessionB 连接 coordinator3

psql -h server1 -p 20004 psql -h server3 -p 20004

postgres(A)=# begin; BEGINpostgres(B)=# begin; BEGINpostgres(A)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |433 | (1 row)postgres(B)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |433 | (1 row)postgres(B)=# update t_student set class_id=class_id+1 where student_id=20000; UPDATE 1postgres(A)=# update t_student set class_id=class_id+1 where student_id=20000; /* 此时，记录已经被sessionB锁住，这里会卡住，无法继续 */postgres(B)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |434 | (1 row)postgres(B)=# commit; COMMIT/* 此时，sessionB提交事务后，记录锁被释放，sessionA继续更新操作 */ UPDATE 1postgres(A)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |435 | (1 row)postgres(A)=# commit; COMMITpostgres(A)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |435 | (1 row)postgres(B)=# select * from t_student where student_id=20000; student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20000 | 学生_20000 | 2|12 |435 | (1 row)

再来一种交叉等待的情况，看看会发生什么。。。
案例：

文章图片
image.png

sessionA执行脚本

/* 事务开始 */ begin; select * from t_student where student_id in (20001,20002); update t_student set class_id=class_id+1 where student_id=20001; update t_student set class_id=class_id+1 where student_id=20002; select * from t_student where student_id in (20001,20002); commit; select * from t_student where student_id in (20001,20002);

sessionB执行脚本

/* 事务开始 */ begin; select * from t_student where student_id in (20002,20001); update t_student set class_id=class_id+1 where student_id=20002; update t_student set class_id=class_id+1 where student_id=20001; select * from t_student where student_id in (20002,20001); commit; select * from t_student where student_id in (20002,20001);

执行过程：

postgres(A)=# begin; BEGINpostgres(B)=# begin; BEGINpostgres(A)=# select * from t_student where student_id in (20001,20002); student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20001 | 学生_20001 | 2|11 |187 | 20002 | 学生_20002 | 2|10 |490 | (2 rows)postgres(B)=# select * from t_student where student_id in (20002,20001); student_id |name| gender | age | class_id | comment ------------+------------+--------+-----+----------+--------- 20001 | 学生_20001 | 2|11 |187 | 20002 | 学生_20002 | 2|10 |490 | (2 rows)postgres(B)=# update t_student set class_id=class_id+1 where student_id=20002; UPDATE 1postgres(A)=# update t_student set class_id=class_id+1 where student_id=20001; UPDATE 1postgres(B)=# update t_student set class_id=class_id+1 where student_id=20001; /* 记录1已经被 sessionA 锁住，这里等待 sessionA释放锁 */postgres(A)=# update t_student set class_id=class_id+1 where student_id=20002; /* 记录2已经被 sessionB 锁住，这里等待 sessionB释放锁 */

【postgres-xl分布式数据库测试1】(⊙o⊙)…，两个session死锁了，没法玩了，，，，
只能杀死进程解锁了

# 连接 coordinator3 [postgres@xCloud ~]$ psql -h server3 -p 20004 psql (PGXL 10r1, based on PG 10.5 (Postgres-XL 10r1)) Type "help" for help.postgres=# SELECT * FROM pg_stat_activity WHERE datname='postgres'; datid | datname|pid| usesysid | usename| application_name |client_addr| client_hostname | client_port |backend_start|xact_start|query_start|state_change| wait_event _type | wait_event | state| backend_xid | backend_xmin |query|backend_type -------+----------+-------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------- ------+------------+--------+-------------+--------------+------------------------------------------------------------------+---------------- 13816 | postgres | 14382 |10 | postgres | psql| 192.168.80.100 ||50654 | 2018-12-05 16:28:30.336241+08 | 2018-12-05 16:49:40.099149+08 | 2018-12-05 16:50:38.460588+08 | 2018-12-05 16:52:39.134195+08 | || active |512345 |512345 | update t_student set class_id=class_id+1 where student_id=20001; | client backend 13816 | postgres | 14456 |10 | postgres | psql| 192.168.80.100 ||50656 | 2018-12-05 17:00:18.290574+08 | 2018-12-05 17:14:09.960765+08 | 2018-12-05 17:14:09.960765+08 | 2018-12-05 17:14:09.96077+08| || active ||512345 | SELECT * FROM pg_stat_activity WHERE datname='postgres'; | client backend (2 rows)

(⊙o⊙)…，只有coordinator3上面的连接才会在这里显示，看来，pgxl没有做pg_stat_activity的分布式查询。O(∩_∩)O哈哈~

# 连接 coordinator1 [postgres@xCloud ~]$ psql -h server1 -p 20004 psql (PGXL 10r1, based on PG 10.5 (Postgres-XL 10r1)) Type "help" for help.postgres=# SELECT * FROM pg_stat_activity WHERE datname='postgres'; datid | datname|pid| usesysid | usename| application_name |client_addr| client_hostname | client_port |backend_start|xact_start|query_start|state_change| wait_event _type | wait_event | state| backend_xid | backend_xmin |query|backend_type -------+----------+-------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------- ------+------------+--------+-------------+--------------+------------------------------------------------------------------+---------------- 13816 | postgres | 14476 |10 | postgres | psql| 192.168.80.100 ||48434 | 2018-12-05 16:28:25.056003+08 | 2018-12-05 16:49:36.343582+08 | 2018-12-05 16:50:20.411759+08 | 2018-12-05 16:52:48.564985+08 | || active |512346 |512345 | update t_student set class_id=class_id+1 where student_id=20002; | client backend 13816 | postgres | 14576 |10 | postgres | psql| 192.168.80.100 ||48440 | 2018-12-05 17:03:56.34494+08| 2018-12-05 17:13:49.328275+08 | 2018-12-05 17:13:49.328275+08 | 2018-12-05 17:13:49.32828+08| || active ||512345 | SELECT * FROM pg_stat_activity WHERE datname='postgres'; | client backend (2 rows)

这两个方法的区别在于：

pg_cancel_backend 只是取消当前某一个进程的查询操作，但不能释放数据库连接
pg_terminate_backend 可以在pg的后台杀死这个进程，从而释放出宝贵的连接资源

试试用coordinator3 杀 coordinator1 中的sessionA（pid：14476）

postgres=# select pg_cancel_backend(14476 ); WARNING:PID 14476 is not a PostgreSQL server process pg_cancel_backend ------------------- f (1 row)postgres=# select pg_cancel_backend(14382); pg_cancel_backend ------------------- t (1 row)

失败了，每个coordinator 只能杀自己的session了.
这次杀掉sessionB的后台操作，所以sessionB要先rollback才能做其他操作了。。。