【StoneDB技术解析】验证相关数据包是否需要解压缩

在StoneDB中,数据包分为以下几类:

  1. 不相关的数据包:不满足查询条件的数据包。
  2. 相关的数据包:满足查询条件的数据包。
  3. 可疑的数据包:数据包中的数据部分满足查询条件,需要进一步解压缩数据包才能得到满足条件的数据行。
通过对数据包的划分,知识网格技术过滤掉不相关的数据包,读取相关的数据包和可疑的数据包。其中相关的数据包不需要解压缩,只读取元数据,不会发生IO,可疑的数据包需要解压缩,会发生IO。
1)创建表t_user CREATE TABLE t_user( id INT NOT NULL AUTO_INCREMENT, first_name VARCHAR(10) NOT NULL, last_name VARCHAR(10) NOT NULL, sex VARCHAR(5) NOT NULL, score INT NOT NULL, copy_id INT NOT NULL, PRIMARY KEY (`id`), key idx_lastname(last_name) ) engine=STONEDB; 2)创建存储过程 DELIMITER // create PROCEDURE add_user(in num INT) BEGIN DECLARE rowid INT DEFAULT 0; DECLARE firstname CHAR(1); DECLARE name1 CHAR(1); DECLARE name2 CHAR(1); DECLARE lastname VARCHAR(3) DEFAULT ''; DECLARE sex CHAR(1); DECLARE score CHAR(2); WHILE rowid < num DO SET firstname = SUBSTRING('赵钱孙李周吴郑王林杨柳刘孙陈江阮侯邹高彭徐',FLOOR(1+21*RAND()),1); SET name1 = SUBSTRING('一二三四五六七八九十甲乙丙丁静景京晶名明铭敏闵民军君俊骏天田甜兲恬益依成城诚立莉力黎励',ROUND(1+43*RAND()),1); SET name2 = SUBSTRING('一二三四五六七八九十甲乙丙丁静景京晶名明铭敏闵民军君俊骏天田甜兲恬益依成城诚立莉力黎励',ROUND(1+43*RAND()),1); SET sex=FLOOR(0 + (RAND() * 2)); SET score= FLOOR(40 + (RAND() *60)); SET rowid = rowid + 1; IF ROUND(RAND())=0 THEN SET lastname =name1; END IF; IF ROUND(RAND())=1 THEN SET lastname = CONCAT(name1,name2); END IF; insert INTO t_user(first_name,last_name,sex,score,copy_id) VALUES (firstname,lastname,sex,score,rowid); END WHILE; END // DELIMITER ; 3)插入数据 call add_user(10000000); 4)创建表t_user_innodb create table t_user_innodb like t_user; insert into t_user_innodb select * from t_user; alter table t_user_innodb engine=innodb;

1)验证读取相关数据包 【【StoneDB技术解析】验证相关数据包是否需要解压缩】SQL的语义逻辑是对字段 first_name 进行分组统计,在StoneDB中,元数据信息记录在元数据节点,如果能通过元数据节点读取到元数据,就不需要解压缩数据包,不发生IO。
在InnoDB中,表的统计信息记录在mysql.innodb_table_stats,优化器根据表和索引的统计信息,生成一个最优的执行计划,然后执行SQL。分别在InnoDB与StoneDB执行,通过SQL profile观察读取IO的情况。
注:为规避缓存的影响,每组测试前重启数据库实例。
InnoDB
mysql> set profiling=on; Query OK, 0 rows affected, 1 warning (0.00 sec)mysql> select first_name,count(*) from t_user_innodb group by first_name; +------------+----------+ | first_name | count(*) | +------------+----------+ | 侯|476424 | | 刘|475764 | | 吴|475979 | | 周|475891 | | 孙|950444 | | 彭|476632 | | 徐|476219 | | 李|475521 | | 杨|476026 | | 林|477289 | | 柳|476250 | | 江|476623 | | 王|475119 | | 赵|476529 | | 邹|476852 | | 郑|476379 | | 钱|476829 | | 阮|476336 | | 陈|476746 | | 高|476148 | +------------+----------+ 20 rows in set (8.62 sec)mysql> show profiles; +----------+------------+-------------------------------------------------------------------+ | Query_ID | Duration| Query| +----------+------------+-------------------------------------------------------------------+ |1 | 8.61591075 | select first_name,count(*) from t_user_innodb group by first_name | +----------+------------+-------------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)mysql> show profile cpu,block io for query 1; +----------------------+----------+----------+------------+--------------+---------------+ | Status| Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +----------------------+----------+----------+------------+--------------+---------------+ | starting| 0.000149 | 0.000059 |0.000083 |0 |0 | | checking permissions | 0.000027 | 0.000011 |0.000015 |0 |0 | | Opening tables| 0.048181 | 0.003919 |0.007952 |608 |0 | | init| 0.000036 | 0.000014 |0.000021 |0 |0 | | System lock| 0.000022 | 0.000009 |0.000013 |0 |0 | | optimizing| 0.000017 | 0.000007 |0.000010 |0 |0 | | statistics| 0.000029 | 0.000012 |0.000016 |0 |0 | | preparing| 0.000022 | 0.000009 |0.000013 |0 |0 | | Creating tmp table| 0.000045 | 0.000019 |0.000027 |0 |0 | | Sorting result| 0.000016 | 0.000007 |0.000009 |0 |0 | | executing| 0.000014 | 0.000005 |0.000008 |0 |0 | | Sending data| 8.566974 | 6.905969 |0.772964 |873888 |0 | | Creating sort index| 0.000144 | 0.000164 |0.000037 |64 |0 | | end| 0.000014 | 0.000012 |0.000003 |32 |0 | | query end| 0.000028 | 0.000038 |0.000009 |0 |0 | | removing tmp table| 0.000019 | 0.000015 |0.000003 |0 |0 | | query end| 0.000012 | 0.000010 |0.000002 |0 |0 | | closing tables| 0.000031 | 0.000025 |0.000006 |0 |0 | | freeing items| 0.000032 | 0.000027 |0.000006 |0 |0 | | logging slow query| 0.000067 | 0.000054 |0.000012 |0 |8 | | cleaning up| 0.000035 | 0.000028 |0.000006 |0 |0 | +----------------------+----------+----------+------------+--------------+---------------+ 21 rows in set, 1 warning (0.00 sec)

从SQL profile可知,SQL在InnoDB执行的过程中,发生IO的阶段有Opening tables、Sending data、Creating sort index、end,其中Opening tables是每张表第一次加载都会经历的,可排除讨论。重点关注Sending data部分,它表示在执行器的任意阶段,通常是存储引擎层与Server层的IO交互过程。
StoneDB
mysql> set profiling=on; Query OK, 0 rows affected, 1 warning (0.00 sec)mysql> select first_name,count(*) from t_user group by first_name; +------------+----------+ | first_name | count(*) | +------------+----------+ | 赵|476529 | | 徐|476219 | | 王|475119 | | 阮|476336 | | 柳|476250 | | 侯|476424 | | 孙|950444 | | 郑|476379 | | 高|476148 | | 林|477289 | | 邹|476852 | | 彭|476632 | | 李|475521 | | 吴|475979 | | 刘|475764 | | 钱|476829 | | 周|475891 | | 杨|476026 | | 陈|476746 | | 江|476623 | +------------+----------+ 20 rows in set (0.59 sec)mysql> show profiles; +----------+------------+------------------------------------------------------------+ | Query_ID | Duration| Query| +----------+------------+------------------------------------------------------------+ |1 | 0.59069975 | select first_name,count(*) from t_user group by first_name | +----------+------------+------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)mysql> show profile cpu,block io for query 1; +----------------------+----------+----------+------------+--------------+---------------+ | Status| Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +----------------------+----------+----------+------------+--------------+---------------+ | starting| 0.000160 | 0.000066 |0.000089 |0 |0 | | checking permissions | 0.000027 | 0.000011 |0.000015 |0 |0 | | Opening tables| 0.011405 | 0.003718 |0.007688 |0 |240 | | System lock| 0.000385 | 0.000163 |0.000222 |0 |0 | | init| 0.000050 | 0.000021 |0.000028 |0 |0 | | optimizing| 0.000143 | 0.000061 |0.000082 |0 |0 | | update multi-index| 0.000052 | 0.000022 |0.000030 |0 |0 | | aggregation| 0.578315 | 2.639504 |0.981471 |0 |8 | | query end| 0.000069 | 0.000043 |0.000026 |0 |0 | | closing tables| 0.000035 | 0.000021 |0.000013 |0 |0 | | freeing items| 0.000034 | 0.000021 |0.000013 |0 |0 | | cleaning up| 0.000027 | 0.000017 |0.000010 |0 |0 | +----------------------+----------+----------+------------+--------------+---------------+ 12 rows in set, 1 warning (0.00 sec)

从SQL profile可知,SQL在StoneDB执行的过程中,只在Opening tables阶段发生IO。其它阶段没有发生IO,说明相关数据包是不需要解压缩的,通过元数据得到。
2)验证读取可疑数据包 SQL的语义逻辑是查询一行数据,StoneDB可以通过知识网格技术过滤掉不相关的数据包,由于只返回一行数据,最终只能找到可疑的数据包,然后解压缩可疑的数据包,最终得到这一行数据。InnoDB还是根据统计信息生成一个最优的执行计划去执行SQL。
InnoDB
mysql> set profiling=on; Query OK, 0 rows affected, 1 warning (0.00 sec)mysql> select count(*) from t_user_innodb where first_name='柳' and copy_id=9968888; +----------+ | count(*) | +----------+ |1 | +----------+ 1 row in set (3.20 sec)mysql> show profile cpu,block io for query 1; +----------------------+----------+----------+------------+--------------+---------------+ | Status| Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +----------------------+----------+----------+------------+--------------+---------------+ | starting| 0.000170 | 0.000072 |0.000092 |0 |0 | | checking permissions | 0.000030 | 0.000013 |0.000016 |0 |0 | | Opening tables| 0.024121 | 0.004351 |0.008638 |800 |0 | | init| 0.000049 | 0.000021 |0.000027 |0 |0 | | System lock| 0.000019 | 0.000008 |0.000011 |0 |0 | | optimizing| 0.000022 | 0.000010 |0.000012 |0 |0 | | statistics| 0.000030 | 0.000013 |0.000016 |0 |0 | | preparing| 0.000026 | 0.000012 |0.000015 |0 |0 | | executing| 0.000013 | 0.000005 |0.000007 |0 |0 | | Sending data| 3.169882 | 2.755171 |0.389367 |534176 |0 | | end| 0.000069 | 0.000050 |0.000018 |0 |0 | | query end| 0.000029 | 0.000022 |0.000008 |0 |0 | | closing tables| 0.000031 | 0.000023 |0.000009 |0 |0 | | freeing items| 0.000035 | 0.000025 |0.000009 |0 |0 | | cleaning up| 0.000038 | 0.000028 |0.000010 |0 |0 | +----------------------+----------+----------+------------+--------------+---------------+ 15 rows in set, 1 warning (0.00 sec)

StoneDB
mysql> set profiling=on; Query OK, 0 rows affected, 1 warning (0.00 sec)mysql> select count(*) from t_user where first_name='柳' and copy_id=9968888; +----------+ | count(*) | +----------+ |1 | +----------+ 1 row in set (0.01 sec)mysql> show profile cpu,block io for query 1; +----------------------+----------+----------+------------+--------------+---------------+ | Status| Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +----------------------+----------+----------+------------+--------------+---------------+ | starting| 0.000173 | 0.000081 |0.000086 |0 |0 | | checking permissions | 0.000026 | 0.000013 |0.000013 |0 |0 | | Opening tables| 0.010228 | 0.009385 |0.000843 |0 |240 | | System lock| 0.000232 | 0.000113 |0.000119 |0 |0 | | init| 0.000045 | 0.000021 |0.000022 |0 |0 | | optimizing| 0.000144 | 0.000071 |0.000074 |0 |0 | | update multi-index| 0.003694 | 0.002027 |0.006428 |0 |0 | | aggregation| 0.000191 | 0.000093 |0.000098 |0 |16 | | query end| 0.000020 | 0.000010 |0.000010 |0 |0 | | closing tables| 0.000029 | 0.000014 |0.000015 |0 |0 | | freeing items| 0.000033 | 0.000016 |0.000017 |0 |0 | | cleaning up| 0.000027 | 0.000013 |0.000013 |0 |0 | +----------------------+----------+----------+------------+--------------+---------------+ 12 rows in set, 1 warning (0.00 sec)

从SQL profile可知,SQL在StoneDB执行的过程中,在aggregation阶段发生IO。
综上所述:
  1. 知识网格技术过滤出相关的数据包后,只需要读取元数据,不再解压缩数据包;
  2. 知识网格技术过滤出可疑的数据包后,需要解压缩数据包。

    推荐阅读