OpenMLDB: Understanding Parameterized Query Statements

Background

In database management systems (DBMS), a prepared statement or parameterized statement is a feature used to execute the same or similar database statements repeatedly with high efficiency. Typically used with SQL statements such as queries or updates, the prepared statement takes the form of a template into which certain constant values are substituted during each execution. (https://en.wikipedia.org/wiki/Prepared_statement)
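In a database system, parameterized statements serve two purposes. First, they allow a statement to be precompiled, so repeated executions run more efficiently. Second, they guard against SQL injection attacks and are therefore more secure. These two points are the main reasons traditional database systems support parameterized statements.
From a database-system perspective, supporting parameterized query statements rounds out OpenMLDB's query capabilities. From a business perspective, it lets OpenMLDB support rule feature computation in rule engine scenarios.
Scenario example: rule engine feature computation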
SELECT
SUM(trans_amount) as F_TRANS_AMOUNT_SUM,
COUNT(user) as F_TRANS_COUNT,
MAX(trans_amount) as F_TRANS_AMOUNT_MAX,
MIN(trans_amount) as F_TRANS_AMOUNT_MIN
FROM t1 where user = 'ABC123456789' and trans_time between 1590115420000 and 1592707420000;

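In this example, we compute the total transaction amount, number of transactions, maximum transaction amount, and minimum transaction amount for user `ABC123456789` over the period from `2020-05-22 02:43:40` to `2020-06-21 02:43:40` (UTC). These features are then passed to a downstream component (the rule engine).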
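The two bounds in the query are epoch timestamps in milliseconds. As a quick sanity check, the sketch below converts them to UTC and confirms that the window spans 30 days. The class `TransTimeWindow` and its methods are hypothetical helpers for illustration, not part of OpenMLDB.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper class for illustration; not part of OpenMLDB.
class TransTimeWindow {
    // 30 days in milliseconds; the L suffix keeps the arithmetic in 64-bit long.
    static final long THIRTY_DAYS_MS = 30L * 24 * 60 * 60 * 1000;

    // Render an epoch-millisecond value as an ISO-8601 UTC instant.
    static String toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Whole days between two epoch-millisecond values.
    static long daysBetween(long startMillis, long endMillis) {
        return Duration.between(Instant.ofEpochMilli(startMillis),
                                Instant.ofEpochMilli(endMillis)).toDays();
    }

    public static void main(String[] args) {
        long start = 1590115420000L;
        long end = 1592707420000L;
        System.out.println(toUtc(start));            // 2020-05-22T02:43:40Z
        System.out.println(toUtc(end));              // 2020-06-21T02:43:40Z
        System.out.println(daysBetween(start, end)); // 30
    }
}
```

Writing the window as `30L * ...` matters: the same product computed in `int` arithmetic exceeds `Integer.MAX_VALUE` and silently overflows.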
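In practice, it is not feasible to write a separate SQL query for every user. What is needed is a template for rule feature computation in which the user and the time range vary dynamically.
The simplest approach is to write a program like the one below, splicing the user name and time range into the SQL statement as variables.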
long now = System.currentTimeMillis();
// 30L keeps the window arithmetic in long; int 30*86400000 would overflow
long windowStart = now - 30L * 86400000L;
String query = "SELECT "+
"SUM(trans_amount) as F_TRANS_AMOUNT_SUM, "+
"COUNT(user) as F_TRANS_COUNT, "+
"MAX(trans_amount) as F_TRANS_AMOUNT_MAX, "+
"MIN(trans_amount) as F_TRANS_AMOUNT_MIN "+
"FROM t1 where user = '"+ user +"' and trans_time between "
+ windowStart + " and " + now;

executor.execute(query);

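This approach is straightforward, but query performance will be poor, since each distinct SQL string has to be compiled again, and it carries a risk of SQL injection. The recommended approach is to use a parameterized query: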
long now = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("SELECT "+
"SUM(trans_amount) as F_TRANS_AMOUNT_SUM, "+
"COUNT(user) as F_TRANS_COUNT, "+
"MAX(trans_amount) as F_TRANS_AMOUNT_MAX, "+
"MIN(trans_amount) as F_TRANS_AMOUNT_MIN "+
"FROM t1 where user = ? and trans_time between ? and ? ");

stmt.setString(1, user);
// java.sql.Timestamp wraps epoch milliseconds; 30L avoids int overflow
stmt.setTimestamp(2, new Timestamp(now - 30L * 86400000L));
stmt.setTimestamp(3, new Timestamp(now));
ResultSet rs = stmt.executeQuery();
rs.next();
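To make the SQL injection risk of the string-concatenation approach concrete, here is a minimal sketch that only builds strings and never touches a database. The class `InjectionDemo` and the shortened query are illustrative assumptions. It shows that a crafted user value changes the structure of the concatenated SQL, while the text of a parameterized template does not depend on the input at all.

```java
// Hypothetical demo class; it only builds strings and never contacts a database.
class InjectionDemo {
    // Naive approach: splice the user value directly into the SQL text.
    static String concatQuery(String user) {
        return "SELECT COUNT(user) AS F_TRANS_COUNT FROM t1 WHERE user = '" + user + "'";
    }

    // Parameterized approach: the SQL text is a fixed template with a placeholder.
    // The value would be bound separately, e.g. with PreparedStatement.setString.
    static String templateQuery() {
        return "SELECT COUNT(user) AS F_TRANS_COUNT FROM t1 WHERE user = ?";
    }

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The quote inside the input closes the string literal early,
        // so the WHERE clause gains an extra always-true condition.
        System.out.println(concatQuery(malicious));
        // The template is identical for every input.
        System.out.println(templateQuery());
    }
}
```

Because the template text is the same for every user and time range, it is also what makes compiling once and reusing the result possible.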

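Implementation details
In OpenMLDB, supporting a new syntax feature usually requires completing, in order, syntax parsing, plan generation and optimization, expression codegen, and query execution. When necessary, client-side interfaces also have to be added or refactored. Supporting `Parameterized Query` touches essentially all of these modules, so understanding its implementation is a quick way to get familiar with OpenMLDB development, especially development of the OpenMLDB Engine.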
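The figure below shows the flow of executing a parameterized query.
  1. The user runs a parameterized query from the application `JavaApplication` via JDBC (PreparedStatement).
  2. The client (TabletClient) provides the `ExecuteSQLParameterized` interface to handle parameterized queries and calls the `Query` service on the server (Tablet) over RPC.
  3. The server (Tablet) relies on the Engine module to compile and execute the query.
  4. Compiling a query goes through three main stages: SQL syntax analysis, plan generation and optimization, and expression codegen. On success, the compilation result is stored in the SQL context (SqlContext) of the current execution session (RunSession). If the query has already been compiled, it is not compiled again; the corresponding compiled artifact is fetched from the compilation cache and placed into the RunSession's SqlContext.
  5. Executing the query calls the `Run` interface of RunSession. The execution result `run output` is stored in the attachment of the response and sent back to TabletClient, and is finally stored in a `ResultSet` returned to `JavaApplication`.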
[Figure: flow of executing a parameterized query]
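Step 4 of the flow mentions that an already compiled query is served from a compilation cache. The sketch below illustrates the general compile-once idea in a few lines. It is a simplification under stated assumptions, not OpenMLDB's actual Engine code: `CompileCacheSketch`, `CompiledQuery`, and `getOrCompile` are hypothetical names, and the cache is keyed on the SQL text alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are illustrative, not OpenMLDB's.
class CompileCacheSketch {
    // Stand-in for the compiled artifact (plan plus generated code).
    static final class CompiledQuery {
        final String sql;
        CompiledQuery(String sql) { this.sql = sql; }
    }

    private final Map<String, CompiledQuery> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compileCount = new AtomicInteger();

    // Stand-in for syntax analysis, plan generation/optimization, and codegen.
    private CompiledQuery compile(String sql) {
        compileCount.incrementAndGet();
        return new CompiledQuery(sql);
    }

    // Return the cached artifact if present; otherwise compile and cache it.
    CompiledQuery getOrCompile(String sql) {
        return cache.computeIfAbsent(sql, this::compile);
    }

    int compileCount() { return compileCount.get(); }

    public static void main(String[] args) {
        CompileCacheSketch engine = new CompileCacheSketch();
        String sql = "SELECT SUM(trans_amount) FROM t1 WHERE user = ?";
        engine.getOrCompile(sql); // first call compiles
        engine.getOrCompile(sql); // second call hits the cache
        System.out.println(engine.compileCount()); // 1
    }
}
```

With string concatenation, every user and time range produces a different SQL text, so a cache like this would almost never hit. With a parameterized template, the text stays constant and only the bound values change.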

1. JDBC PreparedStatement
1.1 JDBC Prepared Statements overview
Sometimes it is more convenient to use a `PreparedStatement` object for sending SQL statements to the database. This special type of statement is derived from the more general class, `Statement`, that you already know.
If you want to execute a `Statement` object many times, it usually reduces execution time to use a `PreparedStatement` object instead. [2] (Using Prepared Statements)
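JDBC provides `PreparedStatement` for executing SQL statements with parameters. Users can use a PreparedStatement to run parameterized queries, inserts, updates, and other operations. In this section, we describe in detail how OpenMLDB's PreparedStatement executes a parameterized query.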
1.2 Using the OpenMLDB PreparedStatement
public void parameterizedQueryDemo() {
    SdkOption option = new SdkOption();
    option.setZkPath(TestConfig.ZK_PATH);
    option.setZkCluster(TestConfig.ZK_CLUSTER);
    option.setSessionTimeout(200000);
    try {
        SqlExecutor executor = new SqlClusterExecutor(option);
        String dbname = "demo_db";
        boolean ok = executor.createDB(dbname);
        // create table
        ok = executor.executeDDL(dbname, "create table t1(user string, trans_amount double, trans_time bigint, index(key=user, ts=trans_time)); ");
        // insert rows
        ok = executor.executeInsert(dbname, "insert into t1 values('user1', 1.0, 1592707420000); ");
        ok = executor.executeInsert(dbname, "insert into t1 values('user1', 2.0, 1592707410000); ");
        ok = executor.executeInsert(
