Phoenix Quick Start - BigData Tools Series

Phoenix is an open-source SQL skin for HBase. You can use standard JDBC APIs, instead of the regular HBase client APIs, to create tables, insert data, and query HBase data, as sketched below.
Using Phoenix, you can write less code and get better performance, because it:
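Because Phoenix presents itself as an ordinary JDBC driver, the whole create/upsert/query cycle can be driven from plain java.sql code. The following is a minimal sketch, assuming the Phoenix client jar is on the classpath and ZooKeeper is reachable at localhost:2182 (the quorum used later in this article); the DEMO table and its columns are made up for illustration.

// Minimal sketch (assumptions: phoenix-client jar on the classpath, ZooKeeper at localhost:2182).
// The DEMO table and its columns are illustrative, not part of the article's example schema.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             Statement stmt = conn.createStatement()) {
            // DDL and DML go through plain JDBC; note that Phoenix uses UPSERT rather than INSERT.
            stmt.execute("CREATE TABLE IF NOT EXISTS demo (id BIGINT PRIMARY KEY, name VARCHAR)");
            stmt.executeUpdate("UPSERT INTO demo VALUES (1, 'phoenix')");
            conn.commit(); // Phoenix connections are not auto-commit by default
            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM demo")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " -> " + rs.getString("name"));
                }
            }
        }
    }
}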

  • Compiles your SQL queries into native HBase scans
  • Determines the optimal start and stop keys for those scans
  • Orchestrates the parallel execution of the scans
  • Brings the computation to the data
  • Pushes the predicates in your WHERE clause into server-side filters
  • Executes aggregate queries through server-side hooks (called coprocessors)
Beyond these core features, there are several further enhancements that optimize performance:
  • Secondary indexes to improve the performance of queries on non-row-key columns (see the DDL sketch after this list)
  • Statistics collection to improve parallelization and guide the choice between optimizations
  • A skip-scan filter to optimize IN, LIKE, and OR queries
  • Optional salting of row keys to distribute the write load evenly
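Two of the items above, secondary indexes and row-key salting, are exposed as plain DDL. The sketch below shows roughly what that DDL looks like when issued through JDBC; the EVENTS table, its columns, and the bucket count are illustrative assumptions, while SALT_BUCKETS and CREATE INDEX are standard Phoenix syntax.

// Sketch of the DDL behind salted tables and secondary indexes (table, columns, and bucket count are assumptions).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixDdlFeatures {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             Statement stmt = conn.createStatement()) {
            // SALT_BUCKETS pre-splits the table so that monotonically increasing row keys
            // (e.g. time-ordered keys) spread their writes across region servers.
            stmt.execute("CREATE TABLE IF NOT EXISTS events ("
                    + " event_time DATE NOT NULL,"
                    + " host VARCHAR NOT NULL,"
                    + " status INTEGER,"
                    + " payload VARCHAR"
                    + " CONSTRAINT pk PRIMARY KEY (event_time, host))"
                    + " SALT_BUCKETS = 4");
            // A secondary index lets queries that filter on the non-row-key column
            // 'status' avoid a full table scan.
            stmt.execute("CREATE INDEX IF NOT EXISTS events_status_idx ON events (status)");
        }
    }
}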
Let's walk through the official example first:
1. For environment setup, see: DataX + Phoenix + HBase big data analytics platform integration.
2. Create your own SQL scripts and execute them with the command-line tool $PhoenixPath/bin/psql.py.
First, create a us_population.sql file containing the table definition:
[admin@mvxl2429 examples]$ pwd
/mnt/yst/phoenix-5.0.0/examples
[admin@mvxl2429 examples]$ cat us_population.sql
CREATE TABLE IF NOT EXISTS us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));

Then, create a us_population.csv file containing the data to load into the table:
[admin@mvxl2429 examples]$ cat us_population.csv
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332

Finally, create a us_population_queries.sql file containing the query to run:
[admin@mvxl2429 examples]$ cat us_population_queries.sql
SELECT state as "State", count(city) as "City Count", sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;

Execute the following from a terminal (note that sqlline.py takes a single SQL file, so the three files must be run through psql.py):
[admin@mvxl2429 examples]$ ../bin/sqlline.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
usage: sqlline.py [-h] [-v VERBOSE] [-c COLOR] [-fc FASTCONNECT] [zookeepers] [sqlfile]
sqlline.py: error: unrecognized arguments: us_population.csv us_population_queries.sql
[admin@mvxl2429 examples]$ ../bin/psql.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
19/08/08 14:51:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 1.508 sec(s)
csv columns from database.
CSV Upsert complete. 10 rows upserted
Time: 0.117 sec(s)
St  City Count  Population Sum
--  ----------  --------------
NY  1           8143197
CA  3           6012701
TX  3           4486916
IL  1           2842518
PA  1           1463281
AZ  1           1461575
Time: 0.024 sec(s)

3. Take a look at the bin/performance.py script, which creates as many rows as you like for a sample schema and runs timed queries against them.
How many rows you can test with is closely tied to the amount of memory available. Since this machine's configuration is modest, 1000 rows are used here; to test a much larger data set, increase the heap size to avoid an OutOfMemoryError.
[admin@mvxl2429 bin]$ ./performance.py localhost:2182 1000
Phoenix Performance Evaluation Script 1.0
-----------------------------------------
Creating performance table...
19/08/08 16:04:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 0.943 sec(s)

Query # 1 - Count - SELECT COUNT(1) FROM PERFORMANCE_1000;
Query # 2 - Group By First PK - SELECT HOST FROM PERFORMANCE_1000 GROUP BY HOST;
Query # 3 - Group By Second PK - SELECT DOMAIN FROM PERFORMANCE_1000 GROUP BY DOMAIN;
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM PERFORMANCE_1000 GROUP BY TRUNC(DATE,'DAY');
Query # 5 - Filter + Count - SELECT COUNT(1) FROM PERFORMANCE_1000 WHERE CORE<10;

Generating and upserting data.../mnt/yst/phoenix-5.0.0/bin/../phoenix-core-5.0.0-HBase-2.0-tests.jar
19/08/08 16:04:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
csv columns from database.
CSV Upsert complete. 1000 rows upserted
Time: 0.492 sec(s)

COUNT(1)
----------------------------------------
1000
Time: 0.034 sec(s)

HO
--
CS
EU
NA
Time: 0.018 sec(s)

DOMAIN
----------------------------------------
Apple.com
Google.com
Salesforce.com
Time: 0.024 sec(s)

DAY
-----------------------
2019-08-08 00:00:00.000
Time: 0.028 sec(s)

COUNT(1)
----------------------------------------
19
Time: 0.019 sec(s)

[admin@mvxl2429 bin]$ ./sqlline.py localhost:2182
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2182 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2182
19/08/08 15:21:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
146/146 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:localhost:2182> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME        | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+
|            | SYSTEM       | CATALOG           | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | FUNCTION          | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | LOG               | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | SEQUENCE          | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | STATS             | SYSTEM TABLE  |          |            |                            |                 |              |
|            |              | PERFORMANCE_1000  | TABLE         |          |            |                            |                 |              |
|            |              | PERSON            | TABLE         |          |            |                            |                 |              |
|            |              | US_POPULATION     | TABLE         |          |            |                            |                 |              |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+
0: jdbc:phoenix:localhost:2182> select * from PERFORMANCE_1000 limit 3;
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| HOST  | DOMAIN     | FEATURE    | DATE                     | CORE  | DB    | ACTIVE_VISITOR  |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:04:38.000  | 373   | 658   | 3847            |
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:04:47.000  | 465   | 29    | 4551            |
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:05:05.000  | 298   | 1463  | 142             |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
3 rows selected (0.041 seconds)
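The same PERFORMANCE_1000 table can also be queried from application code rather than from sqlline. As a hedged sketch, the snippet below re-runs the "Filter + Count" query (Query #5) from the performance script through a JDBC PreparedStatement; the connection URL mirrors the sqlline session above, and the threshold of 10 simply matches that query.

// Hedged sketch: Query #5 from the performance run, issued through the Phoenix JDBC driver.
// The URL mirrors the sqlline session above; the CORE threshold matches that query.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class Performance1000Query {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT COUNT(1) FROM PERFORMANCE_1000 WHERE CORE < ?")) {
            ps.setInt(1, 10); // same predicate as "Filter + Count" in the script output
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("rows with CORE < 10: " + rs.getLong(1));
                }
            }
        }
    }
}

The WHERE clause here is exactly the kind of predicate that, per the feature list at the top of this article, Phoenix pushes down into a server-side filter.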
