Big Data Technology Stack | Apache Doris Installation and Deployment Guide

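Preface
This article belongs to the column 《大数据安装部署》 (Big Data Installation and Deployment). The column is my original work; please credit the source when quoting it. If you find any shortcomings or errors, please point them out in the comments. Thank you!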
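Preparation
I recommend downloading the following two packages:
Apache Doris 1.1.2 FE package
Apache Doris 1.1.2 BE package
I also recommend setting up a MySQL client first by following my blog post "CentOS 7 安装 MySQL 5.7" (Installing MySQL 5.7 on CentOS 7).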
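Deploying the FE
First, copy the FE package to the target node.
Configuring the FE
The configuration file is conf/fe.conf.
Note: meta_dir specifies where the metadata is stored. The default is ${DORIS_HOME}/doris-meta. If the directory does not exist, you must create it manually (it exists by default after the package is unpacked).
JAVA_OPTS in fe.conf sets the maximum Java heap size to 4 GB by default. For production environments, raising it to 8 GB or more is recommended.
My fe.conf is shown below for reference. Note that the default file is normally fine as it is, unless you run into a port conflict.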
[Image: the author's fe.conf]
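As a rough sketch of the heap recommendation above, the helper below rewrites the -Xmx value on the JAVA_OPTS lines of a fe.conf-style file. The function name set_fe_heap_mb and the shortened sample line are my own illustration, not the complete default file. sed keeps a .bak copy of the original, but it is still safer to try this on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Doris): set the max heap on JAVA_OPTS lines.
# Usage: set_fe_heap_mb <path-to-fe.conf> <heap-in-MB>
set_fe_heap_mb() {
  local conf="$1" heap_mb="$2"
  # Replace the first -Xmx<N>m / -Xmx<N>g token on each line that starts with JAVA_OPTS.
  # The untouched original is kept next to the file as <conf>.bak.
  sed -i.bak -E "/^JAVA_OPTS/ s/-Xmx[0-9]+[mMgG]/-Xmx${heap_mb}m/" "$conf"
}

# Demo on a throwaway file; the JAVA_OPTS line is a shortened sample.
demo_conf="$(mktemp)"
printf 'JAVA_OPTS="-Xmx4096m -XX:+UseConcMarkSweepGC"\nquery_port = 9030\n' > "$demo_conf"
set_fe_heap_mb "$demo_conf" 8192
cat "$demo_conf"
```

The FE has to be restarted before a changed JAVA_OPTS takes effect.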

Starting the FE
Run the following command from the FE home directory (mine is /opt/bigdata/doris/apache-doris-fe-1.1.2-bin/fe):

bin/start_fe.sh --daemon

The FE process starts and runs in the background.
By default, logs are stored in ${DORIS_HOME}/log.
If startup fails, check log/fe.log or log/fe.out for the error messages.
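Deploying the BE
Copy the BE package to every node on which a BE will be deployed.
Editing the configuration on every BE
Edit be/conf/be.conf.
The main setting is storage_root_path, the data storage directory. The default is be/storage. If the directory does not exist, you must create it manually (it exists by default after the package is unpacked). To use multiple directories, separate them with an ASCII semicolon ; (do not add a ; after the last directory).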
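The two rules above (no trailing semicolon, every directory must exist) can be checked before starting the BE. The following is a minimal sketch; check_storage_root_path is a name I made up, and it assumes each entry is a plain directory path with nothing appended to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a storage_root_path value.
# Usage: check_storage_root_path "/data1/doris;/data2/doris"
# Returns 0 if the value is acceptable, 1 otherwise (reasons go to stderr).
check_storage_root_path() {
  local value="$1" dir status=0
  local -a dirs
  if [[ -z "$value" || "$value" == *";" ]]; then
    echo "storage_root_path is empty or ends with ';'" >&2
    return 1
  fi
  # Split on ';' into an array without touching the global IFS.
  IFS=';' read -r -a dirs <<< "$value"
  for dir in "${dirs[@]}"; do
    if [[ ! -d "$dir" ]]; then
      echo "missing directory: $dir (create it with mkdir -p)" >&2
      status=1
    fi
  done
  return "$status"
}

# Demo with two temporary directories standing in for real data disks.
d1="$(mktemp -d)"; d2="$(mktemp -d)"
check_storage_root_path "$d1;$d2" && echo "ok: $d1;$d2"
check_storage_root_path "$d1;$d2;" || echo "rejected: trailing semicolon"
```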
My be.conf is shown below for reference. If a port conflicts, change it yourself (you can check whether a port is already in use with netstat -anp | grep <port>).
[Image: the author's be.conf]
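To check several BE ports in one pass instead of grepping one at a time, something like the sketch below may help. busy_ports is a hypothetical helper that reads netstat output on stdin. The four ports in the usage comment are the BE ports that appear later in this article (BePort 9060, HttpPort 8040, HeartbeatPort 9050, BrpcPort 8060).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given ports already have a TCP listener.
# stdin: output of `netstat -tln` (or `netstat -anp`); args: port numbers to look for.
busy_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, p, " "); for (i = 1; i <= n; i++) wanted[p[i]] = 1 }
    $6 == "LISTEN" {
      k = split($4, a, ":")        # local address, e.g. 0.0.0.0:9050 or :::9050
      port = a[k]
      if ((port in wanted) && !seen[port]++) print port
    }
  '
}

# Demo with canned netstat lines.
# On a real node: netstat -tln | busy_ports 9060 8040 9050 8060
printf '%s\n' \
  'tcp        0      0 0.0.0.0:9050      0.0.0.0:*      LISTEN' \
  'tcp        0      0 0.0.0.0:22        0.0.0.0:*      LISTEN' \
  'tcp6       0      0 :::9050           :::*           LISTEN' |
  busy_ports 9060 8040 9050 8060
```

An empty result means none of the listed ports is taken.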

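Adding all BE nodes to the FE
BE nodes must be added in the FE before they can join the cluster.
If you plan to use Apache Doris, it is worth installing MySQL as well, since Doris is highly compatible with the MySQL protocol and syntax.
To install MySQL, you can follow my blog post "CentOS 7 安装 MySQL 5.7" (Installing MySQL 5.7 on CentOS 7).
Connecting to the FE with mysql-client (download MySQL 5.7) is recommended: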
./mysql-client -h fe_host -P query_port -uroot

  • fe_host: the IP of the node where the FE runs;
  • query_port: the query_port set in fe/conf/fe.conf;
  • -uroot: the root account is used by default, and no password is needed to log in.
After running the command, you should see output like the following:
[root@node1 ~]# mysql -h node1 -P 9030 -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 5.7.37 Doris version 1.1.2-rc05-a8323dae4

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
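If you have changed the ports, reading query_port straight from fe.conf avoids typing the wrong one. The sketch below assumes the file uses simple key = value lines. conf_value is a hypothetical helper, not a Doris tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a "key = value" line from a conf file.
# Commented-out lines are skipped; if the key appears twice, the last one wins.
# Values that themselves contain '=' (such as JAVA_OPTS) are not handled.
conf_value() {
  local file="$1" key="$2"
  awk -F'=' -v k="$key" '
    /^[ \t]*#/ { next }
    {
      name = $1; gsub(/[ \t]/, "", name)
      if (name == k) { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); found = val }
    }
    END { if (found != "") print found }
  ' "$file"
}

# Demo on a throwaway file standing in for fe/conf/fe.conf.
demo_conf="$(mktemp)"
printf '# query_port = 1\nquery_port = 9030\n' > "$demo_conf"
conf_value "$demo_conf" query_port

# On a real FE node (run from the FE home directory):
#   mysql -h node1 -P "$(conf_value conf/fe.conf query_port)" -uroot
```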

After logging in, run the following command to add each BE:
ALTER SYSTEM ADD BACKEND "be_host:heartbeat_service_port";

  • be_host: the IP of the node where the BE runs;
  • heartbeat_service_port: the heartbeat_service_port set in be/conf/be.conf.
For example:
mysql> ALTER SYSTEM ADD BACKEND "node1:9050";
Query OK, 0 rows affected (0.16 sec)

mysql> ALTER SYSTEM ADD BACKEND "node2:9050";
Query OK, 0 rows affected (0.01 sec)

mysql> ALTER SYSTEM ADD BACKEND "node3:9050";
Query OK, 0 rows affected (0.00 sec)
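With more than a handful of nodes, typing these statements by hand gets tedious. A small sketch that generates them is shown below. add_backend_sql is a made-up name, and the host names and port are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one ALTER SYSTEM ADD BACKEND statement per host.
# Usage: add_backend_sql <heartbeat_service_port> <be_host>...
add_backend_sql() {
  local port="$1" host
  shift
  for host in "$@"; do
    printf 'ALTER SYSTEM ADD BACKEND "%s:%s";\n' "$host" "$port"
  done
}

# Prints the three statements used above; nothing is sent to the cluster here.
add_backend_sql 9050 node1 node2 node3

# To actually run them against the FE:
#   add_backend_sql 9050 node1 node2 node3 | mysql -h node1 -P 9030 -uroot
```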

Starting the BE
On every node, run the following command from the BE home directory (mine is /opt/bigdata/doris/apache-doris-be-1.1.2-bin-x86_64/be):
bin/start_be.sh --daemon

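The BE process starts and runs in the background.
By default, logs are stored in the be/log directory.
If startup fails, check be/log/be.log or be/log/be.out for the error messages.
At this point the installation itself is complete, but we still need to verify that nothing went wrong and that the Apache Doris cluster works properly.
Verification
Checking BE status
Connect to the FE with the MySQL client and run the following command to view the status of the BEs: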
SHOW PROC '/backends';

If everything is fine, the Alive column should be true, as shown below:
mysql> SHOW PROC '/backends';
+-----------+-----------------+---------------+----------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------------------------+--------+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| BackendId | Cluster         | IP            | HostName | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | Tag                      | ErrMsg | Version              | Status                                                                                                                        |
+-----------+-----------------+---------------+----------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------------------------+--------+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| 10002     | default_cluster | 192.168.10.11 | node1    | 9050          | 9060   | 8040     | 8060     | 2022-09-26 09:59:55 | 2022-09-26 10:13:58 | true  | false                | false                 | 0         | 0.000            | 12.509 GB     | 16.986 GB     | 26.36 % | 26.36 %        | {"location" : "default"} |        | 1.1.2-rc05-a8323dae4 | {"lastSuccessReportTabletsTime":"2022-09-26 10:13:44","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} |
| 10003     | default_cluster | 192.168.10.12 | node2    | 9050          | 9060   | 8040     | 8060     | 2022-09-26 10:13:51 | 2022-09-26 10:13:58 | true  | false                | false                 | 0         | 0.000            | 14.046 GB     | 16.986 GB     | 17.31 % | 17.31 %        | {"location" : "default"} |        | 1.1.2-rc05-a8323dae4 | {"lastSuccessReportTabletsTime":"2022-09-26 10:13:55","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} |
| 10004     | default_cluster | 192.168.10.13 | node3    | 9050          | 9060   | 8040     | 8060     | 2022-09-26 10:13:10 | 2022-09-26 10:13:58 | true  | false                | false                 | 0         | 0.000            | 14.261 GB     | 16.986 GB     | 16.04 % | 16.04 %        | {"location" : "default"} |        | 1.1.2-rc05-a8323dae4 | {"lastSuccessReportTabletsTime":"2022-09-26 10:13:15","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} |
+-----------+-----------------+---------------+----------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------------------------+--------+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)
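With a table this wide it is easy to overlook a false in the Alive column. The sketch below looks the column up by name in the tab-separated output that mysql produces in batch mode (-B). dead_backends is a hypothetical helper, and the column names are the ones shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list backends whose Alive column is not "true".
# stdin: tab-separated output with a header row, e.g. from
#   mysql -h node1 -P 9030 -uroot -B -e "SHOW PROC '/backends'"
# Exit status: 0 if all are alive, 1 if any is not, 2 if expected columns are missing.
dead_backends() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Alive" in col) || !("IP" in col) || !("HeartbeatPort" in col)) { rc = 2; exit }
      next
    }
    $(col["Alive"]) != "true" { print $(col["IP"]) ":" $(col["HeartbeatPort"]); rc = 1 }
    END { exit rc + 0 }
  '
}

# Demo with canned rows (only the columns the helper needs); the second backend is made up as dead.
printf 'BackendId\tIP\tHeartbeatPort\tAlive\n10002\t192.168.10.11\t9050\ttrue\n10003\t192.168.10.12\t9050\tfalse\n' |
  dead_backends || echo "some backends are not alive"
```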

Checking the FE log
Search the FE log directory for the keyword thrift server started. Output like the following indicates that the FE is working properly.
[root@node1 log]# grep -nR -A 10 "thrift server started" .
./fe.log:36:2022-09-26 09:07:12,649 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [FeServer.start():48] thrift server started.
./fe.log-37-2022-09-26 09:07:12,890 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [RestartApplicationListener.onApplicationStartingEvent():93] Restart disabled due to System property 'spring.devtools.restart.enabled' being set to false
./fe.log-38-2022-09-26 09:07:13,617 INFO (background-preinit|108) [Version.():27] HV000001: Hibernate Validator 5.1.0.Final
./fe.log-39-2022-09-26 09:07:14,188 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [StartupInfoLogger.logStarting():55] Starting PaloFe v1.0-SNAPSHOT using Java 1.8.0_261 on node1 with PID 20366 (/opt/bigdata/doris/apache-doris-fe-1.1.2-bin/fe/lib/doris-fe.jar started by root in /opt/bigdata/doris/apache-doris-fe-1.1.2-bin/fe)
./fe.log-40-2022-09-26 09:07:14,194 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [SpringApplication.logStartupProfileInfo():634] No active profile set, falling back to 1 default profile: "default"
./fe.log-41-2022-09-26 09:07:14,310 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [DeferredLog.logTo():255] For additional web related logging consider setting the 'logging.level.web' property to 'DEBUG'
./fe.log-42-2022-09-26 09:07:15,651 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [RepositoryConfigurationDelegate.registerRepositoriesIn():132] Bootstrapping Spring Data LDAP repositories in DEFAULT mode.
./fe.log-43-2022-09-26 09:07:15,682 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [RepositoryConfigurationDelegate.registerRepositoriesIn():201] Finished Spring Data repository scanning in 19 ms. Found 0 LDAP repository interfaces.
./fe.log-44-2022-09-26 09:07:17,021 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [JettyServletWebServerFactory.getWebServer():166] Server initialized with port: 8030
./fe.log-45-2022-09-26 09:07:17,104 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [ServletWebServerApplicationContext.prepareWebApplicationContext():292] Root WebApplicationContext: initialization completed in 2791 ms
./fe.log-46-2022-09-26 09:07:18,362 INFO (UNKNOWN 192.168.10.11_9010_1664154423601(-1)|1) [WelcomePageHandlerMapping.():53] Adding welcome page: class path resource [static/index.html]
You have new mail in /var/spool/mail/root
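In a deployment script you may want to wait for that log line instead of grepping by hand. A minimal sketch follows. wait_for_fe and its 60-second default are my own choices, and the marker string is the one searched for above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until "thrift server started" shows up in the FE log.
# Usage: wait_for_fe <path-to-fe.log> [timeout-seconds, default 60]
# Returns 0 once the marker is found, 1 if the timeout expires first.
wait_for_fe() {
  local log="$1" timeout="${2:-60}" waited=0
  # -q: quiet, -s: stay silent if the log file does not exist yet.
  until grep -qs "thrift server started" "$log"; do
    (( waited >= timeout )) && return 1
    sleep 1
    (( waited += 1 ))
  done
  return 0
}

# Demo with a throwaway log file that already contains the marker.
demo_log="$(mktemp)"
echo "INFO [FeServer.start():48] thrift server started." > "$demo_log"
wait_for_fe "$demo_log" 5 && echo "FE is up"

# Typical use after bin/start_fe.sh --daemon (run from the FE home directory):
#   wait_for_fe log/fe.log 120 || tail -n 50 log/fe.out
```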

RESTful API checks
Checking the FE
http://192.168.10.11:8030/api/bootstrap

[Image: response of the FE bootstrap API]

Checking the BE
http://192.168.10.11:8040/api/health

[Image: response of the BE health API on 192.168.10.11]

http://192.168.10.12:8040/api/health

[Image: response of the BE health API on 192.168.10.12]

http://192.168.10.13:8040/api/health

[Image: response of the BE health API on 192.168.10.13]
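The same URLs can be probed from the command line with curl instead of a browser. This is only a sketch: the helper names are hypothetical, and probe checks for an HTTP success code only; it does not inspect the response body.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the check URLs used above and probe them with curl.
# The default ports 8030 (FE) and 8040 (BE) are the ones used in this article.
fe_bootstrap_url() { printf 'http://%s:%s/api/bootstrap\n' "$1" "${2:-8030}"; }
be_health_url()    { printf 'http://%s:%s/api/health\n' "$1" "${2:-8040}"; }

# probe <url>: succeeds only if the endpoint answers with an HTTP success code within 5 s.
probe() { curl -fsS --max-time 5 -o /dev/null "$1"; }

# check_cluster <fe_host> <be_host>...: probe the FE, then every BE, and report each result.
check_cluster() {
  local fe="$1" host url rc=0
  shift
  url="$(fe_bootstrap_url "$fe")"
  if probe "$url"; then echo "OK   $url"; else echo "FAIL $url"; rc=1; fi
  for host in "$@"; do
    url="$(be_health_url "$host")"
    if probe "$url"; then echo "OK   $url"; else echo "FAIL $url"; rc=1; fi
  done
  return "$rc"
}

# Prints the URLs only; nothing is contacted here.
fe_bootstrap_url 192.168.10.11
be_health_url 192.168.10.12

# On a machine that can reach the cluster:
#   check_cluster 192.168.10.11 192.168.10.11 192.168.10.12 192.168.10.13
```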

MySQL client checks
Checking the FE with the MySQL client
mysql> show frontends\G;
*************************** 1. row ***************************
             Name: 192.168.10.11_9010_1664154423601
               IP: 192.168.10.11
      EditLogPort: 9010
         HttpPort: 8030
        QueryPort: 9030
          RpcPort: 9020
             Role: FOLLOWER
         IsMaster: true
        ClusterId: 1565676480
             Join: true
            Alive: true
ReplayedJournalId: 1829
    LastHeartbeat: 2022-09-26 10:48:15
         IsHelper: true
           ErrMsg:
          Version: 1.1.2-rc05-a8323dae4
 CurrentConnected: Yes
1 row in set (0.16 sec)

Checking the BE with the MySQL client
mysql> SHOW BACKENDS\G;
*************************** 1. row ***************************
            BackendId: 10002
              Cluster: default_cluster
                   IP: 192.168.10.11
        HeartbeatPort: 9050
               BePort: 9060
             HttpPort: 8040
             BrpcPort: 8060
        LastStartTime: 2022-09-26 10:37:47
        LastHeartbeat: 2022-09-26 10:49:10
                Alive: true
 SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 0
     DataUsedCapacity: 0.000
        AvailCapacity: 12.503 GB
        TotalCapacity: 16.986 GB
              UsedPct: 26.39 %
       MaxDiskUsedPct: 26.39 %
                  Tag: {"location" : "default"}
               ErrMsg:
              Version: 1.1.2-rc05-a8323dae4
               Status: {"lastSuccessReportTabletsTime":"2022-09-26 10:48:23","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
*************************** 2. row ***************************
            BackendId: 10003
              Cluster: default_cluster
                   IP: 192.168.10.12
        HeartbeatPort: 9050
               BePort: 9060
             HttpPort: 8040
             BrpcPort: 8060
        LastStartTime: 2022-09-26 10:13:51
        LastHeartbeat: 2022-09-26 10:49:10
                Alive: true
 SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 0
     DataUsedCapacity: 0.000
        AvailCapacity: 14.046 GB
        TotalCapacity: 16.986 GB
              UsedPct: 17.31 %
       MaxDiskUsedPct: 17.31 %
                  Tag: {"location" : "default"}
               ErrMsg:
              Version: 1.1.2-rc05-a8323dae4
               Status: {"lastSuccessReportTabletsTime":"2022-09-26 10:48:28","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
*************************** 3. row ***************************
            BackendId: 10004
              Cluster: default_cluster
                   IP: 192.168.10.13
        HeartbeatPort: 9050
               BePort: 9060
             HttpPort: 8040
             BrpcPort: 8060
        LastStartTime: 2022-09-26 10:13:10
        LastHeartbeat: 2022-09-26 10:49:10
                Alive: true
 SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 0
     DataUsedCapacity: 0.000
        AvailCapacity: 14.261 GB
        TotalCapacity: 16.986 GB
              UsedPct: 16.04 %
       MaxDiskUsedPct: 16.04 %
                  Tag: {"location" : "default"}
               ErrMsg:
              Version: 1.1.2-rc05-a8323dae4
               Status: {"lastSuccessReportTabletsTime":"2022-09-26 10:48:55","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
3 rows in set (0.01 sec)

Web UI
Log in to the Web UI as user root and leave the password empty.

After logging in, the Web UI is displayed.

In the Playground you can run SQL commands against the Apache Doris cluster.
First use
Creating a database
create database demo;

Creating a table
use demo;

CREATE TABLE IF NOT EXISTS demo.example_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "",
    `city` VARCHAR(20) COMMENT "",
    `age` SMALLINT COMMENT "",
    `sex` TINYINT COMMENT "",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT ""
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);

Sample data
10000,2017-10-01,beijing,20,0,2017-10-01 06:00:00,20,10,10
10006,2017-10-01,beijing,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,beijing,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,shanghai,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,guangzhou,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,shenzhen,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,shenzhen,35,0,2017-10-03 10:20:22,11,6,6

Save the data above to a file named test.csv.
Loading data
Here we use _stream_load to load the data saved in that file into the table we just created.
[root@node1 ~]# curl --location-trusted -u root: -T test.csv -H "column_separator:," http://node1:8030/api/demo/example_tbl/_stream_load
{
    "TxnId": 2,
    "Label": "c6780496-ac8f-4784-b059-013dec735511",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 7,
    "NumberLoadedRows": 7,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 411,
    "LoadTimeMs": 399,
    "BeginTxnTimeMs": 54,
    "StreamLoadPutTimeMs": 229,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 18,
    "CommitAndPublishTimeMs": 93
}
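When the load runs inside a script, the result has to be read from the "Status" field of the JSON response rather than by eye. The sketch below extracts that field with sed, so no JSON tool is required. Both helper names are hypothetical, and treating only "Success" as a pass is my own conservative assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the "Status" field of a Stream Load response (stdin).
stream_load_status() {
  sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# stream_load_ok: succeed only when Status is exactly "Success", as in the response above.
stream_load_ok() {
  [ "$(stream_load_status)" = "Success" ]
}

# Demo with a shortened copy of the response shown above.
printf '{ "TxnId": 2, "Status": "Success", "Message": "OK" }\n' | stream_load_status

# Real use (same curl command as above, plus -s to hide the progress meter):
#   curl -s --location-trusted -u root: -T test.csv -H "column_separator:," \
#     http://node1:8030/api/demo/example_tbl/_stream_load | stream_load_ok || echo "load failed"
```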

Querying data
mysql> select * from example_tbl;
+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
| user_id | date       | city      | age  | sex  | last_visit_date     | cost | max_dwell_time | min_dwell_time |
+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
| 10000   | 2017-10-01 | beijing   |   20 |    0 | 2017-10-01 06:00:00 |   20 |             10 |             10 |
| 10001   | 2017-10-01 | beijing   |   30 |    1 | 2017-10-01 17:05:45 |    2 |             22 |             22 |
| 10002   | 2017-10-02 | shanghai  |   20 |    1 | 2017-10-02 12:59:12 |  200 |              5 |              5 |
| 10003   | 2017-10-02 | guangzhou |   32 |    0 | 2017-10-02 11:20:00 |   30 |             11 |             11 |
| 10004   | 2017-10-01 | shenzhen  |   35 |    0 | 2017-10-01 10:00:15 |  100 |              3 |              3 |
| 10004   | 2017-10-03 | shenzhen  |   35 |    0 | 2017-10-03 10:20:22 |   11 |              6 |              6 |
| 10006   | 2017-10-01 | beijing   |   20 |    0 | 2017-10-01 07:00:00 |   15 |              2 |              2 |
+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
7 rows in set (0.17 sec)

mysql> select * from example_tbl where city='shanghai';
+---------+------------+----------+------+------+---------------------+------+----------------+----------------+
| user_id | date       | city     | age  | sex  | last_visit_date     | cost | max_dwell_time | min_dwell_time |
+---------+------------+----------+------+------+---------------------+------+----------------+----------------+
| 10002   | 2017-10-02 | shanghai |   20 |    1 | 2017-10-02 12:59:12 |  200 |              5 |              5 |
+---------+------------+----------+------+------+---------------------+------+----------------+----------------+
1 row in set (0.14 sec)

mysql> select city, sum(cost) as total_cost from example_tbl group by city;
+-----------+------------+
| city      | total_cost |
+-----------+------------+
| beijing   |         37 |
| shenzhen  |        111 |
| guangzhou |         30 |
| shanghai  |        200 |
+-----------+------------+
4 rows in set (0.16 sec)
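As a quick cross-check of the last query, the same totals can be recomputed locally from the CSV with awk. This is only a sanity check on the sample data, not something Doris needs. cost_by_city is a made-up name, and the field positions (3 = city, 7 = cost) follow the column order of the table definition.

```shell
#!/usr/bin/env bash
# Sketch: recompute sum(cost) per city from the CSV (field 3 = city, field 7 = cost).
cost_by_city() {
  awk -F',' '{ sum[$3] += $7 } END { for (c in sum) print c, sum[c] }' "$1" | sort
}

# Demo: recreate the sample data in a temp file standing in for test.csv.
csv="$(mktemp)"
cat > "$csv" <<'EOF'
10000,2017-10-01,beijing,20,0,2017-10-01 06:00:00,20,10,10
10006,2017-10-01,beijing,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,beijing,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,shanghai,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,guangzhou,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,shenzhen,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,shenzhen,35,0,2017-10-03 10:20:22,11,6,6
EOF
cost_by_city "$csv"
```

The totals should agree with the total_cost column above: beijing 37, guangzhou 30, shanghai 200, shenzhen 111.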
