DataX安装及基本使用

欠伸展肢体,吟咏心自愉。这篇文章主要讲述DataX安装及基本使用相关的知识,希望能为你提供帮助。


前置准备这里我们演示 mysql 和 HDFS 之间的数据导入导出,需要预先安装 Hadoop集群。Hadoop 集群的安装教程如下:
?


一、DataX 概述DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。

二、安装 2.1 下载并解压
?
这里我下载的是最新版本的 DataX3.0 。?

# 下载后进行解压
[xiaokang@hadoop ~]$ tar -zxvf datax.tar.gz -C /opt/software/

2.2 运行自检脚本
[xiaokang@hadoop ~]$ cd /opt/software/datax/
[xiaokang@hadoop datax]$ bin/datax.py job/job.json

出现以下界面说明DataX安装成功

三、基本使用 3.1 从stream读取数据并打印到控制台
1. 查看官方json配置模板
[xiaokang@hadoop ~]$ python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the streamreader document:
https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

Please save the following configuration as a json file anduse
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
"job": {
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column": [],
"sliceRecordCount": ""
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"encoding": "",
"print": true
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}

2. 根据模板编写json文件
{
"job": {
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column": [
{
"type":"string",
"
},
{
"type":"string",
"value":"你好,世界-DataX"
}
],
"sliceRecordCount": "10"
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"encoding": "utf-8",
"print": true
}
}
}
],
"setting": {
"speed": {
"channel": "2"
}
}
}
}

3. 运行Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./stream2stream.json

DataX安装及基本使用

文章图片

3.2 MySQL数据导入到HDFS
示例:导出 MySQL 数据库中的 ??help_keyword??? 表到 HDFS 的 ??/datax??目录下(此目录必须提前创建)。


注:help_keyword 是 MySQL 内置的一张字典表,之后的示例均使用这张表。


1. 查看官方json配置模板
[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the hdfswriter document:
https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md

Please save the following configuration as a json file anduse
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": [],
"table": []
}
],
"password": "",
"username": "",
"where": ""
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [],
"compress": "",
"defaultFS": "",
"fieldDelimiter": "",
"fileName": "",
"fileType": "",
"path": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}

2. 根据模板编写json文件mysqlreader参数解析:
DataX安装及基本使用

文章图片

hdfswriter参数解析:
DataX安装及基本使用

文章图片

{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [
"help_keyword_id",
"name"
],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://192.168.1.106:3306/mysql"
],
"table": [
"help_keyword"
]
}
],
"password": "xiaokang",
"username": "root"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{
"name":"help_keyword_id",
"type":"int"
},
{
"name":"name",
"type":"string"
}
],
"defaultFS": "hdfs://hadoop:9000",
"fieldDelimiter": "|",
"fileName": "keyword.txt",
"fileType": "text",
"path": "/datax",
"writeMode": "append"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}

3. 运行Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2hdfs.json

3.3 HDFS数据导出到MySQL
1. 将3.2中导入的文件重命名并在数据库创建表
[xiaokang@hadoop ~]$ hdfs dfs -mv /datax/keyword.txt__4c0e0d04_e503_437a_a1e3_49db49cbaaed /datax/keyword.txt

表必须预先创建,建表语句如下:
CREATE TABLE help_keyword_from_hdfs_datax LIKE help_keyword;

2. 查看官方json配置模板
[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r hdfsreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the hdfsreader document:
https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md

Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file anduse
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
"job": {
"content": [
{
"reader": {
"name": "hdfsreader",
"parameter": {
"column": [],
"defaultFS": "",
"encoding": "UTF-8",
"fieldDelimiter": ",",
"fileType": "orc",
"path": ""
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": "",
"table": []
}
],
"password": "",
"preSql": [],
"session": [],
"username": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}

3. 根据模板编写json文件
{
"job": {
"content": [
{
"reader": {
"name": "hdfsreader",
"parameter": {
"column": [
"*"
],
"defaultFS": "hdfs://hadoop:9000",
"encoding": "UTF-8",
"fieldDelimiter": "|",
"fileType": "text",
"path": "/datax/keyword.txt"
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [
"help_keyword_id",
"name"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.1.106:3306/mysql",
"table": ["help_keyword_from_hdfs_datax"]
}
],
"password": "xiaokang",
"username": "root",
"writeMode": "insert"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}

3. 运行Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./hdfs2mysql.json

【DataX安装及基本使用】


    推荐阅读