This article covers the hoodie.datasource.hive_sync.partition_extractor_class configuration used when syncing Hudi to Hive.
When a single field such as location is used as the partition field, the core settings are as follows
DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location;
hoodie.datasource.hive_sync.partition_fields is set to location, the same as the partition field written to Hudi;
DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.SimpleKeyGenerator, or left unset, since org.apache.hudi.keygen.SimpleKeyGenerator is the default;
hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor.
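The settings above can be sketched as a plain options dictionary, for example for a PySpark writer. This is a minimal sketch: the key strings are the configuration names behind the DataSourceWriteOptions constants, the variable name is hypothetical, and other options a real job needs (table name, record key, enabling Hive sync) are left out.

```python
# Sketch: writer options for a table partitioned by the single field "location".
single_partition_options = {
    # DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY()
    "hoodie.datasource.write.partitionpath.field": "location",
    # DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(); may be omitted, this is the default
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.SimpleKeyGenerator",
    # Must match the partition field written to Hudi
    "hoodie.datasource.hive_sync.partition_fields": "location",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}

# Hypothetical usage, assuming a Spark DataFrame `df` and a target `base_path`:
# df.write.format("hudi").options(**single_partition_options).mode("append").save(base_path)
```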
With these settings, Hudi syncs to Hive and creates a table partitioned by location.
When the date field is used as the partition field, the core settings are as follows
DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to date;
hoodie.datasource.hive_sync.partition_fields is set to date, the same as the partition field written to Hudi;
DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.SimpleKeyGenerator, or left unset, since org.apache.hudi.keygen.SimpleKeyGenerator is the default;
hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor, which expects partition paths in yyyy/mm/dd form.
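To make the role of the slash-encoded day extractor concrete, here is a small Python imitation of the mapping it performs, as I understand it: a three-part yyyy/mm/dd partition path becomes a single yyyy-mm-dd partition value. This is an illustration only, not the library's code; the function name and the sample path are hypothetical.

```python
from datetime import date


def slash_encoded_day_to_partition_value(partition_path: str) -> str:
    """Turn a yyyy/mm/dd partition path into one yyyy-mm-dd partition value."""
    parts = partition_path.split("/")
    if len(parts) != 3:
        # Anything other than three path segments cannot be read as a day
        raise ValueError(
            f"Partition path {partition_path!r} is not in the form yyyy/mm/dd"
        )
    year, month, day = (int(p) for p in parts)
    # date() also rejects impossible dates such as month 13
    return date(year, month, day).isoformat()


print(slash_encoded_day_to_partition_value("2020/08/02"))  # prints 2020-08-02
```

A path such as beijing or beijing/male fails the three-segment check, which is why this extractor only suits date-shaped partition values.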
Multi-partition refers to the case where several fields together serve as the partition fields, for example location and sex. The core settings are as follows
DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;
hoodie.datasource.hive_sync.partition_fields is set to location,sex, the same as the partition fields written to Hudi;
DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;
hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor.
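The multi-part extractor has to turn one partition path into one value per partition field. The following Python sketch imitates that idea under my own assumptions (one path segment per field, with an optional key= prefix on each segment); it is not the Hudi implementation, and the function name is hypothetical.

```python
from typing import List


def extract_multi_part_values(partition_path: str) -> List[str]:
    """Split a partition path into one value per partition field."""
    values = []
    for segment in partition_path.split("/"):
        if "=" in segment:
            # Hive-style segment such as "location=beijing": keep only the value
            _key, _sep, value = segment.partition("=")
            values.append(value)
        else:
            # Plain segment such as "beijing"
            values.append(segment)
    return values


print(extract_multi_part_values("beijing/male"))  # prints ['beijing', 'male']
```

The number of extracted values should equal the number of fields listed in hoodie.datasource.hive_sync.partition_fields, here location and sex.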
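The non-partitioned case means there is no partition field and the dataset written to Hudi has no partitions. The core settings are as follows
DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to an empty string;
hoodie.datasource.hive_sync.partition_fields is set to an empty string, the same as the partition field written to Hudi;
DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.NonpartitionedKeyGenerator;
hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.NonPartitionedExtractor.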
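Every scenario in this article requires hoodie.datasource.hive_sync.partition_fields to match the partition field written to Hudi, and the non-partitioned case is no exception: both are empty. A minimal sketch of the options together with a sanity check for that rule follows; the helper function and variable names are hypothetical, not part of Hudi.

```python
# Sketch: writer options for a non-partitioned table.
non_partitioned_options = {
    "hoodie.datasource.write.partitionpath.field": "",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.datasource.hive_sync.partition_fields": "",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
}


def partition_fields_match(options: dict) -> bool:
    """Return True when the Hive sync partition fields equal the Hudi write partition fields."""
    write_fields = options.get("hoodie.datasource.write.partitionpath.field", "")
    sync_fields = options.get("hoodie.datasource.hive_sync.partition_fields", "")
    return write_fields == sync_fields


print(partition_fields_match(non_partitioned_options))  # prints True
```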
With these settings, Hudi syncs to Hive and creates a non-partitioned table.
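Besides the common partitioning schemes above, there is also the Hive-style partition format, such as location=beijing/sex=male. With location,sex as the partition fields, the core settings are as follows
DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;
hoodie.datasource.hive_sync.partition_fields is set to location,sex, the same as the partition fields written to Hudi;
DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;
hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor, because the partition path has one segment per field rather than the yyyy/mm/dd form that SlashEncodedDayPartitionValueExtractor expects;
DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY() is set to true.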
The generated Hudi dataset directory structure then follows the location=beijing/sex=male format.
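The effect of the Hive-style switch on the directory layout can be sketched in a few lines of Python. This only illustrates the path shape described here (field=value segments joined by slashes); it is not Hudi's key generator, and the function name and sample record are hypothetical.

```python
from typing import Dict, List


def build_partition_path(
    record: Dict[str, str], partition_fields: List[str], hive_style: bool
) -> str:
    """Join the partition field values into a relative partition path."""
    segments = []
    for field in partition_fields:
        value = str(record[field])
        # Hive style prefixes every value with its field name
        segments.append(f"{field}={value}" if hive_style else value)
    return "/".join(segments)


record = {"location": "beijing", "sex": "male"}
print(build_partition_path(record, ["location", "sex"], hive_style=True))
# prints location=beijing/sex=male
print(build_partition_path(record, ["location", "sex"], hive_style=False))
# prints beijing/male
```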