hoodie.datasource.hive_sync.partition_extractor_class configuration

This article explains how to configure hoodie.datasource.hive_sync.partition_extractor_class, together with the related partition settings, when syncing a Hudi dataset to Hive.
A single-partition table uses one field as the partition field, for example location. The core settings are as follows:

DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location;

hoodie.datasource.hive_sync.partition_fields is set to location, the same as the partition field used when writing to Hudi;

DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.SimpleKeyGenerator, or left unset, since it defaults to org.apache.hudi.keygen.SimpleKeyGenerator;

hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor;
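In a PySpark job these settings are usually passed as plain string options. The sketch below assumes PySpark and uses the string keys that the DataSourceWriteOptions constants resolve to (for example, PARTITIONPATH_FIELD_OPT_KEY() is hoodie.datasource.write.partitionpath.field). The helper name single_partition_options is hypothetical, and a real job also needs other options, such as the table name and record key, that this article does not cover.

```python
from typing import Dict


def single_partition_options(partition_field: str = "location") -> Dict[str, str]:
    """Hypothetical helper: partition-related Hudi options for one partition field."""
    return {
        # DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY()
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Must name the same field that is used when writing to Hudi
        "hoodie.datasource.hive_sync.partition_fields": partition_field,
        # DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(); SimpleKeyGenerator is also the default
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.SimpleKeyGenerator",
        # Turns each partition path into Hive partition values
        "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    }


opts = single_partition_options()
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

Given a DataFrame df, these could be applied with df.write.format("hudi").options(**opts) alongside the remaining write and Hive sync options.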

The table that Hudi creates in Hive through the sync is as follows:

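If the date field is used as the partition field, the core settings are as follows:

DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to date;

hoodie.datasource.hive_sync.partition_fields is set to date, the same as the partition field used when writing to Hudi;

DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.SimpleKeyGenerator, or left unset, since it defaults to org.apache.hudi.keygen.SimpleKeyGenerator;

hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor; this extractor expects partition paths of the form yyyy/mm/dd, so the date values must use that format, and it syncs each one to Hive as a single yyyy-MM-dd partition value;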
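To see why the date values must follow yyyy/mm/dd, here is a small Python re-creation of the parsing that SlashEncodedDayPartitionValueExtractor performs on each partition path. It is a sketch of the behavior, not Hudi's Java source, and the function name extract_day_partition is hypothetical.

```python
import datetime
from typing import List


def extract_day_partition(partition_path: str) -> List[str]:
    """Sketch of SlashEncodedDayPartitionValueExtractor: 'yyyy/mm/dd' -> ['yyyy-mm-dd']."""
    parts = partition_path.split("/")
    if len(parts) != 3:
        raise ValueError(f"Partition path {partition_path} is not in the form yyyy/mm/dd")
    # A segment may also be Hive-style (key=value); only the value is parsed
    year, month, day = (int(p.split("=")[1]) if "=" in p else int(p) for p in parts)
    # datetime.date rejects impossible dates such as 2021/02/30
    return [datetime.date(year, month, day).isoformat()]


print(extract_day_partition("2021/03/01"))  # ['2021-03-01']
```

A path such as 2021-03-01 or beijing/male raises an error in this sketch, which is why this extractor only suits yyyy/mm/dd date partitions.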

A multi-partition table uses several fields together as partition fields, for example location and sex. The core settings are as follows:

DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;

hoodie.datasource.hive_sync.partition_fields is set to location,sex, the same as the partition fields used when writing to Hudi;

DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;

hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor;
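MultiPartKeysValueExtractor splits the partition path on / and yields one Hive partition value per segment. Hive sync then pairs these values, in order, with the fields listed in hoodie.datasource.hive_sync.partition_fields, so the two counts must match. The Python sketch below re-creates the extraction step for illustration only; it is not the Hudi implementation, and extract_multi_part_values is a hypothetical name.

```python
from typing import List


def extract_multi_part_values(partition_path: str) -> List[str]:
    """Sketch of MultiPartKeysValueExtractor: one partition value per path segment."""
    if partition_path == "":
        return []  # empty path: nothing to extract
    values = []
    for segment in partition_path.split("/"):
        if "=" in segment:
            # Hive-style segment such as location=beijing: keep only the value
            pieces = segment.split("=")
            if len(pieces) != 2:
                raise ValueError(f"Partition field ({segment}) not in expected format")
            values.append(pieces[1])
        else:
            values.append(segment)
    return values


fields = "location,sex".split(",")
values = extract_multi_part_values("beijing/male")
print(dict(zip(fields, values)))  # {'location': 'beijing', 'sex': 'male'}
```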

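A non-partitioned table has no partition field, so the dataset written to Hudi is not partitioned. The core settings are as follows:

DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to an empty string;

hoodie.datasource.hive_sync.partition_fields is set to an empty string, the same as the partition field used when writing to Hudi;

DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.NonpartitionedKeyGenerator;

hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.NonPartitionedExtractor;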
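In every scenario the four settings have to agree with one another. The hypothetical check_partition_options helper below encodes the pairings described in this article (it is not part of Hudi) and flags combinations that break them, using the non-partitioned settings as the example input.

```python
from typing import Dict, List

# String keys behind the options discussed in this article
PARTITION_PATH = "hoodie.datasource.write.partitionpath.field"
KEY_GENERATOR = "hoodie.datasource.write.keygenerator.class"
SYNC_FIELDS = "hoodie.datasource.hive_sync.partition_fields"
EXTRACTOR = "hoodie.datasource.hive_sync.partition_extractor_class"

SIMPLE = "org.apache.hudi.keygen.SimpleKeyGenerator"
COMPLEX = "org.apache.hudi.keygen.ComplexKeyGenerator"
NON_PARTITIONED = "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
NON_PARTITIONED_EXTRACTOR = "org.apache.hudi.hive.NonPartitionedExtractor"


def _fields(value: str) -> List[str]:
    """Split a comma-separated field list, ignoring blanks."""
    return [f.strip() for f in value.split(",") if f.strip()]


def check_partition_options(opts: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list violations of this article's pairing rules."""
    problems = []
    write_fields = _fields(opts.get(PARTITION_PATH, ""))
    if _fields(opts.get(SYNC_FIELDS, "")) != write_fields:
        problems.append("hive_sync.partition_fields must match the write partition path fields")
    # SimpleKeyGenerator applies when no key generator is configured
    key_gen = opts.get(KEY_GENERATOR, SIMPLE)
    if not write_fields:
        if key_gen != NON_PARTITIONED:
            problems.append("a non-partitioned table needs NonpartitionedKeyGenerator")
        if opts.get(EXTRACTOR) != NON_PARTITIONED_EXTRACTOR:
            problems.append("a non-partitioned table needs NonPartitionedExtractor")
    elif len(write_fields) > 1 and key_gen != COMPLEX:
        problems.append("multiple partition fields need ComplexKeyGenerator")
    return problems


non_partitioned = {
    PARTITION_PATH: "",
    SYNC_FIELDS: "",
    KEY_GENERATOR: NON_PARTITIONED,
    EXTRACTOR: NON_PARTITIONED_EXTRACTOR,
}
print(check_partition_options(non_partitioned))  # []
```

The helper checks only the rules stated in this article; Hudi itself may accept other combinations, so treat it as a sanity check rather than a validator.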

The table that Hudi creates in Hive through the sync is as follows:

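Besides the common partitioning schemes above, Hudi also supports Hive-style partition paths such as location=beijing/sex=male. With location and sex as the partition fields, the core settings are as follows:

DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;

hoodie.datasource.hive_sync.partition_fields is set to location,sex, the same as the partition fields used when writing to Hudi;

DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;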

hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor, which keeps only the value part of each key=value segment;

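DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY() is set to true;

The generated Hudi dataset directories then look like location=beijing/sex=male, with one key=value directory level per partition field.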
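The difference between the two directory layouts can be sketched in a few lines of Python. build_partition_path is a hypothetical helper that mimics how a multi-field partition path is assembled with and without Hive-style partitioning; it ignores details such as null values and URL encoding. partition_values applies the MultiPartKeysValueExtractor rule of keeping only the text after = in each well-formed segment, so both layouts sync the same values to Hive.

```python
from typing import Dict, List


def build_partition_path(record: Dict[str, str], fields: List[str], hive_style: bool) -> str:
    """Hypothetical sketch: join partition field values into a directory path."""
    segments = []
    for field in fields:
        value = record[field]
        # hive_style_partitioning=true writes each segment as field=value
        segments.append(f"{field}={value}" if hive_style else value)
    return "/".join(segments)


def partition_values(path: str) -> List[str]:
    """Keep only the value part of each well-formed segment."""
    return [segment.split("=", 1)[-1] for segment in path.split("/")]


record = {"location": "beijing", "sex": "male"}
plain = build_partition_path(record, ["location", "sex"], hive_style=False)
hive = build_partition_path(record, ["location", "sex"], hive_style=True)
print(plain)  # beijing/male
print(hive)   # location=beijing/sex=male
print(partition_values(plain) == partition_values(hive))  # True
```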
