Spark: the combineByKey operator

combineByKey is a transformation operator, and it incurs a shuffle.

combineByKey takes three functions:
The first turns the first value seen for a key (within a partition) into an initial combiner.
The second, within each partition, appends further values with the same key to that combiner (local aggregation).
The third merges the combiners of the same key across partitions (global aggregation).
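A minimal sketch of how these three roles line up with combineByKey's parameters (this assumes a Spark dependency on the classpath; the parameter names createCombiner, mergeValue, and mergeCombiners follow the Spark RDD API):

```scala
import org.apache.spark.rdd.RDD

// Collect all values of each key into a List, using combineByKey's three functions.
def collectValues(rdd: RDD[(Int, String)]): RDD[(Int, List[String])] =
  rdd.combineByKey(
    (v: String) => List(v),                       // createCombiner: first value of a key in a partition
    (acc: List[String], v: String) => acc :+ v,   // mergeValue: partition-local append
    (a: List[String], b: List[String]) => a ++ b  // mergeCombiners: cross-partition merge
  )
```

The combiner type (here List[String]) may differ from the value type (String), which is what distinguishes combineByKey from reduceByKey.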

val a = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val b = sc.parallelize(List(1, 1, 2, 2, 2, 1, 2, 2, 2), 3)
val rdd3: RDD[(Int, String)] = b.zip(a)
// rdd3 = ArrayBuffer((1,dog), (1,cat), (2,gnu), (2,salmon), (2,rabbit), (1,turkey), (2,wolf), (2,bear), (2,bee))
val value = rdd3.combineByKey(
  x => List(x),
  (a: List[String], b: String) => a :+ b,
  (w: List[String], q: List[String]) => w ++ q
)

The first function wraps each key's first value in a List, so the first argument of the second function is also a List; values with the same key in the same partition are appended to it (local aggregation). The third function then concatenates the Lists of the same key across partitions (global aggregation).
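The two-phase aggregation above can be traced with plain Scala collections, without Spark. This is only an illustration of the semantics (the partition layout is assumed from the example's three-way split; Spark's actual implementation is different):

```scala
object CombineByKeyTrace extends App {
  // The three partitions of rdd3 from the example above (assumed layout).
  val partitions = List(
    List((1, "dog"), (1, "cat"), (2, "gnu")),
    List((2, "salmon"), (2, "rabbit"), (1, "turkey")),
    List((2, "wolf"), (2, "bear"), (2, "bee"))
  )

  val createCombiner = (v: String) => List(v)
  val mergeValue     = (acc: List[String], v: String) => acc :+ v
  val mergeCombiners = (a: List[String], b: List[String]) => a ++ b

  // Phase 1: partition-local aggregation (createCombiner + mergeValue).
  val locals: List[Map[Int, List[String]]] = partitions.map { part =>
    part.foldLeft(Map.empty[Int, List[String]]) { case (m, (k, v)) =>
      m.updated(k, m.get(k).map(mergeValue(_, v)).getOrElse(createCombiner(v)))
    }
  }

  // Phase 2: merge combiners of the same key across partitions (mergeCombiners).
  val global = locals.flatten.groupBy(_._1).map { case (k, kvs) =>
    k -> kvs.map(_._2).reduce(mergeCombiners)
  }

  // global: 1 -> List(dog, cat, turkey), 2 -> List(gnu, salmon, rabbit, wolf, bear, bee)
  println(global)
}
```

Phase 1 runs before the shuffle (map-side), so only the per-partition Lists, not every raw value, travel over the network in phase 2.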
