Converting a Spark RDD to a List or Set

I needed to convert an RDD into a List. The solutions I found online were all fairly complicated, but it turns out to be very simple: the result of collect() is already an Array. Look at the source:

/**
 * Return an array that contains all of the elements in this RDD.
 *
 * @note This method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 */
def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}

After that, just append toList or toSet. Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

object Spark_kudu {
  def main(args: Array[String]): Unit = {
    // Run locally for this small example; point the master at your cluster in real use
    val conf = new SparkConf().setAppName("rdd-to-list-set").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val arr = Array(1, 2, 3, 4, 5)
    val rdd = sc.parallelize(arr)

    // collect() brings the data back to the driver as an Array;
    // toList / toSet then convert it locally
    val list: List[Int] = rdd.collect().toList
    val set: Set[Int] = rdd.collect().toSet

    sc.stop()
  }
}
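One caveat, as the @note in the collect() source above warns: the whole RDD is loaded into the driver's memory, so this is only appropriate for small results. For larger data, a minimal sketch of an alternative using the standard RDD.toLocalIterator method (which fetches one partition at a time) might look like the following; the variable names here are just for illustration:

// Assumes `sc` is an existing SparkContext, as in the example above
val bigRdd = sc.parallelize(1 to 1000000)

// toLocalIterator streams one partition at a time to the driver,
// so the driver never has to hold the entire RDD as a single Array
val it: Iterator[Int] = bigRdd.toLocalIterator

// Building a full List still ends up on the driver, but without the
// intermediate Array that collect() would materialize first
val bigList: List[Int] = it.toList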

