昆明快速建站模板,きょこんきょうしゃ在线,王占山事迹,wordpress搭建网盘创建RDD
在Spark中创建RDD的方式分为三种:
从外部存储创建RDD从集合中创建RDD从其他RDD创建
textfile
调用SparkContext.textFile()方法#xff0c;从外部存储中读取数据来创建 RDD
parallelize
调用SparkContext 的 parallelize()方法#xff0c;将一个存在的集合从外部存储中读取数据来创建 RDD
parallelize
调用SparkContext 的 parallelize()方法将一个存在的集合变成一个RDD
makeRDD
方法一
/** Distribute a local Scala collection to form an RDD.** This method is identical to parallelize.*/def makeRDD[T: ClassTag](seq: Seq[T],numSlices: Int defaultParallelism): RDD[T] withScope {parallelize(seq, numSlices)}
方法二分配一个本地Scala集合形成一个RDD为每个集合对象创建一个最佳分区。
/*** Distribute a local Scala collection to form an RDD, with one or more* location preferences (hostnames of Spark nodes) for each object.* Create a new partition for each collection item.*/def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] withScope {assertNotStopped()val indexToPrefs seq.zipWithIndex.map(t (t._2, t._1._2)).toMapnew ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, 1), indexToPrefs)} 举例
scala val rdd sc.parallelize(1 to 6, 2)
val rdd: org.apache.spark.rdd.RDD[Int] ParallelCollectionRDD[2] at parallelize at console:1scala rdd.collect()
val res4: Array[Int] Array(1, 2, 3, 4, 5, 6)scala val seq List((American Person, List(Tom, Jim)), (China Person, List(LiLei, HanMeiMei)), (Color Type, List(Red, Blue)))
val seq: List[(String, List[String])] List((American Person,List(Tom, Jim)), (China Person,List(LiLei, HanMeiMei)), (Color Type,List(Red, Blue)))scala val rdd2 sc.makeRDD(seq)
val rdd2: org.apache.spark.rdd.RDD[String] ParallelCollectionRDD[0] at makeRDD at console:1scala rdd2.partitions.size
val res0: Int 3scala rdd2.foreach(println)
American Person
Color Type
China Personscala val rdd1 sc.parallelize(seq)
val rdd1: org.apache.spark.rdd.RDD[(String, List[String])] ParallelCollectionRDD[1] at parallelize at console:1scala rdd1.partitions.size
val res1: Int 2scala rdd2.collect()
val res2: Array[String] Array(American Person, China Person, Color Type)scala rdd1.collect()
val res3: Array[(String, List[String])] Array((American Person,List(Tom, Jim)), (China Person,List(LiLei, HanMeiMei)), (Color Type,List(Red, Blue)))scala var lines sc.textFile(/root/tmp/a.txt,3)
var lines: org.apache.spark.rdd.RDD[String] /root/tmp/a.txt MapPartitionsRDD[4] at textFile at console:1scala lines.collect()
val res6: Array[String] Array(a,b,c)scala lines.partitions.size
val res7: Int 3转换算子
flatMap
map
reduceByKey
groupByKey
举例
scala var lines sc.textFile(/root/tmp/a.txt,3)
var lines: org.apache.spark.rdd.RDD[String] /root/tmp/a.txt MapPartitionsRDD[13] at textFile at console:1scala lines.flatMap(xx.split(,)).map(x(x,1)).reduceByKey((a,b)ab).foreach(println)
(c,2)
(b,1)
(d,1)
(a,2)scala lines.collect()
val res27: Array[String] Array(a,b,c, c, a,d)scala lines.map(_.split(,)).collect()
val res25: Array[Array[String]] Array(Array(a, b, c), Array(c), Array(a, d))scala lines.flatMap(_.split(,)).collect()
val res26: Array[String] Array(a, b, c, c, a, d)
行动算子