Spark GraphX Graph Computing: A Summary of Graph Constructors

The ways to build a graph in Spark GraphX are summarized below:

1. Constructing a graph with the most basic method

Use the apply method of the Graph companion object, which allows a graph to be created from a vertex RDD and an edge RDD.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by lichangyue on 2016/9/13.
 */
object FirstGraph1 {
  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.ERROR)

    val conf = new SparkConf().setAppName("FirstGraph").setMaster("local")
    val sc = new SparkContext(conf)

    // Vertex RDD: (vertexId, (name, role))
    val users = sc.parallelize(Array(
      (3L, ("rxin", "student")),
      (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof")),
      (2L, ("istoica", "prof"))))

    // Edge RDD: Edge(srcId, dstId, relationship)
    val relationships = sc.parallelize(Array(
      Edge(3L, 7L, "collab"),
      Edge(5L, 3L, "advisor"),
      Edge(2L, 5L, "colleague"),
      Edge(5L, 7L, "pi")))

    // Default attribute for vertices that appear in edges but not in `users`
    val defaultUser = ("John Doe", "Missing")

    // Build the graph
    val graph = Graph(users, relationships, defaultUser)

    // Pattern match on the vertex attribute
    println("count :" + graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count)
    graph.edges.filter(e => e.srcId > e.dstId).count()
  }
}
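The defaultUser attribute only comes into play when an edge references a vertex id that is missing from users. A minimal sketch of that case, assuming the same users, relationships, and defaultUser as above; the extra Edge(4L, 3L, "friend") and the variable name withMissing are illustrative, not part of the original example:

    // Illustrative only: 4L does not exist in `users`, so Graph() creates that
    // vertex with the default attribute ("John Doe", "Missing").
    val withMissing = Graph(
      users,
      relationships ++ sc.parallelize(Array(Edge(4L, 3L, "friend"))),
      defaultUser)

    // Prints (4,(John Doe,Missing))
    withMissing.vertices
      .filter { case (_, (name, _)) => name == "John Doe" }
      .collect()
      .foreach(println)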

2. Constructing a graph by reading data from a file

The interface for building a graph from file data is the edgeListFile method of GraphLoader.

GraphLoader.edgeListFile loads a graph from a file whose contents look like this:

2 1
4 1
1 2

Vertices are created automatically from the edges, and all edge and vertex attributes default to 1.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.{SparkConf, SparkContext}

object GraphFromFile2 {
  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.ERROR)

    // val conf = new SparkConf().setAppName("graph2").setMaster("spark://10.58.22.219:7077")
    val conf = new SparkConf().setAppName("graph2").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // val graph = GraphLoader.edgeListFile(sc, "hdfs://S7SA053:8020/stat/web-Google.txt")
    // Specify the number of edge partitions
    val graph = GraphLoader.edgeListFile(sc, "hdfs://S7SA053:8020/stat/web-Google.txt",
      numEdgePartitions = 4)

    // Number of vertices
    val vcount = graph.vertices.count()
    println("vcount:" + vcount)

    // Number of edges
    val ecount = graph.edges.count()
    println("ecount:" + ecount)
  }
}
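If you do not have the HDFS cluster used in the example, the same call accepts a local path as well. A minimal sketch, where the path data/edges.txt is an assumed placeholder, that also confirms the default attribute of 1:

    // Assumed local path; the file holds whitespace-separated "srcId dstId" pairs
    val localGraph = GraphLoader.edgeListFile(sc, "data/edges.txt")

    localGraph.vertices.collect.foreach(println)  // every vertex attribute is 1
    localGraph.edges.collect.foreach(println)     // every edge attribute is 1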

3. Constructing a graph from edges

Graph.fromEdges allows a graph to be created from just an RDD of edges. It automatically creates the vertices referenced by the edges and assigns them a default value. Its signature is:

def fromEdges[VD: ClassTag, ED: ClassTag](
    edges: RDD[Edge[ED]],
    defaultValue: VD,
    edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
    vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
// Edge information as (srcId, dstId) pairs
val edge = List(
  (111, 122), (111, 133), (122, 133), (133, 144), (133, 155), (133, 116),
  (144, 155), (155, 116), (177, 188), (177, 199), (188, 199))

// Build the edge RDD
val edgeRdd = sc.parallelize(edge).map(x => Edge(x._1.toLong, x._2.toLong, None))

// Build the graph from the edges; 0 is the default vertex attribute
val graph = Graph.fromEdges(edgeRdd, 0)
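As a quick sanity check (illustrative output, assuming the edge list above): the nine distinct vertex ids mentioned by the eleven edges are created automatically, each with the default attribute 0.

    println("vertices: " + graph.vertices.count())  // 9
    println("edges: " + graph.edges.count())        // 11
    graph.vertices.collect.foreach(println)         // e.g. (111,0), (122,0), ...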

4. Constructing a graph from (source, destination) vertex tuples

Graph.fromEdgeTuples allows a graph to be created from just an RDD of edge tuples. Every edge is assigned the value 1. It automatically creates the vertices referenced by the edges and assigns them a default value. Its signature (a usage sketch follows below):

def fromEdgeTuples[VD: ClassTag](
    rawEdges: RDD[(VertexId, VertexId)],
    defaultValue: VD,
    uniqueEdges: Option[PartitionStrategy] = None,
    edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
    vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, Int]
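The original post gives no example for this constructor, so here is a minimal sketch; the RDD contents and variable names are illustrative assumptions:

    // Raw edges as (srcId, dstId) tuples; 0 is the default vertex attribute
    val rawEdges = sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L), (1L, 3L)))
    val tupleGraph = Graph.fromEdgeTuples(rawEdges, defaultValue = 0)

    tupleGraph.edges.collect.foreach(println)    // every edge attribute is 1
    tupleGraph.vertices.collect.foreach(println) // every vertex attribute is 0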

5. Generating a random graph

GraphGenerators.logNormalGraph generates a random graph: by default the mean out-degree is 4 with a standard deviation of 1.3, the numVertices vertices are generated in parallel, and the number of partitions defaults to the SparkContext's default parallelism.
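The paragraph above refers to the method's source; for reference, its signature in org.apache.spark.graphx.util.GraphGenerators is roughly the following (defaults as described above; treat the exact parameter list as an approximation, since it varies slightly across Spark versions):

    def logNormalGraph(
        sc: SparkContext, numVertices: Int, numEParts: Int = 0,
        mu: Double = 4.0, sigma: Double = 1.3, seed: Long = -1): Graph[Long, Int]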

import org.apache.log4j.{Level, Logger}
import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.{SparkConf, SparkContext}

Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.ERROR)

val conf = new SparkConf()
val sc = new SparkContext("local", "text", conf)

// Generate a random graph with 10 vertices, then replace each vertex
// attribute with the vertex id as a Double
val graph = GraphGenerators.logNormalGraph(sc, numVertices = 10)
  .mapVertices((id, _) => id.toDouble)

graph.vertices.collect.foreach(println(_))
graph.edges.collect.foreach(println(_))

