First Try with goquery: Scraping Housing Price Data

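  This time I want to try the goquery library by scraping housing price data from wofang.com (我房网). The first step is installing goquery; if `go get` is blocked, see the article "When go get hits the firewall". Once the library is installed, we can begin.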

  The most important part is studying the page layout and the identifying features of the elements to extract.

  This crawler keeps only the listings that show a price. The code is as follows:

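<code>
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strconv"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// p crawls the listing pages and saves every listing that shows a price
// to wofang.csv as (name, price, location).
func p() {
	a := 0 // number of listings kept
	fileName := "wofang.csv"
	buf := new(bytes.Buffer)
	r2 := csv.NewWriter(buf)

	// Pages 1 through 201.
	for i := 1; i < 202; i++ {
		fmt.Println("Fetching page " + strconv.Itoa(i) + "......")
		url := "http://www.wofang.com/building/p/" + strconv.Itoa(i) + "/"
		if i == 1 {
			// The first page has no /p/N/ suffix.
			url = "http://www.wofang.com/building/"
		}

		// NewDocument fetches and parses the page. Newer goquery releases
		// deprecate it in favor of http.Get plus NewDocumentFromReader.
		doc, err := goquery.NewDocument(url)
		if err != nil {
			log.Fatal(err)
		}

		// Each <li> under .m is one listing.
		doc.Find(".m ul li").Each(func(_ int, s *goquery.Selection) {
			name := s.Find(".title a").Text()
			location := s.Find(".time").Text()
			price := s.Find(".sale-price font").Text()
			if price != "" {
				a++
				if err := r2.Write([]string{name, price, location}); err != nil {
					log.Fatal(err)
				}
				fmt.Printf("%s,%s,%s\n", name, price, location)
			}
		})
	}

	// Flush the buffered CSV data before writing it to disk.
	r2.Flush()
	if err := r2.Error(); err != nil {
		log.Fatal(err)
	}

	fout, err := os.Create(fileName)
	if err != nil {
		fmt.Println(fileName, err)
		return
	}
	// Defer Close only after the error check, once fout is known to be valid.
	defer fout.Close()

	if _, err := fout.Write(buf.Bytes()); err != nil {
		log.Fatal(err)
	}
	fmt.Print(a)
}

func main() {
	t1 := time.Now()
	p()
	elapsed := time.Since(t1)
	fmt.Println("")
	fmt.Println("Crawl finished, total time: ", elapsed)
}
</code>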
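
The pagination rule and the CSV step in the crawler can be pulled out into small helpers that are easy to check without touching the network. The following is a minimal sketch under that assumption; the names `pageURL` and `encodeRows` and the sample row are made up for illustration and are not part of the original program.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
)

// pageURL returns the listing URL for a 1-based page number.
// Page 1 is the bare listing URL; later pages add a /p/N/ suffix.
// (Hypothetical helper name; the rule itself comes from the crawler.)
func pageURL(page int) string {
	const base = "http://www.wofang.com/building/"
	if page == 1 {
		return base
	}
	return base + "p/" + strconv.Itoa(page) + "/"
}

// encodeRows renders rows as CSV text. encoding/csv quotes any field
// that contains a comma, so addresses with commas stay in one column.
func encodeRows(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	fmt.Println(pageURL(1))
	fmt.Println(pageURL(2))

	// Sample data only, not scraped from the site.
	out, err := encodeRows([][]string{{"Sample Estate", "8000", "Sample District, Road 1"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Keeping these two pieces separate from the fetching code makes it possible to verify the URL pattern and the CSV quoting before running a crawl of a couple of hundred pages.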

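  Finally, following the URL pattern (roughly "http://www.wofang.com/building/" + city key + "-te_住宅/"; I collected the city keys the clumsy way, by clicking through the cities one at a time), I scraped the listings for each city and visualized them with ECharts. (This article is for reference only.)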
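
The per-city URL pattern described here can be sketched as a small helper. This is only an illustration: `cityURL` is a hypothetical name, and `examplecity` is a placeholder, since the real city keys were collected by hand and are not listed in this article.

```go
package main

import "fmt"

// cityURL builds the residential ("住宅") listing URL for one city,
// following the pattern base + city key + "-te_住宅/".
func cityURL(cityKey string) string {
	return "http://www.wofang.com/building/" + cityKey + "-te_住宅/"
}

func main() {
	// Placeholder key for illustration; substitute the keys gathered from the site.
	keys := []string{"examplecity"}
	for _, key := range keys {
		fmt.Println(cityURL(key))
	}
}
```

Each generated URL could then be fed to the same crawling loop as the main listing pages, with the results grouped by city for the chart.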
