Foxtable (狐表) / User Section / Expert Help Desk → [Help] Could an expert help me write code to scrape data from a web page?



Topic: [Help] Could an expert help me write code to scrape data from a web page?

有点甜, post #1
Rank: Moderator | Posts: 85326 | Points: 427815 | Prestige: 0 | Featured posts: 5 | Registered: 2012/10/18 22:13:00
Posted: 2017/11/6 14:30:00

Reference code:

 

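' Load the page in an off-screen WebBrowser control
Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
' Pump messages until the document has finished loading (ReadyState 4 = Complete)
Do Until web.ReadyState = 4
    Application.DoEvents()
Loop

' Walk every row of the results table and print the first three cells
Dim elems As Object = web.Document.GetElementById("GV_SGPM").GetElementsByTagName("tr")
For Each elem As Object In elems
    Dim tdelems As Object = elem.GetElementsByTagName("td")
    ' Only rows with at least 10 cells are treated as data rows
    If tdelems.Count >= 10 Then
        Output.Show(tdelems(0).InnerText & "  " & tdelems(1).InnerText & "  " & tdelems(2).InnerText)
    End If
Next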

 


   


有点甜, post #2
Posted: 2017/11/6 14:44:00

1. Make sure you understand the code.

2. Debug it in the test window: http://www.foxtable.com/webhelp/scr/0213.htm

 


有点甜, post #3
Posted: 2017/11/6 16:18:00

Reference code:

 

Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents
Loop

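' Read the total page count (pg) and the current page number (cp) from the pager labels
Dim pg As Integer = 0
Dim cp As Integer = 0
For Each sp As Object In web.Document.GetElementsByTagName("span")
    If sp.Id IsNot Nothing Then
        ' The number is taken as the second space-separated token of the label text
        If sp.Id.Contains("LabelPageCount") Then
            pg = sp.InnerText.Split(" ")(1)
        ElseIf sp.Id.Contains("LabelCurrentPage") Then
            cp = sp.InnerText.Split(" ")(1)
        End If
    End If
Next
MsgBox(pg & " " & cp)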
For i As Integer = cp To pg-1
    Dim elems As object = web.Document.GetElementById("GV_SGPM").GetElementsByTagName("tr")
    For Each elem As object In elems
        Dim tdelems As object =  elem.GetElementsByTagName("td")
        If tdelems.count >= 10 Then
            output.show(tdelems(0).innertext & "  " & tdelems(1).innertext & "  " & tdelems(2).innertext)
        End If
    Next
   
    ' Debugging aid
    If i = 3 Then ' Stop after the third page
        Exit For
    End If

   
    ' Click the next-page link
    Dim btn = web.Document.GetElementById("GV_SGPM_ctl16_LinkButtonNextPage")
    btn.InvokeMember("click")

    ' Wait until loading has finished and the current-page label no longer shows page i
    Dim ok As Boolean = False
    Do Until web.ReadyState = 4 AndAlso ok
        Application.DoEvents()
        For Each sp As Object In web.Document.GetElementsByTagName("span")
            If sp.Id IsNot Nothing AndAlso sp.Id.Contains("LabelCurrentPage") Then
                ' The label changes once the next page has been rendered
                If sp.InnerText.Split(" ")(1) <> i Then ok = True
                Exit For
            End If
        Next
    Loop
Next


有点甜, post #4
Posted: 2017/11/6 16:46:00

You need to read the values this way:

 

' pg and cp are the Integer variables declared in the earlier code
For Each sp As Object In web.Document.GetElementsByTagName("span")
    If sp.Id IsNot Nothing Then
        If sp.Id.Contains("LabelPageCount") Then
            pg = sp.InnerText.Split(" ")(1)
        ElseIf sp.Id.Contains("LabelCurrentPage") Then
            cp = sp.InnerText.Split(" ")(1)
        ElseIf sp.Id.Contains("LabelRecordCount") Then
            ' Show the record-count label text
            MsgBox(sp.InnerText)
        End If
    End If
Next


有点甜, post #5
Posted: 2017/11/6 17:57:00

Quoting 18523982317, posted 2017/11/6 17:21:00:
甜大大, I have another requirement: I want to automatically fill the company name from the current row into the company-name text box, run the query, and then download the data to my machine.

How should I go about filling data into a web page like this?

 

' Load the query page and wait until it has finished loading
Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents()
Loop

' Fill the company-name text box
Dim mc = web.Document.GetElementById("txt_mc")
mc.SetAttribute("value", "重庆建工住宅建设有限公司")

' Click the query button
Dim cx = web.Document.GetElementById("btn_c")
cx.InvokeMember("click")

'msgbox(2)
'output.show(web.documenttext)


有点甜, post #6
Posted: 2017/11/7 8:58:00

Sigh, do you know how to test this?

 

Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents
Loop

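' Fill the company-name text box
Dim mc = web.Document.GetElementById("txt_mc")
mc.SetAttribute("value", "重庆建工住宅建设有限公司")

' Snapshot the page source, then click the query button
Dim temp As String = web.DocumentText
Dim cx = web.Document.GetElementById("btn_c")
cx.InvokeMember("click")

' Wait until loading has finished and the page source differs from the snapshot
Do Until web.ReadyState = 4 AndAlso temp <> web.DocumentText
    Application.DoEvents()
Loop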

Dim elems As object = web.Document.GetElementById("GV_SGPM").GetElementsByTagName("tr")
For Each elem As object In elems
    Dim tdelems As object =  elem.GetElementsByTagName("td")
    If tdelems.count >= 10 Then
        output.show(tdelems(0).innertext & "  " & tdelems(1).innertext & "  " & tdelems(2).innertext)
    End If
Next
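
The request quoted in post #5 asked for the company name to come from the current row rather than a literal string. A minimal sketch of that variation follows. The table name "Companies" and the column name "CompanyName" are hypothetical placeholders, and the sketch assumes that Foxtable's Tables(...).Current and Row.IsNull behave as described in the Foxtable help. It has not been run against the site.

```vbnet
' Hypothetical table and column names: replace "Companies" and "CompanyName" with your own
Dim r As Row = Tables("Companies").Current
If r Is Nothing OrElse r.IsNull("CompanyName") Then
    Output.Show("No current row, or the company name is empty")
    Return
End If

' Load the query page and wait until it has finished loading
Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents()
Loop

' Fill the text box with the value from the current row
Dim mc = web.Document.GetElementById("txt_mc")
mc.SetAttribute("value", CStr(r("CompanyName")))

' Snapshot the page source, click the query button, then wait for the page to change
Dim temp As String = web.DocumentText
web.Document.GetElementById("btn_c").InvokeMember("click")
Do Until web.ReadyState = 4 AndAlso temp <> web.DocumentText
    Application.DoEvents()
Loop
```

After the wait loop, the rows of GV_SGPM can be read with the same For Each loop as in the code above.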


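有点甜, post #7
Posted: 2017/11/7 10:22:00

The point of the code is this: you simulate a click on the next-page link, but the page does not finish loading right away, so you have to wait for a while.

That means you need to check whether the page content before and after the click is the same. If it is the same, the navigation has not finished yet.

You could also write it like this: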

 

Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents
Loop

Dim pg As Integer = 0
Dim cp As Integer = 0
For Each sp As object In web.Document.GetElementsByTagName("span")
    If sp.id IsNot Nothing
        If sp.id.contains("LabelPageCount") Then
            pg = sp.InnerText.split(" ")(1)
        ElseIf sp.id.contains("LabelCurrentPage") Then
            cp = sp.InnerText.split(" ")(1)
        End If
    End If
Next
msgbox(pg & " " & cp)
For i As Integer = cp To pg-1
    Dim elems As object = web.Document.GetElementById("GV_SGPM").GetElementsByTagName("tr")
    For Each elem As object In elems
        Dim tdelems As object =  elem.GetElementsByTagName("td")
        If tdelems.count >= 10 Then
            output.show(tdelems(0).innertext & "  " & tdelems(1).innertext & "  " & tdelems(2).innertext)
        End If
    Next
   
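    ' Debugging aid
    If i = 3 Then ' Stop after the third page
        Exit For
    End If

    ' Snapshot the page source before clicking so the change can be detected reliably
    Dim temp = web.DocumentText
    Dim btn = web.Document.GetElementById("GV_SGPM_ctl16_LinkButtonNextPage")
    btn.InvokeMember("click")

    ' Wait until loading has finished and the page source differs from the snapshot
    Do Until web.ReadyState = 4 AndAlso temp <> web.DocumentText
        Application.DoEvents()
    Loop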

Next
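
One caveat about the wait loops in this thread: if the click never changes the page (for example on the last page, or when the network fails), they spin forever. The sketch below shows one possible way to bound the wait. The 10-second limit and the variable names before, deadline, and changed are assumptions for illustration and do not come from the original posts.

```vbnet
' Sketch only: click the next-page link and wait at most 10 seconds for the page to change
Dim web As New System.Windows.Forms.WebBrowser()
web.Navigate("http://cx.cqjsxx.com:8010/CX_SGQYMRPM2.aspx")
Do Until web.ReadyState = 4
    Application.DoEvents()
Loop

Dim btn = web.Document.GetElementById("GV_SGPM_ctl16_LinkButtonNextPage")
If btn Is Nothing Then
    Output.Show("Next-page link not found")
    Return
End If

' Snapshot the page source before the click
Dim before As String = web.DocumentText
btn.InvokeMember("click")

' Assumed limit of 10 seconds; adjust to suit the site
Dim deadline As Date = Date.Now.AddSeconds(10)
Dim changed As Boolean = False
Do Until changed
    Application.DoEvents()
    ' Done once loading has finished and the source differs from the snapshot
    changed = (web.ReadyState = 4 AndAlso before <> web.DocumentText)
    If Date.Now > deadline Then Exit Do
Loop

If changed Then
    Output.Show("Page changed")
Else
    Output.Show("Timed out waiting for the next page")
End If
```

The same deadline check can be dropped into the loop inside the For loop above, using Exit For when the wait times out.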

