-  Foxtable(狐表)  (http://foxtable.com/bbs/index.asp)
--  Expert Q&A (专家坐堂)  (http://foxtable.com/bbs/list.asp?boardid=2)
----  [Help] How to extract tables from a PDF with spire.pdf  (http://foxtable.com/bbs/dispbbs.asp?boardid=2&id=173537)

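--  Author: witkeylaw
--  Posted: 2021/12/4 9:28:00
--  [Help] How to extract tables from a PDF with spire.pdf
I want to use Spire.PDF to extract tables from a PDF, but I have run into a problem. I am using spire.pdf.dll version 7.11.1 (the .NET 4 build).
Spire page: https://www.e-iceblue.cn/table/extract-table-from-pdf-file-using-c.html
1. VB.NET code
The following is program code: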

'Original code
Imports Spire.Pdf
Imports Spire.Pdf.Utilities
Imports System.IO
Imports System.Text

Namespace ExtractTable
    Class Program
        Private Shared Sub Main(args As String())
            'Create a PdfDocument instance
            Dim pdf As New PdfDocument()

            'Load the PDF document
            pdf.LoadFromFile("sample.pdf")

            'Create a StringBuilder instance
            Dim builder As New StringBuilder()

            'Create a PdfTableExtractor instance
            Dim extractor As New PdfTableExtractor(pdf)

            'Declare an array of PdfTable
            Dim tableLists As PdfTable()

            'Loop through the PDF pages
            For pageIndex As Integer = 0 To pdf.Pages.Count - 1
                'Extract tables from the page
                tableLists = extractor.ExtractTable(pageIndex)

                'Check whether any tables were found
                If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
                    'Loop through the tables
                    For Each table As PdfTable In tableLists
                        'Get the number of rows and columns in the table
                        Dim row As Integer = table.GetRowCount()
                        Dim column As Integer = table.GetColumnCount()

                        'Loop through the table rows and columns
                        For i As Integer = 0 To row - 1
                            For j As Integer = 0 To column - 1
                                'Get the text of the cell
                                Dim text As String = table.GetText(i, j)

                                'Append the text to the StringBuilder
                                builder.Append(text & Convert.ToString(" "))
                            Next
                            builder.Append(vbCr & vbLf)
                        Next
                    Next
                End If
            Next

            'Save the extracted table content as a .txt file
            File.WriteAllText("ExtractedTable.txt", builder.ToString())
        End Sub
    End Class
End Namespace


2. My modified code
The following is program code:

1 'Create a PdfDocument instance
2 Dim pdf As New spire.pdf.PdfDocument()
3
4 'Load the PDF document
5 pdf.LoadFromFile("sample.pdf")
6
7 'Create a StringBuilder instance
8 Dim builder As New StringBuilder()
9
10 'Create a PdfTableExtractor instance
11 'Dim extractor As New spire.pdf.PdfTableExtractor(pdf) 'error
12
13 'Declare an array of PdfTable
14 Dim tableLists As spire.pdf.Tables.PdfTable()
15
16 'Loop through the PDF pages
17 For pageIndex As Integer = 0 To pdf.Pages.Count - 1
18     'Extract tables from the page
19     'tableLists = spire.pdf.Tables.extractor.ExtractTable(pageIndex) 'error
20
21     'Check whether any tables were found
22     If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
23         'Loop through the tables
24         For Each Table As spire.pdf.Tables.PdfTable In tableLists
25             'Get the number of rows and columns in the table
26             'Dim Row As Integer = Table.GetRowCount() 'error
27             'Dim Column As Integer = Table.GetColumnCount() 'error
28
29             'Loop through the table rows and columns
30             For i As Integer = 0 To Row - 1
31                 For j As Integer = 0 To Column - 1
32                     'Get the text of the cell
33                     Dim text As String = Table.GetText(i, j)
34
35                     'Append the text to the StringBuilder
36                     builder.Append(text & Convert.ToString(" "))
37                 Next
38                 builder.Append(vbCr & vbLf)
39             Next
40         Next
41     End If
42 Next
43
44 'Save the extracted table content as a .txt file
45 File.WriteAllText("ExtractedTable.txt", builder.ToString())


3. The problem
The following is program code:

1 'Create a PdfDocument instance
2 Dim pdf As New spire.pdf.PdfDocument()
3
4 'Load the PDF document
5 pdf.LoadFromFile("sample.pdf")
6
7 'Create a StringBuilder instance
8 Dim builder As New StringBuilder()
9
10 'Create a PdfTableExtractor instance
11 'Dim extractor As New spire.pdf.PdfTableExtractor(pdf) 'error


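Everything before line 11 (the PdfTableExtractor instantiation) runs fine; parts of the code after it do not.
I checked the help file and it has no PdfTableExtractor, yet the website documentation describes it, so I don't know what is going on.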
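
One thing that may be worth checking, offered as an assumption and not a confirmed diagnosis: the vendor sample in section 1 imports Spire.Pdf.Utilities, and the modified code qualifies the classes as spire.pdf.PdfTableExtractor and spire.pdf.Tables.PdfTable instead. If PdfTableExtractor and the PdfTable type returned by ExtractTable live in Spire.Pdf.Utilities (inferred only from that Imports list), the fully qualified names would look like the sketch below. It is written as a script fragment without Imports, like the modified code, so StringBuilder and File are also qualified. The loop variables are renamed to tbl, rowCount and columnCount purely as a precaution against clashing with Foxtable's own Table and Row types. It has not been verified against version 7.11.1 of the DLL.

```vbnet
' Sketch only. Assumption: PdfTableExtractor and PdfTable are in the
' Spire.Pdf.Utilities namespace, as the Imports list of the vendor sample suggests.
Dim pdf As New Spire.Pdf.PdfDocument()
pdf.LoadFromFile("sample.pdf")

Dim builder As New System.Text.StringBuilder()
Dim extractor As New Spire.Pdf.Utilities.PdfTableExtractor(pdf)

For pageIndex As Integer = 0 To pdf.Pages.Count - 1
    ' Extract the tables found on this page (may be Nothing or empty)
    Dim tableLists As Spire.Pdf.Utilities.PdfTable() = extractor.ExtractTable(pageIndex)

    If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
        For Each tbl As Spire.Pdf.Utilities.PdfTable In tableLists
            Dim rowCount As Integer = tbl.GetRowCount()
            Dim columnCount As Integer = tbl.GetColumnCount()

            ' Write each row on its own line, cells separated by a space
            For i As Integer = 0 To rowCount - 1
                For j As Integer = 0 To columnCount - 1
                    builder.Append(tbl.GetText(i, j) & " ")
                Next
                builder.Append(vbCrLf)
            Next
        Next
    End If
Next

' Save the extracted table text
System.IO.File.WriteAllText("ExtractedTable.txt", builder.ToString())
```

If the Spire.Pdf.Utilities names still do not resolve, the referenced DLL may not be the build the website describes; the help file lacking PdfTableExtractor would be consistent with that, but this is a guess.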

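--  Author: 有点蓝
--  Posted: 2021/12/4 9:37:00
--  
For questions about using a third-party component, please contact that vendor's customer support. We are unable to provide support for it.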