I want to use Spire.PDF to extract tables from a PDF, but I've run into a problem. I'm using Spire.Pdf.dll 7.11.1 (the net4 build). The Spire page I followed:
https://www.e-iceblue.cn/table/extract-table-from-pdf-file-using-c.html
1. VB.NET code
The following is the program code:
' Original code
Imports Spire.Pdf
Imports Spire.Pdf.Utilities
Imports System.IO
Imports System.Text

Namespace ExtractTable
    Class Program
        Private Shared Sub Main(args As String())
            ' Instantiate a PdfDocument object
            Dim pdf As New PdfDocument()

            ' Load the PDF document
            pdf.LoadFromFile("sample.pdf")

            ' Create a StringBuilder object
            Dim builder As New StringBuilder()

            ' Instantiate a PdfTableExtractor object
            Dim extractor As New PdfTableExtractor(pdf)

            ' Declare an array of PdfTable
            Dim tableLists As PdfTable()

            ' Loop through the PDF pages
            For pageIndex As Integer = 0 To pdf.Pages.Count - 1
                ' Extract the tables from the page
                tableLists = extractor.ExtractTable(pageIndex)

                ' Check whether any tables were found
                If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
                    ' Loop through the tables
                    For Each table As PdfTable In tableLists
                        ' Get the number of rows and columns in the table
                        Dim row As Integer = table.GetRowCount()
                        Dim column As Integer = table.GetColumnCount()

                        ' Loop through the rows and columns
                        For i As Integer = 0 To row - 1
                            For j As Integer = 0 To column - 1
                                ' Get the text in the cell
                                Dim text As String = table.GetText(i, j)

                                ' Append the text to the StringBuilder
                                builder.Append(text & " ")
                            Next
                            builder.Append(vbCrLf)
                        Next
                    Next
                End If
            Next

            ' Save the extracted table content to a .txt file
            File.WriteAllText("ExtractedTable.txt", builder.ToString())
        End Sub
    End Class
End Namespace
2. My modified code
The following is the program code:
' Instantiate a PdfDocument object
Dim pdf As New spire.pdf.PdfDocument()

' Load the PDF document
pdf.LoadFromFile("sample.pdf")

' Create a StringBuilder object
Dim builder As New StringBuilder()

' Instantiate a PdfTableExtractor object
'Dim extractor As New spire.pdf.PdfTableExtractor(pdf) ' error

' Declare an array of PdfTable
Dim tableLists As spire.pdf.Tables.PdfTable()

' Loop through the PDF pages
For pageIndex As Integer = 0 To pdf.Pages.Count - 1
    ' Extract the tables from the page
    'tableLists = spire.pdf.Tables.extractor.ExtractTable(pageIndex) ' error

    ' Check whether any tables were found
    If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
        ' Loop through the tables
        For Each Table As spire.pdf.Tables.PdfTable In tableLists
            ' Get the number of rows and columns in the table
            'Dim Row As Integer = Table.GetRowCount() ' error
            'Dim Column As Integer = Table.GetColumnCount() ' error

            ' Loop through the rows and columns
            For i As Integer = 0 To Row - 1
                For j As Integer = 0 To Column - 1
                    ' Get the text in the cell
                    Dim text As String = Table.GetText(i, j)

                    ' Append the text to the StringBuilder
                    builder.Append(text & Convert.ToString(" "))
                Next
                builder.Append(vbCr & vbLf)
            Next
        Next
    End If
Next

' Save the extracted table content to a .txt file
File.WriteAllText("ExtractedTable.txt", builder.ToString())
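For comparison, here is a sketch of the modified code with every Spire type written out in full, using the namespace the tutorial's Imports point to. It assumes (not verified against 7.11.1) that this Spire.Pdf.dll ships PdfTableExtractor and the extracted-table PdfTable in Spire.Pdf.Utilities, as the tutorial's `Imports Spire.Pdf.Utilities` suggests. spire.pdf.Tables.PdfTable appears to be a different class (the one used for drawing tables), which would explain why GetRowCount and GetColumnCount are not found on it. The module name is arbitrary; the file names come from the original example.

```vbnet
Imports System.IO
Imports System.Text

Module ExtractTableSketch
    Sub Main()
        ' Load the PDF (same file name as the tutorial)
        Dim pdf As New Spire.Pdf.PdfDocument()
        pdf.LoadFromFile("sample.pdf")

        Dim builder As New StringBuilder()

        ' ASSUMPTION: PdfTableExtractor lives in Spire.Pdf.Utilities, not directly in Spire.Pdf
        Dim extractor As New Spire.Pdf.Utilities.PdfTableExtractor(pdf)

        For pageIndex As Integer = 0 To pdf.Pages.Count - 1
            ' ExtractTable is assumed to return Spire.Pdf.Utilities.PdfTable(),
            ' not Spire.Pdf.Tables.PdfTable()
            Dim tableLists As Spire.Pdf.Utilities.PdfTable() = extractor.ExtractTable(pageIndex)
            If tableLists Is Nothing OrElse tableLists.Length = 0 Then Continue For

            For Each table As Spire.Pdf.Utilities.PdfTable In tableLists
                Dim rowCount As Integer = table.GetRowCount()
                Dim columnCount As Integer = table.GetColumnCount()

                ' Write each row on one line, cells separated by a space
                For i As Integer = 0 To rowCount - 1
                    For j As Integer = 0 To columnCount - 1
                        builder.Append(table.GetText(i, j)).Append(" "c)
                    Next
                    builder.AppendLine()
                Next
            Next
        Next

        File.WriteAllText("ExtractedTable.txt", builder.ToString())
        pdf.Close()
    End Sub
End Module
```

If the compiler still reports Spire.Pdf.Utilities.PdfTableExtractor as undefined, the referenced DLL most likely does not contain the class at all, which would match what the help file shows.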
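3. The problem
The following is the program code:
' Instantiate a PdfDocument object
Dim pdf As New spire.pdf.PdfDocument()

' Load the PDF document
pdf.LoadFromFile("sample.pdf")

' Create a StringBuilder object
Dim builder As New StringBuilder()

' Instantiate a PdfTableExtractor object
'Dim extractor As New spire.pdf.PdfTableExtractor(pdf) ' error
Everything before the PdfTableExtractor line runs fine; from that line on, parts of the code fail (the lines marked ' error in the modified code).
I looked in the help file and there is no PdfTableExtractor, yet the website's instructions use it. I don't understand what's going on.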
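Since the help file and the website disagree, one way to check what the referenced Spire.Pdf.dll actually contains is to ask it through reflection. This sketch uses only standard .NET reflection (System.Reflection), not any Spire API beyond referencing PdfDocument: it prints the loaded assembly's full name (including its version) and path, then the namespace-qualified name of every public type called PdfTableExtractor or PdfTable. The module name is arbitrary.

```vbnet
Imports System
Imports System.Linq
Imports System.Reflection

Module FindTableExtractor
    Sub Main()
        ' The assembly PdfDocument is actually loaded from at run time
        Dim asm As Assembly = GetType(Spire.Pdf.PdfDocument).Assembly
        Console.WriteLine("Assembly: " & asm.FullName)   ' name, version, culture, token
        Console.WriteLine("Location: " & asm.Location)

        ' Enumerate the types; if some fail to load, keep the ones that did
        Dim types As Type()
        Try
            types = asm.GetTypes()
        Catch ex As ReflectionTypeLoadException
            types = ex.Types.Where(Function(x) x IsNot Nothing).ToArray()
        End Try

        ' Print the full name (namespace included) of each public match
        For Each tp As Type In types
            If tp.IsPublic AndAlso (tp.Name = "PdfTableExtractor" OrElse tp.Name = "PdfTable") Then
                Console.WriteLine("Found: " & tp.FullName)
            End If
        Next

        Dim hasExtractor As Boolean =
            types.Any(Function(x) x.IsPublic AndAlso x.Name = "PdfTableExtractor")
        Console.WriteLine("PdfTableExtractor present: " & hasExtractor.ToString())
    End Sub
End Module
```

If the output shows Spire.Pdf.Utilities.PdfTableExtractor, the class is there and only the namespace in the modified code is wrong; if it is not listed, this build of the DLL does not expose it.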