How to extract table data from a PDF file into a database
' Read every page of the PDF with iTextSharp and concatenate the extracted text.
Dim reader As New iTextSharp.text.pdf.PdfReader("d:\test.pdf")
Dim n As Integer = reader.NumberOfPages
Dim str As String = ""
For i As Integer = 1 To n ' page numbers start at 1
    Dim strategy As Object = New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy()
    Dim currentText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i, strategy)
    ' Convert the text from the system default encoding to UTF-8 bytes, then decode it again.
    currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(currentText)))
    str &= currentText
Next
MsgBox(str)
reader.Close()
Image attached to this topic: 1694773476332.jpg

Image attached to this topic: rrr.jpg

This may be a .NET version problem: you need to reference the iTextSharp.dll built for .NET 4.0.
Dim doc As org.apache.pdfbox.pdmodel.PDDocument = Nothing
Try
    ' Load the PDF with PDFBox and extract the text of the whole document.
    doc = org.apache.pdfbox.pdmodel.PDDocument.load("d:\AAA.pdf")
    Dim pages = doc.getDocumentCatalog().getAllPages()
    Dim pdfStripper = New org.apache.pdfbox.util.PDFTextStripper
    Dim text = pdfStripper.getText(doc)
    MsgBox(text)
Catch ex As Exception
    MsgBox(ex.Message)
Finally
    ' Always release the document, even if extraction failed.
    If doc IsNot Nothing Then
        doc.close()
    End If
End Try
Image attached to this topic: 1697638434719.jpg

Is the third-party DLL you are using built for .NET 4.0? Did you copy it into the Foxtable installation directory?
I am using .NET 4.0,
and itextsharp.dll has already been copied over, but I still get an error.
The following code does work for me:
' Read every page of the PDF with iTextSharp and concatenate the extracted text.
Dim reader As New iTextSharp.text.pdf.PdfReader("d:\AAA.pdf")
Dim n As Integer = reader.NumberOfPages
Dim str As String = ""
For i As Integer = 1 To n ' page numbers start at 1
    Dim strategy As Object = New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy()
    Dim currentText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i, strategy)
    ' Convert the text from the system default encoding to UTF-8 bytes, then decode it again.
    currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default],
                                                                Encoding.UTF8,
                                                                Encoding.[Default].GetBytes(currentText)))
    str &= currentText
Next
'MsgBox(str)
Output.Show(str)
reader.Close()
But why do I get an error when I use the code from post #4?
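The code in post #4 uses a different component (PDFBox, under the org.apache.pdfbox namespace), doesn't it? It has nothing to do with iTextSharp.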
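The iTextSharp loop earlier in the thread only produces one plain string, so the text still has to be split into rows and columns before it can go into a database table. Here is a minimal sketch of that step in plain VB.NET. It rests on assumptions the thread does not confirm: cells are separated by whitespace, the first non-blank line holds unique column names, and any line with a different cell count can be skipped. The names `ParseRows`, `PdfRows` and the sample text are hypothetical.

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions

Module PdfTextToTable

    ' Turns whitespace-separated text lines into a DataTable.
    ' Assumption: the first non-blank line holds unique column names.
    Function ParseRows(ByVal text As String) As DataTable
        Dim dt As New DataTable("PdfRows")

        For Each rawLine As String In text.Split(New Char() {ControlChars.Cr, ControlChars.Lf})
            Dim line As String = rawLine.Trim()
            If line.Length = 0 Then Continue For ' ignore blank lines

            ' Assumption: cells never contain spaces themselves.
            Dim cells() As String = Regex.Split(line, "\s+")

            If dt.Columns.Count = 0 Then
                ' First non-blank line: create one text column per header cell.
                For Each colName As String In cells
                    dt.Columns.Add(colName, GetType(String))
                Next
                Continue For
            End If

            ' Skip lines that do not match the header (titles, page numbers, wrapped cells).
            If cells.Length <> dt.Columns.Count Then Continue For

            Dim row As DataRow = dt.NewRow()
            For c As Integer = 0 To cells.Length - 1
                row(c) = cells(c)
            Next
            dt.Rows.Add(row)
        Next

        Return dt
    End Function

    Sub Main()
        ' Hypothetical sample standing in for the text extracted from the PDF.
        Dim sample As String = "Name Qty Price" & vbLf & "Apple 3 1.50" & vbLf & "Pear 10 0.80"
        Dim dt As DataTable = ParseRows(sample)
        For Each r As DataRow In dt.Rows
            Console.WriteLine(String.Join(" | ", r.ItemArray))
        Next
    End Sub

End Module
```

The resulting rows can then be copied into the target table with whatever data-access method you already use. Real PDF tables often have cells containing spaces or empty cells, in which case a whitespace split is not enough and the split rule would need to be adapted to the actual layout.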