PDFs are notoriously difficult to scrape. This program converts them to plain-text (*.txt) or HTML (*.html) files. It has been tested with Latin-alphabet and Japanese text.
Updated Jun 11, 2019 - CSS
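As a rough illustration of the kind of conversion described above, here is a minimal sketch that assumes the `pdfminer.six` package is installed. It is not the listed program's actual code: the helper names `output_path_for` and `convert_pdf` are hypothetical, and only `extract_text_to_fp` and `LAParams` come from `pdfminer.six`.

```python
from pathlib import Path

# Output types this sketch supports, mapped to file suffixes.
SUFFIXES = {"text": ".txt", "html": ".html"}


def output_path_for(pdf_path, output_type="text"):
    """Return the output path: same location and stem, new suffix (hypothetical helper)."""
    if output_type not in SUFFIXES:
        raise ValueError(f"unsupported output type: {output_type!r}")
    return Path(pdf_path).with_suffix(SUFFIXES[output_type])


def convert_pdf(pdf_path, output_type="text"):
    """Convert one PDF to .txt or .html next to the source file (hypothetical helper)."""
    # Imported here so the path helper above works without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    out_path = output_path_for(pdf_path, output_type)
    # The output file is opened in binary mode because a codec is passed;
    # UTF-8 covers both Latin-alphabet and Japanese text.
    with open(pdf_path, "rb") as src, open(out_path, "wb") as dst:
        extract_text_to_fp(
            src,
            dst,
            output_type=output_type,
            codec="utf-8",
            laparams=LAParams(),  # enable layout analysis with default settings
        )
    return out_path
```

Under these assumptions, `convert_pdf("report.pdf", "html")` would write `report.html` beside the input file. Extraction quality still depends on how the PDF was produced, so the output is worth checking by hand.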