Recover text from PDF file when normal methods fail
I have a few hundred PDF files from which I need to extract sections of text. For many, pdftotext works fine, but for others, it misses large sections of text. If I open the PDF in Acrobat and select that text by hand and copy/paste into emacs and then view the file without an encoding, I get stuff like this:
Husband \364\200\200\272\364\200\201\213\364 etc.
How can I extract the text correctly?
I should mention that I've tried saving as text from Acrobat; also tried applying Acrobat's Document=>OCR feature before copying.
Why not convert the PDF to doc or txt first? See the guide: http://www.aolor.com/pdf-converter/user-guide.html