Recover text from PDF file when normal methods fail

I have a few hundred PDF files from which I need to extract sections of text. For many of them pdftotext works fine, but for others it misses large sections of text. If I open such a PDF in Acrobat, select the missing text by hand, paste it into Emacs, and then view the file without an encoding (as raw bytes), I get output like this:

 Husband \364\200\200\272\364\200\201\213\364 etc.
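For diagnosis, note that the octal escapes above look like UTF-8 byte sequences. The minimal Python sketch below (assuming Python 3.9+) decodes the two complete sequences; the trailing `\364` is truncated, so it is left out. Both appear to decode to code points in Unicode's Supplementary Private Use Area-B (category `Co`). That would suggest the PDF's font maps its glyphs to private-use code points instead of real characters, which would explain why both copy/paste and pdftotext fail to recover readable text. The helper names `describe` and `has_private_use` are hypothetical, not part of any library.

```python
import unicodedata

# The bytes pasted from Acrobat, written as octal escapes
# (only the two complete 4-byte UTF-8 sequences from the question).
PASTED = b"\364\200\200\272\364\200\201\213"


def describe(raw: bytes) -> list[tuple[str, str]]:
    """Decode UTF-8 bytes and list each code point with its Unicode category."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch))
            for ch in raw.decode("utf-8")]


def has_private_use(text: str) -> bool:
    """True if any character falls in a Private Use Area (category 'Co')."""
    return any(unicodedata.category(ch) == "Co" for ch in text)


if __name__ == "__main__":
    print(describe(PASTED))  # [('U+10003A', 'Co'), ('U+10004B', 'Co')]
    print(has_private_use(PASTED.decode("utf-8")))  # True
```

Running a check like `has_private_use` on each file's pdftotext output is one hypothetical way to triage which of the few hundred PDFs are affected, though files where pdftotext simply drops text may not show private-use characters at all.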

How can I extract the text correctly?

I should mention that I've tried saving as text from Acrobat, and I've also tried applying Acrobat's Document=>OCR feature before copying.

Answers


Why not convert the PDF to doc or txt first? See the guide: http://www.aolor.com/pdf-converter/user-guide.html

