Convert pdf (bit maps) text into word doc
Thread poster: Lou Sanz
Lou Sanz
Spain
English to Spanish
Sep 26, 2008

I would appreciate your advice on this matter:

I need to convert a PDF document into an editable Word format so that I can count the words in the file.
The problem is that the text in the PDF is stored as a bitmap image, and programmes like Solid Converter PDF to Word won't convert the file because 'there is no text' in it.
Can anyone tell me whether there is some other programme I can use?
Thanks a lot!
Lou


 
Uldis Liepkalns
Latvia
Member (2003)
English to Latvian
You have to use FineReader Sep 26, 2008

or a similar program capable of recognising an image as text. I can help you with the specific file if you email it to me: uldis_at_tulko.lv

Uldis


 
Elizabeth Medina
Spanish to English
FineReader - Can it Edit out Stamps and Pen Markings? Sep 26, 2008

Hi fellow ProZians,

I bought FineReader Version 9 a few months ago but I find the User's Manual absolutely arcane. I haven't had time to hunt them down and oblige them to help me figure out how to use it.

When I convert a PDF-formatted legal document with stamps and notations in ink, it's just a disaster. I believe FineReader is supposed to let you edit out the marks and stamps that confuse the program and result in gobbledygook. But how? Beats me!!!

If anyone can point me in the right direction, even just a teensy-weensy bit for starters, it would be wonderful and I would be extremely grateful.

Best regards,
Elizabeth


 
Anna Villegas
Mexico
English to Spanish
To my Chilean and Spanish colleagues Sep 27, 2008

Please have a look at this:

You have your own OCR on your PC


This should help.


 
Rodolfo Raya
English to Spanish
Also in Office 2007 Sep 27, 2008

Tadzio Carvallo wrote:

Please have a look at this:

You have your own OCR on your PC


Office 2007 also has "Document Imaging" in the Tools folder of the Start menu.

Regards,
Rodolfo


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Change your counting method Sep 27, 2008

Lou Harvey wrote:
The problem is that the text in the PDF is stored as a bitmap image, and programmes like Solid Converter PDF to Word won't convert the file because 'there is no text' in it. Can anyone tell me whether there is some other programme I can use?


You need an OCR program, but let me warn you that even the OCR'ed document may need checking and correcting. Where it fails to recognise the text, an OCR program may insert extra spaces or extra single-character "words", and this will result in an incorrect word count. For PDF files like these, I suggest you do a manual spot-check word count (e.g. print every 5th or 10th page, count the number of words in the longest lines, get an average per line, multiply by the number of lines, and add a little bit just in case).
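The spot-check arithmetic described above could be sketched roughly as follows. This is only an illustration, not a tool anyone in this thread uses: the function name `estimate_word_count`, its parameters, and the 5% safety margin are assumptions made up for the example. Each sampled page is entered as the average words per line (taken from the longest lines, so it errs on the high side) and the number of lines on that page.

```python
def estimate_word_count(sampled_pages, total_pages, safety_margin=0.05):
    """Estimate a document's word count from a few hand-counted pages.

    sampled_pages: list of (avg_words_per_line, lines_on_page) tuples,
                   one per page counted by hand (hypothetical input format).
    total_pages:   number of pages in the whole document.
    safety_margin: the "little bit just in case", as a fraction (assumed 5%).
    """
    if not sampled_pages:
        raise ValueError("need at least one sampled page")

    # Words on each sampled page = average words per line x lines on the page.
    words_per_page = [words * lines for words, lines in sampled_pages]

    # Average over the sample, then scale up to the full document.
    average_per_page = sum(words_per_page) / len(words_per_page)
    estimate = average_per_page * total_pages

    # Add the safety margin and round to a whole number of words.
    return round(estimate * (1 + safety_margin))


# Example: three pages counted by hand from a 30-page document.
sample = [(12, 40), (11, 38), (13, 42)]
print(estimate_word_count(sample, total_pages=30))
```

Because the per-line average comes from the longest lines, the result already leans high; the margin can be set to 0 if that seems like enough padding.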


 
Lou Sanz
Spain
English to Spanish
TOPIC STARTER
Problems with FineReader Sep 27, 2008

Thanks a lot for your help, you were all most helpful.
FineReader recognised the text and created a Word document that keeps the original layout, which I was more than glad to see. However, the words at the end of each line do not appear in the converted file. It seems there is something wrong with the settings or margins that leaves those words out, and I cannot fix it.
Thanks again for your help!
Lou


 

