‘Unbreaker’ (new tool in TransTools to remove spurious line endings from OCRd/converted text)!
Thread poster: Michael Beijer
Michael Beijer
United Kingdom
Member (2009)
Dutch > English
Jun 29, 2014

Do you often have a lot of incorrect line breaks in OCRd and/or converted texts?

You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!

-> http://www.translatortools.net/word-unbreaker.html

I made a quick screencast showing how it works:

http://wordbook.nl/screencasts/Unbreaker-(new-tool-in-TransTools-to-remove-spurious-line-endings-from-e.g.-OCRd-documents).mp4


[Image: Unbreaker]


Michael


 
Samuel Murray
Netherlands
Member (2006)
English > Afrikaans
How does it compare to PlusTools? Jun 29, 2014

Michael Beijer wrote:
Do you often have a lot of incorrect line breaks in OCRd and/or converted texts? You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!


That's nice. PlusTools has had a similar function for years. In PlusTools, go to Tools > C[o]nv[ert] > Recreate paragraphs in current doc[ument]. How do the Unbreaker tool's results compare to PlusTools' results?


 
Michael Beijer
United Kingdom
Member (2009)
Dutch > English
TOPIC STARTER
some more info Jun 29, 2014

Hi Samuel,

I’m not really sure. I only used PlusTools once or twice, a long time ago. I wouldn’t even know where to download it anymore, whereas Stanislav (the developer of Unbreaker/TransTools) is extremely active and enthusiastic about his tool.

Unbreaker also has quite a few useful settings:


[Two screenshots of the Unbreaker settings; images hosted on Wordbook.nl]


We are also discussing it over in the CafeTran mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/p3oMlbQeWaM

¬¬¬

TransTools is definitely a tool to keep an eye on. I think it has amazing potential.

I also just read the following in the ‘Translator Tools Newsletter’ this morning (for registered users of TransTools Professional Edition):


[Three screenshots from the newsletter; images hosted on Wordbook.nl]


Michael


 
Anton Konashenok
Czech Republic
French > English
OCR Jun 29, 2014

Speaking of extra line and page breaks in OCRed text, a good OCR program will have an option to remove them. ABBYY FineReader, for example, does.

 
Michael Beijer
United Kingdom
Member (2009)
Dutch > English
TOPIC STARTER
ABBYY FineReader Jun 29, 2014

Hi Anton,

That’s interesting. Can you tell me how to do it in ABBYY FineReader 12?

Incidentally, the documents in my screenshots were just something I found online and was in the process of aligning.

Michael


 
Anton Konashenok
Czech Republic
French > English
Finereader Jul 7, 2014

Michael, all you need to do in FineReader is uncheck two boxes, "Keep page breaks" and "Keep line breaks", on the Save tab of the Options dialogue.

 
Michael Beijer
United Kingdom
Member (2009)
Dutch > English
TOPIC STARTER
¬¬¬ Jul 7, 2014

Hi Anton,

I'll have to do a little testing. If I do this, might I inadvertently remove valid page and line breaks, that is, ones that I would like to keep?

Michael
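
To make the concern about "valid" breaks concrete, here is a minimal Python sketch of one common heuristic for deciding which line breaks are spurious. It is not how Unbreaker, PlusTools, or FineReader work internally; the function name `unbreak` and the `SENTENCE_END` punctuation list are assumptions made purely for illustration. A break is treated as spurious only when the preceding line does not end in sentence-final punctuation and neither neighbouring line is blank.

```python
# Hypothetical heuristic, for illustration only (not any tool's actual logic).
SENTENCE_END = (".", "!", "?", ":", ";")


def unbreak(text: str) -> str:
    """Join a line onto the previous one unless the previous line
    looks finished (ends in sentence punctuation) or either line is blank."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        previous_open = bool(out) and bool(out[-1]) and not out[-1].endswith(SENTENCE_END)
        if stripped and previous_open:
            # Previous line stops mid-sentence: treat the break as spurious.
            out[-1] = out[-1] + " " + stripped
        else:
            # Blank line, first line, or previous line looks complete: keep the break.
            out.append(stripped)
    return "\n".join(out)


print(unbreak("The quick brown\nfox jumps.\nA new line."))
```

The weakness of such a rule is exactly the one raised above: a heading or list item with no final punctuation would be merged into the line that follows it, so a break the translator wanted to keep is lost. That is why configurable settings and a manual check of the result remain useful.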


 
Chunyi Chen
United States
English > Chinese
Thanks for sharing this information Jul 9, 2014

Hi Michael,

I purchased the TransTools Professional Edition ($20 for installing the program on up to three machines) after reading your user feedback here and visiting the TransTools website. So far I have only tried two or three features and am very happy with this program. I thought I would need to go through the readme files to know how to use the features, but it turns out the onscreen instructions were quite easy to follow. I highly recommend this tool to those who need to tweak the source files for better importing results in their CAT environment, and to those who still work with Trados files and want to see better which segments are new, which are fuzzy, etc.
This is another nice addition after my ApSIC Xbench investment!

Chun-yi


 
Anton Konashenok
Czech Republic
French > English
"Valid" breaks Jul 9, 2014

Michael, if I remember correctly, line breaks between paragraphs (as opposed to breaks between lines within the same paragraph) will be retained anyway. As to page breaks, either you keep them or you don't; I don't see how a page break can be "valid" or "invalid".
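
The behaviour described here, joining lines inside a paragraph while retaining the breaks between paragraphs, can be sketched in a few lines of Python. This is only an illustration under the assumption that paragraphs are separated by at least one blank line; the function name `join_wrapped_lines` is hypothetical, and FineReader's own logic is not documented in this thread.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Join the lines inside each paragraph; keep paragraph breaks.

    Assumes paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


sample = "This is a\nbroken line.\n\nSecond\nparagraph."
print(join_wrapped_lines(sample))
```

OCR output that marks paragraphs by indentation rather than blank lines would need a different splitting rule.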

 

