‘Unbreaker’ (new tool in TransTools to remove spurious line endings from OCRd/converted text)! Thread poster: Michael Beijer
Michael Beijer United Kingdom Local time: 02:46 Member (2009) Dutch to English + ...
Samuel Murray Netherlands Local time: 03:46 Member (2006) English to Afrikaans + ... How does it compare to PlusTools? | Jun 29, 2014
Michael Beijer wrote: Do you often have a lot of incorrect line breaks in OCRd and/or converted texts? You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!
That's nice. PlusTools has had a similar function for years. In PlusTools, go to Tools > C[o]nv[ert] > Recreate paragraphs in current doc[ument]. How do the Unbreaker tool's results compare to PlusTools' results?
Michael Beijer United Kingdom Local time: 02:46 Member (2009) Dutch to English + ... TOPIC STARTER
Anton Konashenok Czech Republic Local time: 03:46 French to English + ...
Speaking of extra line and page breaks in OCRed text, a good OCR program will have an option to remove them. For example, ABBYY FineReader does.
Michael Beijer United Kingdom Local time: 02:46 Member (2009) Dutch to English + ... TOPIC STARTER ABBYY FineReader | Jun 29, 2014
Hi Anton, That’s interesting. Can you tell me how to do it in ABBYY FineReader 12? Incidentally, the documents in my screenshots were just something I found online and was in the process of aligning. Michael
Anton Konashenok Czech Republic Local time: 03:46 French to English + ...
Michael, all you need to do in FineReader is uncheck two boxes, "Keep page breaks" and "Keep line breaks", on the Save tab of the Options dialogue.
Michael Beijer United Kingdom Local time: 02:46 Member (2009) Dutch to English + ... TOPIC STARTER
Hi Anton, I'll have to do a little testing. If I do this, might I inadvertently remove valid page and line breaks, that is, ones that I would like to keep? Michael
Chunyi Chen United States Local time: 18:46 English to Chinese Thanks for sharing this information | Jul 9, 2014
Hi Michael, I purchased the TransTools Professional Edition ($20 for installing the program on up to three machines) after reading your user feedback here and visiting the TransTools website. So far I have only tried two or three features and am very happy with this program. I thought I would need to go through the readme files to know how to use the features, but it turns out the onscreen instructions were quite easy to follow. I highly recommend this tool to those who need to tweak the source files for better importing results in their CAT environment, and to those who still work with Trados files and want to see better which segments are new, which are fuzzy, etc. This is another nice addition after my ApSIC Xbench investment! Chun-yi
Anton Konashenok Czech Republic Local time: 03:46 French to English + ... "Valid" breaks | Jul 9, 2014
Michael, if I remember correctly, line breaks between paragraphs (as opposed to line breaks within the same paragraph) will be retained anyway. As to page breaks, you either keep them or you don't; I don't see how a page break may be "valid" or "invalid".
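For anyone curious what "keep the breaks between paragraphs, drop the ones inside a paragraph" might look like in practice, here is a minimal Python sketch. It is not how FineReader, TransTools or PlusTools actually do it (none of that is described in this thread); the function name `unbreak` is made up for illustration, and it assumes that paragraphs in the converted text are separated by at least one blank line. Page breaks are not handled at all.

```python
import re


def unbreak(text: str) -> str:
    """Join lines within a paragraph; keep blank-line paragraph breaks.

    Assumption (not taken from any of the tools discussed): a paragraph
    boundary is one or more blank lines; every other line break is spurious.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever a line break is followed by one or more blank
    # (empty or whitespace-only) lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip())

    rebuilt = []
    for para in paragraphs:
        # Drop stray whitespace around each line, then join with one space.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if lines:
            rebuilt.append(" ".join(lines))

    # Paragraphs stay separated by a single blank line.
    return "\n\n".join(rebuilt)


sample = "This is a line\nbroken by OCR.\n\nSecond paragraph\nalso broken."
print(unbreak(sample))
# This is a line broken by OCR.
#
# Second paragraph also broken.
```

A heuristic like this will also merge lines you may have wanted to keep apart (headings, list items, addresses) whenever they are not set off by a blank line, which is the kind of "valid" break Michael is asking about.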