OCR to DTP software
Thread poster: Jaroslaw Michalak
Jaroslaw Michalak | Poland | Member (2004) | English > Polish
Is there any OCR software that can save to one of the popular DTP software formats?
In particular, the following features would be nice:
- separation of graphics to externally linked files
- control over text flows
- designation of special text areas (headers, running heads, page numbers)
I know it is possible to get that functionality by combining several techniques (saving to HTML for graphics, rereading frames for flows, etc.), but it takes a lot of time...
Doru Voin | Romania | English > Romanian | There are some | Jan 6, 2005
Jabberwock wrote:
Is there any OCR software that can save to one of the popular DTP software formats?
Try ScanSoft OmniPage or ABBYY FineReader. For more info, search the ProZ.com website; the issue has been discussed several times before.
Regards,
Doru Voin
Jaroslaw Michalak | Poland | Member (2004) | English > Polish | Thread starter | I did some searching... | Jan 6, 2005
I did some searching on the subject, but with no results. I would be grateful for any pointers on such discussions.
FineReader cannot save to Corel Ventura, PageMaker, Quark or any other popular format. It also cannot separate graphics (except in HTML), and it does not provide frame flow control (or I have not found that feature).
I don't know about OmniPage, as it has no demo. However, its feature list does not indicate that it can do the things I expect.
PAS | Polish > English | Omnipage - maybe | Jan 6, 2005
I had a chance to use OmniPage 12 for a while, and it can save OCR'd documents as FrameMaker (MIF), Ventura Publisher (DOC) and PageMaker (DOC) files.
Yes, I know - most people do not consider these true DTP software.
However, I never tried it, so I don't know how well it does the job.
Good luck
Pawel Skalinski
[Edited at 2005-01-06 14:34]
Ken Cox | German > English
I suspect that there's not much demand for this in the professional DTP world, so it's unlikely that a commercial product is available that can do what you want.
A possible DIY solution that would do at least part of what you want would be to export the OCR document in RTF and use a tool to convert the RTF document to tagged text for input to XPress or InDesign. If you are a good programmer (or can find someone who is), it should be possible to make such a tool, although I don't think it's a trivial task. You could also try looking for shareware that can do this (other people may have had the same idea).
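To give an idea of the conversion step, here is a minimal Python sketch. It assumes the RTF has already been reduced to (style name, text) pairs by some other tool; that parsing step is the hard part and is not shown. The helper names `escape` and `to_tagged_text` are made up for illustration. The tags used (`<ASCII-WIN>` header, `<ParaStyle:...>`, backslash escaping) follow the InDesign tagged text format as it is commonly documented and should be checked against Adobe's documentation; XPress Tags use a different syntax.

```python
def escape(text: str) -> str:
    """Escape characters assumed to be special in InDesign tagged text."""
    # Backslash goes first so the escapes added for < and > are not doubled.
    return text.replace("\\", "\\\\").replace("<", "\\<").replace(">", "\\>")


def to_tagged_text(paragraphs):
    """Turn (style_name, text) pairs into a tagged-text string.

    `paragraphs` is a hypothetical intermediate form; producing it from
    real RTF is outside the scope of this sketch.
    """
    lines = ["<ASCII-WIN>"]  # encoding header expected on the first line
    for style, text in paragraphs:
        lines.append(f"<ParaStyle:{style}>{escape(text)}")
    # Windows line endings, one paragraph per line.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    sample = [("Heading", "Chapter 1"), ("Body", "Use a < b with care.")]
    print(to_tagged_text(sample))
```

Non-ASCII characters (Polish diacritics, for example) are not handled here and would need their own encoding step.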
Another possibility would be to export the OCR document as PDF and use a tool to convert it directly to XPress or InDesign format. As PDF is becoming a popular output/transfer format in the DTP world, there may be commercial products available that can do this, but they would be expensive.
Jaroslaw Michalak | Poland | Member (2004) | English > Polish | Thread starter | Thanks for the suggestions! | Jan 8, 2005
I will keep searching for the solution...
The problem is that the OCR software should be specifically designed with extracting the needed information in mind. The exact format conversion is of secondary importance.
I have thought of a simple example that illustrates my needs: a document which has two columns in two languages on each page. In DTP it would be quite natural to define two text flows, one for each language, so that each text can be edited and formatted independently. Getting that from a paper text is quite difficult: the workaround would be to recognize the first column, save it and then recognize the other column. Then any additional frames, tables or pictures might be extracted.
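The two-flow idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the OCR step is assumed to return rectangular text zones with page numbers and coordinates, and the `Zone` class and `split_flows` function are hypothetical names, not part of any OCR package mentioned in this thread. Zones are assigned to the left or right flow by comparing their horizontal centre with the middle of the page.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A recognized text block (hypothetical OCR output)."""
    page: int
    x: float      # left edge
    y: float      # top edge, growing downwards
    width: float
    text: str


def split_flows(zones, page_width):
    """Split zones into (left_flow, right_flow), each in reading order."""
    left, right = [], []
    # Reading order within a flow: page first, then top to bottom.
    for z in sorted(zones, key=lambda z: (z.page, z.y)):
        centre = z.x + z.width / 2
        (left if centre < page_width / 2 else right).append(z.text)
    return left, right


if __name__ == "__main__":
    zones = [
        Zone(1, 320, 50, 250, "Tekst polski 1"),
        Zone(1, 30, 50, 250, "English text 1"),
        Zone(2, 30, 40, 250, "English text 2"),
        Zone(2, 320, 40, 250, "Tekst polski 2"),
    ]
    english, polish = split_flows(zones, page_width=600)
    print(english)
    print(polish)
```

Headers, page numbers, tables and pictures that span both columns would need extra rules; the sketch only covers the plain two-column case.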
But maybe it is true that such features are not really in demand... DTP usually has the original source files at hand.
Doru Voin | Romania | English > Romanian
Jabberwock wrote:
I have thought of a simple example that illustrates my needs: a document which has two columns in two languages on each page. In DTP it would be quite natural to define two text flows, one for each language, so that each text can be edited and formatted independently. Getting that from a paper text is quite difficult: the workaround would be to recognize the first column, save it and then recognize the other column. Then any additional frames, tables or pictures might be extracted.
I suggest you try OmniPage Pro from ScanSoft. With this tool, currently at version 14, you can, for instance, define zones and extract the corresponding text into separate documents or into the same document, set up automatic flows, etc.
Regards,
Doru Voin
Roberta Anderson | Italy | Member (2001) | English > Italian | my Acrobat approach | Jan 14, 2005
I use Acrobat a lot, and I would use it (I have the Professional version, I do not know to what extent this would be possible with the cheaper Standard version) to do what you describe in this way:
1. Open the scanned document and use Paper Capture (Acrobat's OCR feature, included in Acrobat 6 as a standard command and available as a free plug-in for Acrobat 5) to convert from bitmap to text.
2. Use the Article tool to define the different text flows/threads (easy - just drag boxes around the text in the "reading" sequence). In your case, 1 article that covers one language over the various pages, then a second article to cover the second language.
3. Use Iceni Gemini (export plug-in, not included in Acrobat; there may be other similar plug-ins too) to export the separate articles.
4. Use Acrobat's image extraction feature to extract the images.
But I'm sure there are other ways too
cheers,
Roberta