Which is the best software for extraction of term candidates from Word documents and PDFs?
Thread poster: Fredrik Pettersson
Fredrik Pettersson  Identity Verified
Hong Kong
Local time: 08:04
Member (2009)
English to Swedish
+ ...
Sep 2, 2015

I have a license for TshwaneLex but haven't used it yet, and I'm not sure whether I can use it for term extraction from documents.

SDL MultiTerm Extract seems to work best with German only.

What I'm looking for is easy-to-use term extraction software that lets me extract terms from Word documents and PDF files based on criteria I set (such as the number of occurrences in the text).

Or maybe Toolbox from SIL could work:

http://www.linguistics.ucsb.edu/faculty/infield/courses/resources/Lex_H2.pdf

Other alternatives would be Terminotix or SynchroTerm.

I also found a posting here at ProZ, but it seemed to be more about terminology management:

http://www.proz.com/forum/cat_tools_technical_help/83657-best_terminology_management_termbase_software.html


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 07:04
Member (2009)
Dutch to English
+ ...
SynchroTerm + TerMine Sep 2, 2015

In my experience, the two best are:

(1) the free, online TerMine (http://www.nactem.ac.uk/software/termine/ ), and
(2) the paid, desktop SynchroTerm.

SynchroTerm is probably the best term extractor currently on the market, bar none. It also has all kinds of cool tricks, such as the ability to indicate which file (from a group of files) a term derives from, include a snippet of context, etc. etc. etc. It's what I use for large, corporate term extraction and glossary creation jobs.

Michael

PS: TAAS also has a good term extractor: https://term.tilde.com/projects

[Edited at 2015-09-02 19:39 GMT]


 
Fredrik Pettersson  Identity Verified
Hong Kong
Local time: 08:04
Member (2009)
English to Swedish
+ ...
TOPIC STARTER
Can SynchroTerm extract only source terms from monolingual documents? Sep 2, 2015

Thanks, SynchroTerm seems to be the best alternative.

I looked in the fact sheet for SynchroTerm now and watched a video, but I can't see if it's possible to extract only source terms and leave the target term translations empty. So I can fill in the translations before and during the translation phase.


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 07:04
Member (2009)
Dutch to English
+ ...
@Fredrik: Sep 2, 2015

Fredrik Pettersson wrote:

Thanks, SynchroTerm seems to be the best alternative.

I looked in the fact sheet for SynchroTerm now and watched a video, but I can't see if it's possible to extract only source terms and leave the target term translations empty. So I can fill in the translations before and during the translation phase.


Not entirely sure what you mean. Assuming you have a monolingual document, and want to extract terms, you open the doc(s) in SynchroTerm, and extract terms. You can then save this list of terms in various formats (Excel, tabbed txt, html, etc.), and later add translations if desired.
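
To illustrate the "source terms now, translations later" idea outside any particular tool, here is a minimal Python sketch that writes a list of source terms to a tab-delimited file with an empty target column. It is not SynchroTerm's export format; the function name `write_glossary` and the column headers `source` and `target` are assumptions made up for this example.

```python
import csv
import io

def write_glossary(terms, out):
    """Write source terms to a tab-delimited stream with an empty target column."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])  # hypothetical column names
    for term in terms:
        writer.writerow([term, ""])  # target left blank, to be filled in later

# Example: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_glossary(["hydraulic pump", "pump housing"], buffer)
print(buffer.getvalue())
```

A file like this should open in Excel or import into most termbase tools as a two-column list, though the exact import settings depend on the tool.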

I made a quick screencast to show how it works (no sound due to sleeping baby in next room):

https://www.youtube.com/watch?v=9zmHGhZvyb4

PS: SynchroTerm can perform both monolingual extraction and bilingual extraction.


[Edited at 2015-09-02 23:42 GMT]


 
Luca Tutino  Identity Verified
Italy
Member (2002)
English to Italian
+ ...
Bravo! Sep 4, 2015

Michael Beijer wrote:

I made a quick screencast to show how it works (no sound due to sleeping baby in next room):



Great stuff (the first minute of the track too) - and thanks!


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 07:04
Member (2009)
Dutch to English
+ ...
:) Sep 4, 2015

Luca Tutino wrote:

Michael Beijer wrote:

I made a quick screencast to show how it works (no sound due to sleeping baby in next room):



Great stuff (the first minute of the track too) - and thanks!


Ha ha, thanks Luca! The soundtrack's actually a very old song of mine ("Casio-Beat-2-(extended-version)-Michael-Beijer.mp3"), made on a little plastic Casio keyboard.


 
Meta Arkadia
Local time: 13:04
English to Indonesian
+ ...
Mono Sep 4, 2015

Fredrik Pettersson wrote:
[is it] possible to extract only source terms and leave the target term translations empty. So I can fill in the translations before and during the translation phase.


Most CAT tools can do that, but for more advanced purposes, I use AntConc. It can do a lot of things, including extracting terms (while excluding stopwords or words already in your termbase) and phrases (n-grams).

AntConc is free and cross-platform.
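
For anyone curious what this kind of extraction boils down to, here is a rough Python sketch of frequency-based n-gram counting with a stopword filter, a minimum-occurrence threshold, and an exclusion list for terms already in a termbase. It is only an illustration, not how AntConc or any other tool mentioned in this thread works internally. The function name `extract_candidates`, its parameters, and the tiny stopword list are all made up for the example, and it assumes the text has already been pulled out of the Word or PDF file into a plain string.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "is", "on", "with"}

def extract_candidates(text, max_n=3, min_count=2, known_terms=frozenset()):
    """Return (phrase, count) pairs for n-grams seen at least min_count times."""
    # Lowercase and keep runs of letters, allowing internal hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Apply the frequency threshold and drop terms already in the termbase.
    return [(phrase, count) for phrase, count in counts.most_common()
            if count >= min_count and phrase not in known_terms]

text = ("The hydraulic pump drives the hydraulic pump housing. "
        "A hydraulic pump is loud.")
for phrase, count in extract_candidates(text):
    print(count, phrase)
```

Raising `min_count` trims the list to the more frequent candidates, and passing existing termbase entries as `known_terms` keeps them out of the results.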

Cheers,

Hans


 

