How to merge old and new versions of tab-delimited glossary (CAT Tools Technical Help)

Thread poster: Samuel Murray

Samuel Murray

Netherlands
Local time: 08:22
Member (2006)
English to Afrikaans
+ ...

Nov 20, 2014

Hello everyone

I have an old glossary (tab delimited file, first column is the source text) that contains both official terms and terms that I have added. The client just sent an updated version of the list of official terms. The update contains all of the official terms (not just the ones that had changed). I need to merge the update with my old glossary so that if there are clashing entries, the existing entry is commented out (or removed) and (optionally) the entry from the updated glossary is inserted. Do you know of a way to do that?

To put it differently, I need to search my old glossary for any entries that also occur in the update (evaluated strictly on the source text column), and then delete or comment out those entries.

It would not be useful to evaluate whole entries, because my old glossary may contain additional comments in the entries that I have added in the meantime. The comparison would have to be done on the source text only.

Thanks
Samuel

Mark
Local time: 08:22
Italian to English

Hmm…

Nov 20, 2014

That sounds tricky. I wonder if the best thing might be to import them into some kind of terminology management system (I don't know whether or not there are usable free ones), let it do the work for you and then export it back to a tab-delimited file.

FarkasAndras

Local time: 08:22
English to Hungarian
+ ...

Filter dupes

Nov 20, 2014

The way I would do it is with a duplicate filter. Copy the two lists into a single file, with the newly received stuff first and the old amended list below it. Filter out duplicates (watching only the first field) and you're done.
It should be pretty easy to do in Excel. Alternatively, send me the files and paypal me a few € and I'll do it for you.
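
For anyone who would rather script the duplicate filter than do it in a spreadsheet, here is a minimal Python sketch of the idea. It assumes UTF-8 files with one entry per line and the source term in the first tab-separated field; the function names and the file names in the usage note are made up for illustration.

```python
def read_tsv(path):
    """Read a tab-delimited glossary into a list of rows, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if line.strip()]


def write_tsv(path, rows):
    """Write rows back out as a tab-delimited file."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")


def merge_glossaries(new_rows, old_rows):
    """Concatenate new + old and keep only the first row seen per source term.

    Only the first field (the source term) is compared, so extra comment
    columns in the old glossary do not affect the match.
    """
    seen = set()
    merged = []
    for row in list(new_rows) + list(old_rows):
        source = row[0]
        if source in seen:
            continue  # a later (older) entry for the same source term: drop it
        seen.add(source)
        merged.append(row)
    return merged
```

Usage would be something like `write_tsv("merged.txt", merge_glossaries(read_tsv("new.txt"), read_tsv("old.txt")))`. Because the new list comes first, the client's entry wins every clash, and the terms that exist only in the old glossary are kept. The comparison is exact, so "Dog" and "dog" count as different source terms.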

Michael Beijer

United Kingdom
Local time: 07:22
Member (2009)
Dutch to English
+ ...

How about...

Nov 20, 2014

Hmm. Maybe you could do it by converting the old and the new glossary to TMX files, say OldGlossary.tmx and NewGlossary.tmx. Make sure all TUs in OldGlossary.tmx have the same time stamp, and that it is earlier than the time stamps you give to the TUs of NewGlossary.tmx.

If you now merge these two TMXs into a single TMX, you should be able to clean it in e.g. the Heartsome TMX editor or CafeTran, so that only the TUs with the latest time stamps remain, effectively leaving you with only the updated entries from your glossaries.

Then convert this TMX back into a tab-del glossary.

Does this make any sense?

Not sure what to do with any extra fields or metadata in your tab-del glossaries. Maybe store them in a custom TMX property during the process so you don't lose them?
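
The "latest time stamp wins" principle can be illustrated without any TMX tooling. The sketch below is only that: it does not read or write TMX, and the two dates and the `keep_latest` helper are invented for the example.

```python
from datetime import datetime

# Hypothetical time stamps: every old entry gets the same, earlier stamp.
OLD_STAMP = datetime(2014, 1, 1)
NEW_STAMP = datetime(2014, 11, 20)


def keep_latest(entries):
    """entries: iterable of (source, target, stamp) tuples.

    Returns (source, target) pairs, keeping for each source term only the
    target with the most recent stamp.
    """
    latest = {}
    for source, target, stamp in entries:
        if source not in latest or stamp > latest[source][1]:
            latest[source] = (target, stamp)
    return [(source, target) for source, (target, _) in latest.items()]


old = [("dog", "brak"), ("bird", "voël")]
new = [("dog", "hond"), ("cat", "kat")]
stamped = [(s, t, OLD_STAMP) for s, t in old] + [(s, t, NEW_STAMP) for s, t in new]
merged = keep_latest(stamped)
```

The point of the time stamps is that the result no longer depends on which file comes first in the merged file: the newer entry replaces the older one wherever they clash.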

Michael

Michael Beijer

United Kingdom
Local time: 07:22
Member (2009)
Dutch to English
+ ...

or use ASAP Utilities

Nov 20, 2014

Another option is to see if ASAP Utilities can do it for you. It can do so many things, I suspect you might find a simple solution to your problem in one of its gazillion routines:

Michael

http://www.asap-utilities.com/

[Edited at 2014-11-20 14:53 GMT]

2nl (X)

Netherlands
Local time: 08:22

Use CafeTran

Nov 20, 2014

Samuel Murray wrote:

I need to merge the update with my old glossary so that if there are clashing entries, the existing entry is commented out (or removed) and (optionally) the entry from the updated glossary is inserted.

Copy the new glossary to the end of your old glossary.

In CafeTran choose Glossary > Merge alternative terms.

Delete from semicolon to end of line with a regular expression.
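
The last step can be sketched in Python as follows. It assumes that each merged line has the form source, tab, then the target terms separated by semicolons with the one to keep first; that format is an assumption here and worth checking against your own merged file.

```python
import re

# Everything from the first semicolon to the end of the line.
ALTERNATIVES = re.compile(r";.*$")


def strip_alternatives(line):
    """Remove the first semicolon and everything after it on the line."""
    return ALTERNATIVES.sub("", line)
```

In a Find and Replace dialog the equivalent would be to search for `;.*$` and replace it with nothing. Note that this is a blunt instrument: a semicolon inside a source term, or a comment column after the target terms, would be cut off as well.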

FarkasAndras

Local time: 08:22
English to Hungarian
+ ...

Other way around

Nov 20, 2014

2nl wrote:

Copy the new glossary to the end of your old glossary.

In CafeTran choose Glossary > Merge alternative terms.

Delete from semicolon to end of line with a regular expression.

I think you may have that the wrong way around. The new glossary is the one we want to conserve, so the new comes first (unless CT keeps the last occurrence of duplicates instead of the first one like most tools would). Also, if the 'Merge' function deletes alternative terms, then it's not the most aptly named operation... I would expect it to keep the 2nd target language term as a synonym. If it works as needed here, it looks like a relatively convenient solution.

I checked Excel and it looks like the duplicate filter doesn't work as required here.
It can still do the job of course (Excel can do pretty much anything if you know how), but it's a bit more complicated. This should work:
1. Copy the two glossaries into the same worksheet, with the new one on top.
2. Sort alphabetically, and make sure entries from the new glossary end up above identical entries from the old glossary.
3. In F2, put something like =IF(A2=A1,"DUPE",""), then copy the formula to the bottom of column F. Dupes should now be marked in column F; check that this is correct.
4. Copy the whole table to a text editor, then copy and paste it back into Excel (this gets rid of the formula and converts F to normal text).
5. Sort by F and remove the dupes.

It's a lot of steps but it's still quicker than installing and learning a new software tool, and it's a little more transparent and flexible. I.e. you have a better idea of what's going on and you can make sure it's doing what you want it to.
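
For comparison, the same sort-and-mark logic looks roughly like this in Python. It is a sketch under the assumption that each row is a list of fields with the source term first; `mark_dupes` is just an illustrative name.

```python
def mark_dupes(new_rows, old_rows):
    """Sort all rows by source term, new before old, and flag repeated sources.

    Returns a list of (row, flag) pairs where flag is "DUPE" when the row has
    the same source term as the row directly above it, mirroring the
    =IF(A2=A1,"DUPE","") formula.
    """
    # Tag each row with its origin so that new entries sort above old ones.
    tagged = [(row, 0) for row in new_rows] + [(row, 1) for row in old_rows]
    tagged.sort(key=lambda item: (item[0][0], item[1]))

    marked = []
    previous = None
    for row, _origin in tagged:
        flag = "DUPE" if row[0] == previous else ""
        marked.append((row, flag))
        previous = row[0]
    return marked
```

Keeping the flag as a separate column, rather than deleting straight away, preserves the "check if correct" step: you can eyeball the marked rows before removing them. One difference to be aware of is that Excel's = comparison ignores case, whereas the comparison here is case-sensitive.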

[Edited at 2014-11-20 13:43 GMT]

Dan Lucas

United Kingdom
Local time: 07:22
Member (2014)
Japanese to English

ASAP

Nov 20, 2014

Samuel Murray wrote:
Help

I could write you a few lines of R or Perl to do this but the problem would be running it at your end.

So I second Michael's suggestion of ASAP Utilities, in particular look at this page:
http://www.asap-utilities.com/blog/index.php/2005/09/29/how-to-delete-duplicates-and-leave-one-of-them/
Note that it gives you the choice to leave the first duplicate. With some judicious sorting you should be able to get the result you want.

Dan
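
A scripted version of what was originally asked for (comment out the clashing old entries, then add the update) might look like the Python sketch below. It needs Python installed at your end, and it rests on assumptions: the `# ` comment marker is a guess, since the thread does not say what the glossary format uses, and `comment_out_clashes` is an invented name.

```python
def comment_out_clashes(old_lines, new_lines, marker="# ", append_new=True):
    """Comment out old entries whose source term also occurs in the update.

    Lines are plain strings without a trailing newline. Only the text before
    the first tab (the source term) is compared.
    """
    new_sources = {line.split("\t", 1)[0] for line in new_lines if line.strip()}

    result = []
    for line in old_lines:
        source = line.split("\t", 1)[0]
        if line.strip() and source in new_sources:
            result.append(marker + line)  # clash: keep the old entry, commented out
        else:
            result.append(line)

    if append_new:
        result.extend(line for line in new_lines if line.strip())
    return result
```

The line lists can come from something like `open("old.txt", encoding="utf-8").read().splitlines()`. Commenting out rather than deleting keeps any notes added to the old entries, and passing `append_new=False` leaves out the optional step of inserting the updated entries.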

2nl (X)

Netherlands
Local time: 08:22

Yes, CafeTran is that smart

Nov 21, 2014

FarkasAndras wrote:
The new glossary is the one we want to conserve, so the new comes first (unless CT keeps the last occurrence of duplicates instead of the first one like most tools would). Also, if the 'Merge' function deletes alternative terms, then it's not the most aptly named operation...

Actually, new entries are added to the end of a text file, so it makes perfect sense that CafeTran puts entries with a higher line number directly after the tab character. CafeTran doesn't delete any older entries (lines with a lower number) that are unique; only duplicates are removed.

You can remove older entries (alternative target terms) manually via Find and Replace (with regular expressions) or in Excel (by replacing the semicolon with a tab first, then delete the columns that you don't need).

This is an example glossary:

And this is how CafeTran optimises the glossary:

http://www.screencast.com/t/gMnL7eDpkPii

More info: http://cafetran.wikidot.com/optimising-your-glossaries

[Edited at 2014-11-21 07:42 GMT]

MikeTrans
Germany
Local time: 08:22
Italian to German
+ ...

MemoQ...

Nov 23, 2014

Hi Sammy,
I know that MemoQ has a new feature where you can merge or delete duplicate terms (the same goes for TM entries).

After listing the duplicates you can then mark terms for merging or deletion taking your "Master" termbase into account, in your case the termbase containing your Client Terms.
I will try to add a screenshot here, but please be patient or tolerant because I don't do this often.

Greets,
Mike
