How to merge old and new versions of tab-delimited glossary
Thread poster: Samuel Murray
Samuel Murray | Netherlands | Local time: 08:22 | Member (2006) | English to Afrikaans + ...
Hello everyone,

I have an old glossary (a tab-delimited file; the first column is the source text) that contains both official terms and terms that I have added. The client just sent an updated version of the list of official terms. The update contains all of the official terms (not just the ones that have changed).

I need to merge the update with my old glossary so that if there are clashing entries, the existing entry is commented out (or removed) and (optionally) the entry from the updated glossary is inserted. Do you know of a way to do that?

To put it differently, I need to search my old glossary for any entries that also occur in the update (evaluated strictly on the source text column), and then delete or comment out those entries. It would not be useful to evaluate whole entries, because my old glossary may contain additional comments in the entries that I have added in the meantime. The comparison would have to be done on the source text only.

Thanks,
Samuel

Mark | Local time: 08:22 | Italian to English
That sounds tricky. I wonder if the best thing might be to import them into some kind of terminology management system (I don't know whether any usable free ones exist), let it do the work for you and then export the result back to a tab-delimited file.

Filter dupes | Nov 20, 2014
The way I would do it is with a duplicate filter. Copy the two lists into a single file, with the newly received list first and your old, amended list below it. Filter out duplicates (comparing only the first field) and you're done. It should be pretty easy to do in Excel. Alternatively, send me the files and PayPal me a few € and I'll do it for you.

Michael Beijer | United Kingdom | Local time: 07:22 | Member (2009) | Dutch to English + ... | How about... | Nov 20, 2014
Hmm. Maybe you could do it by converting the old and the new glossary to TMX files: OldGlossary.tmx and NewGlossary.tmx. Make sure all TUs in OldGlossary.tmx have the same timestamp, and that it is earlier than the timestamps you give to the TUs of NewGlossary.tmx. If you then merge these two TMXs into a single TMX, you should be able to clean it in e.g. the Heartsome TMX Editor or CafeTran so that only the TUs with the latest timestamps remain, effectively leaving you with only the updated entries from your glossaries. Then convert this TMX back into a tab-delimited glossary.

Does this make any sense?

Not sure what to do with any extra fields or metadata in your tab-delimited glossaries. Maybe store them in a custom TMX property during the process so you don't lose them?

Michael
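The TMX route above can be sketched with Python's standard library. This is a hedged illustration, not what the Heartsome editor or CafeTran actually do: it assumes two-column glossaries, uses hypothetical language codes (`en`, `af`) and hypothetical dates, writes a minimal TMX 1.4 document in which every old TU gets an earlier `creationdate` than every new TU, and then performs the "keep only the latest TU per source" clean-up in code instead of in a TMX editor. Extra columns are dropped, which is exactly the metadata problem mentioned in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical language codes (English to Afrikaans).
SRC_LANG, TGT_LANG = "en", "af"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang


def glossary_to_tus(lines, stamp):
    """Turn tab-delimited lines into (source, target, creationdate) tuples.
    Only the first two columns are kept; any extra fields are lost."""
    tus = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            tus.append((fields[0], fields[1], stamp))
    return tus


def build_tmx(tus):
    """Build one minimal TMX 1.4 document from the merged list of TUs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary-sketch", "creationtoolversion": "0.1",
        "segtype": "phrase", "o-tmf": "tab-delimited", "adminlang": "en",
        "srclang": SRC_LANG, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target, stamp in tus:
        tu = ET.SubElement(body, "tu", {"creationdate": stamp})
        for lang, text in ((SRC_LANG, source), (TGT_LANG, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


def segments(tu):
    """Map language code -> segment text for one TU."""
    return {tuv.get(XML_LANG): tuv.find("seg").text for tuv in tu.findall("tuv")}


def keep_latest(tmx):
    """Clean-up step: keep only the TU with the latest creationdate per source.
    TMX dates (YYYYMMDDThhmmssZ) compare correctly as plain strings."""
    body = tmx.find("body")
    newest = {}
    for tu in body.findall("tu"):
        source = segments(tu)[SRC_LANG]
        if source not in newest or tu.get("creationdate") > newest[source].get("creationdate"):
            newest[source] = tu
    for tu in body.findall("tu"):  # findall returns a list, so removing is safe
        body.remove(tu)
    body.extend(list(newest.values()))
    return tmx


def tmx_to_glossary(tmx):
    """Convert the cleaned TMX back into tab-delimited lines."""
    return [f"{s[SRC_LANG]}\t{s[TGT_LANG]}"
            for s in map(segments, tmx.find("body").findall("tu"))]


if __name__ == "__main__":
    old = ["save\tOLD-1\tmy note", "print\tOLD-2"]
    new = ["save\tNEW-1"]
    tus = glossary_to_tus(old, "20141101T000000Z") + glossary_to_tus(new, "20141120T000000Z")
    tmx = keep_latest(build_tmx(tus))
    print("\n".join(tmx_to_glossary(tmx)))
```

Storing the dropped columns in a `<prop>` element per TU, as suggested above, would be the natural extension; it is left out here to keep the sketch short.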
Michael Beijer | United Kingdom | Local time: 07:22 | Member (2009) | Dutch to English + ... | or use ASAP Utilities | Nov 20, 2014
Another option is to see if ASAP Utilities can do it for you. It can do so many things that I suspect you might find a simple solution to your problem in one of its gazillion routines:
http://www.asap-utilities.com/

Michael
[Edited at 2014-11-20 14:53 GMT]

2nl (X) | Netherlands | Local time: 08:22 | Use CafeTran | Nov 20, 2014
Samuel Murray wrote:
"I need to merge the update with my old glossary so that if there are clashing entries, the existing entry is commented out (or removed) and (optionally) the entry from the updated glossary is inserted."

Copy the new glossary to the end of your old glossary. In CafeTran choose Glossary > Merge alternative terms. Delete from semicolon to end of line with a regular expression.

Other way around | Nov 20, 2014
2nl wrote:
"Copy the new glossary to the end of your old glossary. In CafeTran choose Glossary > Merge alternative terms. Delete from semicolon to end of line with a regular expression."

I think you may have that the wrong way around. The new glossary is the one we want to conserve, so the new comes first (unless CT keeps the last occurrence of duplicates instead of the first one like most tools would). Also, if the 'Merge' function deletes alternative terms, then it's not the most aptly named operation... I would expect it to keep the second target-language term as a synonym. If it works as needed here, it looks like a relatively convenient solution.

I checked Excel, and it looks like the duplicate filter doesn't work as required here. It can still do the job, of course (Excel can do pretty much anything if you know how), but it's a bit more complicated. This should work:

1. Copy the two glossaries into the same worksheet, new on top.
2. Sort alphabetically, making sure that entries from the new glossary end up above identical entries from the old glossary.
3. In F2, put something like =IF(A2=A1,"DUPE",""), and copy the formula down to the bottom of column F. Dupes should now be marked in column F; check that this is correct.
4. Copy the whole table to a text editor, then copy and paste it back into Excel (this gets rid of the formulas and converts column F to plain text).
5. Sort by column F and remove the dupes.

It's a lot of steps, but it's still quicker than installing and learning a new software tool, and it's a bit more transparent and flexible: you have a better idea of what's going on, and you can make sure it's doing what you want it to.
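The five Excel steps above translate almost line for line into a short script. Below is a minimal Python sketch, assuming plain tab-delimited lines; the function name `merge_glossaries` and the `#` comment marker are hypothetical choices (use whatever comment convention your glossary tool recognises, if any). Because Python's `sorted()` is stable, an entry from the new glossary stays above an old entry with the same source text, which is what step 2 requires; the `comment_out` option covers Samuel's "comment out instead of remove" variant.

```python
def merge_glossaries(new_lines, old_lines, comment_out=False):
    """Python version of the Excel steps: new glossary on top, stable sort on
    the source column, then flag every row whose source equals the row above."""
    # Step 1: new entries first, old entries below; blank lines are skipped.
    rows = [line.rstrip("\n").split("\t")
            for line in list(new_lines) + list(old_lines) if line.strip()]
    # Step 2: sorted() is stable, so a new entry stays above an old entry
    # with the same source text.
    rows = sorted(rows, key=lambda fields: fields[0])
    merged = []
    previous = None
    for fields in rows:
        # Step 3: the =IF(A2=A1,"DUPE","") test, on the source column only.
        is_dupe = fields[0] == previous
        previous = fields[0]
        line = "\t".join(fields)
        if not is_dupe:
            merged.append(line)
        elif comment_out:
            # Keep the old entry (with its notes) but comment it out.
            merged.append("#" + line)
        # Otherwise (step 5) the duplicate is simply dropped.
    return merged


if __name__ == "__main__":
    new = ["save\tNEW-1", "open\tNEW-2"]
    old = ["save\tOLD-1\tmy note", "print\tOLD-2"]
    print("\n".join(merge_glossaries(new, old, comment_out=True)))
```

This comparison is case-sensitive; if "Save" and "save" should count as the same term, sort and compare on `fields[0].casefold()` instead.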
[Edited at 2014-11-20 13:43 GMT]

Dan Lucas | United Kingdom | Local time: 07:22 | Member (2014) | Japanese to English
2nl (X) | Netherlands | Local time: 08:22 | Yes, CafeTran is that smart | Nov 21, 2014
FarkasAndras wrote:
"The new glossary is the one we want to conserve, so the new comes first (unless CT keeps the last occurrence of duplicates instead of the first one like most tools would). Also, if the 'Merge' function deletes alternative terms, then it's not the most aptly named operation..."

Actually, new entries are added to the end of a text file, so it makes perfect sense that CafeTran puts the entries with a higher line number directly after the tab character. CafeTran doesn't delete any older entries (lines with a lower number) that are unique; only duplicates are removed. You can remove the older entries (alternative target terms) manually via Find and Replace (with regular expressions) or in Excel (by replacing the semicolon with a tab first and then deleting the columns you don't need).

This is an example glossary:

And this is how CafeTran optimises the glossary: http://www.screencast.com/t/gMnL7eDpkPii

More info: http://cafetran.wikidot.com/optimising-your-glossaries
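To make the behaviour described above concrete, here is a rough Python model of the "Merge alternative terms" step followed by the regular-expression clean-up. It is an emulation based only on this post, not CafeTran's code: it assumes a two-column glossary, that the target from a later line goes first, that a repeated target is kept only once, and that alternatives are joined with a semicolon. The function names are hypothetical. The regex `;.*$` (with multiline mode) deletes from the first semicolon to the end of each line, so it also assumes the target is the last column and that source terms contain no semicolons.

```python
import re


def merge_alternative_terms(lines):
    """Rough emulation of the merge as described in this post (an assumption,
    not CafeTran's actual code): one line per source term, targets from later
    lines placed first, repeated targets kept only once, joined with ';'."""
    alternatives = {}  # source -> targets, newest first
    for line in lines:
        if "\t" not in line:
            continue  # not a glossary entry
        source, target = line.rstrip("\n").split("\t", 1)
        targets = alternatives.setdefault(source, [])
        if target in targets:
            targets.remove(target)  # duplicate: keep a single copy
        targets.insert(0, target)   # the later line takes the first position
    return [f"{source}\t{';'.join(targets)}" for source, targets in alternatives.items()]


def keep_first_alternative(text):
    """The clean-up step: delete from the semicolon to the end of each line."""
    return re.sub(r";.*$", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    # Old glossary first, new glossary appended to the end, as suggested earlier.
    combined = ["save\tOLD-1", "print\tOLD-2", "save\tNEW-1"]
    merged = merge_alternative_terms(combined)
    print("\n".join(merged))
    print(keep_first_alternative("\n".join(merged)))
```

Under these assumptions, appending the new glossary after the old one is what makes the new term land in first position, which is the point 2nl is making about the order.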
[Edited at 2014-11-21 07:42 GMT]

MikeTrans | Germany | Local time: 08:22 | Italian to German + ...
Hi Sammy,

I know that MemoQ has a new feature that lets you merge or delete duplicate terms (the same goes for TM entries). After listing the duplicates, you can mark terms for merging or deletion, taking your "master" termbase into account; in your case, that is the termbase containing your client's terms. I will try to add a screenshot here, but please be patient or tolerant, because I don't do this often.

Greets,
Mike