Splitting up gigantic TMs
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Apr 17, 2018

I'm referring to this posting:

https://cafetran.freshdesk.com/support/discussions/topics/6000053707

SDL's Paul Filkin advises using the Heartsome TMX Editor:
https://community.sdl.com/product-groups/translationproductivity/f/160/t/9898

One could try to use UltraEdit (Mac and Windows) and chop up the gigantic TMX file.

However, how about a new feature for CafeTran Espresso 2018: Split up TMX files?

A dedicated feature could read the TMX file line by line and write it out to new TMX files of 300,000 TUs each.

BTW: I wouldn't be surprised if the 1.5 GB TM from the posting mentioned above could be reduced by 50% by removing all the extra information that Studio stores in its TMs and that CT can easily do without.
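To illustrate the size-reduction idea, here is a minimal Python sketch that slims a TMX file line by line. It rests on assumptions that are not confirmed anywhere in this thread: that the extra information sits in `<prop>` elements and in bookkeeping attributes such as `creationdate` or `usagecount` on `<tu>` and `<tuv>` tags, that each `<prop>` element fits on one line, and that CT can really do without all of them. The names `slim_line` and `slim_tmx` are made up for this example.

```python
import re

# Assumption: each <prop>...</prop> element sits on a single line.
PROP = re.compile(r"<prop\b[^>]*>.*?</prop>")
# Opening <tu ...> and <tuv ...> tags (attribute values must not contain ">").
TU_TAG = re.compile(r"<tuv?\b[^>]*>")
# Assumption: these TMX bookkeeping attributes are dispensable for your workflow.
ATTR = re.compile(
    r'\s+(?:creationdate|creationid|changedate|changeid|usagecount|lastusagedate)="[^"]*"'
)


def slim_line(line):
    """Return `line` without <prop> elements and bookkeeping attributes."""
    slim = PROP.sub("", line)
    # Strip attributes only inside <tu>/<tuv> tags, never inside segment text.
    slim = TU_TAG.sub(lambda m: ATTR.sub("", m.group(0)), slim)
    # A line that held nothing but <prop> elements disappears completely.
    return "" if line.strip() and not slim.strip() else slim


def slim_tmx(src, dst, encoding="utf-8"):
    """Copy `src` to `dst` one line at a time, slimming each line on the way."""
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            fout.write(slim_line(line))
```

Whether this gets anywhere near 50% depends entirely on how much of the file consists of such metadata, so it is worth trying on a copy first and comparing the file sizes.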


 
Igor Kmitowski  Identity Verified
Poland
Member (2016)
English to Polish
+ ...
Split translation memory in TMX edit mode Apr 17, 2018

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.
2. In the Search field, type a range for the split (e.g. 1-100000) and press Enter.
3. Save the filtered segments to a new TMX file via Project > Export and Exchange > To TMX memory...


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Small RAM Apr 17, 2018

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file to RAM is a problem with gigantic TMX files and computers with small RAM.

I'm not a developer so I don't know how this works, but:

With a simple MS-Word macro that reads a big file line after line, I could extract useful bits of glossaries and TMX files very fast.

Isn't something like that possible for CT too? Not loading the entire TM to RAM but just processing line after line?
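For illustration, line-by-line processing of this kind could look like the following Python sketch. It is not how CafeTran works. It assumes a pretty-printed TMX file in which the `<body>`, `</tu>` and `</body>` tags each end up on a line without other TU content, and the function name `split_tmx` and the part-naming scheme are invented for this example. Only the header and the current line are held in memory.

```python
from pathlib import Path


def split_tmx(src, out_dir, tus_per_file=100_000, encoding="utf-8"):
    """Split a TMX file into parts of at most `tus_per_file` translation units.

    Assumption: <body>, </tu> and </body> never share a line with other
    TU content, as in a pretty-printed export.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header = []      # every line up to and including the <body> line
    parts = []       # paths of the part files written so far
    in_body = False
    out = None       # part file currently being written
    count = 0        # TUs written to the current part

    def open_part():
        path = out_dir / f"{Path(src).stem}_{len(parts) + 1:03d}.tmx"
        parts.append(path)
        f = open(path, "w", encoding=encoding, newline="")
        f.writelines(header)  # every part gets its own copy of the header
        return f

    def close_part(f):
        f.write("</body>\n</tmx>\n")
        f.close()

    with open(src, encoding=encoding, newline="") as fin:
        for line in fin:
            if not in_body:
                header.append(line)
                in_body = "<body" in line
                continue
            if "</body>" in line:
                break
            if out is None:
                if not line.strip():
                    continue  # do not start a new part on a blank line
                out = open_part()
            out.write(line)
            if "</tu>" in line:
                count += 1
                if count == tus_per_file:
                    close_part(out)
                    out, count = None, 0
    if out is not None:
        close_part(out)
    return parts
```

A call such as `split_tmx("big.tmx", "parts", tus_per_file=100_000)` would write `big_001.tmx`, `big_002.tmx` and so on into the `parts` folder. A TMX file that puts everything on a single line would need a real streaming XML parser instead.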


 
FarkasAndras  Identity Verified
English to Hungarian
+ ...
yes Apr 19, 2018

Hans Lenting wrote:

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file to RAM is a problem with gigantic TMX files and computers with small RAM.


True. There is a fairly primitive tmx chopper in my random collection of scripts:
https://sourceforge.net/projects/aligner/files/grab_bag_1.7-random_tools_for_translators.zip/download

It strips various tags from the tmx but it should work on any file size.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Solution in AppleScript Apr 23, 2018

Thanks, Andras. I'm sorry to report that I couldn't get your Perl script to work on my Mac.

Here's a great solution using AppleScript:

https://www.proz.com/forum/apple_mac_operating_systems/324749-split_gigantic_tmx_files.html

Here's part of the DGT translation memory, split into parts of 100,000 translation units:

[Two screenshots: Screen Shot 2018-04-23 at 18.13.26 and Screen Shot 2018-04-23 at 18.14.34]

[Edited at 2018-04-23 16:18 GMT]


 

