Anyone have info on sdltm internal format (SQLite)?
Thread poster: FarkasAndras
FarkasAndras  Identity Verified
Local time: 14:38
English to Hungarian
Feb 4, 2013

Hi everyone, I know some of you have groped around inside sdltm files.
Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex.
Reading them does not seem to be difficult at all at first blush (see below). Writing, however... Has anyone attempted to do this outside of SDL? Is it feasible? Has SDL released any info on the format? Could one, say, generate empty or small TMs with Studio and add TUs externally? Then perhaps run a maintenance/reindexing in Studio, if it can't be done any other way? The main goal would be to be able to generate very large sdltms quickly, so having to run a maintenance process in Studio afterwards would probably make the whole exercise largely pointless... but maybe it would be useful for something for someone.
I'm willing to spend a week or two on this problem and release an open-source sdltm writer at the end, but I'm just learning SQLite and I'm not interested in devoting the rest of my life to it...


I've found the following tables:
attributes
date_attributes
fuzzy_data
numeric_attributes
parameters
picklist_attributes
picklist_values
resources
sqlite_sequence
sqlite_stat1
string_attributes
tm_resources
translation_memories
translation_unit_contexts
translation_units
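The same list can be pulled out of any .sdltm without a GUI. Below is a minimal sketch in Python using the standard-library sqlite3 module. It assumes only that the file is an ordinary SQLite 3 database, which the table names above suggest. It opens the file read-only, so a working TM cannot be modified by accident.

```python
import sqlite3
from pathlib import Path


def list_tables(path):
    """Return the names of all tables in an SQLite file such as an .sdltm."""
    # mode=ro opens the file read-only and fails, instead of silently
    # creating a new empty database, if the path is wrong.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("MyTM.sdltm")` (the file name is a placeholder) returns the table names as a sorted list.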

The translation_units table has the following columns (column index, name, declared type, NOT NULL flag, primary-key flag):
0 id INTEGER 1 1
1 guid BLOB 1 0
2 translation_memory_id INT 1 0
3 source_hash INTEGER 1 0
4 source_segment TEXT 0 0
5 target_hash INTEGER 1 0
6 target_segment TEXT 0 0
7 creation_date DATETIME 1 0
8 creation_user TEXT 1 0
9 change_date DATETIME 1 0
10 change_user TEXT 1 0
11 last_used_date DATETIME 1 0
12 last_used_user TEXT 1 0
13 usage_counter INT 1 0
14 flags INT 0 0
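A listing in this shape can be reproduced with SQLite's `PRAGMA table_info`. That pragma returns six fields per column, (cid, name, type, notnull, dflt_value, pk). The sketch below drops the default value so that the output matches the five columns shown above. It is a read-only helper and assumes nothing about Studio itself.

```python
import sqlite3
from pathlib import Path


def describe_table(path, table):
    """Return (index, name, type, notnull, pk) tuples for the columns of `table`."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # A PRAGMA cannot take a bound parameter for the table name, so quote
        # it as an SQL identifier (doubling any embedded double quotes).
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    finally:
        conn.close()
    # Each row is (cid, name, type, notnull, dflt_value, pk); drop the default.
    return [(cid, name, ctype, notnull, pk)
            for cid, name, ctype, notnull, _default, pk in rows]
```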

As expected, the source_segment and target_segment columns seem to contain the actual TUs (in XML, with segment pairs being on the same row).
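Below is a minimal reader sketch for those two columns. The XML layout is an assumption: the code presumes that the visible text sits in `<Value>` elements (inside `<Text>` runs). That is what the segments appear to contain, but the format is not documented. `text_tag` is a hypothetical parameter, so it can be changed if the files say otherwise. Tags and other elements are simply skipped.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def segment_text(xml, text_tag="Value"):
    """Join the text of every <text_tag> element in a segment's XML.

    ASSUMPTION: plain text lives in <Value> elements; all other elements
    are ignored rather than concatenated.
    """
    if xml is None:  # both segment columns are nullable
        return ""
    root = ET.fromstring(xml)
    # Compare local names so that a default XML namespace would not break matching.
    return "".join(el.text or "" for el in root.iter()
                   if el.tag.rsplit("}", 1)[-1] == text_tag)


def iter_pairs(path):
    """Yield (source_text, target_text) for each row of translation_units."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cursor = conn.execute(
            "SELECT source_segment, target_segment FROM translation_units ORDER BY id")
        for source_xml, target_xml in cursor:
            yield segment_text(source_xml), segment_text(target_xml)
    finally:
        conn.close()
```

Because `iter_pairs` is a generator, rows are read from disk as the loop consumes them rather than loaded all at once.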


[Edited at 2013-02-04 10:34 GMT]


 
mortgat-f (X)
France
Local time: 14:38
German to French
Interested May 3, 2015

Hello FarkasAndras,

I have not been able to find any documentation on the format, but I would be interested if you succeeded in solving this. I have already succeeded in developing a simple (understand: quick & dirty) tool to extract the source and target segments, but I expect a real reader/writer to be much more complicated to code, since there are all those cryptic guid fields. The context and fuzzy_data tables also seem to be very closely bound to the internal functioning of Trados Studio.
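On the guid fields: Python's uuid module can turn a 16-byte BLOB into a readable GUID. The byte order Studio uses is an assumption here. Because Studio is a .NET application, the sketch presumes .NET's `Guid.ToByteArray()` layout, in which the first three fields are little-endian (uuid's `bytes_le`). If the decoded values look wrong, `uuid.UUID(bytes=...)` (plain big-endian order) is the obvious alternative to try.

```python
import uuid


def guid_from_blob(blob):
    """Decode a 16-byte guid BLOB into a uuid.UUID.

    ASSUMPTION: .NET Guid.ToByteArray() byte order (first three fields
    little-endian), which the uuid module exposes as bytes_le.
    """
    blob = bytes(blob)
    if len(blob) != 16:
        raise ValueError(f"expected 16 bytes, got {len(blob)}")
    return uuid.UUID(bytes_le=blob)


def new_guid_blob():
    """Generate a fresh random GUID encoded in the same assumed byte order."""
    return uuid.uuid4().bytes_le
```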

As for very large TMs, I doubt they would be useful unless you have a computer with matching hardware: I once tried to work with a ~1 GB TM, but OmegaT tried to load it into RAM and crashed my computer. Perhaps Studio knows better and uses the TM on disk, though – I haven’t tried yet.

[Edited at 2015-05-03 21:54 GMT]


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 13:38
Member (2014)
Japanese to English
Is DIY the only way? May 3, 2015

FarkasAndras wrote:
Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex.

Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain.

Dan


 
Meta Arkadia
Local time: 19:38
English to Indonesian
Browsers May 3, 2015

FarkasAndras wrote:
Has anyone attempted to do this outside of SDL? Is it feasible?

No, but there are several free, cross-platform browsers you could try. I use DB Browser for SQLite.



I also tried SQLiteStudio, and SQuirreL, a universal client that handles just about any DBMS apart from Access. DtSQL is a paid alternative I haven't tried. There must be dozens of others; I only try OS X-compatible ones, of course.

The main goal would be to be able to generate very large sdltms quickly

I'm not interested in that, because CafeTran can take care of that. The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package.

Cheers,

Hans


 
Meta Arkadia
Local time: 19:38
English to Indonesian
Why? May 4, 2015

FarkasAndras wrote:
The main goal would be to be able to generate very large sdltms quickly

Trados users can import TMX files and, it seems, convert them into SDLTM, so there's no reason for you to offer your TMs and glossaries in the SDLTM format. Apart from that, a lot of the information in the SQLite database seems to relate to the project, the translators, reviewers, and what have you.

Cheers,

Hans


 
Dominique Pivard  Identity Verified
Local time: 15:38
Finnish to French
.sdltm and return packages May 4, 2015

Meta Arkadia wrote:
The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package.

I was under the impression that Studio return packages (.sdlrpx) do not include TMs (the idea being that the recipient of the package can update their TM with the translated .sdlxliff), so .sdltm shouldn't be relevant to the return package.


 
FarkasAndras  Identity Verified
Local time: 14:38
English to Hungarian
TOPIC STARTER
Maybe SDK? May 4, 2015

Dan Lucas wrote:

FarkasAndras wrote:
Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex.

Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain.

Dan

As far as I can tell, ODBC is designed for database and OS interoperability, which is not a consideration here (SQLite+Windows is the only feasible DB+OS setup in the first place). I'm not sure if ODBC would be of any help.

In any case, this was just a vague idea of mine that went nowhere due to the apparent complexity of the format. To me, it isn't worth the effort it appears to require.

Creating large DBs outside of Studio would be useful, as Studio does the job atrociously slowly (at least as of Studio 2011; I have no info on 2014). Import times in the 10–24 hour range are not unusual, which makes it practically impossible to create very large TMs that the program might well be able to use, if only it could generate them. (I believe it doesn't try to load the whole TM into RAM; that's what relational databases are for.)
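On the speed point: SQLite inserts rows far faster when they are batched into a single transaction than when each row is committed separately. The sketch below only illustrates that principle. It is hypothetical and does not produce a TM that Studio would actually use. It assumes the translation_units columns listed at the top of the thread and writes 0 into source_hash/target_hash, because Studio's hashing scheme is unknown. It also leaves fuzzy_data and the other tables untouched, and it guesses both the date format and the GUID byte order. Only run something like this on a copy of a TM.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

INSERT_SQL = """
INSERT INTO translation_units
    (guid, translation_memory_id, source_hash, source_segment,
     target_hash, target_segment, creation_date, creation_user,
     change_date, change_user, last_used_date, last_used_user,
     usage_counter, flags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
"""


def bulk_insert_units(conn, tm_id, pairs, user="importer"):
    """Insert (source_xml, target_xml) pairs into translation_units in one transaction.

    HYPOTHETICAL: hashes are placeholders (0), fuzzy_data is not updated,
    and the date string format and GUID byte order are guesses.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    rows = ((uuid.uuid4().bytes_le, tm_id, 0, src, 0, tgt,
             now, user, now, user, now, user, 0, 0)
            for src, tgt in pairs)
    # "with conn" commits once when the block ends (or rolls back on error),
    # so all rows share a single transaction instead of one commit each.
    with conn:
        conn.executemany(INSERT_SQL, rows)
```

Even on a copy, Studio would presumably need its own maintenance/reindexing pass before rows like these show up in lookups, which is exactly the step the thread hoped to avoid.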


Mortgat-f, it was suggested to me that I join the OpenExchange developer program. I wasn't particularly interested, but it may be an option for you, depending on what exactly you're trying to do. You can register and get the SDK for free. That way you could generate bona fide Studio TMs without launching Studio (but not without relying on SDL code).


 

