Anyone have info on sdltm internal format (SQLite)? (Trados support)

Thread poster: FarkasAndras

FarkasAndras

Local time: 14:38
English to Hungarian
+ ...

Feb 4, 2013

Hi everyone, I know some of you have groped around inside sdltm files.
Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex.
Reading them does not seem to be difficult at all at first blush (see below). Writing, however... Has anyone attempted to do this outside of SDL? Is it feasible? Has SDL released any info on the format? Could one, say, generate empty or small TMs with Studio and add TUs externally? Then perhaps run a maintenance/reindexing in Studio, if it can't be done any other way? The main goal would be to be able to generate very large sdltms quickly, so having to run a maintenance process in Studio afterwards would probably make the whole exercise largely pointless... but maybe it would be useful for something for someone.
I'm willing to spend a week or two on this problem and release an open source sdltm writer at the end, but I'm just learning SQLite and I'm not interested in devoting the rest of my life to it...

I've found the following tables:
attributes
date_attributes
fuzzy_data
numeric_attributes
parameters
picklist_attributes
picklist_values
resources
sqlite_sequence
sqlite_stat1
string_attributes
tm_resources
translation_memories
translation_unit_contexts
translation_units
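
For anyone who wants to reproduce the table list, here is a minimal sketch using Python's standard sqlite3 module. It assumes only that the .sdltm file opens as an ordinary SQLite database, as described above; the function name list_tables is made up for this example.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables visible through an open SQLite connection."""
    # sqlite_master is SQLite's built-in catalogue of schema objects.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Typical use would be list_tables(sqlite3.connect("my_memory.sdltm")), where the file name is a placeholder. Working on a copy of the TM seems prudent, since nothing in this thread establishes how Studio reacts to a file that another program has opened.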

The translation_units table has the following columns:
cid  name                   type      notnull  pk
0    id                     INTEGER   1        1
1    guid                   BLOB      1        0
2    translation_memory_id  INT       1        0
3    source_hash            INTEGER   1        0
4    source_segment         TEXT      0        0
5    target_hash            INTEGER   1        0
6    target_segment         TEXT      0        0
7    creation_date          DATETIME  1        0
8    creation_user          TEXT      1        0
9    change_date            DATETIME  1        0
10   change_user            TEXT      1        0
11   last_used_date         DATETIME  1        0
12   last_used_user         TEXT      1        0
13   usage_counter          INT       1        0
14   flags                  INT       0        0
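
The column listing above has the same shape as the output of SQLite's PRAGMA table_info (column index, name, declared type, NOT NULL flag, primary-key flag), although the post does not say how it was produced. A minimal sketch that returns the same five fields for any table; describe_table is a hypothetical helper name.

```python
import sqlite3


def describe_table(conn, table):
    """Return (cid, name, type, notnull, pk) for every column of `table`."""
    # PRAGMA arguments cannot be bound as parameters, so quote the name by hand.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); drop the default value.
    return [
        (cid, name, col_type, notnull, pk)
        for cid, name, col_type, notnull, _default, pk in rows
    ]
```

Calling describe_table(conn, "translation_units") on an open connection should give one tuple per column.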

As expected, the source_segment and target_segment columns seem to contain the actual TUs (in XML, with segment pairs being on the same row).
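
Reading the segment pairs could then look something like the sketch below. It relies only on the table and column names listed above; the XML inside each segment is returned unparsed, because its schema is not documented anywhere in this thread. The name iter_segment_pairs is invented for the example.

```python
import sqlite3


def iter_segment_pairs(conn, limit=None):
    """Yield (id, source_segment, target_segment) rows from translation_units."""
    sql = (
        "SELECT id, source_segment, target_segment "
        "FROM translation_units ORDER BY id"
    )
    params = ()
    if limit is not None:
        sql += " LIMIT ?"
        params = (limit,)
    # Iterating over the cursor fetches rows lazily, so a large TM
    # does not have to fit in memory all at once.
    for row in conn.execute(sql, params):
        yield row
```

Passing limit=10 is a cheap way to peek at the first few rows of a big TM before deciding how to parse the XML.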

[Edited at 2013-02-04 10:34 GMT]

mortgat-f (X)
France
Local time: 14:38
German to French
+ ...

Interested

May 3, 2015

Hello FarkasAndras,

I have not been able to find any documentation on the format, but I would be interested if you succeeded in solving this. I have already succeeded in developing a simple (read: quick & dirty) tool to extract the source and target segments, but I expect a real reader/writer to be much more complicated to code, since there are all those cryptic guid fields. The context and fuzzy_data tables also seem to be very closely bound to the internal functioning of Trados Studio.

As for very large TMs, I doubt they would be useful unless you have a computer with matching hardware: I once tried to work with a ~1 GB TM, but OmegaT tried loading it into RAM and crashed my computer. Perhaps Studio knows better and uses the TM on disk, though – I haven’t tried yet.

[Edited at 2015-05-03 21:54 GMT]

Dan Lucas

United Kingdom
Local time: 13:38
Member (2014)
Japanese to English

Is DIY the only way?

May 3, 2015

FarkasAndras wrote:
Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex.

Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain.

Dan

Meta Arkadia
Local time: 19:38
English to Indonesian
+ ...

Browsers

May 3, 2015

FarkasAndras wrote:
Has anyone attempted to do this outside of SDL? Is it feasible?

No, but there are several free, cross-platform browsers you could try. I use DB Browser for SQLite.

I also tried SQLite Studio and SQuirreL, a universal one (it handles just about any DBMS apart from Access). DtSQL is a paid alternative I haven't tried. There must be dozens of others; I only try OS X-compatible ones, of course.

The main goal would be to be able to generate very large sdltms quickly

I'm not interested in that, because CafeTran can take care of that. The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package.

Cheers,

Hans

Meta Arkadia
Local time: 19:38
English to Indonesian
+ ...

Why?

May 4, 2015

FarkasAndras wrote:
The main goal would be to be able to generate very large sdltms quickly

Trados users can apparently import TMX files and convert them into SDLTM, so there's no reason for you to offer your TMs and glossaries in the SDLTM format. Apart from that, a lot of the information in the SQLite database seems to relate to the project, the translators, the reviewers, and what have you.
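
To illustrate the TMX route, here is a rough sketch that writes segment pairs to a minimal TMX 1.4 file with Python's standard library. The header values ("example-script", "unknown", the default language codes) and the function names are placeholders chosen for this example, and whether Studio imports such a file without complaint has not been checked here.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="hu-HU"):
    """Build a minimal TMX 1.4 tree from (source, target) text pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # placeholder value
        "creationtoolversion": "0.1",       # placeholder value
        "segtype": "sentence",
        "o-tmf": "unknown",                 # placeholder value
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def write_tmx(pairs, path, src_lang="en-US", tgt_lang="hu-HU"):
    """Write the pairs to `path` as UTF-8 XML with an XML declaration."""
    tree = build_tmx(pairs, src_lang, tgt_lang)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Something like write_tmx([("Hello", "Szia")], "out.tmx") would produce a one-unit file. Segments with inline formatting would need TMX inline elements, which this sketch does not attempt.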

Cheers,

Hans

Dominique Pivard

Local time: 15:38
Finnish to French

.sdltm and return packages

May 4, 2015

Meta Arkadia wrote:
The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package.

I was under the impression Studio return packages (.sdlrpx) do not include TMs (the idea being the recipient of the package can update their TM with the translated .sdlxliff), so .sdltm shouldn't be relevant to the return package.

FarkasAndras

Local time: 14:38
English to Hungarian
+ ...

TOPIC STARTER

Maybe SDK?

May 4, 2015

Dan Lucas wrote:

Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain.

Dan

As far as I can tell, ODBC is designed for database and OS interoperability, which is not a consideration here (SQLite+Windows is the only feasible DB+OS setup in the first place). I'm not sure if ODBC would be of any help.

In any case, this was just a vague idea of mine that went nowhere due to the apparent complexity of the format. To me, it isn't worth the effort it appears to require.

Creating large DBs outside of Studio would be useful as Studio does the job atrociously slowly (as of Studio 2011, I have no info on 2014). Import times in the 10-24 hour range are not unusual, and make it a practical impossibility to create very large TMs that the program might actually be able to use, if it could only generate them. (It doesn't try to load it all into RAM I believe. That's what relational databases are for.)

Mortgat-f, it was suggested to me that I join the OpenExchange developer program. I wasn't particularly interested in that, but it may be an option for you depending on what exactly you're trying to do. You can register and get the SDK for free. That way you could generate bona fide Studio TMs without launching Studio (but not without relying on SDL code).
