Anyone have info on sdltm internal format (SQLite)? Thread poster: FarkasAndras
Hi everyone, I know some of you have groped around inside sdltm files. Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex. Reading them does not seem to be difficult at all at first blush (see below). Writing, however... Has anyone attempted to do this outside of SDL? Is it feasible? Has SDL released any info on the format? Could one, say, generate empty or small TMs with Studio and add TUs externally? Then perhaps run a maintenance/reindexing in Studio, if it can't be done any other way? The main goal would be to be able to generate very large sdltms quickly, so having to run a maintenance process in Studio afterwards would probably make the whole exercise largely pointless... but maybe it would be useful for something for someone. I'm willing to spend a week or two on this problem and release an open source sdltm writer at the end, but I'm just learning SQLite and I'm not interested in devoting the rest of my life to it...
I've found the following tables: attributes, date_attributes, fuzzy_data, numeric_attributes, parameters, picklist_attributes, picklist_values, resources, sqlite_sequence, sqlite_stat1, string_attributes, tm_resources, translation_memories, translation_unit_contexts, translation_units.

The translation_units table has the following columns (cid, name, type, not null, primary key):

0 id INTEGER 1 1
1 guid BLOB 1 0
2 translation_memory_id INT 1 0
3 source_hash INTEGER 1 0
4 source_segment TEXT 0 0
5 target_hash INTEGER 1 0
6 target_segment TEXT 0 0
7 creation_date DATETIME 1 0
8 creation_user TEXT 1 0
9 change_date DATETIME 1 0
10 change_user TEXT 1 0
11 last_used_date DATETIME 1 0
12 last_used_user TEXT 1 0
13 usage_counter INT 1 0
14 flags INT 0 0

As expected, the source_segment and target_segment columns seem to contain the actual TUs (in XML, with segment pairs being on the same row).
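For anyone who would rather script this than click through a GUI browser, the table and column listing above can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the helper name describe_tm is hypothetical, and it simply queries sqlite_master and runs PRAGMA table_info on each table.

```python
import sqlite3

def describe_tm(con):
    """Map each table name to its columns as (cid, name, type, notnull, pk) tuples."""
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
        # dflt_value is dropped here to match the listing above.
        cols = con.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(c[0], c[1], c[2], c[3], c[5]) for c in cols]
    return schema
```

To inspect a real TM without risking changes to it, open the file read-only, e.g. `describe_tm(sqlite3.connect("file:my.sdltm?mode=ro", uri=True))` (the file name is a placeholder).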
[Edited at 2013-02-04 10:34 GMT]

mortgat-f (X), France, German to French + ...
Hello FarkasAndras, I have not been able to find any documentation on the format, but I would be interested if you succeeded in solving this. I have already succeeded in developing a simple (read: quick & dirty) tool to extract the source and target segments, but I expect a real reader/writer to be much more complicated to code, since there are all those cryptic guid fields. The context and fuzzy_data tables also seem to be very closely bound to the internal functioning of Trados Studio. As for very large TMs, I doubt they would be useful unless you have a computer with matching hardware: I once tried to work with a ~1 GB TM, but OmegaT tried loading it into RAM and crashed my computer. Perhaps Studio knows better and uses the TM on disk, though; I haven't tried yet.
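A quick & dirty extractor along those lines might look like the sketch below. It reads source_segment and target_segment from translation_units and flattens each XML segment to plain text. The element layout it looks for (text runs stored as `<Text><Value>...</Value></Text>`, with inline tags in sibling elements that are skipped) is an assumption about the stored XML, not documented behaviour; if the XML turns out to declare a default namespace, the element names would need that namespace added. The function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def segment_text(xml_string):
    """Return the plain text of one stored segment.

    ASSUMPTION: text runs sit in <Text><Value>...</Value></Text> elements;
    inline tags and other markup are simply skipped.
    """
    if not xml_string:
        return ""
    root = ET.fromstring(xml_string)
    parts = []
    for text_el in root.iter("Text"):
        value = text_el.find("Value")
        if value is not None and value.text:
            parts.append(value.text)
    return "".join(parts)

def extract_pairs(con):
    """Yield (source, target) plain-text pairs from translation_units, in id order."""
    query = "SELECT source_segment, target_segment FROM translation_units ORDER BY id"
    for source_xml, target_xml in con.execute(query):
        yield segment_text(source_xml), segment_text(target_xml)
```

Dropping the tags keeps the sketch short, but it means formatting and placeholders are lost, which is fine for concordance-style use and not for round-tripping.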
[Edited at 2015-05-03 21:54 GMT]

Dan Lucas, United Kingdom, Member (2014), Japanese to English
Is DIY the only way? (May 3, 2015)
FarkasAndras wrote: "Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex."

Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain.

Dan
FarkasAndras wrote: "Has anyone attempted to do this outside of SDL? Is it feasible?"

No, but there are several free, cross-platform browsers you could try. I use DB Browser for SQLite. I also tried SQLiteStudio, and a universal one (just about any DBMS apart from Access), SQuirreL. DtSQL is a paid alternative I haven't tried. And there must be dozens of others; I only try OS X-compatible ones, of course.

FarkasAndras wrote: "The main goal would be to be able to generate very large sdltms quickly"

I'm not interested in that, because CafeTran can take care of that. The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package.

Cheers, Hans
FarkasAndras wrote: "The main goal would be to be able to generate very large sdltms quickly"

Trados users can import TMX files and, it seems, convert them into SDLTM, so there's no reason for you to offer your TMs and glossaries in the SDLTM format. Apart from that, a lot of the information in the SQLite database seems to relate to the project, the translators, reviewers, and what have you.

Cheers, Hans

.sdltm and return packages (May 4, 2015)
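Since Studio imports TMX, one way to sidestep the undocumented sdltm internals is to write TMX and let Studio do the conversion (this avoids the format problem, not Studio's slow import). Below is a minimal TMX 1.4 writer sketch using Python's standard ElementTree. The function name pairs_to_tmx and the creationtool/o-tmf values are hypothetical placeholders; the header attributes themselves are the ones the TMX 1.4 standard requires.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX 1.4 document string from (source, target) text pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sdltm-sketch",      # hypothetical placeholder
        "creationtoolversion": "0.1",        # hypothetical placeholder
        "segtype": "sentence",
        "o-tmf": "sdltm",                    # label for the original TM format
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # &, <, > are escaped automatically
    return ET.tostring(tmx, encoding="unicode")
```

For example, `pairs_to_tmx([("Hello", "Szia")], "en-US", "hu-HU")` returns a one-TU document. To write a file with an XML declaration instead, build the same tree and call `ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)`.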
Meta Arkadia wrote: "The venenum seems to be in the GUID, as Dan mentioned in another subforum, at least for the return package."

I was under the impression Studio return packages (.sdlrpx) do not include TMs (the idea being that the recipient of the package can update their TM with the translated .sdlxliff), so .sdltm shouldn't be relevant to the return package.

FarkasAndras, English to Hungarian + ... (topic starter)
Dan Lucas wrote: "FarkasAndras wrote: Do you have any detailed info on what their structure is and how fault tolerant it is? I'm considering writing some software to read/write them. I've started poking around, and the structure seems to be pretty complex. / Presumably you have good reasons for not using an ODBC driver or some other existing library? Sounds like it would not only be hard to create but also horrible to maintain. Dan"

As far as I can tell, ODBC is designed for database and OS interoperability, which is not a consideration here (SQLite + Windows is the only feasible DB + OS setup in the first place). I'm not sure ODBC would be of any help. In any case, this was just a vague idea of mine that went nowhere due to the apparent complexity of the format. To me, it isn't worth the effort it appears to require.

Creating large DBs outside of Studio would be useful, as Studio does the job atrociously slowly (as of Studio 2011; I have no info on 2014). Import times in the 10-24 hour range are not unusual, and they make it practically impossible to create very large TMs that the program might actually be able to use, if it could only generate them. (I believe it doesn't try to load it all into RAM. That's what relational databases are for.)

Mortgat-f, it was suggested that I join the OpenExchange developer program. I wasn't particularly interested in that, but it may be an option for you, depending on what exactly you're trying to do. You can register and get the SDK for free. That way you could generate bona fide Studio TMs without launching Studio (but not without relying on SDL code).