OmegaT and Asian languages (OmegaT support)

Thread starter: Neirda

Neirda

China
Local time: 12:37
Chinese > French

Mar 20, 2013

Hello everyone,

I just got started with OmegaT and Chinese (Simplified), and quickly discovered that Fuzzy Matches and Glossary don't seem to work as they should.

Nothing ever shows up in either window, even after adding entries to the glossary.
I created a test file with sentences of varying similarity and length to test the software's behavior, and it seems that only 100% matches are correctly detected and replaced.

I tried running the software with AppLocale (the Windows launcher for non-Unicode programs), with no change. I checked the project files: the TMX files and the glossary file are correctly created and populated. So I don't know where the problem could be. Maybe a character display issue?

Didier Briel

France
Local time: 05:37
English > French

Use a tokenizer

Mar 20, 2013

Pierret Adrien wrote:
I just got started with OmegaT and Chinese (Simplified), and quickly discovered that Fuzzy Matches and Glossary don't seem to work as they should.

Nothing ever shows up in either window, even after adding entries to the glossary.
I created a test file with sentences of varying similarity and length to test the software's behavior, and it seems that only 100% matches are correctly detected and replaced.

I tried running the software with AppLocale (the Windows launcher for non-Unicode programs), with no change. I checked the project files: the TMX files and the glossary file are correctly created and populated. So I don't know where the problem could be. Maybe a character display issue?

For glossaries, it could be an encoding issue. You could test with an English or French source document and glossary to check that everything works as it should.
(As long as you use UTF-8 glossaries, not system-encoded ones, everything should be fine.)
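
As a rough way to check the encoding point above, the sketch below tests whether the bytes of a glossary file decode as UTF-8. The sample entry, the tab-separated layout, and the use of GBK as the "system encoding" are assumptions made for illustration only, and the helper name is hypothetical; none of this is OmegaT code.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical glossary line: source term, a tab, then the target term.
entry = "电脑\tordinateur\n"

# The same entry saved two ways: as UTF-8, and in GBK (assumed here to be
# the system encoding of a Simplified Chinese Windows installation).
utf8_bytes = entry.encode("utf-8")
gbk_bytes = entry.encode("gbk")

print("UTF-8 file is valid UTF-8:", is_valid_utf8(utf8_bytes))
print("GBK file is valid UTF-8:", is_valid_utf8(gbk_bytes))
```

To check a real glossary, the same function could be given the result of `open(path, "rb").read()`.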

For fuzzy matches, an encoding issue is very unlikely.

By default, OmegaT uses the Java tokenizer, which can only detect words when they are separated by spaces. Of course, that doesn't work for CJK languages.

That's why we also provide tokenizers:
http://www.omegat.org/en/howtos/tokenizer.php

For Chinese, LuceneSmartChineseTokenizer seems to be the best one.
Don't forget to also use a target tokenizer, so that you don't have issues with spellchecking in European target languages.
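
To illustrate why space-based tokenization leaves only 100% matches for Chinese, here is a minimal sketch. It is not OmegaT's matching algorithm: the token-overlap score is a simplified stand-in for fuzzy matching, per-character splitting is a crude stand-in for a real Chinese tokenizer, and the sample sentences and function names are made up for the example.

```python
def whitespace_tokens(text: str) -> list:
    """Split on whitespace only (stand-in for a space-based tokenizer)."""
    return text.split()


def char_tokens(text: str) -> list:
    """Treat every non-space character as a token (crude CJK stand-in)."""
    return [ch for ch in text if not ch.isspace()]


def overlap_score(a: list, b: list) -> float:
    """Jaccard index of the two token sets, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


english_1 = "I like green tea"
english_2 = "I like black tea"
chinese_1 = "我喜欢绿茶"  # "I like green tea"
chinese_2 = "我喜欢红茶"  # "I like black tea"

# English: the space-separated words overlap, so a partial match is found.
english_score = overlap_score(whitespace_tokens(english_1), whitespace_tokens(english_2))

# Chinese with whitespace splitting: each sentence is one big token, so two
# sentences either match completely or not at all.
chinese_ws_score = overlap_score(whitespace_tokens(chinese_1), whitespace_tokens(chinese_2))

# Chinese with finer-grained tokens: the shared characters produce a partial match.
chinese_char_score = overlap_score(char_tokens(chinese_1), char_tokens(chinese_2))

print("English, whitespace tokens:", english_score)
print("Chinese, whitespace tokens:", chinese_ws_score)
print("Chinese, character tokens:", round(chinese_char_score, 2))
```

With whitespace splitting, the two Chinese sentences score zero even though they differ by a single character, which is consistent with the behavior reported in the first post.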

Didier

Neirda

China
Local time: 12:37
Chinese > French

THREAD STARTER

You were right

Mar 20, 2013

I must have missed something the first time. I set up the tokenizer launcher again according to the instructions, and now both fuzzy matches and the glossary work.

Thank you, and sorry for the trouble.
