Dirk Bayer wrote:
Simply using an ".utf8" extension on the UTF-8 encoded glossary seems to have worked. Even typing non-English characters in OmegaT now using my preferred method (US International Keyboard) seems to create the same results as insertion from the glossaries, no matter if I use the CP 1252 version of the glossary (with ".tab" file name extension) or the UTF-8 verson (with ".utf8" file name extension), and exporting an OmegaT-produced odt file to a utf8 cleartext file from OpenOffice now produces the same output as OmegaT does from a ".utf8" source file to a utf8 target file. -- It seems as if I could even use good old CP 1252 encoded glossaries (with ".tab" extension!) and leave the unicode worries entirely to OmegaT...
OmegaT handles everything in UTF-8 internally. So, if the input encoding is correctly identified, the output will be correct, provided the target encoding can represent the required characters. E.g., you cannot produce CP 1252 files containing Japanese.
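A minimal sketch of that constraint, using the standard Java charset API (OmegaT itself is written in Java; the class name here is only illustrative):

    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;

    public class EncodingCheck {
        public static void main(String[] args) {
            CharsetEncoder cp1252 = Charset.forName("windows-1252").newEncoder();
            CharsetEncoder utf8 = Charset.forName("UTF-8").newEncoder();

            String german = "Größe";  // representable in CP 1252
            String japanese = "翻訳"; // not representable in CP 1252

            System.out.println(cp1252.canEncode(german));   // true
            System.out.println(cp1252.canEncode(japanese)); // false
            System.out.println(utf8.canEncode(japanese));   // true
        }
    }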
I wonder whether there is a good way to verify UTF-8 vs. CP 1252 encoding in the OpenOffice files, since they seem to react identically to font changes no matter which glossary was used to create them, as long as the glossary file name extension matched the glossary's encoding.
There's nothing to verify: all OpenOffice.org files are in UTF-8.
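That said, if you want to check whether an arbitrary text file, such as a glossary, is valid UTF-8, a strict decoder can tell you. A minimal sketch (the file path comes from the command line; the class name is only illustrative):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Utf8Check {
        public static void main(String[] args) throws IOException {
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            // Fail on any byte sequence that is not well-formed UTF-8.
            CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                strict.decode(ByteBuffer.wrap(bytes));
                System.out.println("Valid UTF-8");
            } catch (CharacterCodingException e) {
                System.out.println("Not valid UTF-8 (possibly CP 1252)");
            }
        }
    }

Note that a file containing only ASCII characters passes this check whether it was saved as CP 1252 or as UTF-8, since the two encodings coincide on that range.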
I wonder whether normalizing glossaries to use only straight quotes will now produce matches regardless of whether the source files contain straight or curly quotes, or whether such normalization will even be necessary. I previously had mixed results.
It depends on plenty of things.
In short, OmegaT has no specific function that treats a straight quote as equivalent to a curly one.
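As a workaround, quotes could be normalized in the glossary (or source) files before they reach OmegaT. A minimal, hypothetical standalone sketch, not an OmegaT feature:

    public class QuoteNormalizer {
        // Map typographic (curly) quotes to their straight equivalents.
        static String normalizeQuotes(String s) {
            return s
                .replace('\u2018', '\'')  // left single quotation mark
                .replace('\u2019', '\'')  // right single quotation mark
                .replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"');  // right double quotation mark
        }

        public static void main(String[] args) {
            System.out.println(normalizeQuotes("\u201Ccurly\u201D and \u2018single\u2019"));
            // prints: "curly" and 'single'
        }
    }

Running this over both the glossary and a test source file should make the match behavior predictable, whichever quote style either one uses.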
This is an amazing forum.
For advanced discussion on OmegaT, the Yahoo support group would still be more suitable.
Didier