New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Workaround Nov 15, 2010

Piotr Bienkowski wrote:

I want to make an alignment out of 2007/47/EC and I enter 32007L0047 as the number, but I can't seem to succeed.


2007/47/EC is one of the many documents that are not on eur-lex.europa.eu under the CELEX-based URL where the aligner expects them to be. The readme talks about this issue, perhaps I'll add a reference "at runtime" as well (i.e. if you don't get a successful download the aligner will tell you that this might be the reason). Anyway, even if you don't read the entire readme, I recommend that you at least do a search in it when issues like this come up... it has a lot of info.

I'm not sure why these docs never got their own celex page, but here's a workaround: find them elsewhere in html (this will most often be the official journal that is also on the eur-lex site) and use that url. You can use google or the eur-lex search.
In your case, the main eur-lex site is
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32007L0047:en:NOT

and the English version is at:
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:247:0021:01:EN:HTML

So just start the aligner in "w" mode and copy-paste the appropriate URLs.
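
For anyone wondering how the directive number maps to those URLs, here is a minimal Python sketch. The function names are made up for illustration, and the URL pattern is simply copied from the landing page link above; the aligner itself may build its URLs differently.

```python
def celex_for_directive(year: int, number: int) -> str:
    """Build the CELEX number of a directive, e.g. 2007/47/EC -> 32007L0047."""
    # '3' is the CELEX sector for legislation, 'L' marks a directive,
    # and the serial number is zero-padded to four digits.
    return f"3{year}L{number:04d}"


def celex_landing_url(celex: str, lang: str = "en") -> str:
    """Return the CELEX-based eur-lex landing page URL (pattern taken from the post)."""
    return (
        "http://eur-lex.europa.eu/LexUriServ/LexUriServ.do"
        f"?uri=CELEX:{celex}:{lang}:NOT"
    )


if __name__ == "__main__":
    celex = celex_for_directive(2007, 47)
    print(celex)
    print(celex_landing_url(celex))
```

The OJ URL cannot be derived this way, since it depends on the publication data rather than on the CELEX number.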


I can't automate this because the OJ URL is based on the date of publication and the page number... Although this just gave me a wild idea: when the download fails, the aligner could visit the main eur-lex page of the document, which, at least in this case, seems to be based on the CELEX number. Then it could parse this page to find the correct URLs and attempt the download again. Finding the correct URL might be tricky if the page has more than one document (version), but if the guys behind the eur-lex page were methodical enough, this should work.
I tell you, software development projects never end...
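
A rough Python sketch of that idea, just to make it concrete. The HTML snippet is invented (modelled on the OJ URL quoted above), the real landing page markup may look different, and picking the first link per language is an assumption about how to handle pages with several versions.

```python
import re
from html import unescape

# Matches links whose URL contains "uri=OJ:" and ends in ":<LANG>:HTML".
OJ_LINK = re.compile(r'href="([^"]*uri=OJ:[^"]*?:([A-Z]{2}):HTML)"')


def find_oj_urls(landing_html: str) -> dict:
    """Map language code -> first OJ HTML URL found in the landing page."""
    urls = {}
    for href, lang in OJ_LINK.findall(landing_html):
        # Assumption: if a language appears more than once, keep the first hit.
        urls.setdefault(lang, unescape(href))
    return urls


# Hypothetical landing page fragment, for illustration only.
SAMPLE_HTML = """
<a href="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:247:0021:01:EN:HTML">en</a>
<a href="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:247:0021:01:HU:HTML">hu</a>
<a href="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:247:0021:01:EN:PDF">pdf</a>
"""

if __name__ == "__main__":
    for lang, url in find_oj_urls(SAMPLE_HTML).items():
        print(lang, url)
```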

Of course you're never guaranteed to get the version that's currently in force, but for translations, that's usually not relevant. Some languages might not be available, but again, we take what we can get.

[Edited at 2010-11-15 17:57 GMT]


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 20:12
English to Polish
+ ...
Well there must be a bug Nov 15, 2010

Piotr Bienkowski wrote:

I want to make an alignment out of 2007/47/EC and I enter 32007L0047 as the number, but I can't seem to succeed.


Because I checked that the CELEX number is correct, but still the aligner only retrieves a short message that the document is not available in English which isn't true.

Regards,

Piotr


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
No bug Nov 15, 2010

Piotr Bienkowski wrote:

Piotr Bienkowski wrote:

I want to make an alignment out of 2007/47/EC and I enter 32007L0047 as the number, but I can't seem to succeed.


Because I checked that the CELEX number is correct, but still the aligner only retrieves a short message that the document is not available in English which isn't true.

Regards,

Piotr

As described in my previous post, this is just a limitation caused by an oddity of the eur-lex database. The "Not available in English" message is not from the aligner, that's from eur-lex.


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 20:12
English to Polish
+ ...
There is a bug at eur-lex :-) Nov 17, 2010

FarkasAndras wrote:

Piotr Bienkowski wrote:

Piotr Bienkowski wrote:

I want to make an alignment out of 2007/47/EC and I enter 32007L0047 as the number, but I can't seem to succeed.


Because I checked that the CELEX number is correct, but still the aligner only retrieves a short message that the document is not available in English which isn't true.

Regards,

Piotr

As described in my previous post, this is just a limitation caused by an oddity of the eur-lex database. The "Not available in English" message is not from the aligner, that's from eur-lex.


I just downloaded the relevant files and aligned them. Your tool is a real time saver!

Regards

Piotr


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
2.1 up Nov 17, 2010

Glad it worked - next time something doesn't work the way you expect, check the readme ;)

I'm not going to spam this thread with every release, but 2.1 is up. It fixes this issue so now the aligner should find most (all?) directives and regulations based on natural/celex numbers by extracting the URL from the eur-lex landing page.
Obviously, I can't tell how consistent eur-lex.europa.eu is with page structure and URLs, so feedback is welcome (if you get a URL with OJ: in it in the log, then this fix is what helped get the correct url).
There's also a fix to ensure better Studio compatibility: creationdates, IDs and notes should now be imported by Studio (the bug that caused the incompatibility is in Studio, not the aligner btw).
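
To show which fields are meant here, a minimal Python sketch that writes a TMX 1.4 file with a creation date, an ID and a note on each translation unit. This is not the aligner's actual output; the header values, the function name and the sample segments are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="hu", created="20101117T120000Z"):
    """Build a small TMX document from (source, target, note) tuples."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # placeholder values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for i, (src, tgt, note) in enumerate(pairs, start=1):
        # tuid is the ID, creationdate the timestamp of the unit.
        tu = ET.SubElement(body, "tu", tuid=str(i), creationdate=created)
        if note:
            # Notes go before the <tuv> elements inside a <tu>.
            ET.SubElement(tu, "note").text = note
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Hello.", "Szia.", "sample note")]))
```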


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Doc compatibility Nov 23, 2010

2.2 is up (windows version), and it accepts .doc input files. Tables and other more fancy things may not work flawlessly, so txt input is still recommended, but I still suspect .doc compatibility will be quite a popular feature.

On a different note, anyone have input on what open source aligner has a good editing UI? I had a very cursory look at bitext2tmx - it looks passable and it would be pretty easy to integrate LF aligner with something like this. I.e. start LF aligner, have it autoalign your files and then hand the alignment off to bitext2tmx for review - ideally, this would happen automagically, i.e. the editing ui would just open up with your files already loaded.
I'm looking for something that's platform independent and still in development.


 
Nguyen Dieu
Nguyen Dieu  Identity Verified
Vietnam
Local time: 02:12
Member (2008)
English to Vietnamese
+ ...
win7 64 bit and vietnamese text Nov 24, 2010

I tried to align English and Vietnamese text in Win7 64 bit but no success.

When I put source and target document into the application, nothing happened.

FarkasAndras wrote:

Nguyen Cong Dieu wrote:

Hi,

Could you please let me know if this soft works with win7 and Vietnamese text?

Thanks


I won't know until somebody tries it, will I?
Win7 is supported, and UTF-8 is supported. If UTF-8 contains all the Vietnamese characters, it should work in principle, especially with txt input files. Try it and report back!
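
A quick way to check the encoding side of this before aligning, sketched in Python. The function name is made up; it only tests whether a file's bytes decode as UTF-8 and says nothing about how well the segmenter copes with the text.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "Tiếng Việt"
    # Vietnamese text survives a UTF-8 round trip unchanged.
    print(sample.encode("utf-8").decode("utf-8") == sample)
    # The same text saved as UTF-16 is not valid UTF-8.
    print(is_valid_utf8(sample.encode("utf-16")))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.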


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Nothing Nov 24, 2010

Well, that's definitely not due to the text being Vietnamese.
"Nothing happened" isn't exactly a detailed bug report, so I can't help unless you post the log and tell me what the input files were, what exactly you did, what the aligner printed in the console and what the output files were.
Even better, please follow the instructions in sample/howto to get an idea of how the aligner works, then try again with your own files.
One issue I can foresee is segmentation - I'm not sure the segmenter will do much with Vietnamese text.


 
Johanna Liljenzin
Johanna Liljenzin  Identity Verified
Sweden
Local time: 20:12
Member (2009)
English to Swedish
+ ...
Mac problems Nov 26, 2010

FarkasAndras,

First of all, I want to thank you from the bottom of my heart for devoting so much time to developing a tool that I think will make life so much easier for us all.

I am a Mac user and thought it might be helpful to get some feedback on the Mac version.

I first installed version 2.0 on my Mac. Everything seemed to work well. After figuring out the character-coding issue, I got a perfect XLS-file from the alignment. However, Studio would not accept the TMX, for the same reasons reported by Adam Bojan.

I then downloaded the 2.2 version for Mac, but got the following error message when I tried to run it:

Can't locate HTML/Strip.pm in @INC (@INC contains: CODE(0x10083a1f0) /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at ./scripts/LF_aligner_2.2_with_modules.pl line 36599.
BEGIN failed--compilation aborted at ./scripts/LF_aligner_2.2_with_modules.pl line 36599.
logout
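
(For readers hitting a similar message: "Can't locate X.pm in @INC" is Perl's way of saying that a module could not be found in any of the listed directories. A small Python sketch, with a made-up function name, that pulls the module name out of such a message:)

```python
import re

# Perl reports the missing module as a relative path, e.g. HTML/Strip.pm
MISSING = re.compile(r"Can't locate (\S+)\.pm in @INC")


def missing_perl_module(error_text: str):
    """Return the module name (e.g. 'HTML::Strip') or None if no match."""
    match = MISSING.search(error_text)
    if match is None:
        return None
    # Convert the path form to the usual Perl package notation.
    return match.group(1).replace("/", "::")


if __name__ == "__main__":
    msg = "Can't locate HTML/Strip.pm in @INC (@INC contains: ...)"
    print(missing_perl_module(msg))
```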

I tried downloading a fresh version in case the download had corrupted the files, but it still does not work.

I will try to run the Windows version for now (I have Parallels Desktop installed, so that should not be a problem) but I look forward to a working version for Mac.

Kind regards,
Johanna


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Bug squashed Nov 26, 2010

Thanks for the bug report, that was a small oversight on my part in the linux and mac versions.
I've now fixed both, version 2.201 should work.
I don't have a mac to test it on, so please report back on the fix and the functionality in general, e.g. if doc, docx and pdf input files are handled correctly.

By the way, the TMX import in Studio is pretty shaky, it tends to fail partially or fully on files that are 100% valid TMX but not exactly the way Studio likes TMX to be. The import in Trados 2007 seems to be much more solid, so if you bump into any problem, I recommend importing to T2007 and then reexporting to TMX or upgrading the .tmw TM.
That said, in my testing, Studio has been mostly happy with the TMX files generated by 2.201.

[Edited at 2010-11-26 16:29 GMT]


 
Johanna Liljenzin
Johanna Liljenzin  Identity Verified
Sweden
Local time: 20:12
Member (2009)
English to Swedish
+ ...
New Mac version seems to work fine! Nov 26, 2010

Thank you for the quick turnaround!

I tried the Mac aligner a few minutes ago, and it seems to work perfectly now (at least on .docx documents). Trados Studio accepted the TM without any problems.

I used the Windows version earlier today, and that also worked very well for me (on .txt-files) with Studio.

I prefer the Mac version, though - it ran quicker and smoother. This might be because I run Windows on a virtual machine on my Mac, which sometimes slows things down.

I will give the Mac version a more extensive try next week, when I have a little more time on my hands, and report back to you. That seems to be the least I can do in return for this excellent piece of software.

Kind regards,
Johanna


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Mac quicker Nov 27, 2010

Johanna Liljenzin wrote:

I prefer the Mac version, though - it ran quicker and smoother. This might be because I run Windows on a virtual machine on my Mac, which sometimes slows things down.

That's true, and it's not (all) due to the virtualization.
In a nutshell, it's because Windows doesn't come with a preinstalled perl interpreter, whereas OSX and most linux distros do. To make things work right out of the box, the Windows version is a standalone executable, which basically has the perl interpreter bundled up in it. That's why the startup is so awfully slow in Windows; it has to load the perl interpreter first.

If you're tech-savvy and want better performance in Windows, you can download the Windows version, then download the linux version as well, grab the .pl file from the scripts folder of the linux version and copy it to the aligner folder of the Windows version. Then install ActivePerl and start up the .pl instead of the .exe. You'll get much the same functionality (the only difference being a potentially worse HTML converter) and near-instant startup.


 
Marta Hasyuk
Marta Hasyuk
Local time: 21:12
English to Ukrainian
Dictionary Dec 5, 2010

To FarkasAndras:

I have quite poor results using no dictionary (null.dic)
How can I attach my own dictionary to hunalign?


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 20:12
English to Hungarian
+ ...
TOPIC STARTER
Solutions Dec 5, 2010

First of all, the poor results are probably due to poor input files, not the lack of a dictionary. Make sure the two files contain the same text (no missing paragraphs, same paragraph order), preferably with roughly the same segmentation, and with no extraneous elements like page headers and footers (headers and footers from pdf files are what can screw up the alignment most, but if the two files have wildly different segment numbers, you won't get a good alignment, either).
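
As a quick sanity check before aligning, something like the following Python sketch can flag file pairs with very different segment counts. It assumes one segment per non-empty line, and the 1.5 threshold is an arbitrary illustration, not a value used by the aligner.

```python
def count_segments(text: str) -> int:
    """Count non-empty lines, treating each as one segment."""
    return sum(1 for line in text.splitlines() if line.strip())


def segment_ratio(src_text: str, tgt_text: str) -> float:
    """Ratio of the larger segment count to the smaller one (1.0 = equal)."""
    a, b = count_segments(src_text), count_segments(tgt_text)
    if min(a, b) == 0:
        return float("inf")
    return max(a, b) / min(a, b)


def looks_alignable(src_text: str, tgt_text: str, max_ratio: float = 1.5) -> bool:
    """True if the two texts have roughly comparable segment counts."""
    return segment_ratio(src_text, tgt_text) <= max_ratio


if __name__ == "__main__":
    source = "First sentence.\nSecond sentence.\n\nThird sentence.\n"
    target = "Első mondat.\nMásodik mondat.\nHarmadik mondat.\n"
    print(count_segments(source), count_segments(target))
    print(looks_alignable(source, target))
```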

To add a dictionary, just go to scripts/hunalign/data and read info.txt for, well, info. To check if the dictionary is being used correctly, you could align two files without a dictionary, note down the quality score, add a dictionary, align the same files again and see if the score improved (and possibly compare the output files as well).


 
Marta Hasyuk
Marta Hasyuk
Local time: 21:12
English to Ukrainian
Realign Dec 9, 2010

Is it possible to repeat the alignment on the same texts? How do I set the realign option in the algorithm?

 