New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras
FarkasAndras
FarkasAndras  Identity Verified
Local time: 10:36
English to Hungarian
+ ...
TOPIC STARTER
New version Jan 27, 2015

By popular demand and about 8 months late (maybe more), here is a new version

From the changelog:

New in 4.1:
- Support for new URLs used by eur-lex (EU legislation downloads work again)
- French translation added (can be enabled in setup)
- TMX maker GUI fixes
- Fixed bug in the alignment editor's "Load additional column with autoalignment" feature
- Added copy/paste to alignment editor (Ctrl-n/Ctrl-m, can only copy whole cell and only within program)


I haven't done much testing, hence the limited release on dropbox. Please report bugs here, or by email. Minor feature requests are welcome too. I have no plans for any major changes or additions (apart from possibly reviving a dormant effort to get a mac gui version out).

The French translation isn't available yet, so don't try enabling it.


 
KylaR
KylaR
Local time: 10:36
Trying to create multilingual TMs and so lost... Mar 2, 2015

Hello András,

I'm not sure if this is an LF Aligner question or a TMLookup question.
I am brand new to TMLookup (and love it!), and not new at all to LF Aligner, but it's the first time I'm aligning 3 languages instead of 2 and creating translation memories (I usually relied on tabbed TXTs).

I aligned some texts using the following syntax:
LF_aligner_4.04.exe -f=t -l=en,fr,pt -s=n -r=xn -t=y -o=C:\outfile.txt -i="C:\TextENG.txt","C:\TextFRE.txt","C:\TextPOR.txt"
When the aligner asked me questions about the order of languages in my TM tool (I think?), I said EN FR PT... kind of at random.

Additionally, I aligned some texts that were only PT>FR, and others that were only PT>EN (and I have a number of pre-existing tabbed TXTs that are only EN>FR).

I translate from English and Portuguese into French. I currently see two query fields in TMLookup. Is it possible to use more?

Say I spend all of March working on a Portuguese text, my first query field will always be PT; but I want to see the concordances in both FR and EN. If I set the second query field as FR, the texts I aligned as PT>FR will work, but not those I aligned as PT>EN. And the ones I aligned as EN FR PT will display, but as PT FR PT Source, and I'd like to see the English as well if that's possible.

And then if I spend all of April working on an English text, my first query field will always be EN, but I run into the same kind of problem with the second query field and the order of the columns...

I'm sorry if I'm being very confusing, I'm pretty confused myself! But anyway, what should I do: declare a different order of languages when LF Aligner creates the TM, align the texts several times with different language orders, use several different databases in TMLookup (basing them on the source or the target language(s)?), import files differently?

Thanks!


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 10:36
English to Hungarian
+ ...
TOPIC STARTER
Multilingual DB Mar 2, 2015

This is only related to LF Aligner to the extent that LF Aligner is probably the only aligner that allows you to generate 3-language alignments. The meat of the matter is how to handle multilingual files in TMLookup.
You can do what you want: import all your bi- and trilingual alignments into the same trilingual TMLookup DB and search them in all directions.
Here's how you do it:
- Create a TMLookup DB with the three languages (and optional source column).
- Take an en-fr-pt trilingual tabbed file, open it and check the order of languages (en, pt, fr or pt, en, fr etc.). Import the file. In the import dialog, pick 3 as the 'Number of columns to read from file', and specify the languages in the order they are in the txt file.
- Repeat this with all the en-fr-pt tabbed files. Obviously, if the languages are in a different order in the files, you have to specify them in a different order in TMLookup. TMLookup tries to guess the order from the file name but always check manually.
- Then add all your bilingual files. Leave the column number on 2, make sure you have the correct two languages picked in the correct order. Do not select multiple files at the same time and use the Process all files with the same settings checkbox unless you are 100% sure that the languages are in the same order in all files. (If you select multiple files and leave the checkbox unchecked, you can still specify individual settings of course.)

When you have a correctly arranged DB, you can switch the query boxes around with the dropdown boxes. When you get the "PT FR PT Source" display thing, you can just click View/Display additional columns again, pick the missing columns and English will be added.

BTW if you happen to have, say, a fr-pt database and you want to import an en-pt-fr tabbed file into it, you can do that as well. You read three columns from the file and discard the first (en) column by choosing "skip" from the dropdown list.
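
The column mapping described above can be pictured with a small sketch. This is not TMLookup's code: the function name `map_row` and the way the "skip" choice is represented are assumptions for illustration only. Each column read from the file is either assigned to a language of the database or discarded, and a language that is missing from a bilingual file simply stays empty.

```python
def map_row(line, file_columns, db_languages):
    """Map one tab-delimited line onto the database's language columns.

    file_columns: language of each column in the file, in file order;
                  "skip" discards that column (hypothetical marker).
    db_languages: languages of the database, in database order.
    """
    cells = line.rstrip("\r\n").split("\t")
    # Start with every DB language empty, so bilingual files fit a trilingual DB
    record = {lang: "" for lang in db_languages}
    for lang, cell in zip(file_columns, cells):
        if lang == "skip":
            continue
        if lang not in record:
            raise ValueError(f"{lang!r} is not a language of this DB")
        record[lang] = cell
    return record
```

For example, `map_row(line, ["en", "pt", "fr"], ["en", "fr", "pt"])` handles a trilingual file whose columns are in en, pt, fr order, and `map_row(line, ["skip", "pt", "fr"], ["fr", "pt"])` mirrors the case of importing an en-pt-fr file into a fr-pt database.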


 
KylaR
KylaR
Local time: 10:36
Thanks Mar 5, 2015

It looks like clicking 'View/Display additional columns' when needed was enough to solve all my problems! Thanks a lot !

 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:36
Member (2009)
Dutch to English
+ ...
@FarkasAndras: Sep 13, 2015

Any news re the GUI for batch mode you mentioned a while back? I am getting errors aligning batches of 100 txt files each with AlignFactory, and wanted to try LF Aligner.

MB


 
2nl (X)
2nl (X)  Identity Verified
Netherlands
Local time: 10:36
Work-around? Sep 14, 2015

Michael Beijer wrote:

Any news re the GUI for batch mode you mentioned a while back? I am getting errors aligning batches of 100 txt files each with AlignFactory, and wanted to try LF Aligner.

MB


How about this work-around?


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 10:36
English to Hungarian
+ ...
TOPIC STARTER
batch Sep 14, 2015

Merging files before aligning is an option, but it's not a very good one. If one file pair is badly mismatched (one file several dozen or several hundred segments longer than the other) then it can throw off the alignment throughout the rest of the project. It's best to keep files isolated to isolate problems.

Re: GUI batch mode, LF Aligner is on the back burner... I did write a simple GUI program that generates a .bat file for batch alignment, but it's primitive and ugly. I mostly wrote it for my own use. Maybe I will polish it up and publish it at some point, but the earliest time that could possibly happen is next week.
You can of course generate the .bat yourself, which is what I did up to a month or two ago. Copying file names to the clipboard from Total Commander, pasting them in Excel and then using either Excel or search and replace in a text editor to add the rest of the command makes it relatively painless... relatively being the keyword.

[Edited at 2015-09-14 08:35 GMT]
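
For anyone who would rather script the .bat than build it in Excel, here is a minimal sketch of the same idea. It is not the GUI tool mentioned above. The command template is copied from the command line posted earlier in this thread, so the flag values are that poster's choices and have not been checked against the readme; the `<name>_<lang>.txt` naming scheme, the output file names and the function names are all assumptions.

```python
from pathlib import Path

# Template copied from the command posted earlier in this thread;
# adjust the executable name and flags to your own setup.
TEMPLATE = ('LF_aligner_4.04.exe -f=t -l={l1},{l2} -s=n -r=xn -t=y '
            '-o="{out}" -i="{src}","{tgt}"')


def batch_lines(folder, l1="en", l2="fr", template=TEMPLATE):
    """Return one command per <name>_<l1>.txt / <name>_<l2>.txt pair.

    The naming scheme is a hypothetical convention, not an LF Aligner rule.
    """
    folder = Path(folder)
    lines = []
    for src in sorted(folder.glob(f"*_{l1}.txt")):
        stem = src.name[: -len(f"_{l1}.txt")]
        tgt = folder / f"{stem}_{l2}.txt"
        if not tgt.exists():
            continue  # no counterpart: leave this file out of the batch
        out = folder / f"{stem}_aligned.txt"
        lines.append(template.format(l1=l1, l2=l2, out=out, src=src, tgt=tgt))
    return lines


def write_bat(folder, bat_name="align_all.bat", **kwargs):
    """Write the commands to a .bat file, one alignment per line."""
    lines = batch_lines(folder, **kwargs)
    # newline="\r\n" gives Windows line endings on every platform
    with open(Path(folder) / bat_name, "w", encoding="utf-8", newline="\r\n") as bat:
        bat.write("\n".join(lines) + "\n")
    return len(lines)
```

Each file pair gets its own command, so one badly mismatched pair cannot throw off the rest of the batch.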


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:36
Member (2009)
Dutch to English
+ ...
There really should be an easier way to do this Sep 14, 2015

FarkasAndras wrote:

Merging files before aligning is an option, but it's not a very good one. If one file pair is badly mismatched (one file several dozen or several hundred segments longer than the other) then it can throw off the alignment throughout the rest of the project. It's best to keep files isolated to isolate problems.

Re: GUI batch mode, LF Aligner is on the back burner... I did write a simple GUI program that generates a .bat file for batch alignment, but it's primitive and ugly. I mostly wrote it for my own use. Maybe I will polish it up and publish it at some point, but the earliest time that could possibly happen is next week.
You can of course generate the .bat yourself, which is what I did up to a month or two ago. Copying file names to the clipboard from Total Commander, pasting them in Excel and then using either Excel or search and replace in a text editor to add the rest of the command makes it relatively painless... relatively being the keyword.

[Edited at 2015-09-14 08:35 GMT]


Thanks FarkasAndras,

But I solved it for now. I realised that in AlignFactory you can set the program to spit out separate TMXs, as well as one big one. So when you run a huge batch job, and the program chokes, the last TMX it spits out will be the one with the problem. The name of this TMX will correspond to the txt file (pair) with the problem. Just skipping this txt file usually allows the project to complete if rerun. I then just convert the single txt file (pair) with the problem into a separate TMX using Heartsome's TMX editor (Tools > Convert to TMX), and then merge it with the AlignFactory TMX.

Indeed: it's never a good idea to merge 100 txt files into a single big one for stuff like this. Way too much chance of something going wrong, not to mention merely merging 100 text files of this type is in itself quite a chore, and likely to choke most programs, even EmEditor.

No time for generating .bat files, etc., right now but I do look forward to your future GUI batch mode thingee, as I would love to test it against AlignFactory.

There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times.

PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech
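
The line-by-line pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a feature of LF Aligner or AlignFactory; the function name `pair_lines` is made up, and the sketch assumes UTF-8 input with exactly one segment per line. It writes a tab-delimited file rather than a TMX.

```python
import itertools


def pair_lines(src_path, tgt_path, out_path):
    """Pair line N of src with line N of tgt and write tab-delimited rows.

    Returns the number of rows written. Raises ValueError when the files
    have different line counts, because a silent mismatch would shift
    every segment that follows.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for s, t in itertools.zip_longest(src, tgt):
            if s is None or t is None:
                raise ValueError(f"line counts differ after row {written}")
            # Tabs inside a segment would break the two-column layout
            s = s.rstrip("\r\n").replace("\t", " ")
            t = t.rstrip("\r\n").replace("\t", " ")
            out.write(f"{s}\t{t}\n")
            written += 1
    return written
```

Calling `pair_lines("de1.txt", "en1.txt", "de1_en1.txt")` would produce one row per line pair; looping over all file pairs keeps each document separate.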


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 10:36
English to Hungarian
+ ...
TOPIC STARTER
merge Sep 14, 2015

Michael Beijer wrote:

There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times.

PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech


I thought you already did this with the public patent TM? In any case, some dumb aligners do this. I also have my own software for this because I store my large multilingual TMs in a similar format (separate txt files for each document in each language, one line per segment).
Maybe one day I will add such a dumb pairing feature to lf aligner, or release a separate program that merges files into tabbed files.


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:36
Member (2009)
Dutch to English
+ ...
A "dumb aligner" would be great! Sep 14, 2015

FarkasAndras wrote:

Michael Beijer wrote:

There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times.

PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech


I thought you already did this with the public patent TM? In any case, some dumb aligners do this. I also have my own software for this because I store my large multilingual TMs in a similar format (separate txt files for each document in each language, one line per segment).
Maybe one day I will add such a dumb pairing feature to lf aligner, or release a separate program that merges files into tabbed files.


I did, but there were far fewer files, and they were much bigger. Now I have hundreds of small txt files, so my approach is different.

This is how I did the PatT data:

Original workflow:

1. Append ".txt" to file names
2. Open files in EmEditor (or a good text editor capable of opening large files; UltraEdit is also good)
3. In Ron's CSV Editor, create empty file and paste in contents of .txt files (of src + trgt language) to create a tab-delimited .csv
4. In Xbench, convert aforementioned .csv to .tmx;
5. In Heartsome TMX editor, edit the TMX custom attributes and clean up the TMX (remove duplicates).

Improved workflow:

1. Append ".txt" to file names
2. Use "split" command in cmd.exe to split large text file into smaller files based on number of lines (1,000,000 lines): split -l 1000000 filename.txt
3. Use "generate_tabbed.exe" (in András Farkas’s "Grab Bag", included in LF Aligner download) to convert src and trgt language .txt files into tab-delimited .txt containing both src + trgt
4. Use Heartsome TMX editor to convert bilingual tab-del .txt files into .tmx
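
As an illustration of the last step, here is a minimal sketch of turning a bilingual tab-delimited file into a TMX with Python's standard library. It is not what Heartsome or TMX Maker do internally. The function names and the header values (tool name, `segtype`, `o-tmf`, `datatype`) are placeholders chosen for the example, and the input is assumed to be UTF-8.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tabbed_to_tmx(rows, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 tree from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tabbed_to_tmx sketch",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tabbed txt",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in rows:
        if not source or not target:
            continue  # skip half-empty pairs
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def convert(txt_path, tmx_path, src_lang, tgt_lang):
    """Read a two-column tab-delimited file and write it out as TMX."""
    with open(txt_path, encoding="utf-8") as f:
        rows = [line.rstrip("\r\n").split("\t")[:2] for line in f]
    rows = [r for r in rows if len(r) == 2]  # drop lines without a tab
    tabbed_to_tmx(rows, src_lang, tgt_lang).write(
        tmx_path, encoding="utf-8", xml_declaration=True)
```

ElementTree escapes `&` and `<` but does not remove control characters, which XML 1.0 does not allow (apart from tab, line feed and carriage return), so those would have to be cleaned out of the text files first.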


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:36
Member (2009)
Dutch to English
+ ...
Found the problem! Sep 14, 2015

Every so often, AlignFactory will choke. As I mentioned, the name of the last TMX it created allows me to see which text file is the problem. Looking in these text files reveals that the problem is always the presence of this character:


aka

\x1a

If I F&R all these with '

The batch alignment completes fine.
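
To find the affected files before starting a batch, something like the following sketch could be used. It only reports where `\x1a` occurs and leaves the actual find and replace to a text editor. The function name `find_sub` is made up, and the byte-level check assumes the files are UTF-8 or a single-byte encoding (in UTF-16 the same byte can be part of another character).

```python
from pathlib import Path

SUB = b"\x1a"  # the control character named in the post above


def find_sub(folder, pattern="*.txt"):
    """Return {file name: [1-based line numbers that contain \\x1a]}."""
    hits = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Work on bytes so the check does not depend on decoding the file
        lines = path.read_bytes().split(b"\n")
        found = [n for n, line in enumerate(lines, 1) if SUB in line]
        if found:
            hits[path.name] = found
    return hits
```

Running `find_sub("C:/corpus")` (the path is just an example) returns an empty dict when no file contains the character.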


 
Artem Vakhitov
Artem Vakhitov  Identity Verified
Kyrgyzstan
English to Russian
+ ...
New Linux versions? Oct 2, 2015

I've noticed that the Linux version is at 3.11 while the Windows one is at 4.1. Any chances to have the Linux version updated?

 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 10:36
English to Hungarian
+ ...
TOPIC STARTER
Maybe Oct 4, 2015

Yes, the linux & mac versions got left behind.
Maybe I will get around to releasing a new linux version, but I can't make any promises. You can try to roll your own, though. Find the .pl in the linux version, there is a short howto at the top of the file (aligner/scripts/LF_aligner_XXX.pl). Download the Windows version, find the .pl and copy the relevant bits over into the linux .pl.
Most of the changes since 3.11 affected the GUI, which obviously makes no difference for linux users. There have been only a handful of other updates that could be useful to linux users (see changelog).


 
esperantisto
esperantisto  Identity Verified
Local time: 12:36
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
TMX Maker: errors processing a text file Jan 6, 2016

I have a plain-text UTF-8 file that I want to convert to a TMX file (using TMX Maker 3.0 from LF Aligner 4.1 on Windows 7). It is structured as follows:
Code:
source{tab}target



When I start TMX Maker, I see:

Code:
Drag and drop the input file (tab delimited txt in UTF-8 encoding, or xls) here
and press enter.



My file seems to fit, so I drag and drop it and go through the following steps: choose the output file name, the number of languages (2, as by default), the language codes (I specify EN-US and BE-BY), the date/time (I confirm the default), the creator name (I confirm the default), the note (I leave none), and hit Enter. Then I get just a bunch of:
Code:
LINE XXX OF THE FILE DOESN'T HAVE ENOUGH COLUMNS, SO IT HAS BEEN SKIPPED.
CHECK THE SOURCE FILE AND RUN THE TMX MAKER AGAIN IF NEEDED



and then:
Code:
0 TUs have been written to the TMX. XXX segments were skipped (0 of them due to
being half-empty).



And, obviously, the resulting TMX file contains only a conventional TMX header, but no TUs.

I tried to search for the above error message on the Internet, but I could only find: reading a CSV files columns directly into variables names with python. However, this does not help me.

Is there anything else that I should check/look into?
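
The cause of this error is not established in the thread, so the following is only a diagnostic sketch built on guesses: it checks whether the file really is UTF-8 (a UTF-16 file would not show its tabs to a tool expecting UTF-8) and lists the lines that contain fewer tabs than expected, for instance because tabs were converted to spaces. The function name `check_tabbed` and the report layout are made up for this example; it does not reproduce what TMX Maker does internally.

```python
import codecs


def check_tabbed(path, columns=2):
    """Report likely reasons for 'not enough columns' in a tabbed file."""
    with open(path, "rb") as f:
        raw = f.read()
    report = {"bom": None, "short_lines": []}
    if raw.startswith(codecs.BOM_UTF8):
        report["bom"] = "utf-8"
        raw = raw[len(codecs.BOM_UTF8):]
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        report["bom"] = "utf-16"  # not UTF-8: re-save before converting
        return report
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    for number, line in enumerate(text.split("\n"), 1):
        # Non-empty lines need at least columns - 1 tab characters
        if line.strip() and line.count("\t") < columns - 1:
            report["short_lines"].append(number)
    return report
```

If `short_lines` lists every line of the file, the separators are probably not real tab characters.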


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:36
Member (2009)
Dutch to English
+ ...
Try Heartsome TMX editor Jan 6, 2016

esperantisto wrote:

I have a plain-text file UTF-8 that I want to convert to a TMX file (using TMX Maker 3.0 from LF Aligner 4.1 on Windows 7) as follows:
Code:
source{tab}target



When I start TMX Maker, I see:

[snip!]

Is there anything else that I should check/look into?


Slightly off-topic, but the Heartsome TMX editor has a great little tool for converting tab-delimited files (and Excel files) to TMXs. Wonder if/when anyone will take over the (now open source) project.

[Edited at 2016-01-06 09:15 GMT]


 