Google admits ‘garbage in, garbage out’ translation problem

Source: The Register
Story flagged by: Maria Kopnitsky

Google’s ever-so-clever Google Translate service may be falling foul of a problem known to grizzled engineers across the globe: garbage in, garbage out.

The problem was discussed by Google’s director of research, Peter Norvig, at the Nasa Innovative Advanced Concepts conference at Stanford, California on Wednesday, in response to a question from an audience member.

Norvig admitted that Google was “aware” of a problem caused by some sites using Google’s services to translate their body copy into another language to create a localized version of their site.

The problem with this cut-rate method (bare cupboards of out-of-work translators aside) is that if Google indexes this site, it may then factor the translation into the models it itself uses to train its own machine-translation engine.

This post-modern problem means that Google’s machines may be training themselves on data generated by Google’s machines, which means that rather than getting incrementally better with each new model, they just stagnate.

“It’s not a big problem yet – it could get worse,” Norvig said. “We mostly address it by judging the quality of a site. If you look good, we’ll keep your examples; if you look sketchy we’ll toss them out.”

See: The Register


Comments about this article


Jeff Whittaker
United States
Spanish to English
Wow! I predicted this on ProZ.com three years ago Feb 8, 2014

I even used the same phrase: Garbage In, Garbage Out.
See this post:
http://www.proz.com/forum/machine_translation_mt/186784-the_future_of_google_translate.html

[Edited at 2014-02-08 00:30 GMT]


 
Orrin Cummins
Japan
Japanese to English
As the old saying goes Feb 8, 2014

You get what you pay for, I guess.

 
Claudia Cherici
Italy
Member (2010)
English to Italian
well spotted Feb 8, 2014

Well done, Jeff. You spotted the exact problem with the Google translation system, and using even the exact wording is rather impressive, I must say.

 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
The comment about watermarking is more interesting than the so-called admission Feb 8, 2014

The original video

The exact words that were spoken, and the question that prompted it, can be heard here:
http://www.livestream.com/niac2014/video?clipId=pla_d5da38fb-0dfb-4dab-8f03-57e6de1ef672
(at minute 51 to minute 53)

There was no admission, however. The man from Google simply "said it" -- he did not "admit to it". I can understand that a news editor might use "admit" in a heading because it is shorter than "acknowledge", but for the news writer to persist in referring to the statement as an "admission" throughout the news report is bad journalism, in my opinion.

The question was not about garbage in general but about a specific type of garbage, namely content that was translated by Google itself and left unedited. The question was about the danger of Google using content that it itself had translated, to improve its machine translation system. The Google man's answer is that they are aware of that danger but don't think that it is a threat at this time. In other words, while we can speculate about a worst case scenario, the engineers at Google Translate are not blind to this issue and do actually keep an eye on it. This does not make me trust Google Translate any less.

On watermarking

The Google man described one experimental method that they used in order to recognise translations produced by Google. They don't use that method any more, but may use it again later. It involves classifying each word in a language as "even" or "odd". When a translation is about to be generated and multiple valid word sequences are available for that text, Google would favour a sequence that produces "all even words" or "all odd words" in a phrase. The human reader won't notice the difference, but Google will be able to spot large chunks of all even-classified or all odd-classified words in the web sites that it scrapes, and know that the text is therefore more likely to be a machine translation. Very clever, IMO.
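
As a rough illustration of the even/odd idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the talk did not say how words are assigned to the "even" or "odd" class, so the hash-based `word_parity` function, the helper names, and the run-length measure are all hypothetical and are not Google's actual method.

```python
import hashlib


def word_parity(word: str) -> int:
    """Hypothetical classifier: 0 means 'even', 1 means 'odd'.

    Assumption: the class comes from a stable hash of the lowercased word.
    The real assignment scheme was not described.
    """
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return digest[0] % 2


def uniformity(words: list[str]) -> float:
    """Share of words that belong to the majority parity class (0.0 if empty)."""
    if not words:
        return 0.0
    odd = sum(word_parity(w) for w in words)
    return max(odd, len(words) - odd) / len(words)


def pick_watermarked(candidates: list[list[str]]) -> list[str]:
    """Among equally valid candidate phrasings, prefer the most uniform one.

    Ties are resolved in favour of the earliest candidate.
    """
    return max(candidates, key=uniformity)


def longest_parity_run(words: list[str]) -> int:
    """Length of the longest run of consecutive words with the same parity."""
    best = run = 0
    previous = None
    for w in words:
        parity = word_parity(w)
        run = run + 1 if parity == previous else 1
        previous = parity
        best = max(best, run)
    return best
```

A scraper could then treat pages with unusually long same-parity runs as more likely to be machine output. In ordinary text each word has roughly an even chance of falling in either class, so long runs should be rare by chance alone.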

The fact that Google Translate inserts non-printing control characters into its translations may also be a form of watermarking. If you do a translation in Google and copy/paste it into MS Word and enable display of non-printing characters, you will sometimes see those characters show up as grey blocks. They are not printed or visible under normal circumstances (e.g. on web sites or PDFs or other files translated with Google Translate) but they are there and can be detected. In fact, you can search for them in MS Word... their code is ChrW(8203).
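
Character code 8203 is U+200B, the Unicode zero-width space, so the same check can be done outside MS Word. The following Python sketch shows one way to locate and strip those characters from pasted text. The function names are made up for this example, and finding the character does not by itself prove that a text came from Google Translate.

```python
ZERO_WIDTH_SPACE = chr(8203)  # U+200B, the character MS Word calls ChrW(8203)


def find_zero_width_spaces(text: str) -> list[int]:
    """Return the index of every zero-width space in the text."""
    return [i for i, ch in enumerate(text) if ch == ZERO_WIDTH_SPACE]


def strip_zero_width_spaces(text: str) -> str:
    """Return the text with all zero-width spaces removed."""
    return text.replace(ZERO_WIDTH_SPACE, "")
```

For example, `find_zero_width_spaces("ab\u200bcd")` returns `[2]`, and `strip_zero_width_spaces("ab\u200bcd")` returns `"abcd"`.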

With regard to what the Google man said about evaluating the quality of the content, I did notice that about a year or two ago Google Translate changed its output so that it is deliberately poor, from a typesetting point of view. Many translated phrases now start with a lowercase letter even if the source text started with an uppercase letter, or vice versa, and the translated text contains spacing errors next to certain types of punctuation that "good quality" authors would never permit or commit.
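
The kinds of typesetting slips mentioned above (a changed initial capital, stray or missing spaces next to punctuation) are easy to test for mechanically. The Python sketch below is only a guess at what such a check might look like: the rules, the flag names and the function are assumptions for illustration, not a description of how Google judges site quality. Note also that some languages, French for instance, legitimately put a space before certain punctuation marks, so real rules would have to be language-specific.

```python
import re

# Assumed rules: whitespace directly before punctuation, and a letter
# directly after a comma, semicolon, colon, exclamation or question mark.
SPACE_BEFORE_PUNCT = re.compile(r"\s[,.;:!?]")
NO_SPACE_AFTER_PUNCT = re.compile(r"[,;:!?][A-Za-z]")


def typesetting_flags(source: str, translation: str) -> list[str]:
    """Return a list of typesetting problems found in the translation."""
    flags = []
    s, t = source.lstrip(), translation.lstrip()
    # Compare the case of the first letter of source and translation.
    if s[:1].isalpha() and t[:1].isalpha() and s[0].isupper() != t[0].isupper():
        flags.append("case_mismatch")
    if SPACE_BEFORE_PUNCT.search(translation):
        flags.append("space_before_punctuation")
    if NO_SPACE_AFTER_PUNCT.search(translation):
        flags.append("missing_space_after_punctuation")
    return flags
```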


[Edited at 2014-02-08 10:30 GMT]


 
LilianNekipelov
United States
Russian to English
All their translations are odd, anyhow, Feb 8, 2014

so why do they even bother. The spacing problem--yes, no surprise. The spacing problem becomes more and more annoying even when you, personally--not a machine, are typing. Also, some letters are often skipped or reversed. It is a real pain when you try to type directly on the internet these days.




[Edited at 2014-02-08 11:58 GMT]


 
DLyons
Ireland
Spanish to English
Some sites need to be filtered. Feb 8, 2014

Of course sites such as Alibaba should be ignored (or better, filtered out from Google hits) by translators. But that's a different problem from Google self-training - watermarking may help Google to recognize and eliminate its own translations from its training material.

 
Maxime Bujakov
France
French to English
Machine translation Feb 10, 2014

Funny! I love that saying too - junk in - junk out. (Even shorter and more rhythmic :))

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 
Jeff Whittaker
United States
Spanish to English
Yes, but... Feb 13, 2014

did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document and no one can be sure that the "translation" that now sounds all good and grammatical, is in fact a translation at all.


Maxime Bujakov wrote:

Funny! I love that saying too - junk in - junk out. (Even shorter and more rhythmic :))

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 
Maxime Bujakov
France
French to English
20% increase in my productivity due to the M Feb 18, 2014

Jeff Whittaker wrote:

did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document and no one can be sure that the "translation" that now sounds all good and grammatical, is in fact a translation at all.


Jeff, of course, as the translator in charge I read it all; that's why I still spent 80% of my regular typing time.

The editor must have checked against the source as well, and also served as a good reference for confirming that my writing style had not deteriorated.

MT is also surprisingly good at suggesting very appropriate words in some of the most difficult cases - like when you sit and think for minutes over one single word.

Finally, when it comes to someone's personal business operations in an unfamiliar language environment, MT has revolutionized life. I can practically read and write in Lithuanian, the oldest European language, having just a basic idea of the language structure.


 
