Converting dates formats using autotranslate rules
Thread poster: iyavor
iyavor
Local time: 20:26
Hebrew to English
Aug 10, 2015

Hi everyone,

I'm trying to do the following automatic conversion:

If the date is given as 10.08.2015, 10-08-2015, 10/08/2015, 10.8.2015, 10.08.15 or 10.8.15, and so on, I want to autotranslate to

August 10, 2015.


This is the rule I wrote:

(\s|:|-)(#days#)(/|\.)(#month#)(/|.)(\d{2})(\s|$|\.|#end#)
(\s|:|-)(#days#)(/|\.)(#month#)(/|\.)(\d{4})(:|\s|\.|#end#)

I included a #days# and #month# translation pair list (e.g. 01 -> 1, 08 -> August, etc.).

Still - it does not seem to work properly and it doesn't identify the date in my source text.

Can someone please check my code and tell me what is wrong with it?

Thank you
Ilan


 
Manuel Arcedillo
Spain
Local time: 19:26
English to Spanish
Try this Aug 13, 2015

Hi,

Try this:

\b(#days#)(/|\.|-)(#month#)(/|\.|-)(\d{4}|\d{2})\b

I think your first and last capturing groups were not necessary to match your patterns, so I replaced them with \b, which simply marks the start or end of a word. I'm not sure whether #end# can even be used outside segmentation rules (unless you created it as a custom list).

Also, the hyphen was missing as a possible separator, and I combined both rules into one to allow years expressed as yy and yyyy in the same line, although you may want to keep them separate if you wish to turn 15 into 2015, for example.

The expression above is as close as possible to your original expression, but the following should also work and uses fewer capturing groups, which may simplify the replacement expression:

\b(#days#)[/.-](#month#)[/.-](\d{4}|\d{2})\b
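
For anyone who wants to try the rule outside memoQ, here is a minimal Python sketch. It is only an approximation: the DAYS and MONTHS dictionaries are hypothetical stand-ins for the #days# and #month# translation pair lists, and Python's re module is not memoQ's regex engine, so details may differ.

```python
import re

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Hypothetical stand-ins for the #days# and #month# lists:
# source form -> target form, e.g. "08" -> "August", "01" -> "1".
DAYS = {}
MONTHS = {}
for d in range(1, 32):
    DAYS[str(d)] = str(d)
    DAYS[f"{d:02d}"] = str(d)
for m, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[str(m)] = name
    MONTHS[f"{m:02d}"] = name


def alternation(keys):
    # Longest keys first, so "10" is tried before "1".
    return "|".join(sorted(map(re.escape, keys), key=len, reverse=True))


# Same shape as the rule above, with the lists expanded into alternations.
DATE = re.compile(
    rf"\b({alternation(DAYS)})[/.-]({alternation(MONTHS)})[/.-](\d{{4}}|\d{{2}})\b"
)


def translate_dates(text):
    # Equivalent of a "$2 $1, $3" replacement with list lookups applied.
    return DATE.sub(
        lambda m: f"{MONTHS[m.group(2)]} {DAYS[m.group(1)]}, {m.group(3)}",
        text,
    )
```

Because the year is copied through unchanged, a two-digit year stays two digits in this sketch.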


 
iyavor
Local time: 20:26
Hebrew to English
TOPIC STARTER
Thanks - questions Aug 13, 2015

Hi Manuel -

Thanks for your reply! Question: what does the "\b" do? I'm unfamiliar with this special character.

I do want to convert any year to a four-digit year. Usually the years are beyond the year 2000, but not always, so I could have two possible autotranslations that I would choose from.

Regarding your suggestion, I tested it with the date 4.2.12. It didn't work - here's what the test showed:

4.2.12(The source text contains regions where multiple auto-translation rules can be applied; the intersecting parts remained unchanged. The possible substitutions are: 4.2.12->February 4, 12,4.2.12->February 4, 2012;)

I've run into this same error message on previous attempts... what do you make of it?

Ilan


 
Manuel Arcedillo
Spain
Local time: 19:26
English to Spanish
The importance of \b Aug 14, 2015

\b marks the start or end of a word. I believe it stands for boundary. So \bPhone would match Phone and Phones, but not iPhone.
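
The Phone example can be checked quickly in Python. This is just a sketch with Python's re module, where \b behaves the same way for this case; the has_word_start helper is made up for the illustration.

```python
import re

# \b requires a word boundary immediately before "Phone".
PHONE = re.compile(r"\bPhone")


def has_word_start(word):
    # True when "Phone" begins at a word boundary somewhere in the text.
    return PHONE.search(word) is not None


for word in ["Phone", "Phones", "iPhone"]:
    print(word, has_word_start(word))
```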

The message you get appears when either one rule matches several parts of the sample text or, as in your case, part of the text is matched by several rules. So 4.2.12 seems to be matched by two expressions and memoQ cannot decide which one to apply. To avoid this, make sure you create unambiguous expressions, such as these:

\b(#days#)[/.-](#month#)[/.-](\d{4})\b
\b(#days#)[/.-](#month#)[/.-](\d{2})\b

Here is where the \b operator is important. Without it, \d{2} also matches the first two digits of a four-digit year, thus creating the ambiguity reported.
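
The effect of the trailing \b can be seen in a small Python sketch. As an assumption made for brevity, \d{1,2} stands in for the #days# and #month# lists, and matching_rules is a hypothetical helper that reports which of the two rules find a match.

```python
import re

# \d{1,2} is a simplified stand-in for the #days# and #month# lists.
FOUR_DIGIT = r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})"
TWO_DIGIT = r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{2})"


def matching_rules(text, bounded):
    # With bounded=True a trailing \b is appended, as in the two rules above.
    tail = r"\b" if bounded else ""
    rules = {"yyyy": FOUR_DIGIT + tail, "yy": TWO_DIGIT + tail}
    return [name for name, rule in rules.items() if re.search(rule, text)]


print(matching_rules("4.2.2012", bounded=False))  # both rules match
print(matching_rules("4.2.2012", bounded=True))   # only the yyyy rule
```

Without the trailing \b, the two-digit rule matches 4.2.20 inside 4.2.2012, so the same text is claimed by two rules.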

If the two-digit years may apply to the 1900s or the 2000s, you would actually need three rules, not just two. One solution could be using two rules instead of one for the two-digit years:

\b(#days#)[/.-](#month#)[/.-]([3-9]\d)\b
\b(#days#)[/.-](#month#)[/.-]([0-2]\d)\b

Then replace the first expression with $2 $1, 19$3 and the second one with $2 $1, 20$3. I assumed that years from 30 to 99 refer to the 1900s and years from 00 to 29 refer to the 2000s, but the cutoff can be changed to suit your case.
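
A Python sketch of the three-rule setup, to show how the century prefix is chosen. The assumptions are that \d{1,2} stands in for the #days# and #month# lists, that the MONTHS dictionary replaces the translation pair lookup, and that the cutoff is 30 as above; expand and make_replacer are hypothetical names.

```python
import re

# Hypothetical month lookup replacing the #month# translation pairs.
MONTHS = dict(enumerate([
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
], start=1))

PREFIX = r"\b(\d{1,2})[/.-](\d{1,2})[/.-]"

# (pattern, century prefix): one rule per unambiguous year shape.
RULES = [
    (re.compile(PREFIX + r"(\d{4})\b"), ""),       # yyyy: keep as is
    (re.compile(PREFIX + r"([3-9]\d)\b"), "19"),   # 30-99 -> 19yy
    (re.compile(PREFIX + r"([0-2]\d)\b"), "20"),   # 00-29 -> 20yy
]


def make_replacer(century):
    def replace(m):
        day, month, year = int(m.group(1)), int(m.group(2)), m.group(3)
        if month not in MONTHS or not 1 <= day <= 31:
            return m.group(0)  # not a plausible date: leave untouched
        return f"{MONTHS[month]} {day}, {century}{year}"
    return replace


def expand(text):
    for pattern, century in RULES:
        text = pattern.sub(make_replacer(century), text)
    return text
```

Each piece of text can match only one of the three patterns, which is what avoids the "multiple auto-translation rules" message in memoQ.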


 
iyavor
Local time: 20:26
Hebrew to English
TOPIC STARTER
Thanks for the explanation - still struggling... Sep 2, 2015

Hi Manuel,

Your explanation is quite thorough. Thank you.

I am now finding that my rule is detecting some dates, but not all of them.
Often, in Hebrew, dates are written with a dash in front of them. Since Hebrew is an RTL language, this is what it would look like:

מילה ב-12.12.14

The dash is actually before the date in the source text, because of the RTL.
I tried to write a regex to detect these as well, and this is what I came up with:

(\b|-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]

Now, it should detect the dates - whether they are "stand-alone", or immediately following a hyphen (-).

I also tried having a separate rule for each one:
(-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]
(\b|-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]
(-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]
(\b|-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]

Doesn't work. It does not detect the date after a hyphen, but it does detect the date if it's a separate word...


 
Manuel Arcedillo
Spain
Local time: 19:26
English to Spanish
Examples? Sep 17, 2015

Hi again,

Could you provide some examples of the dates you are trying to match? The Hebrew example posted above does not match the expressions you posted: it has only two digits for the year instead of four, and the dash is (I think) next to the year, not next to the day digits as in the regular expressions.
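
In the meantime, the following Python sketch may help narrow things down. It makes two assumptions: \d{1,2} stands in for the #days# and #month# lists, and Python's re module behaves like memoQ's engine here, which may not hold in every detail. In Python, \b inside square brackets means a backspace character rather than a word boundary, so a pattern ending in [\b] cannot match ordinary text. A plain \b, on the other hand, already matches right after a hyphen, because the hyphen is not a word character.

```python
import re

# \d{1,2} is a simplified stand-in for the #days# and #month# lists.
DATE = r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4}|\d{2})"

# Logical (stored) order: word, space, prefix letter, hyphen, date.
sample = "מילה ב-12.12.14"

# Plain \b on both sides: the boundary between "-" and "1" is enough.
with_boundary = re.search(r"\b" + DATE + r"\b", sample)

# [\b] is a character class containing backspace, not a word boundary.
with_class = re.search(r"(\b|-)" + DATE + r"[\b]", sample)

print(with_boundary.group(0) if with_boundary else None)
print(with_class)
```

If memoQ treats [\b] the same way, replacing it with a plain \b and allowing two-digit years should be enough to catch dates that follow a hyphen.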


 

