Converting dates formats using autotranslate rules Thread poster: iyavor
iyavor (Hebrew to English)

Hi everyone,

I'm trying to do the following automatic conversion: if the date is given as 10.08.2015, 10-08-2015, 10/08/2015, 10.8.2015, 10.08.15 or 10.8.15, and so on, I want to autotranslate it to August 10, 2015.

These are the rules I wrote:

(\s|:|-)(#days#)(/|\.)(#month#)(/|.)(\d{2})(\s|$|\.|#end#)
(\s|:|-)(#days#)(/|\.)(#month#)(/|\.)(\d{4})(:|\s|\.|#end#)

I included a #days# and a #month# translation pair list (e.g. 01 -> 1, 08 -> August, etc.). Still, it does not seem to work properly, and it doesn't identify the date in my source text. Can someone please check my code and tell me what is wrong with it?

Thank you,
Ilan
Hi,

Try this:

\b(#days#)(/|\.|-)(#month#)(/|\.|-)(\d{4}|\d{2})\b

I think your first and last capturing groups were not necessary to match your patterns, so I replaced them with \b, which just marks the start or end of a word. I'm not sure whether #end# can even be used outside segmentation rules (unless you created it as a custom list). The hyphen was also missing as a possible separator, and I combined both rules into one so that years written as yy and yyyy are handled in the same line, although you may want to keep them separate if you wish to turn 15 into 2015, for example.

The expression above is as close as possible to your original expression, but the following should also work and uses fewer capturing groups, which may simplify the replacement expression:

\b(#days#)[/.-](#month#)[/.-](\d{4}|\d{2})\b

iyavor (Hebrew to English), TOPIC STARTER | Thanks - questions | Aug 13, 2015
Hi Manuel,

Thanks for your reply!

Question: what does "\b" do? I'm unfamiliar with this special character.

I do want to convert any year to a four-digit year. The years are usually after 2000, but not always, so I could have two possible autotranslations to choose from.

Regarding your suggestion, I tested it with the date 4.2.12. It didn't work. Here's what the test showed:

4.2.12(The source text contains regions where multiple auto-translation rules can be applied; the intersecting parts remained unchanged. The possible substitutions are: 4.2.12->February 4, 12,4.2.12->February 4, 2012;)

I've run into this same error message on previous attempts. What do you make of it?

Ilan

The importance of \b | Aug 14, 2015
\b marks the start or end of a word; I believe it stands for "boundary". So \bPhone would match Phone and Phones, but not iPhone.

The message you get appears when either one rule matches several parts of the sample text or, as in your case, part of the text is matched by several rules. So 4.2.12 seems to be matched by two expressions, and memoQ cannot decide which one to apply. To avoid this, make sure you create unambiguous expressions, such as these:

\b(#days#)[/.-](#month#)[/.-](\d{4})\b
\b(#days#)[/.-](#month#)[/.-](\d{2})\b

This is where the \b operator is important. Without it, \d{2} also matches the first two digits of a four-digit year, which creates the ambiguity reported.

If the two-digit years may belong to either the 1900s or the 2000s, you would actually need three rules, not just two. One solution is to use two rules instead of one for the two-digit years:

\b(#days#)[/.-](#month#)[/.-]([3-9]\d)\b
\b(#days#)[/.-](#month#)[/.-]([0-2]\d)\b

Then replace the first expression with $2 $1, 19$3 and the second one with $2 $1, 20$3. I assumed that two-digit years from 30 upward refer to the 1900s, while those below 30 refer to the 2000s, but the ranges can be changed to suit your case.
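For anyone who wants to try the three-rule idea outside memoQ, here is a minimal Python sketch. It is not memoQ's implementation: the `DAYS` and `MONTHS` dictionaries are hypothetical stand-ins for the #days# and #month# translation pair lists, `convert_dates` is a made-up helper name, and Python's `re` module is assumed to behave closely enough to memoQ's regex engine for this purpose.

```python
import re

# Hypothetical stand-ins for memoQ's #days# and #month# translation pair lists.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
DAYS = {}
for d in range(1, 32):
    DAYS[str(d)] = str(d)        # "8"  -> "8"
    DAYS[f"{d:02d}"] = str(d)    # "08" -> "8"
MONTHS = {}
for m, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[str(m)] = name        # "8"  -> "August"
    MONTHS[f"{m:02d}"] = name    # "08" -> "August"


def alternation(keys):
    """Build 'a|b|c' from the list keys, longest first so '10' is tried before '1'."""
    return "|".join(sorted(map(re.escape, keys), key=len, reverse=True))


# Common prefix of all three rules: \b(day)[/.-](month)[/.-]
DATE = rf"\b({alternation(DAYS)})[/.-]({alternation(MONTHS)})[/.-]"

# (pattern, century prefix), mirroring the replacements $2 $1, $3 / 19$3 / 20$3.
RULES = [
    (re.compile(DATE + r"(\d{4})\b"), ""),       # four-digit year, kept as is
    (re.compile(DATE + r"([3-9]\d)\b"), "19"),   # 30-99 -> 1900s
    (re.compile(DATE + r"([0-2]\d)\b"), "20"),   # 00-29 -> 2000s
]


def convert_dates(text):
    """Apply the three rules in turn; at most one of them can match a given date."""
    for pattern, century in RULES:
        text = pattern.sub(
            lambda m, c=century: f"{MONTHS[m.group(2)]} {DAYS[m.group(1)]}, {c}{m.group(3)}",
            text,
        )
    return text


print(convert_dates("4.2.12"))      # February 4, 2012
print(convert_dates("10/08/98"))    # August 10, 1998

# Without the trailing \b, the two-digit rule also bites into a four-digit year:
print(re.search(DATE + r"(\d{2})", "10.08.2015").group(0))   # 10.08.20
```

Because the trailing \b forces each year group to end at a word boundary, the three patterns cannot overlap on the same date, which is the kind of ambiguity the error message complains about.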
iyavor (Hebrew to English), TOPIC STARTER | Thanks for the explanation - still struggling... | Sep 2, 2015
Hi Manuel,

Your explanation is quite thorough. Thank you.

I am now finding that my rule detects some dates, but not all of them. In Hebrew, dates are often written with a dash in front of them. Since Hebrew is an RTL language, this is what it looks like:

מילה ב-12.12.14

The dash actually comes before the date in the source text, because of the RTL direction. I tried to write a regex to detect these as well, and this is what I came up with:

(\b|-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]

It should now detect dates whether they stand alone or immediately follow a hyphen (-). I also tried having a separate rule for each case:

(-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]
(\b|-)(#days#)[/.-](#month#)[/.-](\d{4})[\b]

It doesn't work. It does not detect the date after a hyphen, but it does detect the date if it's a separate word...
Hi again,

Could you provide some examples of the dates you are trying to match? The Hebrew example posted above, for instance, does not match the expressions you posted: it has only two digits for the year instead of four, and the dash is (I think) next to the year, not next to the day digits as in the regular expressions.
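The two-digit-year point can be checked with a small Python sketch. This is only an approximation of the memoQ rules: \d{1,2} stands in for the #days# and #month# lists, and Python's `re` module is assumed to behave like memoQ's engine here. One further thing, not raised in the thread, may be worth checking: in Python, \b inside square brackets means a backspace character rather than a word boundary, so a pattern ending in [\b] can only match if a backspace follows the date. Whether memoQ's engine treats [\b] the same way is an assumption.

```python
import re

# Example from the thread; in logical (stored) order the hyphen precedes the date.
sample = "מילה ב-12.12.14"

# Simplified versions of the rules: \d{1,2} replaces the #days#/#month# lists.
four_digit = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")
two_digit = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{2})\b")
bracketed = re.compile(r"(\b|-)(\d{1,2})[/.-](\d{1,2})[/.-](\d{2})[\b]")

# The year in the sample has two digits, so a \d{4} rule cannot match it.
print(four_digit.search(sample))            # None

# A hyphen is not a word character, so \b already holds between "-" and "1";
# no extra (\b|-) group is needed to find a date that follows a hyphen.
print(two_digit.search(sample).group(0))    # 12.12.14

# In Python, [\b] is a character class containing a backspace (\x08),
# so this pattern requires a literal backspace after the year.
print(bracketed.search(sample))             # None
print(re.search(r"[\b]", "\x08") is not None)   # True
```

If memoQ behaves the same way, writing the trailing boundary as a plain \b outside any brackets, together with a two-digit-year rule, would be the first things to try.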