Word count for html files
Thread starter: o-callaghan
o-callaghan | Germany | Local time: 13:34 | German > English
Hi,
I am using the Beta version of OmegaT to translate html files on a Mac. When I use the word count feature in OmegaT it gives me a completely different result to openoffice (more than a 1,000 word difference). I also tried an online tool http://www.wordcounttool.com/ just to test the difference and this gave me another result again. When I tested this using a short test document, the online tool was the most accurate but since I am billing a customer I need to be sure that the method is accurate.
What is the best way of counting the words? I wouldn't expect OmegaT to include words inside < > in the word count, but it seems like it does. Is there a way of removing the tags so that I can use the OmegaT word count function?
Thanks for your help,
Amy
Didier Briel | France | Local time: 13:34 | English > French + ...
Check the options in the HTML filter | Jun 1, 2011
gocacp wrote:
I am using the Beta version of OmegaT to translate html files on a Mac. When I use the word count feature in OmegaT it gives me a completely different result to openoffice (more than a 1,000 word difference).
Such a difference is not usual.
Check the options in the HTML filter (or the XHTML filter, depending on your source files), and uncheck things you are not translating.
I also tried an online tool http://www.wordcounttool.com/ just to test the difference and this gave me another result again. When I tested this using a short test document, the online tool was the most accurate but since I am billing a customer I need to be sure that the method is accurate.
There is no such thing as an accurate word count. There are different methods, all giving different results. The important thing is to understand what is being counted.
I wouldn't expect OmegaT to include words inside < > in the word count, but it seems like it does.
It doesn't.
Is there a way of removing the tags so that I can use the OmegaT word count function?
OmegaT doesn't count the tags.
What may happen is that you are declaring things as translatable (e.g., images) while they are not to be translated for this project.
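To make the point about "what is being counted" concrete, here is a minimal Python sketch. It is not OmegaT's actual algorithm, and the sample markup, the `TextCollector` class and the `count_words` helper are all made up for illustration. It counts the same HTML snippet three ways: raw whitespace splitting (tags included), visible text only, and visible text plus the `alt` attribute of images.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text and, optionally, the values of chosen attributes."""

    def __init__(self, translatable_attrs=()):
        super().__init__()
        self.translatable_attrs = set(translatable_attrs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attribute values are only collected when declared translatable.
        for name, value in attrs:
            if name in self.translatable_attrs and value:
                self.chunks.append(value)

    def handle_data(self, data):
        # Text between tags is always collected.
        self.chunks.append(data)


def count_words(html, translatable_attrs=()):
    """Count whitespace-separated words, ignoring the markup itself."""
    parser = TextCollector(translatable_attrs)
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


# Hypothetical sample, not taken from the files discussed in this thread.
sample = (
    '<p>Read the <b>user guide</b> first.</p>'
    '<img src="a.png" alt="Screenshot of the main window">'
)

raw_count = len(sample.split())                 # markup counted as words
text_only = count_words(sample)                 # tags ignored
with_alt = count_words(sample, ("alt",))        # tags ignored, alt text included

print(raw_count, text_only, with_alt)
```

The three numbers differ even on two lines of HTML, which is why unchecking attributes you are not translating can change the total noticeably on a large file.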
Didier
Manticore (X) | Local time: 14:34 | English > German + ...
It might interest you - I have just started translating a large *.docx text. OmegaT is better than anything else on the market, irrespective of price.
Didier Briel | France | Local time: 13:34 | English > French + ...
Thank you for the feedback | Jun 3, 2011
Roland Fischer wrote:
It might interest you - I have just started translating a large *.docx text. OmegaT is better than anything else on the market, irrespective of price.
Thank you for the feedback.
OmegaT relies on its user community.
There are plenty of ways of getting involved, from a simple "yes" on Sourceforge, to more active roles.
Didier
Post removed: This post was hidden by a moderator or staff member because it was not in line with site rule
Malcolm Rowe | United Kingdom | Local time: 12:34 | French > English + ...
large variance between OmegaT, SmartCAT and pasting text into Word, when counting words in Excel | Mar 5, 2018
I just tried assessing 9 xlsx files in OmegaT and got a total word count of 11063. I ran the same files through SmartCAT and got a total word count of 18,336.
Copying out the text from the largest file into a Word file, not including segments that were just numbers, I got a count of 12,053 from Microsoft Word's built-in word count. Including the numbers, this came to 19,147.
SmartCAT counted the same document at 13,187 words and counted 2,372 segments that contained just numbers or symbols, which, I think, were not included in the word count. OmegaT counted this file at 8,029 words.
This variance seems enormous. I can understand if it's not counting number/symbol-only segments, which, I think, counts for much of the discrepancy between SmartCAT and Word but, even allowing for that, OmegaT's count comes out at about two thirds of MS Word's count. There is a lot of repetition but this should, surely, just be shown in the statistics and not affect the total words.
Do I have something majorly wrong in my OmegaT settings or have I somehow misunderstood how OmegaT presents word counts?
Can anyone explain how I might have got such different word counts and what I can do to restore my faith in the statistics generated by these CAT tools? I am using OmegaT 3.6.0 update 8.
Thanks.
Malc
Didier Briel | France | Local time: 13:34 | English > French + ...
OmegaT does not count repetitions on XLSX files | Mar 5, 2018
ma1cius wrote:
I just tried assessing 9 xlsx files in OmegaT and got a total word count of 11063. I ran the same files through SmartCAT and got a total word count of 18,336.
OmegaT does not count repetitions in XLSX files, simply because they are not in the file (Microsoft removes them). To get a word count including repetitions, save the XLSX file under another format (e.g., XML 2003 spreadsheet).
Copying out the text from the largest file into a Word file, not including segments that were just numbers, I got a count of 12,053 from Microsoft Word's built-in word count. Including the numbers, this came to 19,147.
SmartCAT counted the same document at 13,187 words and counted 2,372 segments that contained just numbers or symbols, which, I think, were not included in the word count. OmegaT counted this file at 8,029 words.
It's not usual. Generally, OmegaT is rather close to Word.
This variance seems enormous. I can understand if it's not counting number/symbol-only segments,
Indeed, OmegaT does not count numbers.
which, I think, counts for much of the discrepancy between SmartCAT and Word but, even allowing for that, OmegaT's count comes out at about two thirds of MS Word's count. There is a lot of repetition but this should, surely, just be shown in the statistics and not affect the total words.
As I wrote above, this is not usual for Word documents. Have you checked what is loaded or not in OmegaT for the Word filter? Options > File Filters > Microsoft XML.
Do I have something majorly wrong in my OmegaT settings or have I somehow misunderstood how OmegaT presents word counts?
Another setting that might affect word count is Options > Tag processing (whether you include custom tags or not in statistics).
Can anyone explain how I might have got such different word counts and what I can do to restore my faith in the statistics generated by these CAT tools? I am using OmegaT 3.6.0 update 8.
For XLSX files, the explanation is obvious. For Word, it's hard to say without details.
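As a rough illustration of how these two effects add up (repeated cell text stored only once, and number-only items left out), here is a small Python sketch. It is not how OmegaT, SmartCAT or Word actually count; the `cells` list and the `words` and `count` helpers are invented for the example, and "number-only" is approximated as "contains no letter".

```python
import re

# Hypothetical spreadsheet cells, with repeated text and number-only entries.
cells = ["Total price", "Total price", "42", "Unit price", "3.5 %", "Total price"]


def words(text, skip_numbers=False):
    """Split on whitespace; optionally drop tokens that contain no letter."""
    tokens = text.split()
    if skip_numbers:
        # [^\W\d_] matches any letter, so "42", "3.5" and "%" are dropped.
        tokens = [t for t in tokens if re.search(r"[^\W\d_]", t)]
    return tokens


def count(cell_values, unique_only=False, skip_numbers=False):
    """Total word count under the chosen counting convention."""
    if unique_only:
        # Keep the first occurrence of each string, in order of appearance.
        cell_values = list(dict.fromkeys(cell_values))
    return sum(len(words(c, skip_numbers)) for c in cell_values)


everything = count(cells)
no_numbers = count(cells, skip_numbers=True)
no_repeats = count(cells, unique_only=True)
neither = count(cells, unique_only=True, skip_numbers=True)

print(everything, no_numbers, no_repeats, neither)
```

With six short cells the four conventions already give four different totals, so a gap like 8,029 versus 19,147 is plausible on a repetitive, number-heavy spreadsheet.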
Didier
Daniel Frisano | Italy | Local time: 13:34 | Member (2008) | English > Italian + ...
Right-click, select "Open with...", select MS Word, use that word count.