https://www.proz.com/forum/general_technical_issues/241763-how_to_count_the_number_of_words_on_a_website_suggestions_needed.html

How to count the number of words on a website - suggestions needed
Thread poster: NihaoCeci
NihaoCeci
NihaoCeci
Local time: 10:00
Italian to Chinese
+ ...
Jan 22, 2013

Hello, I was asked for a quote for the translation of two websites (I only have the URLs). I was wondering if there is a way to perform a fast word count without copying and pasting every single page. Any advice? Any software program?
Thank you!


 
Mohamed Mahmoud
Mohamed Mahmoud  Identity Verified
Egypt
Local time: 10:00
Member (2011)
English to Arabic
+ ...
CAT Tools Jan 22, 2013

I think a good quote depends on an accurate word count, so you can use HTTrack or Internet Download Manager to download the whole website, then import it as a project into SDL or Wordfast Pro, where you can run an accurate analysis of the whole project with a clear idea of matches, etc.
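
For a rough figure before opening a CAT tool, the downloaded folder can also be counted with a short script. The following is a minimal sketch, not what SDL or Wordfast do internally: it assumes the mirror contains plain .html/.htm files, it counts runs of letters and digits as words (so it will not match a CAT tool's count exactly and does not work for languages written without spaces), and the function names are made up for illustration.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def count_words_in_html(html):
    """Return the number of word-like tokens in the text nodes of one page."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.chunks)))


def count_words_in_folder(folder):
    """Map each .html/.htm file under `folder` to its word count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            text = path.read_text(encoding="utf-8", errors="replace")
            counts[str(path)] = count_words_in_html(text)
    return counts


if __name__ == "__main__":
    # "mirror" is a placeholder for wherever the site was downloaded to.
    per_page = count_words_in_folder("mirror")
    print(sum(per_page.values()), "words in", len(per_page), "pages")
```

Navigation menus and footers are counted again on every page, so the total overstates the number of new words; the analysis in a CAT tool handles those repetitions.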

 
NihaoCeci
NihaoCeci
Local time: 10:00
Italian to Chinese
+ ...
TOPIC STARTER
Studio Jan 22, 2013

I downloaded the entire website and I'll try to import it into Studio 2009. I guess I have to import the whole folder (140 MB). Right? Thanks!

 
Drew MacFadyen
Drew MacFadyen  Identity Verified
United States
Local time: 04:00
Spanish to English
+ ...
Easyling.com free quote option Jan 22, 2013

There are some new utilities specifically for website translation that offer services like word count to enable you to better quote the job. http://www.easyling.com/website-translators-agencies/ They offer a full service, end to end solution including an editing environment that shows the localized text in the translated website. There are also options to perform the translation in the CAT Tool of your choosing as well.

Good luck with your project.

Drew


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 04:00
Member (2008)
French to English
+ ...
Be careful... Jan 22, 2013

...with website word counts.

There can be a lot more text on a website than is immediately obvious. For example, the "alt" attributes and other text within tags, drop down lists and other active content that can be within a script, etc., which can, in some sites, be a significant proportion of the text and which may not all be picked up by software.

If at all possible it's best to try to get the text from the webmaster in a Word document or similar because in any case it's likely to require a coder to insert the text in the right places, without breaking the code.
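
To get a feel for how much text sits in attributes on a given page, something like the following can be run on its HTML source. It is a minimal sketch under stated assumptions: the list of attributes (alt, title, placeholder, plus the description and keywords meta tags) is a guess at the usual candidates and is not exhaustive, the function name is hypothetical, and text generated by scripts is not seen at all.

```python
import re
from html.parser import HTMLParser

# Attributes that often carry translatable text (an assumption, not a complete list).
TEXT_ATTRS = {"alt", "title", "placeholder"}
META_NAMES = {"description", "keywords"}


class AttributeText(HTMLParser):
    """Collects (tag, attribute, text) for attributes that may need translating."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in TEXT_ATTRS and value:
                self.found.append((tag, name, value))
        if tag == "meta":
            meta = dict(attrs)
            if (meta.get("name") or "").lower() in META_NAMES and meta.get("content"):
                self.found.append((tag, "content", meta["content"]))


def attribute_word_count(html):
    """Return the number of word-like tokens found in the attributes above."""
    parser = AttributeText()
    parser.feed(html)
    parser.close()
    return sum(len(re.findall(r"\w+", text)) for _, _, text in parser.found)


page = (
    '<meta name="description" content="Hotel in Rome">'
    '<img src="door.png" alt="Red front door">'
    '<a href="#" title="Go home">x</a>'
)
print(attribute_word_count(page))
```

Comparing this figure with the visible-text count for a few sample pages gives an idea of whether attribute text is a significant share of the job.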


 
Cris_Pa
Cris_Pa  Identity Verified
Chile
Local time: 05:00
English to Spanish
a wordcount tool, Count Anything Jan 23, 2013

Maybe this tool can help you. I haven't used it with webpages, so I don't know if the wordcount is accurate.

Here is the link: http://felix-cat.com/tools/

The name of the tool is Count Anything.

Regards

[Edited at 2013-01-23 00:16 GMT]



 
Rolf Keller
Rolf Keller
Germany
Local time: 10:00
English to German
There is no such entity as "this website" Jan 23, 2013

John Fossey wrote:

active content


I don't think that one can develop a tool that is able to download any and all active content from any website.

Moreover, many websites contain links to content that doesn't belong to the website in question. A tool cannot tell these pieces of content apart; it can only look at the included URLs. Unfortunately, the directory/URL structure of some websites is neither "clean" nor free of "inactive members" or aliases.
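
The URL-only view can be illustrated with a small sketch. It assumes that "belongs to the website" simply means "same hostname as the page", which is exactly the kind of rule that breaks on aliases; the function name and the example URLs are made up.

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url, hrefs):
    """Sort links found on a page into same-host and other-host lists."""
    site_host = urlparse(page_url).hostname
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.hostname == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


internal, external = split_links(
    "http://example.com/a/index.html",
    ["page2.html", "/about", "http://other.org/x",
     "mailto:me@example.com", "http://www.example.com/"],
)
print(internal)
print(external)
```

Note that http://www.example.com/ lands in the "external" list although it is most likely an alias of the same site: the tool has only the URL to go by.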


 
NihaoCeci
NihaoCeci
Local time: 10:00
Italian to Chinese
+ ...
TOPIC STARTER
Easyling Jan 23, 2013

The attempt I made with Studio 2009, downloading the content of the website into a folder and analyzing it, was useless, since there were dozens of subfolders with non-editable files that I could not process. On the other hand, Easyling.com seems to be very interesting. I will have to test it on a few websites I know to check whether it is reliable or not.
Thank you everyone!
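
One way to see what a downloaded folder actually contains before importing it is to tally the files by extension and keep only the types a CAT tool is likely to open. A minimal sketch follows; the set of "translatable" extensions is an assumption to adjust per project, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Extensions assumed to hold translatable text; adjust for the site at hand.
TRANSLATABLE = {".html", ".htm", ".xhtml", ".xml"}


def count_by_extension(names):
    """Tally file names by lower-cased extension."""
    return Counter(Path(name).suffix.lower() or "(none)" for name in names)


def translatable_files(names):
    """Keep only the files whose extension is in TRANSLATABLE."""
    return [name for name in names if Path(name).suffix.lower() in TRANSLATABLE]


def list_files(folder):
    """Return the relative paths of all files under `folder`."""
    root = Path(folder)
    return [str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()]


if __name__ == "__main__":
    # "mirror" is a placeholder for the downloaded folder.
    names = list_files("mirror")
    for ext, n in count_by_extension(names).most_common():
        print(ext, n)
    print(len(translatable_files(names)), "files worth importing")
```

Copying only the translatable files into a separate folder and importing that keeps images, scripts and style sheets out of the project.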


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 04:00
Member (2008)
French to English
+ ...
Some websites don't actually have web pages Jan 25, 2013

I have an inquiry this morning that illustrates the problem. Looking at the source code of the website reveals that it is generated by the Joomla content management system. This means that the webpages you see in the browser don't actually exist! They are created on the fly by Joomla as they are requested, from a database of snippets of text, code, etc. I can try to extract all the text for the sake of a quotation, but the web developer will have to supply the text and insert it into the database if it becomes a job.


 
Anna Villegas
Anna Villegas
Mexico
Local time: 02:00
English to Spanish
WebBudget XT Jan 25, 2013

WebBudget XT is a world class software tool that helps language professionals and localization managers to quickly assess and translate the content of a web project.

Of course, it has a wordcount app.

They offer a 15-day free trial:

http://www.webbudget.com/

Hope it helps.

Carvallo.


 
NihaoCeci
NihaoCeci
Local time: 10:00
Italian to Chinese
+ ...
TOPIC STARTER
Excel or doc? Jan 26, 2013

How can you extract the text of the website in doc or xls format? Does WebBudget XT allow that? Thanks.

 
Anna Villegas
Anna Villegas
Mexico
Local time: 02:00
English to Spanish
Yes... Jan 27, 2013

NihaoCeci wrote:

How can you extract the text of the website in doc or xls format? Does WebBudget XT allow that? Thanks.


Yes, Nihao, it does. Just find your way.

Good luck!
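
Independently of any particular tool, text that has already been extracted can be written to a CSV file, which Excel opens directly. This is a minimal sketch, not a description of WebBudget's export; the column names, the function name and the sample rows are assumptions for illustration.

```python
import csv
import io
import re


def write_segments_csv(rows, fileobj):
    """Write (page, text) pairs as CSV with a word-count column."""
    writer = csv.writer(fileobj)
    writer.writerow(["page", "text", "words"])
    for page, text in rows:
        writer.writerow([page, text, len(re.findall(r"\w+", text))])


# Demonstration with an in-memory buffer; for a real file use
# open("segments.csv", "w", newline="", encoding="utf-8-sig") instead,
# the BOM in "utf-8-sig" helps Excel detect UTF-8.
buffer = io.StringIO()
write_segments_csv(
    [("index.html", "Welcome to our hotel"), ("rooms.html", "Rooms, suites and rates")],
    buffer,
)
lines = buffer.getvalue().splitlines()
print(lines)
```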


 
Anahit Simonyan
Anahit Simonyan  Identity Verified
Local time: 12:00
English to Armenian
+ ...
My own experience Feb 15, 2013

Perhaps my posting comes a little late, but I want to add my own experience with WebBudget, which has worked fine for downloading the content of a website and getting it translated in a CAT tool.

 
Hanna Sles (X)
Hanna Sles (X)
United States
Local time: 11:00
English to Ukrainian
+ ...
Website Word count tool Jan 12, 2019

Try this website word count tool - https://www.hannasles.com/word-frequency-counter/website-word-count-tool/
You will get the total number of words, new words and repetitions. It works online. Free and easy to use.
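
How the linked tool arrives at these three figures is not documented in this thread. One common convention, sketched below as an assumption, is to treat the first occurrence of a segment as new and every later identical occurrence (ignoring case and spacing) as a repetition; the function name is hypothetical.

```python
import re


def analyse(segments):
    """Split a word count into new words and repetitions, segment by segment."""
    seen = set()
    new_words = repeated_words = 0
    for segment in segments:
        key = " ".join(segment.lower().split())  # normalise case and spacing
        if not key:
            continue  # ignore empty segments
        words = len(re.findall(r"\w+", segment))
        if key in seen:
            repeated_words += words
        else:
            seen.add(key)
            new_words += words
    return {
        "total": new_words + repeated_words,
        "new": new_words,
        "repetitions": repeated_words,
    }


result = analyse(["Contact us", "About the hotel", "Contact us", "contact  us"])
print(result)
```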


 

