https://www.proz.com/forum/scams/277910-lingvopoint_reduce_profile_info-page2.html

Lingvopoint - reduce profile info?
Thread poster: Deirdre Brophy (X)
Dan Lucas  Identity Verified
United Kingdom
Local time: 11:05
Member (2014)
Japanese to English
Automating OCR is not difficult Nov 19, 2014

Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through an OCR tool and then upload the information to their site. And that's assuming that they are able to identify which profiles weren't scanned during the crawl.

It's probably easier than you think.

For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.
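
To illustrate why an image offers only limited protection, the steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested scraper: the standard-library `html.parser` is swapped in for Beautiful Soup so the example has no dependencies, the OCR step is passed in as a plain callable (`ocr`) rather than calling PyTesser or Tesseract directly, and the names `ProfileParser`, `extract_profile_text` and the `MIN_PIXELS` threshold are hypothetical. It also assumes the page declares image sizes through `width` and `height` attributes, which a real page may not do.

```python
from html.parser import HTMLParser
from typing import Callable, List

# Hypothetical threshold: treat anything of 200x200 pixels or more as "large".
MIN_PIXELS = 200 * 200


class ProfileParser(HTMLParser):
    """Collects text nodes and the sources of large <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.large_images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        try:
            width = int(attributes.get("width") or 0)
            height = int(attributes.get("height") or 0)
        except ValueError:
            return  # non-numeric size such as "50%": ignored in this sketch
        if attributes.get("src") and width * height >= MIN_PIXELS:
            self.large_images.append(attributes["src"])

    def handle_data(self, data):
        # Keep non-empty text nodes, i.e. the profile text "as usual".
        if data.strip():
            self.text_parts.append(data.strip())


def extract_profile_text(html: str, ocr: Callable[[str], str]) -> str:
    """Return the page text, plus OCR output for any large images found."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    pieces = list(parser.text_parts)
    # `ocr` is an assumed callable that maps an image source to recognised text.
    pieces.extend(ocr(src) for src in parser.large_images)
    return "\n".join(piece for piece in pieces if piece)
```

In practice the `ocr` argument would wrap whichever OCR library is available; here it is left as a parameter so that the control flow (text first, OCR only for large images) stays visible.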

In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.

Dan


 
Triston Goodwin  Identity Verified
United States
Local time: 04:05
Spanish to English
+ ...
Automated OCR Nov 19, 2014

Dan Lucas wrote:

Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through an OCR tool and then upload the information to their site. And that's assuming that they are able to identify which profiles weren't scanned during the crawl.

It's probably easier than you think.

For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.

In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.

Dan




I think you're right. I personally lean more towards the second option.

I haven't seen this kind of automated OCR tool before. Using an image might still be effective at first, since it's not something we really see here on ProZ. I know Google sure had a hard time with my profile when I used an image instead of text for my About Me a few months ago.


 
Thayenga  Identity Verified
Germany
Local time: 12:05
Member (2009)
English to German
+ ...
Additionally Nov 20, 2014

Maija Cirule wrote:

As a preventive action, I have included in my "About me" text the following sentence: For business correspondence, I use ONLY the EMAIL address WITH THE DOMAIN NAME specified in my profile, no gmail, yahoo, hotmail, etc., therefore, any of my business-related e-mails from free email addresses are INVALID. Besides, I have encrypted my CV (of course, it can be typed but cannot be copied or edited). And last but not least: never ever include your e-mail address in your CV or elsewhere.


My CVs are not publicly available, only upon request, and then they include no sensitive information. Address, Skype, location, email address, etc. will be provided upon first job assignment on my invoice. This might "scare off" a few possible customers, but if an agency or an end-client is serious and legitimate, they understand these precautions that protect both parties. Additionally I have password-protected my PDF business brochures so they cannot be copied or printed - only typed if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".


 
DLyons  Identity Verified
Ireland
Local time: 11:05
Spanish to English
+ ...
Can be bypassed Nov 20, 2014

Thayenga wrote:

Additionally I have password-protected my PDF business brochures so they cannot be copied or printed - only typed if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".



It's not hard to get around password-protection on PDFs. But time is money to scammers, so usually they just ignore anything that takes extra effort and move on to someone else.


 