Dan Lucas wrote:
Triston Goodwin wrote:
They would have to visit each of these adapted profiles individually, take a screenshot, run it through an OCR program and then upload the information to their site. And that's assuming that they are able to identify which profiles weren't scanned during the crawl.
It's probably easier than you think.
For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.
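To make the idea above concrete, here is a minimal sketch of the "OCR the images, otherwise read the text" logic. It is an illustration under stated assumptions, not anyone's actual scraper: it swaps the standard library's html.parser in for Beautiful Soup so it runs without extra packages, treats an image as "large" only if its HTML width attribute is at least a hypothetical MIN_WIDTH of 300 pixels, and takes the OCR step as a pluggable function (in practice something like pytesseract.image_to_string applied to the downloaded image). The names ProfileParser, extract_profile_text and MIN_WIDTH are made up for this example.

```python
from html.parser import HTMLParser
from typing import Callable

# Hypothetical threshold: images at least this wide (per their HTML width
# attribute) are assumed to hold a block of profile text, not an icon.
MIN_WIDTH = 300


class ProfileParser(HTMLParser):
    """Collects visible text and the src of any large <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.large_images: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            a = dict(attrs)
            width = a.get("width") or "0"
            if width.isdigit() and int(width) >= MIN_WIDTH and a.get("src"):
                self.large_images.append(a["src"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script/style source.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def extract_profile_text(html: str, ocr: Callable[[str], str]) -> str:
    """Return OCR output if the profile has large images, else its plain text."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    if parser.large_images:
        # Profile text is (probably) inside images: OCR each one.
        return "\n".join(ocr(src) for src in parser.large_images)
    # No large images: parse the profile text as usual.
    return " ".join(parser.text_parts)


if __name__ == "__main__":
    fake_ocr = lambda src: f"[OCR of {src}]"  # stand-in for a real OCR call
    print(extract_profile_text('<p>Hello</p><img src="about.png" width="600">', fake_ocr))
    print(extract_profile_text("<p>Freelance translator</p><p>English to French</p>", fake_ocr))
```

Relying on the width attribute is the weakest assumption here: many pages omit it, so a real scraper would more likely download each image and check its actual dimensions before deciding whether to run OCR.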
In theory (I say that because it is ProZ, not ourselves, who controls the site), we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.
Dan
I think you're right. I personally lean more towards the second option.
I haven't seen this kind of automated OCR tool before. Using an image might still be effective at first, since it's not something we really see here on ProZ. I know Google sure had a hard time with my profile when I used an image instead of text for my About Me a few months ago.