
>collating and creating these scrapes has absorbed an enormous amount of my time & energy due to the need to solve CAPTCHAs,...

Have you considered automating this?



I did, but the problem is that I never expected to be scraping for so long, and it was always easier to just solve them by hand than to do a complete rewrite that would allow for using CAPTCHA libraries. If I had known I would be scraping for ~3 years, I would have done many things differently: http://www.gwern.net/Black-market%20archives#how-to-crawl-ma...


The whole point of CAPTCHAs is to be difficult to automate. Or are you suggesting automating farming out the CAPTCHA solutions to cheap workers?



There are plenty of service providers that sell API access to CAPTCHA-solving services at reasonable prices.


Or, if you don't want to spend money, you can always re-host the CAPTCHAs on your own site and let your visitors do the work for you.


I assumed it was a joke. But maybe not?


There are plenty of services that do exactly that. They're pretty cheap.


The main problem is that most CAPTCHAs are terrible and do a better job of keeping humans out than robots.



