
Googlebot is pretty careful and generally doesn’t cause these problems.


Right, then they shouldn't be affected by the rate-limiting, as long as it's reasonable. If it were applied evenly to all clients/crawlers, it would at least give a respectful, well-designed crawler a chance to compete.
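
To make "applied evenly" concrete, here is a minimal sketch of a shared token-bucket limit: one bucket per client, identical refill rate for everyone. The rate, burst size, and names are illustrative assumptions, not anything a specific site actually runs:

    import time
    from collections import defaultdict

    RATE = 2.0    # requests per second allowed for every client (illustrative)
    BURST = 10.0  # short bursts tolerated before throttling kicks in

    # One bucket per client (keyed by IP or user agent), same rules for all.
    buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(client_id: str) -> bool:
        """Return True if the request fits this client's budget, False to rate-limit."""
        b = buckets[client_id]
        now = time.monotonic()
        b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
        b["ts"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False

A crawler that stays under RATE never gets refused; an aggressive one is throttled by the same rule, which is what would let the well-behaved crawler compete.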


The problem is, if you own a website, it takes the same resources to handle a crawl from Google as from FooCrawler even when both behave, but I get far more ROI from letting Google crawl, so I'm incentivized to block FooCrawler and not Google. In fact, the ROI from Google is so high that I'm incentivized to devote extra resources just so they can crawl faster.
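
That incentive boils down to a per-crawler policy rather than a uniform one. A sketch, assuming hypothetical user-agent substrings and made-up numbers: Googlebot gets a larger crawl budget, unknown crawlers get a modest default, and FooCrawler is blocked outright:

    # Illustrative policy table reflecting the ROI argument above; the
    # user-agent substrings and numbers are invented for the example.
    POLICY = {
        "Googlebot": {"rate": 10.0, "burst": 50.0},  # high ROI: let it crawl faster
        "FooCrawler": None,                          # little ROI: blocked outright
    }
    DEFAULT = {"rate": 1.0, "burst": 5.0}            # everyone else gets a small budget

    def crawl_budget(user_agent: str):
        """Return a rate/burst dict for this crawler, or None to block it."""
        for name, policy in POLICY.items():
            if name in user_agent:
                return policy
        return DEFAULT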


We know that. No one claims websites are doing this for no reason. It's explicitly written in the article.

But this sub-thread is about misbehaved crawlers.



