
If the shared cache ever became significant enough to matter, it would be devastated by marketers, scammers and other abusers. Google employs the groomers that make its index at least tolerable, if still clearly imperfect. Without that cadre of well-compensated expertise to win the arms race against such abusers, the scheme is not feasible.

I suppose this could be crowdsourced, if I didn't know about politics and how any attempt to delegate the responsibility for blessing sites and their indexes would become a controversy. Google already takes lots of heat for its behavior, but Google is a private entity and can mostly indulge its private prerogatives. Without that independence this couldn't function.



I don't really understand your comment. Marketers, scammers and other abusers already publish to the web with the intention of being included in a crawl. Postprocessing crawl data is already a thing.

Assuming this hypothetical shared crawl cache were to exist, it does not preclude google (and all consumers of that cache) doing their own processing downstream of that cache. Does it?

What's the new attack vector?


> I don't really understand your comment.

If you don't then you fail to appreciate the amount of labor it takes to thwart bad actors from ruining indexes. Abusers do publish to the web, and we enjoy not wallowing in their crap because a small army of experienced and expensive people at a select few Big Tech companies is actively shielding us from it.

It's easy to anticipate the malcontent view: 'Google spends all its resources on ads and ranking and we don't need all that.' That is naïve; if Google completely neglected grooming out the bad actors, people wouldn't use Google and Google's business model wouldn't be viable.

So the obvious question is: where is this mechanism without Google et al.? Will the published caches be 99% crap (and without an active defense against crap you can bet your life they will), with anything derived from them hopelessly polluted? If so, then it isn't viable.

Now the instinct will be to find a groomer. Guess what: that's probably doomed too. No selection will look impartial to everyone, so you get to fight that battle. Good luck.


>Will the published caches be 99% crap

Yes. It will be exactly as crap as whatever's published on the web.

And the utility of Google's search engine would be to perform their proprietary processing on top of the publicly available crawl results. Analogous to how their search is already performing proprietary processing on top of a crawl cache.
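To make the layering concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration: `CrawlRecord`, `SHARED_CACHE`, `build_index` and the two consumer functions are hypothetical names, and the "BUY NOW" check is a toy stand-in for whatever proprietary spam filtering a real consumer would run. The only point is that the cache stays raw and each consumer filters downstream of it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CrawlRecord:
    """One page as stored in the (hypothetical) shared crawl cache."""
    url: str
    body: str


# The shared cache is raw: it holds whatever is on the web, spam included.
SHARED_CACHE = [
    CrawlRecord("https://example.org/article", "A write-up on web crawling."),
    CrawlRecord("https://spam.example/buy", "BUY NOW cheap pills BUY NOW"),
    CrawlRecord("https://example.net/docs", "Reference docs for a crawler."),
]


def build_index(cache: Iterable[CrawlRecord],
                keep: Callable[[CrawlRecord], bool]) -> List[str]:
    """Each consumer builds its own index by applying its own filter."""
    return [record.url for record in cache if keep(record)]


def naive_consumer(record: CrawlRecord) -> bool:
    # No grooming at all: the index is exactly as bad as the cache.
    return True


def picky_consumer(record: CrawlRecord) -> bool:
    # Toy placeholder for proprietary spam detection.
    return "BUY NOW" not in record.body


naive_index = build_index(SHARED_CACHE, naive_consumer)
picky_index = build_index(SHARED_CACHE, picky_consumer)
```

Same cache, two different indexes. The grooming work doesn't disappear, it just lives in each consumer's `keep` function instead of in the crawl.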

>If you don't then you fail to appreciate the amount of labor it takes to thwart bad actors from ruining indexes.

Did you miss the part where I said "Assuming this hypothetical shared crawl cache were to exist, it does not preclude google (and all consumers of that cache) doing their own processing downstream of that cache. Does it?"



