If the shared cache ever became significant enough to matter it would be devasta...

finnthehuman · on March 26, 2021

I don't really understand your comment. Marketers, scammers and other abusers already publish to the web with the intention to be included in a crawl. Postprocessing crawl data is already a thing.

Assuming this hypothetical shared crawl cache were to exist, it does not preclude google (and all consumers of that cache) doing their own processing downstream of that cache. Does it?

What's the new attack vector?

topspin · on March 26, 2021

> I don't really understand your comment.

If you don't then you fail to appreciate the amount of labor it takes to thwart bad actors from ruining indexes. Abusers do publish to the web, and we enjoy not wallowing in their crap because a small army of experienced and expensive people at a select few Big Tech companies is actively shielding us from it.

It's easy to anticipate the malcontent view: 'Google spends all its resources on ads and ranking and we don't need all that.' That is naïve; if Google completely neglected grooming out the bad actors, people wouldn't use Google and Google's business model wouldn't be viable.

So the obvious question is: where is this mechanism without Google et al.? Will the published caches be 99% crap (and without an active defense against crap you can bet your life they will), with anything derived from them hopelessly polluted? If so then it isn't viable.

Now the instinct will be to find a groomer. Guess what: that's probably doomed too. No selection will seem impartial to everyone, so you get to fight that battle. Good luck.

finnthehuman · on March 26, 2021

>Will the published caches be 99% crap

Yes. It will be exactly as crap as whatever's published on the web.

And the utility of google's search engine would be to perform their proprietary processing on top of the publicly-available crawl results. Analogous to how their search is already performing proprietary processing on top of a crawl cache.

>If you don't then you fail to appreciate the amount of labor it takes to thwart bad actors from ruining indexes.

Did you miss the part where I said "Assuming this hypothetical shared crawl cache were to exist, it does not preclude google (and all consumers of that cache) doing their own processing downstream of that cache. Does it?"
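
To make the "processing downstream of the cache" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `CrawlRecord`, `build_index`, and both filters are made-up names, and the keyword checks are toy stand-ins for whatever proprietary spam handling a real consumer would run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CrawlRecord:
    """One raw page from the hypothetical shared crawl cache."""
    url: str
    body: str


def build_index(
    cache: Iterable[CrawlRecord],
    is_junk: Callable[[CrawlRecord], bool],
) -> List[str]:
    """Each consumer applies its own filter downstream of the shared cache."""
    return [record.url for record in cache if not is_junk(record)]


# Toy filters standing in for each consumer's proprietary processing.
def strict_filter(record: CrawlRecord) -> bool:
    text = record.body.lower()
    return "buy now" in text or "free money" in text


def lenient_filter(record: CrawlRecord) -> bool:
    return "free money" in record.body.lower()


# The shared cache is ungroomed: as good or as bad as the web itself.
shared_cache = [
    CrawlRecord("https://example.org/docs", "How to configure the parser."),
    CrawlRecord("https://example.org/shop", "Buy now and save big!"),
    CrawlRecord("https://example.org/scam", "Free money, click here."),
]

strict_index = build_index(shared_cache, strict_filter)
lenient_index = build_index(shared_cache, lenient_filter)
```

Both consumers read the same unfiltered records and end up with different indexes, so in this sketch nobody has to groom the cache itself.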