On the contrary, there are no laws that say you *can't* scrape a site. If it's a...

marginalia_nu · on Nov 1, 2021

There are laws against unauthorized computer access.

This is a scenario where you have a server explicitly saying "Stop! You are not permitted to access this computer!", and yet you persist in circumventing that by hiding your identity and accessing it anyway. Those are some murky waters.

fragmede · on Nov 1, 2021

For those who are interested in the specifics, Jamie Williams wrote a piece for the EFF[0] in the wake of hiQ v. LinkedIn which dealt with this exact question.

It depends on who the server operator is. If it's your server, yeah, anyone you don't want there should go away. If it's your enemy's server, the argument that they're sending that page to the rest of the Internet turns out to be a decent one.

[0] https://www.eff.org/deeplinks/2018/04/scraping-just-automate...

Aeolun · on Nov 1, 2021

The server says nothing of the kind. The response that was previously positive is now broken, and it happens to be fixed if you access it from a different IP.

Maybe we need a status code that means ‘lay off all the requests made from this entire system’?

marginalia_nu · on Nov 1, 2021

How do you interpret a

401 Unauthorized

to mean you are authorized to access the resource?

jhgb · on Nov 1, 2021

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#cli...

> Although the HTTP standard specifies "unauthorized", semantically this response means "unauthenticated". That is, the client must authenticate itself to get the requested response.

So it would seem that it actually doesn't positively imply that you're NOT authorized.

Which kind of makes sense; machines can't detect the legality of things, just that certain procedural niceties haven't been observed.

marginalia_nu · on Nov 1, 2021

Fine, send a 403 then.

> The client does not have access rights to the content; that is, it is unauthorized, so the server is refusing to give the requested resource.

Machines don't have any legal responsibility, bot operators do. Which is why respecting these things is sort of important. At any rate, a 40x does not mean "try again with a different user agent and another IP".

hattmall · on Nov 1, 2021

403 is per request, not requester. I get random 403s when just browsing some websites. Does that mean I should close the browser and not hit refresh for fear of breaking some wire fraud unauthorized access law?

marginalia_nu · on Nov 1, 2021

If you go by the semantics of the 403 code, absolutely, that's exactly what it means.

In practice there's of course nuance, like anyone will occasionally type in the wrong password on a log-in screen, maybe try again and then realize it was the wrong log-in prompt. That's mostly fine.

That's different from deliberately trying to circumvent a measure like this. If you are doing the stuff in the link, you are absolutely crossing a line and you know it.

There's a large difference between "I got a 403 so I hit F5 once" and "I got a 403 so I used a residential proxy and spoofed my user-agent".
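
To make that distinction concrete, here is a rough sketch of how a well-behaved bot might react to the status codes discussed in this thread. The names `Action` and `next_action` are made up for illustration, and the "one retry on a 403" allowance is only my reading of the F5 example above, not anything the HTTP spec or any law prescribes.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"            # response is usable
    AUTHENTICATE = "authenticate"  # supply credentials you legitimately hold
    RETRY_ONCE = "retry_once"      # same client, same identity, one more try
    STOP = "stop"                  # give up; do not rotate IPs or user agents


def next_action(status: int, attempts: int = 1) -> Action:
    """Hypothetical policy: decide what to do after one HTTP response.

    `attempts` counts how many times this URL has been requested so far,
    including the request that produced `status`.
    """
    if 200 <= status < 300:
        return Action.PROCEED
    if status == 401:
        # 401 means "unauthenticated": the server is asking who you are.
        return Action.AUTHENTICATE
    if status == 403:
        # Assumption: a single plain refresh is tolerable, anything more is not.
        return Action.RETRY_ONCE if attempts == 1 else Action.STOP
    # Anything else: stop and let a human look at it.
    return Action.STOP


print(next_action(403, attempts=1))
print(next_action(403, attempts=2))
```

Note there is deliberately no branch that swaps the IP or the user agent; in this sketch a second 403 just ends the crawl of that URL.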

jhgb · on Nov 1, 2021

I thought the semantics of 403 was "look for the actual semantics in the response body".

BeFlatXIII · on Nov 2, 2021

If I were on a jury, I’d vote to nullify any scraping case that made it this far.

dragonwriter · on Nov 1, 2021

> On the contrary, there are no laws that say you can't scrape a site.

You are both wrong: copyright law says both that you can't (in some cases, for some uses) and that you can (under implicit license, fair use, and other rules) in others.

EMIRELADERO · on Nov 1, 2021

Depends on what exactly is being scraped. If it's something like price data or exact values then it isn't protected by copyright at all.

kingcharles · on Nov 1, 2021

Price data can be protected by copyright as a compilation of data.

https://www.bitlaw.com/copyright/database.html

EMIRELADERO · on Nov 1, 2021

In that case, the data compilation itself would be protected, not the individual data points. If I used a scraper to copy everything verbatim, then yes, it would be a violation.