I wish they'd limit it to just stopping credential stuffing.
Here's my scenario: my electricity provider publishes the month's electricity rates on the first of the month, and I want to scrape them so that I can update the prices in Home Assistant. This is a very simple task, and it's something Home Assistant can do with a little configuration. Unfortunately, this worked exactly once; after that, the site started serving up some JavaScript to check my browser.
The information I'm trying to get is public and can be accessed without any kind of authentication. I'm willing to bet that they switched on the anti-bot feature on their load balancer for the entire site instead of doing the extra work to enable it only for electricitycompany.com/myaccount/ (where you do have to log in).
I also asked the company whether they'd be willing and able to push the power rates out via the smart meters so that my interface box (Eagle-200) could pick them up; they said they have no plans to do so.
The next step is to scrape the website of the provincial power regulator, which shows the power rates for each provider. Of course, the regulator's site has its own issues (rounding, in particular), and I haven't dug any further to see whether I can make use of it.
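To make the failure mode concrete: a plain-HTTP scraper can at least tell the difference between "the rates page changed" and "I was served a bot check". Below is a minimal Python sketch of that check. It assumes, hypothetically, that the rates page prints prices as text like `$0.10730/kWh`. The phrases in `CHALLENGE_MARKERS` are illustrative guesses, not a list confirmed for any particular vendor, and `extract_rate` and `ChallengePageError` are names made up for this example.

```python
import re

# Hypothetical: assumes the rates page shows prices as text like "$0.10730/kWh".
RATE_PATTERN = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*/\s*kWh")

# Hypothetical phrases seen on "checking your browser" interstitials.
# Illustrative guesses only, not a vendor-confirmed or exhaustive list.
CHALLENGE_MARKERS = ("checking your browser", "just a moment", "enable javascript")


class ChallengePageError(RuntimeError):
    """The server returned a bot check instead of the rates page."""


def extract_rate(html: str) -> float:
    """Return the first $/kWh rate in the page, or raise a specific error."""
    match = RATE_PATTERN.search(html)
    if match:
        return float(match.group(1))
    # No rate found: decide whether we were blocked or the layout changed.
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        raise ChallengePageError("got a JavaScript challenge instead of the rates")
    raise ValueError("no $/kWh rate found; the page layout may have changed")
```

Failing loudly with `ChallengePageError` is usually preferable to quietly recording a stale or missing price in Home Assistant.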
All of this effort to get public information in an automated fashion.
At a minimum, any scraper that doesn't execute JS needs to impersonate a screen reader's user agent. Locking out disabled people has to be many levels of illegal in most countries.
As a disabled person myself, I will go as far as to suggest that websites should allow unfettered bot access to disabled users, i.e. anything a user without malicious intent is allowed to do on a platform should be allowed to be done by a bot on behalf of a disabled user, because accessibility and equity are a joke.
Social media platforms have made physical appearance the first-class citizen of the reputation economy. I'm not even talking about the platforms that outright bury content from disabled people as a policy; I'm talking about the platforms whose algorithms favor selfies and videos over text and URLs, thereby putting those with accessibility issues at a severe disadvantage.
Why would you use such platforms, one might ask; do something that has nothing to do with the reputation economy, they might add. Well, have you looked at LinkedIn lately? LinkedIn has become synonymous with professional job searching, and a 30-second video intro is the very first thing on a profile, not the skills the platform was meant to showcase when it launched. You'd have to be naive to claim that physical appearance in that video or profile picture doesn't affect job prospects (several studies have found otherwise).
It's not just physical appearance: the act of creating videos or posting photos is itself hard for a time-constrained person[1], so I think it's reasonable to ask platforms to allow bots to post deep-fake videos of the user doing the silly things these platforms expect from an average user.
Blocking clients that don't support JS is neither illegal nor a violation of the US ADA. You can add requirements for disabled people to access your services as long as they're reasonable, and the prevalence of screen readers that work with JS turned on likely makes requiring JS a reasonable request. It'd be like saying "you can't deny someone using an IE8 screen reader by only offering TLS 1.3".
> Unfortunately, the days of reliable non-JavaScript capable scraping are over.
Not really. In a lot of cases, websites use JavaScript to call some API along with an on-the-fly generated token to prevent abuse.
As long as that token isn't a CAPTCHA, you can reverse-engineer the site and scrape it without JavaScript, which is much faster than browser-based scraping.
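A minimal sketch of the "skip the browser, call the API" approach, assuming the simplest case where the token is embedded literally in the HTML. The variable name `window.__API_TOKEN__`, the `X-Api-Token` header, and the helper names are all hypothetical; on a real site you'd read them off the browser's network tab. If the token is instead computed by obfuscated JS, you'd have to port that computation, which is the painful part described in the next reply.

```python
import json
import re
import urllib.request

# Hypothetical: assumes the page embeds the token in an inline script like
#   window.__API_TOKEN__ = "abc123";
TOKEN_PATTERN = re.compile(r'window\.__API_TOKEN__\s*=\s*"([^"]+)"')


def extract_token(html: str) -> str:
    """Pull the literal token string out of the page's inline script."""
    match = TOKEN_PATTERN.search(html)
    if match is None:
        raise ValueError("token not found in page")
    return match.group(1)


def build_api_request(api_url: str, token: str) -> urllib.request.Request:
    """Build the API call the page's JS would make (header name is hypothetical)."""
    return urllib.request.Request(
        api_url,
        headers={"X-Api-Token": token, "Accept": "application/json"},
    )


def fetch_json(page_url: str, api_url: str) -> dict:
    """Fetch the HTML once for the token, then call the JSON API directly."""
    with urllib.request.urlopen(page_url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    token = extract_token(html)
    with urllib.request.urlopen(build_api_request(api_url, token), timeout=10) as resp:
        return json.load(resp)
```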
I agree with this. This is what I see on a lot of sites I scrape. Reverse engineering the JS to figure out how the fuck the token was generated is a bitch though.
So then you use a headless browser to render the JS, which is even hackier, but it's totally worth hitting one more full page request to get the token so that you can go back to plain requests.
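The hybrid approach might look roughly like this: let a headless browser run the site's JS once, then reuse the resulting cookies in plain HTTP requests. This sketch uses Playwright's Python sync API (`sync_playwright`, `BrowserContext.cookies()`); the helper names are made up. It's an assumption that the site keys its bot check on cookies alone. Some challenges also tie the cookie to the browser's User-Agent or TLS fingerprint, in which case replaying it from `urllib` will still be rejected.

```python
import urllib.request


def cookie_header(cookies: list[dict]) -> str:
    """Join Playwright-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def harvest_cookies(url: str) -> list[dict]:
    """Load the page once in headless Chromium so its JS runs, then return cookies."""
    # Imported lazily so the plain-HTTP helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        page = context.new_page()
        page.goto(url, wait_until="networkidle")  # wait for the challenge JS to settle
        cookies = context.cookies()
        browser.close()
    return cookies


def plain_request(url: str, cookies: list[dict]) -> urllib.request.Request:
    """Build a plain HTTP request that replays the browser's cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
```

Usage would be `cookies = harvest_cookies(rates_url)` once, then `urllib.request.urlopen(plain_request(api_url, cookies))` for subsequent fetches until the cookies expire.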
I don't think the token is the only thing that comes into play here. If the company wants, it can use various other techniques like browser fingerprinting, TLS fingerprinting, and a lot more.
It's just a cat-and-mouse game. In a few years, I think hardware attestation and the like will come into play, which can mitigate the bot issue somewhat.
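As an example of TLS fingerprinting, JA3 is one well-known scheme: the server joins fields from the client's TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats) as decimal values and takes an MD5 of the result, so Python's TLS stack and Chrome's produce different hashes even when the HTTP headers are identical. The sketch below computes a JA3 string and hash from already-parsed fields, skipping GREASE values (RFC 8701) as JA3 does. The example field values are made up, and nothing here implies the electricity company uses JA3 specifically.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    """Comma-separated fields, each list dash-joined, GREASE values removed."""
    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

For instance, `ja3_string(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23], [29, 23], [0])` yields `"771,4865-4866,0-23,29-23,0"`: the two GREASE entries are dropped before joining.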
I’m not sure what rate you are trying to get, but the electricity market in the US has 5-minute settlement periods. So for your region, you would need to grab the price for each period and average them to get a power rate. Take that rate, add transmission fees, taxes, and the various other fees your provider tacks on, then multiply by usage. In Texas, you can go directly to the ERCOT site to get these prices without worrying about countermeasures. I’m not sure where you are, but there is likely a similar wholesale site you can access.
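The arithmetic described above, as a small Python sketch. The interval prices are hypothetical and assumed to be in $/MWh (the usual unit for wholesale prices), and every fee is modeled as a flat per-kWh adder, which is a simplification: percentage-based taxes or fixed charges would need their own terms.

```python
from statistics import mean


def average_rate_per_kwh(interval_prices_per_mwh: list[float]) -> float:
    """Simple (unweighted) mean of interval prices, converted from $/MWh to $/kWh."""
    return mean(interval_prices_per_mwh) / 1000.0


def estimate_cost(interval_prices_per_mwh: list[float],
                  adders_per_kwh: list[float],
                  usage_kwh: float) -> float:
    """(average wholesale rate + per-kWh fees) * usage.

    Assumes every fee is a flat per-kWh adder; percentage taxes or fixed
    monthly charges would need separate handling.
    """
    rate = average_rate_per_kwh(interval_prices_per_mwh) + sum(adders_per_kwh)
    return rate * usage_kwh
```

A plain mean weights every interval equally; if you also have interval-level usage from your meter, a usage-weighted average (sum of price times kWh, divided by total kWh) would track your actual cost more closely.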
I had never heard of this, but it looks like a reasonable option. My go-to for this type of thing would be Python+Selenium+Firefox, but only due to familiarity with those.
Playwright is easy to get started with. There are even tools that let you record your browser actions and convert them into code (https://playwright.dev/).
Out of curiosity, how is it that you have electricity rates that change every month? Are you buying power through a third-party organization? The vast majority of places I've seen have a fixed tariff for residential use that changes no more often than every 12-24 months.
In some cases, it is because the consumer has opted for variable rates, essentially betting that, net-net, variable rates will be less expensive than fixed rates, or that they will be able to shift usage away from price spikes. See: https://www.texastribune.org/2021/02/22/texas-pauses-electri...
Feb 2021 FT:
"Bills mount in Texas power market after freeze sends prices soaring: Financial casualties emerge as grid operator Ercot requires billions in payments"
In Alberta, the electricity system has been deregulated, so you can buy from numerous providers. The Utilities Consumer Advocate lists 187[0] different electricity plans available in my city. My current provider changes the rate on my plan monthly, but some providers let you sign up for 3-year or 5-year fixed-rate plans.
Thanks to deregulation, the electricity rate isn't the only thing you pay for though. There is also a Transmission Charge, Distribution Charge, and Local Access Fee. These are all per-kWh charges and change very rarely.
In October, my electricity rate is $0.10730/kWh, but my total cost is actually $0.16346/kWh plus the per-day charge ($0.202/day). Tomorrow the November rate will be published.
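Working those numbers through: the per-kWh delivery charges come out to $0.16346 - $0.10730 = $0.05616/kWh, and a month's bill is usage times the all-in rate plus the daily charge times the number of days. A quick Python sketch using the figures quoted above; the 600 kWh, 31-day example is hypothetical, and anything not mentioned in the comment is left out.

```python
# Figures from the comment above (Alberta, October):
ENERGY_RATE = 0.10730   # $/kWh, the monthly energy rate from the provider
ALL_IN_RATE = 0.16346   # $/kWh, energy rate plus the per-kWh delivery charges
DAILY_CHARGE = 0.202    # $/day, fixed per-day charge

# Transmission + distribution + local access fee, backed out of the two rates
# (about $0.05616/kWh); these change rarely.
DELIVERY_ADDERS = ALL_IN_RATE - ENERGY_RATE


def monthly_bill(usage_kwh: float, days: int, energy_rate: float = ENERGY_RATE) -> float:
    """Estimate a bill; only energy_rate is expected to change month to month."""
    per_kwh = energy_rate + DELIVERY_ADDERS
    return usage_kwh * per_kwh + days * DAILY_CHARGE
```

With these figures, `monthly_bill(600, 31)` comes to about $104.34 (600 × 0.16346 = 98.076, plus 31 × 0.202 = 6.262). When the November rate is published, only `energy_rate` changes, which is exactly the value the scraper is meant to fetch.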
This also depends on the country. Where I live (Europe), the rate now changes by the minute, or thereabouts. That was made possible after everybody had to switch to wireless meters. Sometimes you'll get a warning in advance - a newspaper may write "If you live here or here, don't do your cooking at this particular hour". Some providers still have fixed-rate options, some apparently don't. What I dislike the most is that they're trying to push us into running our washing machines during the night, something the insurance company and the fire department strongly warn against. And I don't want to be asleep if a fire starts (which happens here and there throughout the year). But that's what the pricing scheme tries to enforce.
I’m curious to see the stats they are relying on and the communications materials the fire department and insurance company are using on this topic.
It seems to me from a life-safety angle that their energy would likely be far better spent on recommending smoke alarms, CO meters, and periodic cleaning of dryer vents than on recommendations against sleeping with washing/drying machines running.
They do all of that as well, of course. The problem is that when a fire does happen (and statistically it will, somewhere, at some point), there's a chance you won't hear the alarm. That's very common - just this morning there was a newspaper story about someone who was saved by the neighbours because they didn't wake up right away, even with all the alarms blaring. What the alarms did do, though, was alert the fire department, as per their setup.
In short - if there's a fire it's much better that you're awake and up already.
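With rates that change throughout the day, "when is it cheapest to run the washing machine?" becomes a small search problem, and the fire-safety point adds a constraint: the whole run should fall inside hours when someone is awake. A minimal brute-force sketch; `cheapest_awake_window`, the prices, and the awake mask are hypothetical, and slots can be hours, half-hours, or minutes as long as both lists use the same granularity.

```python
def cheapest_awake_window(prices: list[float], duration: int, awake: list[bool]) -> int:
    """Return the start slot of the cheapest run of `duration` consecutive slots
    that lies entirely within slots where awake[i] is True.

    prices[i] is the price for slot i; ties go to the earliest start.
    """
    best_start, best_cost = None, float("inf")
    for start in range(len(prices) - duration + 1):
        window = range(start, start + duration)
        if not all(awake[i] for i in window):
            continue  # part of this run would happen while everyone is asleep
        cost = sum(prices[i] for i in window)
        if cost < best_cost:
            best_start, best_cost = start, cost
    if best_start is None:
        raise ValueError("no window of that length fits inside the awake hours")
    return best_start
```

It checks every window, which is O(n × duration); that is plenty fast for a day of hourly or even per-minute prices. A running sliding-window sum would make it O(n) if it ever mattered.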
Octopus Energy in the UK has a tariff that charges half-hourly rates. They also offer an API to interrogate current pricing and usage and encourage their customers - including domestic users - to take advantage of it:
https://developer.octopus.energy/docs/api/
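For comparison, pulling rates from that API needs no scraping at all. A minimal sketch using only the standard library; the endpoint path and the `results` / `valid_from` / `value_inc_vat` field names are based on my reading of the developer docs linked above and should be checked there. Product and tariff codes are left as parameters rather than filled in. As far as I can tell, the public tariff-rate endpoints don't need an API key, while account-specific ones such as consumption do.

```python
import json
import urllib.request

API_BASE = "https://api.octopus.energy/v1"


def unit_rates_url(product_code: str, tariff_code: str) -> str:
    """Build the standard-unit-rates URL (path per my reading of the docs; verify)."""
    return (f"{API_BASE}/products/{product_code}/electricity-tariffs/"
            f"{tariff_code}/standard-unit-rates/")


def parse_unit_rates(payload: dict) -> list[tuple[str, float]]:
    """Return (valid_from, value_inc_vat) pairs from one page of results.

    value_inc_vat is, I believe, pence per kWh including VAT. Only the page
    passed in is parsed; the API appears to paginate larger result sets.
    """
    return [(r["valid_from"], r["value_inc_vat"]) for r in payload["results"]]


def fetch_unit_rates(product_code: str, tariff_code: str) -> list[tuple[str, float]]:
    """Fetch and parse the first page of unit rates for a tariff."""
    url = unit_rates_url(product_code, tariff_code)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_unit_rates(json.load(resp))
```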