
As a 501(c)(3) nonprofit that is not dependent on web traffic for revenue, is a decline in traffic necessarily bad?

I always assumed the need for metastatic growth was limited to VC-backed and ad-revenue dependent companies.



They are highly dependent on web traffic for revenue.

And their costs are even increasing: while human viewers are declining, they are getting hugged to death by AI scrapers.


If you look up their latest annual report (https://wikimediafoundation.org/wp-content/uploads/2025/04/W...) you can see that they're allocating ~1.7% of their expenses towards hosting.

I doubt that they're getting "hugged to death" by AI scrapers.


People cite this figure a lot, but it's a little misleading, because when you own your own servers a lot of the expenses that would typically count as hosting fall under a different category.

If you use AWS, the people hired to manage the servers are part of the price tag. When you own your own, you have to hire those people yourself.


But in that case your costs don't go up because of AI scrapers, as you don't need to scale employees with traffic.


I mean, it's not like you can get away with running with zero SREs if you're running in the cloud. The personnel costs for on-prem hosting are vastly exaggerated, especially if you contract out the actual annoying work to a colo.


Smart hands is more expensive than having dedicated datacenter staff, and the dedicated staff do a considerably better job. It's worth noting that WMF runs _very_ lean in terms of its datacenter staff.

You're also ignoring the need for infrastructure/network engineers, software engineers, fundraising engineers, product managers, community managers, managers, HR, legal, finance/accounting, fundraisers, etc.


This is a very eye-opening read on their financials: https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_C...


The easy counters to this article are:

1. I think their spending is a good thing. Charitable scholarships for kids and initiatives toward a more educated populace in general are things that I am happy to donate to.

2. As stated in the article, hosting is still a relatively simple expenditure compared to the rest of their operation. If Wikipedia really eats a huge loss, falling back to just hosting wouldn't be unrealistic, especially since the actual operations of Wikipedia are mostly volunteer-run anyway. In the absolute worst case, their free data exports would let someone build a successor that could be migrated to more or less seamlessly.

The only real argument in my eyes is that their donation campaigns can seem manipulative. I still think it's fine at the end of the day given that Wikipedia is a free service and donating at all is entirely optional.


AFAIK, they don't do any scholarships or really any educational activities. By far their biggest spending item is $105 million for salaries, mainly for its leadership, which is a majority of its expenses.

The second biggest line item is grants at $25 million, primarily for users to travel to meet up.

Then $10 million for legal fees, $7 million for Wikipedia-hosted travel.

I think it's pretty unethical to say you have to donate to keep Wikipedia running when you're practically paying for C-suite raises and politically-aligned contributors' vacations.


In-person meetings move things forward.

Paying the travel for a bunch of highly active volunteer contributors to meet up occasionally and hash out complex community issues pays massive dividends. It keeps the site moving forward. It's also pretty cheap when you consider how much free labour those volunteers provide.

Whenever people criticize Wikimedia finances, I think they miss the forest for the trees. I actually think there is a lot to potentially criticize, but in my opinion everyone goes for the wrong things.


What are the right things to criticize, in your opinion?

Also, asking out of ignorance, what things need to move forward? I thought Wikipedia was a solved problem; the only work I would expect it to need is maintenance, security patches, etc.


> What are the right things to criticize, in your opinion?

I think criticism should be based on looking at what they were trying to accomplish by spending the money, whether it was a worthwhile thing to attempt, and whether the solution was executed effectively.

Just saying they spent $X, X is a big number, so it must be wasteful, without considering the value that money is attempting to purchase, is a bit meaningless.

> Also, asking out of ignorance, what things need to move forward? I thought Wikipedia was a solved problem; the only work I would expect it to need is maintenance, security patches, etc.

I think the person I was responding to was referring to volunteer travel, not staff travel (which of course also happens, but I believe it would be a different budget line item). This would be mostly for people who write the articles, but also for people who do moderation activity. In-person meetings can help resolve intractable disputes, share best practices, figure out complex disagreements, and build relationships. All the same reasons that real companies fly their staff to expensive offsites.

Software is never done; there are always going to be things that come up and things to be improved. Some of them may be worth it, some not.

As an example, there are changes coming to how IP addresses are handled, especially for logged-out users. Nobody is saying exactly why, but I'm 99% sure it's GDPR-compliance related. That is a big project due to some deeply held assumptions, and probably critical.

A more mid-tier example: last year WMF rolled out a (caching) server presence in Brazil. The goal was to reduce latency for South American users. Is that worth it? It was probably a fair bit of money. If WMF were broke it wouldn't be, but given they do have some money, it seems like a reasonable improvement to me. Reasonable minds could probably disagree, of course.

And an example of a stupid project might be WMF's ill-fated attempt at making an AI summarizer. That was a pure waste of money.

I guess my point is, WMF is a pretty big entity. Some of the things they do are good, some are stupid, and I think people should criticize the projects they embark on rather than the big sum of money taken out of context.


Isn't it true that only around 10% of Wikipedia's massive budget is used to actually run the core website? The rest goes to bloated initiatives in the Wikimedia Foundation's orbit.


Page 21 of their 2024 annual report[1] has expenses listed. About $3,000,000 for web hosting, about $100,000,000 for salaries and benefits out of $178,000,000 total.

[1]: https://wikimediafoundation.org/wp-content/uploads/2025/04/W...


A web server is useless if you aren't paying someone to plug it in.

There is probably a lot to criticize, but you need to go deeper than "salaries are bad". You need some of those people to actually run the website.


Even at a quarter million a year that's 400 salaries. How many people are needed to maintain a website?


How many people are needed to maintain Instagram, Facebook, etc.? Wikipedia isn't just a collection of static content. The page contents include static and dynamic content (via a Lua-scriptable set of templates), semantic data (Wikidata), multimedia management (Wikimedia Commons), editing tools (WYSIWYG editing, with full support for the wiki markup and templating), global caching infrastructure, multiple datacenters for HA/DR, etc.

There's also the need to support the staff and volunteer developers, which includes wikimedia cloud services, git hosting, config management and orchestration, CI, community hosted tool/bot services, etc.

WMF has ~600 employees, and that's quite lean, for a service of their complexity.


That includes all of the Wikimedia websites and non-profit activities though, not just the functioning of Wikipedia.org proper. That is a much lower percentage of the total.


> they are getting hugged to death by AI scrapes.

Wikipedia is not getting hugged to death by AI scrapers.

The source letter shows a relatively small portion of traffic was reclassified as bot traffic.

They get a lot of page views globally. It’s a popular website. The bot traffic is not crushing their servers.


Infrastructure has long been a tiny portion of Wikipedia's costs. I think Wikipedia even makes it easy to export all of its data; I don't think AI scrapers would be a significant new cost.


These are two very different cost factors. Scrapers don't use the available data dumps; that's why they scrape. They are also kind of dumb, as they constantly get lost in link structures, which leads to unnecessary traffic spikes.


How is their revenue traffic-dependent?


Their traffic is potential donations

Something tells me a person is way less likely to donate if they're consuming the content through an LLM middleman


I don't know -- as I said in another comment, my Wikipedia usage has gone down 90% thanks to LLMs.

But that means I'm still using it. Especially for more reference stuff like lists of episodes, filmographies, etc. As well as equations, math techniques, etc.

If you're the kind of person who donates to Wikipedia, you're probably still using it some even if less, and continue to recognize its importance. Possibly even more, as a kind of collaboratively-edited authority like Wikipedia only becomes more important as AI "slop" becomes more prevalent across blogs etc.


But were you originally convinced to donate by one of those giant Jimmy Wales banners that come up once a year? People won't see them anymore if they're using AI summaries.


My point is, using Wikipedia 10% as much is still using Wikipedia.

Does it matter if you see the banner 10 times or 100 times in a month?


In terms of the effectiveness of the banner, yes it absolutely does. Multiple exposures increase your propensity to convert.


Bandwidth is a ridiculously tiny portion of the Wikimedia Foundation's spending.


Especially considering a considerable amount of their bandwidth is free, via peering agreements.


Scraping Wikipedia feels like the stupidest possible move. You can in fact download the entire encyclopedia at any time and take all the time in the world parsing it offline.

For such purposes, I'd naively just set up some weekly job to download Wikipedia and then run a "scrape" on that. Even weekly may be overkill; a monthly snapshot may be more than enough.
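
Roughly what I have in mind, as a sketch only (the URL is the standard enwiki pages-articles dump; the filename and scheduling are just placeholders):

    # fetch_wiki_dump.py -- run from cron, e.g. monthly
    import shutil
    import urllib.request

    # "latest" points at the most recent complete dump (~20 GB compressed for enwiki)
    DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
                "enwiki-latest-pages-articles.xml.bz2")

    def fetch(dest="enwiki-latest-pages-articles.xml.bz2"):
        # stream straight to disk so the whole dump is never held in memory
        with urllib.request.urlopen(DUMP_URL) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)

    if __name__ == "__main__":
        fetch()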


You can download twice-monthly database dumps, but they consist of the raw wikitext, so you need to do a bunch of extra work to render templates and stuff. Meanwhile, if you write a generic scraper, it can connect to Wikipedia like it connects to any other website and get the correctly-rendered HTML. People who aren't interested in Wikipedia specifically but want to download pretty much the entire internet unsurprisingly choose the latter option.


As somebody that has wrassled with the Wikipedia dumps a number of times, I don't understand why the WMF doesn't release some sort of SDK that gives you the 'official' parse.


I have wrestled with it too. I believe it's because wikitext is an ad-hoc format that evolved so that the only 100% correct parser/renderer is the MediaWiki implementation. It's like asking for an SDK that correctly parses Perl. Only Perl can do that.

There are a bunch of mostly-compatible third-party parsers in various languages. The best one I've found so far is Sweble, but even it mishandles a small percentage of rare cases.
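
To give a flavour of the gap, here's a rough sketch using mwparserfromhell, a Python member of that third-party-parser family (the export namespace string below varies between dump versions, so treat the constants as illustrative):

    import bz2
    import xml.etree.ElementTree as ET
    import mwparserfromhell  # pip install mwparserfromhell

    # MediaWiki XML export namespace; the version suffix differs between dumps
    NS = "{http://www.mediawiki.org/xml/export-0.11/}"

    def plain_text_pages(dump_path):
        # stream pages out of a pages-articles dump without loading it all into memory
        with bz2.open(dump_path, "rb") as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    title = elem.findtext(NS + "title")
                    wikitext = elem.findtext(NS + "revision/" + NS + "text") or ""
                    parsed = mwparserfromhell.parse(wikitext)
                    # strip_code() removes markup *and* templates -- it does not
                    # expand them, so infobox and template content is simply lost
                    yield title, parsed.strip_code()
                    elem.clear()  # free the element once we're done with it

Even then you're nowhere near what the live site renders, which is exactly why generic scrapers just fetch the HTML instead.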


This. I tried that a few years ago and fell off my chair when I realized how DIY the whole thing is. It's a bunch of unofficial scripts and half-assed, out-of-date help pages.

At the time I thought, well, it's a bunch of hippies with a small budget, who can blame them? Now I learn that there are 600 of them, with a budget in the hundreds of millions??

This is becoming another Mozilla foundation...


There are also dumps of Wikipedia in HTML format.


Do you mean the discontinued Enterprise HTML dumps https://dumps.wikimedia.org/other/enterprise_html/ or the even older discontinued static HTML dumps https://dumps.wikimedia.org/other/static_html_dumps/current/... or is there another set of dumps I'm not aware of?


[flagged]


They also serve images.


Some of them even fairly often: https://news.ycombinator.com/item?id=26072025


[flagged]


> In this test, the framework responds with the simplest of responses: a "Hello, World" message rendered as plain text.

What a ridiculous comparison. You either don't know what you are talking about, or are a troll. Maybe even both.


There are more complex tests in that benchmark. My point was, a decent server can serve tens or hundreds of thousands of requests per second when it's a few kilobytes of static content from an in-memory database.

As long as your web service isn't horribly mismanaged, even on a $5 VPS the bot traffic is infinitesimal background radiation.


Bandwidth costs aren't the dominating cost.


> As Miller puts it, “With fewer visits to Wikipedia, fewer volunteers may grow and enrich the content, and fewer individual donors may support this work.”


Contributors are a tiny % of users. I'm sure they've got some room for improvement on incentivizing new contributors. But Wikipedia is a gift to humanity and I hope we find new ways for them to be paid for their contributions to AI.


> Contributors are a tiny % of users

Most of them were Wikipedia users in some form before they were contributors, I imagine.


1/3 of all donations are from the banner. I just went and looked at their annual report, which disclosed this.


The warning sign is not traffic for ads, although this will result in a drop in donations eventually.

It means that now, people are paying for their AI subscriptions, while they don’t see Wikipedia at all.

The primary source is being intermediated - which is the opposite of what the net was supposed to achieve.

This is the piracy argument, except this time it's not little old ladies doing it, but massive for-profit firms.


> It means that now, people are paying for their AI subscriptions

Most people are not paying a cent. And the people that are, are paying for stuff like coding assistance or classification, not the kind of info you get on Wikipedia.

Looking up Wikipedia-style information on LLMs is not a driving factor in paid subscriptions to ChatGPT etc.


Wikipedia was never a primary source to begin with


Wait, when were little old ladies the perpetrators of piracy?


I believe that comment is referencing this recent news:

> Sony tells SCOTUS that people accused of piracy aren’t “innocent grandmothers”

https://arstechnica.com/tech-policy/2025/10/sony-tells-scotu...


Thank you!


If nobody uses Wikipedia they won't get any donations, and unfortunately they wasted the last two decades blowing literally hundreds of millions of dollars on random community and outreach programs instead of building an endowment in case something exactly like this happened.

No really, it was in the news a few years ago but nothing changed as far as I know.



