
I'm okay with paying for datasets


Depends on how the courts rule. If the copyright maximalists prevail, only the wealthiest entities will be able to afford to license a useful data set.

Paradoxically enough, this is the outcome that most "Hacker News" denizens seem to be rooting for.


It's almost as if people believe in fairness and compensating people for their work.

Also, it's worth noting that this is only true as long as we're stuck in the "must train on the entire sum total of human output ever created" local minimum for machine learning. Given that most biological entities learn with much less data, this might well be the thing that prods ML research toward an approach that isn't "IDK, buy a few containers of GPUs and half a DC of storage, see if that makes things better".


> It's almost as if people believe in fairness and compensating people for their work.

Yet in this case we are talking about compensating the compilers/massagers/owners of the datasets, not the original authors from wherever the data was originally scraped.


Copyright is hideously broken, but in theory: the owners only own it because they compensate the authors, which they only do out of an expectation of future profit (on average).

That theory's a fantasy, because extractive systems involving gatekeepers get established, but in this specific case, enforcing copyright would make things fairer for authors. There's no extractive copyright-taking gatekeeper for websites: scrapers don't get copyright, so they can't re-license the material they've scraped (unless it's permissively licensed or something).


I'd still get most of my dataset from torrents, but I could pay for specific things like high-quality source code.



