- cross-posted to:
- linustechtips@lemmit.online
Reddit user content being sold to AI company in $60M/year deal::It’s being reported that a deal has been struck to allow an unnamed large AI company to use Reddit user…
“We need to close the api in order to protect our users from being used for ai”
I mean, they never claimed it was to protect users. It was to protect their users’ data from being used without paying Reddit. They didn’t like that AI companies were using Reddit content as a free source of training data; they never gave a shit about their users’ privacy.
This is also slightly off. It was primarily to eliminate third party apps from the existing landscape. Reddit want money from users in one of two ways:
1. Use their app and pay with your data via invasive tracking and advertising.
2. Pay for a third party app that pays them for API access.
Due to the extortionate pricing, (2) was only ever hypothetical. In reality there was no sustainable model for this for any third party app, even as a non-profit.
The case around AI does exist, but it was smoke and mirrors for Reddit pulling the same nonsense that Twitter did once they realized they might get away with it, regardless of the short term damage it would do to their public image.
I think the 3rd party apps were a nice bonus, but considering the timing I’m pretty sure the AI boom was the main reason.
I suppose you’re forgetting about Reddit premium…
I mean, yeah, doesn’t everyone?
Well yeah I suppose that’s the case, I was more stating the fact that it’s another way for reddit to suck money from its users. The best part is nothing really changes.
It was more like “We need to close the api in order to protect our profits from the use of your data”
That’s how little they got‽ Holy shit. That’s the steal of the fucking century for all that content. Reddit clearly puts the same stock in its negotiators as it does its 3rd party ecosystem. Anyone who values them more than maybe 2x this price for their IPO is a fucking idiot. Forget Trump’s Art of the Deal. spez needs to write a book.
To be fair, most of the content is written by AIs, so it’s AI training AI
Like human training human, this will end badly
Getting access to the massive backlog of user data over the last 15 years for a mere 60 million. I’m glad reddit shot themselves in the foot. I’d go delete my user data from reddit, but I’m sure they’ll be crawling the backups as well.
Any AI company that buys more than a year is dumb.
Unless they’re leasing the information every year, which would essentially make their ai dependent on the data, but that data is probably the best source to use on the internet. Also, without continuously using the most current comments and posts, the ai model won’t be able to give any info about current events topics and such.
Pay $60m, back it up and scrape new content.
As the many lawsuits (and potential lawsuits) have now shown, it’s still pretty easy to tell what ai models were trained on. It’s the entire reason a company is paying reddit for the data instead of scraping it in various ways (ways that were easier before reddit closed off their api). Maybe in a few years’ time they’ll have it worked out to where there’s no way to pick up on where an ai scraped its data from, but they aren’t there yet.
I appreciate your use of the interrobang
I have a replacement action set up to change a ? and a ! to ‽. I use it at least once a week!
Great‽ ;)
Considering that the data has almost certainly been scraped already, that might have been the best that they could get for it. Or else the companies might just get it from their archives/training sets for free, like they did before.
Putting aside pretty much everything else about this announcement: That’s… shockingly cheap.
Probably because it was harvested long before they locked API. I suspect it’s not a purchase but a way to legitimize the datasets already in the works since Reddit said they are now trading them. And our favorite CEO struggles to turn any profits, so he hardly had any leverage to ask for more.
It’s mostly data that’s publicly available. It’s more of a gamble I think: it’s only worth anything if the government decides you need to pay for the data you use in training.
1m for every IQ point of the average Reddit user
lol dude most of us were over there for years before jumping ship and coming here
Wait
Fuck
Shhh, let’s just pretend the average IQ over there dropped when we left.
Before I deleted my account I removed all posts and changed all my comments to a complaint about the enshittification under way. Two accounts, 13 and 11 years old, it was a lot of work.
This is awesome. Like chucking some frozen prawns under the floor boards when your landlord kicks you out.
Those AI companies should love fediverse then. I mean, all data here is basically open for anyone to grab. Heck, they don’t even need to grab the data, just run their own instance and the federation data will flood in on its own.
Oh, don’t give them ideas please!
This was my thought exactly. Shouldn’t there be a “no_ai.txt” on the servers somehow?
That would be about as effective as robots.txt, unfortunately.
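For anyone wondering why robots.txt gets that reputation: it is only a request, and a crawler has to choose to check it. Here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules, and the `example.social` URL are all made up for illustration and don't describe any real instance or bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler asks first and backs off when the answer is False.
print(is_allowed("ExampleAIBot", "https://example.social/post/1"))  # False
print(is_allowed("SomeOtherBot", "https://example.social/post/1"))  # True
```

The check happens entirely on the crawler's side. A scraper that never calls anything like `is_allowed` just fetches the page anyway, and a hypothetical "no_ai.txt" would probably have the same weakness.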
Won’t be long before reddit is selling 90% AI generated content passing for human generated content!
Feels like they’re already there.
Does this include art OC posted there being used to train art bots? If I were posting OC art I’d just delete that shit right away, not that it’ll help I suppose
Waaaay too late for that
And now those artists can’t sue like others have done. Really hope the products realize this and jump ship
I can see it now, that ai model is going to be really, really fucking angry. lol
Honestly, I can see the appeal of a model going “fuck spez” unprompted once in a while.
Shower thought: what if a large number of people made lots of posts and comments on reddit using only AI generated content?
Considering the spam problem, in a way, it sort of is already happening.
It’s possible that part of the reason for the API changes was to curb that kind of behaviour before people decided to go and do just that, or to stop them using bots to wipe out their profiles.
Honestly, you just need to convince people to go through their comments and break any chains with nonsense. I bet that they are training conversational abilities. (I mean, what other good is the data set? It’s not like redditors are experts, or that the experts, when there are any, get upvoted at all.)
This is going to backfire when the content they are selling is used by AI to make bots to make the content that gets sold to make the AI to make bots to make the content.
This is why it’s so important we don’t legislate against AI and make it illegal to use scraped data. All the data is already owned by someone; putting up walls only screws us out of the open source scene.
And legislate content ownership altogether. The idea that Reddit spent more than a decade growing its community just so that it could use our content as its own property is a huge issue. How do we safely and fairly communicate and express our ideas in society where the platforms that enable this automatically claim ownership of our ideas? Social media are middlemen with outsized influence.
$60 Million or $60,000? Sometimes people use MM for Million and M for ‘Mille’ aka thousand. Other times people use M for Million and k for Thousand. Not a great article if they can’t clarify that.
I’ve never seen Mille used in reference to money. Only in advertising (eg CPM = cost per mille = cost per thousand ad impressions)
But to answer your question, the original Bloomberg article says 60 million.