What makes you think Discord isn’t also selling all its data to AI companies?
Let them scrape. AI, as it currently is, is still autocomplete with extra steps, and still prone to hallucination. As it is, it will be usable to make cheap, passable content, but it won’t hit those moments of inspiration of human art (yet – there are real AI groups looking to make AGI).
It is a bubble which will pop and AI will be seen as a tool (a resource-costly tool) that requires its own set of experts independent from the experts that use ACAD or write editorial copy or do investigative work. Id est, it’s not the replacement of employees that boards of directors want it to be.
And AGI is centuries from being efficient enough that you can make Rosie the Robot who cleans your house and makes a good upside-down pineapple cake.
What prevents Discord from selling the chat logs to AI companies?
Discord’s complete lack of indexing. Although it’s definitely not impossible to scrape data from Discord, it would take more resources than, say, Reddit.
If an AI company pays Discord they won’t scrape but get the data directly.
But they index everything. Just request your data and you’ll get a neat package of all your messages with timestamps and all.
They store your data, they don’t correlate your data.
So what? You can still sell it to AI companies without assigning a user to each message. They don’t care about who wrote it when stealing the content.
Self hosting is the way.
Doesn’t really solve the AI scraping or the silo problem, and as Codeberg found out recently, solving the AI scraping DDoS is never-ending.
Anubis?
Codeberg was running Anubis. Apparently several bots have started just solving Anubis and scraping away again.
I have never understood why people moved stuff to the closed Discord server system…
Because UI/UX beats any abstract things like privacy & data ownership.
Go for the voicechats, stay for the poorly organized forum megathread experience
To build up a close (the two definitions of “close”) community. To speak freely (even if you have to respect the TnCs of Discord + community guidelines.)
Is that what we’re up against? I thought every time I voice my mind on forums it gets upvoted or downvoted or ignored, but always ultimately ignored 🤷🏼♀️.
resists overwhelming urge to ignore
You forgot about influencers who will read your knowledge and present it as theirs in their videos.
Abolish intellectual property
It’s been shit since COVID. Everyone constantly online, and really ramping up the stupid as fuck culture wars. “Back in my day” I could log into a chat room, have some fun conversations, and then log off without getting pissed off or pissed on. I could look at movie news, and not be swamped by performative hate or praise for whatever fucking movie is or isn’t “woke”.
Everywhere you go, you see “Be civil” or “Be respectful”. But all that really means is, don’t question our echo chamber. And if you do, well, turns out not being civil towards you doesn’t count.
Left and right doesn’t matter. It’s all hate and performative praise as far as the eye can see.
The forum I call home tolerates a lot of hate speech.
I think I’m out, but it’s less about the AI scraping and more about moderation.
I don’t even oppose hate speech at this point as long as it’s directed towards people who believe in the Project 2025 agenda instead of the other way around (which it almost always is) 🤷 we need a Kiwi Farms but for targeting delusional conservatives. The enemy got to where they are today partly due to mass internet trolling, and letting them trample the internet unopposed leaves weak-minded normies to adopt and fight for their views. “Being nice” about it ain’t getting anybody anywhere and it’s time for these pieces of shit to actually experience bullying for themselves.
Too bad no such communities exist on the internet.
deleted by creator
And that’s why ketchup makes an excellent fuel additive.
Yeah. The vinegar is rich in hydrocarbons, which improve the fuel/air ratio during combustion whilst also keeping the engine smelling nice.
Porque no los dos?
Discord is targeting an IPO by end of year. I doubt the AI bubble bursts by then.
Anyone wanna bet against their valuation being based on AI training data value?
IDGAF about LLM bots scraping public forums; they are public and available to anyone. I do mind them scraping shadow libraries and training on copyrighted material, which they should not do.
LLM bots are scraping so much that it increases the cost of maintaining forums, and sometimes they even DDoS them, Codeberg for example.
Public and copyrighted are not mutually exclusive.
This discussion is a creative work and the copyright is collectively owned by the text contributors.
Please reach out to the authors individually for a license before using it to train your AI sex bot.
I hereby and in perpetuity grant an exclusive, non-geographically-limited license to my comments to F.I.S.T.O. and only F.I.S.T.O.
not the makers of F.I.S.T.O., let’s be clear
(IANAL) Wouldn’t this count as fair use since the AI sex bot is only using snippets?
That’s currently being argued in the courts. There’s a lot that goes into it, from the right of distribution to proving that the AI bot can reproduce a work even though it normally doesn’t. [A very real example of reproducibility](https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/)
There are also arguments about how they accessed large amounts of content. The law doesn’t just recognize whether you can access something or not, but what you access it for. There are laws about accessing things with the sole purpose of using them to develop a commercial product. All of it is a tangled mess with no clear answer right now (legally, that is; morally I think there is one, but that’s very opinionated).
Oh my dude, that second ship sailed decades ago.
Time was there was really just one place (maybe two) where you could find an answer to a question. (Usenet mostly.)
Now there’s easily two dozen at least, from SO/SE, Quora, Yahoo Answers, all the way to Reddit subs…
The balkanization of information. It screwed the knowledge of the public, but it made a few people super rich. Whee
Yeah, but those places were still publicly indexed; Discord is private.
If I’m going to share my information and knowledge publicly on an Internet site, I’d like everyone to have fair and open access to it, not at the whims of a multinational corp to gatekeep for me. So the fact that AI can access it too doesn’t discourage me.
You have information from me because I choose to share it, not because a site has demanded I give it up without a clear benefit to me in return.
I think there are a lot of solid arguments against letting AI steal everything, but with the scraping there’s an even more immediate problem. They don’t rate limit or do it in an intelligent way. It becomes a full-blown DDoS that has taken down entire sites and slowed many more to the point of near uselessness.
They’re in a very literal sense crashing large chunks of the Internet and causing havoc which costs very real money to fix, either by upping server resources or installing AI scraping mitigation so that everyone still has access to the free information you mention.
That is definitely a problem that needs to be dealt with, since AI scrapers hogging bandwidth or making sites inaccessible means it is hampering equal access to everyone. Ignoring conventions and not rate limiting itself are harmful to the open internet.
So yes, those kinds of AI scraping behaviours should be mitigated, but on the principle of AI ingesting my public data, I’m not against it, if it can access it reasonably and fairly like anyone else.
My problem with it is that in Ye Olde Times before 2022, if you needed some info on, I dunno, amethyst cutting blades, you joined the crystal geode cutting forum and maybe became a contributing member of the group.
Now, you ask ChatGPT, and contribute nothing.