

The crawlers for LLM are not themselves LLMs.
The article kind of fumbles the wording and creates confusion. There are, however, some passages that indicate to me that the actual data was recovered. All of the following are talking about the NAND flash memory.
The engineers quickly found that all the data was there despite Tesla’s previous claims.
…
Now, the plaintiffs had access to everything.
…
Moore was astonished by all the data found through cloning the Autopilot ECU:
“For an engineer like me, the data out of those computers was a treasure‑trove of how this crash happened.”
…
On top of all the data being so much more helpful, Moore found unallocated space and metadata for ‘snapshot_collision_airbag‑deployment.tar’, including its SHA‑1 checksum and the exact server path.
It seems that maybe the .tar file itself was not recovered, but all the data about the crash was still there.
Forensic analysis managed to retrieve this data, so it must have been stored in non-volatile memory.
I see a few top-level comments agreeing with the sentiment that users are being entitled or abusive, but what are they actually referring to? The linked image certainly has no evidence of such behavior. Someone who claims to be the developer filed a deletion request for the duckstation-git package on the AUR and they say:
Every time, it turns into abuse towards me, as you can also see in the comments for the package.
I read through a few pages of the comments here and they’re mostly people talking about fixing issues with the package, and what to do about the dev purposely breaking the build… I only found a single message that could be called abuse:
@eugene, not really but i suspect it’s an uphill battle, check the commit message: https://github.com/stenzek/duckstation/commit/30df16cc767297c544e1311a3de4d10da30fe00c
FWIW, I’m moving to pcsx-redux, I rather run a little bit less advanced PSX emulator than software by this upstream asshat. Regardless, much thanks for maintaining the AUR package so far.
And even this is not a good example of what stenzek is describing. For one, it’s obviously a reaction to stenzek’s hostile changes and not the sort of user coming for support and being abusive that stenzek is talking about. The user is also explicitly moving to a different emulator and not expecting any change from duckstation.
I remember the maintainer claiming they had permission from all contributors to change the license but I can’t find a link to it now.
This makes no sense. There might be various reasons a person might want/need to be on facebook. Does that mean they waive all right to privacy in every aspect of their life forever?
Sign the petition even if it’s surpassed 1mil signatures by the time you read this! The signatures will be verified after the petition is complete. This could lead to removal of any number of them. We don’t want to barely make it. Let’s go as high as possible!
“Fair use” is the exact opposite of what you’re saying here. It says that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful but use without the creator’s consent is perfectly fine.
Of course they’re not “three laws safe”. They’re black boxes that spit out text. We don’t have enough understanding and control over how they work to force them to comply with the three laws of robotics, and the LLMs themselves do not have the reasoning capability or the consistency to enforce them even if we prompt them to.
They work the exact same way we do.
Two things being difficult to understand does not mean that they are the exact same.
NVMe drives are claiming sequential write speeds of several GBps (capital B as in bytes). The article talks about 10Gbps (lowercase b as in bits), so 1.25GBps. Even with raw storage writes, the NVMe drive might not be the bottleneck in this scenario.
And then there’s the fact that disk writes are buffered in RAM. These motherboards are not available yet so we’re talking about future PC builds. It is safe to say that many of them will be used in systems with 32GB RAM. If you’re idling/doing light activity while waiting for a download to finish you’ll have most of your RAM free and you would be able to get 25-30GB before storage speed becomes a factor.
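Here is a quick back-of-the-envelope sketch of the numbers above. The 3.0 GB/s and 0.5 GB/s disk speeds are made-up illustrative values, not measurements, and treating all 25GB of free RAM as a write buffer is a simplification (an OS typically caps how much unwritten data it keeps in memory).

```python
import math


def gbps_to_gigabytes_per_sec(gigabits_per_sec: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gigabits_per_sec / 8


def seconds_until_buffer_full(buffer_gb: float, link_gb_per_s: float,
                              disk_gb_per_s: float) -> float:
    """Time until a RAM write buffer fills up.

    Data arrives at the link speed and drains at the disk speed. If the disk
    keeps up with the link, the buffer never fills and we return infinity.
    """
    net_fill_rate = link_gb_per_s - disk_gb_per_s
    if net_fill_rate <= 0:
        return math.inf
    return buffer_gb / net_fill_rate


link = gbps_to_gigabytes_per_sec(10)  # 10 Gbps link -> 1.25 GB/s

# Assumed disk speeds, purely for illustration.
fast_nvme = seconds_until_buffer_full(25, link, 3.0)
slow_disk = seconds_until_buffer_full(25, link, 0.5)

print(f"link speed: {link} GB/s")
print(f"fast NVMe, buffer fills after: {fast_nvme} s")
print(f"slow disk, buffer fills after: {slow_disk:.1f} s "
      f"({slow_disk * link:.1f} GB downloaded by then)")
```

With a drive faster than the link the buffer never fills at all, and even with a drive slower than the link you get a sizeable download done before storage speed matters.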
From the article:
Those joining from unsupported platforms will be automatically placed in audio-only mode to protect shared content.
and
“This feature will be available on Teams desktop applications (both Windows and Mac) and Teams mobile applications (both iOS and Android).”
So this is actually worse than just blocking screen capturing. This will break video calls for some setups for no reason at all since all it takes to break this is a phone camera - one of the most common things in the world.
It can’t be both. It’s not self-driving. That’s just what they call it to oversell it. I’m assuming they had to add the “Supervised” part for legal reasons.
Learning what a character looks like is not a copyright violation
And nobody claimed it was. But you’re claiming that this knowledge cannot possibly be used to make a work that infringes on the original. This analogy about whether brains are copyright violations makes no sense and is not equivalent to your initial claim.
Just find the case law where AI training has been ruled a copyright violation.
But that’s not what I claimed is happening. It’s also not the opposite of what you claimed. You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain, but is ruled to not be infringing. Also, this all started by you responding to another user saying the copyright situation “should be fixed”. As in they (and I) don’t agree that the current situation is fair. A current court ruling cannot prove that things should change. That makes no sense.
Honestly, none of your responses have actually supported your initial position. You’re constantly moving to something else that sounds vaguely similar but is neither equivalent to what you said nor a direct response to my objections.
The NYT was just one example. The Mario examples didn’t require any such techniques. Not that it matters. Whether it’s easy or hard to reproduce such an example, it is definitive proof that the information can in fact be encoded in some way inside of the model, contradicting your claim that it is not.
If it was actually storing the images it was being trained on then it would be compressing them to under 1 byte of data.
Storing a copy of the entire dataset is not a prerequisite to reproducing copyright-protected elements of someone’s work. Mario’s likeness itself is a protected work of art even if you don’t exactly reproduce any (let alone every) image that contained him in the training data. The possibility of fitting the entirety of the dataset inside a model is completely irrelevant to the discussion.
This is simply incorrect.
Yet evidence supports it, while you have presented none to support your claims.
When an AI trains on data it isn’t copying the data, the model doesn’t “contain” the training data in any meaningful sense.
And what’s your evidence for this claim? It seems to be false given the times people have tricked LLMs into spitting out verbatim or near-verbatim copies of training data. See this article as one of many examples out there.
People who insist that AI training is violating copyright are advocating for ideas and styles to be covered by copyright.
Again, what’s the evidence for this? Why do you think that of all the observable patterns, the AI will specifically copy “ideas” and “styles” but never copyrighted works of art? The examples from the above article contradict this as well. AIs don’t seem to be able to distinguish between abstract ideas like “plumbers fix pipes” and specific copyright-protected works of art. They’ll happily reproduce either one.
That sounds weird to me. How big is the population of people who are technical enough to even check what certificate provider you are using but ignorant enough to think that Let’s Encrypt is bad because it’s free?
“Gender” means nothing without context. By a MAGA definition of gender this policy doesn’t protect trans people, for example. We don’t know how this rule will be interpreted in practice. Even if you don’t consider the intent behind making this change, this is objectively a weaker guarantee of protection than what we had with “gender identity and expression”.
Law enforcement AI is a terrible idea and it doesn’t matter whether you feed it “false facts” or not. There’s enough bias in law enforcement that the data is essentially always poisoned.
I agree with you that the one liner isn’t a good example, but I do prefer the “left to right” syntax shown in the article. My brain just really likes getting the information in this order: “Iterate over Collection, and for each object do Operation(object)”.
The cost of writing member functions for each class is a valid concern. I’m really interested in the concept of uniform function call syntax for this reason, though I haven’t played around with a language that has it to get a feeling of what its downsides might be.
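To show what I mean by the ordering, here is a minimal Python sketch. `Pipe` is a hypothetical helper I made up for this comment, not something from the article or from any library. It also shows the cost mentioned above: every operation you want to chain has to be written as a method on the class.

```python
class Pipe:
    """Hypothetical wrapper that allows left-to-right chaining over a collection."""

    def __init__(self, items):
        self._items = list(items)

    def filter(self, predicate):
        # Keep only the items for which predicate(item) is true.
        return Pipe(x for x in self._items if predicate(x))

    def map(self, operation):
        # Apply operation(item) to every item.
        return Pipe(operation(x) for x in self._items)

    def to_list(self):
        return list(self._items)


numbers = [1, 2, 3, 4, 5, 6]

# Built-in free functions: you read from the inside out
# (filter first, then map, then list).
inside_out = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

# Method chaining: "take the collection, keep the even ones, square each one".
left_to_right = (
    Pipe(numbers)
    .filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
    .to_list()
)

print(inside_out)
print(left_to_right)
```

Both versions compute the same result. As I understand it, uniform function call syntax would let ordinary free functions be chained like the second version without a wrapper class, but I haven’t tried that in practice.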