Earlier this year, AI-generated ads started circulating online featuring real creators promoting products they had never agreed to endorse. Their likenesses had been pulled from content they posted publicly, fed into AI systems, and turned into ads without anyone contacting them first. Some found out through their own followers. Others stumbled across the videos themselves.
What made those cases revealing was not just what the systems produced. It was how they worked. The people involved were not victims of a hack or a data breach. Their content was publicly available, and the systems that used it were not bypassing any kind of permission step. There was no permission step. That is how most AI systems handle data by default, and it points to a gap between what people think they have agreed to and what these systems are actually built to check. The difference comes down to two words that sound similar but mean very different things.
What Is the Difference Between Access and Permission in AI?
Access and permission are two different things, but AI systems treat them as one. Access means data can be retrieved. A public post, a profile photo, a song on a streaming platform, an article someone published online. All of these are accessible to anyone with the right tools. Permission means the person who created or shared that content agreed that this specific use is acceptable. That agreement has to come from somewhere. It does not automatically follow from the fact that something is visible.
Most people assume some form of agreement happened when a company uses their data. Maybe they accepted a platform's terms of service, or left a privacy setting unchanged. What they often do not realize is that clicking through a platform's terms of service is not the same as giving an AI company permission to train on everything they have ever posted there. Consent was given to one thing. The data ends up used for something else entirely. The same dynamic explains why people who contribute data for pay often discover that the terms extend far beyond what they understood when they signed up.
In April 2026, Microsoft updated GitHub’s privacy policy so that developers’ code would be used for AI training by default. Previously, explicit consent was required before any of that could happen. The update flipped that to opt-out, with no advance notice. Developers pushed back, but what the episode actually showed was that the original opt-in policy had always acknowledged something real. Access and permission are two different questions. The new policy simply stopped treating them that way. To see why that matters, it helps to follow what actually happens to data once it enters an AI system.
How Do AI Models Use Data Without Explicit Consent?
Most people never see what happens to their data after it leaves a platform. Following it through three stages shows why permission is so hard to recover once the process has started.
1. Ingestion
Data is gathered from accessible sources, things like websites, social platforms, public repositories, and open databases. At this stage, the only questions being asked are technical ones. Can it be retrieved? Is it in a usable format? Who created it and whether they authorized this use are not part of the filter. Content moves through based on whether it is reachable, not whether anyone said it could be used.
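The ingestion filter described above can be sketched as a function whose conditions are purely technical. This is a hypothetical illustration, not any real crawler's code; what matters is the check that is absent:

```python
def should_ingest(item: dict) -> bool:
    """An ingestion filter asks only technical questions:
    is the content reachable, and is it in a usable format?"""
    reachable = item.get("status_code") == 200
    usable = item.get("content_type") in ("text/html", "text/plain", "application/json")
    # Note what is missing: no check of who created the content,
    # and no check of whether they authorized this use.
    return reachable and usable

page = {
    "status_code": 200,
    "content_type": "text/html",
    "author": "someone who was never asked",
}
print(should_ingest(page))  # True: reachability and format are the only gates
```

The `author` field is carried along but never consulted, which is the point: nothing in the filter's logic distinguishes content that was authorized for this use from content that merely happened to be reachable.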
2. Training
Once collected, data is combined with millions of other inputs and used to adjust how the model behaves. This is where things become effectively irreversible. The model does not store recordings or images as files that could later be found and deleted. Everything gets processed into numerical patterns, something closer to habits the model has absorbed than content it is holding onto, distributed across the system in ways that cannot be traced back to any specific person. By the end of training, there is no record of what came from whom.
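Why this is irreversible can be shown with a toy example: many inputs are folded into one set of shared numbers, and nothing in the result records which input contributed what. This is a deliberately simplified sketch, nothing like real training code:

```python
# Toy "training": each input nudges a single shared weight vector.
# After the loop, only the blended numbers remain; the per-source
# contributions cannot be recovered from the result.

inputs = {
    "alice_post": [0.9, 0.1],
    "bob_photo_features": [0.2, 0.8],
    "carol_article": [0.5, 0.5],
}

weights = [0.0, 0.0]
for source, vec in inputs.items():
    for i, value in enumerate(vec):
        weights[i] += value / len(inputs)  # blended in; the source label is discarded

print(weights)  # the model keeps only these averaged numbers
# Many different sets of inputs would produce the same weights,
# so there is no path from the result back to any one contributor.
```

Deleting "alice_post" after the fact is not a matter of finding her file and removing it; her contribution exists only as a fraction of every number in `weights`.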
3. Generation
When the model produces output, it draws on everything it absorbed during training. There is no log of which inputs shaped which outputs, and nothing checks whether the data behind any given output was used with permission. The connection to the original source is gone, and there is no technical path to restore it.
The person whose content was used has no way of knowing any of this happened. There is no notification, no record to request, no paper trail to follow. And even for those who do try to give consent upfront, that agreement tends not to survive contact with how these systems actually work.
Why Is Consent Difficult to Apply to AI Systems?
Consent works when it covers something specific. You agree to a defined use, at a defined time, and you can change your mind if things shift. AI systems do not hold still long enough for that to work. A model trained today will keep generating outputs for years, feeding into applications that did not exist when the data was collected. Someone who posts fitness videos online might find their movement patterns absorbed into a training dataset used to build a motion recognition system they have never interacted with. The content was shared in one context. It ends up working in several others. This is part of what makes personality theft so difficult to detect or reverse once it has happened.
Public availability gets used to sidestep this problem entirely. If something is out there, the assumption goes, it is fair to use. But making something visible and agreeing to every use that follows are not the same decision. California’s Training Data Transparency Act, which took effect in January 2026, makes exactly this distinction. It requires developers to disclose what data they trained on because legislators concluded that public availability is not the same as informed contribution.
And even when some form of authorization does exist, it rarely survives what happens next. Once data is absorbed into a training dataset alongside millions of other inputs, any conditions attached to it become invisible. The context it carried (who created it, what it was for, where it was meant to go) does not travel with it. What gets encoded is a pattern. Not a record of where it came from or whether anyone said it could be used this way.
What Needs to Change About Permission in AI Systems?
Permission has never been built into how these systems work. It gets assumed, inferred, or justified after the fact. A genuine fix would look different in three specific ways.
1. The right to know
Right now, a creator has no way to find out whether their work was used in training. No notification arrives, no record exists to request, no paper trail to follow. The TRAIN Act would create that basic right, and the push to legislate it reflects how far outside the original design permission has always sat.
2. Permission that travels with data
When a recording gets absorbed into a model, fine-tuned into a product, and licensed to another company, the terms that originally applied to it should travel with it. Right now they do not. The data moves. The agreement stays behind. Building systems where authorization stays attached to data as it moves through pipelines is a harder problem than writing better terms of service, but it is the one that actually matters.
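One way to picture permission that travels is a wrapper that keeps the original terms attached through every hand-off, so each stage has to check them before acting. A hypothetical sketch of the idea, not an existing standard:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ConsentedData:
    """Data bundled with the terms its creator agreed to.
    The terms travel with the data through every hand-off."""
    content: str
    creator: str
    authorized_uses: frozenset = field(default_factory=frozenset)

    def require(self, use: str) -> str:
        """Release the content only for a use the creator authorized."""
        if use not in self.authorized_uses:
            raise PermissionError(f"{self.creator} never authorized {use!r}")
        return self.content

recording = ConsentedData(
    content="<audio bytes>",
    creator="a musician",
    authorized_uses=frozenset({"streaming"}),
)

recording.require("streaming")  # fine: this use was agreed to
try:
    recording.require("ai_training")
except PermissionError as e:
    print(e)  # the terms traveled with the data, so the refusal is enforceable
```

The design choice that matters is that the terms are part of the data object itself, not a clause in a contract stored somewhere else; when the object is fine-tuned into a product or licensed onward, the check comes with it.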
3. Meaningful control, not just disclosure
Telling someone their data was used is a start. Giving them real say over how it keeps being used is the harder goal. Some states have moved toward right of publicity laws that give individuals direct say over commercial uses of their identity. A number of industries have started building frameworks where permission is negotiated before use rather than assumed from availability, something explored in more depth in Identity Is Becoming Licensed Infrastructure. These efforts are early and uneven, but they all point toward the same thing. A version of AI development where a system has to establish authorization before it acts, not inherit it from the fact that data happened to be reachable.
FAQs

Do AI systems check for permission before using data?

No. Most systems collect data based on what is accessible, not what has been authorized. Permission is not a built-in check at any stage of the pipeline.

Does posting something publicly count as consent to AI training?

Not automatically. Making something publicly visible is not the same as agreeing to every downstream use of it, including AI training.

What does permission mean in the context of AI training?

It means the person whose data is being used agreed to that specific use. In practice, most training pipelines have no mechanism to verify or record that agreement.

Why is consent hard to apply to AI systems?

Consent is designed for a specific, defined use. AI systems continuously reuse data across products and applications that often did not exist when the data was collected.