A third of new websites are AI-generated. A Stanford study explains what changed.

News

May 6

Researchers from Stanford, Imperial College London, and the Internet Archive sampled the web for 33 months and found that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted. That number was zero before ChatGPT launched in late 2022. Under three years to account for a third of what's newly published on the public web.

The headline number is the easy part. The team also tested six common worries about what AI text was doing to the web. Only two held up.

Their setup: pull archived snapshots month by month from the Wayback Machine, run the text through Pangram v3 (an AI detector that scores writing as human-written or machine-written), and give each hypothesis a measurable signal. The six signals: writing turning sanitized and cheerful, citations dropping out, facts getting hallucinated more often, viewpoints narrowing, the web sliding into one uniform style, and semantic density (roughly, meaning per word) collapsing into word soup.
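For the curious, the month-by-month tally can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the `monthly_ai_share` function, the 0.5 threshold, and the sample scores are all my assumptions, and the scores stand in for whatever a detector like Pangram would return.

```python
from collections import defaultdict

def monthly_ai_share(pages, threshold=0.5):
    """Return {month: share of pages flagged as AI} for scored pages.

    pages: iterable of (month, ai_score) pairs, where month is a
    'YYYY-MM' string and ai_score is a detector score in [0, 1].
    The threshold is an assumed cutoff, not one taken from the paper.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for month, score in pages:
        totals[month] += 1
        if score >= threshold:
            flagged[month] += 1
    # Sort by month so the result reads as a time series.
    return {m: flagged[m] / totals[m] for m in sorted(totals)}

# Made-up scores for illustration only.
sample = [
    ("2022-10", 0.02), ("2022-10", 0.10),
    ("2025-06", 0.91), ("2025-06", 0.15), ("2025-06", 0.08),
]
shares = monthly_ai_share(sample)
for month, share in shares.items():
    print(f"{month}: {share:.0%} flagged")
```

The point of the sketch is only the shape of the measurement: one score per archived page, one share per month, and a trend line across months.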

Two findings survived: the web is getting more positive in tone, and semantic diversity (the range of distinct ideas covered) is shrinking. AI text covers a narrower band of any given topic than human writing did.

The four the data didn't support are the more interesting result. AI text wasn't more likely to contain false claims (the team paid human fact-checkers, which is the gold standard). It wasn't dropping source links. It wasn't producing low-density word soup. The stylistic monoculture worry came back without statistically significant evidence.

What makes this more solid than the average press release is the controls: archived Wayback Machine snapshots, so content couldn't shift between sampling and analysis; a named detector tested against alternatives; and human fact-checkers rather than one AI grading another. The trend lines are large enough that ordinary detection error doesn't change the direction.
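To see why detector error of an ordinary size doesn't move a number like 35% very far, here is a rough worked example using a standard misclassification correction (often called the Rogan-Gladen estimator). The error rates below are hypothetical values I picked for illustration; they are not Pangram's measured rates and do not come from the paper.

```python
def corrected_share(observed, true_positive_rate, false_positive_rate):
    """Estimate the true share of AI pages from the observed flagged share.

    observed = true * TPR + (1 - true) * FPR, solved for true:
        true = (observed - FPR) / (TPR - FPR)
    The result is clamped to the range [0, 1].
    """
    denom = true_positive_rate - false_positive_rate
    if denom <= 0:
        raise ValueError("detector must flag AI text more often than human text")
    estimate = (observed - false_positive_rate) / denom
    return min(1.0, max(0.0, estimate))

# Hypothetical (TPR, FPR) pairs, from a very good detector to a mediocre one.
for tpr, fpr in [(0.99, 0.01), (0.95, 0.02), (0.90, 0.05)]:
    est = corrected_share(0.35, tpr, fpr)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f} -> corrected share {est:.1%}")
```

Under all three assumed error profiles the corrected share stays close to a third, which is the sense in which the direction of the trend survives ordinary detection error.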

Here's what this means.

I use these tools every day. They save me real time. So when I read this paper, the part that landed wasn't the 35% number. It was a finding tucked into the back half: the team surveyed US adults and found most believed AI was harming the web on dimensions the data couldn't confirm. People who don't use AI believed it most. People who use it daily believed it least.

That's awkward, and honest. The loudest worry about AI on the internet (that it's drowning the web in confident lies) is loudest among people with the least direct experience of how AI text actually shows up.

What the data did show is quieter: the web got blander and more agreeable. The thing AI is best at is producing copy that doesn't hurt to read, doesn't claim things it can't back up, and has no edges. That's a real change worth noticing. The warning we were given was about something else.

For a small business owner reading reviews or comparing tools, the practical effect is that the surface signals you used to spot quality (clean grammar, polished About page, confident tone, citations) don't separate the useful from the bland anymore. Everyone's grammar is fine now. Everyone sounds upbeat. The signal got swallowed.

What to do.

Read for friction. A useful page has specific names, specific complaints, opinions someone actually disagreed with, and details only a practitioner would know. Pleasant positivity is the new spam.
Click the citations. AI keeps its sources, the paper says, but doesn't always read them well. Open one and check whether it actually supports the claim above it.
Ask for a real example. When a recommendation has no downside or trade-off in it, treat it as marketing.
For your own site: keep your edges. Writing that says "this didn't work for us" or "we charged $X and it took Y hours" reads as human in a way it didn't have to before. Don't sand it off.

This is one paper, one detector, one slice of the web (newly published sites). "AI-generated" includes drafts a human edited carefully and drafts nobody touched, and the paper can't separate them. I edit AI drafts myself.

But the overall finding holds. The web is changing fast, and the changes worth worrying about are not the ones we were warned about.


Source: Study Finds A Third of New Websites are AI-Generated, 404 Media

The paper: The Impact of AI-Generated Text on the Internet (Dolezal, Alam, Graham, Bohacek; arXiv 2604.26965)

AI, research, internet, web

Joel Folgner
