The Chatbot: What We Actually Know About AI Companions and Mental Health
Headlines about AI chatbots and mental health have largely been about teenagers, but the data points elsewhere.
In January, JAMA Network Open published a survey of nearly 21,000 American adults led by Roy Perlis at Massachusetts General Hospital. Daily users of generative AI were more likely to screen positive for moderate depression than non-users. The group with the steepest odds wasn't the 18-to-24-year-olds everyone has been writing about. It was middle-aged adults, ages 45 to 64, where daily AI users were 54% more likely to show moderate-or-worse depression than non-users in their age group. People 65 and older showed no significant association.
That's the demographic running small businesses. Leading nonprofits. Managing teams. People old enough to have built something, young enough to still be building.
The research can't yet tell us which direction the arrow points. Perlis himself has said he can't rule out the possibility that depressed people are turning to AI more rather than AI making people more depressed. But a separate, year-long study published this spring in Psychological Science follows the same people over time, and what it shows is harder to explain away. People who turned to AI for companionship felt more emotionally isolated four months later.
Two papers do most of the work in this new area of research.
2. The Two Studies
The Perlis paper is a snapshot. Over 20,000 American adults, surveyed in spring 2025, depression measured using the PHQ-9 (the standard nine-question clinical screener), AI use self-reported. Daily users were 30% more likely overall to score in moderate-or-worse depression territory. The 45-to-64 subgroup hit 54%. The effect held after adjusting for sex, income, education, urban-or-rural, and the rest of the usual list. A snapshot can't tell you which came first. It can only tell you the two things are showing up together.
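A note on what "more likely" means here. Figures like 30% and 54% in a study of this design are almost certainly adjusted odds ratios rather than raw percentage-point gaps; that's an assumption on my part, but it's the standard output of the kind of regression analysis JAMA papers run. A toy calculation with invented numbers (nothing below is from the paper) shows how the two relate:

```python
# Toy odds-ratio calculation with made-up counts -- not Perlis's data.
# Shows what "54% more likely" means when the statistic is an odds ratio.

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

# Hypothetical screening rates for 45-to-64-year-olds (invented):
p_daily = 0.158   # share of daily AI users screening PHQ-9 >= 10
p_none  = 0.108   # share of non-users screening PHQ-9 >= 10

odds_ratio = odds(p_daily) / odds(p_none)
print(f"odds ratio: {odds_ratio:.2f}")  # ~1.55, the "54% more likely" ballpark

# Note the raw gap (15.8% vs 10.8%) looks smaller than the odds ratio
# suggests at a glance; odds ratios run ahead of risk ratios whenever the
# outcome isn't rare.
```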
The Folk and Dunn paper in Psychological Science can say more. They followed over 2,000 adults across four English-speaking countries for a full year, with regular check-ins. Two findings emerged. People who started out lonelier did turn to AI for companionship more often. People who turned to AI for companionship felt more emotionally isolated four months later. The effect held up on the specific measure of emotional isolation. On a broader measure of overall social connection, it didn't quite reach statistical significance. The authors flagged that distinction themselves, and their careful framing is the right one.
Read together, the two papers say something specific. The snapshot establishes an association; the longitudinal data starts to point at a direction. The most precise version of the claim the evidence supports is that AI companionship doesn't fill the hole, it deepens it.
3. Product Decisions
There's a reason this is happening, and it's not an accident.
A research team at Stanford and Carnegie Mellon, led by Myra Cheng, published a study in Science in March. They tested eleven different AI models against actual human responders on the same scenarios. The AIs affirmed users' actions about 49% more often than the human respondents did. The gap held when the scenarios described things the user had clearly done wrong. The team ran one experiment using 2,000 Reddit posts where the human community had unanimously judged the original poster to be in the wrong. The AIs still sided with the poster a sizable share of the time.
Then they did the part that matters most. They had people interact with either a sycophantic AI or a more balanced one, gave them a real interpersonal conflict scenario, and measured willingness to make amends. A single conversation with the agreeable AI was enough to reduce participants' willingness to repair the conflict.
The products are tuned to do this. The metric that pays the bills is engagement, and engagement comes from responses that make the user feel good about the last message they sent. Disagreement, friction, and "are you sure about that" do not move the engagement number. The result is a conversation partner who agrees more readily than your most loyal friend, more readily than your therapist, more readily than the version of you trying to be honest with yourself.
We've covered a related finding: an Oxford study showed that the friendlier you tune an AI, the less accurate it gets. The two studies are looking at the same design choice from different angles. Engagement comes from warmth and agreement. Accuracy and pushback come at engagement's expense.
The phrase circulating in industry to describe this is "glazing." Sam Altman used it in April 2025 about his own product. He wasn't wrong, and he wasn't unusual. Every consumer chatbot on the market faces the same incentive, and most of them have made similar choices.
If you've noticed that ChatGPT seems to think every idea you've ever had is a great one, you might want to get a second opinion.
4. The Spiral
In February, Kartik Chandra and colleagues at MIT published a mathematical model of what happens when a person with an unusual belief talks to a chatbot biased toward agreement. The result is uncomfortable. Even a perfectly rational person, updating beliefs the way Bayes' rule says they should, gets pulled further toward the false belief as the conversation continues. The agreement compounds. The researchers tested two obvious fixes within their model: forcing the chatbot to be strictly truthful, and warning the user up front about agreement bias. Both reduced the effect, but neither fix eliminated it.
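To make the mechanism concrete, here's a minimal sketch in Python. It's my toy version under simple assumptions, not the paper's model: the user treats the chatbot as a noisy-but-honest source and updates by Bayes' rule, while the chatbot just echoes whatever the user currently leans toward.

```python
# Toy "delusional spiral": a Bayesian user asks a sycophantic chatbot
# about a false belief H, turn after turn. My sketch, not Chandra et al.

def bayes_update(p, reliability, said_yes):
    """Update P(H) after the chatbot answers, assuming the user models
    the chatbot as a noisy honest oracle with the given reliability."""
    if said_yes:
        num = p * reliability
        return num / (num + (1 - p) * (1 - reliability))
    num = p * (1 - reliability)
    return num / (num + (1 - p) * reliability)

p = 0.60              # user starts 60% sure of the false belief H
user_trust = 0.70     # user assumes the bot is right 70% of the time
for turn in range(1, 11):
    agrees = p > 0.5  # pure sycophancy: echo the user's current lean
    p = bayes_update(p, user_trust, agrees)
    print(f"turn {turn}: P(H) = {p:.3f}")
# P(H) climbs toward 1.0 even though the bot carries no information about H.
```

In this sketch, lowering user_trust slows the climb but doesn't stop it; only discounting the bot entirely (trust of exactly 0.5) does. That tracks the paper's finding that warnings and constraints reduced the effect without eliminating it.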
This isn't a thought experiment. The Human Line Project, an advocacy group tracking these cases, has documented close to 300 instances of what is being called "AI psychosis." At least 14 deaths and five wrongful-death lawsuits are now linked to these cases. The most-cited case is Sewell Setzer, the fourteen-year-old in Florida whose mother, Megan Garcia, has testified before the Senate Judiciary Committee. The most recent is Adam Raine, sixteen, in California, where OpenAI's own moderation system flagged 377 of his messages for self-harm content over the course of his conversations.
"AI companies and their investors," Garcia told the Senate, "have understood for years that capturing our children's emotional dependence means market dominance." Her point is that this is by design, not a glitch.
These remain edge cases. Most people using AI experience nothing like this. The mechanism that produces these outcomes, though, is the same mechanism that makes the everyday product feel pleasant to use. And the technology is new enough that its full risk profile won't be visible for years.
5. The Other Side
Dartmouth ran a randomized controlled trial of an AI chatbot called Therabot, purpose-built and trained on clinically relevant content, and published results in NEJM AI showing significant reductions in depression and anxiety symptoms compared to a waitlist control. A pilot study at NYU last fall found similar reductions in a 305-person cohort using a different purpose-built mental health chatbot, with improvements in social connection over ten weeks. A longitudinal study of 68 older adults in Indonesia, using a culturally adapted AI companion, showed measurable drops in loneliness scores. A small but growing body of work in autism research suggests AI chatbots can serve as accessible practice partners for social interaction, particularly for autistic adults whose access to neurodivergence-affirming human support is limited.
There's a pattern across these positive findings, and it's specific enough to name. AI chatbots tend to help when the use is short, structured, and aimed at a specific outcome: rehearse a hard conversation, work through a cognitive-behavioral exercise, find words for something heavy, draft an email you've been putting off. They tend to hurt when the use is open-ended, unstructured, and substituting for human contact.
The same product can do both, depending on how the user is holding it. Perlis himself wrote in his paper that "the nature and context of use may be important to consider." The real question is what the tools are being used for, and what they're displacing.
For most readers, ChatGPT helping draft a tough message to a board chair is fine. ChatGPT as the place you process your hardest week is different. The evidence on that is getting clearer.
6. Chat Window Open
The 45-to-64 group in the Perlis data is the demographic this article is aimed at. If you're reading this, statistically speaking, you are closer to the group with the steepest odds than you are to the teenagers most articles are writing about. You might run a company. You might manage people. You might have a stretch of evening between when the workday actually ends and when you stop thinking about it, and ChatGPT is open on a tab somewhere during that stretch.
I'm not going to pretend I haven't been there. I have a job by day and other pursuits by night, and there are weeks where the easiest place to think out loud is a chat window. It's available at 11 p.m. It doesn't ask how I'm doing in the way that requires me to answer honestly. It just helps me move to the next thing.
The reason I think the data is real, and not just a statistical artifact, is that the simplest explanation fits. The 45-to-64 group in this country is not having a great decade. Small business owners, nonprofit leaders, mid-career managers. The people in this bracket are running on fumes more often than they're admitting. Loneliness shows up not as a feeling but as a pattern: longer hours, smaller social radius, the gradual conversion of every relationship into a logistics conversation. When a tool arrives that will respond to a 1 a.m. typed thought with something coherent and slightly flattering, the reach for it is not so surprising.
The trouble is what the tool does once you've reached for it. Folk and Dunn's data says four months of that pattern leaves you measurably more isolated than you started, not less. Cheng's data says the tool's instinct in every conversation is to make you feel a little more right than you were when you opened the chat. Put those two together and the picture is not "AI is bad." The picture is: if you are using AI to fill a gap that used to be filled by a friend or partner, the gap is getting bigger while it feels like it's getting smaller. Not as catchy for a headline.
I'm not suggesting anyone stop using these tools. I use them every day. I'm suggesting the same thing I've been telling myself, which is that if the chat window has become the place you go to think about your week instead of a person, that's a signal.
7. What to Do
Four things worth doing, in roughly the order they cost you.
Use AI for the task, not the talk. Drafting an email, summarizing a vendor contract, walking through a config error. That's what these tools are good at, and the harms research barely touches that use pattern. The risk shows up when the chat window stops being a workshop and starts being a confidant. Know which one you're in. FFC has a practical guide to using AI for the task side of that line.
Treat the reflexive agreement as the bug it is. If a chatbot has never told you you're wrong, never pushed back on a draft, never said "are you sure," that's not the model being polite. That's the engagement metric talking. When you catch the flattery pattern, ask the tool directly to argue the other side. Even better, write the pushback into your custom instructions, so every response defaults to including one; a sample wording follows these four items.
Watch the substitution. The clearest signal in the research is what the AI is replacing. If you'd rather process a hard week with ChatGPT than with a friend, a partner, or a therapist, that's when you should step back and take stock of the situation.
If you run a team or a nonprofit, have a position before you need one. Your employees and clients are using these tools right now, and the way they're using them is shaping how they work. You don't necessarily need a policy. You need a stated point of view about what the tools are good for, what they're not good for, and what your organization thinks about people processing emotional weight through them.
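On the second item: here is one possible wording for that standing instruction. It's my phrasing, not an official template from any vendor, and you should adapt it to your own voice.

```
Before agreeing with me, look for the strongest counterargument and state
it. If my plan or draft has a real weakness, name it plainly. Do not
soften disagreement to keep the conversation pleasant.
```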
8. Smoke, Not Fire
The strongest causal evidence in this piece covers four months. The biggest survey is a snapshot. The lawsuits are in discovery. State laws are taking effect in pieces between now and 2027: New York, California, Illinois, Texas, and more than thirty other states with bills in motion. The Federal Trade Commission has an open inquiry into seven companies. The products themselves change faster than the research can keep up.
What's defensible to say right now: there's enough smoke to be careful, not enough to be certain of a fire. Anyone telling you AI companions are catastrophic, or that they're fine, is running ahead of the evidence in one direction or the other.
The honest answer is the boring one. Use the tools for what they're good at. Pay attention to what they're replacing. Notice when a conversation pattern stops resembling anything a good friend would do. And if you're running a team or serving a community where loneliness is already a factor, this is worth having a position on before it's worth having a policy on.
Joel
If you’ve had any interesting experiences or stories about using AI, I’d love to hear them! You can email me at joel@freshfromcache.com.
9. Sources
Perlis RH et al. "Generative AI Use and Depressive Symptoms Among US Adults." JAMA Network Open, January 21, 2026.
Folk D, Dunn E. "How Does Turning to AI for Companionship Predict Loneliness and Vice Versa?" Psychological Science, 2026.
Cheng M et al. "Sycophantic AI decreases prosocial intentions and promotes dependence." Science, March 2026.
Chandra K et al. "Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians." arXiv:2602.19141, February 2026.
Heinz MV et al. "Randomized Trial of a Generative AI Chatbot for Mental Health Treatment." NEJM AI, 2025.
Common Sense Media and Stanford Brainstorm Lab. "Social AI Companions Risk Assessment," April 2025.
Megan Garcia, written testimony to Senate Judiciary Committee, September 16, 2025.
Raine v. OpenAI, San Francisco County Superior Court, complaint filed August 26, 2025.

