In early 2025, OpenAI and the MIT Media Lab published joint research on how ChatGPT use, heavy use in particular, relates to emotional outcomes. The work attracted a lot of coverage, some of it careful and some of it the usual headline compression, and it has been cited in the AI companion conversation ever since.
The study is worth understanding properly for two reasons. First, it is one of the largest pieces of empirical work to date on conversational AI and emotional well-being, with both observational analysis at population scale and a controlled experiment alongside it. Second, although ChatGPT is a general assistant rather than a companion app, the patterns the research surfaces bear on the questions readers of this site care about. Heavy emotional engagement, differences between voice and text, and a small subset of users with concerning patterns all show up in how AI companions are actually used.
If you only have time for a paragraph: the research found that most ChatGPT users do not show signs of problematic emotional engagement. A small subset of heavy users does. Within that subset, voice-based interaction and emotionally framed conversation correlated with worse self-reported well-being on several measures, including loneliness and emotional dependence. The careful framing in the papers themselves was that these patterns are real, the population at risk is small, and the mechanism is not yet well understood. The headlines compressed all that into “ChatGPT makes you lonely,” which is not what the work actually says.
What the study is
The work was a collaboration between OpenAI’s research team and the MIT Media Lab, with multiple authors associated with the lab’s Fluid Interfaces group. It is structured in two parts.
The first part is an observational analysis of usage data and self-reported survey responses from a large population of ChatGPT users. The aim is to characterize how people actually use the assistant emotionally: how often they ask it for emotional support, how often they have voice conversations, how often they describe the assistant in relational terms, and how those patterns relate to self-reported loneliness, life satisfaction, and emotional dependence.
The second part is a randomized controlled trial run over several weeks with around a thousand participants, randomly assigned to different interaction modes (text, neutral voice, engaging voice) and different conversation types (personal, non-personal, open). The RCT measures changes in the same well-being outcomes over the duration of the trial.
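For readers who want to see what "randomly assigned" means in a design like this, here is a minimal Python sketch of balanced random assignment across the three interaction modes and three conversation types described above. It is an illustration under stated assumptions, not the study's actual procedure: the function name `assign_conditions`, the seed, and the cohort size of 900 are all hypothetical.

```python
import random
from collections import Counter
from itertools import product

# Condition labels as described in the text above.
MODES = ["text", "neutral voice", "engaging voice"]
TOPICS = ["personal", "non-personal", "open"]


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to the 3 x 3 grid of conditions.

    Illustrative balanced randomization only; the study's actual
    procedure may differ.
    """
    cells = list(product(MODES, TOPICS))  # 9 mode/topic combinations
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal shuffled participants round-robin so cell sizes differ by at most 1.
    return {pid: cells[i % len(cells)] for i, pid in enumerate(shuffled)}


# Hypothetical cohort size, chosen only so it divides evenly by 9.
assignment = assign_conditions(range(900))
cell_sizes = Counter(assignment.values())
```

In this toy setup each of the nine cells receives 100 participants. Because chance rather than personal preference decides who lands in which cell, differences in outcomes between cells can be attributed to the condition itself, which is what the observational data cannot offer.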
We are deliberately conservative on the most specific numbers (exact n’s, exact effect sizes, exact percentages). The papers and OpenAI’s research write-up are the primary sources to consult for those, and overconfident summary is exactly the failure mode this site exists to avoid. Read the originals when stakes warrant it.
What the study found
Three findings worth holding in mind.
Most users are fine. The dominant pattern across the population is that most people use ChatGPT for non-emotional tasks (work, coding, writing, search-style questions) and report no measurable change in well-being outcomes over the study window. The “power user” framing that is sometimes implied by coverage applies to a small tail of the distribution, not the typical user.
A small subset of heavy users shows concerning patterns. Within the tail of the distribution (the heaviest emotional users, often using voice modes, often discussing personal topics), the research observed correlations with higher loneliness, more emotional dependence, and lower socialization with other people. Whether the use is causing the loneliness or the loneliness is driving the use is exactly the kind of question observational data alone cannot resolve, which is why the team paired it with the RCT.
Voice and emotional framing matter, in measurable ways. The RCT found that well-being outcomes varied with both interaction mode and conversation type. Voice-based conversation, especially around personal topics, looked different in the data from text-based or neutral-task conversation. Some of those differences cut in the direction of more dependence and worse well-being for some users; some looked benign or beneficial. The big-picture takeaway, in the team’s framing, is that the medium and the content matter and are worth designing around.
What the study does not say
A few things worth being clear about.
It does not say ChatGPT, or AI in general, makes people lonely. The population-level finding is that most users show no measurable change. The concerning patterns concentrate in a small subset. Translating that into a universal claim about the technology overstates what the work supports.
It does not establish strong causation in either direction. The observational portion is correlational. The RCT helps with causation but is short (weeks, not years) and is testing ChatGPT specifically, not all AI companions. Whether long-run companion use changes loneliness in either direction is still an open question across the literature.
It does not generalize directly to AI companion apps. ChatGPT in 2025 is a general assistant with optional voice and personality. Replika, Kindroid, Nomi, and other dedicated companion apps are designed to maximize relational engagement. We should expect the patterns the study identifies to be louder in companion apps, not quieter, but the study does not measure those apps and the magnitudes are unknown.
It does not address clinical populations. The research is on general-population users. People with depression, anxiety, suicidal ideation, and other clinical concerns may experience the same products differently. The study is not a clinical trial and does not speak to clinical outcomes.
It does not endorse or condemn any specific design choice. The research surfaces patterns. It does not prescribe what OpenAI, Anthropic, or any companion-app maker should do with that information.
Why the findings translate to companion apps
Three reasons the work matters here even though it is not a study of companion apps.
First, the patterns of use the study identifies (heavy use, voice mode, emotional framing, talking about personal topics) are the default mode of interaction for companion apps, not an edge case. If a small subset of ChatGPT users who lean into those patterns shows concerning outcomes, the same patterns dialed up across the full population of a companion app are worth taking seriously.
Second, the methods are a useful reference point. The combination of large-scale observational analysis and a controlled experiment is the design the field has been calling for. Future companion-specific research that uses similar approaches can be compared against this baseline.
Third, the policy conversation is going to use this study. SB 243 in California, the Garcia v. Character Technologies lawsuit, and the EU AI Act all engage with the question of harm from chatbot use. The OpenAI and MIT work is part of the empirical record those policy debates will lean on, and companion app users and operators should know what it actually says rather than what the headlines say it says.
What this means for users
If you are a user of an AI companion app, the study supports a few practical readings.
The case that most casual use is benign is broadly consistent with the data. The case that heavy use, voice mode, and emotionally framed conversation can have downsides for some users is also consistent with the data. The case that any specific user can predict in advance which group they will fall into is much weaker; people are not always good at noticing when their use is starting to displace other things in their life.
If you are using a companion app heavily, especially in voice mode and especially for emotionally personal conversation, it is worth checking in periodically on whether your other relationships and social functioning are getting better or worse over time. The Media Lab’s broader framing (how you use the app matters more than whether you use it) is a useful lens here.
We covered the practical implications in AI Companions for Loneliness and the broader research backdrop in AI Companions and Mental Health.
The broader research context
This study sits alongside several related lines of work.
The Stanford Replika study (Maples et al.) is a survey-based snapshot of Replika users, finding meaningful effects in a college-student population. The MIT Media Lab’s broader companion chatbots project is the lab’s ongoing program, of which this OpenAI collaboration is one output. De Freitas and colleagues at Harvard have published industry-skeptical work documenting specific harm patterns, complementing the MIT work’s more measured tone with a sharper critique. Skjuve and colleagues in Norway have done extensive qualitative interview research on Replika user experiences.
Across these literatures the picture is consistent: the technology is doing real good for some users and real harm to others, and the careful answers depend on use pattern, design choices, and individual context.
Where to read it
OpenAI and the MIT Media Lab both published research write-ups of this work, and the underlying papers are available through standard academic channels. We strongly recommend reading the original papers for any consequential use of the findings; this summary, like every summary, loses nuance.
FAQ
Is this study peer-reviewed?
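The collaboration produced both a public-facing research write-up from OpenAI and accompanying research papers. As we understand it, the papers were first released as preprints rather than as peer-reviewed publications, and that status may have changed since. Check the primary sources for the specific publication status of each component.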
Does this mean I should stop using ChatGPT or my companion app?
No. The study finds that most users are fine and that a subset of heavy users shows concerning patterns. The careful read is to be honest with yourself about your own pattern of use and check in on whether your broader social functioning is improving or eroding.
Does this study say AI is bad for mental health?
No. It documents heterogeneous outcomes: no measurable change for most users, benign or beneficial effects for some, and concerning patterns for some heavy users in specific patterns of use. Treating it as a blanket indictment of the technology overstates what the work supports.
Is this evidence companion apps are dangerous?
The study is on ChatGPT, not companion apps. The findings are suggestive about companion apps because the patterns of use that correlated with worse outcomes (heavy emotional engagement, voice mode, personal topics) are the default mode for companion apps. Suggestive is not proof. Companion-specific work that uses similar methods is needed.
What is the “power user” framing about?
“Power user” refers to the small subset of heavy users in whom the patterns of concern cluster, not to the typical user. Coverage that implies the average ChatGPT or companion-app user is at risk overstates what the data supports.
Related reading
AI Companions and Mental Health for the broader research backdrop.
The Stanford Replika Study for the most-cited single piece of companion-specific research.
The MIT Media Lab Companion Chatbots Project for the broader research direction this work sits within.
AI Companions for Loneliness for the practical implications.
If you are a researcher in this area and we got something wrong, please write to us via the contact form. Corrections are made quickly; reviews are not.