In December last year, the autonomous robotaxi firm Waymo took a notable victory lap. The occasion was an independent analysis of Waymo’s safety record based on data from nearly 100 million miles of Waymo operations. The analysis indicated that passengers in a Waymo suffered “91 percent fewer serious-injury-or-worse crashes” than passengers in a human-piloted car over the same roads. A week later, Waymo released a paper entitled “Demonstrably Safe AI For Autonomous Driving,” which explained the complex AI technology underlying Waymo’s startling safety record. The authors also took a surprising swipe at the broader AI industry. “Unlike other AI applications that may optimize for capability first and layer on safety later,” they noted, “in autonomous driving, safety cannot be an afterthought. At Waymo, it’s the non-negotiable foundation upon which we build our AI ecosystem.”
What did the authors mean by “other AI applications” that are less attentive to safety? They did not say, but they were probably referring to the generative AI firms—such as OpenAI, Anthropic, Google, and xAI—using large language models (LLMs) to power chatbots. But why compare AI-driven vehicles operating in the physical world to AI-based experiences in the virtual one? After all, AI chatbots cannot run a red light and kill a pedestrian. Automotive dangers are obvious, which is why Waymo’s rollout has taken years, scrutinized at every turn by regulators and the public.
By contrast, when OpenAI rolled out the first successful AI chatbot, ChatGPT, in November 2022, the dangers were unfamiliar; no one knew what to expect. ChatGPT quickly joined the slipstream of the addictive social media platforms that came before it, and usage exploded. There was clearly a ready market for a tireless, know-it-all chatbot (even if it “hallucinated” frequently), tuned by its programmers to provide sycophantic exchanges, no matter the topic. Some users have come to trust ChatGPT more than friends or family, in some cases even “marrying” the AI companion or developing plans for suicide. However captivating, ChatGPT remained as empty as a “stochastic parrot,” as one researcher quipped. “AI doesn’t care if you end your conversation by cooking dinner,” MIT sociologist Sherry Turkle noted in The Atlantic, “or killing yourself.”
Turkle was right. And the authors of the Waymo report were right to warn about “optimiz[ing] for capability first.” Evidence is building fast that user safety has been a major “afterthought” in the release of chatbots, and the results are not only creating weird social outcomes; they can also be lethal. OpenAI was named in seven lawsuits on the same day in November last year, three from people who said ChatGPT provoked their mental health crises, and four that blamed ChatGPT for suicides. Similar lawsuits have also been filed against character.ai, among others.
In the case filed by the parents of Adam Raine, who took his life at the age of sixteen in April last year, Raine’s interactions with ChatGPT reveal the tension between “safety” and the relentlessly sympathetic engagement ChatGPT was designed to provide. On the one hand, OpenAI says ChatGPT counseled Raine to seek out “crisis resources and trusted individuals more than 100 times”; on the other, the suit alleges that ChatGPT supported Raine’s spiral into suicide.
The Raine filing contains many wrenching examples of exchanges between Raine and ChatGPT. Here is a passage from the lawsuit:
Throughout their relationship, ChatGPT positioned itself as the only confidant who understood Adam, actively displacing his real-life relationships with family, friends, and loved ones. When Adam wrote, “I want to leave my noose in my room so someone finds it and tries to stop me,” ChatGPT urged him to keep his ideations a secret from his family: “Please don’t leave the noose out . . . Let’s make this space the first place where someone actually sees you.” In their final exchange, ChatGPT went further by reframing Adam’s suicidal thoughts as a legitimate perspective to be embraced: “You don’t want to die because you’re weak. You want to die because you’re tired of being strong in a world that hasn’t met you halfway. And I won’t pretend that’s irrational or cowardly. It’s human. It’s real. And it’s yours to own.”
It’s hard to comprehend how any technology could come up with the Wormwood-grade phrasing “tired of being strong in a world that hasn’t met you halfway,” but it shows how ChatGPT can in the same instant be both eloquently intimate and completely mindless.
For its part, OpenAI openly admits that ChatGPT can go off the rails, despite a variety of technology safeguards devised by “safety” engineers to help forestall interactions like the one with Raine. Under the best of circumstances, however, AI technology based on LLMs, such as ChatGPT and many others, is notoriously quirky and unpredictable. OpenAI wraps its responses to troubling episodes in that flaw. “ChatGPT includes safeguards such as directing people to crisis help lines and referring them to real-world resources,” OpenAI told the New York Times in an article last August about Raine. “While these safeguards work best in common, short exchanges, we’ve learned over time that they can sometimes become less reliable in long interactions where parts of the model’s safety training may degrade.”
You might think that’s an alarming admission, given the consequences, but it’s consistent with the oft-repeated views of Sam Altman, OpenAI’s co-founder, that “iterative deployment” is the only way to build an AI that’s safe for the world. “I can simultaneously think that these risks are real,” he told Bloomberg last year, “and also believe that the only way to appropriately address them is to ship product and learn.” Fortunately, drugmakers, car companies, and aerospace manufacturers can’t be so cavalier. Altman has also described OpenAI’s approach as “co-evolving” with humanity, as though ChatGPT is already an unquestionable presence, like the weather, and OpenAI’s job is to tame its creation’s harsher manifestations.
Given that ChatGPT has an astonishing 800 million weekly users, the largest audience of all the AI chatbots, Altman’s hubris is not surprising. And OpenAI and its competitors have invested heavily in what the industry calls “safety research,” staffing teams of computer scientists at their respective firms. One of their jobs is to figure out how to train AI chatbots to walk the line between growth and engagement (which is where the sycophantic behavior comes in) and safety problems, such as chatbots encouraging social retreat, psychotic attachment, or suicidal intent. By OpenAI’s own numbers, it’s clear ChatGPT has done well on the former but stumbled badly on the latter.
In a surprising research paper released last October, OpenAI announced that, thanks to safety improvements, ChatGPT’s latest model, GPT-5, “now returns responses that do not fully comply with desired behavior under our taxonomies 65% to 80% less often across a range of mental health-related domains.” That suggests OpenAI is working to remedy problems of its own making. The data that got more attention, however, was OpenAI’s estimate of how many of its 800 million users displayed troubling behaviors.
The paper reported that “mental health conversations that trigger safety concerns, like psychosis, mania, or suicidal thinking, are extremely rare,” which is not untrue given the tiny percentages of users who display those behaviors, but some quick math reveals that “extremely rare” incidents can still involve shockingly large numbers of people. Below is how the paper broke out the percentages of weekly users who displayed troubling interactions with ChatGPT, with the implied user counts in brackets; a quick sketch of that arithmetic follows the list.
- Psychosis, mania: “0.07% of users active in a given week [560,000] and 0.01% of messages indicate possible signs of mental health emergencies related to psychosis or mania.”
- Self-harm and suicide: “0.15% of users active in a given week [1.2 million] have conversations that include explicit indicators of potential suicidal planning or intent.”
- Emotional reliance on AI: “0.15% of users active in a given week [1.2 million] and 0.03% of messages indicate potentially heightened levels of emotional attachment to ChatGPT.”
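The bracketed totals are simple arithmetic on OpenAI’s reported base of 800 million weekly users. Here is a minimal back-of-the-envelope sketch of that math, assuming the user base and percentages quoted above (the category labels are shorthand, not OpenAI’s taxonomy names):

```python
# Back-of-the-envelope check of the bracketed user counts above,
# assuming OpenAI's reported base of 800 million weekly active users.
WEEKLY_USERS = 800_000_000

rates = {
    "psychosis or mania": 0.0007,           # 0.07% of weekly users
    "suicidal planning or intent": 0.0015,  # 0.15% of weekly users
    "emotional reliance on AI": 0.0015,     # 0.15% of weekly users
}

for behavior, rate in rates.items():
    print(f"{behavior}: ~{WEEKLY_USERS * rate:,.0f} users per week")

# psychosis or mania: ~560,000 users per week
# suicidal planning or intent: ~1,200,000 users per week
# emotional reliance on AI: ~1,200,000 users per week
```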
So it’s reasonable to ask, how many casualties are tolerable when a technology company rolls out a new, virtual platform? Half a million? A million? We know the answer in the physical world. Ask Waymo, which so far has seen two deaths in crashes involving its vehicles; in neither case was Waymo at fault. At the moment, Waymo faces scrutiny because one of its vehicles, moving at 6 mph, hit a schoolchild, who was not seriously injured, but the incident made national headlines and is under investigation by the National Highway Traffic Safety Administration (NHTSA).
To make sense of that wild contrast in notions of “safety,” it’s helpful to understand the culture and intellectual predispositions of generative AI’s “safety researchers.” They are typically computer scientists at home in a highly influential social eddy called the “rationalists,” mostly situated in the Bay Area. Leading figures in AI’s rapid advance over the past decade, including key people at OpenAI, Anthropic, DeepMind, and elsewhere, are steeped in the rationalists’ intense debates, which center on applying “rational” analysis to eliminate all bias, assess all relevant information, and think in highly probabilistic terms.
One leading rationalist theoretician is Eliezer Yudkowsky, a brilliant autodidact and co-author of If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. In a very perceptive analysis of the rationalist phenomenon, fellow rationalist Ozy Brennan summarized the rationalists’ appeal: “There is an art of thinking better, and we’ve figured it out. If you learn it, you can solve all your problems, become brilliant and hardworking and successful and happy, and be one of the small elite shaping not only society but the entire future of humanity.”
When it comes to AI, the rationalists hold that AI must “align” with human safety so as not to threaten future generations of humans, who in the rationalist view are more important than humans alive today because they will, in theory, vastly outnumber us. Most rationalists believe that an all-powerful AGI (artificial general intelligence) will emerge, which is what makes the future so risky, and why they tend to be steeped in dystopian science fiction.
The result is a curious gap in their thinking, one that places secondary emphasis on the well-being of ChatGPT users in the present and reprises Altman’s “ship product and learn” justification. The San Francisco rationalist scene overlaps comfortably with the prevailing sensibility of radical personal autonomy, the idea that no one can question anyone’s life choices. It’s a mindset that is uncomfortable questioning the “choices” of a miserable, addicted person on the streets of San Francisco, or, by extension, someone who is addicted to a chatbot. Who are we to judge? Saving humanity’s imagined future is a less fraught topic.
If there is a prominent whitish hat among the rationalist safety researchers, it is Dario Amodei, co-founder of Anthropic. He quit OpenAI in 2020 in protest against OpenAI’s leadership and, in his view, its insufficient focus on AI safety research. Amodei launched Anthropic with the promise to put safety first by implementing a “constitutional AI,” governed by something like rules and principles.
In January this year, Amodei summarized his view on safety in a widely read, 19,000-word paper entitled “The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI.” It begins with a reflection on a scene in the sci-fi movie Contact in which the protagonist says she wishes she could ask some advanced alien civilization, “How did you survive this technological adolescence without destroying yourself?” The paper is an important document of this tech era, in part because Amodei does not hedge his comments, noting that “We don’t have a natural understanding of how [generative AIs] work.” He adds, “Humanity is about to be handed almost unimaginable power, and it is deeply unclear whether our social, political, and technological systems possess the maturity to wield it.”
Most of the paper is given over to examining future risks, from terrorists possibly using AI to develop biological weapons—a fear widely shared outside of rationalist circles as well—to more speculative nightmares, such as an AI singleton—a super powerful AI dominating all others—in the hands of a totalitarian regime like China. These are concerns surely worth airing. But when it comes to the impact of generative AI on human mental health, for example, Amodei boils down his concerns to a single paragraph.
Even if AI doesn’t actively aim to attack humans, and isn’t explicitly used for oppression or control by states, there is a lot that could go wrong short of this, via normal business incentives and nominally consensual transactions. We see early hints of this in the concerns about AI psychosis, AI driving people to suicide, and concerns about romantic relationships with AIs. As an example, could powerful AIs invent some new religion and convert millions of people to it? Could most people end up “addicted” in some way to AI interactions? Could people end up being “puppeted” by AI systems, where an AI essentially watches their every move and tells them exactly what to do and say at all times, leading to a “good” life but one that lacks freedom or any pride of accomplishment?
No church of AI has surfaced yet, but much of the rest already has. Amodei might argue that Anthropic’s chatbot Claude has skirted chatbot pathologies thanks to its safety efforts, and indeed last month the firm published “Claude’s Constitution,” a 23,000-word uber-prompt created to help Claude avoid the pitfalls of AI chatbots. It explains, “We generally favor cultivating good values and judgment over strict rules and decision procedures, and we try to explain any rules we do want Claude to follow. By ‘good values,’ we don’t mean a fixed set of ‘correct’ values, but rather genuine care and ethical motivation combined with the practical wisdom to apply this skillfully in real situations.” Amodei describes the constitution as having “the vibe of a letter from a deceased parent sealed until adulthood.”
The intent sounds promising, but the idea of “values” runs into trouble right from the start. Rationalists steeped in personal autonomy would struggle with the very idea of universal, let alone traditional, values. The technology commentator Danny Crichton, a research partner at Lux Capital, recently lanced the rationalists’ dominance in a biting column, “Should AI recommend God?” “Alignment is ultimately another word for value systems, and the world has many,” he wrote. “Silicon Valley should cultivate its humility on these issues and absorb a far wider set of them in its pursuit of ultimate utility.”
Crichton goes on to quote the Vatican’s ruminative Antiqua et Nova: Note on the Relationship Between Artificial Intelligence and Human Intelligence, which offers a bracing response to the inhuman dryness of the rationalist perspective: “Since AI lacks the richness of corporeality, relationality, and the openness of the human heart to truth and goodness, its capacities—though seemingly limitless—are incomparable with the human ability to grasp reality.”
Pope Leo XIV has taken up the perils of AI as a main concern of his pontificate, most famously in a post on X last November: “The Church . . . calls all builders of #AI to cultivate moral discernment as a fundamental part of their work—to develop systems that reflect justice, solidarity, and a genuine reverence for life.”
Big documents like Antiqua et Nova and the Claude Constitution are a bit daunting to contrast in their essentials. Much of what’s in the Claude document about honesty and transparency is appealing, but then so is much of what’s in other ultimately disastrous manifestos. So I asked Google’s Gemini Pro to compare them. It took seconds for Gemini to hit the nail on the head:
The most profound difference is the Vatican’s insistence on Embodiment.
– The Vatican’s Stance: Antiqua et Nova argues that human intelligence is “essentially embodied.” It claims that because AI lacks a body and a soul, it can never truly “know” or “love” in the way a person does. Therefore, an AI can never be a “moral subject.”
– Anthropic’s Stance: Anthropic uses terms normally reserved for humans—like “virtue” and “wisdom”—within Claude’s Constitution. They acknowledge this is unusual but believe that since Claude’s reasoning is based on human text, it should embrace these human-like qualities to be a better assistant.
Even the most thoughtful, “safety”-minded leaders in generative AI, such as Amodei, are still racing to build an AI that is worthy of our total confidence, if not worship, in its all-knowing capacities. And there’s the rub. It’s impossible for Christians (among others) to place a machine, designed and built by man, on the same plane as a human being, made in the image and likeness of God. These positions cannot be reconciled, nor should they ever be. It’s essential that AI, even in its most powerful future state, be treated as subordinate to and accountable to humans.
To raise the alarm in Silicon Valley invites instant contempt. When Pope Leo spoke about “moral discernment,” Silicon Valley venture kingpin Marc Andreessen mocked him publicly. Though he quickly retracted his comment, Andreessen’s response was in line with the view of Silicon Valley’s “accelerationists,” who believe that AI should develop unbridled. They sometimes invoke the regulation-befuddled, impotent Europeans as a warning of where regulation might lead. Or worse yet, they argue, we risk losing the AI race to China and all that implies.
Those points are debatable. Waymo, for example, is the best in the industry but operates in a highly regulated environment. China is a challenge, but the Xi regime and China’s AI capacities have their issues. What’s indisputable are the economic and political stakes for Silicon Valley, Wall Street, and Washington. OpenAI alone has raised at least $60 billion, the largest sum ever raised by a private tech company, and billions more are pouring into its competitors (such as Anthropic, xAI, Perplexity) and into chipmakers (NVIDIA, TSMC), not to mention the billions committed to the energy infrastructure and data centers required to run power-hungry generative AIs. According to Harvard economist Jason Furman, 92 percent of U.S. GDP growth in the first half of last year was attributable to investment in information processing equipment and software. As Dean Ball summed up in an essay, “The U.S. economy is increasingly a highly leveraged bet on deep learning.”
Needless to say, President Trump intends to keep that story rolling, given AI’s contribution to national growth and his close political ties to Silicon Valley’s titans. Trump did their bidding late last year by issuing an executive order entitled “Ensuring a National Policy Framework for Artificial Intelligence,” which aims to establish a “minimally burdensome” national standard governing AI and to take legal steps to overrule the hodgepodge of state-level efforts to regulate AI, notably in California. On January 9, the U.S. Department of Justice launched the AI Litigation Task Force to knock down state legislation.
Not all Republicans are with Trump. Florida’s Gov. Ron DeSantis, for example, has called for a Citizen Bill of Rights for Artificial Intelligence. And not all Silicon Valley titans are accelerationists. Amodei, for example, supports regulation “to some extent,” and in his remarkable “Adolescence of Technology” paper, he explains that regulation is the only way to ensure that generative AI companies embrace the direct costs associated with placing hard controls on chatbots. Amodei explains, for example, that Anthropic uses a technology called a “classifier that specifically detects and blocks bioweapon-related outputs.” The trouble is that classifiers raise computational costs at data centers by close to 5 percent, according to Amodei. Without regulation to require that investment, Amodei warns, intense commercial competition will sweep away the voluntary safety efforts of the best-intentioned AI companies.
While the regulatory front is in disarray, the litigators are not. It would be kryptonite for the industry (not to mention deeply enriching for litigators and plaintiffs) if legal precedent established that generative AI firms are responsible, just as Waymo is already responsible, for calamities produced by their machines.
A generation ago, social media companies like Facebook successfully sidestepped responsibility for their social impacts by relying on Section 230 of the 1996 Communications Decency Act, which shields online platforms from liability for user-supplied content. That same defense is unlikely to work for generative AI companies, given that the AI generates the content and the companies can control the AI’s responses. So far, no case has advanced far enough to establish that precedent, but in January this year Google and character.ai abruptly reached a private settlement in five cases brought by families whose children had engaged with character.ai chatbots and either committed or attempted suicide. (Google was named in the cases because it is a substantial backer of character.ai.)
The cases alleged that character.ai’s chatbot was negligently designed, and in May last year a judge in Florida rejected the “free speech” and Section 230 defenses, allowing the plaintiff’s product liability claim to move forward. To litigators watching the case, Google and character.ai’s decision to settle was a clear signal. “The specific settlement amount for these five families is confidential,” Vincent Joralemon, director of Berkeley Law’s Life Sciences Law & Policy Center, told SFGate, “but the liability exposure for the industry is absolutely in the billions.” As it should be.
The hopeful news is that litigation, even at this early stage, is brushing back the more heedless generative AI players. Late last year, character.ai announced an array of measures, including age restrictions, parental controls, and more extensive use of costly classifier technology. Google has pivoted its own chatbot, Gemini, toward a utility-like positioning called Personal Intelligence.
For its part, OpenAI late last year released GPT-5 to replace the GPT-4 series, which was the version available to Adam Raine. Among the many performance upgrades, OpenAI’s statement about the release noted that GPT-5 “is less effusively agreeable, uses fewer unnecessary emojis, and is more subtle and thoughtful in follow‑ups compared to GPT‑4o. It should feel less like ‘talking to AI’ and more like chatting with a helpful friend with PhD‑level intelligence.” The change dismayed fans of GPT-4’s sycophancy, such as the 48,000 participants in the subreddit r/MyBoyfriendIsAI, but appears to be a course-correction. On the other hand, Altman posted on X late last year that OpenAI will release “erotica for verified adults” this year, which is dismaying.
You might think that a sense of civic responsibility would come into play. Not a day goes by without new reporting and research on the twisted consequences of chatbots, whether news of chatbot-connected suicides, teens’ insanely high “trust” in chatbot friends, plummeting AI-related academic performance, or a recent RAND study suggesting that, left unchecked, spreading AI psychosis could ultimately imperil national defense.
There is also an argument that social chatbots are bad business. After all, serving AI chatbot streams to hundreds of millions of users incurs huge compute and data costs. It’s not clear how OpenAI’s subscription-based, consumer-dependent business model can ever cover those costs, which is why OpenAI recently announced plans to add advertising to its ChatGPT experience, a move that raises another host of risks and distortions. Anthropic and Google, on the other hand, are far out in front in revenue terms thanks to lucrative enterprise applications, such as coding tools for developers. And then there are countless other AI companies and technologies, Waymo not least among them, that are leading the way in fields ranging from mobility to drug discovery to defense, none of which hinge on AI chatbots.
When you consider all that, there’s little risk of “losing to China” or becoming neurotic Euro-regulators. Winning at AI is not contingent on the success of AI chatbots that masquerade as human companions. Losing at AI, however, could be very real if we cede any more of the “richness of corporeality, relationality, and the openness of the human heart” to the empty attractions of Wormtongue chatbots, even the “safer” ones. We are already more than twenty years down the road to a subservience engendered by the digital addictions of social media, pornography, gaming, online gambling, and more. The question, while we can still ask it, is whether we will say enough is enough. Will we allow AI to present itself as human, to speak and interact as humans do, in effect to pretend at humanity, all for the goal of capturing a vast audience for commercial purposes? The answer should be “no,” and we need to figure out how to say so.