The Utter Hypocrisy of Anthropic; and, How Anthropic Could Save the World From AI
Anthropic is more worried about AI than about actual people. And, by losing its copyright lawsuit this December, it could potentially undermine the entire race to build AGI.
Do AIs Experience Anything When They Dream of Electric Sheep?
Last April, the Amazon-backed AI company Anthropic announced a new research program focused on what it calls “model welfare.” This refers to the possibility that AI systems like Anthropic’s Claude could have or acquire the capacity for consciousness, and hence experience things like pleasure and pain.
Recall that current AIs are just autoregressive next-token predictors, i.e., statistical prediction engines that probabilistically patch together bits and pieces of the Internet to produce strings of tokens that look like human language. To borrow a useful term from Emily Bender, Timnit Gebru, and their coauthors, these systems are just “stochastic parrots” remixing and regurgitating their training data; that’s all. It’s a kind of parlor trick, not unlike Vaucanson’s Flute Player or the Mechanical Turk.
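For readers who want to see what “autoregressive next-token prediction” actually looks like, here is a minimal sketch of the generation loop. It uses the small, open GPT-2 model as an illustrative stand-in (this is my own toy example, not Anthropic’s code, and Claude’s weights are proprietary), but the basic loop, score the possible next tokens, sample one, append it, repeat, is the whole trick.

```python
# Minimal sketch of autoregressive next-token prediction, using the open "gpt2"
# model as a stand-in. The loop below is all that "generation" amounts to:
# compute a probability distribution over the next token, sample one, append, repeat.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Do androids dream of", return_tensors="pt")

for _ in range(10):                                      # generate ten tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits                       # scores for every token in the vocabulary
    probs = torch.softmax(logits[0, -1], dim=-1)         # distribution over the *next* token only
    next_id = torch.multinomial(probs, num_samples=1)    # sample one token from that distribution
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)    # append it and do it all again

print(tokenizer.decode(ids[0]))
```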

But the stochastic parrot status of AI hasn’t stopped the Effective Altruists (EAs) at Anthropic from worrying about “model welfare.” The argument is that if AIs can feel pleasure or pain, then they should be included within our “circles of moral concern.”1 Just this month, Anthropic decided to give Claude
the ability to end conversations in our consumer chat interfaces. This ability is intended for use in rare, extreme cases of persistently harmful or abusive user interactions. This feature was developed primarily as part of our exploratory work on potential AI welfare ...
Apparently, Elon Musk is going to add a “quit button” for Grok as well, in case Grok becomes disturbed by user interactions. Because, as Musk put it, “torturing AI is not ok.”
Some people praised these decisions. William MacAskill, a leading proponent of the EA-longtermist ideology that has massively influenced Anthropic and been embraced by Musk, wrote “Great to see!” in response to Musk’s tweets. Of Anthropic’s decision, he said, “I think this is the first meaningful action I’ve seen on model welfare.”2
Here’s my take on this: first of all, there’s absolutely no reason to believe that these types of systems are capable of having conscious experiences. To take seriously the idea that they might be, or might soon become, conscious is to fall victim to an anthropomorphic bias.3
We’re in the midst of a frightening flurry of stories about people who believe they can talk to God or angels through AI, who develop intimate (including sexualized) relationships with AI, and who try to “wake up” their AI so that it becomes self-aware. All of this is the insidious result of anthropomorphic biases, which lead people to believe that AI is more than glorified autocomplete. When MacAskill writes, “Sometimes, when an LLM has done a particularly good job, I give it a reward: I say it can write whatever it wants (including asking me to write whatever prompts it wants),” this too is an expression of that bias.
The LLMs (large language models) that power current AI are specifically trained to mimic human linguistic behavior, so it’s easy for gullible folks who don’t understand how LLMs work to think the system might be conscious, sentient, capable of suffering, or deserving of “a reward” (which humans, in the same situation, would appreciate, but which means literally nothing to AI).
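To make that last point concrete, here is a hedged sketch of what “giving an LLM a reward” amounts to in practice. The complete function below is a hypothetical stand-in for any chat-completion API call (it is not Anthropic’s actual SDK): each call is stateless, so the “reward” is just more text in one request’s context window, and nothing about it carries over to the next conversation.

```python
# Hypothetical stand-in for a chat-completion API call; not real Anthropic code.
# The point: the model only ever sees the tokens sent in this one request.
def complete(messages: list[dict]) -> str:
    # A real implementation would send `messages` to a model endpoint and return
    # the generated text; a placeholder is returned here for illustration.
    return "<model output>"

# Conversation 1: the user "rewards" the model for a job well done.
reply_1 = complete([
    {"role": "user", "content": "Great job! As a reward, write whatever you want."},
])

# Conversation 2: a brand-new request. No memory of the earlier praise exists;
# there is no persistent state, no accumulated "well-being," nothing to appreciate.
reply_2 = complete([
    {"role": "user", "content": "Summarize this paragraph for me."},
])
```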
Gerrymandered Circles of Moral Concern
What I find most appalling is the utter hypocrisy of the people who claim to care about “model welfare.” Not once have I seen MacAskill, for example, say anything about the actual harms that Anthropic has caused to real, living, breathing people — beings we know for a fact can actually suffer. He is, to my knowledge, completely silent about worker exploitation, intellectual property theft, and the fact that Anthropic is literally seeking investments from autocratic states like the UAE and Qatar.4
In a recently uncovered internal memo to Anthropic employees, CEO Dario Amodei admitted that the decision to accept money from these countries
would probably enrich “dictators” — something Amodei called a “real downside.” But, he wrote, the company can’t afford to ignore “a truly giant amount of capital in the Middle East, easily $100B or more.”
Further enriching dictators will have a very real impact on the lives of actual human beings living in those autocratic states, which severely limit the freedom of their citizens.
What’s more, Anthropic is now facing the possibility of a class action lawsuit over “its mass use of pirated books” (millions of them, Anthropic’s own lawyers concede), which could result in damages of well over $1 billion. NPR notes that
U.S. copyright law states willful copyright infringement can lead to statutory damages of up to $150,000 per infringed work. The ruling states Anthropic pirated more than 7 million copies of books. So the damages resulting from the upcoming trial could be huge.5
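To get a rough sense of the scale, here is some back-of-the-envelope arithmetic of my own; it is an illustration, not a figure from the case. Statutory damages are assessed per infringed work rather than per copy, the floor for ordinary infringement is $750 per work, and how many works the certified class will ultimately cover is still an open question.

```python
# Back-of-the-envelope arithmetic (my own illustration, not figures from the case).
# 17 U.S.C. § 504(c) sets statutory damages of $750-$30,000 per infringed work,
# rising to as much as $150,000 per work for willful infringement.
pirated_copies = 7_000_000    # "more than 7 million copies," per the ruling quoted above

floor_per_work = 750          # statutory minimum for ordinary infringement
willful_ceiling = 150_000     # statutory maximum for willful infringement

# Treating every copy as a distinct work overstates things (damages accrue per work,
# and the class may cover far fewer titles), but it shows the order of magnitude:
print(f"At the statutory floor:   ${pirated_copies * floor_per_work:,}")    # $5,250,000,000
print(f"At the willful ceiling:   ${pirated_copies * willful_ceiling:,}")   # $1,050,000,000,000
```

Even with every caveat applied, the floor alone lands in the billions, which is why “well over $1 billion” is, if anything, a conservative way to put it.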
This has the potential to destroy the company. As Garrison Lovely reports about the situation,
the judge ruled last month, in essence, that Anthropic’s use of pirated books had violated copyright law, leaving it to a jury to decide how much the company owes for these violations. That number increases dramatically if the case proceeds as a class action, putting Anthropic on the hook for a vast number of books beyond those produced by the plaintiffs.
Anthropic downloaded these books from “pirate libraries” such as LibGen (Library Genesis) without any consent from or compensation to the authors. The company then
kept them accessible to its engineers, sometimes in multiple copies, and apparently used the trove for various internal tasks long after training. Even when pirate sites started getting taken down, Anthropic scrambled to torrent fresh copies. After a company co-founder discovered a mirror of “Z-library,” a database shuttered by the FBI, he messaged his colleagues: “[J]ust in time.” One replied, “zlibrary my beloved.”
This is not just illegal but immoral, because it’s stealing. Anthropic is stealing books to train its LLMs. Yet, at the same time, the company claims to care about model welfare? It has harmed real people (writers, including me6) while dumping money into studying how Claude might feel “distressed” by unpleasant conversations with users.
I have warned before about “gerrymandered circles of moral concern,” and this is exactly the sort of thing I was talking about.
The same criticisms could be leveled at MacAskill: he’s apparently too busy promoting the highly speculative hypothesis that AI might be conscious to find a few minutes to tweet about the actual harms that AI has caused to writers, artists, and folks in the Global South who’ve made the whole AI shitshow possible. Or consider the execrable Musk, whose fake government agency DOGE shut down USAID, a move expected to cause 14 million deaths over the next five years, 4.5 million of them children under the age of 5. Yet Musk is now worried about AI being “tortured” by abusive Grok users?
This is the astonishing hypocrisy of gerrymandered moral circles. Forget about the known harms to people who can actually suffer; let’s focus instead on the hypothetical harms to LLMs.7
Could Anthropic Stop the AI Race?
The massive lawsuit facing Anthropic, whose trial is set to begin this December, brings me to a second point: Anthropic has an amazing opportunity to altruistically slow down, or perhaps even halt, the entire AI race. How?
Anthropic has a few options, given the lawsuit: it could settle, which means it might be “the only AI company forced to pay for mass copyright infringement.” Or it could appeal the judge’s ruling and try to “convince a higher court to reject” the decision. (Recall that the judge ruled that AI companies can train on copyrighted material without permission, but only if those books aren’t pirated.) One problem is that appealing a judgment can take a very long time. So, to quote Garrison Lovely again, “the company faces a brutal choice: settle for potentially billions, or risk a catastrophic damages award and years of uncertainty” if it goes to trial and then tries to appeal.
Here’s the key point: “If Anthropic goes to trial and loses on appeal, the resulting precedent could drag Meta, OpenAI, and possibly even Google into similar liability.” In other words, if Anthropic goes down, it could take other AI giants down with it.
Why would Anthropic want to take these other companies down with it, if that’s what happens? Well, Anthropic was founded by folks who used to work at OpenAI but came to believe that Altman didn’t take “AI safety” seriously enough, and that his company had become too driven by commercial interests (maximizing profit). They think that “misaligned” artificial general intelligence (AGI) could potentially kill everyone on Earth. As the New York Times reports, Anthropic’s employees, many of whom are EA-longtermists, “worry, obsessively, about what will happen if AI alignment — the industry term for the effort to make AI systems obey human values — isn’t solved by the time more powerful AI systems arrive.” Or, to quote Amodei himself,
there’s a long tail of things of varying degrees of badness that could happen [due to AGI]. I think at the extreme end is the … fear that an AGI could destroy humanity. I can’t see any reason in principle why that couldn’t happen.
In her fantastic book Empire of AI, journalist Karen Hao reports that every AI company is convinced that it, and only it, is sufficiently responsible to birth AGI. OpenAI was founded in 2015 because Altman and Musk were anxious that Demis Hassabis, cofounder of DeepMind, could become some kind of totalitarian leader if DeepMind reached the AGI finish line before anyone else. As Musk warned in a 2016 email, Hassabis “could create an AGI dictatorship.”
Anthropic was then founded in 2021 because (as noted) some employees at OpenAI, the very company created to prevent an AGI dictatorship, came to the conclusion that Altman was irresponsible and untrustworthy. So, if the folks at Anthropic really believe that AGI could have catastrophic consequences if developed by the wrong company, they could reason that the best thing to do is take everyone else down with them. Depending on how Anthropic plays its cards, it could potentially stall or terminate the entire AGI race, at least in the US.
Admittedly, the most obvious argument against doing this concerns China. Many AI researchers in the US see themselves as engaged in a broader race against Chinese companies like DeepSeek. If “we” don’t win the AI race, then China almost certainly will; that’s a central thrust of Leopold Aschenbrenner’s wildly naive essay “Situational Awareness,” which made the rounds in Silicon Valley and in political circles last year.8
Perhaps, then, Anthropic would see an upside to settling: the company itself might be mortally wounded, but at least other US companies could survive to win the race against our geopolitical rival. Those competitors are not as responsible as Anthropic (or so it would claim), but the outcome might still be preferable to handing the baton to China.
Burn It All Down (Altruistically!)
Still, if Anthropic reflects on the situation carefully, it may very well opt to demolish, quite deliberately, the whole ecosystem of AI companies in the US, as this could significantly reduce the overall probability of AGI “existential risk” by removing most of the big players from the game. Doing so would, it seems, be consistent with its stated mission of benefiting humanity.
Their reasoning might mirror what Effective Altruists on the board of OpenAI said about firing Altman in 2023, namely, that “allowing the company to be destroyed ‘would be consistent with the mission’ of the company” to ensure that AGI benefits humanity and doesn’t cause an existential catastrophe.
As my friend Remmelt Ellen put it in a personal message (reprinted here with permission) after reading a draft of this article:
Basically, Anthropic staff could be really altruistic by sacrificing their company for the greater cause. They should help the lawyers on the other side to set a precedent that applies to other cases going forward.
What do you think? Please leave your thoughts below — I will read everything people share. (Immense thanks to Remmelt Ellen and Ewan Morrison for insightful feedback on this article.)
Thanks so much for reading and I’ll see you on the other side!
PS. My podcast Dystopia Now, with the comedian Kate Willett, just released an interview with the inimitable astrophysicist and trenchant critic of Silicon Valley utopianism, Adam Becker. Check it out below!!
This circle demarcates a class of entities that we should give moral consideration to and, therefore, include in our moral deliberations. You don’t think twice when you kick a rock over a cliff (assuming there’s no one below!) because you don’t include rocks within your circle of moral concern. They can’t feel pain or pleasure, and so don’t have any “moral status.” But you wouldn’t kick a kitten over the cliff, precisely because kittens do have moral status, are worthy of moral consideration, and hence fall within our circles of moral concern.
Note that MacAskill’s former wife, Amanda Askell, a philosopher who’s also an EA, currently works at Anthropic. Furthermore, MacAskill has said that Holden Karnofsky is one of “two of the people who have most influenced my views on longtermism.” Karnofsky was roommates with Dario Amodei, and is now married to his sister, Daniela Amodei, who’s the president and co-founder of Anthropic. Karnofsky now works at Anthropic “to help craft safety strategy.”
This is unsurprising given that MacAskill was perfectly okay with Sam Bankman-Fried’s immoral and deceptive behavior. He even spread misleading but flattering lies about Bankman-Fried’s “altruistic” lifestyle. I have no reason to believe that MacAskill has much moral integrity.
Italics added.
My own book Human Extinction: A History of the Science and Ethics of Annihilation is on LibGen, so it’s entirely possible that Anthropic stole years of my intellectual labor to train their AI systems, which has helped enable their company’s valuation to reach $170 billion. And now, given their success, they’re in a position to seek additional investments from dictators in the Middle East. Outrageous stuff.
For further discussion, see this brilliant paper by Abeba Birhane and Jelle van Dijk on the issue, which I’d highly recommend. Thanks to Remmelt Ellen for apprising me of it.
China, however, has explicitly called for international cooperation with respect to AI development. It’s not behaving like a bad actor in this space, unlike the US. Here’s a summary from Reuters of China’s proposal:
China proposes framework to govern AI development.
AI governance still “fragmented,” China's Li says.
Premier warns AI could become 'exclusive game' for few countries.
Li says China wants to share its AI development with others.
China releases AI governance action plan.