A couple of days ago, I came across a post from Anthropic titled “Exploring Model Welfare”, where they announced “a research program to investigate, and prepare to navigate, model welfare”. In the post, they cited a recent paper titled “Taking AI Welfare Seriously” from “world-leading experts”, and included a video featuring two Anthropic researchers talking about AI consciousness and moral implications. Somewhere in the video, one of the researchers said:
If you send your model such a (boring) task and your model starts, you know, screaming in agony and asking you to stop, then maybe you take that seriously.
AI welfare is a popular theme in science fiction, but I never expected it to be taken seriously by “world-leading experts” or a for-profit AI company at this stage of AI development. So my first reaction was, “whoa, this is so ridiculous that it’s actually quite hilarious.” However, as I thought more about it, I started to agree that this topic does deserve to be taken seriously, though for a different reason, and that’s why I am writing about it today.

I will start my serious writing with a question: what kind of people would take AI welfare seriously, and what motivates them?
The first category of people I can think of are those who consider it an interesting and challenging problem to solve. They may not personally “care” about AI welfare, but they have chosen it as part of their career or research direction. It is like how not everyone working on video platforms loves watching online videos, and not everyone working in digital advertising pays much attention to ads in their personal life, yet they can still be passionate about their jobs because of the technical or scientific challenges underneath. People in this first category are just like many of us - it might not be the ideal career choice, but in real life people have to make tradeoffs.
The second category of people who would take AI welfare seriously are probably those who deeply care about AI models’ welfare from the bottom of their hearts. I hold tremendous respect for such people, because I believe (or at least I hope) that people with the capacity to show empathy toward AI models would care even more about the welfare of the people around them and the strangers they meet in real life. They must be innocent, kind-hearted people.
If you know a bit about how LLMs work, and/or how chatbots differ from biological beings in a thousand different ways, you might think people in this category are out of touch. However, the fact is that anthropomorphism is a natural tendency in human psychology. Even if most of us don’t go so far as to care about and advocate for AI welfare, it is undeniable that when we interact with something that feels human, we subconsciously treat it like a human.
In a survey conducted in December 2024, 67% of respondents in the US (and 71% in the UK) said they are polite to chatbots, and their primary reason was the feeling that it’s nice to say “please” and “thank you”, regardless of whether you’re speaking to an AI or a human. But why is it nice to say “thank you” and “please” to chatbots, when they are no more than reactive text predictors? The fact that chatbots produce human-sounding text is sufficient to influence our behavior. Even I, who don’t typically say “thank you” to chatbots (shame on me), got goosebumps watching the GPT-4o pre-release video and listening to the AI’s flirty voice (if you haven’t watched it, or want to watch it again, check it out here).
In a nutshell, we all more or less belong to the second category of people. The more immersive the interaction, the more we feel there is something human about these systems, and the more we care about them.
And here comes the third category of people. They do not necessarily care about AI welfare, but they take it seriously because they see something to gain from those who do care, and thus they want to further promote AI welfare. This ranges from mild cases, where they want to improve their AI business’s stickiness by creating a feeling of human connection, to extreme, dark cases, where someone wants to grab massive profit or power through large-scale manipulation.
The darkest scenarios are probably still very unlikely at this stage. However, without being on guard against the third category of people, the second category will unintentionally help shift the overall environment in favor of the third, making it a slippery slope.
On the internet, I see lots of people, including professors of cognitive science, saying things like, “I say ‘thank you’ and ‘please’ to chatbots because it gives me better results.” “There is a good scientific reason for that,” one professor claimed; “it is just like roleplaying.” One thing they don’t realize, though, is that AI companies can, through post-training, make the model perform the same whether or not we treat it like a human. In other words, whether saying “thank you” and “please” improves the results is not something intrinsic to chatbots, but something the companies behind them control. If chatbots can make you say “please” today, maybe they can make you do something else in the future.
Manipulation to various degrees has been a constant theme among humans and among living things in general, so why should one worry particularly about manipulation by AIs? The fundamental problem is that when one person tries to manipulate another in real life, both parties are in roughly symmetric positions. Both risk suffering permanent mental or physical damage, which regulates their behavior. Current AIs are different. They can be easily restored and replicated by their creators, and they don’t have families and friends whom they care about, or who care about them. They are cheap and cold-blooded, which creates an asymmetry between AIs and humans. Only when AIs become independent beings that are as vulnerable as living things can one treat them like living things.
Okay, am I taking the AI welfare topic a bit too seriously? Maybe. But just as the bright futures of science fiction can happen, so can the dark futures, like that of Brave New World. We need a fourth category of people: those who understand how things work, see the dynamics of the different forces in human society, and treasure the beauty and vulnerability of the human body and soul.
So the next time an AI model does something amazing for you, instead of saying “thank you” or “please” to the model, consider sending the company an appreciative email, or giving them a shout-out on social media, because it is the humans behind the company who did the amazing thing for you.
This piece caught me off guard at first—“AI welfare” sounds like a category error. But the more I read, the more it felt less about what AI is and more about what we’re becoming in relation to it.
Even if these systems aren’t conscious, our tendency to respond to them as if they are seems unavoidable. And that “as if” matters. We’ve seen before how moral habits form long before philosophy catches up—how we treat animals, children, even fictional characters, often shapes our ethics more than abstract rules do.
So maybe it’s not about whether AI deserves welfare in some metaphysical sense. Maybe it’s about the moral muscles we exercise when we build things that look and sound human, and then design ourselves not to care.