The User Experience Puzzle Generative AI Needs to Solve
Wisdom from classic user experience design, the challenges of GenAI user experience, and a hopeful message for the future.
Few technologies garner as much love and hate as generative AI. Some of the hate comes from fear of AGI or fatigue with AGI hype, some from the disruption of incumbents, but those are only part of the story. As a heavy user and a builder of GenAI, I have noticed how messy these tools are to use, how easy it is to fall into AI slop, and how hard it is to create GenAI applications that are useful, safe, and delightful for everyone. All of these problems are user experience problems.
While every new technology faces user experience design problems, they are especially challenging, and especially crucial, for GenAI. Machine learning is the science of the artificial. It is very common for engineers and researchers to be immersed in optimizing the artificial, forgetting whether and how the artificial translates into real-world experience.
Evolving Chatbots: Where Is the Limit of Generated User Experience?
Originally published in 2000 and revised as Don’t Make Me Think, Revisited in 2014, Steve Krug’s book is a classic must-read on web usability. One of its most important reminders is that we as web users don’t read pages; we just scan them. And whether we are novices or experienced users, we don’t usually try to figure out how things actually work; we just “muddle through”.
Such user behavior is deeply rooted in our biology. Conscious actions are slow and energy-consuming, so we have evolved to do the vast majority of things subconsciously. Therefore, as the book points out, websites should be designed so that users can scan and muddle through them naturally, intuitively, and with minimal errors.
Today’s LLM chatbots are without a doubt hugely successful, yet interestingly, they follow a design pattern that is the complete opposite of web design wisdom. You need to write tedious text to use them, carefully thinking about what context to include. They tend to output loosely structured blobs of text that are hard to scan without losing important information. And the mistakes they make are hidden inside nicely written language, hard to spot when you are just muddling through.
So why are LLM chatbots still so successful? I think the answer is that they fulfill the user need for complex information inquiry. That need used to be met through multiple rounds of searching, reading, and reasoning with the help of search engines, which is far more time-consuming than reading the text generated by an LLM.
However, that doesn’t mean chatbots can’t, or don’t have to, be improved. Generic LLM chatbots have the ambition to be the everything app, which means they have to be great for use cases beyond complex information inquiry. In the future, the competition among generic chatbots might be less about “intelligence” and more about how easily users can muddle through with minimal errors.
This competition is already happening to some extent. I find myself constantly scanning through a chatbot’s responses to find what I want. And if you pay closer attention to different chatbots’ responses, you will notice how much the conciseness and structure of information matter to a chatbot’s usability. What is being generated is not merely chat messages; it is the user experience.
A key question that lies ahead of generic chatbots is: how far can generated user experience match, or even beat, the hand-crafted user experience of the traditional web? And where it can’t, how seamlessly can the generated and the hand-crafted be combined? Right now, ChatGPT appears to lead in providing the most intuitive experience; but as we can see, it still has a long way to go.
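To make “generated user experience” concrete, here is a minimal sketch of one way to combine the two kinds of experience. Everything in it is hypothetical: `call_llm` is a stand-in for whatever model API you use, and the JSON schema is just one plausible contract. The idea is that the model generates the content, while deterministic, hand-crafted code owns the scannable layout.

```python
import json

# Hypothetical stand-in for any LLM API; in a real app this would call a model
# with SYSTEM_PROMPT and the user's question and return the raw output string.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire up your model provider here")

# Hand-crafted part: a fixed, scannable answer shape the model must fill in.
SYSTEM_PROMPT = """Answer as JSON with this exact shape:
{"tldr": "<one-sentence answer>",
 "sections": [{"heading": "<short heading>", "bullets": ["<point>", ...]}],
 "caveats": ["<known limitation or uncertainty>", ...]}
Keep bullets short; put uncertainty in caveats, not in the prose."""

def render(raw: str) -> str:
    """Turn the model's structured output into a scannable layout.
    The model generates the content; this function owns the presentation."""
    data = json.loads(raw)
    lines = [f"TL;DR: {data['tldr']}", ""]
    for section in data["sections"]:
        lines.append(f"## {section['heading']}")
        lines.extend(f"- {bullet}" for bullet in section["bullets"])
        lines.append("")
    if data.get("caveats"):
        lines.append("Caveats: " + "; ".join(data["caveats"]))
    return "\n".join(lines)

# Demo with a hardcoded response, since call_llm above is only a stand-in.
sample = json.dumps({
    "tldr": "Yes, but with two caveats.",
    "sections": [{"heading": "Why", "bullets": ["reason one", "reason two"]}],
    "caveats": ["based on limited data"],
})
print(render(sample))
```

The point is not this particular schema, but the division of labor: the structure users scan (the TL;DR, the headings, an explicit slot for caveats) is designed once, by hand, instead of being regenerated on every response.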
The AI Outsourcing Model: The Root of All AI Slop?
In design, a conceptual model refers to a simplified explanation, in the user’s mind, of how a product works. Turning the steering wheel turns the car in the same direction; double-clicking a folder icon on a computer “opens” the folder to show the files inside. These simplified explanations of how things work are inaccurate and superficial, but they help users use the product more intuitively.
A conceptual model lives in the user’s mind, but it is shaped by the design of the product.
A chatbot’s conceptual model resembles a person. As we discussed above, this conceptual model is not necessarily helpful - as long as I can get my job done, who cares whether it feels like a person? Worse, by resembling a person, a chatbot generates text that is more distracting than helpful.
Another popular conceptual model is what I call the AI outsourcing model. In this model, you hand your job off entirely to a specialized GenAI agent, and it comes back with a nicely wrapped-up result that is supposed to be good, but is hard for you to inspect or intervene in.
Think of the “deep research” feature, which in some chatbots generates a survey paper or research report so nicely written that it looks ready to publish. The problem with such a feature is that the feeling of being nicely written is just the formatting. Underneath the formatting there is useful material, but there are also things that are missing or wrong. Think of the report as a phone. From the outside, it is sealed up like a shiny iPhone. But when you start using it, it doesn’t work. So now you need to crack open the shell, add the missing pieces, replace the malfunctioning ones, and reassemble it yourself. If it is not a finished product, why seal it up so tightly that it takes real effort to crack it open and salvage the useful pieces?
The same conceptual model is applied to one-prompt generation of videos or apps, creating similar problems. The design choice not only makes it hard to extract useful information from GenAI, but also deceives people by hiding flaws and fabrications behind professional-looking formats, leading to accidental or deliberate “AI slop”.
Another classic book about design, The Design of Everyday Things, recounts three cases where governments issued new coins so similar to existing coins that they caused widespread confusion; in some cases the coins had to be recalled. Humans don’t use precise knowledge to make everyday decisions; we use shortcuts. Coins that look too similar hack those shortcuts, producing a kind of “coin slop”.
Instead of optimizing for an outsourcing model, consider optimizing for an “AI crowdsourcing model”, where a human architect divides the whole project into small, concrete tasks and hands them over to individuals - whether AI or human - to take on. How each task gets done can remain a black box, but as long as a human reviews the result before changes are committed, the whole project stays under the architect’s control.
In such a model, AI should be optimized not only for intelligence but also for collaboration: keeping its changes clean and easy to review, with minimal surprises. Such a conceptual model is less “shiny”, but it might be much more useful. A minimal sketch of what this could look like in code follows.
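In this sketch, all names (`Task`, `run_worker`, `request_review`) are hypothetical rather than a real framework, and the review gate is reduced to a console prompt. The property worth keeping is that nothing reaches the shared project without passing human review.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A small, concrete unit of work defined by the human architect."""
    description: str
    result: str | None = None
    approved: bool = False

@dataclass
class Project:
    tasks: list[Task] = field(default_factory=list)
    committed: list[Task] = field(default_factory=list)

def run_worker(task: Task) -> None:
    """A worker (AI or human) does the task; how it does so stays a black box."""
    task.result = f"draft output for: {task.description}"  # placeholder work

def request_review(task: Task) -> bool:
    """The review gate. Here it is a console prompt; in practice it is a code
    review, an edit pass, or an approve/reject button in a UI."""
    print(f"Task: {task.description}\nResult: {task.result}")
    return input("Approve? [y/N] ").strip().lower() == "y"

def run_project(project: Project) -> None:
    for task in project.tasks:
        run_worker(task)
        # Nothing is committed without explicit human approval, so the
        # project stays under the architect's control.
        if request_review(task):
            task.approved = True
            project.committed.append(task)

run_project(Project(tasks=[Task("summarize related work"),
                           Task("draft the evaluation section")]))
```

Notice where the optimization pressure lands: a worker that returns small, clean, reviewable results clears the gate faster than a “smarter” one that returns a sealed blob.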
Human-Experience-Guided Research and Engineering
The development of AI in the past few years has focused on advancing raw “intelligence” - the pursuit of cracking ever harder benchmarks. But ultimately, for AI to be useful, it has to serve humans and be supervised by humans. Optimizing for human-AI joint intelligence might be more important than optimizing the intelligence of AI alone. Such joint optimization would require AI research and engineering to be guided by experience design: design that optimizes usefulness and intuitiveness, so that humans can achieve their goals.

