Intelligence Is About How Much You Have Forgotten
Every day, an enormous amount of information reaches our brains through our eyes, ears, sense of touch, and more, yet we somehow manage to forget most of it.
If I said that the average college math student today is smarter than Archimedes (who lived more than 2,000 years ago and is widely considered one of the greatest mathematicians of all time), very few people would agree. But the fact is that today's math students do know a lot more than Archimedes did. Archimedes didn't know about complex numbers or abstract algebra. He did know some early ideas of calculus, but he likely couldn't have carried out moderately complex calculus computations. And obviously he didn't know how to program or build a machine learning model.
So we consider Archimedes one of the greatest mathematicians of all time not because of how much he knew, but because of how much, and how deeply, he could derive from the little knowledge the world had at that time and from his own experience.
While most of us are not as smart as Archimedes (I certainly am not), we all share some of his qualities. Like Archimedes, we take in an enormous amount of information every day through our eyes, ears, sense of touch, and more, yet we somehow manage to forget most of it. Instead, our brains derive some basic rules about the world: sharper objects cut things more easily, friction makes objects hotter, and so on. From there, we as a species derive higher and higher levels of abstraction about the world: mechanics, arithmetic, algebra, and so on.
Why did evolution give us the ability to forget most of what we experience while still learning from that experience? I have no clue. However, I can definitely see the benefits of abstraction over memorization:
Faster to adapt to new circumstances. First, basic rules are unlikely to go out of date; second, even if they do, you only need to update a small set of rules instead of rewiring all the neuron connections in the brain.
Easier to share. Each individual can only see part of the world. Sharing can grow everyone's knowledge, but it is only feasible if what is being shared is a handful of rules rather than a full dump of memory.
More cost-effective. By abstracting most things into rules and forgetting the details, we can reserve memorization for fresh information or for urgent situations that demand faster reactions.
Among these three advantages, ease of sharing is especially important, because our civilization is built on ideas that are themselves built on other ideas. Without the ability to share, our civilization wouldn't exist.
Today's LLMs are clearly a different kind of entity. They are like some students I have encountered, who could write seemingly great essays because they had read many sample essays, memorized some pretty sentences and phrases, and knew where to use them. Good teachers, though, could tell that there wasn't much thinking or deep insight in what they wrote. If we want these models to be truly intelligent, able to adapt automatically to new environments and to learn from other models (in my next post I will explain why learning from other models is essential for machine intelligence), they have to learn to abstract and forget.
Below is the Chinese version (translated with GPT-4, then revised by me):
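If I said that today's college math students are, on average, smarter than Archimedes (who lived more than 2,000 years ago and is widely regarded as one of the greatest mathematicians of all time), few people would agree. But the fact is that today's math students really do know more than Archimedes did. Archimedes didn't know about complex numbers or abstract algebra. He did know some calculus, but he very likely couldn't have carried out calculus calculations of moderate complexity. Obviously, he also didn't know how to program or build a machine learning model.
So we regard Archimedes as one of the greatest mathematicians of all time not because of how much he knew, but because of the amount and depth of what he derived from the little knowledge the world had at the time and from his own experience.
Although most of us (at least I) are not as smart as Archimedes, we all have qualities similar to his. Like Archimedes, every day our brains receive a huge amount of information through our eyes, ears, sense of touch, and so on, yet we somehow forget most of it. Instead, our brains derive some basic rules of the world: sharper objects cut things more easily, friction makes objects hotter, and so on. Starting from there, we as a species derive higher-level abstractions about the world: mechanics, arithmetic, algebra, and so on.
Why did evolution give us the ability to forget most of what we experience while still learning from it? I don't know. However, I can certainly see the benefits of abstraction over memorization:
Faster adaptation to new environments. Basic rules are unlikely to go out of date; and even if they do, you only need to update a small set of rules instead of rewriting all the neural connections in the brain.
Easier sharing. Each individual can only see part of the world; sharing can grow everyone's knowledge. But this is only feasible when what is shared is a set of rules rather than a complete memory.
Greater cost-effectiveness. By abstracting most things into rules and forgetting the details, we can use memory to store new information or to handle emergencies that require quick reactions.
Among these three advantages, ease of sharing is especially important, because our civilization is built on ideas, and those ideas are in turn built on other ideas. Without the ability to share, our civilization could not have come into being.
Today's large language models are clearly a different kind of existence. They are like some students I have met, who could write seemingly excellent essays because they had read a great many model essays, memorized some pretty sentences or phrases, and could use them in the right places. However, a good teacher can tell that there is not much thinking or deep insight in what they write. If we want these models to have true intelligence, the kind that can automatically adapt to new environments and learn from other models (in my next post, I will show why learning from other models is an essential ingredient of machine intelligence), they must learn to abstract and forget.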