The Limitation of Generalization
Starting from the famous halting problem, I discuss the theoretical and practical limits of solving "general" problems.
The Halting Problem
You have probably heard of Alan Turing’s famous “halting problem”. In the halting problem, Turing asked whether there is an algorithm that can correctly decide whether an arbitrary program will eventually halt after it starts to run. The answer to this question, as Turing proved, is “no”. To put it in computer science terms, the halting problem is “undecidable”.
Many people mistake the nature of the proof and think it is equivalent to saying “some programs are so complicated that no algorithm can prove whether they halt or not”. The halting problem has nothing to do with the provability of individual cases; it only concerns the existence of a single algorithm that answers correctly for every program. Given a specific program that takes no input, it will either halt or not. If we write two trivial algorithms, one that always outputs “yes” and one that always outputs “no”, we know one of them is correct for that program (provability is another messy business we will briefly touch on later).
The real difficulty of the halting problem is that you have to output a correct answer for an arbitrary program (and there are infinitely many of them) with a single algorithm (which is static and thus bounded in length), so you cannot simply store a lookup table for all programs. Your algorithm has to generalize, for example, by answering “no” for every program that contains a “while (true) {...}” loop with no “break” statement inside. What Turing’s proof implies, though, is that even if you identify and solve such generalized patterns, you still cannot cover all possible programs with a fixed-length algorithm.
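To see concretely why no fixed algorithm can cover every program, here is a minimal Python sketch of Turing’s diagonal argument. The names `halts` and `troublemaker` are my own illustrative choices, and `halts` is a hypothetical decider assumed to exist only for the sake of contradiction.

```python
# A minimal sketch of Turing's diagonal argument (names are illustrative,
# not a real library). Suppose `halts(program)` were a correct, always-
# terminating decider that returns True iff `program()` eventually halts.

def halts(program) -> bool:
    """Hypothetical halting decider; assumed to exist for contradiction."""
    raise NotImplementedError  # no such total, correct decider can exist

def troublemaker():
    # Ask the decider about this very function, then do the opposite.
    if halts(troublemaker):
        while True:      # decider said "halts" -> loop forever
            pass
    else:
        return           # decider said "loops forever" -> halt immediately

# Whatever answer halts(troublemaker) gives, troublemaker behaves the
# opposite way, so no implementation of `halts` is correct on all inputs.
```

Whatever generalized patterns `halts` encodes, `troublemaker` is constructed to fall outside them.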
What types of programs are not covered? Nobody knows. One thing you can do, though, is keep searching for and discovering uncovered programs, solving them, and adding them to your algorithm. Your algorithm will keep getting longer and will cover more and more programs. It will get closer and closer to solving the halting problem, but will never fully solve it.
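As a toy illustration of this incremental process (my own sketch, not an approach from any real tool), one could grow a partial decider from a list of hand-written patterns; any program that no pattern recognizes simply stays “unknown” until someone studies it and appends a new pattern:

```python
# A toy, pattern-based partial halting checker (illustrative only, not a
# real tool). Each pattern inspects a program's source text and returns
# True ("halts"), False ("loops forever"), or None ("pattern doesn't apply").
import inspect
from typing import Callable, Optional

patterns: list[Callable[[str], Optional[bool]]] = []

def obvious_infinite_loop(src: str) -> Optional[bool]:
    # Crudest case: a bare `while True:` with no `break` or `return` at all.
    if "while True:" in src and "break" not in src and "return" not in src:
        return False
    return None

patterns.append(obvious_infinite_loop)

def partial_halts(program) -> Optional[bool]:
    src = inspect.getsource(program)
    for pattern in patterns:
        verdict = pattern(src)
        if verdict is not None:
            return verdict
    return None  # not covered yet: discover it, solve it, append a new pattern

def spin():
    while True:
        pass

print(partial_halts(spin))  # False -- caught by the single pattern we have
```

Each new pattern extends coverage, but by the argument above the list can never be finished.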
Too General, Thus Impossible
There are lots of computational problems that are undecidable, for example, deciding whether two arbitrary programs produce exactly the same output, or whether an arbitrary Diophantine equation has a solution. These problems share a common trait: they are way too general to be computable.
There are more such examples. In the early 20th century, mathematicians had the ambition of proving that all mathematical systems are consistent (no contradictory claims can be derived from the axioms) and complete (i.e., every statement in the system is either provable or disprovable within the system), by converting all of mathematics into formal systems consisting of axioms and inference rules. The grand vision of this program was that, ultimately, all theorems could be mechanically proved. In 1931, Gödel dealt a blow to this ambition by proving that any mathematical system as expressive as arithmetic cannot be both consistent and complete. You can extend arithmetic with more axioms to make it more powerful and able to prove more theorems, but as long as it remains consistent, there will always be statements that are neither provable nor disprovable.
In the field of optimization, there is a “no free lunch theorem”, which states that no optimization algorithm can do better than any other if you average performance over all possible cases. Think of an LRU cache. It improves data access latency because computer programs usually re-access recently accessed data. However, if you want to optimize your caching algorithm for arbitrary data access patterns, no matter how hard you try, what you get is no better than random.
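To make the LRU example concrete, here is a rough, self-contained Python sketch (my own illustration; the cache size, key universe, and Pareto-skewed workload are arbitrary choices) comparing the hit rate on a heavily skewed access stream against a uniformly random one:

```python
# Rough illustration: an LRU cache shines on skewed access patterns and
# loses its edge on uniformly random ones. Numbers are only indicative;
# the workload and cache size are arbitrary choices.
import random
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store: OrderedDict[int, int] = OrderedDict()

    def access(self, key: int) -> bool:
        """Return True on a cache hit, False on a miss (then insert)."""
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return True
        self.store[key] = 0
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
        return False

def hit_rate(keys, capacity=100):
    cache = LRUCache(capacity)
    hits = sum(cache.access(k) for k in keys)
    return hits / len(keys)

random.seed(0)
universe = 10_000
# Skewed workload: a small set of keys receives the vast majority of accesses.
skewed = [int(random.paretovariate(1.2)) % universe for _ in range(100_000)]
# Uniform workload: every key is equally likely, no reuse to exploit.
uniform = [random.randrange(universe) for _ in range(100_000)]

print(f"skewed  hit rate: {hit_rate(skewed):.2%}")   # high: LRU helps a lot
print(f"uniform hit rate: {hit_rate(uniform):.2%}")  # ~ capacity/universe: ~1%
```

On the skewed stream the hit rate is close to 100%, while on the uniform stream it collapses to roughly capacity/universe, about what random eviction would give.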
The Curve/“Curse” of Generalization
The sections above discuss theoretical “impossibility” proofs for various efforts to solve problems that are “too general”. In reality, you also have to take economic reasons and physical limitations into account, so the level of generalization you can push a solution to is far below the theoretical limit.
Usually, a new thing starts with a single successful case. As more successes emerge, a pattern emerges and a generalized solution is created. The generalized solution is more expensive to build, but because it serves more use cases (and sometimes with higher quality as well), it is economically more favorable. The generalization goes on, and the solution becomes more complex and more expensive to maintain. At some point you face a long tail of smaller patterns and, with the weight of the existing solution, it becomes economically unfavorable to generalize further. Things usually stop here, but if we pushed further, we would first hit the boundary of economic feasibility, then physical limitations, and eventually the theoretical limit.
I tend to believe that this curve (or curse) applies to any centrally designed solution, including software infrastructure, machine learning models, and many more. For this reason, I am usually very careful when a solution has “general” as part of its name. I am even more skeptical when the claim implies the solution is “universal” or “once and for all”, because such claims are supported by neither theory nor history.
What this implies for humanity is that problem solving is an endless endeavor. For better or worse, we will never run out of problems to solve, but the problems waiting for us will keep getting harder. In that sense, we might do better to think of new technological advances as needed tools that empower us, because the problems ahead of us will be more and more challenging.