The AI Knowledge Problem: Leibniz, Wolfram, and LLMs

Gottfried Wilhelm Leibniz (1646-1716), who famously invented calculus at the same time as Isaac Newton (Newton got the credit; we got the Leibniz notation, thankfully), also believed that if we could discover the correct way of representing our world in symbols, we could calculate the world using logic and mathematics. To this end, he also invented binary arithmetic and made significant advances in logic.

Stephen Wolfram is on a similar mission to make the world computable. For about 40 years, he has been working diligently on this project, starting with Mathematica, then expanding it several years ago into the Wolfram Language, and more recently into the Wolfram Physics Project. Along the way, Wolfram has extended the ideas of von Neumann, Turing, and Gödel by describing the principle of computational equivalence, and by carefully differentiating between the traditional, formula-based methods of math and physics and computational or algorithmic methods.

Large Language Models (LLMs) like GPT-4 (the model behind ChatGPT) generate text that sounds like their training data, using the prompt to predict what text most likely comes next. They neither encode knowledge of how the world works for possible later computation (Leibniz), nor compute answers to questions about the world by feeding knowledge into a compute engine (Wolfram). They simply predict what they think a human would write next, based on about 1,500-3,000 words of written prompt.
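The next-word prediction at the core of an LLM can be illustrated with a deliberately tiny stand-in: instead of a neural network trained on billions of words, a lookup table of which word followed which in a one-sentence "training set." Everything below (the text, the table, the function names) is invented purely for illustration.

```python
# A toy stand-in for an LLM (not a real one): predict the next word as
# the word that most often followed the current word in the training text.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat and the cat slept on the mat and the cat purred"

# Count which word follows which in the training data.
follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the word most likely to come next, based only on the training text."""
    if word not in follows:
        return None  # a real LLM never refuses this way; it always produces something
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))
```

The point of the caricature: nothing here "knows" anything about cats or mats. The output merely mirrors the statistics of the training text, which is the sense in which an LLM regresses to the mean of its data.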

What do these have to do with the AI Knowledge Problem?

They represent three fundamentally different ways of dealing with it.

Leibniz, based on the belief of his time that the world was inherently knowable, started with quantifying knowledge about the world. He believed that if we could solve the problem of describing the world in a consistent, logical, and rigorous manner, we could then perform computation based on that knowledge to make new discoveries. Alas, Leibniz was burdened with living in a time before computing machines were available for him to fully experiment with his ideas.

Wolfram, who is burdened with the additional knowledge of the past 300 years, most specifically that of Gödel, who showed that any sufficiently powerful formal system contains truths it cannot prove, so that no body of knowledge can be made complete, chose to start with computation. He seems to believe fundamentally that if we build a good enough compute system, then we can feed it knowledge and get good answers. He has had some success with this approach. Most of us use his technology, even if we don’t know it. Wolfram|Alpha, his “search engine” built on the Wolfram Language, powers big chunks of Siri and Bing.

LLMs basically ignore the knowledge problem altogether. The most generous thing that can be said about LLMs and knowledge is that they regress to the mean of their training data: if the training data overwhelmingly supports one hypothesis, the LLM will likely reproduce that hypothesis along with some supporting statements. Of course, an LLM can be prompted to argue against a prevailing hypothesis, unless the OpenAI safety engineers have decided that doing so causes a problem.

So we have a few choices about how to approach the AI Knowledge Problem.

Option 1 – The Leibniz Paradigm. We can start with knowledge, properly quantified and qualified, and add compute and a friendly interface. The challenges here are several, not the least of which is that there are fundamental limits on knowability, which will require us to qualify our knowledge with a measure of uncertainty. Luckily, we have developed the mathematics necessary to do this. There are societal barriers as well, especially that those who profit from certainty do not welcome uncertainty, and they certainly do not gladly accept falsification. But to advance our knowledge, we need both: to acknowledge uncertainty and to welcome falsification.
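The mathematics alluded to above for qualifying knowledge with uncertainty is, in one standard form, Bayesian updating: each claim carries a probability that is revised as evidence arrives, rather than being asserted as certain. A minimal sketch, with invented numbers:

```python
def bayes_update(prior, likelihood, likelihood_if_false):
    """Revise the probability of a claim after observing a piece of evidence.

    prior:              probability the claim is true before the evidence
    likelihood:         probability of seeing this evidence if the claim is true
    likelihood_if_false: probability of seeing it if the claim is false
    """
    evidence = prior * likelihood + (1 - prior) * likelihood_if_false
    return prior * likelihood / evidence

# Invented numbers: a claim we initially believe at 50%, and evidence that
# is four times likelier to appear if the claim is true than if it is false.
posterior = bayes_update(prior=0.5, likelihood=0.8, likelihood_if_false=0.2)
print(round(posterior, 2))  # the belief rises to 0.8, but never reaches certainty
```

This is the shape a Leibniz-style knowledge base would need: every entry tagged not as true or false, but with a degree of belief that evidence (including falsifying evidence) can move.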

Option 2 – The Wolfram Paradigm. We can start with computation and add knowledge and a friendly interface. This has the advantage of appearing to be useful more quickly by providing answers earlier in the process of making the world computable. But we have a GIGO (Garbage In, Garbage Out) problem when we scrape the internet for the knowledge to feed our compute engine. Without a rigorous understanding of the quality and certainty of the information going into the compute engine, we cannot assess the quality of the outputs. The risk here is that we attribute rigor to the process based on the computation, but forget that the computation depends on the rigor of the data.

Option 3 – The ChatGPT Paradigm. We can ignore the knowledge problem and use AI and human intervention to ensure that the outputs of our tools conform to some vaguely defined standard of acceptability. There are many problems here. First, what is the standard of acceptability? If we are building tools to solve problems, as OpenAI claims in their advertising, then we risk overlooking solutions to difficult problems because of an arbitrary standard of acceptability, which may very well rest upon the belief that “we already know the right answer.” Second, if OpenAI continues to refuse to make their tool Open, then users will not be able to assess the rigor of the output at all. Third, and perhaps most fundamentally, LLMs regress to the mean of the training dataset. As LLMs get more capable, the training data is becoming something like “a weighted representation of the entire internet.” The predictable result of LLMs becoming authoritative sources is that we continue to divide into our increasingly polarized camps and throw experts at each other. Even if we escape the polarization, we are still regressing to the mean of what we already think we know, which is a terrible way of inventing new solutions to difficult problems.

Which Option do you think holds the most promise for helping us solve our most difficult problems?

The AI Knowledge Problem

Artificial Intelligence has a knowledge problem. The latest wave of big AI tools, Large Language Models (LLMs) like ChatGPT/GPT-4 and Llama, don’t encode actual knowledge about the world. They are only models that predict what words should come next, based on many billions of words of training data. LLMs usually sound like they are providing “correct” answers to our prompts, but this is merely a side effect of training data that contains enough correct information that the output usually sounds right.

We are already seeing people getting frustrated with LLMs for “hallucinating” or “making things up” when the LLM generates text that is wrong or nonsensical. In Europe, government regulators are hard at work trying to mitigate potentially harmful uses and outputs of LLMs. OpenAI, the company behind ChatGPT, is working hard to make sure their product is safe, and at the same time, they are contributing to the problem by selling ChatGPT as a problem-solving tool.

A Very Brief History of AI

In the 1940s and 1950s, mathematicians and computer pioneers (most of whom were mathematicians) got excited about the potential of machines to think like humans. In the first wave of AI, this looked primarily like programming computers to play chess and checkers. These programs are search algorithms – they build a tree of possible moves and then search it for winning moves based on rules that assign values to different board positions. It soon became apparent that because of the number of possible positions in chess (about 10^44), we wouldn’t have enough computing power to search deeply enough to beat humans at chess for a very long time, and researchers moved on to other problems.
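The search idea behind those early chess and checkers programs can be sketched as minimax over a game tree: enumerate the possible moves, score the resulting positions with an evaluation function, and assume each player picks the move best for them. The tiny hand-built tree and scores below are invented for illustration; a real chess program would generate the tree from board positions and cut it off at a fixed depth.

```python
# Minimax over a hand-made game tree. Inner nodes map move -> subtree;
# leaves are evaluation scores (positive favors the maximizing player).
# The tree and its values are invented purely for illustration.
game_tree = {
    "a": {"a1": 3, "a2": {"a2x": -2, "a2y": 7}},
    "b": {"b1": 5, "b2": 1},
}

def minimax(node, maximizing):
    """Return the best score achievable from this node with optimal play."""
    if isinstance(node, int):  # leaf: the evaluation function already scored it
        return node
    scores = [minimax(child, not maximizing) for child in node.values()]
    return max(scores) if maximizing else min(scores)

def best_move(tree):
    """Pick the move whose subtree has the highest minimax value."""
    return max(tree, key=lambda move: minimax(tree[move], maximizing=False))

print(best_move(game_tree))  # move "a" guarantees at least 3; "b" only 1
```

The combinatorial explosion the paragraph above describes is visible even here: every added level of lookahead multiplies the number of nodes, which is why 10^44 positions put full search far out of reach.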

The second wave of AI in the 1960s-1980s consisted largely of heuristics engines and expert systems. Heuristics engines use a set of shortcuts or rules of thumb to quickly find a “good enough” solution to a complex problem that might not be solvable with traditional methods. Most anti-virus software uses a heuristics engine to identify suspicious behaviors. Expert systems encode knowledge about a specific domain of expertise in a rigorous way and use formal logic to solve problems within that domain. Examples include programs that can generate and verify mathematical proofs and programs for medical diagnosis. These systems were very highly developed by the early 1980s but fell out of favor for two main reasons: (1) they were extremely expensive to develop and maintain, and (2) they behaved unpredictably when faced with inputs that didn’t map directly to the underlying knowledge.
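An expert system of that era can be caricatured as a set of if-then rules applied by forward chaining: keep firing rules whose conditions are met until no new conclusions appear. A minimal sketch, with invented (and not medically meaningful) rules:

```python
# A miniature expert system: a working set of facts plus if-then rules,
# applied by forward chaining until no new facts can be derived.
# The rules below are invented for illustration, not real diagnostic knowledge.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "see_doctor"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions hold, until a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "cough", "short_of_breath"}, rules))
```

Both failure modes the paragraph identifies are visible in miniature: every rule had to be hand-written by an expert (expensive), and an input like "feverish" instead of "fever" simply fails to match, so the system derives nothing (brittle).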

In the 1990s, computing power caught up to the chess problem enough to allow Deep Blue to beat Garry Kasparov, the top human player of the era. It was a landmark moment in AI, but looking back, it probably said more about the weakness of human chess players than about the strength of computers. We still can’t compute the full game tree of chess, or even determine exactly how many legal, reachable board positions there are.

By the early 2000s, large pattern recognition problems started to gain favor, as the combination of increasing computing power and improved neural network methods captured the imagination of researchers and programmers. Significant advances were made in techniques that would eventually prove fruitful in areas like voice recognition, handwriting recognition, facial and object recognition, and language translation. In 2012, a major neural network advance made image recognition practical, and neural networks have dominated the development of AI since then.

From the 2010s to the present, the advent of compute-as-a-service enabled by cloud computing, together with the use of parallel-processing GPUs to implement neural networks, has lowered the barriers to AI development. LLMs were initially made possible by this new compute environment and are gradually being moved onto smaller platforms. What we haven’t seen yet is a return to the knowledge-based approaches of the 1970s and 1980s.

A Call to Action

As we are poised to become dependent on LLMs to enhance our productivity, it will be increasingly urgent that we build a knowledge base under these tools. If we fail to do so, we risk granting authority to LLMs that are fundamentally untethered from actual knowledge of the world.

In future posts, we will examine the human knowledge problem, ponder what a return to knowledge-based AI might look like, and consider how incorporating a bottom-up knowledge representation might help us solve some of the biggest problems in scientific research today.