ChatGPT passes the Turing Test - now what?

In 1950, computer science pioneer and all-around genius Alan Turing published a paper called “Computing Machinery and Intelligence”. In it, he set out to explore whether computers are capable of thinking. However, since answering that question first requires defining what “thinking” means, and since this in turn would open a philosophical can of worms, Turing decided to pose the question differently.

Assume you have a human being, whom we will call the interrogator, another human, whom we will call the player, and a computer. The interrogator can communicate with the player and the computer through a terminal and ask them any question he or she wants. The goal is for the interrogator to determine which of the two is the computer. If the interrogator can’t consistently tell which one is the computer, then the computer wins. Turing called this hypothetical game “The Imitation Game”. In time, it came to be known as the Turing test.

For decades, the Turing test and all of its spin-offs have captured the imagination of experts, sci-fi writers, futurists, and laymen. It has inspired countless books, comics, graphic novels and even movies. Take, for example, the Academy Award-winning movie Ex Machina, in which the protagonist is tasked with running the test on a very advanced (and attractive) intelligent robot.

At its core, the Turing test is about language. If the computer knows enough language to communicate with a human being well enough to fool them into thinking that they are not communicating with a machine, then the computer would surely pass the test. But passing would not necessarily prove that it can think or reason in any way. It would just prove that the computer can almost perfectly imitate human language.

Most 19th- and 20th-century scientists and philosophers who studied language and knowledge believed that knowledge is purely linguistic: that knowing something simply means being able to think the right sentence that explains it. According to this model, knowledge is a big web of interconnected sentences that express all the true claims we know, and expressing knowledge means grabbing the correct sentences from this web.

This view is still held by some people. On this account, everything that can be known can be contained in a finite number of encyclopedia volumes, so reading everything there is to read would give one comprehensive knowledge of everything. This view also motivated a lot of the early work in Symbolic AI, where symbol manipulation (arbitrary symbols bound together in different ways according to logical rules) was the default paradigm.

For researchers who subscribe to this view, an AI’s knowledge consists of a massive database of true sentences which are logically connected with each other by hand, and the AI system counts as intelligent only if it manages to retrieve the right sentence at the right time. In other words, this view describes an AI that can pass the Turing test: if a computer says everything that it is supposed to say, then it must also understand what it’s talking about, since generating the right sentences at the right time exhausts knowledge.

This brings us to ChatGPT. As most people know by now, ChatGPT has a remarkable ability to generate the right sentences (most of the time). It can imitate styles, simulate a conversation, write essays, retrieve (but not really) information, and much more. At times, it is almost impossible to tell that you are not conversing with a human.

In order to understand how ChatGPT works, we must first understand the theory behind it. ChatGPT is what is called a Large Language Model (LLM). An LLM is a probabilistic model over a human language (in our case, English). LLMs are trained to pick up on the probability of a word or sentence appearing in a specific context by looking at the surrounding words and sentences and piecing together what is going on. This eventually allows them to generate the most likely sentence for the context in which it will appear. ChatGPT has been trained by masking the future words in a sentence and forcing the system to predict which word is most likely to come next, correcting it on bad guesses. The system thus becomes very proficient at guessing the most likely words, which makes it an effective predictive system.
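To make the idea of "the probability of a word appearing in a specific context" concrete, here is a minimal sketch in Python. It is a toy stand-in, not how ChatGPT is actually built: the context is assumed to be just the single previous word, the probabilities come from simple counting rather than a neural network, and the three-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word that comes right after it.
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_probabilities(model, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word was never seen with a follower
    return {candidate: count / total for candidate, count in counts.items()}


# Tiny made-up corpus (an assumption for this example only).
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(max(probs, key=probs.get))  # the most likely word after "the"
```

In this toy corpus, "cat" follows "the" in two of six cases, so it becomes the model's best guess. A real LLM does something similar in spirit, but over far longer contexts and vastly more text.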

In practical terms, this means that every time you enter a prompt into ChatGPT, you are giving the system an initial context from which to generate the response. It then simply uses probability to generate the most likely sentences given the user’s request. It can effectively generate all the sentences required to explain a concept without actually understanding that concept. For example, it can explain the theory behind trigonometry yet still fail to solve a simple trigonometric equation.
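The role of the prompt as an initial context can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the probability table below is written by hand and is entirely hypothetical, the context is reduced to the last word, and the generator always picks the single most likely next word, which is a simplification of how real systems choose.

```python
# Hypothetical next-word probabilities, written by hand for illustration.
NEXT_WORD_PROBS = {
    "the": {"sine": 0.6, "angle": 0.4},
    "sine": {"of": 0.9, "wave": 0.1},
    "of": {"an": 0.7, "the": 0.3},
    "an": {"angle": 1.0},
}


def generate(prompt, max_new_words=5):
    """Extend the prompt one word at a time with the most likely next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:
            break  # nothing known about what follows this word
        # Greedy choice: take the candidate with the highest probability.
        words.append(max(options, key=options.get))
    return " ".join(words)


print(generate("Explain the"))  # prints "explain the sine of an angle"
```

The output reads like the start of a sensible explanation, yet nothing in the code knows what a sine or an angle is. It only follows the numbers in the table.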

Another limitation of knowledge representation in LLMs is that it is confined to language. All knowledge representation schemes involve some sort of compression of information, but what gets kept and what gets left out in the compression varies. Language struggles with more concrete information, such as describing the shape of objects, the functioning of a car engine or the nuanced complexities of a symphony. But there are non-linguistic representational schemes which can express this information in an accessible way: images, recordings, videos, graphs, and maps.

This explains why a machine trained on language can have such broad but shallow knowledge. It has acquired a small part of all human knowledge through a tiny bottleneck. But that small part of human knowledge can be about anything. It is like a vast, shallow pond: it can seem endless, but if you dive into it you risk bumping your head on the bottom.