Over 60 years ago, Alan Turing (“a brilliant mathematician”) published a paper in which he suggested a pragmatic alternative to the question “Can machines think?”. His alternative took the form of a parlour game, in which a judge has a text-based conversation with both a computer and a human, and has to guess which is which. He called this “The imitation game”, and it has been misinterpreted ever since as a scientific test of intelligence, redubbed “The Turing Test”.
A little less conversation, a little more action please
It might surprise you that the question so often attributed to Alan Turing, “Can machines think?”, was not his, but a public question that he criticized:
Turing’s motivation was apparent throughout the paper: the question had been the subject of endless theoretical discussion and nay-saying (as it still is today). As this did not help the field advance, he suggested that we should take a more pragmatic and constructive stance. He used the concept of his imitation game as a guideline to counter stubborn philosophical arguments against machine intelligence, and urged his colleagues not to let those objections hold them back.
A test of unintelligence
Perhaps the most insightful part of the paper is the set of sample questions that Turing suggested. They were chosen deliberately to represent skills that were at the time considered to require intelligence: math, poetry and chess. It wasn’t until the victory of chess computer Deep Blue in 1997 that chess stopped being regarded as a feat of intelligence. If this were a test to demonstrate and prove the computer’s intelligence, why then are the answers below wrong?
To the poetry question, the imaginary computer might as well have written a sonnet and so proven itself intelligent (a sonnet is a 14-line poem with a very specific rhyme scheme). Instead it dodges the question, proving nothing.
The math outcome should be 105721, not 105621. Turing later highlights this as a counterargument to “Machines cannot make mistakes”, the awkward yet common argument that machines only follow preprogrammed instructions without consideration.
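The arithmetic is easy to check for yourself. The short sketch below assumes the operands from Turing's sample dialogue ("Add 34957 to 70764"), which are not quoted in this article; the variable names are my own.

```python
# Operands assumed from Turing's sample dialogue: "Add 34957 to 70764".
a, b = 34957, 70764

# The answer given by the imaginary machine in the paper.
machine_answer = 105621

correct_sum = a + b
error = correct_sum - machine_answer

print(correct_sum)  # 105721
print(error)        # 100
```

The machine's answer is off by exactly 100, the kind of slip a person makes when carrying a digit, rather than a random failure.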
The chess answer is not wrong though. Given two kings and one rook on a board, the computer moves the rook to the king’s row, delivering checkmate. But a mere child could have given that answer, as it is the only move that makes any sense.
These sample answers pass up every opportunity to appear intelligent. One can argue that the intelligence is ultimately found in pretending to be dumb, but one cannot deny that this conflicts directly with the purpose of a test of intelligence. Rather than proving that it matches “the intellectual capacities of man” in all aspects, the machine only proves that it fails at them, as most humans would at these questions. Clearly then, the imitation game is not for demonstrating intelligence.
The rules: There are no rules
The first common misinterpretation is that the computer should pretend to be a woman specifically, going by Turing’s initial outline of the imitation game concept, in which a man has to pretend to be a woman:
However, I suggest that people who believe this should read beyond the first paragraph. There are many instances where Turing refers to both the computer’s behaviour and its opponent’s as that of “a man”. Gender has no bearing on the matter, since the question is one of intellect.
The second misinterpretation is that Turing specified a benchmark for a test by this statement:
Five-minute interrogations and a (100% - 70% =) 30% chance of misidentifying the computer as a human: many took these to be the specifications of a test, because they are the only numbers mentioned in the paper. This interpretation was strengthened by the hero-worshipping notion that anything a genius says must be a matter of fact.
Others feel that the bar Turing set is too low for a meaningful test and brush his words aside as a “prediction”. Yet at the time there was no A.I. to base any predictions on, and Alan Turing did not consider himself a clairvoyant. In a later BBC interview, Turing said it would be “at least 100 years, I should say” before a machine would stand any chance in the game, where earlier he mentioned 50 years. One can hardly accuse these “predictions” of being attempts at accuracy.
Instead of either interpretation, you can clearly read that the 5 minutes and 70/30% chance are labeled as Alan Turing’s personal beliefs in possibilities. His opinion, his expectations, his hopes, not rules to a test. He was sick and tired of people saying it couldn’t be done, so he was just saying it could.
On the subject of benchmarks, it should also be noted that the computer has at best a 50% chance, i.e. a random chance, of winning under normal circumstances: if the computer and the human it is compared with both seem perfectly human, the judge still has to flip the proverbial coin at 50/50 odds. That the judge is aware of having to choose is clear from the initial parlour game between man and woman, and likewise between human and computer, or it would defeat the purpose of interrogation:
How well would men do at pretending to be women? Less than 50/50 odds, I should think.
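The 50% ceiling can be illustrated with a toy model. This is only a sketch under my own assumptions: in each game the judge either genuinely spots the machine (with a hypothetical probability `p_spot`) or cannot tell the two apart and guesses at random. The function names and the model itself are illustrative and do not come from Turing's paper.

```python
import random


def machine_win_rate(p_spot: float) -> float:
    """Expected share of games in which the judge picks the machine as the human.

    p_spot is the assumed probability that the judge genuinely tells the
    two apart; in all other games the judge flips a coin.
    """
    return (1.0 - p_spot) * 0.5


def simulate(p_spot: float, rounds: int = 100_000, seed: int = 0) -> float:
    """Estimate the same win rate by playing many simulated games."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        if rng.random() < p_spot:
            continue  # the judge spots the machine, so the machine loses
        if rng.random() < 0.5:
            wins += 1  # the judge guessed, and guessed wrong
    return wins / rounds


# A machine that is never spotted still wins only half the time.
print(machine_win_rate(0.0))  # 0.5
```

Under this simplified model no value of `p_spot` yields a win rate above 0.5, and a 30% win rate would correspond to a judge who spots the machine in 40% of games and guesses in the rest.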
Looks like a test, quacks like a test, but flies like a rock
Not only are the rules for passing completely left up to interpretation, but also the manner in which the game is to be played. Considering that Turing was a man of exact science and that his other arguments in the paper were elaborate to the point of calculating the necessary digital storage space, would he define a scientific test so vaguely? We find the answer in the fact that Turing mainly refers to his proposal as a “game” and “experiment”, but rarely as a “test”. He makes no mention of “passing” and even explains that it is not the point to try it out:
The pointlessness proved itself in practice: yes, several chatbots have passed various interpretations of the game, most notably Eugene Goostman in 2014, and even Cleverbot passed one based on audience vote. But did an intelligent program ever pass? No. Although nobody can agree on what intelligence is, everybody, including the creators, does agree that those that passed weren’t intelligent or thinking; they worked mainly through keyword-triggered responses.
Winning isn’t everything
Although Turing did seem to imagine the game as a battle of wits, ultimately its judging criterion is not how “intelligent” an A.I. is, but how “human” it seems. In reality, humans are much more characterised by their flaws, emotions and eccentricities than by their intelligence in conversation, and so a highly intelligent, rational A.I. would ironically not do well at this game.
In the end, Turing Tests are behaviouristic assumptions, drawing conclusions from appearances like doctors in medieval times. By the same logic one might conclude that a computer has the flu because it has a high temperature and is making coughing sounds. Obviously this isn’t a satisfying analysis. We could continue to guess whether computers are intelligent due to the fact that they can do math, play chess or have conversations, or we could do what everybody does anyway once a computer passes a test: ask “How does it work?”, then decide for ourselves how intelligent we find that process. No question could be more scientific or more insightful.
So, where does that leave “The Turing Test” when it was never an adequate test of intelligence, nor meant to be? Personally I think Turing Tests are still suitable to demonstrate the progression of conversational skills, a challenge becoming more important with the rise of social robots. And it is important that the public stay informed to settle increasing unrest about artificial intelligence. Other than that, I think it is time to lay the interpretations to rest and continue building A.I. that Alan Turing could only dream of.
In ending, more than any technical detail, I ask you to consider Turing’s hopes: