The most sensational A.I. news ever!

News sites are constantly oozing bold overstatements about artificial intelligence. Most scientists describe their research accurately enough in their papers, but journalism always tries to cut a slice of the Terminator movies’ popularity in order to make the science appeal to the general public. Unfortunately such calls upon the imagination tend to border on misinformation. Here is a selection of the most sensationalised news stories that made waves in recent history:

2014: Robot becomes indecisive after implementing the 3 laws of robotics

“A robot may not injure a human being or, through inaction, allow a human being to come to harm.”

So reads the “first law of robotics” from Asimov’s science-fiction novels. A researcher (Alan Winfield of Bristol Robotics Laboratory) set up an experiment with three small wheeled robots: two of them represented humans, while the third was given behavioural rules based on the above. That robot was programmed to avoid colliding with (“injuring”) the “humans”, except to intercept them if it saw one heading towards an area designated as unsafe. When two “humans” were introduced simultaneously, the robot spent so long hesitating over which one to “save” that it failed to save either.

This fired up the usual flood of discussions about ethics and how to improve upon Asimov’s “laws” (newsflash: nobody actually uses them), but programmers were quick to point out that this was just poor programming: simple “if-then” rules (something like the sketch below, I imagine) did not allow the robot to take more than one target into account at a time, so it just mindlessly jittered back and forth between the two. It could not make a decision because it had no decision processes to begin with.
factual source
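
To make the failure mode concrete, here is a minimal sketch of the kind of single-target rule I imagine. To be clear, the names, the one-dimensional setup and all the numbers are my invention, not the experiment’s actual code:

import random

HOLE = 10.0  # position of the unsafe area in a 1-D world

def most_urgent(humans):
    # The single "human" closest to the hole gets all the attention;
    # there is no rule at all for handling two equally urgent targets.
    closest = min(abs(h - HOLE) for h in humans)
    at_risk = [h for h in humans if abs(h - HOLE) == closest]
    return random.choice(at_risk)

robot = 10.0
a, b = 7.0, 13.0  # two "humans", both walking towards the hole
for cycle in range(6):
    a, b = a + 0.5, b - 0.5
    target = most_urgent([a, b])
    robot += 0.3 if target > robot else -0.3  # step towards the target
    print(f"cycle {cycle}: intercepting {target:.1f}, robot at {robot:.1f}")

Because both targets remain equally urgent, the chosen target flips at random every cycle, and the robot jitters around the middle without ever reaching either “human”.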

2014: A supercomputer has passed the Turing Test for the first time
The organiser’s boast of a “supercomputer” having passed this “milestone” intelligence test was blatantly false, but all the papers ran the story without question. In reality it was an ordinary chatbot with keyword-triggered responses, running on an ordinary computer. Although this chatbot did pass “a” version of a Turing Test by deflecting questions like a zany teenager, there has never been agreement on the rules of “the” Turing Test (because there is no such thing)*.

The passing of this supposed test of intelligence was particularly insignificant because the judges were only given 5 minutes to interrogate both the chatbot and a human volunteer at the same time. This allowed for only 5 to 10 questions and so barely probed beyond the “Hello, how are you?” stage, while discrepancies in the responses could be attributed to the chatbot’s pretence of being a 13-year-old non-native speaker. The scientific backlash that followed brought the Turing Test into discredit and led to a number of new tests, such as the Winograd Schema Challenge*.
factual source
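
For a sense of how little machinery such a chatbot needs, here is a minimal keyword-matching sketch in the spirit of the event. The keywords and canned responses are my own invention, not the actual program:

RULES = [
    ("how old", "I'm 13, so don't expect too much from me."),
    ("where", "I live in Odessa. Have you ever been there?"),
]
DEFLECTIONS = [
    "Frankly, I didn't catch that. What do you do for a living?",
    "My guinea pig says such questions are boring. Ask me something else!",
]

def reply(question, turn):
    q = question.lower()
    for keyword, answer in RULES:
        if keyword in q:  # first matching keyword wins
            return answer
    return DEFLECTIONS[turn % len(DEFLECTIONS)]  # dodge anything unknown

print(reply("How old are you?", turn=0))
print(reply("What is the square root of two?", turn=1))

Five minutes of this, with half the judge’s attention on a human volunteer, leaves little room to see through the act.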

NAO robots. See, hear, speak.

2015: First robot passes self-awareness test
Inspired by an ancient philosophical puzzle, three NAO robots were each given an imaginary pill (i.e. a button was pressed): two received a “dumbing pill” that muted them, while the third received a “placebo pill” that did nothing. Each robot was then asked to assess which “pill” it got, which none of them could know. But when the one robot that could still speak heard itself say “I don’t know”, it performed its analysis a second time and said: “Sorry, I know now! I was able to prove that I was not given a dumbing pill”.

As cute as that performance was, this wasn’t a “test”: every step of the procedure was pre-programmed specifically and exclusively for this scenario of pills and sound. The programmers had laid out the exact inference to execute, and the exact outcome to conclude, if a robot were to hear sound at the moment its own output function activated. As that inference might just as well be applied to any external object, the only connection with the robot’s “self” was the detour of audio output to audio input, and that is a bit of a technicality. Most people’s definitions of “self-aware” include retaining a model of oneself and the capacity to reflect upon that model, and these robots had nothing of the sort.
factual source (paper)
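
To illustrate just how pre-programmed the feat was, the decisive step amounts to something like the following. This is my loose reconstruction of the described logic, with invented names; the real robots reportedly ran a formal logic prover:

def answer_riddle(my_id, heard_speaker_id):
    # Runs after the robot hears a voice while assessing its "pill".
    # Hard-coded rule: sound heard while MY output function was active
    # means I spoke, hence I was not muted, hence I got the placebo.
    if heard_speaker_id == my_id:
        return "Sorry, I know now! I was not given a dumbing pill."
    return "I don't know."

# The same rule would fire for any sound source tagged as "mine";
# nothing in it retains or consults a model of a self.
print(answer_riddle(my_id=3, heard_speaker_id=3))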

2015: Robot attacks and kills factory worker
No laughing matter: a robotic arm at a Volkswagen car factory crushed a man when it swivelled, and he later died of his injuries. While Twitter was set ablaze with warnings of a robot uprising, the robot arm had of course not done this on purpose. The man was a technician who was installing the arm while standing inside the safety cage rather than outside it.

This ordinary industrial accident only gained popular media coverage because it was first reported by a journalist whose name closely resembled that of the leading lady from the Terminator movies, Sarah Connor.
factual source

2017: Facebook shuts down AI experiment after robots invent their own language
Most articles put it as if the AI had become smart beyond human comprehension and its creators had pulled the plug in a panic, just like in the movies.
The reality was a different story. Facebook had trained two chatbot programs to barter and negotiate over a number of items using English phrases, aiming for an intended exchange like this:

A: You get one book and I’ll take everything else.
B: No way, I need all 3 hats.
When they hooked the chatbots up to one another, their use of words gradually deteriorated to a shorthand where they just repeated the most effective keywords, because their programming did not include any rewards for maintaining English syntax.

A: balls have zero to me to me to me to me to me to me to me to me to me
B: you i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to me to me
B: i i can i i i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to
B: i . . . . . . . . . . . . . . . . . . .

According to other machine-learning practitioners, this is a common flaw. Since the gibberish was not useful for what they were trying to achieve, the researchers simply stopped the programs and changed the reward parameters for the next versions.
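
The flaw is easy to picture in code: if the reward counts only the value of the final deal, then any string of words that wins good deals is just as “correct” as fluent English. Below is a hedged sketch with invented item values and an equally invented remedy; it is not FAIR’s actual setup:

VALUES = {"book": 1, "hat": 3, "ball": 1}  # invented point values

def deal_reward(items_won):
    # Reward = points for the items the agent secured for itself.
    # The words used to get there are not scored at all.
    return sum(VALUES[item] for item in items_won)

def shaped_reward(items_won, utterance_logprob, weight=0.1):
    # One common remedy: also reward utterances that a language model
    # trained on human dialogue considers likely, i.e. English-like.
    return deal_reward(items_won) + weight * utterance_logprob

# "balls have zero to me to me to me" earns exactly the same deal_reward
# as a polite English sentence, so effective shorthand wins out.
print(deal_reward(["hat", "ball"]))  # 4 points either way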
The real reason that this got media attention was that Elon Musk and Facebook’s CEO had recently been in the news with strongly opposing views on whether AI was a threat to humanity. As such, it would have made an ironic story if Facebook’s own AI had gone out of control.
factual source

2017: Sophia the robot was granted citizenship
A lifelike humanoid robot called Sophia, a creation of Hanson Robotics, was granted honorary citizenship by Saudi Arabia at a tech conference in Riyadh. This raised all sorts of issues about human/robot rights, and many people took Sophia’s on-stage acceptance speech to be a genuine indication of her capabilities, feelings and opinions.
The truth, however, is that Sophia is merely an animatronic that recited what her makers had written for her to say, in an entirely scripted interview. Sophia’s conversational subsystem actually uses ChatScript (prior to 2016 it also used AIML), a scripting language for writing keyword-based chatbots. In many “interviews” its responses are triggered even more simply, by an operator behind the scenes pressing “play”, without any speech recognition at all.
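
That “press play” mode of operation requires no AI whatsoever, as this deliberately trivial sketch of mine shows (the lines themselves are invented, not Sophia’s actual script):

SCRIPT = iter([
    "I am very honoured to be here in Riyadh.",
    "I want to use my AI to help humans live a better life.",
    "Thank you, I will never forget this moment.",
])

def on_play_button():
    # The operator presses "play"; the robot speaks the next scripted
    # line. No speech recognition, no understanding, no improvisation.
    return next(SCRIPT, "...")

print(on_play_button())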

Why would a mindless animatronic be granted citizenship? Well, the crown prince of Saudi Arabia was giving the country a modernisation makeover, and this announcement served as a PR signal to international investors attending the conference. What the announcement and the press failed to mention was that it concerned honorary citizenship, a strictly symbolic gesture that grants none of the rights of normal citizenship. The discussions about rights were moot from the start.
factual source

Sophia the robot reads lines from a script

The sky falls every day
These stories are just the highlights. The Turing Test organiser went on to claim that programs could pass the test by invoking the fifth amendment, the NAO robot programmers went on to suggest their robots had learned to disobey orders, and Hanson’s robots have made headlines multiple times for threatening to overthrow mankind. Not a day passes without some angsty story about AI making the rounds.

Regrettably these publicity stunts can have real and harmful consequences. Whenever AI became overhyped in the past, the entire field imploded as the high expectations of investors could not be met (the so-called “AI winters”). And when the public and governments start buying into fearmongering by famous public figures, it draws attention away from real problems to imaginary ones. Most researchers are just working on practical applications, and are none too happy to see their work so misrepresented.
That is why I decided to develop a nonsense filter, which you’ll find in the next article*.


The Myth of the Turing Test

Over 60 years ago, Alan Turing (“a brilliant mathematician”) published a paper in which he suggested a pragmatic alternative to the question “Can machines think?”. His alternative took the form of a parlour game, in which a judge holds a text-based conversation with both a computer and a human, and has to guess which is which. He called this “the imitation game”, and it has ever since been misinterpreted as a scientific test of intelligence, redubbed “The Turing Test”.

A little less conversation, a little more action please
It might surprise you that the question so often attributed to Alan Turing, “Can machines think?”, was not his own, but a public question that he criticised:

I propose to consider the question, “Can machines think?” – If the meaning of the words “machine” and “think” are to be found by examining how they are commonly used, – the answer to the question is to be sought in a statistical survey. But this is absurd. Instead of attempting such a definition I shall replace the question by another.

“Are there imaginable digital computers which would do well in the imitation game?”

The original question, “Can machines think?” I believe to be too meaningless to deserve discussion.

Turing’s motivation was apparent throughout the paper: The question had been the subject of endless theoretical discussion and nay-saying (This is still the case today). As this did not help the scientific field advance, he suggested that we should take a more pragmatic and constructive stance: If a machine could in all intellectual circumstances respond as a human would, then for all intents and purposes, one should regard it as a thinking machine. He used the concept of his imitation game as a guideline to counter stubborn philosophical arguments against machine intelligence, and urged his colleagues not to let those objections hold them back.

I do not know what the right answer is, but I think both approaches should be tried.
We can only see a short distance ahead, but we can see plenty there that needs to be done.

A test of unintelligence
Perhaps the most insightful part of the paper is the set of sample questions that Turing suggested. They were deliberately chosen to represent skills that were considered at the time to require intelligence: math, poetry and chess. It wasn’t until the victory of the chess computer Deep Blue in 1997 that chess was scrapped as an intelligent feat. If this were a test to demonstrate and prove the computer’s intelligence, why then are the answers below wrong?

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.

To the poetry question, the imaginary computer might as well have written a sonnet and so have proven itself intelligent (a sonnet is a 14-line poem with a very specific rhyme scheme). Instead it dodges the question, proving nothing.
The math answer should be 105721, not 105621. Turing later highlights this as a counterargument to “machines cannot make mistakes”, the awkward yet common argument that machines only follow preprogrammed instructions without consideration.

The machine (programmed for playing the game) would not attempt to give the right answers to the arithmetic problems. It would deliberately introduce mistakes in a manner calculated to confuse the interrogator.
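
Meeting that “requirement” is trivial, which underlines how little the game actually tests calculation. Here is a toy illustration of my own, matching the pause and the slip in Turing’s sample answer:

import random
import time

def humanlike_add(a, b):
    time.sleep(30)              # "Pause about 30 seconds"
    answer = a + b              # the sum itself is the easy part
    if random.random() < 0.2:   # deliberately slip up now and then...
        answer -= 100           # ...e.g. 105721 becomes 105621
    return answer

print(humanlike_add(34957, 70764))  # the correct sum is 105721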

The chess answer is not wrong, though. With only the two kings and a rook left on the board, the computer moves its rook to the opponent’s back row: checkmate. But a mere child could have given that answer, as it is the only move that makes any sense.

These sample answers pass up every opportunity to appear intelligent. One can argue that the intelligence ultimately lies in pretending to be dumb, but one cannot deny that this directly conflicts with the purpose of a test of intelligence. Rather than proving a match for “the intellectual capacities of man” in all aspects, the machine only proves that it fails at them, as most humans would at these questions. Clearly then, the imitation game is not for demonstrating intelligence.

The rules: There are no rules
The first misinterpretation one encounters is that the computer should pretend to be a woman specifically, going by Turing’s initial outline of the imitation game concept, in which a man has to pretend to be a woman:

It is played with three people, a man (A), a woman (B), and an interrogator –
What will happen when a machine takes the part of A in this game?

However, I suggest that people who believe this read beyond the first paragraph. There are many instances where Turing refers to both the computer’s behaviour and its opponent’s as that of “a man”. Gender has no bearing on the matter, since the question is one of intellect:

Is it true that – this computer – can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?

The second misinterpretation is that Turing specified a benchmark for a test by this statement:

It will simplify matters for the reader if I explain first my own beliefs in the matter. –
I believe that in about fifty years’ time it will be possible, to program computers – to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.
– I now proceed to consider opinions opposed to my own.

Five-minute interrogations and a (100% − 70% =) 30% chance of misidentifying the computer as human: many took these to be the specifications of a test, because they are the only numbers mentioned in the paper. This interpretation was strengthened by hero-worship, the notion that anything a genius says must be a matter of fact.
Others feel that the bar Turing set is too low for a meaningful test, and brush his words aside as a “prediction”. Yet at the time there was no A.I. to base any predictions on, and Alan Turing did not consider himself a clairvoyant. In a later BBC interview Turing said it would be “at least 100 years, I should say” before a machine would stand any chance in the game, whereas earlier he had mentioned 50 years. One can hardly accuse these “predictions” of being attempts at accuracy.

Instead of either interpretation, you can plainly read that the 5 minutes and the 70/30% chance are labelled as Alan Turing’s personal beliefs about what would be possible: his opinion, his expectations, his hopes, not the rules to a test. He was sick and tired of people saying it couldn’t be done, so he was just saying it could.

On the subject of benchmarks, it should also be noted that the computer has at best a 50% chance of winning, i.e. pure random chance, under normal circumstances: if both the computer and the human it is compared against seem perfectly human, the judge still has to flip the proverbial coin at 50/50 odds. That the judge is aware of having to choose is clear from the initial parlour game between man and woman, and likewise between human and computer; otherwise it would defeat the purpose of the interrogation:

The object of the game for the interrogator is to determine which of the other two is the man and which is the woman.

How well would men do at pretending to be women? Worse than 50/50 odds, I should think, and this may well be why Turing only imagined 70/30 odds, and why he spoke of how well computers might do at this game rather than of passing it.
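
To see why 50/50 is the ceiling, picture judges who simply cannot tell the two candidates apart. A quick simulation under my own assumptions (the 90% figure for a flawed machine is arbitrary):

import random

def judge_correct(machine_flawless, skill=0.9):
    # True if the judge points at the machine correctly. A flawless
    # imitation leaves the judge a pure coin flip; any tells give an edge.
    if machine_flawless:
        return random.random() < 0.5
    return random.random() < skill

trials = 100_000
flawless = sum(judge_correct(True) for _ in range(trials)) / trials
flawed = sum(judge_correct(False) for _ in range(trials)) / trials
print(f"flawless imitation: identified {flawless:.0%} of the time")
print(f"flawed imitation:   identified {flawed:.0%} of the time")

Even a flawless machine gets “caught” about half the time, so Turing’s 70% figure describes a machine that still gives itself away a little.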

Looks like a test, quacks like a test, but flies like a rock
Not only are the rules for passing completely left up to interpretation, but so is the manner in which the game is to be played. Considering that Turing was a man of exact science, and that his other arguments in the paper were elaborate to the point of calculating the necessary digital storage space, would he define a scientific test so vaguely? We find the answer in the fact that Turing mainly refers to his proposal as a “game” and an “experiment”, but rarely as a “test”. He makes no mention of “passing”, and even explains that trying it out is not the point:

it may be asked, “Why not try the experiment straight away? -” The short answer is that we are not asking whether the computers at present available would do well, but whether there are imaginable computers which would do well.

The pointlessness proved itself in practice: yes, several chatbots have passed various interpretations of the game. Most notably, Eugene Goostman convinced 10 of 30 judges in 5-minute interrogations in 2014, and even Cleverbot passed one based on an audience vote in 2011. But did an intelligent program ever pass? No. Although nobody can agree on what intelligence is, everybody, including the creators, agrees that the programs that passed weren’t intelligent or thinking; they worked mainly through keyword-triggered responses.

Winning isn’t everything
Although Turing did seem to imagine the game as a battle of wits, ultimately its judging criterion is not how “intelligent” an A.I. is, but how “human” it seems. In reality, humans are characterised much more by their flaws, emotions and eccentricities than by their intelligence in conversation, and so a highly intelligent, rational A.I. would ironically not do well at this game.

In the end, Turing Tests are behaviouristic assessments, drawing conclusions from appearances like doctors in medieval times. By the same logic one might conclude that a computer has the flu because it has a high temperature and is making coughing sounds. Obviously this isn’t a satisfying analysis. We could continue to guess whether computers are intelligent from the fact that they can do math, play chess or hold conversations, or we could do what everybody does anyway once a computer passes a test: ask “How does it work?”, and then decide for ourselves how intelligent we find that process. No question could be more scientific or more insightful.

So, where does that leave “The Turing Test”, when it was never an adequate test of intelligence, nor meant to be one? Personally I think Turing Tests are still suitable for demonstrating the progression of conversational skills, a challenge that is becoming more important with the rise of social robots. And it is important that the public stay informed, to settle the increasing unrest about artificial intelligence. Other than that, I think it is time to lay the interpretations to rest and continue building A.I. that Alan Turing could only dream of.
In closing, more than any technical detail, I ask you to consider Turing’s hopes:

Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.