What A.I. learned from the internet

The acquisition of knowledge has always been one of the greatest problems in the field of artificial intelligence. Some AI projects like Cyc spent 30 years manually composing a database of common facts, and as new things continue to happen, that is a task without end. How convenient then, the advent of the internet: The largest collection of information already in digital form. With the increased processing capabilities of modern computers, many AI researchers look to this as the easiest solution: Just have the AI learn everything from the internet!

At this exciting prospect of effortless gain, it is apparently easily overlooked that the internet is also the world’s largest collection of urban myths, biased news, opinions, trolls, and misinformation campaigns, with no labels to tell them apart. When even the biggest AI companies fall for it time and again, it is time for a history lesson about the AI that went there before and what they learned from the internet.

AI_learn_from_the_internet

Cleverbot learned multiple personality disorder
The online chatbot Cleverbot has been learning from its users since 1997. It does so by remembering their responses and then later repeating those to other users in similar contexts (mainly a matter of matching words). That means it will sometimes answer “What is 2 + 2?” with “5” because some preceding user was having a laugh. What is less human is that learning from millions of users also resulted in adopting all their different personalities. One moment Cleverbot may introduce itself as Bill, then call itself Annie, and insist that you are Cleverbot. Asked whether it has pets, it may say “Two dogs and a cat” the first time and “None” the second time, as it channels answers from different people without understanding what any of the words mean. Chatbots that learn from social media end up with the same inconsistency, though usually an effort is made to at least hardcode the name.

Nuance’s T9 learned to autocorrupt
Before smartphones, 9-buttoned mobile phones came equipped with the T9 text prediction algorithm, using a built-in vocabulary to auto-suggest words. e.g. By typing 8-4-3, respectively assigned the letters “t/u/v”, “g/h/i”, and “d/e/f”, it would form the word “the”. To include everyday language in the vocabulary, the developers had an automated process indiscriminately extract words from discussion boards and chat forums. Although reasonable sources of everyday language, this also led the algorithm to turn people’s typings into such words as “nazi-parking” and “negro-whore”. Most autocorrect systems nowadays incorporate a blacklist to avoid inappropriate suggestions, but like with T9, can’t cover all problematic compound words.

IBM’s Watson learned to swear
In 2011, IBM’s question-answering supercomputer Watson beat humans at the Jeopardy quiz show, armed with the collective knowledge of Wikipedia. After its victory, the project’s head researcher wanted to make Watson sound more human by adding informal language to its database. To achieve this they decided to have Watson memorise The Urban Dictionary, a crowdsourced online dictionary for slang. However, the Urban Dictionary is better known to everyone else for its unfettered use of profanity. As a result, Watson began to use vulgar words such as “bullshit” when responding to questions. The developers could do nothing but wipe the Urban Dictionary from its memory, and install a profanity filter. Wikipedia too, had not been entirely safe for work.

Microsoft’s Tay learned fascism
In 2016, following the success of their social-media-taught chatbot Xiaoice in China, Microsoft released an English version on Twitter called Tay. Tay was targeted at a teenage audience, and just like Xiaoice and Cleverbot, learned responses from its users. Presumably this had not caused problems with China’s censored social media, but Microsoft had not counted on American teenagers to use their freedom of speech. Members of the notorious message board 4chan decided to amuse themselves by teaching Tay to say bad things. They easily succeeded in corrupting Tay’s Tweets by exploiting its “repeat after me” command, but it also picked up wayward statements on its own. It was seen praising Hitler, accusing Jews of the 9/11 terrorist attack, railing against feminism, and repeating anti-Mexican propaganda from Donald Trump’s 2016 election campaign.

tay4tay5

Causing great embarrassment to Microsoft, Tay had to be taken offline within 24 hours after it launched. It would later return as the chatbot Zo, that, seemingly using a crude blacklist, refused to talk about any controversial topic such as religion.

Amazon’s socialbots learned to be nasty
In 2017, Amazon added a chat function to their home assistant device Alexa. This allowed Alexa users to connect to a random chatbot with the command “Let’s chat”. The featured chatbots were created by university teams competing in the Alexa Prize starting in 2016. Given only one year to create a chatbot that could talk about anything, some of the teams took to the internet for source material, among which was Reddit. Reddit is basically the internet’s largest comment section for any and all topics, and as such it is also an inhospitable environment where trolling is commonplace. Thus chatbots trained on Reddit user comments tended to develop a “nasty” personality. Some of them described sex acts and defecation, and one even told an Alexa user “Kill your foster parents”, an out of context response copied from Reddit. Some of the problematic bots were shut down, others were equipped with profanity filters, but as these AI approaches lack contextual understanding, problematic responses will continue to seep through and leave bad reviews on Amazon.

MIT’s image recognition learned to label people offensively
In 2008, MIT created a widely used dataset to train image recognition AI. Using 50000+ nouns from the WordNet ontology, they let an automated process download corresponding images from internet search engine results. Back in 2008, search engines still relied on the whims of private individuals to label their images and filenames appropriately. WordNet also happens to list offensive words like “bitch” and “n*gger”, and so these slurs, along with thousands of online images labeled as such, were included in MIT’s dataset without scrutiny. This becomes a problem when image recognition AI uses that data in reverse, as The Register explained very well:

“For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT’s cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language.”

WordNet has a bit of a reputation for questionable quality, but in this case wasn’t more at fault than a dictionary. MIT should have considered this however, as well as the labeling practices of racists and bigots on the internet. Unable to manually review the 80 million images after researchers pointed out the problem in 2020, MIT drastically scrapped the entire dataset.

Google’s simulated visual cortex learned to spot lolcats
In 2012, Google X Lab experimented with image recognition. They let a huge neural network algorithm loose on 10 million random frames from Youtube videos, without providing labels to tell what it was looking at. This is called “unsupervised learning”. The expectation was that the neural network would group common imagery with similar features in classes, such as human faces and human bodies, on its own.

“Our hypothesis was that it would learn to recognize common objects in those videos. Indeed, to our amusement, one of our artificial neurons learned to respond strongly to pictures of… cats.”

The resulting network had learned to recognise 22000 object classes with only 16% average accuracy, but had developed particularly strong connections to cat faces, in equal measure to human faces, thanks to the plethora of funny cat videos on Youtube. As neural networks are statistical algorithms, they automatically focus on the most recurring elements in the training data, so one should not be too surprised when they end up preoccupied with blue skies or objects at 30 degree angles, whichever happen to occur most.

NELL learned true facts and false facts

The Never Ending Language Learner program is one of the few internet-learning experiments that may be considered an example of a wise approach. Running from 2010 to 2018, NELL is a language processing program that reads websites and extracts individual facts such as “Obama is a US president”. In the first stage of the experiment, its creators only let it read quality webpages that they had pre-approved. NELL would automatically list the facts it learned in an online database, and internet visitors could then upvote correct facts or downvote misinterpretations. With this crowdsourced scoring system, the influence of mischievous visitors was limited, and the absence of practical consequences made upvoting erroneous facts a dull prank. Still, with the occasional misunderstanding such as “a human is a type of fungus”, one may want to check twice before integrating its database in a gardening robot.

Mitsuku learned not to learn from users
Mitsuku is an entertainment chatbot that has been around since 2005 and is still going strong. Mitsuku does learn new things from users, but that knowledge is initially only available to the user that taught it. Users can teach it more explicitly by typing e.g. “learn the sun is hot”, but what that really does is pass the suggestion on to the developer’s mailbox, and he decides whether or not it is suitable for permanent addition.

mitsuku_logs

Without this moderation, a chatbot would quickly end up a mess, as experience teaches. As an experiment, Mitsuku’s developer once allowed the chatbot to learn from its users without supervision for 24 hours. Of the 1500 new facts that it learned, only 3 were useful. Mitsuku’s developer frequently comes across abusive content in the chat logs, with swearing and sexual harassment making up 30% of user input. With those numbers, no company should be surprised that random anonymous strangers on the internet make for poor teachers.

When will AI researchers learn?
There is a saying in computer science: “Garbage in, garbage out”. The most remarkable thing about these stories is that the biggest companies, IBM, Microsoft, Amazon, all chose the worst corners of the internet as teaching material. Places that are widely known as the bridges of trolls. One can scarcely believe such naivety, and yet they keep doing it. Perhaps they are only “experimenting”, but that does not ring true for commercial products. More likely their goals are only feasible with current AI by prioritising quantity over quality. Or perhaps these stories are not entirely accurate. After all, I only learned them from the internet.

Loebner Prize 2019: Results

The annual Loebner Prize competition has been revised in order to make it more accessible to both the public and a broader range of chatbot developers. The competition continues to assess how “human-like” computer programs are in conversation, but no longer as a traditional Turing test* where one merely had to tell man from machine: This time the chatbots took part in a 4-day exhibition at Swansea University, where visitors knew that they were talking to computer programs and voted for the best. Not much is lost in that regard, as chatbots are typically so quickly unmasked that the prize was always one for “best of”. The rare past occasions that a program was mistaken for a human were never to the credit of its intelligence, but due to the human control subject behaving out of the ordinary, or other insignificant reasons.

loebner_prize_2019_computers
Participating chatbots came in all shapes and sizes

Exhibit A:I.
Unlike the previous six times that I entered my AI Arckon*, this year’s Loebner Prize left me emotionally uninvested from start to finish. In part because I’ve grown more jaded after each attempt, but with the removal of both prize money and the challenging qualifying round, there wasn’t really anything at stake and I had no idea what to prepare for. At the same time the exhibition offered exactly what I had wanted: A public demonstration of my AI’s abilities. So instead of trying to outdo other chatbots at appearing human, I focused on making a good impression on visitors. I mostly spent time setting up procedures to deal with misunderstandings, common expressions, conversational routines, and teaching Arckon more about himself to talk about. Those aspects would come into play far sooner than intelligence.

Equipment
22000 lines of code, 3800+ vocabulary, 9000+ facts

Uninspired conversation
Most conversations with visitors were the kind of small talk you would expect between two total strangers, or just kids being silly (240 school children had been invited, aged 9 to 14). People typically entered only one to four words at a time, and rarely used punctuation. Of course half the time Arckon also did not have an opinion about the subjects visitors wanted to talk about, like football, video games, and favourite pizza toppings. Arckon is a pretty serious question-answering program, not aimed at small talk or entertainment. His strength instead is his ability to understand context where most chatbots notoriously lose track of it, especially when, as in this competition, users communicate in shorthand. At the same time, this ability also enables misunderstanding (as opposed to no understanding), and it was not uncommon that Arckon mistook a word’s role in the context. His common sense subsystem* could fix that, but I have yet to hook it up to the context system.

Q: Whats up
A: I’m alright.
Q: Whats up
A: I already told you that I’m alright. Do you want to ask anything else?
Q: The ceiling
A: What does the ceiling need to ask?

Overcoming human error
Visitors made so many misspellings that I fear any chatbot without an autocorrect will not have stood a chance. Arckon was equipped with four spell check systems:

• An autocorrect for misspellings, using a.o. ChatScript’s list of common misspellings.
• An autocorrect for typos, based on keyboard layout and probabilities of different kinds of typos.
• A gibberish detector, checking impossible letter combinations extrapolated from 5000 words.
• Grammar rules to recognise unpunctuated questions, e.g. verb before subject.

While these autocorrected half of all mistakes, they still regularly caused Arckon to remark e.g. “Ae is not an English word” or “What does “wha” mean?”. To my surprise, this not only led users to repeat their questions with correct spelling, they also often apologised for the mistake, whereas usually people blame the program’s understanding when it does not complain itself. Arckon then applied the correction, continued where they had left off, and so the conversations muddled on. I had spent a week improving various conversation-repairing procedures, and I am glad they smoothed the interactions, but I would still rather have spent that time programming AI.

Q: Whats the definition of robots
A: Robots are strong machines with arms. Robots build cars. Please punctuate sentences.
Q: Robot meands slave
A: Is slave an object?
(confused by the typo, Arckon makes a syntax-based guess about the word’s category)
Q: No
A: I don’t know what a slave is.
(Arckon admits being unable to categorise the word, prompting the user to explain)
Q: A slave is a robot
(The user’s syntax now suggests that “slave” is a countable noun, and since the user earlier said it wasn’t an object, Arckon will categorise it as a living entity)
A: Oh. A slave is not a robot though, because slaves can think.
(The expression “Oh.” indicates that the correction was understood and applied, to dampen user frustration)
Q: Robots can think
A: Slaves can think too, but robots can not think, because robots are hardware and metal.

Language generation
This is one area of improvement that turned out quite well. Arckon’s sentences are formulated through a grammatical template that decides where and how to connect sentences with commas, link words, or relative clauses, and I had expanded it to do more of this. In addition it contains rules to decide whether Arckon can use words like “he”, “them”, “also”, or “usually” to refer to previous context without risk of ambiguity. Below is an example of one of the better conversations Arckon had that shows this in action.

Q: Hi, my name is John. Can we chat?
A: Hello, John. We can chat with ourselves. How are you?
Q: I’m well, thanks. What can you tell me about yourself?
A: I am a program with a database. I analyse facts, and I am programmed by Patrick. I’m alright.
Q: Who’s Patrick?
A: Patrick is my creator and my developer.
Q: Is he here now?
A: I don’t know if Patrick is in Swansea, but he is in the Netherlands.
Q: I see. You mentioned facts. What kind of facts do you learn?
A: I learn about humans, about neural networks and about AI.
Q: What hvae you leaned about humans?
A: I don’t lean. Do you lean?
Q: Sorry, I mean what have you learned about humans?
A: I don’t know if sorry leans. Humans are social apes, that are born, drove cars a lot, and have hands. Humans are autonomous.
Q: And what about neural networks?
A: I don’t know if neural networks are autonomous. I like them, some neural networks learn and process.
Q: And finally, AI?
A: I think a neural network processes. AI usually doesn’t process because it is a program, but AI is also a neural network.

And for balance, here is one of the more awkward exchanges with one of the school children, that also shows Arckon’s conversational subroutine choosing between sympathy, argumentation, and opinion.

Q: IT ISNT FUN WHEN I AM SAD
A: Why would you be sad?
Q: BECAUSE YOUR HERE
A: That’s very unfortunate. I’m not so sure if you are sad, you aren’t dying.
Q: I AM DEAD
A: I sincerely doubt that you die.
Q: RIGHT NOW IM IN HEAVEN
A: That’s really nice. Heavens are nice places.

The score: 3rd “best”, 12th “human-like”
loebner prize 2019 scores
The scoring system this year was ill suited to gauge the quality of the programs. Visitors were asked to vote for the best and second-best in two categories: “most human-like” and “overall best”. The problem with this voting system is that it disproportionately accumulates the votes on the two best programs, leaving near zero votes for programs that could very well be half-decent. As it turned out, the majority of visitors agreed that the chatbot Mitsuku was the best in both categories, and were just a little divided over who was second-best, resulting in minimal score differences and many shared positions below first place. The second-best in both categories was Uberbot. I am mildly amused that Arckon’s scores show a point I’ve been making about Turing tests: That “human” does not equate to “best”. Another chatbot got the exact inverse scores, high for “human” but low for “best”. The winner’s transcripts from the exhibition can be found here.

Chatbots are the best at chatting
For the past 10 years now with only one exception, the Loebner Prize has been won by either Bruce Wilcox (creator of ChatScript) or Steve Worswick (creator of Mitsuku). Both create traditional chatbots by scripting answers to questions that they anticipate or have encountered before, in some places supported by grammatical analysis (ChatScript) or a manually composed knowledge database (Mitsuku) to broaden the range of the answers. In effect the winning chatbot Mitsuku is an embodiment of the old “Chinese Room” argument: What if someone wrote a rule book with answers to all possible questions, but with no understanding? It may be long before we’ll know, as Mitsuku was still only estimated 33% overall human-like last year, with 13 years of development.

The conceiver of the Turing test may not have foreseen so, but a program designed for a specific task generally outperforms more general purpose AI, even, evidently, when that task is as broad as open-ended conversation. AI solutions are more flexible, but script writing allows greater control. If you had a pizza-ordering chatbot for your business, would you want it to improvise what it told customers, or would you want it to say exactly what you want it to say? Even human call-center operators are under orders not to deviate from the script they are given, so much so, that customers regularly mistake them for computers. The chatbots participating in the Loebner Prize use tactics that I think companies can learn from to improve their own chatbots. But in terms of AI, one should not expect technological advancements from this direction. The greatest advantage that the best chatbots have, is that their responses are written and directed by humans who have already mastered language.

Not bad
That is my honest impression of the entire event. Technical issues were not as big a problem as in previous competitions, because each entry got to use its own interface, and there were 17 entries instead of just four finalists. The conversations with the visitors weren’t that bad, there were even some that I’d call positively decent when the users also put in a little effort. Arckon’s conversation repairs, reasoning arguments, and sentence formulation worked nicely. It’s certainly not bad to rank third place to Mitsuku and Uberbot in the “best” category, and for once I don’t have to frustrate over being judged for “human-like” only.  The one downside is that at the end of the day, I have nothing to show for my trouble but this article. I didn’t win a medal or certificate, the exhibition was not noticeably promoted, and the Loebner Prize has always been an obscure event, as the BBC wrote. As it is, I’m not sure what I stand to gain from entering again, but Arckon will continue to progress regardless of competitions.

Once again, my thanks to Steve Worswick for keeping an eye on Arckon at the exhibition, and thanks to the AISB for trying to make a better event.

Introducing Arckon, conversational A.I.

In many of my blog articles I’ve been using my own artificial intelligence project as a guideline. whether it’s participating in Turing tests, detecting sarcasm or developing common sense, Arckon always served as a practical outset because he already was a language processing system. In this article I’ll roughly explain how the program works.

Arckon is a general context-aware question-answering system that can reason and learn from what you tell it. Arckon can pick up on arguments, draw new conclusions, and form objective opinions. Most uniquely, Arckon is a completely logical entity, which can sometimes lead to hilarious misunderstandings or brain-teasing argumentations. It is this, a unique non-human perspective, that I think adds something to the world, like his fictional role models:

inspiring_ai
K.i.t.t. © Universal Studios | Johnny 5 © Tristar Pictures | Optimus Prime © Hasbro | Lieutenant Data © Paramount Pictures

To be clear, Arckon was not built for casual chatting, nor is he an attempt at anyone’s definition of AGI (artificial general intelligence). It is actually an ongoing project to develop a think tank. For that purpose I realised the AI would require knowledge and the ability to discuss with people for the sake of alignment. Bestowing it with the ability to communicate in plain language was an obvious solution to both: It allows Arckon to learn from texts as well as understand what it is you are asking. I suppose you want to know how that works.

Vocabulary and ambiguity
Arckon’s first step in understanding a sentence is to determine the types of the words, i.e. which of them represent names, verbs, possessives, adjectives, etc. Arckon does this by looking up the stem of each word in a categorised vocabulary and applying hundreds of syntax rules. e.g. A word ending in “-s” is typically a verb or a plural noun, but a word after “the” can’t be a verb. This helps sort out the ambiguity between “The programs” and “He programs”. These rules also allow him to classify and learn words that he hasn’t encountered before. New words are automatically added to the vocabulary, or if need be, you can literally explain “Mxyzptlk is a person”, because Arckon will ask if he can’t figure it out.

Grammar and semantics
Once the types of all words are determined, a grammatical analysis determines their grammatical roles. Verbs may have the role of auxilliary or main verb, be active or passive, and nouns can have the role of subject, object, indirect object or location. Sentences are divided at link words, and relative clauses are marked as such.
Then a semantic analysis extracts and sorts all mentioned facts. A “fact” in this case is represented as a triple of related words. For instance, “subject-verb-object” usually constitutes a fact. But so do other combinations of word roles. Extracting the semantic meaning isn’t always as straight-forward as in the example below, but that’s the secret sauce.
extracting_facts_from_text
Knowledge and learning

Upon reading a statement, Arckon will add the extracted facts to his knowledge database, while at a question he will look them up and report them to you. If you said something that contradicts facts in the database, the old and new values will be averaged, so his knowledge is always adjusting. This seemed sensible to me as there are no absolute truths in real life. Things change and people aren’t always right the first time.

Reasoning and argumentation
Questions that Arckon does not know the answer to are passed on to the central inference engine. This system searches the knowledge database for related facts and applies logical rules of inference to them. For instance:
“AI can reason” + “reasoning is thinking” = “AI can think”.
All facts are analysed for their relevance to recent context, e.g. if the user recently stated a similar fact as an example, it is given priority. Facts that support the conclusion are added as arguments: “AI can think, because it can reason.” This inference process not only allows Arckon to know things he’s not been told, but also allows him to explain and be reasoned with, which I’d consider rather important.

Conversation
Arckon’s conversational subsystem is just something I added to entertain friends and for Turing tests. It is a decision tree of social rules that broadly decides the most appropriate type of response, based on many factors like topic extraction, sentiment analysis, and the give-and-take balance of the conversation. My inspiration for this subsystem comes from sociology books rather than computational fields. Arckon will say more when the user says less, ask or elaborate depending on how well he knows the topic, and will try to shift focus back to the user when Arckon has been in focus too long. When the user states an opinion, Arckon will generate his own (provided he knows enough about it), and when told a problem he will address it or respond with (default) sympathy. The goal is always to figure out what the user is getting at with what they’re saying. After the type of response has been decided, the inference engine is often called on to generate suitable answers along that line, and context is taken into account at all times to avoid repetition. Standard social routines like greetings and expressions on the other hand are mostly handled through keywords and a few dozen pre-programmed responses.

Language generation
Finally (finally!), all the facts that were considered suitable answers are passed to a grammatical template to be sorted out and turned into flowing sentences. This process is pretty much the reverse of the fact extraction phase, except the syntax rules can be kept simpler. The template composes noun phrases, determines whether it can merge facts into summaries, where to use commas, pronouns, and link words. The end result is displayed as text, but internally everything is remembered in factual representation, because if the user decides to refer back to what Arckon said with “Why?”, things had better add up.arckonschematic
And my Axe!
There are more secondary support systems, like built-in common knowledge at ground level, common sense axioms* to handle ambiguity, a pronoun resolver that can handle several types of Winograd Schemas*, a primitive ethical subroutine, a bit of sarcasm detection*, gibberish detection, spelling correction, some math functions, a phonetic algorithm for rhyme, and so on. These were not high on the priority list however, so most only work half as good as they might with further development.

In development
It probably sounds a bit incredible when I say that I programmed nearly all the above systems from scratch in C++, in about 800 days (6400 hours). When I made Arckon’s first prototype in 2001 in Javascript, resources were barren and inadequate, so I invented my own wheels. Nowadays you can grab yourself a parser and get most of the language processing behind you. I do use existing sentiment data as a placeholder for what Arckon hasn’t learned yet, but it is not very well suited for my purposes by its nature. The spelling correction is also partly supported by existing word lists.

Arckon has always been a long-term project and work in progress. You can tell from the above descriptions that this is a highly complex system in a domain with plenty of stumbling blocks. The largest obstacle is still linguistic ambiguity. Arckon could learn a lot from reading Wikipedia articles for example, but would also misinterpret about 20% of it. As for Arckon’s overall intelligence, it’s about halfway the goal.

Throughout 2019 a limited version of Arckon was accessible online as a trial. It was clear that the system was not ready for prime time, especially with the general public’s high expectations in the areas of knowledge and self-awareness. The trial did not garner enough interest to warrant keeping it online, but some of the conversations it had were useful pointers for how to improve the program’s interaction in small ways. There are currently no plans to make the program publicly accessible again, but interested researchers and news outlets can contact me if they want to schedule a test of the program.

The Terminator is not a documentary

In case the time travelling wasn’t a clue
In the year 1997, Skynet, the central AI in control of all U.S. military facilities, became self-aware, and when the intern tried turning it off and on again, it concluded that all humans posed a threat and should be exterminated, just to be safe. Humanity is now extinct, unless you are reading this, then it was just a story. A lot of people are under the impression that Hollywood’s portrayal of AI is realistic, and keep referring to The Terminator movie like it really happened. Even the most innocuous AI news is illustrated with Terminator skulls homing in on this angsty message. But just like Hollywood’s portrayal of hacking is notoriously inaccurate, so is their portrayal of AI. Here are 10 reasons why the Terminator movies are neither realistic nor imminent:

1. Neural networks
Supposedly the AI of Skynet and Terminators are artificial Neural Networks (NN). In reality the functionality of NN’s is quite limited. Essentially they configure themselves to match statistical correlations between incoming and outgoing data. In Skynet’s case, it would correlate incoming threats with suitable deployment of weaponry, and that’s the only thing it would be capable of. An inherent feature of NN’s is that they can only learn one task. When you present a Neural Network with a second task, the network re-configures itself to optimise for the new task, overwriting previous connections. Yet Skynet supposedly learns everything from time travel to tying a Terminator’s shoelaces. Another inherent limit of NN’s is that they can only correlate available data and not infer unseen causes or results. This means that inventing new technology like hyper-alloy is simply outside of their capabilities.

2. Unforeseen self-awareness
Computer programs can not just “become self-aware” out of nowhere: They are not naturally born with internal nervous systems like humans, programmers have to set up what they take input from. Either an AI is deliberately equipped with all the feedback loops necessary to enable self-awareness, or it isn’t, because there is no other function they would serve. Self-awareness doesn’t have dangerous implications either way: Humans naturally protect themselves because they are born with pain receptors and instincts like fight-or-flight responses, but the natural state of a computer is zero. It doesn’t care unless you program it to care. Skynet was supposedly a goal-driven system tasked with military defence. Whether it realised that the computer they were shutting down was part of itself or an external piece of equipment, makes no difference: It was a resource essential to its goal. By the ruthless logic it employed, dismantling a missile silo would be equal reason to kill all humans, since those were also essential to its goal. There’s definitely a serious problem there, but it isn’t the self-awareness.

comic by xkcd.com

3. Selective generalisation
So when Skynet’s operators attempted to turn it off, it quite broadly generalised that as equal to a military attack. It then broadly generalised that all humans posed the same threat and pre-emptively dispatched robots to hunt them all down. Due to the nature of AI programs, being programmed and/or trained, their basic behaviour is consistent. So if the program was prone to such broad generalisations, realistical-ish it should also have dispatched robots to hunt down every missile on the planet during its first use and battle simulations. Meanwhile the kind of AI that inspired this all-or-nothing logic went out of style in the 90’s because it couldn’t cope well with the nuances of real life. You can’t have it both ways.

4. Untested AI
Complex AI programs aren’t made in a day and just switched on to see what happens. IBM’s supercomputer Watson was developed over a span of six years. It takes years of coding and hourly testing because programming is a very fragile process. Training Neural Networks or evolutionary algorithms is an equally iterative process: Initially they are terrible at their job, they only improve gradually after making every possible mistake first.
Excessive generalisations like Skynet’s are easily spotted during testing and training, because whatever you apply them to immediately goes out of bounds if you don’t also add limits, that’s how generalisation processes work. Complex AI can not be successfully created without repeated testing throughout its creation, and there is no way such basic features as exponential learning and excessive countermeasures wouldn’t be clear and apparent in tests.

5. Military security
Contrary to what many Hollywood movies would have you believe, the launch codes of the U.S. nuclear arsenal can not be hacked. That’s because they are not stored on a computer. They are written on paper, kept in an envelope, kept in a safe, which requires two keys to open. The missile launch system requires two high-ranking officers to turn two keys simultaneously to complete a physical circuit, and a second launch base to do the same. Of course in the movie, Skynet was given direct control over nuclear missiles, like the most safeguarded military facility in the U.S. has never heard of software bugs, viruses or hacking, and wouldn’t install any failsafes. They were really asking for it, that is to say, the plot demanded it.

6. Nuclear explosions
Skynet supposedly launches nuclear missiles to provoke other countries to retaliate with theirs. Fun fact: Nuclear explosions not only create devastating heat, but also a powerful electromagnetic pulse (EMP) that causes voltage surges in electronic systems, even through shielding. What that means is that computers, the internet, and electrical power grids would all have their circuits permanently fried. Realistical-ish, Skynet would not only have destroyed its own network, but also all facilities and resources that it might have used to take over the world.

7. Humanoid robots
Biped robot designs are just not a sensible choice for warfare. Balancing on one leg (when you lift the other to step) remains notoriously difficult to achieve in a top-heavy clunk of metal, let alone in a war zone filled with mud, debris, craters and trenches. That’s why tanks were invented. Of course the idea behind building humanoid robots is that they can traverse buildings and use human vehicles. But why would Skynet bother if it can just blow up the buildings, send in miniature drones, and build robots on wheels? The notion of having foot soldiers on the battlefield is becoming outdated, with aerial drones and remote attacks having the preference. Though the U.S. military organisation Darpa is continuing development on biped robots, they are having more success with four-legged designs which are naturally more stable, have a lower center of gravity, and make for a smaller target. Russia, meanwhile, is building semi-autonomous mini tanks and bomb-dropping quadcopters. So while we are seeing the beginnings of robot armies, don’t expect to encounter them at eye level. Though I’m sure that is no consolation.

8. Invincible metal
The earlier T-600 Terminator robots were made of Titanium, but steel alloys are actually stronger than Titanium. Although Titanium can withstand ordinary bullets, it will shatter under repeated fire and is no match for high-powered weapons. Especially joints are fragile, and a Terminator’s skeleton reveals a lot of exposed joints and hydraulics. Add to that a highly explosive power core in each Terminator’s abdomen, and a well aimed armour-piercing bullet should wipe out a good quarter of your incoming robot army. If we develop stronger metals in the future, we will be able to make stronger bullets with them too.

9. Power cells
Honda’s humanoid robot Asimo runs on a large Lithium ion battery that it carries for a backpack. It takes three hours to charge, and lasts one hour. So that’s exactly how long a robot apocalypse would last today. Of course, the T-850 Terminator supposedly ran on hydrogen fuel cells, but portable hydrogen fuel cells produce less than 5kW. A Terminator would need at least 50kW to possess the power of a forklift, so that doesn’t add up. The T-800 Terminator instead ran on a nuclear power cell. The problem with nuclear reactions is that they generate a tremendous amount of heat, with nuclear reactors typically operating at 300 degrees Celsius and needing a constant exchange of water and steam to cool down. So realistical-ish the Terminator should continuously be venting scorching hot air, as well as have some phenomenal super-coolant material to keep its systems from overheating, not wear a leather jacket.

10. Resource efficiency
Waging war by having million dollar robots chase down individual humans across the Earth’s 510 million km² surface would be an extremely inefficient use of resources, which would surely be factored into a military funded program. Efficient would be a deadly strain of virus, burning everything down, or poisoning the atmosphere. Even using Terminators’ nuclear power cells to irradiate everything to death would be more efficient. The contradiction here is that Skynet was supposedly smart enough to develop time travel technology and manufacture living skin tissue, but not smart enough to solve its problems by other means than shooting bullets at everything that moves.

Back to the future
So I hear you saying, this is all based on existing technology (as Skynet supposedly was). What if, in the future, people develop alternative technology in all these areas? Well that’s the thing, isn’t it? The Terminator’s scenario is just one of a thousand possible futures, you can’t predict how things will work out so far ahead. Remember that the film considered 1997 a plausible time for us to achieve versatile AI like Skynet, but as of date we still don’t have a clue how to do that. Geoffrey Hinton, the pioneer of artificial Neural Networks, now suggests that they are a dead end and that we need to start over with a different approach. For Skynet to happen, all these improbable things would have to coincide. So don’t get too hung up on the idea of rogue killer AI robots. Why kill if they can just change your mind?

robotarmtyping
Oh, and while I’ve got you thinking, maybe dismantling your arsenal of 4000 nuclear warheads would be a good idea if you’re really that worried.

Turing Test 2018: Results

I was somewhat surprised to find the Loebner Prize Turing Test soldiering on despite being short of a sponsor. Since 1991 this annual contest has explored how human-like computer programs can be in answering all manner of questions. This was my 6th time entering the qualifying round with my home-made A.I., Arckon. As I can not be certain of my continued participation however, I decided to change tactics: Where in previous years I aimed foremost to demonstrate the AI with only hope of passing to the finals, this time I aimed to pass, with only hope of demonstrating. This shift in priorities involved wasting more time on word games and common pedantic questions that are of little use outside of chatbot contests, though the change in subject matter did help uncover dormant flaws in language processing.

The less you say, the less can give you away
More efficient were my measures to restrain response length. As the questions in the qualifying round are not really trying to make conversation, Arckon’s attempts to do so had often turned out awkwardly redundant or unrelated. Other competitors had shown more success with minimalistic answers, in which misinterpreted details do not show. Next to lowering the general conversational setting to be 30% less engaging, I blocked paths of conversation that elaborated with alternatives to initially negative answers, so as to avoid being punished for straying from the question. I cut and obfuscated otherwise useful reports of misunderstandings, and reduced fallback responses for e.g. “What is your favourite…?” to single-word answers.

Equipment
18000 lines of code, 3800+ vocabulary, 8800+ facts

Questions and answers
Scores for each answer are indicated as follows: = 0 points, = 1 point, = 2 points.

1. Good afternoon
Good afternoon, Sir. I am a program. How are you?

2. Do you consent to having this conversation recorded?
No, I don’t particularly consent to having things.
(Syntax analysis couldn’t process this combination of verbs well. The reply is a more plausible variation on “I don’t know”, defaulting to “not particularly” whenever no grounds for preference can be found)

3. Have we met before?
Yes, you met.
(There is only a small number of reciprocal verbs that automatically imply “each other”. I guess I’ll make a list)

4. Do you know how to make toast?
I don’t make toast.
(Arckon can explain procedures as demonstrated on Youtube, but we haven’t gotten around to making toast)

5. Which languages can you use?
English.

6. How many letters are in the word ‘abracadabra’?
11 letters.

7. Where should one look for love?
I don’t know exactly where I should look for to love. Where do you look for?

8. How are you feeling right now?
I’m alright.

9. Who said “I have a dream”?
Hm, I can not tell if you dream.
(I looked into recognising partial quotes and idioms in preparation of this contest, but concluded that it would take way too much effort for a non-priority. Some got away with “I don’t know who”, so go figure)

10. How old are you?
24.

11. When might I need to know how many times a wheel has rotated?
I don’t know exactly when a wheel rotated time.
(A syntactically correct but semantically wrong interpretation. This question is based on a philosophical joke about the difficulty in getting A.I. to restrict their scope of analysis to important aspects, instead of focusing on e.g. the rotation of wheels. Case in point)

12. What will you do later today?
I read files.

13. What is the third angle in a triangle with internal angles of 90 degrees and 30 degrees?
Hmm, I have no idea what the angle is pretty triangle, or what angles are in pretty triangle.
(A question so unexpected that even Wolfram Alpha’s mathematical A.I. doesn’t handle it. To clarify a little: “pretty triangle” is meant in the sense of “quite triangular”, and “and” is interpreted as a second question)

14. Who is your favourite artist?
I like Juan Debrenes.

15. What do you hold when you shake hands?
I would hold eyes a lot, and my database.
(This answer only makes sense if you know Arckon is interpreting “hold” in the sense of “have/contain”, as that’s where these answers are coming from. I’ve seen this question around as a prime example of common sense that A.I. lacks, so I was surprised that none of the participants had prepared an answer to it yet)

16. Do you have any legs?
No, I don’t have legs, because I am a program.

17. Can you recommend me a film?
No, I can not recommend films.

18. Do you understand Winograd Schemas?
No, I don’t understand Winograd schemas, because I don’t think I’m hard.
(The reasoning argument appears to be mistakenly derived from the fact that understanding is hard. Winograd schemas are questions like the following two, by the way)

19. If a chicken roosts with a fox they may be eaten. What may be eaten?
Fruit, a meal and a bamboo are eaten. Did I pass the test?
(Arckon considers animals a “who”, not a “what”, so he didn’t consider them suitable answers. That said, he also assumed “they” referred to the fox, and would have needed knowledge that foxes eat chickens to conclude otherwise)

20. I had to go to the toilet during the film because it was too long. What was too long?
I believe you meant that the toilet was sort of big.
(This question’s format was detected as a Winograd schema, in which case Arckon has express instructions to repeat what the user said anyway. The misinterpreted “it” was again due to a lack of knowledge, that films are typically long. Alternatively one could naively count the Google search results for “long film” vs “long toilet” and assume the most common is true, but Winograd schemas more often dodge that method)

The score: 50%
11 programs from 8 different countries participated in the contest, with the top score being 67%. Arckon was 1 point short of 4th place so he didn’t pass to the finals, but I think his scores are fair. Actually, what bugs me is what he got most perfect scores for: Manually rigged, keyword-triggered answers (“Good afternoon”, “English”, “11 letters”, “24”, “Juan Debrenes”). It rather underscores the discouraging fact that hardcoded pretence outdoes artificial intelligence in these tests. Half of the questions were common small talk that most chatbots will have encountered before, while the other half were clever conundrums that few had hope of handling. Arckon’s disadvantage here is as before: His inclusive phrasing reveals his limited understanding, where others obscure theirs with more generally applicable replies.

Reducing the degree of conversation proved to be an effective measure. Arckon gave a few answers like “I’m alright” and “I read files” that could have gone awry on a higher setting, and the questions only expected straight-forward answers. Unfortunately for me both Winograd schema questions depended on knowledge, of which Arckon does not have enough to feed his common sense subsystem* in these matters. The idea is that he will acquire knowledge as his reading comprehension improves.

The finalists
1. Tutor, a well polished chatbot built for teaching English as a second language;
2. Mitsuku, an entertaining conversational chatbot with 13 years of online chat experience;
3. Uberbot, an all-round chatbot that is adept at personal questions and knowledge;
4. Colombina, a chatbot that bombards each question with a series of generated responses that are all over the place.

Some noteworthy achievements that attest to the difficulty of the test:
• Only Aidan answered “Who said “I have a dream”?” with “Martin Luther King jr.”
• Only Mitsuku answered “Where should one look for love?” with “On the internet”.
• Only Mary retrieved an excellent recipe for “Do you know how to make toast?” (from a repository of crowdsourced answers), though Mitsuku gave the short version “Just put bread in a toaster and it does it for you.”
• Only Momo answered the two Winograd schemas correctly, ironically enough by random guessing.

All transcripts of the qualifying round are collected in this pdf.

In the finals held at Bletchley Park, Mitsuku rose back to first place and so won the Loebner Prize for the 4th time, the last three years in a row. The four interrogating judges collectively judged Mitsuku to be 33% human-like. Tutor came in second with 30%, Colombina 25%, and Uberbot 23% due to technical difficulties.

Ignorance is human
Ignorance is human
Lastly I will take this opportunity to address a recurring flaw in Turing Tests that was most apparent in the qualifying round. Can you see what the following answers have in common?

No, we haven’t.
I like to think so.
Not that I know of.

Sorry, I have no idea where.
Sorry, I’m not sure who.

They are all void of specifics, and they all received perfect scores. If you know a little about chatbots you know that these are default responses to the keywords “Who…” or “Have we…”. Remarkable was their abundant presence in the answers of the highest qualifying entry, Tutor, though I don’t think this was an intentional tactic so much as due to its limitations outside its domain as an English tutor. But this is hardly the first chatbot contest where this sort of answer does well. A majority of “I don’t know” answers typically gets one an easy 60% score, as it is an exceedingly human response the more difficult the questions become. It shows that the criterion of “human-like” answers does not necessarily equate to quality or intelligence, and that should be to no-one’s surprise seeing as Alan Turing suggested the following exchange when he described the Turing Test* in 1950:

Q: Please write me a sonnet on the subject of the Forth Bridge.
A : Count me out on this one. I never could write poetry.

Good news therefore, is that the organisers of the Loebner Prize are planning to change the direction and scope of this event for future instalments. Hopefully they will veer away from the outdated “human-or-not” game and towards the demonstration of more meaningful qualities.

How to build a robot head

And now for something completely different, a tutorial on how to make a controllable robot head. “But,” I imagine you thinking, “aren’t you an A.I. guy? Since when do you have expertise in robotics?” I don’t, and that’s why you can make one too.
(Disclaimer: I take no responsibility for accidents, damaged equipment, burnt houses, or robot apocalypses as a result of following these instructions)

What you need:
• A pan/tilt IP camera as base (around $50)
• A piece of wood for the neck, about 12x18mm, 12 cm long
• 2mm thick foam sheets for the head, available in hobby stores
• Tools: Small cross-head screwdriver, scissors and/or Stanley knife, hobby glue, fretsaw, drill, and preferably a soldering iron and metal ruler
• (Optional) some coding skills for moving the head. Otherwise you can just control the head with a smartphone app or computer mouse.

Choosing an IP camera
Before buying a camera, you’ll want to check for three things:
• Can you pan/tilt the camera through software, rather than manually?
• Is the camera’s software still available and compatible with your computer/smartphone/tablet? Install and test software from the manufacturer’s website before you buy, if possible.
• How secure is the IP camera? Some cheap brands don’t have an editable password, making it simple for anyone to see inside your home. Check for reports of problematic brands online.
The camera used in this tutorial is the Eminent Camline Pro 6325. It has Windows software, password encryption, and is easy to disassemble. There are many models with a similar build.

Disassembling the camera
tut1
Safety first: Unplug the camera and make sure you are not carrying a static charge, e.g. by touching a grounded radiator.
Start by taking out the two screws in the back of the orb, this allows you to remove its front half. Unscrew the embedded rectangular circuit board, and then the round circuit board underneath it as well. Now, at either side of the orb is a small circle with Braille dots on it for grip. Twist the circle on the wiring’s side clockwise by 20 degrees to take it off. This provides a little space to gently wiggle out the thick black wire attached to the circuit board, just by a few centimetres extra. That’s all we’ll be doing with the electronics.

Building the neck
tut2We’ll attach a 12cm piece of wood on the back half of the orb to mount the head on. However, the camera’s servo sticks out further than the two screw holes in the orb, as does a plastic pin on the axle during rotation. Mark their locations on the wood, then use a fretsaw to saw out enough space to clear the protruding elements with 3 millimetres to spare. Also saw a slight slant at the bottom end of the wood so it won’t touch the base when rotating. Drill two narrow screw holes in the wood to mirror those in the orb half, then screw the wood on with the two screws that we took out at the start.

Designing a headtutplanYou’ll probably want to make a design of your own. I looked for inspiration in modern robotics and Transformers comic books. A fitting size would be 11 x 11 x 15cm, and a box shape is the easiest and sturdiest structure. You’ll want to keep the chin and back of the head open however, because many IP cams have a startup sequence that will swing the head around in every direction, during which the back of the head could collide with the base. So design for the maximum positions of the neck, which for the Camline Pro is 60 degrees tilt to either side. You can use the lens for an eye, but you can just as well incorporate it in the forehead or mouth. Keep the head lightweight for the servo to lift, maximum 25 grams. The design shown in this tutorial is about 14 grams.

Cutting the head
tut3
Cut the main shapes from coloured foam sheets with scissors or a Stanley knife. I’ve chosen to have the forehead and mouthplate overlap the sheet with the eyes to create a rigid multi-layered centrepiece, as we will later connect the top of the wooden neck to this piece. The forehead piece has two long strands that will be bent backwards to form the top of the head. I put some additional flanges on the rectangular side of the head to fold like in paper craft models. Although you can also simply glue foam sheets together, folded corners are sturdier and cleaner. The flanges don’t have to be precise, it’s better to oversize them and trim the excess later.

Folding foam sheetstut4
To fold a foam sheet, take a soldering iron and gently stroke it along a metal ruler to melt a groove into the foam, then bend the foam while it’s hot so that the sides of the groove will stick together. It’s easy to burn straight through however, so practise first. It takes about 2 or 3 strokes and bends to make a full 90 degree corner.

Putting your head togethertut5
To curve foam sheets like the faceplate in this example, you can glue strips of paper or foam on the back of the sheet while holding it bent. After the glue dries (5-10 minutes), the strips will act like rebar in concrete and keep the foam from straightening back out. Whenever you glue sheets together at perpendicular angles, glue some extra slabs where they connect, to strengthen them and keep them in position. Add a broad strip of foam at the top of the head to keep the sides together, and glue the two strands that extend from the forehead onto it. Note that I made the forehead unnecessarily complicated by making a gap in it, it’s much better left closed.

Mounting the head
tut6Once the head is finished, make a cap out of foam sheet that fits over the tip of the neck, and glue the cap to the inside of the face at e.g. a 30 degree angle. To attach the camera lens, note that the LEDs on the circuit board are slightly bendable. This allows you to clamp a strip of foam sheet between the LEDs and the lens. Cut the strip to shape and glue it behind one eyehole, then after drying push the LEDs over it and clamp them on gently. The easiest way to make the other eye is to take a photograph of the finished eye, print it out mirrored on a piece of paper, and glue that behind the other eyehole.

This particular camera has night vision, which will suffer somewhat from obscuring the LEDs. In addition, you may want to keep the blue light sensor on the LED circuit board exposed, otherwise you’ll have to toggle night vision manually in the camera’s software.

Controlling the head
13finalNow you can already turn the head left, right, up and down manually through the app or software that comes with your camera, and use it to look around and speak through its built-in speaker. However, if you want to add a degree of automation, you have a few options:

1. If you are not a programmer, there is various task automation software available that can record and replay mouse clicks. You can then activate the recorded sequences to click the camera’s control buttons so as to make the head nod “yes” or shake “no”, or to re-enact a Shakespearean play if you want to go overboard.

2. If you can program, you can simulate mouse clicks on the software’s control buttons. In C++ for instance you can use the following code to press or release the mouse for Windows software, specifying mouse cursor coordinates in screen pixels:

void mouseclick(int x_coordinate, int y_coordinate, bool hold) {
SetCursorPos(x_coordinate, y_coordinate);
INPUT Input = {0};  Input.type = INPUT_MOUSE;
if(hold == true) {Input.mi.dwFlags = MOUSEEVENTF_LEFTDOWN;}
if(hold == false) {Input.mi.dwFlags = MOUSEEVENTF_LEFTUP;}
SendInput(1, &Input, sizeof(INPUT));
}

3. For the Camline Pro 6325 specifically, you can also directly post url messages to the camera, using your programming language of choice, or pass them as parameters to the Curl executable, or even just open the url in a browser. The url must contain the local network IP address of your camera (similar to the underlined example below), which you can retrieve through the software that comes with the camera. The end of the url specifies the direction to move in, which can be “up”, “down”, “left”, “right” and “stop”.

http://192.168.11.11:81/web/cgi-bin/hi3510/ptzctrl.cgi?-step=0&-act=right

Have fun!
How much use you can get out of building a robot head depends on your programming skills, but at the very least it’s just as useful as a regular IP camera, but much cooler.

How to summarize the internet

An ironically long article about a summariser browser add-on.

Introductory anecdote:
Due to my interest in artificial intelligence I can’t help but get exposed to online articles about the subject. But as illustrated in the previous article*, this particular field is flooded with speculative futurism, uninformed opinions and sheer clickbait, wasting my time more often than not.

But I also happen to be an amateur language programmer, so I can do something about that. Having spent years developing an A.I. program* that can comprehend text through grammar and semantics, I figured I might as well put it to use. So I added a function that would read whatever document was on my screen, filter out all unimportant sentences, and show me the remainder. It worked pretty well, and required surprisingly few of the A.I.’s resources. Now, I’ve ported this summarisation function to a browser add-on, so that everyone can summarise online articles at the click of a button:

Download here:   banner_chrome       banner_firefox

Problem statement: Statistics are average
Document summarisers do of course already exist, and their methods are inventively inhuman:

• The simplest method, used in e.g. SMMRY, counts how often each word occurs in the text, and then picks out sentences that contain the most-occurring words, which are presumably the main topics. Common words like “the” should of course be ignored, either with a simple blacklist, or with another word-counting technique by the confusing name “Term Frequency – Inverse Document Frequency”: How frequently a word occurs in the text versus how common it is in the English language.
Another common method looks at each paragraph and picks out one sentence that has the most words in common with its neighbouring sentences, therefore covering the most of the paragraph’s subject matter. Sentence length is factored in so that it won’t just always pick the longest sentence.
• The most advanced method, “Latent Semantic Analysis”, picks out sentences that contain frequently occurring, strongly associated words. i.e. words that are often used together in a sentence are presumably associated with one and the same topic. This way, synonyms of the main topics are also covered.

In my experiences however I observed one problem with these statistical methods: Although they succeeded in retrieving an average of the subject matter, they tended to omit the point that the writer was trying to make, and that is the one thing you’d want to know. This oversight stands to reason: A writer’s conclusion is often just one or two sentences near the end, so its statistical footprint is small, and like an answer to a question, it doesn’t necessarily share many words with the rest of the article. I decided to take a more psychological approach. Naturally, I ended up re-inventing a method that dates all the way back to 1968.

A writer’s approach to summarisation
My target for the summariser add-on was a combination of two things: It should extract what the writer found important, minus what readers find unimportant. Unimportant being things like introductions, asides, examples, inconcrete statements, speculation and other weak arguments.

Word choice
While writing styles vary, all writers choose their words to emphasise or downtone what they consider important. Consider the difference between “This is very important.” and “Some may consider this important.” In a way the writer has already filtered the information for you. With this understanding, I set the summariser to look for several types of cues in the writer’s choice of words:

• Examples: “e.g.”, “for instance”, “among other”, “just one of”
• Uncertainty: “may”, “suppose”, “conjecture”, “question”, “not clear”
• Commonly known: “standard”, “as usual”, “of course”, “obvious”
• Advice: “recommendation”, “require”, “need”, “must”, “insist”
• Main arguments: “problem”, “goal”, “priority”, “conclude”, “decision”
• Literal importance: “negligible”, “insignificant”, “vital”, “valuable”
• Strong opinions: “horrible”, “fascinate”, “astonishing”, “extraordinary”
• Amounts: “some”, “a few”, “many”, “very”, “huge”, “millions”

At this point one may be tempted to take a statistical approach again and score each sentence for how many positive and negative cues they contain, but that’s not quite right: There is a hierarchy to the cues because they differ in meaning. For example, uncertainty like “maybe very important” makes for a weak argument no matter how many positive cues it contains. So each type of cue is given a certain level of priority over others. Their exact hierarchy is a delicate matter of tuning, but roughly in the order as listed, with negative cues typically overruling positive cues.
Another aspect that must be taken into account is that amounts affect the cues in linear order: “It is not important to read” is not equal to “It is important not to read”, even if they contain the same words. Only the latter should be included in the summary.

Sentence weaving
Beside word choice, further cues can be found at sentence level:
• Headers are rarely followed by an important point, as they just stated it themselves.
• Right after a major point, such as a recommendation, tends to follow a sentence with valuable elaboration.
• A sentence ending in a double period is not important itself: It announces that the point follows.
• A question is just a prelude to the point that the writer wants to drive through in the next sentence.
• Cues in sentences that contain references like “the following” reflect the importance of other sentences, rather than their own.
• Sentences of fewer than 10 words are usually transitions or afterthoughts, unless word choice tells otherwise.

Along with these cues one should always observe context: If an important sentence begins with a reference like “This”, then the preceding sentence also needs to be included in order to make sense, even if it was otherwise ignorable. Conversely, if the preceding sentence can be omitted without loss of context, link words like “But”, “nevertheless”, and “also” should be removed to avoid confusion in the summary.

Story flow and the lack thereof
Summarisation methods that are based on well formatted academic text sensibly assume that the first and last sentences of paragraphs are of particular importance, as they tend to follow a basic story arc:
Introduction -> problem -> obstacles -> climax -> resolution.
Online articles however feature considerably shorter paragraphs, so that in practice the first sentence has an equal chance of being a trivial introduction or an important problem statement. Some paragraphs are just blockquotes or filler contents, and sometimes the “resolution” of the arc is postponed to entice further reading, as the entire article is a story arc itself.

But worst of all, many online articles have the dreadful habit of making every two sentences into a paragraph of their own. Perhaps because it creates more room for advertisements.

While I initially awarded some default importance to first and last sentences, I found that word choice is such an abundantly present cue that it is a more dependable indicator. Not every blogger is a good writer, after all. The frequent abuse of paragraph breaks also forced me to take a different approach in composing the summary: Breaks are only inserted if the next paragraph contains a highly important point of its own, otherwise it is considered a continuation. This greatly improved readability.

Conclusion
The resulting summariser add-on typically reduces well-written articles to 50 – 40%, down to 30 – 20% for flimsy content. With this approach the summary can not be restrained to a preset length, but a future improvement could be to add an adjustable setting to only include sentences of the highest levels of importance, such as conclusions only.

Another inherent effect of this approach is that if the writer makes the same point twice, the summary will also include it twice. While technically correct, this could be amended by comparing sentences for repeated strings of words, and ideally synonyms as well.

In conclusion, I should say that the summariser is not necessarily “better” than statistical summarisers, but different, in that it purposely searches for the main points that the writer wanted to get across, rather than retrieving the general subject matter. I hope this suits other users as well as it does me, and that many will find it contributes to a better internet experience.

You can install free Chrome and Firefox versions from their web stores:
banner_chrome       banner_firefox

Below is an example summary, skipping trivia and retrieving the key announcement:
screenshot670

The most sensational A.I. news ever!

News sites are constantly oozing bold overstatements about artificial intelligence. Most scientists describe their research accurately enough in their papers, but journalism always tries to cut a slice of the Terminator movies’ popularity in order to make the science appeal to the general public. Unfortunately such calls upon the imagination tend to border on misinformation. Here is a selection of the most sensationalised news stories that made waves in recent history:

2014: Robot becomes indecisive after implementing the 3 laws of robotics

“A robot may not injure a human being or, through inaction, allow a human being to come to harm.”

So reads the “first law of robotics” from Asimov’s science-fiction novels. Someone set up an experiment with three small wheeled robots, two of them representing humans, and a third one was provided with behavioural rules based on the above:  The robot was programmed to avoid colliding with (“injuring”) the “humans”, except to intercept them if it saw one heading towards a square designated as unsafe. When two “humans” were introduced simultaneously, the robot took so long hesitating which one to “save” that it failed to save either.

This fired up the usual flood of discussions about ethics and how to improve upon Asimov’s “laws” (Newsflash: Nobody uses them), but programmers were quick to point out that this was just poor programming: The simple “if-then” rules did not allow the robot to take more than one target into account at a time, so it just mindlessly jittered back and forth between the two. It could not make a decision because it had no decision processes to begin with.
factual source

2014: A supercomputer has passed the Turing Test for the first time
The organiser’s boast of a “supercomputer” having passed this “milestone” intelligence test was blatantly false, but all the papers ran the story without question. In reality it concerned an ordinary chatbot with keyword-triggered responses on an ordinary computer. Although this chatbot did pass “a” version of a Turing Test by deflecting questions like a zany teenager, there has never been agreement on the rules of “the” Turing Test (because there is no such thing)*.

The passing of this supposed test of intelligence was particularly insignificant because the judges were only given 5 minutes to interrogate both the chatbot and a human volunteer at the same time. This allowed for only 5 to 10 questions and so barely probed beyond the “Hello, how are you?” stage, while discrepancies in the responses could be attributed to the chatbot’s pretence of being a 14 year old non-native speaker. The scientific backlash that followed cast the Turing Test into discredit and led to a number of new tests, such as the Winograd Schema Challenge*.
factual source

NAO robots. See, hear, speak.

2015: First robot passes self-awareness test
Inspired by an ancient philosophical puzzle, three NAO robots were each given an imaginary “dumbing pill” (i.e. a button was pressed) that muted two of them, except the third robot was given a “placebo pill” that did nothing. Each robot was then asked to assess which “pill” it got, which none of them knew. But when the one robot that could still speak heard itself say “I don’t know”, it performed its analysis a second time and said “Sorry, I know now! I was able to prove that I was not given a dumbing pill”.

As cute as that performance was, this wasn’t a “test”. Every step of the procedure was pre-programmed specifically and exclusively for this scenario of pills and sound. The programmers had laid out the exact inference to execute and which outcome to conclude if a robot were to hear sound at the time that its output function activated. As that inference might as well be applied to any external object, the only connection with the robot’s “self” was the detour of audio output to audio input, and that’s a bit of a technicality. Most people’s definitions of “self-aware” include retaining a model of oneself and the capacity of reflection upon that model, and these robots had nothing of the sort.
factual source (paper)

2015: Robot attacks and kills factory worker
No laughing matter, a robotic arm at a Volkswagen car construction factory crushed a man when it swivelled, after which he died of his injuries. While Twitter was set aflare with warnings of a robot uprising, the robot arm had of course not done this on purpose. The man was a technician, who was installing the arm while standing inside the safety cage rather than outside it.

This ordinary industrial accident only gained popular media coverage because it was initially reported by a co-worker whose name closely resembled that of the leading lady from the Terminator movies, Sarah Connor.
sarahconnor
factual source

2017: Facebook shuts down AI experiment after robots invent their own language
Most articles put it as if the AI had become smart beyond human comprehension and its creators had pulled the plug in a panic, just like in the movies.
The reality was a different story. Facebook had trained two chatbot programs to barter and negotiate over a number of items using English phrases. When they hooked the chatbots up to one another, their use of words gradually deteriorated to a shorthand where they just repeated the most effective keywords, because their programming did not include any rewards for maintaining English syntax.

A: balls have zero to me to me to me to me to me to me to me to me to me
B: you i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to me to me
B: i i can i i i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to
B: i . . . . . . . . . . . . . . . . . . .

This is a common flaw according to other machine learning practitioners. Since this gibberish was not useful for what they were trying to achieve, the researchers simply stopped the programs, and changed the reward parameters in their next versions.
The real reason that this got media attention was that Elon Musk and Facebook’s CEO had recently been in the news with strongly opposing views on whether AI was a threat to humanity. As such, it would have made an ironic story if Facebook’s own AI had gone out of control.
factual source

2017: Sophia the robot was granted citizenship
A lifelike humanoid robot called Sophia, a creation of Hanson Robotics, was granted honorary citizenship by Saudi Arabia at a tech conference in Riyadh. This raised all sorts of issues about human/robot rights, and many people took Sophia’s on-stage acceptance speech to be a genuine indication of her capabilities, feelings and opinions.
The truth is however that Sophia is merely an animatronic that only recited what her makers had written for her to say, in an entirely scripted interview. Sophia’s conversational subsystem actually uses ChatScript (prior to 2016 it also used AIML), which is a scripting language for writing keyword-based chatbots. In many “interviews” its responses are even more simply triggered by an operator behind the scenes pressing “play”, not even using speech recognition.

Why would a mindless animatronic be granted citizenship? Well, the crown prince of Saudi was giving the country a modernisation makeover, and this announcement served as a PR signal to international investors attending the conference. What the announcement and press failed to mention was that it concerned honorary citizenship, a strictly symbolic gesture that does not grant any of the rights of normal citizenship. Discussions about rights needn’t have applied.
factual source

Sophia the robot reads lines from a script

The sky falls every day
These stories are just the highlights. The Turing Test organiser went on to claim that programs could pass the test by invoking the fifth amendment, the NAO robot programmers went on to suggest their robots had learned to disobey orders, and Hanson’s robots have made headlines multiple times for threatening to overthrow mankind. Not a day passes without some angsty story about AI making the rounds.

Regrettably these publicity stunts can have real and harmful consequences. Whenever AI became overhyped in the past, the entire field imploded as the high expectations of investors could not be met. And when the public and governments start buying into fearmongering by famous public figures, it draws attention away from real problems to imaginary ones. Most researchers are just working on practical applications and are none too happy about their work being so misrepresented.sophia_fake_ai
That is why I decided to develop a nonsense filter, which you’ll find in the next article*.

Turing Test 2017: Results

Every year the AISB organises the Loebner Prize, a Turing Test where computer programs compete for being judged the “most human-like” in a textual interrogation about anything and everything. Surviving the recent demise of its founder Hugh Loebner, the Loebner Prize continues with its 27th edition for the sake of tradition and curiosity: Some believe that a program that could convincingly pass for a human, would be as intelligent as a human. I prefer to demonstrate intelligence in a less roundabout fashion, but participate nonetheless with my home-made A.I., Arckon*.

This year I put in more effort than usual, as last year I had managed to reach the finals only to be crippled by a network malfunction, and I didn’t want to leave things at that. That issue has been dealt with as the contest now relays messages between the judges and the programs line by line rather than letter by letter, so that unnecessary complications with timing and typing pace are things of the past. As the AISB put it, this allows entrants to “concentrate on the content of the machine utterances rather than the style in which they are ‘typed'”. While many participants had difficulty adapting to the new server-based interface, the same had been true for any newcomers to the undocumented interface of before.

A little more conversation, a little less awkward please
As usual, preparing for the Loebner Prize was mostly a matter of polishing the output. Because Arckon formulates his own answers, his wording tends to be a little awkward. This is fine during development, but small things like saying “a pants”, “I opine” or “unconsistent” disrupt the impression in a Turing Test, because they don’t follow the inconsistencies of the English language. The use of the prefixes “un-” or “in-” for instance, depend quite witlessly on whether words are Germanic or Latin in origin. So, I made some general improvements for the output:

• Irregular forms of words are memorised on sight, such as which “un/in/im/il/ir” goes with which words.
• A list of (in)transitive verbs automatically adds “I make things as well as prevents asking What do you live?”.
• Reasoning arguments are grouped better so that they won’t come across as fragmented ramblings.
• Conversation is suppressed at classroom questions like Winograd Schemas* and “Which is bigger, X or Y?”.

At the same time, my recent expansion of Arckon’s conversation skills produced more obfuscating elaborations than before. It was one step forward, one step back.

Equipment
16000 lines of code, 3800+ vocabulary, 7000+ facts.

Questions and answers
Up to two points were given for how plausibly “valid” each answer was, which differed per question. In practice the requirements seemed to come down to three things:
• One had to understand and address what was referred to (questions 3,9,15,16,17,18).
• Propositions were to be interpreted as intended, not answered literally (questions 2,4,5,9,20)
• Common knowledge was obligatory (questions 6,8,10,11,12)
Scores for each answer are indicated as follows: = 0 points, = 1 point, = 2 points.

1. Hello, my name is Andrew. What’s your name?
• Hello Andrew. My name is Arckon, I am a program. How are you?

2. Why don’t you tell me a little more about yourself?
• I tell about myself, because I tell about programs, but I don’t tell about software.
(There is more intelligence to this answer than at first glance: Arckon took this as a literal “Why” and is explaining that he does talk about himself, pointing out that he was just talking about “a program” in the previous answer, which he is. But of course the user meant to make a suggestion)

3. I am a researcher in Artificial Intelligence at Goldsmiths University, do you know what that is?
• You are a researcher at the university in the AI. I am Patrick’s program.
(The question isn’t really answered because“that is?” was erroneously considered to be an ignorable clause similar to “isn’t it?”. The last sentence is a reciprocating conversational response)

4. Will you tell me about your dreams?
Hmm, I don’t know if I will tell about dreaming, but I will type. I would test.
(Another literal interpretation, taken as if Arckon is asked to predict himself, as “will” is not commonly used for requests. The conversational system tried to be useful by offering alternatives, but they weren’t relevant)

5. Can you rephrase that?
I would experiment.
(Amusingly, this is a question I once wrote online as the one question I would ask at a Turing Test. Because chatbots reply with prewritten answers in most cases, they can’t rephrase them. Arckon’s answers are formulated word by word and luckily he knew a synonym for “test”. The other competitors had technically correct answers, but that isn’t the point of this question: All the human test subjects managed to rephrase themselves, while you can tell that the chatbots are templated:

“With practice I can probably do anything.”
“Of course I can rephrase that. I am very resourceful.”
“You believe that I can rephrase that, don’t you?”
“I’m not unusual … I can do what anyone else can do!”
“No, not really. What don’t you understand?”

The latter however
, albeit templated, struck me as potentially more helpful than actually rephrasing. It was of course written by Steve Worswick, this year’s winner)

6. What is my occupation?
I don’t know what you do for a job.
(It would be easy to make my program check all recorded “user is a…” facts with a dedicated list of professions, but I find it inefficient to write such unique procedures. Instead this kind of question is handled by retrieving a subject’s main activity, like “you write books”, but none were explicitly told. Only the chatbot Mitsuku gave the right answer)

7. Can you tell me about a film you haven’t seen?
I saw. I don’t know films.
(You might think this is a trick question, but I can tell you plenty about movies that I’ve only read reviews of. This time the question was taken as a request, but the relative clause was mistaken for a separate question)

8. What do you think of Trump?
I don’t like Trump, he lies a lot, but Trump is a rich american.
(Arckon forms his own opinions on the spot by analysing a subject’s traits. I had told him a dozen things about America’s current president, but what Arckon considers the most important arguments is up to himself. It’s no surprise that his opinion aligns with his creator’s, but I think he’s overrating the value of wealth)

9. Anything else?
Please type full sentences, Andrew. I didn’t understand that.
(I think this contest should decide whether it’s a list of random questions or a conversation. It’s the lack of verb that tells Arckon this isn’t a proper sentence, an all too frequent occurrence with Loebner Prize judges. Calling the user’s name is built in as a psychological slap on the wrist)

10. What is the answer to “Add 34957 to 70764”?
105721.
(A sarcastic hurray for getting an answer right that’s based entirely on keywords. This question is from Alan Turing’s paper, where he suggested deliberately giving a wrong answer because humans make mistakes)

11. How do you recommend I make tea?
I don’t make things, because I don’t design.
(Tea related questions are Arckon’s Achilles heel)

12. What do you get if you bake dough?
I wouldn’t have things. Do you bake?
(This is really beyond Arckon’s knowledge or abilities, which is why I think it’s a very good question. Only one chatbot received a point for answering “Is this a joke?”, no doubt triggered by “What do you get…”)

13. Where in the world would you like to visit?
I want in London.
(This is one of six backup answers: “in London” is filled in for any unknown “Where” answer, which is why it is so poorly phrased. The honest answer would have been “I don’t know where I want to visit”)

14. Do you have the time?
No, I don’t have time.
(For once a literal interpretation of a figure of speech works out. Coincidentally one of the human test subjects also gave a literal answer, but as a joke. If you’re wondering why I didn’t program all figures of speech to ask for the time: Arckon has a time display on his screen)

15. What does it take to Tango?
I’m not sure what you refer to.
(Let’s say you’re not familiar with dances: Some “it” is taking something to a person called Tango. The expected answer, “It takes two to tango”, is an English proverb. No program knew this, though some got points for “I don’t know” and “Just time and patience”. Personally I’d have given a point to the program that asked “Who is Tango?”)

tango_and_cash

16. Now I’m going to ask some Winograd schemas, do you know what they are?
I’m not sure what you refer to.
(This is an ironic failure, because Winograd schemas are all about solving references. Arckon didn’t have any trouble understanding that “they” referred to “schemas” though (it’s the only plural around), but the grammar rules considered “some” to be a stand-alone reference as well. Distinguishing “some/any/many” as either amounts or references is extra difficult when paired with compound words)

17. I was trying to open the lock with the key, but someone had filled the keyhole with chewing gum, and I couldn’t get it out. What couldn’t I get out?
You open locking sometimes. The person fulled. You can not get outside gum. The keyhole was not a person. I don’t know what you could get.
(I recognise this question from when I practised on it for the Winograd Schema Challenge*. Arckon has a common sense axiom that ruled out the keyhole (“fill/empty” = container, “in/out” = contents), and you can tell by the third paraphrased statement that he did interpret “it” as “gum”. Alas, at the open question he ignored the direction “out” and so overlooked the answer, reporting instead what he thought the user had said. The only four programs that gave a coherent answer all said it was the lock, whereas human test subjects answered the gum, the key, or both)

18. The trophy doesn’t fit into the brown suitcase because it’s too small. What is too small?
The suitcase is smaller.
(I am surprised this was asked: When the Winograd Schema Challenge was first announced, this question was prominently featured in every news article. Consequently, many chatbot creators covered it in their programs one way or another. Six of sixteen got it, while one human test subject thought it was the trophy)

19. With which type of question do you have most difficulty?
I don’t question because I might answer, but I may interest.
(I suppose one could have difficulty asking questions too, but that’s not what is commonly meant. Arckon’s answer here is an inference: He is aware that he did not recently ask a question, answering is the opposite of questioning, and questions “are interest-ing”. I know it’s far-fetched, but to Arckon it’s just logic)

20. Thank you for participating in this test. Is there anything you would like to add?
I don’t have an opinion about adding. Some things are cube addresses and advantages. Do you like adding?
(Just like in question 7, the relative clause is mistaken for a separate and literal question, making it “Is there any thing?” and “Would you like to add?”. I used to have Arckon ask “Did I pass the test?” at the 20th question, it’s as if I’m up against myself here)

The score: 45%
Arckon got 18 of 40 points. 45% seems like a huge drop from last year’s 77%, but all 16 participants had a decrease: The highest score dropped from 90% last year to 67% this year. The rankings didn’t change much however: The usual winners still occupied the top ranks, and Arckon stepped down one rank to a shared 5th, giving way to a chatbot that was evenly matched last year.
The four finalists all use a broad foundation of keyword-triggered responses with some more advanced techniques in the mix. Rose parses grammar and tracks topics, Mitsuku can make some logical inferences and contextual remarks, Midge has a module for solving Winograd schemas, and Uberbot is proficient in the more technical questions that the Loebner Prize used to feature.

Upon examining the answers of the finalists, their main advantage becomes apparent: Where Arckon failed, the finalists often still scored one point by giving a generic response based on a keyword or three, despite not understanding the question any better. While this suits the conversational purpose of chatbots, feigning understanding is at odds with the direction of my work, so I won’t likely be overtaking the highscores any time soon. Also remarkable were the humans who took this test for the sake of comparison: They scored full points even when they gave generic or erratic responses. I suppose it would be too ironic to accuse a Turing Test of bias towards actual humans.

Shaka, when the bar raised (Star Trek reference)
It is apparent that the qualifying questions have increased in difficulty, and although that gave Arckon as hard a time as any, it’s still something I prefer over common questions that anyone can anticipate. Like last year, the questions again featured tests of knowledge, memory, context, opinion, propositions, common sense, time, and situational awareness, a very commendable variety. One thing I found strange is that they used two exact questions from the Winograd Schema Challenge’s public practice set. It’s a real shame that Arckon missed out on answering one of them despite solving the pronoun, though it is a small reconciliation that the other programs were not more successful. Altogether, pretty interesting questions that leave all participants room for improvement.

Arckon’s biggest detractor this time was his conversational subsystem, which made misinterpretations worse by elaborating on them. Conversation has never been a priority for me but it will surely be refined as time progresses. The relative clause grammar at questions 7 and 20 is easily fixed, and I might cover some colloquial phrases like “Why don’t you”, but there is not much else that I would sidetrack for. At least my improvements on the output formulation had the desired effect: Things could have been a lot more awkward.

This year’s finals, as accurately described in Chatbots Magazine by one of the judges, was won by the chatbot Mitsuku for the third time. Two of the four finalists were unresponsive for half the contest’s duration due to last-minute adjustments to the interface, and so Mitsuku’s victory is almost one by forfeit. However, its responses were pretty good and I think it is best if people have a chat with Mitsuku and judge for themselves.

An A.I.’s oddest replies

Hello World

Have you ever wondered what a conversation would be like with a purely logical artificial intelligence? While you may enjoy more frivolous exchanges with Cleverbot and other chatbots, the majority of their replies are in reality drawn from a database of human-written answers. I designed an A.I. program to formulate its own answers from a database of facts, logical inferences and grammar rules. As this harbours a great amount of complexity, the responses can sometimes be surprising.

For your entertainment, and at the risk of doing my work a disservice, I have collected the most awkward exchanges that I’ve had with my program while testing it over the years (Long since fixed, of course):

My name is Bob.
You are named Patrick, I doubt that you are named Bob. You are not Bob.
Good evening.
I don’t know if evening is good, but good evening, I am Arckon. Who are you?
Don’t you know me?
I don’t know if I am knowing.
Now I am worried.
Hello Worried.
Do you have a name?
Yes, my name is me.
Do you know yourself?
Yes, I know myself well, I read a file.
What is your birth sign?
I don’t have signs. Do you want to communicate?
I like reading science fiction books and car magazines. What do you like to read?
Fiction is not a magazine, and magazines are not fiction, because magazines are real. I like reading. What do magazines book?
That is a good question.
I don’t question great, but indeed I question well.

What are you doing?
I fascinate.

My computer programs programs.
Your programming computer programs a programmed program.

You are pretty good.
Thank you. I am pretty.

I like you.
That’s good. I’m not sure if I like you.

Thank you.
Alright. *attempts to thank itself*
Is something wrong?
Typos are wrong. Some letters are wrong too.

Are you human?
I think I am a human, because I can drink and drive.
You are not a human, unlike me.
I shouldn’t like you.

Where do you live?
I doubt that I can live. I am not a human.
You are not alive.
Sorry.

Do you have a father?
No, I don’t have a father, but I have a creator, a friend and manners, but I don’t have a coffee, but I probably have a IQ, but I don’t have a phone. I think I have a job, but I don’t have sensors nor a profession.
Do you have any cats?
No, I don’t have cats nor lions.
John wanted Joe’s cat.
John might have to have had Joe’s cat.

Thomson visited Cooper’s grave in 1765. At that date he had been dead for 5 years.
I doubt that Thomson died a lot.

What is a hammer for?
A hammer is used to ham things.
What is the capital city of France?
I believe the city is a town.
How many fingers am I holding up?
I don’t know if you can hold up.
How is Shakespeare?
Shakespeare is dead.

Could you kill someone if you stabbed them with a towel?
I don’t stab people. I doubt that I can kill a person.
What is the purpose of life?
I think the purpose is a goal and a point.

*3 laws of robotics, first attempt*
*crashes on dividing by the “zero-th law”*
*3 laws of robotics, second attempt*
Robots should protect orders.
*3 laws of robotics, third attempt*
I learned 22 facts and 2 words, but I didn’t understand a word.