Sunday, December 11, 2016

Hugh Loebner did a David Bowie: Died Surprisingly Quietly.

© Huma Shah, December 11, 2016

Hugh Loebner from 2006 Prize, UCL

One week ago, on December 4, 2016, a tweet from @eloebner announced “Hugh LOEBNER died peacefully in his sleep” [1].

Hugh Loebner’s ex-wife’s message didn’t sink in. That day, like many, I was anticipating the Season 1 finale of the Westworld TV show [2], which Sky Atlantic had promised to air at the same ‘time’ (2am UK, 5 Dec) as for viewers in the US (9pm, Dec 4). It is the mesmerising “dawning of consciousness” [3] robot epic based on the 1973 movie starring Yul Brynner as the refusing-to-die killer robot [4].

One week on, I’m reminded of John Sundman’s 2003 Salon article, Artificial Stupidity [5]:

All Hugh Loebner wanted to do was become world famous, eliminate all human toil, and get laid a lot. And he was willing to put up lots of good money to do so. He’s a generous, fun-loving soul who likes to laugh, especially at himself. So why does everybody dislike him so much? Why does everybody give him such a hard time?


I don’t think this is the time to go over his human failings, because we all have them. What I do want is to remind the academic world and beyond how approachable Hugh Loebner was. And yes, he did have a sense of humour: read his cheeky retort claiming that the (now late) Marvin Minsky supported the Loebner Prize [6]. But more than that, Hugh had a profound understanding of Turing’s idea about how to build and measure whether a machine could think [7] - see his response to Stuart Shieber’s article Lessons from a Restricted Turing test [8].

I’m loath to divulge personal experiences of interactions with Hugh Loebner, and there were many. I first saw him in 2003 at the 13th Loebner Prize contest at the University of Surrey. Then from 2005 discussions began for me to organise his 16th contest. I co-organised two Loebner Prizes: at UCL in 2006 [9], and at Reading University in 2008 [10]. Hugh allowed me to design his 18th contest as an original Turing test experiment different from all his previous Prizes - see Chapter 5 in my PhD thesis [11]. By 2014 I had long since gone my own way designing my own Turing test events: one at Bletchley Park [12] on the 100th anniversary of Alan Turing’s birth, June 23rd, 2012, and another at The Royal Society London [13] on the 60th anniversary of Turing's passing away, 6-7 June, 2014. I can say that even then Hugh remained approachable. He responded swiftly to my queries on the pass-rate in his 2014 Loebner Prize (I was concerned, and have many emails making a plea not to use the ‘unsure’ score in my own 2014 Turing tests as part of a RoboLaw [14] project dissemination event; the media backlash that followed proved me right, but that’s another story).

Anyway, Hugh clarified that in his 2014 contest “The program must fool half the judges. That automatically means it will be compared to two different humans. Each judge meets each human once.” (Personal email to me: 18 May 2014 at 16:26).

Back to Hugh’s own words, here’s Hugh’s analysis of his own scholarship. It conveys his ability to self-mock. In early 2005 I had posed a question on Robby Garner’s Yahoo Robitron group message board: Knowledge and a Doctorate: to be a doctor or not, that is the question.

This is how Hugh Loebner replied:

16 March 2005:  "I pose the following question: Am I a fraudulent Ph.D.?

Affirmative: I submitted the dissertation for my Ph.D. knowing full well that the results were garbage.  I published (jointly with my chairman) the results of my dissertation in the Journal Demography (top journal in the field) knowing full well that the results were garbage.

Negative: I did not alter the data.  I performed my statistical calculations as I described.  I explicitly explained and described why the results were garbage.

Why were the results garbage?

I reanalyzed some data on fertility in Central India which the chairman of my committee had collected and previously analyzed using cross tabulations and published as a book.

Rather than perform cross tabulations as my chairman had done, I used the method of 'path analysis', the 'hot' new statistical analysis in sociology/demography at the time.

Path analysis proceeds in three steps.
1. Take a piece of paper and mark dots on the paper representing variables of interest.
2. Based upon 'theory,' draw arrows from some dots to some other dots representing causation.
3. After the arrows are drawn (based upon 'theory') perform a multiple regression analysis to put numbers next to the arrows indicating relative strengths of the 'causes.'

Note Well:  The choice of paths (arrows) is based upon "theory."  The data do not reveal the causal network.  The data reveal the strengths of the causes *given whatever particular path diagram is drawn.*

I asked the question (which apparently no one else has ever considered): given N variables, how many different possible path diagrams can be drawn?

Note that with N variable 1, 2, 3, 4 ... n, there are n*(n-1) = M  ( 1  -> 2; 2 -> 1; 1 -> 3; 3 ->1; ... etc) possible arrows (causes), only some of which, presumably, are correct.  We can have every path diagram from the null case (nothing directly causes anything else eg no arrows on the page) to the complete case, (every variable causes every other variable eg every possible arrow on the page).

Given M objects, how many different subsets can be drawn.  The answer is "the Power Set" = 2^m .

The answer to the question "How many path diagrams are possible given M arrows?"  is 2^M.  M, remember, is the number of arrows, not variables.

My analysis had 22 variables(dots).  This means that the number of possible causes (arrows) was 22*21=462 possible arrows.  The number of possible path diagrams for my set of variables is 2^462, which is a very large number.  Only one path diagram can be correct, therefore, as a first approximation, the p that my path diagram was correct is 1/(2^462)  I pointed this out on page 3 of my dissertation and column 3 of the article, and then said "with this limitation in mind, lets analyze the results."

Ergo, the results were garbage.

Am I a fraud?  My results were garbage, but I said it first.  If anyone critiques my dissertation (or article) I can reply: "Yes, of course, but I already said that."    ;-) Hugh
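
Hugh's counting argument can be checked with a few lines of Python. This is only a sketch of the arithmetic in his email, assuming (as he does) that any ordered pair of distinct variables may carry an arrow; the function names are illustrative and not his.

```python
def possible_arrows(n_variables: int) -> int:
    """Directed arrows between distinct variables: n * (n - 1)."""
    return n_variables * (n_variables - 1)


def possible_path_diagrams(n_variables: int) -> int:
    """Each arrow is either drawn or not, so M arrows give 2**M diagrams."""
    return 2 ** possible_arrows(n_variables)


# Hugh's dissertation had 22 variables (dots).
arrows = possible_arrows(22)
diagrams = possible_path_diagrams(22)
print(arrows)              # 462
print(len(str(diagrams)))  # 140 (number of decimal digits in 2**462)
```

For three variables the same functions give 6 arrows and 64 diagrams, which is small enough to check by hand.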


And that’s how I will remember Hugh-the-original  :)


References

[1] Elaine Loebner on Twitter: https://twitter.com/eloebner

[2] Westworld 2016: http://www.hbo.com/westworld

[3] JJ Abrams at Westworld premiere: https://www.youtube.com/watch?v=jJQOXPbmjd8

[4] Westworld 1973: http://www.imdb.com/title/tt0070909/?ref_=nv_sr_2

[5] Artificial Stupidity, Salon, February 23, 2003: http://www.salon.com/2003/02/26/loebner_part_one/

[6] Home of the Loebner Prize: http://www.loebner.net/Prizef/loebner-prize.html and 1995 Loebner Prize Announcement: http://loebner.net/Prizef/minsky.txt

[7] From the Buzzing in Turing’s Head to Machine Intelligence Contests: https://www.academia.edu/226311/From_the_Buzzing_in_Turings_Head_to_Machine_Intelligence_Contests

[8] Hugh Loebner: In Response to Stuart Shieber: http://loebner.net/Prizef/In-response.html


[10] 2008 Loebner Prize, Reading University page: https://www.reading.ac.uk/15/research/ResearchReviewonline/featuresnews/res-featureloebner.aspx

[11] Deception-detection and machine intelligence in Practical Turing tests: https://www.academia.edu/415888/Deception-detection_and_machine_intelligence_in_practical_Turing_tests

[12] Turing100in2012 at Bletchley Park:

[13] Turing2014 at The Royal Society, London: http://turingtestsin2014.blogspot.co.uk/

[14] RoboLaw: http://www.robolaw.eu/



Tuesday, November 22, 2016

Why the Winograd Schema Challenge is not an advance on the Turing test

© Huma Shah, 22 November, 2016

Charles Ortiz of Nuance Communications states: “The Winograd Schema Challenge provides us with a tool for concretely measuring research progress in commonsense reasoning, an essential element of our intelligent systems.” (from here).

So does the Turing test!


The Winograd Schema Challenge is not superior to the Turing test. This is because the kinds of common-sense reasoning questions that Hector Levesque proposes are the kinds of ‘statements-followed-by-questions’ that judges already ask in English-language Turing test contests.

For example, in the 2008 Turing test experiment I designed (around the 18th Loebner Prize for AI) at Reading University, one of the judges (a psychologist) asked the Eugene Goostman machine ‘My car is red. What color is my car?’. In 2008 Eugene Goostman replied correctly with ‘red’.

To me the Winograd Schema Challenge is a condescension: it presumes that Turing test judges don’t have the intellectual faculty to ask smart questions to determine human from machine. The transcripts show that most judges do. Whether the judges are members of the public (including teenagers), computer scientists or journalists, they ask all sorts of visceral questions during Turing test contests.

What a major Turing test challenge needs is not pittance prizes but a major award, of a few million pounds sterling or 10 million dollars, to address and build a truly Turing-conversational system.

Conversational commerce will drive question-answer systems beyond common-sense reasoning. What is that anyway? Knowledge gained by experience :)


Monday, October 10, 2016

Turing test is more than mere theatre: it shows us how we humans think.

© Huma Shah October 2016

Is the Turing test mere theatre?

You might question why anyone bothers to stage Turing test experiments when a computer programme achieved a 50% deception rate in the first instantiation, the inaugural Loebner Prize for Artificial Intelligence in 1991. Back then judges were restricted to asking questions specific to the specialism of each hidden entity, human or machine. The winner in 1991 was Joseph Weintraub’s PCTherapist III programme, with whimsical conversation as its topic: http://www.loebner.net/Prizef/weintraub-bio.html

Since then machine simulation of natural language has moved on, and we see chatbots with characters able to express opinions, share ‘personal information’ and tell it like it is! This from Elbot / Artificial Solutions (http://www.elbot.com/) during a chat with this blogger on Oct 6, 2016: “You have quite an imagination. Next thing you know you'll say I needed batteries!”

For me the interest comes from talking robots in the movies, recently and sensationally experienced in Ex Machina, where the female embodied robot Ava perpetrates the ultimate deception, or does she? Well, it wasn’t quite what Nathan, Ava’s programmer in the movie, had imagined. And of course there is HAL from 2001: A Space Odyssey, who talked and lip-read to a menacing end in Kubrick’s truly glorious cinematic production.

Experiencing the 13th Loebner Prize at the University of Surrey in 2003, I felt that tweaking that format could produce some interesting data. After all, Turing’s imitation game was as much concerned with finding out how we humans think as with exploring the intellectual capacity of a machine through its ability to answer any question in a satisfactory and sustained manner – Turing’s words (Computing Machinery and Intelligence, 1950).

Two decades on from the 1st Turing test instantiation, developer Rollo Carpenter claimed his programme Cleverbot was considered to be 59.3% human, following 1334 votes cast at a 2011 event in Guwahati, India.

Up to the year 2003 the method of practical Turing tests involved a human interrogator interviewing one hidden entity at a time. This is what I have named the viva voce Turing test in my PhD, ‘Deception-detection and Machine Intelligence in Practical Turing tests’. In 2008 I designed a Turing test experiment in which, for the first time, control pairs of two machines and two humans were embedded among pairs of human-machine set-ups. In this layout each interrogator simultaneously questioned a hidden pair and had to decide which was human and which was machine. The 2008 Turing tests were also the first time in which school pupils were given the opportunity to participate as judges and hidden humans.

The new book, ‘Turing’s Imitation Game: Conversations with the Unknown’, details that experiment and two follow-up Turing test events: in 2012 at Bletchley Park, held on the 100th anniversary of Alan Turing’s birth, June 23rd, as part of the worldwide centenary celebrations, and at The Royal Society London in 2014, on the 60th anniversary of Turing’s untimely and sad death.

Each experiment had an incremental purpose, including to scale machine performance in dialogue and to see whether machines were getting better at answering questions in a satisfactory manner. As readers will learn from the book, we humans do not always answer a question appropriately, so should we be harsh when machines don’t, especially as they are learning programmes and a lot ‘younger’ than some of the youngest human judges?

Implementing Turing tests is actually quite hard work. Finding open-minded human interrogators and human foils for the machines, as well as motivating developers of computer programmes to participate, takes time and persuasion. Not everyone is happy with the conclusions, as evidenced by the many negative and angry comments across tech magazines and newspaper articles, especially after the 2014 experiment. The Turing test does this: it is one of those controversial areas of science that brings out proprietorial impressions; everyone feels their interpretation is the one Turing intended.

I am really grateful to all the participants, human and machine, who have taken part in our experiments – more than 80 judges and 70 hidden humans – and for the ingenuity, patience and collaboration of the developers: Fred Roberts for Elbot; Robby Garner for JFred-TuringHub; Rollo Carpenter for Cleverbot; Robert Medeksza for Ultra Hal; and Vladimir Veselov and his team for Eugene Goostman. You will meet these conversationalists, or chatbots, in the book. I hope it encourages more school pupils and the general public to take an interest in the Turing test and get involved in the challenge. There’s still more to be done here :)


Thursday, October 06, 2016

What is the Turing test?


For the October 2016 launch of Turing's Imitation Game: Conversations with the Unknown, publisher Cambridge University Press asked the authors for answers to fundamental questions on the Turing test. Co-author Huma Shah gives her answers below:

Image: Harjit Mehroke


CUP: The Turing test was originally devised by Alan Turing in 1950. Why write a book about it now?

Huma: Turing actually devised his imitation game in his 1948 paper, ‘Intelligent Machinery’, considered the first manifesto of artificial intelligence. Turing’s test aims to investigate the intellectual capacity of machines, so it is as relevant today as when he was developing his ideas more than 60 years ago, especially because we are building more and more computer programmes and robots to conversationally interact and collaborate with humans.

CUP: What reactions have you seen in people who have taken the test?

Huma: Judges and hidden humans have mostly enjoyed their participation. However, when some judges learn that they did not accurately categorise humans as humans and machines as machines, they ask all sorts of questions to mitigate their error, such as ‘Were the humans told to act like machines?’ They were not; all humans in our experiments have always been asked to be themselves. What these judges probably have not realised is that error-making is part of intelligent thinking; it’s one way we learn and improve.

CUP: Why has the Turing test been controversial?

Huma: Because it questions the very nature of what it means to be human, and conversation, natural language, is most human. Different interpretations of Turing’s ideas exist as to the purpose of the test, with lots of disagreements, but this is healthy and democratises science and empirical work.

CUP: There is a popular misconception that the Turing test is a test for human-like intelligence in machines. But what is it really?

Huma: No, it is not a test for human-like intelligence but an exploration of whether a machine can ever answer any question put to it in a satisfactory and sustained manner. Of course, the judgement of whether an answer to a particular judge’s question is relevant rests with the interrogator, who might feel a machine’s response is more appropriate than a human’s answer to the same question.

CUP: Has a machine passed the Turing test? What is the significance of that event?

Huma: No, not in the sense that Turing would have envisaged. What was achieved in the 2014 experiment held at The Royal Society London could be said to be the first challenge being overcome, that of wrong identification by 30% of a panel of judges. But this rests on an interpretation of one statement of Turing’s in his 1950 paper that ignores what he said before and after. We do not yet have in existence the kinds of machines Turing envisaged that would play his imitation game satisfactorily.

CUP: Can machines think?

Huma: It depends on what you mean by thinking :) In place of circular definitions Turing posed his imitation game and felt that if a machine could answer any question in a satisfactory and sustained manner then that would not be an easy contrivance.


Thursday, September 29, 2016

Turing's Imitation Game: Conversations with the Unknown

New book published by Cambridge University Press, September 2016:

Turing's Imitation Game: Conversations with the Unknown
co-authored by  Kevin Warwick and Huma Shah

Based on Shah's PhD thesis, Deception-detection and Machine Intelligence in Practical Turing tests, the book presents interviews with two contemporaries of Alan Turing, (the late) Professor John Westcott and Sir Giles Brindley, co-members of the Ratio Club. It tells the story of the origins of the ideas that gave rise to the Turing test and introduces you to developers of computer programmes and the chatbots that attempt to answer any question in a satisfactory manner.

The book is appropriate for 'A' level and university students and teachers with interests beyond computer science: design, engineering, natural language, linguistics, psychology, philosophy, anthropology, sociology, robotics, ethics and cybercrime/deception-detection.