Sunday, January 18, 2009

2008 Loebner Prize: myths and misconceptions

The 2008 Loebner Prize at the University of Reading was the fourth Loebner contest for Artificial Intelligence held in the UK. This competition, which stages 20th-century British mathematician and code-breaker Alan Turing’s imitation game, was first held in the UK in 2001, at London’s Science Museum. In 2003, the University of Surrey hosted the Prize; in 2006, the 16th Loebner contest was held at UCL’s VR theatre, Torrington campus.

The previous four Loebner Prizes (2004, 2005, 2006 and 2007) staged Loebner’s version of Turing’s imitation game: a parallel-paired comparison of 'hidden' artificial conversational entities (ACE) with hidden humans, in unrestricted conversations of twenty-plus minutes (the ‘restricted conversation rule’ had been lifted in 1995). Four ACE competed in each of 2004, 2005 and 2006, and three entries were submitted to Loebner 2007. The judges from 2004 to 2007 included AI specialists, computer scientists, journalists and philosophers: Dennis Sasha, John Barnden, Kevin Warwick, Russ Abbott, John Sundman, Duncan Graham-Rowe, Ned Block (see Loebner Prize page). Professor Kevin Warwick is the only judge to have participated twice: in 2001, in the jury-service, one-to-one imitation game, and in 2006, in the parallel-paired contest format. He was therefore uniquely placed to assess any improvement in ACE performance between 2001 and 2006.

Presenting at ECAP 2007, we found some delegates unaware of the Loebner Prize. As reported at that conference, a downward trend was noted in the highest score awarded by any Loebner contest judge, from the 2004 Prize (highest score awarded to an ACE: 48) to the 2006 contest (28): ACE conversational ability appeared to be worsening, not improving. The lower scores were seen as a direct result of the change in contest format from the one-to-one, five-minute imitation game of 2003, when Pirner’s bronze-winning machine achieved “4=probably a human” from Judge4 (see Loebner 2003 results here). In 2006, Loebner introduced a character-by-character communications protocol between the judges’ terminal and the hidden conversational partners. No scores were recorded for the 2007 Prize. An approach was made for the University of Reading’s School of Systems Engineering to host the 2008 contest.

Considering the current state of technology, and feeling that machines were not yet ready for Loebner’s twenty-plus-minute parallel-paired ACE/human comparison, Warwick and Shah proposed five-minute, unrestricted-conversation, parallel-paired Turing Tests, staged in the Loebner 2008 finals for the very first time. We note that Turing himself wrote “after five minutes” (1950), which we take to be a first-impression imitation game. A message-by-message communications protocol was created especially for the 2008 contest to facilitate the five-minute Turing Tests. We then decided to open up the contest in two ways: by allowing developers, in the preliminary phase only, to submit web-based ACE, and by including a broader range of judges, to match Turing’s “average interrogator”. Sixteen developers expressed an interest in the 18th Prize, with thirteen submitting their creations: eleven via the web and two on disk. Thus, this year’s contest saw original ACE never before entered in any contest (Loebner or Chatterbox Challenge).

The preliminary phase, during June and July, involved over a hundred male and female judges, aged between 8 and 64, experts and non-experts, native and non-native English speakers (Cuban and Polish, for example), based as far apart as Australia and Belgium, India and Germany, France and the US, as well as in the UK. Between them, they selected six ACE to compete in the finals on Sunday 12th October 2008.

The preliminary phase showed us that programmes can, in some cases, only do what their developers have programmed them to do: the Lovelace Objection, raised by Turing himself in his 1950 paper. One system directed you to ask it “Which is larger? An orange or the moon”. The judge preferred to ask it another “Which is larger” question: “A house or a mouse”. The system, not being programmed for this question, failed to answer correctly. (I’m not even going to consider its non-understanding here, as we’d then have to detour into a long discussion on the meaning of understanding, because it is not fully grasped how understanding occurs in humans. Indeed, a lecture at the University of Reading on October 29th, by Professor Douglas Saddy, will present recent EEG/ERP experiments on sentence processing and some of the issues faced in doing brain imaging studies of cognitive processes, which show how time and timing in the brain play a central role in understanding language.)
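The "orange or the moon" episode can be pictured with a minimal sketch of a scripted responder. This is purely illustrative and is not the code of any Loebner entry: the dictionary, the function name and the fallback reply are all assumptions made up for the example, since the actual reply given by the system is not recorded here.

```python
# Hypothetical illustration only: this is NOT the code of any Loebner entry.

# A scripted responder "knows" only the exact question its developer anticipated.
SCRIPTED_ANSWERS = {
    "which is larger? an orange or the moon": "The moon.",
}

# Assumed fallback text; the post does not record what the system actually replied.
FALLBACK = "Let's talk about something else."


def scripted_reply(question: str) -> str:
    """Return the canned answer if the question matches the script, else a fallback."""
    # Normalise case and trailing punctuation before the lookup.
    key = question.strip().lower().rstrip("?.!")
    return SCRIPTED_ANSWERS.get(key, FALLBACK)


if __name__ == "__main__":
    print(scripted_reply("Which is larger? An orange or the moon"))
    print(scripted_reply("Which is larger? A house or a mouse"))
```

The second question has the same form as the first, but because it was never written into the script, the responder has nothing sensible to say about it.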

Press releases from the University succeeded in fostering interest among locals in taking part as judges or hidden-humans in the finals, along with journalists, philosophers and computer scientists. Others were invited, including Turing's biographer Dr. Andrew Hodges. Esther Addley points out in her Guardian piece here that our sample size, 12, was small. A look at previous Loebner Prizes will show that this number of Turing Tests allocated to each finalist ACE is more than in the 2003 contest hosted by the University of Surrey (sample size: 9) and three times the number of Turing Tests for each ACE in the Loebner contests of 2004-2007 (sample size: 4 in each of those four years). However, more resources and time would have allowed a much larger sample size.
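The sample-size comparison is simple arithmetic, and can be checked with a few lines. The sketch below uses only the figures quoted in the paragraph above (12 Turing Tests per finalist ACE in 2008, 9 in 2003, 4 in each of 2004-2007); the variable and function names are illustrative.

```python
# Turing Tests allocated to each finalist ACE, as quoted in the post.
TESTS_PER_ACE = {"2003": 9, "2004-2007": 4, "2008": 12}


def ratio_to_2008(label: str) -> float:
    """How many times larger the 2008 per-ACE sample was than the given contest's."""
    return TESTS_PER_ACE["2008"] / TESTS_PER_ACE[label]


if __name__ == "__main__":
    for label in ("2003", "2004-2007"):
        print(f"2008 had {ratio_to_2008(label):.2f} times as many tests per ACE as {label}")
```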

One journalist was deceived by Eugene: the runner-up ACE was considered human in its parallel-paired comparison with a non-native English speaker (who was deemed a machine). Turing did not state that human participants in the imitation game had to be native English speakers. Blay Whitby, in his “The Turing Test: AI’s biggest blind alley” (in Millican & Clarke, eds., 1996), wrote, “we feel more at ease in ascribing intelligence (and sometimes even the ability to think) to those entities with which we can have an interesting conversation than with radically different entities” (p.61).

I disagree with one academic’s suggestion that the "untrained" or the "man in the street" be excluded from judging in a Turing Test. I feel it important that everyone and anyone interested should be given the opportunity not only to participate in the discussion of building intelligent machines but also to interact with them in science contests. After all, we most probably will be sharing the planet with digito-mechatron companions, so why shouldn’t we all have a say in what we desire them to be or think like? Do we want all robots to be philosophers and computer scientists? Hell no, I want mine to umpire, with all incorporated technology, in international cricket matches!

Lastly, the reason for writing this page is the criticism of “zero progress” in the field of building systems to pass Turing’s imitation game. The blame for this cannot be laid on the ‘chatbot hobbyists’ and AI enthusiasts who develop ACEs, or on sponsors of Turing Test competitions, for they get no funding from research councils, etc. Any criticism rests solely with academia, which pontificates over Turing’s writings but fails to encourage any development towards building a system to pass his imitation game. You can’t have it both ways: deeming the Turing Test meaningless but happily agreeing to participate as a judge just to show how “poor” systems are. Do something about it: encourage new and young engineers to work with great minds from multidisciplinary fields on this fascinating problem. As Wilkes wrote in 1953: If ever a machine is made to pass (Turing’s) Test it will be hailed as one of the crowning achievements of technical progress and rightly so.

© Huma Shah 2008 (first posted 28/10/08)

Lay report/scores here. (Detailed analysis and evaluation of results from the preliminary and final phases of Loebner 2008 are underway and will be presented at conferences and submitted for journal publication.)

Update November 2009:

See 'Hidden Interlocutor Misidentification in Practical Turing Tests' (Shah & Warwick, 2009c), a response to some Turing interrogators' inaccurate evaluations, here.

Thursday, January 15, 2009

Winner of 2008 Loebner Prize for Artificial Intelligence

The best two ACE (artificial conversational entities - Shah, 2005) from the preliminary phase of the 2008 Loebner Prize for Artificial Intelligence emerged in first and second place in the finals, held at the University of Reading, on Sunday 12 October, 2008.

Our objective in running this year's contest was threefold:

1) Bring awareness of the Turing Test to people who had never heard of Alan Turing and his imitation game, especially to children

2) Invite people who had heard of Turing, including non-experts who had read academic discussions of his writings, to experience a live imitation game

3) Test, through five-minute, unrestricted-conversation imitation games in which each entry was parallel-paired against a human, whether any of the 2008 Loebner entries could achieve this:

"I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10 to the power 9, to make them play the imitation game so well that an average interrogator will not have more than a 70% chance of making the right identification after five minutes of questioning"

(Computing Machinery & Intelligence, Mind, Vol. LIX, No 236, 1950)

We succeeded with the first two; we believe that it won't be long before a system succeeds with the third. In an email from the UK Government Department responsible for Children, Schools and Families, this comment was received:

"Government very much welcomes the continued support you and other industries have given to our work on Science, Technology, Engineering and Mathematics (STEM) and commends your efforts to raise young people's awareness of science through the Loebner Prize event. The Minister for schools Jim Knight MP has asked me to pass on his thanks"

I am delighted to announce that Elbot, by Fred Roberts/Artificial Solutions, won the 2008 Loebner Prize for Artificial Intelligence bronze award for 'most-human-like' machine, after deceiving three human judges into believing that they were chatting with a human.

Winner, Fred Roberts wrote:

"...haven’t said it clearly enough, Mark [MATT protocol developer] did a fantastic job .... It was a fun contest. I’d say that even if Elbot hadn’t have won."

You can read more about Elbot's win here, and chat to Elbot here.

Eugene, developed by Vladimir Veselov, Eugene Demchenko & Sergey Ulasen, came second in the contest, convincing one judge, a Times newspaper journalist, that it was a human. Here is that journalist's piece on their experience.
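As a rough check of these results against Turing's prediction, the sketch below computes each ACE's deception rate and compares it with the 30% implied by "not more than a 70% chance of making the right identification". It assumes that each finalist ACE took part in 12 Turing Tests (the sample size reported for the 2008 finals) and that each deceived judge counts as one misidentification; the names are illustrative and are not part of any contest software.

```python
# Illustrative sketch: the names and the 12-test sample size are assumptions.

# Turing's 70% right-identification ceiling implies at least 30% wrong identifications.
TURING_THRESHOLD = 0.30


def deception_rate(deceived: int, tests: int) -> float:
    """Fraction of Turing Tests in which the judge took the machine for a human."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    if not 0 <= deceived <= tests:
        raise ValueError("deceived must be between 0 and tests")
    return deceived / tests


def meets_turing_threshold(deceived: int, tests: int,
                           threshold: float = TURING_THRESHOLD) -> bool:
    """True if the deception rate reaches the assumed 30% threshold."""
    return deception_rate(deceived, tests) >= threshold


if __name__ == "__main__":
    tests_per_ace = 12  # assumed: sample size reported for the 2008 finals
    for name, deceived in (("Elbot", 3), ("Eugene", 1)):
        rate = deception_rate(deceived, tests_per_ace)
        verdict = "meets" if meets_turing_threshold(deceived, tests_per_ace) else "falls short of"
        print(f"{name}: {rate:.0%} deception rate, {verdict} the 30% threshold")
```

Under these assumptions, three deceptions in 12 tests is 25%, so one more deceived judge would have taken Elbot past the 30% mark.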

Vladimir Veselov, who attended the event, had this to say:

"I want to thank Huma, Mark, and all organizers of the Loebner Prize 2008. They did a tremendous work preparing the event."

Included below are some pictures taken with my PDA, before testing began at 8.30am, in the judges/spectators room:

Left: judges' terminals

Below: graphic of parallel set up

BBC news video of the contest can be viewed here.

BBC pictures and more information can be found here.

The MATT message-by-message communications protocol was developed by Marc Allan. Using ICE (Internet communication engine), it facilitated the five-minute, unrestricted-conversation imitation games in which ACE were parallel-paired with hidden-humans. As Loebner Prize Sponsor, Hugh Loebner wrote: "I'm sure that anyone who can develop an artificial intellect should be able to interface with ICE". [message #10132, Robitron forum, Thu Mar 20, 2008 3:46 pm]

University of Reading, School of Systems Engineering staff research posters in spectators area

Judges wrote of their experience:

"It was fascinating."

"Just a quick email to thank you for a wonderful day. The event itself was a great success and it definitely passed the test to qualify as a great meeting. So congratulations and many thanks again!"

"My experience of being a judge was very interesting. For the first two conversations I had, it was quite easy to tell which was the bot and which was the hidden-human. BUT, for the third one I really couldn't tell which was which!!"

One of the hidden-humans wrote:

"Thank you so much for giving me the opportunity to be part of the Loebner Prize. I had great fun!"

And finally, a couple of quotes from posters on the New Scientist piece about Elbot's win:

"It's of course still nowhere near passing the "imitation game", which is philosophically valid. Asking any question is a better test of true intelligence than a limited (but hard) domain like visual recognition. I think the design should be fully open to psychologists trying to understand human cognition." [Future In Politics, By Cedric Knight, Mon Oct 13 21:48:09 BST 2008]

"There's already chat bots out there that can do a much better job. But they're not competing for this prize, they're hooking people on IM into clicking affiliate links to download porn or links to malware. If this contest wants to see the best, they're going to have to offer a prize that beats the payoff people are finding in the commercial market." [Need A Larger Prize, By Jeremy Tue Oct 14 15:38:56 BST 2008]

In all, a hectic few months culminated in a great contest. Thank you to all participants: contestants, judges and hidden-humans.

Right: Cyborg Head sculpture on display between judges' terminals (artist: Luqman).

Links to other articles:

New Scientist
Reading Chronicle
The Herald

Good luck to the organisers and participants of Loebner 2009 :-)

(first posted 15/10/08)