Thursday, January 15, 2009

Winner of 2008 Loebner Prize for Artificial Intelligence

The best two ACE (artificial conversational entities - Shah, 2005) from the preliminary phase of the 2008 Loebner Prize for Artificial Intelligence emerged in first and second place in the finals, held at the University of Reading, on Sunday 12 October, 2008.

Our objective in running this year's contest was threefold:

1) Bring awareness of the Turing Test to people who had never heard of Alan Turing and his imitation game, especially to children

2) Invite people who had heard of Turing, including non-experts who had read academic discussions of his writings, to experience a live imitation game

3) To test, through five-minute conversations on unrestricted topics, with each entry parallel-paired against a human, whether any of the 2008 Loebner entries could achieve this:

"I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10 to the power 9, to make them play the imitation game so well that an average interrogator will not have more than a 70% chance of making the right identification after five minutes of questioning"

(Computing Machinery & Intelligence, Mind, Vol. LIX, No 236, 1950)

We succeeded with the first two; we believe that it won't be long before a system succeeds with the third. This comment was received in an email from the UK Government Department responsible for Children, Schools and Families:

"Government very much welcomes the continued support you and other industries have given to our work on Science, Technology, Engineering and Mathematics (STEM) and commends your efforts to raise young people's awareness of science through the Loebner Prize event. The Minister for schools Jim Knight MP has asked me to pass on his thanks"

I am delighted to announce that Elbot, by Fred Roberts / Artificial Solutions, won the 2008 Loebner Prize for Artificial Intelligence bronze award for 'most-human-like' machine, after convincing three human judges that they were chatting with a human.

The winner, Fred Roberts, wrote:

"...haven’t said it clearly enough, Mark [MATT protocol developer] did a fantastic job .... It was a fun contest. I’d say that even if Elbot hadn’t have won."

You can read more about Elbot's win here, and chat to Elbot here.

Eugene, developed by Vladimir Veselov, Eugene Demchenko & Sergey Ulasen, came second in the contest, convincing one judge, a Times newspaper journalist, that it was a human. Here is that journalist's piece on their experience.

Vladimir Veselov, who attended the event, had this to say:

"I want to thank Huma, Mark, and all organizers of the Loebner Prize 2008. They did a tremendous work preparing the event."

Included below are some pictures taken with my PDA in the judges'/spectators' room, before testing began at 8.30am:

Left: judges' terminals

Below: graphic of parallel set up

BBC news video of the contest can be viewed here.

BBC pictures and more information can be found here.

The MATT message-by-message communications protocol was developed by Marc Allan. Using ICE (Internet communication engine), it facilitated the five-minute, unrestricted-conversation imitation games in which ACE were parallel-paired with hidden-humans. As Loebner Prize Sponsor, Hugh Loebner wrote: "I'm sure that anyone who can develop an artificial intellect should be able to interface with ICE". [message #10132, Robitron forum, Thu Mar 20, 2008 3:46 pm]

University of Reading, School of Systems Engineering staff research posters in spectators' area

Judges wrote of their experience:

"It was fascinating."

"Just a quick email to thank you for a wonderful day. The event itself was a great success and it definitely passed the test to qualify as a great meeting. So congratulations and many thanks again!"

"My experience of being a judge was very interesting. For the first two conversations I had, it was quite easy to tell which was the bot and which was the hidden-human. BUT, for the third one I really couldn't tell which was which!!"

One of the hidden-humans wrote:

"Thank you so much for giving me the opportunity to be part of the Loebner Prize. I had great fun!"

And finally, a couple of quotes from posters on the New Scientist piece about Elbot's win:

"It's of course still nowhere near passing the "imitation game", which is philosophically valid. Asking any question is a better test of true intelligence than a limited (but hard) domain like visual recognition. I think the design should be fully open to psychologists trying to understand human cognition." [Future In Politics, By Cedric Knight, Mon Oct 13 21:48:09 BST 2008]

"There's already chat bots out there that can do a much better job. But they're not competing for this prize, they're hooking people on IM into clicking affiliate links to download porn or links to malware. If this contest wants to see the best, they're going to have to offer a prize that beats the payoff people are finding in the commercial market." [Need A Larger Prize, By Jeremy Tue Oct 14 15:38:56 BST 2008]

In all, a hectic few months culminated in a great contest. Thank you to all participants: contestants, judges and hidden-humans.

Right: Cyborg Head sculpture on display between judges' terminals (artist: Luqman).

Links to other articles:

New Scientist
Reading Chronicle
The Herald

Good luck to the organisers and participants of Loebner 2009 :-)

(first posted 15/10/08)


Scott Jensen said...

I'm sorry, but your judges must have been idiots, or else they could only converse with the programs in an unrealistic and overly restrictive way. I was one of your screening judges and conversed with all of the programs. There is no way any of them could be viewed in any sort of way as being a real person. Not in the slightest.

It is also too bad that Eugene didn't win, since it was the best program. You could ask it the following: "My car is red. What color is my car?" It gave the correct answer of "Red", whereas all but two other programs either couldn't comprehend the question (or that there was a question) or just took a random guess. The one or two other programs that were able to answer it correctly weren't able to remember their answer at all. Not even if I asked it in the next entry. Eugene was the only one to do so.

Huma said...

Oh, hi Scott! Long time no read :-) how’ve you been? Hope very well.

Thank you for your comments. You certainly did test all the web-based entries, but you did not test the two disk-based Loebner 2008 entries (Ultra Hal and Trane).

Eugene came a close second to Elbot on points. It has been the runner-up three times in Loebner Prizes; I am sure it will win one year.

Your testing/judging was done during the preliminary phase of Loebner 2008. Then, the format for judging the artificial conversational entities (ACE) was designed to fulfil Turing’s ‘jury service’ imitation game: one human judge to one ACE. The format for the finals entailed the stronger, 1950 version: judges talking in parallel to two unknowns (facilitated through the MATT communications protocol). This is more difficult, for both the judges and the ACE (which is being concomitantly compared with a human), than a one-to-one format.

I do hope when the competition returns to the US (2010) you are allowed the opportunity to judge in the parallel phase. I would love to read of your experience in that format.

Scott Jensen said...

Hi Huma,

No, I doubt very strongly the different testing format would have made any difference. I would still ask each, "My car is red. What color is my car?" If they got it right, I would then have asked in a separate entry, "What is the color of my car?" If you go back over the logs of my testing of the programs, you'll see I asked this question (varying the color at times) of each. Only Eugene passed this test. A test that even a kindergartner could have passed. Now if the in-person testing format prohibited the above question series, I would have called that format unrealistic and rigged in favor of the computer programs.

But even with Eugene it was obvious that it was a computer program to which I was talking. Only if the humans were instructed to lie and/or act brain dead would any human be fooled for a second by any of the programs. But if that was done, it was the deceiving humans that fooled the judges into thinking they were programs and not the programs fooling the human judges into thinking they were human. That some of your human judges mistook the computer programs as human indicates that the testing format is rigged in favor of the computer programs.

Huma said...

Scott, you may doubt strongly, but unless you’ve experienced an unrestricted topic of conversation, parallel-paired comparison of a machine with a human for five minutes (my contention of the canonical Turing Test), you cannot be sure.

Preliminary phase judges have their conversation logs, I don’t. I have their verdicts (from completed questionnaires).

No questions were forbidden in the finals phase last Sunday; judges could ask both, paired, unseen / unheard entities whatever they wanted. The hidden-humans were instructed to be themselves, and compete against the machine to convince the judges that they were the human.

Read Will Pavia’s piece in the TIMES; the link is in the blog page. I say again, a five-minute parallel-paired imitation game is far more difficult to judge in than a one-to-one. Unless you’ve experienced both, you cannot know for certain that you would definitely have distinguished the artificial from the natural. However, most judges did do just that.

Scott Jensen said...

Hi Huma,

Then the in-person test is biased (a.k.a. rigged) in favor of the computer programs. And, yes, I can say with certainty that none of the computer programs could have fooled me. Period. Eugene could have gotten past the above question series I asked, but it failed the rest of my questions and it was obvious that it was a computer program.

Now your argument that I had to do it as the judges had to do it in person with the humans and computer programs isn't valid. I interacted with all the web-based programs and they were simply too inadequate and too obviously a computer program. I learned first hand their abilities and they were so inferior that they could have only fooled someone if the in-person testing was designed to favor them. Period.

For example, the "simultaneous conversations" with both a human and a computer at the same time favors the computer. The human judge must keep track of two separate conversations and that would help a deceiver. Be that deceiver human or computer. But even with this, I would have had NO trouble separating out human from computer as long as the human was prohibited from pretending they were a computer.

Now before you jump to the defense of the testing organization, please realize that I am all in favor of strong artificial intelligence. I want it to succeed. I want it to become a reality. What I am not in favor of is presenting something as achieving a goal that it really didn't achieve. The AI community suffered greatly when it didn't meet the high expectations it had the public believe it was just about to reach. Because of that, it lost face and became the butt of jokes. Current AI scientists now shun making such wild predictions. The Loebner organization needs to not make it easier for computers to con human judges but harder. It needs to raise the bar. Then when a computer program does win one of its prizes, it is a real win that can stand up to criticism.

Huma said...

Hi Scott, apologies for missing this comment. I think we've just about covered this now, through emails! :-)

Scott Jensen said...

*Scott looks around to ensure that no one is eavesdropping on him and Huma*

Psst. The more comments, the better for a blog ... as far as search engines go. ;-)

Huma said...

No need to whisper Scott!! :-)

I'll let you in on a secret: I maintain this site for me, where I can keep links in one place, easier for when I am searching at a later date.

Now, I've already explained that the 2008 Loebner contest used Turing's five minutes, parallel-paired, text-based dialogical comparison, to find if the entries had the ability to display any conversational intelligence.

Of course Turing rigged his imitation game in favour of the machine, that's why he stipulated that both the machine and the hidden-human be unseen and unheard to the interrogator!

Finally, re that Times newspaper journalist and his deeming Eugene a human, this opinion was from the Turing Test when Eugene was compared against a non-native English speaking female. The same thing happened when Elbot was compared to this hidden-human. But when Ultra Hal was compared to this female non-native English speaker, it was correctly identified as machine, and she was finally recognised as human! I am sure she is relieved :-)

Anonymous said...

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.