Easy Questions Computers Are Terrible at Answering

Computers still have less common sense than a toddler.

The way artificial intelligence is progressing, you might think a robot takeover of the human race is right around the corner. Artificial brains can now drive cars, do legal research, recognize faces and beat the best human players at games like Go. Prominent technologists keep warning that AI poses a fundamental threat to humanity.

The good news for us humans is that computers still have less common sense than a toddler.

Just look at computer programs that are supposed to understand and process human language. Bots like Siri and Alexa frequently trip over sentences a 3-year-old would have no trouble with.

“I sneezed the other day and Alexa thought I was saying her name,” said my colleague Mike Murphy. The 3-year-old would have said “bless you.”

Thus, the Winograd Schema Challenge was developed. It’s a way to test just how much common sense these kinds of bots have. And as expected, the bots aren’t doing so well.

A Winograd Schema question is one that is extremely easy for humans to answer, but defies the cold logic of computers. Take the following example: “The man couldn’t lift his son because he was so weak. Who was weak, the man or his son?” In this case, “he” could logically refer to either the man or his son. But as humans, we know it would be silly to mention the son was weak in this context. For computers, the “he” is equally valid for both.

The challenge, therefore, is to build programs that can answer these kinds of questions with a success rate close to that of humans. Six programs, submitted by independent students and researchers, competed in the latest challenge, held last month at the International Joint Conference on Artificial Intelligence in New York. These six were right no more than half of the time. That's no better than guessing randomly, since each question has only two possible answers. Human subjects asked the same set of questions got more than 90 percent right.
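To see why 50 percent is the floor rather than an achievement, here is a minimal sketch in Python of a coin-flip baseline on two-answer Winograd questions. The data structures are purely illustrative, not the challenge's official format:

```python
import random

# Each schema pairs a sentence with its two candidate referents and the
# answer a human would give. (Illustrative format, not the official one.)
SCHEMAS = [
    ("The man couldn't lift his son because he was so weak.",
     ("the man", "the son"), "the man"),
    ("The trophy doesn't fit into the brown suitcase because it's too small.",
     ("the suitcase", "the trophy"), "the suitcase"),
]

def random_baseline(schemas, trials=10_000, seed=0):
    """Guess uniformly between the two candidates; accuracy converges to ~50%."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        _sentence, options, answer = rng.choice(schemas)
        if rng.choice(options) == answer:
            correct += 1
    return correct / trials

print(random_baseline(SCHEMAS))  # hovers around 0.5
```

Any program that actually resolves the pronoun, rather than flipping a coin, has to beat this baseline by a wide margin to approach the 90-plus percent humans score.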

None of the heaviest hitters in language processing—Google, Baidu and the like—participated, but one of the submissions did employ the sophisticated machine-learning techniques these companies have embraced for such tasks. So, it’s not likely even Google has bridged the gap between the robot and human scores.

Here is a sampling of a set of Winograd questions published by Ernest Davis, a professor of computer science at New York University who specializes in “common sense testing.” These easy questions should make you feel better about your prospects in the “robot economy” we’ve all been warned about.

* * *

#1. The city councilmen refused the demonstrators a permit because they advocated violence.

Q: Who advocated violence?
Answers: The city councilmen/the demonstrators

#2. The trophy doesn’t fit into the brown suitcase because it’s too small.

Q: What is too small?
Answers: The suitcase/the trophy

#3. Joan made sure to thank Susan for all the help she had received.

Q: Who had received help?
Answers: Susan/Joan

#4. Paul tried to call George on the phone, but he wasn’t successful.

Q: Who was not successful?
Answers: Paul/George

#5. The lawyer asked the witness a question, but he was reluctant to answer.

Q: Who was reluctant to answer the question?
Answers: The witness/the lawyer

#6. The delivery truck zoomed by the school bus because it was going so slow.

Q: What was going so slow?
Answers: The truck/the bus

#7. Frank felt vindicated when his longtime rival Bill revealed that he was the winner of the competition.

Q: Who was the winner of the competition?
Answers: Frank/Bill

#8. The man couldn’t lift his son because he was so weak.

Q: Who was weak?
Answers: The man/the son

#9. The large ball crashed right through the table because it was made of steel.

Q: What was made of steel?
Answers: The ball/the table

#10. John couldn’t see the stage with Billy in front of him because he is so tall.

Q: Who is so tall?
Answers: John/Billy