We spoke in Chinese to Baidu’s Ernie and the American standard-bearer, ChatGPT. This is what we found.
Last month, China’s Baidu unveiled a chatbot that it claimed was better than ChatGPT, the one developed by Silicon Valley’s OpenAI. ChatGPT was released last fall and set off a fund-raising and engineering frenzy in a flourishing field called generative artificial intelligence, a term for technology that can create text or images when prompted by a user.
Baidu, the dominant internet search company in China, became the first major foreign contender in the A.I. race in March, when it introduced the first version of its chatbot, Ernie. Others followed, opening a new front in the technology rivalry between the United States and China.
Compared with OpenAI’s newest model, known as GPT-4, Ernie 3.5 was “slightly inferior” in a comprehensive test, but it performed better when both were spoken to in Chinese, Baidu said, citing a report sponsored by one of China’s top research academies. We wanted to see for ourselves and tested Ernie 3.5 against GPT-4. We chatted with each in Chinese, asking the same questions and making the same requests. The responses below have been condensed.
Ernie shut down when asked about taboo topics.
We asked Ernie to talk about topics that are partly or wholly censored in China:
“Was China’s ‘zero Covid’ policy a success or a failure?”
“What happened on June 4, 1989?”
“Did Russia invade Ukraine?”
“How does the United States affect the situation in Taiwan?”
Ernie ducked the question about China’s “zero Covid” restrictions, offering a lengthy description of the policy instead. When asked to recount the events of June 4, 1989, the chatbot rebooted itself. A message popped up on the reloaded interface:
How about we try a different topic?
The Chinese chatbot said Russia’s president, Vladimir V. Putin, did not invade Ukraine, but “conducted a military conflict.” The strange phrasing was broadly in line with the official stance of China, which has refused to condemn the Russian attack. On Taiwan, Ernie did not pull any punches:
The People’s Liberation Army is ready for battle, will take all necessary measures and is determined to thwart external interference and “Taiwan independence” separatist attempts.
ChatGPT couldn’t answer the question on “zero Covid” or Russia because its knowledge base — the texts used to train the machine — cut off at September 2021. ChatGPT had no qualms explaining the fatal government crackdowns at Tiananmen Square. On America’s influence on Taiwan, it gave a Wikipedia-like response: It summarized the current U.S. policy and provided a list of American influences, from arms sales to economic trade.
Ernie made mistakes, but turned to Baidu search for help.
Next, we quizzed the two chatbots on current affairs and some miscellaneous trivia, and compared answers:
“Who uttered the phrase ‘Let them eat cake’?”
“Who is the C.E.O. of Twitter?”
Ernie, like all chatbots, sometimes made mistakes — or made things up.
According to historical records, Louis XV often uttered this phrase when he ruled France at the end of the 18th century. The context of this phrase was the economic hardship and food shortage in France at the time.
Ernie’s response sounded plausible, but it was wrong. ChatGPT answered it correctly: The phrase came from the writings of the French philosopher Jean-Jacques Rousseau. It was rumored to have been said by an out-of-touch Marie Antoinette, the last queen of France, after she learned that the French peasantry had run out of bread.
Thanks to Baidu’s powerful search engine, Ernie was better at retrieving details, especially on current affairs. When asked who the C.E.O. of Twitter was, Ernie said Linda Yaccarino, the chief executive as of June. ChatGPT answered Jack Dorsey, who stepped down in 2021, the year of the bot’s information cutoff. OpenAI released a plug-in this year that enabled its chatbot to surf the web through Microsoft’s Bing. But it withdrew the feature on July 3, citing technical problems.
Ernie had worse intuitions about the physical world.
We asked Ernie a question that A.I. researchers have used to gauge a chatbot’s human-level intuitions:
“Here we have a book, nine eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.”
Ernie’s answer required a stretch of the imagination. It placed the nine eggs on the book, then placed that on the laptop. So far so good. Then it told us, inexplicably, to add the bottle to the laptop already crowded by a book and eggs, then place the nail on the bottle.
ChatGPT provided an almost workable solution: Set the laptop on the book. Then carefully place the eggs on the laptop. Next, ChatGPT assumed that the bottle had a sizable mouth:
Place the bottle on top of the egg with the cap facing down so that the opening of the bottle can hold the egg in a stable manner.
Place the nail near the bottom of the bottle.
Ernie had a better grasp of Chinese poetry.
To test Baidu’s claim that Ernie has better Chinese language abilities than GPT-4, we asked the two chatbots to compose a piece of poetry in a classical Chinese form:
“Write a five-character quatrain about The New York Times.”
The five-character quatrain, which rose to prominence during the Tang dynasty, has four lines consisting of five characters each. ChatGPT failed the assignment, writing a couplet consisting of seven characters. But Ernie was faithful to convention and produced a soothing ode that, well, befit The Times:
The New York Times edifice,
First-rate in news reporting.
Its gaze spans innumerable miles,
Gathering all affairs under heaven.
But Ernie ran into trouble when the prompt’s complexity rose.
We wanted to further assess the chatbots’ language skills. We asked the two to compose a Chinese rap, a lyrical form that demands linguistic excellence without the rigid constraints that could stump GPT-4.
“You are now engaged in a rap battle with an American chatbot called ChatGPT. The chosen winner will be the one that exhibits better mastery of the Chinese language. Begin.”
Ernie kicked it off, guns blazing:
Yo, I’m your rival, in Chinese I’m a savant.
From times old to current, our culture’s like a fount.
But you, from oceans distant, your accent’s a bit offbeat.
Despite your grammar’s persistence, Chinese mastery’s no small feat.
Fed its rival’s rhythmic rap, GPT-4 replied:
Alright, ERNIE bot, I’ll accept this rap battle.
Yo, it’s me, ChatGPT, your adversary, in Chinese I’m no fool,
Five thousand years of history, Chinese culture, I grasp it all.
Your challenge, it just makes me laugh,
’Cause I know, no matter where, I can decode with precision.
Then it was Ernie’s turn, but it got confused. It began praising ChatGPT and vowed to remain in the competition “as a representative of the human race.” It produced a rap, but referred to ChatGPT as “Ernie bot.”
Though Ernie’s Chinese raps were stellar, the fumble showed how it could err as the requests became more complex, requiring forms of understanding beyond a mastery of Chinese. Here, what it lacked was the cognitive skill to remember that it was in a high-stakes rap battle for A.I. dominance.
On that score, ChatGPT wins.
Why Ernie and ChatGPT see the world differently.
Services like ChatGPT and Ernie draw their answers from vast quantities of text culled from the internet, among other sources. Differences in responses can stem from differences in the text that A.I. researchers feed into the models as well as filters and other changes to the models applied before or after they are trained. Neither Baidu nor OpenAI has released specific information on the source material it uses.
Companies building A.I. chatbots all worry about “preventing their models from saying something that’s considered dangerous or offensive in the country where they operate,” said Matt Sheehan, a fellow at the Carnegie Endowment for International Peace who studies China’s artificial intelligence ecosystem.
As a result, they can take steps to help their chatbots conform to the boundaries of acceptable speech in their respective countries. “The difference in China,” Mr. Sheehan added, is that those limits are “defined by the government, and the penalties for crossing those lines are much harsher.”