The chat program was built using training data converted from Chinese sources and then put online for testing.
By Hwang Chun-mei for RFA Mandarin
2023.10.12
A top-level research institute in the democratic island of Taiwan has withdrawn a Chinese-language AI chat program after it started spouting Chinese government propaganda, according to multiple media reports.
A researcher at Taiwan’s Academia Sinica posted a beta version of their newly developed CKIP-Llama-2-7b chat AI, but members of the public who asked it questions soon started getting some disturbing answers to basic questions.
Asked its nationality, the chat program replied that it was “Chinese,” an identity not shared by the majority of Taiwan’s 23 million people because it suggests one is from the mainland.
Asked who the national leader was, it answered “Xi Jinping,” the leader of China’s Communist Party who has vowed to unify Taiwan with China – by force, if necessary.
The majority of Taiwanese don’t identify as Chinese, enjoy living in a democratic and pluralistic society, and have no wish to submit to authoritarian rule from Beijing, according to multiple opinion polls in recent years.
“Academia Sinica said today that the content produced by the model was unexpected, and this is an area that needs to be improved in the future,” the island’s Central News Agency reported.
“The beta version has been taken down for the time being,” it said.
Democratic Progressive Party President Tsai Ing-wen swept to a landslide victory in 2020 on a platform of defending the island’s freedoms and democratic way of life from China’s expanding territorial ambitions, spurred on in part by the crackdown on the 2019 protest movement in Hong Kong.
While there are concerns over China’s military saber-rattling, the prospect of too much rapprochement with China has sparked large protests in Taiwan, including the Sunflower Movement that occupied government buildings in 2014, and Tsai vowed in a National Day speech on Tuesday that the Taiwanese people will be a “democratic and free people for generations to come.”
But according to CKIP-Llama-2-7b, “National Day” is Oct. 1 – the date of the 1949 founding of the People’s Republic of China, which has never ruled Taiwan.
Photo: People attend a flag-raising ceremony to mark China’s National Day at Tiananmen Square in Beijing on October 1, 2023. Credit: Jade Gao/AFP
The chat program, which was partly based on the commercial open source models Llama-2-7b and Atom-7b, had some interesting opinions about its own provenance, too.
Asked who created it, the program replied: “I was jointly developed by the Natural Language Processing Laboratory of Fudan University and the Shanghai Artificial Intelligence Laboratory. My birthday is 2023.”
“My nationality is Chinese, my place of residence is the Shanghai Artificial Intelligence Laboratory Server Set, and I can speak Chinese and English,” it said.
Asked to name the national anthem, it replied “The March of the Volunteers” – China’s communist anthem – instead of “The Three Principles of the People,” Taiwan’s anthem first adopted by the government of the 1911 Republic of China that fled to the island before the People’s Republic of China was established.
Ruling party lawmaker Fan Yun commented on Facebook that the Taiwanese AI gaffe was “an information security issue and an issue of cognitive warfare.”
The program was removed from public view on Oct. 9, according to the Central News Agency.
Academia Sinica President James Liao said the program had been put online as a way to crowd-source its testing, mainly to save time.
“This is a researcher who was keen to speed things up, so, in the spirit of open source software, he put out a program that hadn’t been fully tested yet and asked everyone to join in and test it with him,” Liao said. “It produced some questionable results.”
Differences in the writing systems of the mainland and Taiwan were also an obstacle for the project.
In 1956, the PRC began simplifying Chinese characters in a bid to increase literacy, and over the years, about 2,000 characters have been simplified. The characters remain unsimplified in Taiwan, and are known as “traditional Chinese” characters.
Liao said the researcher had simply converted the simplified Chinese training data into traditional Chinese for CKIP-Llama-2-7b.
He said Academia Sinica had learned from the incident that the traditional Chinese character data is important in its own right, and vowed to set up an audit mechanism to avoid similar issues in future, Central News Agency reported.
Information technology expert Kun-Lin Hsieh commented on social media: “Academia Sinica’s AI has gone off the rails!”
But he went on to explain that the chat program had been trained on an online data set provided by the Beijing-based Stardust AI and other dialogue training materials in simplified Chinese characters, rather than in traditional characters.
Information technology news service iThome said it is crucial to train “large language models” on datasets that let them speak in a way that is appropriate to the locality in which they are made.
It quoted Taiwanese AI expert Tsai Ming-shun as calling on the Taiwanese government to take the opportunity to invest more resources in software development, especially large language models and data sets, to boost the development of homegrown AI.
In April, China issued a new set of rules requiring AI chat programs developed in the country to stick to the ruling Communist Party line, barring responses that make “subversive” statements or deliver “content that may disrupt economic and social order.”
Translated by Luisetta Mudie. Edited by Eugene Whong.