Stanford AI team accused of copying China model

By LIA ZHU in San Francisco | chinadaily.com.cn | Updated: 2024-06-05 10:56


An artificial intelligence team from Stanford University has apologized for copying an open source large language model developed by Tsinghua University and tech firm ModelBest in China.

Two members of the Stanford team issued an apology on Monday on social media site X, formerly Twitter, while waiting for a response from a third member, Mustafa Aljadery.

"We apologize to the authors of miniCBM (a misspelling of MiniCPM) for any inconvenience that we caused for not doing the full diligence to verify and peer review the novelty of this work," Siddharth Sharma and Aksh Garg wrote in the post.

Sharma said he and Garg posted the project Llama3V online, and Aljadery was the person who wrote the code. "Our role here was to help him promote the model on medium and twitter," he said.

"After seeing the twitter posts about this topic yesterday (June 2), we asked Mustafa about proof of originality for Llama3V and asked for the training code but we haven't seen any response so far. We were waiting for Mustafa to take the lead but instead we are releasing our own statement," he said.

He said all references to Llama3V had been taken down and promised that they would be "cautious and diligent" in the future.

Sharma and Garg are computer science students at Stanford, according to their webpages; Aljadery, a computer science graduate of the University of Southern California, is based in San Diego, according to his LinkedIn page. His X account has been set to private and his website has been deleted.

China Daily emailed Sharma and Garg requesting comment but had not yet received a response.

The copying was first noticed by Chinese internet users after the team announced its project online on May 29, according to the tech news website TechNode.

Chinese firm ModelBest confirmed on June 2 that the Stanford team's large model project Llama3-V, like ModelBest's own MiniCPM, can recognize the Tsinghua Bamboo Slips, a collection of Chinese texts dating to the Warring States period (475-221 BC) and written in ink on slips of bamboo. The slips were donated to Tsinghua University in 2008.

The Stanford team's project not only replicates the Chinese model's newly developed ability to recognize those ancient texts but also reproduces the same mistakes, according to the report.

The development of AI is inseparable from the open source sharing of global algorithms, data and models, Liu Zhiyuan, co-founder of ModelBest, told Yicai Global, an English-language Chinese news website.

However, what Stanford's Llama3-V team did seriously undermined the foundations of open source sharing, including adherence to open source protocols, trust in other contributors and respect for previous achievements, he said.

Stanford University's Honor Code defines plagiarism as using another person's original work without giving proper credit to the author or source, including ideas and code.

A recent research-misconduct scandal at Stanford involved its former president Marc Tessier-Lavigne, who resigned in August 2023 after an investigation found serious flaws in studies he had supervised going back decades.
