Anyone following the tech industry knows lawsuits at this point are a dime a dozen. A new entry filed this month against Microsoft-owned GitHub, however, challenges the foundational principles underpinning some of the most important artificial intelligence advancements of the past three decades.
The lawsuit, led by programmer and lawyer Matthew Butterick, specifically takes issue with GitHub’s Copilot, an AI assistant tool that offers programmers suggested snippets of code while they’re coding, sort of like the autocomplete function in Google Docs or Gmail. Copilot learned which kinds of code to suggest after scraping huge swaths of publicly available code on the open internet. During this process, the proposed class action lawsuit alleges, Copilot blatantly ignores or removes the licenses attached by software engineers and effectively relies on “software piracy on an unprecedented scale.”
“It is not fair, permitted, or justified,” the suit reads. “On the contrary, Copilot’s goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall. It violates the licenses that open-source programmers chose and monetizes their code despite GitHub’s pledge never to do so.”
In a separate blog post, Butterick argues Microsoft’s approach with Copilot creates a “walled garden” that makes things more difficult for programmers in traditional open-source communities. If that continues, he argues, it will starve open-source communities and, over time, eventually kill them.
Rather than accuse Microsoft and GitHub of violating copyright laws, Butterick’s suit accuses Copilot of violating the companies’ own terms of service and privacy policies, and of violating federal law that requires companies to display the copyright information of the materials they use. And while this particular suit zeroes in on Copilot, the principles of its argument potentially apply to many other tools developed with similar scraping methods.
“If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still,” Butterick said in a recent blog post. “AI needs to be fair & ethical for everyone. If it’s not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many.”
“We’ve been committed to innovating responsibly with Copilot from the start and will continue to evolve the product to best serve developers across the globe,” a Github spokesperson said in an email to Gizmodo.
Microsoft did not respond to a request for comment by the time of publication.
These concerns over AI copyright and compensation aren’t limited to programmers. Writers, musicians, and visual artists have all echoed them in recent years, particularly in the wake of increasingly popular and effective generative AI image and video tools like OpenAI’s DALL-E and Stability AI’s Stable Diffusion. Unlike previous AI training, which inelegantly stuffed billions of units of data into a learning set for an AI system, newer generative approaches like DALL-E will take images from, say, Pablo Picasso and transform them into something new based on a user’s description. That act of repurposing the data complicates traditional copyright thinking even further. Like Butterick, a growing chorus of artists and creative writers has recently gone public expressing understandable fears that the coming maturity of these AI systems threatens to put them out of a job.
Some companies are exploring novel ways to credit people whose work ends up influencing these algorithms. Last month, for instance, Shutterstock announced it would start selling AI-generated art from DALL-E (which was itself trained on human-made work) directly on its website. As part of that initiative, Shutterstock said it would launch a first-of-its-kind “Contributor Fund” to compensate contributors whose Shutterstock images were used to help develop the tech. Shutterstock said it was also interested in compensating contributors with royalties when DALL-E uses their creations.
Whether that plan actually works in practice remains uncertain, though, and Shutterstock is just one relatively small company compared with Big Tech giants like Microsoft. Industry-wide, proposed standards for compensating creators whose work inadvertently trains AI systems remain nonexistent.
Butterick’s beef with Copilot began almost as soon as the product was released. In a June 2021 blog post titled “This Copilot is Stupid and Wants to Kill Me,” the lawyer said he agreed with others who described the tool as “primarily an engine for violating open-source licenses.” Butterick compared Copilot’s effectiveness at writing code to that of a 12-year-old who learned JavaScript in a day. It’s also not always accurate.
“Copilot essentially tasks you with correcting a 12-year-old’s homework, over and over,” Butterick wrote.
Speaking of his recent suit, Butterick acknowledged the novelty of the complaint, and said it would likely be amended in the future. While likely the first legal effort of its kind to strike at the root of AI training, the programmer and lawyer said he believes it’s an important step to hold AI creators accountable in the future.
“This is the first step in what will be a long journey,” Butterick said. “As far as we know, this is the first class-action case in the US challenging the training and output of AI systems. It will not be the last. AI systems are not exempt from the law.”
This article was translated by 数字化转型网 (www.szhzxw.cn). Original author: Mack DeGeurin; translated by 郑亚茹; translation reviewed by 默然.

