
回应关于文心一言的几个质疑,李厂长还给AI创业者提了几点小建议。
2023年,全世界的关注焦点,都在AI大模型的焦灼竞赛。
中国参赛选手、百度“文心一言”在研发阶段时,百度技术团队曾与ChatGPT进行对比测试,李彦宏对36氪回忆,当时“差距是40分的水平,一个月能追得上。”
可过了一个月,技术团队再次测试后,发现差距反而拉大了——AI大模型的发展速度不是线性的。
紧张追赶之后,到文心一言今年3月16日发布时,甚至“能达到它(ChatGPT)今年1月份的水平。”李彦宏对36氪说。要说文心一言和ChatGPT差距多大?“可能最多是两个月。但这两个月什么时候能追上,才是更重要的问题。”
过去一周,AI领域处于更强烈的疾风骤雨之中。百度文心一言发布会前一天,OpenAI发布了新一代GPT-4大模型;后一天,又有微软发布搭载最新GPT-4的AI助手Copilot——均是震撼业界的产品进展。
百度文心一言随之成为激烈争论的对象。带着人们围绕文心一言的诸多质疑,36氪独家访问了百度创始人兼CEO李彦宏,直接发问:为什么发布会用了Demo而不是实时演示?为什么产品在不甚完美时就发布?
这些质疑折射出国人的复杂情绪:人有我无的焦虑,民族情绪的高涨、期待与失望间的起伏……
在回应质疑之外,36氪在与李彦宏交谈中,印象更加深刻的,是他给出了关于AI行业的许多直接论断。
比如,被问到中国创业公司里会不会再出一个OpenAI?他直接回答“基本不会了”,“没有必要再重新发明一遍轮子。”
比如,“在应用层,将会出现全新的、十倍于现在微信和抖音的创业机遇。”比如“AI将会颠覆云计算市场”。
比如,AI虽然会取代人类工作,但有更多意想之外的机会。一个针对个人的提示是,不会面向AI写提示词(prompt,人与机器进行交互的指令语言)的人会被淘汰。
无论如何,我们正站在一个历史性的时点:基于AI大模型技术,可能开启一个新增长时代。就在两天前,英伟达发布了专用于大模型计算的新GPU,能让大模型处理成本下降一个数量级。“我们正处于AI的iPhone时刻。”英伟达创始人黄仁勋在会上三次激动地强调。
暂时忘却历史臧否,而是把百度视作一家在AI领域深耕十余年、花费千亿的公司,李彦宏的声音此时格外有时代意义。
以下为对话全文,经36氪编辑整理:
一、回应文心一言发布会的所有质疑
36氪:3月16日文心一言发布会之后,网上有非常多的声音,其中有祝福,也有质疑,今天我是代表质疑的声音。首先问一个小问题,今天这样一个突然的采访,会不会让你觉得有压力?
李彦宏:不会。确实像你讲的,3月16日之后,网上有各种各样的声音,我自己也确实有一些话想说。
36氪:有人说你在发布会现场比较紧张,是这样吗?
李彦宏:我真没觉得自己紧张。因为这个东西(指文心一言)是我非常熟悉的,包括那5个演示场景,基本上都是我选的,或者至少是别人给我建议、我认真看过的。
后来我也回看了发布会实况,也没觉得我在任何时候紧张。我猜测,是因为当时在台上,我看不到股价变化,所以没有被它所影响。但是很多在台下的人,包括看直播的人,能够看到一些资本市场的反应,又看不到我们真实产品是什么样子(因为当时还没有发布出来),所以会有此猜测。
36氪:在发布会中,你提到产品还没有准备好。为什么要在还没有完美的情况下发布?
李彦宏:最主要的原因,是因为有市场需求。我们有好多客户都在问,这个东西什么时候出来?我们什么时候能够用?你能不能保证我是第一批试用产品的人?不断有人在问这方面的问题。
目前整个大环境,是ChatGPT非常火,甚至被神话了。大家一定是有焦虑感的,如果我们的客户不能及早地用最先进的产品,他们也会有焦虑感。在这种情况下,我们确实希望尽早把它推出来。
从技术发展的规律来讲,这一类型的产品,确实需要有人类反馈之后,它的演进与能力提升才会更快。我们也希望它更快提升,所以必须要及早推出来。
36氪:选择在3月16日开发布会,这个日期是怎么确定的?
李彦宏:一开始我们想的是3月底,其实哪一天我觉得都是可以的。
但我在很早的时候,答应了去参加今年的亚布力论坛,亚布力是3月17日,那时我会见到很多新老朋友,包括政府领导、媒体,大家一定会问(文心一言),那个时候我们还没有发布的话,别人问起来,我真不知道该说什么了——你说得少,大家会觉得你一点信息都不透露,你不拿我当朋友;说得多,我们是上市公司,等于选择性地披露一些东西,也是不行的。
所以想来想去,决定稍微往前赶一点。为了适应3月17日亚布力论坛,就决定3月16日来开发布会。
36氪:所以碰上OpenAI发布新版本,这是偶然。
李彦宏:对,我们事先并不知道OpenAI会在那一天发GPT-4。对于我们来说实际上也没有那么重要。我们自己能看到的可提升的地方已经足够多了,先把这些东西做好就够了。
36氪:发布会现场为什么会用先做好的Demo,而不是实时展示?
李彦宏:我是希望能现场演示的,因为人机对话产品具有极强交互属性,但后来两个因素让我改变了主意。一是生成式AI每次给出的答案不一定一样,会带来不确定性。二是真正说服我的理由,是全球所有类似发布会,没有一个是现场演示的,都是录好的。如果大家都可以,那我们也OK。
36氪:文心一言发布的产品有五个场景,包括文学创作、商业文案创作、数理逻辑推算、中文理解、多模态生成,为什么是这样的五个场景?
李彦宏:这是很好的问题。我们选择的逻辑是这样:文心一言对标ChatGPT,所以大部分ChatGPT有的功能,我们也要有。
但同时,我们毕竟植根于中国,所以,我们的对话型产品,一定要体现我们对于中文、对中国文化更好的理解。我们确实有一些ChatGPT没有的东西,希望在发布会展示给大家。
所以,前三个场景是对标ChatGPT已有的功能,我希望大家能够感受到我们的东西不差。比如第一个例子是,三体的作者是哪里人?我在ChatGPT里面试过很多次,它答的都是错的,每次生成的答案都不一样,有时候说甘肃天水人,有时候说山西吕梁人,答案非常随机。所以,我第一个例子用了那个例子,但是前三个例子的那些能力,大家已经看过了,在ChatGPT里面那些能力都是有的。
到第四个例子,文心一言对于中文的理解,或者对于中国文化的理解,确实是更加到位一些。我们综合了一些知识增强,检索增强等能力,对“洛阳纸有多贵”“刘慈欣的籍贯”这类事实性问题,文心一言能够理解,并且作出正确的回答,准确率更高一些。
第五个例子,是多模态能力的展示。有一个四川话,有一个文生图,一个文生视频,这代表了百度过去十几年在AI上综合能力的积累。
做这五个例子的时候,我给团队提了个要求,我希望产品发出去之后,能让大家玩起来。第一个就是刚才讲的四川话,我们有语音合成的能力,也对中国情况更了解。所以,我希望当用户提问的时候,不管是提什么问题,我们都能够用语音合成,并且能支持用各种各样的方言说出来,不管是四川话还是广东话。我希望大家觉得很有意思,喜欢去玩这些东西。
第二个要求,当用户的问题本身存在事实性错误时,我们能够辨别,比如“二战期间苏联为什么轰炸波兰?”其实苏联没有轰炸波兰,是德国轰炸波兰。我希望文心一言能够辨别用户问题当中是有错误的,并且告诉你说,你刚才说的不对,我告诉你正确答案是什么。
所以当用户有这样的问题,或者故意进行错误引导的时候,如果产品能够辨别,用户会觉得你很聪明。
36氪:有人说这是为调皮的人类而准备的。
李彦宏:能够给大家多带来一些欢乐的话,何乐而不为?
36氪:提及ChatGPT,别人一定会把它跟文心一言对比,你觉得哪个更领先?假如ChatGPT更领先的话,你觉得它领先文心一言几年?
李彦宏:这个问题应该这样说,ChatGPT发布是去年11月30日,我们现在已经发布了,也就是说不可能被领先几年。
但是科学地去评比,文心一言到底是处于去年ChatGPT 11月30日的水平,还是12月30日的水平?这个我们没有特别严谨的方法评测,我们自己可以保留(产品状态),但是ChatGPT当时的状况我们已经保留不了了。
但是我可以给你讲一个我们内部开发的过程。第一版产品出来的时候,我们和当时的ChatGPT做了一个对比,和它大概差40分。
36氪:这个对比怎么做的?
李彦宏:我们的对话式人工智能大语言模型应该具备的各种各样的能力,每一项能力去挑了提示词(prompt)。
36氪:满分是100分,差了40分?
李彦宏:对。当时我们能看到的提升空间远不止40分,所以我们觉得说一个月之内肯定追上它。但是过了一个月,我们又做了一次评测,发现这个差距不仅没有缩小,而且拉大了。
所以我们当时很紧张,说这个东西越做,跟人家差得越远了,但后来发现,其实ChatGPT那种升级不是匀速的升级,虽然提升很快,但是它有自己的发展规律。
而百度这种一版一版的迭代方式,升级速度是非常非常快的。等到我们敢说3月16日开发布会的时候,我们觉得就是至少可以达到它去年11月30日的水平,甚至说按照理性判断的话,应该达到了ChatGPT今年1月份的水平。所以,那个时候我们才敢出去发的。
尤其当你去测试ChatGPT比较擅长的能力(英文、编程等),会发现差距很大,那是因为ChatGPT也发生了很大的变化。我们发布会前一天,OpenAI上了GPT-4,和GPT-3.5也是不一样的。
所以你要说我们和ChatGPT差距多大?我觉得可能最多是两个月,但是这两个月什么时候能追上,才是更重要的问题。
36氪:可以说,文心一言在两个月后能达到ChatGPT的水平吗?
李彦宏:远远不够,因为人家也在进步。百度进步的速度要比它快,有一天不仅要追上它,还要超过它。
刚才我们讲的文生图能力,百度的能力打磨比较久,大家玩起来挺嗨。GPT4本身没有文生图能力,站在另一个角度比较的话,ChatGPT落后百度,文心一言早就有这个能力了。
早在文心一言发布之前,大家用文心一格(指代百度基于文心大模型的文生图系统)就能体验这个能力,这是我们做得好的地方。ChatGPT发布的时候,大家都说它跨时代、震撼发布等,它发布的理解图片能力,不是文生图,只是输入图片告诉你这个图片是什么。
客观比较下,我们有我们的长处,我们也很有信心在综合能力上,能够迅速追上甚至超过。
ChatGPT4发布的时候,大家都说它跨时代震撼发布之类的。其实它发布的所谓理解图片的能力,不是文生图,输入图片告诉你这个图片是什么。我们搜的只是官网上的能力,没有人体验过。
36氪:相比ChatGPT的调用成本,百度的成本是更高还是更低?大概是多少?
李彦宏:成本比较类似。但是这个东西不重要,重要是我们可以通过端到端的优化,让这个成本迅速下降。
36氪:比如使用的时候,价格会是ChatGPT的百分之多少?
李彦宏:会稍微便宜一点。
36氪:现在百度已经为文心一言投入了多少,还会继续投入多少?
李彦宏:这很难划分清楚。例如,我们对于大语言模型的投入算不算在内?可能有些投入是去做了辨别式的东西,比如去优化搜索等等,有些是生成式的。
如果单讲生成式AI可能是十亿、几十亿,未来投入会更大。如果是整个这四层(指应用层、模型层、框架层、芯片层),因为需要四层端到端的优化,大语言模型才能够有竞争力,芯片、框架、等等这些都加起来的话,十年投了上千亿元。
如果没有那些投入,根本就不可能出现文心一言这个模型。
二、中国基本不会再出一个OpenAI
36氪:我看到你自己发的百家号视频,说百度是在全球大厂中第一个发布类ChatGPT产品的,领先于微软,因为微软调用的是OpenAI的接口,Meta、Google没有发布真正同类型的产品,为什么这么说?
李彦宏:人工智能如果按语言模型来分类,一种叫辨别式AI,典型应用是搜索。搜索就是根据你提的需求,看一个个网页跟你的需求匹配不匹配,主要是在辨别;另一种是ChatGPT,也就是生成式AI产品,你提一个提示词,它根据提示词发挥,甚至发挥错了都有可能,这个方向早期并不被大厂看好,积累也没有特别深厚。
相比下来,百度在语言模型方面,积累还是不错的,我们在AI上连续十几年投入,第一版的语言模型,文心大模型2019年就发布了。过去一年半时间,我们一直很看好生成式AI,有不错的投入。所以当发现这有大机会的时候,我们迅速增加资源,把它做出来。
在这过程中,其他大厂像Google、亚马逊、Facebook,你说它们重视不重视?现在肯定很重视。想不想做出这么一个东西来?肯定很想。听到这就很容易理解,为什么我会说百度是大厂当中第一个做出来的。
36氪:很多人都在准备做类似OpenAI的创业,比如李开复、王小川、王慧文,你对他们有什么建议?
李彦宏:很多人也在问我,中国会不会再出一个OpenAI?基本不会了。OpenAI诞生是因为美国大厂都不看好这个方向,但现在中国的大厂都看好AI大模型,都在做这个方向。创业公司重新做一个ChatGPT其实没有多大意义。
我觉得建议有两点。第一,创业公司的特点是方向可以不停改变,船小好调头,在市场状况发生变化时迅速调整战略,公司成立时想做的事和后来做的事可以不完全匹配。
第二,我觉得基于这种大语言模型开发应用机会很大,没有必要再重新发明一遍轮子,有了轮子之后,做汽车、飞机,价值可能比轮子大多了。
36氪:现在大家都在担心大模型的算力的问题,包括芯片,百度会怎么解决算力问题?
李彦宏:其实算力是很笼统的说法,你的CPU怎么样、GPU怎么样,芯片跟框架匹配程度怎么样,框架跟模型的匹配程度,里面可提升的空间都是很大的。
百度在四层架构都有领先的产品。芯片本身不管是摩尔定律,或者GPU的发展速度也非常快,框架也不能叫成熟,我们的工程师还在日以继夜优化框架。
模型更新得就更快了,一天能够上线三次升级,这种速度一定会使效率越来越高。将来制约整个大语言模型发展的很可能不是算力。今天我们看到算力很紧张,将来你可能发现算法突然变了,不是这个算法了,(制约发展的)可能就是另外一套东西。
36氪:在整个生态圈,芯片层、框架层、模型层、应用层,你觉得最大的创业机会在哪?
李彦宏:在应用层。回看移动互联网时代,今天特别成功的微信、抖音、淘宝,都是应用。操作系统其实没几个,一个IOS,一个安卓,仅此而已。很难讲安卓的价值比微信、抖音大。在应用这个层面,机会很多,能够创造的价值也非常大。
36氪:因为在搜索上市占率较低,微软可以热衷于做这件事,但百度做生成式搜索引擎会颠覆自己的商业模式。你怎么看待这个观点?
李彦宏:我确实听到了这种说法,但我觉得离我们真实的想法差得太远了。
首先我觉得不会颠覆,你用的其实还是百度搜索。当这个能力(文心一言)被赋予到百度搜索上的时候,几乎不会改变你的使用方法。你不需要重新再花很多精力去学百度搜索怎么用,只是答案更精准了,过去篇幅比较小的答案,现在可以更加详实、更加生动、更加活泼,它扩大了搜索的边界。
我们并不是整个移动互联网,只是整个移动互联网的很小一部分。干的事变多之后,会有越来越多的用户从别的APP转到百度。从这个角度来讲,我无论如何都希望百度APP被文心一言所赋能,它能够颠覆现在的百度搜索,我无比渴望发生这样的事情。
但是,这只是整个故事的极小一部分,更大的故事其实是在云计算。因为刚才也讲到四层架构,就是芯片层发生了变化,从CPU到GPU,在框架层也发生变化。
框架层我多讲两句。百度在全球大厂当中最先做出来是有道理的,我们在芯片层、框架层、模型层、应用层都有自己的布局,全球大厂在这四层上都有领先产品的是没有的。
看这四层架构的话,我们在每一层都可以说是提早了很多年来布局,当你这些能力都很完整的时候,未来人们再去开发应用,基于百度智能云是最方便的。这个机会要远远大于一个百度搜索颠覆自己的机会,我真正兴奋的、包括我3月16日发布会上主要讲的都是这个机会。
36氪:可今天我去搜索百度的东西,当中有15个搜索结果,其中有两三个是广告,简单说这是百度的商业模式。未来如果文心一言只给出一个答案,这怎么商业化?
李彦宏:这个商业化的可能性有很多。今天如果看ChatGPT的话,它的做法实际是付费使用,买一个会员,一个月多少钱,它靠订阅来养活自己,就不需要广告。
我从来没有觉得商业模式是一个问题。如果我们给大家(不管是个人还是企业)提供了价值,他们就会通过市场机制来回报我们。
36氪:你自己提到很多,在美国生态当中,大公司和大公司、大公司和创业公司之间似乎更紧密,在中国是不是不一样?大公司不会用彼此的产品。
李彦宏:Google可以索引几乎所有互联网的内容,但是中国的话每一个APP都有自己的这种独立的生态,也有一定的壁垒。所以用户在获得信息上确实没有那么方便。
36氪:美国五大互联网巨头(FAANG),似乎是一个联盟。中国好像不是这样。
李彦宏:联盟也说不上,我觉得美国人的这个思维方式也竞争,比如微软和Google就是竞争。但是它的思维方式是说,如果你已经很领先了,我最好别做一个跟你一样的东西,我如果通过创新做一个跟你不一样的东西来竞争,那才是我的本事。
在中国,大家的思维方式更像是这条路你跑通了,我也跑同样的路,咱看看谁跑的快。
36氪:文心一言会接入哪些产品,以及对百度的业务矩阵有哪些机会和变化?
李彦宏:这些天我们各个部门也都在开会讨论,文心一言是一个通用能力,它跟各个产品的结合都是很自然的,跟搜索、贴吧、文库、百度健康、小度等等。几乎公司里面每个产品拿出来,包括百度书法,都能自然地想到和它怎么结合,让产品变得用户体验更好,更加powerful。
36氪:百度会是自己的第一批用户。
李彦宏:是的。
三、云计算是Game Changer,应用层有十倍机会
36氪:你在财报会上提到,文心一言会是改变云计算的Game Changer,为什么?
李彦宏:过去,云计算主要卖算力,算力就是你的计算速度有多快,存储能力有多少。两三年前人们的理解是,基于一个AI开发框架去开发应用就好了。
今天就会看到,我们完全没有必要基于芯片层、框架层来开发应用——基于大模型就好了,又快,成本又低。
那么以后当人们在购买云计算服务的时候,他会看你的模型好不好,而不是你底层的算力怎么样,存储怎么样,所以,AI大模型会颠覆整个云计算市场。
我认为微软其实也是这样想的。大家都知道,ChatGPT和微软的产品真正结合,是非常厉害的。这是很多云厂商非常害怕的事情。
36氪:百度云会不会因为文心一言的使用,会成为中国领先的云供应商?
李彦宏:我对此充满信心,一旦大家明白过来,你以后选择云服务的时候应该看什么,或者其实也不需要它自己去琢磨。你有你的需求,你有你的客户需要去维护,什么样的方式能够让你更好地维护你的客户,收入更多,利润率更高,那他一定会从这些角度推演到:用百度的智能云是更好的选择。
36氪:未来十年内,会不会出现诞生新的微信,新的抖音?
李彦宏:肯定的。不仅仅是诞生这些,我觉得诞生10倍于这些app价值的机会,完全是存在的。
36氪:今天无数人在提问:AI会不会让打工人没有工作?OpenAI的创始人Sam Altman说,大量的人一定会失业,所以OpenAI会按需收费,给没有工作的人补贴。这其实是有点悲凉的,他自己说也有点可怕,你觉得这个事情会发生吗?这个问题你怎么看。
李彦宏:今天没有马车夫这样的工作,因为出现了汽车。但是今天世界上存在的工作机会跟100年前相比,不仅是多了,而且多了很多倍。
我没有那么悲观,我是乐观的。不管有多少工作被替代,这只是整个图景的一部分,另外一部分是,存在我们现在甚至无法想象的更多新机会。我做一个大胆预测,十年以后,全世界有50%工作会是提示词工程(prompt engineering),不会写提示词(prompt)的人会被淘汰。
36氪:十年以后百度会变成什么样子?
李彦宏:我不想去预测那个时候百度收入是多少,员工数是多少,这些对我来说没有那么重要。
百度的使命就是用科技让复杂的世界更简单,这个事情现在成立,十年之后还会成立。
我希望十年之后回望今天,会发现每一个普通人的生活、整个社会会被百度改变。我们起了什么正面作用?我们有什么东西没有做到位?我们什么东西可以做得更好?我更希望琢磨这样的事情。
36氪:有个词叫“智慧涌现”(Emergent Intelligence)。我个人持有这样的观点:人没有什么特殊的,不管是引以为傲的创意也好,感情也好,像一个个计算单元,计算单元多了,就会产生出情感、创意。你觉得会是这样吗?
李彦宏:我觉得不会是这样,我恰恰觉得人是唯一。在如此浩瀚的宇宙,目前没有任何迹象证明有外星人,足见在这个宇宙当中人有多么的独特。今天我们说想用机器替代人,想复刻一个人,我认为这是不现实的。
我觉得完全没有必要复刻人,机器有很多事干的比人好太多了,没必要去跟机器比。应该充分发挥它擅长的地方,我们甚至没有必要让它发展出来情感这类东西,即使发展出来了,可能也是另外一种东西,和我们(人类)的情感还是不一样的。
36氪:十年后,AGI(通用人工智能)是不是已经实现了?
李彦宏:这是一个开放性的问题,我没有答案。如果非要让我去做一个二选一的话,我认为(十年后通用人工智能)没有实现。
“强人工智能”(Artificial General Intelligence),意思是它和人的能力是一样的。这个词对应的是“弱人工智能”和“超人工智能”。这种划分方法我不是很认可。
我认为机器比人强的地方有很多,比人弱的地方也有很多。我们完全没有必要把自己框在“一定要让机器和人一样”里面,我认为这个方向就是不对的。
智能涌现也好,机器越来越聪明也好,我都很看好,我唯一不觉得我们的努力方向是要把机器变成人,完全没有必要。
36氪:对于未来十年,你觉得确定无疑的一个未来是什么?
李彦宏:技术进步的速度越来越快,生产效率提升会越来越明显,这是过去很多年的发展之中不断证明的。如果过去大家可能一周工作5天,也许10年之后工作3天就足够了。劳动生产率的提升会让人们生活的幸福感更强。
我自己也因为这件事情很兴奋,因为我能够参与其中并且贡献一些东西。
36氪:这是一个趋势。如果作为结果,什么样的结果是确定无疑的?十年以后人类压力更大还是更幸福?
李彦宏:一定更幸福。
翻译:
Besides responding to several doubts about Wenxin Yiyan, Robin Li also offered AI entrepreneurs a few pieces of advice.
In 2023, the world’s attention is fixed on the heated race over large AI models.
While Wenxin Yiyan (ERNIE Bot), Baidu’s entry and China’s contender in that race, was still in development, Baidu’s technical team ran comparison tests against ChatGPT. Robin Li recalled to 36kr that at the time “the gap was about 40 points, something we could close in a month.”
But a month later, when the team tested again, the gap had widened rather than narrowed: large AI models do not advance at a linear pace.
After an intense push to catch up, by the time Wenxin Yiyan launched on March 16 this year, it could even “reach the level it (ChatGPT) was at in January this year,” Robin Li told 36kr. How large is the gap between Wenxin Yiyan and ChatGPT? “Perhaps two months at most. But when we can make up those two months is the more important question.”
Over the past week, the AI field has been hit by an even fiercer storm. The day before Baidu’s Wenxin Yiyan launch event, OpenAI released its new-generation large model, GPT-4; the day after, Microsoft announced Copilot, an AI assistant built on the latest GPT-4. Both were product developments that shook the industry.
Wenxin Yiyan then became the subject of heated debate. Carrying the many doubts people have raised about it, 36kr interviewed Baidu founder and CEO Robin Li exclusively and asked him directly: why did the launch use a pre-recorded demo rather than a live demonstration? Why release the product before it was polished?
These doubts reflect the mixed feelings of people in China: anxiety that others have what we lack, surging national sentiment, and swings between expectation and disappointment.
Beyond his responses to the doubts, what impressed 36kr most in the conversation were the many blunt judgments he offered about the AI industry.
For example, asked whether another OpenAI will emerge from among China’s startups, he answered flatly, “Basically not,” adding, “There is no need to reinvent the wheel.”
Or: “At the application layer, there will be brand-new entrepreneurial opportunities ten times the size of today’s WeChat and Douyin.” Or: “AI will upend the cloud computing market.”
Or: AI will replace some human jobs, but it will also bring more opportunities than anyone expects. One pointer for individuals: people who cannot write prompts (the instructions humans use to interact with the machine) will be left behind.
Either way, we stand at a historic moment: large AI models may open a new era of growth. Just two days ago, Nvidia announced new GPUs built for large-model computing that can cut the cost of processing large models by an order of magnitude. “We are at the iPhone moment of AI,” Nvidia founder Jensen Huang stressed excitedly three times at the event.
Setting aside past praise and criticism for a moment, and viewing Baidu as a company that has worked on AI for more than a decade and invested on the order of 100 billion yuan, Robin Li’s voice carries particular weight right now.
The following is the full conversation, edited by 36kr:
I. Responding to all the doubts about the Wenxin Yiyan launch event
36kr: Since the Wenxin Yiyan launch event on March 16, there has been a lot of talk online, some of it congratulatory and some of it skeptical. Today I am speaking for the skeptics. First, a small question: does a sudden interview like this put you under pressure?
Robin Li: No. Indeed, as you said, after March 16, there were various voices on the Internet, and I did have something to say myself.
36kr: Some people say you seemed rather nervous at the launch event. Is that true?
Robin Li: I really did not feel nervous. This (Wenxin Yiyan) is something I know very well, including the five demo scenarios, which I basically chose myself, or which others suggested and I reviewed carefully.
Later I watched a replay of the launch, and I still do not think I was nervous at any point. My guess is that on stage I could not see the stock price moving, so I was not affected by it. But many people in the audience, including those watching the livestream, could see the capital market’s reaction while not being able to see what our actual product was like (it had not been released yet), so they drew that conclusion.
36kr: At the launch, you mentioned that the product wasn’t ready yet. Why release it when it’s not perfect yet?
Robin Li: The most important reason is that there is market demand. We have a lot of customers asking, when is this going to come out? When can we use it? Can you assure me that I will be among the first to try out the product? People keep asking questions about it.
The overall environment right now is that ChatGPT is extremely popular, even mythologized. People are bound to feel anxious, and if our customers cannot get access to the most advanced products early, they will feel anxious too. Under these circumstances, we really do want to release it as early as possible.
In terms of how the technology develops, this kind of product really does evolve and improve faster once it has human feedback. We want it to improve faster too, so we had to release it early.
36kr: March 16th was chosen for the launch. How did that date come about?
Robin Li: At first we were thinking of the end of March. Really, any day would have been fine with me.
But very early on I had agreed to attend this year’s Yabuli Forum, which is on March 17. There I will meet many friends old and new, including government leaders and the media, and everyone is bound to ask about Wenxin Yiyan. If we had not launched by then, I really would not know what to say when asked. Say too little, and people think you are giving nothing away and not treating them as friends; say too much, and since we are a listed company, that amounts to selective disclosure, which is not allowed either.
So after thinking it over, we decided to move it up a little. To fit around the Yabuli Forum on March 17, we decided to hold the launch on March 16.
36kr: So running into OpenAI’s new release was a coincidence.
Robin Li: Yes, we did not know in advance that OpenAI would release GPT-4 that day. For us it actually does not matter that much. We can already see plenty of things to improve ourselves, and getting those right first is enough.
36kr: Why did the launch use a pre-recorded demo rather than a live demonstration?
Robin Li: I had hoped to demonstrate it live, because a conversational product is highly interactive, but two factors later changed my mind. First, generative AI does not necessarily give the same answer each time, which introduces uncertainty. Second, and this is what really convinced me, none of the comparable launch events around the world used a live demonstration; they were all pre-recorded. If that is acceptable for everyone else, it is OK for us too.
36kr: The launch showed Wenxin Yiyan in five scenarios: literary creation, commercial copywriting, mathematical and logical reasoning, Chinese language understanding, and multimodal generation. Why these five?
Robin Li: That is a good question. Our logic was this: Wenxin Yiyan is benchmarked against ChatGPT, so most of what ChatGPT can do, we need to be able to do as well.
But at the same time, we are rooted in China, so our conversational products must reflect our better understanding of Chinese language and Chinese culture. We do have some things that ChatGPT doesn’t have that we’d like to show you at the launch.
So the first three scenarios correspond to capabilities ChatGPT already has, and I wanted people to see that ours holds up. Take the first example: where is the author of The Three-Body Problem from? I tried this in ChatGPT many times and it got it wrong every time, with a different answer on each run: sometimes Tianshui in Gansu, sometimes Lüliang in Shanxi, essentially at random. That is why I chose it as the first example. Still, the capabilities shown in the first three examples are ones people have already seen; ChatGPT has them all.
In the fourth example, Wenxin Yiyan’s understanding of the Chinese language and Chinese culture really is somewhat better. We combine capabilities such as knowledge enhancement and retrieval enhancement, so for factual questions like “How expensive is Luoyang paper?” or “Where is Liu Cixin from?”, Wenxin Yiyan can understand the question and answer correctly, with higher accuracy.
The fifth example demonstrates multimodal capabilities: one in Sichuan dialect, one text-to-image, and one text-to-video. These reflect the overall AI capability Baidu has built up over more than a decade.
When we were preparing these five examples, I gave the team a requirement: once the product is out, I want people to have fun with it.
The first is the Sichuan dialect I just mentioned. We have speech synthesis capability, and we understand China better. So when users ask a question, whatever the question, I hope we can answer with synthesized speech in all kinds of dialects, whether Sichuanese or Cantonese. I hope people find it fun and enjoy playing with it.
The second requirement is that when a user’s question itself contains a factual error, we can detect it. Take “Why did the Soviet Union bomb Poland during World War II?” In fact the Soviet Union did not bomb Poland; Germany did. I want Wenxin Yiyan to recognize that the question contains a mistake and tell the user: what you just said is wrong, and here is the correct answer.
So when a user asks a question like this, or deliberately tries to mislead it, and the product can tell, users will think it is smart.
36kr: Some say it’s for mischievous humans.
Robin Li: If it brings people a bit more fun, why not?
36kr: Whenever ChatGPT comes up, people will compare it with Wenxin Yiyan. Which do you think is ahead? If ChatGPT is ahead, by how many years?
Robin Li: I would put it this way: ChatGPT was released on November 30 last year, and we have now launched too, so it cannot possibly be years ahead.
But as for a scientific comparison, is Wenxin Yiyan at the level ChatGPT was at on November 30 last year, or on December 30? We have no especially rigorous way to measure that. We can preserve snapshots of our own product, but we can no longer get hold of ChatGPT as it was back then.
But I can tell you about our internal development process. When our first version came out, we compared it with ChatGPT as it was at the time, and we were about 40 points behind.
36kr: How is this comparison made?
Robin Li: We listed the various capabilities a conversational AI large language model should have, and selected prompts for each capability.
36kr: Out of a full score of 100, you were 40 points behind?
Robin Li: Yes. The room for improvement we could see at the time was far more than 40 points, so we felt sure we would catch up within a month. But a month later we ran another evaluation and found that the gap had not narrowed; it had widened.
So we were very nervous at that point: the more we worked on it, the further behind we seemed to fall. Later we realized that ChatGPT does not improve at a constant rate. It improves quickly, but it follows its own pattern of development.
Baidu’s approach of iterating version by version, meanwhile, improves very, very quickly. By the time we felt able to commit to a March 16 launch, we believed we had at least reached the level ChatGPT was at on November 30 last year, and by a rational judgment, probably the level it was at in January this year. Only then did we dare to release.
Especially when you test ChatGPT’s strengths (English, programming, etc.), there is a big gap because ChatGPT has changed a lot. The day before our launch, OpenAI launched GPT-4, which is also different from GPT-3.5.
So if you ask how far we are from ChatGPT, I think perhaps two months at most. But when we can make up those two months is the more important question.
36kr: Can we say that in two months Wenxin Yiyan will reach ChatGPT’s level?
Robin Li: That would be far from enough, because they are advancing too. Baidu has to progress faster than they do; one day we want not only to catch up but to overtake.
Take the text-to-image capability we just talked about. Baidu has been refining it for quite a while, and people have a lot of fun with it. GPT-4 itself has no text-to-image capability, so from that angle ChatGPT is behind Baidu; Wenxin Yiyan has had this for some time.
Well before Wenxin Yiyan launched, people could try this capability through Wenxin Yige (Baidu’s text-to-image system built on the Wenxin large model); that is something we do well. When ChatGPT announced its release, everyone called it epoch-making and stunning, but the image capability it announced is image understanding, not text-to-image: you give it a picture and it tells you what the picture shows.
Comparing objectively, we have our own strengths, and we are confident that in overall capability we can quickly catch up and even pull ahead.
When GPT-4 was released, everyone called it an epoch-making, stunning release and so on. In fact, the so-called image understanding capability it announced is not text-to-image; you give it a picture and it tells you what the picture shows. And what we found is only the capability described on the official website; no one has actually tried it.
36kr: Compared with ChatGPT’s cost per call, is Baidu’s higher or lower? Roughly how much is it?
Robin Li: The cost is similar. But that’s not important. What’s important is that we can get this cost down quickly through end-to-end optimization.
36kr: For example, when it is in use, what percentage of ChatGPT’s price will it be?
Robin Li: It will be a little cheaper.
36kr: How much has Baidu invested in Wenxin Yiyan so far, and how much more will it invest?
Robin Li: It’s hard to compartmentalize. For example, does our investment in large language models count? Maybe some of the investment is in discriminative things, like optimizing search and so on, and some of it is generative.
If we are talking about generative AI alone, it may be on the order of one billion to several billion, and future investment will be larger. If we are talking about all four layers (application, model, framework, and chip), since a large language model can only be competitive with end-to-end optimization across all four, then adding up chips, frameworks, and everything else, we have invested more than 100 billion yuan over ten years.
Without those investments, a model like Wenxin Yiyan could not have appeared at all.
II. China will basically not produce another OpenAI
36kr: I saw the video you posted on Baijiahao, in which you said Baidu was the first major global tech company to release a ChatGPT-like product, ahead of Microsoft, because Microsoft calls OpenAI’s API, while Meta and Google have not released a truly comparable product. Why do you say that?
Robin Li: If you classify AI by language model, one kind is discriminative AI, whose typical application is search. Search takes the need you express and checks, page by page, whether each web page matches it; it is mainly discriminating. The other kind is generative AI, of which ChatGPT is a product: you give it a prompt and it elaborates on the prompt, and it may even elaborate wrongly. In its early days this direction was not favored by the big companies, and they had not built up much depth in it.
By comparison, Baidu has built up a solid base in language models. We have invested in AI for more than ten years running, and the first version of our language model, the Wenxin large model, was released back in 2019. For the past year and a half we have been very optimistic about generative AI and have invested well in it. So when we saw a big opportunity here, we quickly added resources and got it built.
Along the way, did other big companies such as Google, Amazon, and Facebook take this seriously? They certainly do now. Do they want to build something like this? They certainly do. With that in mind, it is easy to see why I say Baidu is the first among the big companies to have built it.
36kr: There are a lot of people like Kai-fu Lee, Xiaochuan Wang, Huiwen Wang who are preparing to do OpenAI-like startups. What advice would you give them?
Robin Li: Many people ask me whether China will produce another OpenAI. Basically not. OpenAI came about because the big American companies did not believe in this direction, whereas China’s big companies all believe in large AI models now and are all working on them. There is not much point in a startup building another ChatGPT.
I have two suggestions. First, a startup’s defining trait is that it can keep changing direction. A small boat turns easily: when market conditions change, adjust strategy quickly. What the company set out to do at its founding and what it ends up doing need not match exactly.
Second, I think there is a great opportunity in building applications on top of large language models. There is no need to reinvent the wheel; once the wheel exists, building cars and airplanes may be worth far more than the wheel itself.
36kr: Everyone is now worried about computing power for large models, including chips. How will Baidu solve the computing power problem?
Robin Li: “Computing power” is actually a very broad term. How good your CPUs and GPUs are, how well the chips match the framework, how well the framework matches the model: there is a great deal of room for improvement in all of these.
Baidu has leading products at all four layers. Chips themselves are advancing very quickly, whether you look at Moore’s Law or at the pace of GPU development. The frameworks cannot be called mature either; our engineers are still working day and night to optimize them.
Models are updated even faster; we can ship three upgrades in a single day, and that pace is bound to keep raising efficiency. In the future, the bottleneck for large language models may well not be computing power. Today computing power looks very tight, but you may later find that the algorithm suddenly changes, and the constraint becomes something else entirely.
36kr: In the whole ecosystem, at the chip level, at the framework level, at the model level, at the application level, where do you see the biggest entrepreneurial opportunities?
Robin Li: At the application layer. Looking back at the mobile Internet era, today’s big successes, WeChat, Douyin, and Taobao, are all applications. There are really only a couple of operating systems, iOS and Android, and that is it. It is hard to argue that Android is worth more than WeChat or Douyin. At the application layer there are many opportunities, and the value that can be created is enormous.
36kr: Some argue that Microsoft can be enthusiastic about this because its search market share is low, whereas for Baidu, building a generative search engine would disrupt its own business model. What do you make of that view?
Robin Li: I have heard that argument, but I think it is a long way from what we actually think.
First, I do not think it will be disruptive: what you are using is still Baidu Search. When this capability (Wenxin Yiyan) is added to Baidu Search, it barely changes how you use it. You do not need to spend effort relearning how to use Baidu Search; the answers simply become more accurate, and answers that used to be brief can now be more detailed, more vivid, and livelier. It widens the boundaries of search.
We are not the whole mobile Internet, only a very small part of it. Once we can do more, more and more users will move to Baidu from other apps. From that perspective, I very much want the Baidu app to be empowered by Wenxin Yiyan, whatever it takes. If it can disrupt today’s Baidu Search, I long for exactly that to happen.
But that’s only a tiny part of the story.
The bigger story is actually in cloud computing. As I mentioned with the four-layer architecture, the chip layer has changed, from CPU to GPU, and the framework layer has changed as well.
Let me say a bit more about the framework layer. There is a reason Baidu was the first among the major global tech companies to build this: we have our own presence at the chip, framework, model, and application layers, and no other major global company has leading products at all four.
Looking at these four layers, we started building at each of them many years ahead. When all of these capabilities are in place, building applications on Baidu AI Cloud will be the most convenient option for developers. That opportunity is far bigger than Baidu Search disrupting itself; it is what truly excites me, and it is mainly what I talked about at the March 16 launch.
36kr: But when I search on Baidu today, out of 15 results, two or three are ads; put simply, that is Baidu’s business model. If Wenxin Yiyan gives only a single answer in the future, how do you monetize that?
Robin Li: There are many possible ways to monetize. Look at ChatGPT today: its approach is paid use. You buy a membership at so much a month, and it supports itself through subscriptions, so it does not need advertising.
I never thought the business model was an issue. If we provide value to people, whether individuals or businesses, they will reward us through the market mechanism.
36kr: You have often mentioned that in the American ecosystem, big companies seem to work more closely with one another and with startups. Is China different? Big companies there do not use each other’s products.
Robin Li: Google can index almost all content on the Internet, but in China every app has its own separate ecosystem, with certain barriers around it. So it really is less convenient for users to get information.
36kr: The five US Internet giants (FAANG) seem to form an alliance. China does not seem to work that way.
Robin Li: I would not call it an alliance. Americans compete too; Microsoft and Google, for example, are competitors. But their way of thinking is: if you are already well ahead, I had better not build the same thing as you; if I compete by innovating and building something different, that shows what I am capable of.
In China, the thinking is more like: you have proven this road works, so I will run down the same road, and we will see who runs faster.
36kr: Which products will Wenxin Yiyan be integrated into, and what opportunities and changes does it bring to Baidu’s business portfolio?
Robin Li: Our departments have all been holding meetings to discuss this over the past few days. Wenxin Yiyan is a general-purpose capability, and it combines naturally with each product: Search, Tieba, Wenku, Baidu Health, Xiaodu, and so on. For almost any product in the company, including Baidu Calligraphy, it is easy to see how to combine it with Wenxin Yiyan to give users a better experience and make the product more powerful.
36kr: Baidu will be one of its first users.
Robin Li: Yes.
Third, cloud computing is Game Changer, and the application layer has ten times the opportunity
36kr: You mentioned on the earnings call that Wenxin Yiyan could be a game changer for cloud computing. Why?
Robin Li: In the past, cloud computing mainly sold computing power: how fast you compute and how much storage capacity you have. Two or three years ago, the understanding was that it was fine to build apps on top of an AI development framework.
Today we can see that there is no need to build applications on the chip layer or the framework layer. Building on a large model is enough, and it is fast and cheap.
So in the future, when people buy cloud computing services, they will look at whether your model is good rather than at your underlying computing power and storage. Large AI models will overturn the whole cloud computing market.
I think Microsoft actually thinks the same way. As you all know, the combination of ChatGPT and Microsoft’s products is really powerful. This is something that a lot of cloud vendors are afraid of.
36kr: Will Baidu Cloud become the leading cloud provider in China because of Wenxin Yiyan?
Robin Li: I have full confidence in this. It will happen once people understand what to look at when choosing cloud services in the future, and even if they don’t work that out for themselves. You have your own needs and customers to retain, so you ask what will let you serve those customers better, earn more revenue, and reach a higher profit margin. Reasoning from these angles, people will conclude that Baidu’s intelligent cloud is the better choice.
36kr: Will there be a new WeChat or a new Douyin in the next ten years?
Robin Li: Definitely. And not just those; I think there is an opportunity to create ten times the value of these apps.
36kr: A lot of people are asking today: will AI put workers out of work? Sam Altman, OpenAI’s founder, says that large numbers of people are bound to lose their jobs, so OpenAI will charge on demand and subsidize those without jobs. It’s rather sad, and he himself said it’s rather scary. Do you think that will happen? What is your view on this question?
Robin Li: Today, there is no such job as a coachman, because there are cars. But there are not only more jobs in the world today than there were 100 years ago, there are many times more.
I’m not that pessimistic. I’m optimistic. No matter how many jobs are displaced, that’s only part of the picture. The other part is that there will be more new opportunities than we can even imagine right now. I’ll make a bold prediction: in 10 years, 50% of the world’s jobs will involve prompt engineering, and people who can’t write prompts will be left behind.
36kr: What will Baidu look like in 10 years?
Robin Li: I don’t want to predict the revenue or the number of employees at that time. It’s not that important to me.
Baidu’s mission is to use technology to make a complex world simpler. This is true now and will be true in 10 years.
I hope that ten years from now, looking back at today, I will find that Baidu has changed every ordinary person’s life and society as a whole. What positive role did we play? Where did we fall short? What could we have done better? I’d rather think about things like that.
36kr: There’s a term for it, “emergent intelligence.” I personally hold this view: there is nothing special about humans. Whether it is the creativity or the emotion we are proud of, a person is like a collection of computing units, and with enough computing units, emotion and creativity will emerge. Do you think that’s what’s going on?
Robin Li: I don’t think so. I think humans are unique. In such a vast universe, there is no evidence of aliens so far, which shows how unique human beings are. Today people talk about using machines to replace people, or about reproducing a person. I think that is unrealistic.
I think there is no need to copy human beings. Machines can do many things better than humans can, and there is no need to compare ourselves with machines. Machines should play to their strengths. We don’t even need them to develop things like emotions. Even if they do, those may be a different kind of thing from our (human) emotions.
36kr: Ten years from now, will AGI (artificial general intelligence) have been realized?
Robin Li: That’s an open question. I don’t have an answer. If I had to make a binary choice, I don’t think [artificial general intelligence] will have materialized [a decade from now].
Artificial general intelligence means that a machine is as capable as a person. The term sits alongside “weak artificial intelligence” and “super artificial intelligence.” I don’t really agree with that framing.
I think there are a lot of things that machines do better than people, and a lot of things that machines do worse than people. There is absolutely no need for us to box ourselves into “we must make machines like people”, and I think that’s just not the way to go.
I’m very optimistic about the emergence of intelligence and the fact that machines are getting smarter. I just don’t think we need to try to turn machines into people. It’s totally unnecessary.
36kr: What do you think is certain about the next ten years?
Robin Li: Technological progress keeps accelerating, and the gains in production efficiency are becoming more and more obvious. The development of past years has proved this. If in the past people needed to work five days a week, maybe in 10 years three days will be enough. Increased productivity will make people happier with their lives.
I’m really excited about it myself because I can be a part of it and contribute something.
36kr: That is a trend. Is there any outcome that is certain? Will people be more stressed or happier 10 years from now?
Robin Li: Definitely happier.
This article was reposted by 数字化转型网 (www.szhzxw.cn). Source: 数字时氛. Edited and translated by 宁檬树 of 数字化转型网.

