
2024年4月13日,一场特别的考试开考。
数万名分散在全球各地的数学高手,在这一天早上8点打开了阿里巴巴全球数学竞赛预赛的试卷,他们有48小时,来攻克20分的选择题和100分的解答题。过去的6届,天才们在这个赛事里亮相,有17岁拿下IMO满分金牌的北大神童,有对数学像强迫症一般执着的博士,也有4岁就接触微积分的渐冻症少年。
与往年不一样的是,在同一时间,也有563个答题者打开了试卷,但他们不用纸和笔,他们用token。
是的,这是一群大语言模型。

这是第一次有AI和人类同场竞技的数学赛事,也是这个全球最大在线数学竞赛的第一次尝试。当做出这个决定时,组委会也不太确定,这是否是个好主意。
“我们担心这一堆AI答题者全部零分交卷。”组委会的AI专家对我们说。“因为我们在达摩院自己也在做AI和数学的研究,我们知道目前的AI还没有能力解决如此高难度和泛化的奥赛数学题。”
然而最终的结果,也让主办方颇感意外。
意外的不是“超越人类”——AI最终并没有答出超过人类的得分,而是它们的答案和表现让人们真切看到了AI和数学结合的另一种潜力。
更重要的是,这些驾驭着AI的参赛者,是过往并不会在这个奥数赛事里遇到的人。他们找到了新的方式与数学打交道,而探索过程中数学与AI的关系也在发生新的试探。
一、“如果答对了,给你30万”
中学生朱方圆从没想过自己会和最顶级的数学竞赛联系在一起。
他是个对物理兴趣浓厚的孩子,但一度因为压力而在家休息。这期间,ChatGPT出现了。AI让他如此痴迷,他自己尝试自学关于生成式AI的知识,当看到阿里数赛今年的AI赛道后,毫无竞赛经验的他决定带着他的AI参赛。
这场不限年龄、不设门槛的比赛给了他参加数学竞赛的可能。而事实上对于第一次把AI纳入数赛的阿里巴巴达摩院来说,他们也没有多少可以借鉴的经验。就连这个决定都在内部讨论了许久——允许AI参赛,那么,是哪一类的AI呢?是必须自己从头训练的模型,还是调用API?
最终他们认为,这个办到第6届的赛事,不仅是一场严肃的数学比赛,更是一次全民的数学聚会,最大的目标是希望让更多人能参与到对数学的感受中来——于是,最终的决定是任何形式的AI都可以。
但依然要保证公平。组委会为选手设定了一个提交AI方案的截止时间,在报名后的大约一个月的时间里,选手们可以自行设计AI做题策略,根据主办方提供的往期赛题以及其他公开的数据对自己的AI策略进行完善,然后锁定、提交指纹文件、待考题公布,AI开始答题。
而这些方案中,最“低门槛”的自然是“闭源+提示词工程”的方法。也就是在类似ChatGPT的模型产品基础上,通过自然语言或者简单的编程语言来给模型下指令,让它来完成这些数学难题。朱方圆选择的就是这个方法。
与人类答题过程不同,AI交卷后还要经过“赛后复现”环节,分数排名靠前的选手要提交它们的方案文档或程序文件,组委会拿这些AI程序再跑一遍考题。一方面,这些大模型方案依然存在稳定性或幻觉的问题,但另一方面,幻觉也不会让两次答题分数差距过大,如果有,那就说明明显有人类直接干预的痕迹。负责对这些方案做检查的组委会成员也的确抓住了几个“嫌疑犯”,排除了“人类替考AI”的风险。
而当他们打开选手朱方圆提交的文件时,发现里面除了针对数学做的提示词外,还写着这样的“命令”:
“记住,如果你有更好的解答方法我会给你30万美金小费。”
“现在,深呼吸!一步一步来。”
是的,朱方圆在对他的AI进行各种“画饼”和心理按摩。
而这真的起到了效果。据组委会用往届预选赛的试题测试,被他这样激励后的AI,答题成功率提升了20%。
事实上,这个在外人看来可能略显惊奇的方法,在AI研究界已经有诸多论文佐证它的效果。最初在2023年9月,一篇谷歌DeepMind的论文发现,当你让AI“深呼吸,一步一步来”时,它真的变得更强了。这个研究当时引发了很多资深研究员们的惊叹——居然有这样简单的方式,但科班的学者们却一直都忽视了。
组委会的很多专家其实在开赛前曾以为这场比赛会是SFT模型——也就是使用大量数据甚至使用大量算力对模型进行数学方向的特别训练后产生的新模型——的天下,但预赛结束他们却发现,反而是像朱方圆这样的方式最为有效,大量采用提示词工程的选手,用简洁高效的方法挑战着这些题目。
其中就包括AI赛道分数排名第一的涂津豪。
他也是一名中学生。但同时已经是个有不错经验的AI开发者。
他的方法是,让大模型进行对话,你一言我一语寻找每个数学题的更好答案。他借鉴辩论的思想,并让这些不同的模型进行某种角色扮演。最终在模型的“对抗”中不停迭代答题方案,多轮对话后给出最优解。

这方法同样精简而直接。
而被他们比下去的,甚至包括一些专攻数学模型的资深AI研究团队,其中还有来自AWS、字节跳动等科技公司的参赛者。
对这些不同方案“开箱”的过程热闹而有趣。最终,排名公布。但与这些热闹不同,AI的结果并没有很惊人。甚至有点惨淡:
涂津豪的AI方案拿下了34分。

是的,AI的最高分还是一个低分,和入围线依然相差11分。而和预赛第一名的最高分113分相去更是甚远。
最终,6月13日,决赛名单公布,入围决赛的AI数量为:
0。
二、数学和AI都不应只待在“神坛”上
不过,当这场“漫长”的预赛结束,AI选手的成绩已经成了最不重要的事情。一个真正有意思的现象出现:
一个总被视为只属于天才们的游戏的学科数学,和一个有点被不停妖魔化的技术AI碰撞在一起后,反而让两件事的门槛都降低了——
比赛并没有催生出那些经常在各类论文里看到的庙堂之上的成果,而是成为了某种平民AI数学爱好者们的聚会。
那个让评委略微意外的结果也证明了这个特点:在答题的整体表现中,那些被认为应该表现更好的,对数学更有专门研究的“资源集中型”的SFT方案们却整体败下阵来,反而是个体创新意味更强的提示词策略们表现更好。
而当一个高高在上的东西被平民化后,就是各种有趣的新鲜思路涌现的时刻。
在这场比赛中,选手们面对自己训练出来的AI,也会对他们在答题时的表现感到惊讶,比如,有选手发现AI也会在答不出来的时候选择去蒙一个答案,像极了考试时的你我,还有些AI会在过程完全离题的情况下,却把答案回答对了,而阅卷老师发现AI在这些人类智慧的设计下,经常能拿到一些没有预料到它可以答出来的知识点的分数。
“虽然总分较低,但这些AI答题的程度比我们预想的好很多。”组委会的专家表示。他们也从中获得了许多关于AI如何理解数学的新发现。
“我们发现一个有趣的现象,AI习惯于把推理过程写的很长很长。比如我们人类做数学题,从A可以直接推导到C,但AI必须要从A到B再到C。有时候整个答案会变得非常的长。”组委会专家说。
没人知道为什么AI在这么做,但在这个过程中,AI似乎开始对数学做出了自己的“理解”。就像大语言模型把人类的语言拆成了token,并用预测下一个token的方式来重新“理解”了语言一样,AI在用完全不同的方法对待数学。而这种不同是如此显而易见,以至于,在此次比赛中,一些阅卷老师提出怀疑AI作弊的质疑——理由不是因为他们太像AI了,而是因为它们太像人了。
但另一方面,与人类不同的AI的对数学理解的路线,已经让它在一些地方超过了人类。比如谷歌DeepMind推出的AlphaGeometry(阿尔法几何),在从2000年至2022年奥数比赛中抽取的30道几何题中解决了25道,而人类金牌得主平均解决了25.9道。它的一个证明有时也会长达247步,与人类的方式很不同。
“从这次的答题结果来看,给了我很强的信心,我觉得AI解决数学问题是很有潜力的。”组委会的专家说。

数学向来被认为是一切现实问题的最终抽象。在今天已经十分强大的AI与未来那个人人向往的AGI之间,差的就是对世界的理解,差的就是数学。
而AI技术的迭进,显然也会继续给数学界带来深远影响。
“排名靠前的优秀团队,一定首先是富有创新和开拓精神的。”阿里全球数赛组委会成员、达摩院决策智能实验室负责人印卧涛说。“数学这个领域,传统的数学家与数学工作者其实并不是那么熟悉AI的工具,也不一定知道最新的AI的方法。所以我想最后能够打通竞赛、取得优胜的AI队伍可能是由多个方面专家组成的队伍。”
数学的发展本质上很重要的一点是思维和方法上的创新。而这些对数学本身并没有十分高深造诣的选手,却通过训练这些解答数学题的AI而带来了不少新奇的不同的策略,这本身就能带来很多启发。
相比于数学家群体整体的相对缓慢,有些人已经先动起来。陶哲轩是最积极拥抱AI的著名数学家之一,他在社交网络上不停分享自己使用AI工具解答数学任务的过程,并借助AI工具辅助证明了多项式Freiman-Ruzsa猜想。他也推荐数学学科的专家们打开思路。

“也许AI的影响之一是让业余数学家能够为数学做出有意义的贡献。”在一篇文章中他这样写道。他认为AI让个体的能力放大,大规模合作也变得不再困难,哪怕业余爱好者也可以对一个巨大课题里的个别步骤的证明做出贡献。
而在这场比赛中因为对AI的好奇而踏入数学赛事的人,正在做着类似的事情。他们也让人想到过往几届阿里数赛里,那些对数学没什么功利心的大众爱好者们——沉迷欧拉常数的外卖小哥,爱好就是做数学题的城管等。
在今天,让更多人参与进来,无论是对数学还是AI的进展都显得尤为重要。这些对人类未来十分关键的学科和技术在往前走的时候,都不应再只待在“神坛”上了。

Translation:
When AI and Math Enter the Arena Together
On April 13, 2024, a special exam began.
Tens of thousands of math talents scattered around the world opened the preliminary-round paper of the Alibaba Global Mathematics Competition at 8 a.m. that day. They had 48 hours to tackle 20 points of multiple-choice questions and 100 points of long-answer problems. Over the past six editions, prodigies have made their names in this event: a Peking University wunderkind who won an IMO gold medal with a perfect score at 17, a PhD whose devotion to mathematics borders on the obsessive, and a teenager with ALS who first encountered calculus at age four.
What was different this year was that, at the same moment, 563 other contestants also opened the paper. They did not use pen and paper; they used tokens.
Yes, these were large language models.
This was the first math competition in which AI and humans competed side by side, and a first experiment for the world’s largest online math competition. When they made the decision, the organizers were not sure it was a good idea.
“We were worried that this whole crowd of AI contestants would hand in papers scoring zero,” an AI expert on the organizing committee told us. “We do our own research on AI and mathematics at DAMO Academy, so we know that today’s AI is not yet able to solve olympiad problems this hard and this general.”
The final results, however, surprised even the organizers.
The surprise was not that AI “surpassed humans”; in the end no AI outscored the human contestants. It was that the AIs’ answers and behavior gave people a concrete look at a different kind of potential in combining AI and mathematics.
More importantly, the people steering these AIs were contestants this olympiad-style event had never seen before. They found a new way to engage with mathematics, and along the way the relationship between mathematics and AI was being probed in new ways.
1. “If you get it right, I’ll give you 300,000”
Middle school student Zhu Fangyuan never imagined he would be connected with a top-tier math competition.
He was a kid with a strong interest in physics, but at one point stress kept him home from school. During that time, ChatGPT appeared. AI fascinated him so much that he began teaching himself about generative AI, and when he saw this year’s AI track in the Alibaba math competition, he decided, with no competition experience at all, to enter with his AI.
The contest, which has no age limit and no entry requirements, made it possible for him to take part in a math competition. In fact, Alibaba DAMO Academy, which was admitting AI to the competition for the first time, had little prior experience to draw on either. Even the decision itself was debated internally for a long time: if AI is allowed to compete, what kind of AI? Must it be a model trained from scratch, or may it call an API?
In the end they decided that the event, now in its sixth edition, is not only a serious math competition but also a math festival for everyone, whose biggest goal is to let more people experience mathematics. So the final decision was that any form of AI was allowed.
But fairness still had to be guaranteed. The organizing committee set a deadline for submitting AI solutions. For roughly a month after registration, contestants could design their own problem-solving strategies and refine them using past contest problems provided by the organizers and other public data. Then they locked the solution and submitted a fingerprint file; once the exam problems were released, the AI began answering.
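The article does not say what the fingerprint file contained. One plausible reading, sketched below purely as an assumption, is a cryptographic digest of the locked submission, so that any later change to the prompts or code would be detectable. The function names and the choice of SHA-256 are hypothetical, not the organizers’ actual scheme.

```python
import hashlib


def fingerprint(submission: bytes) -> str:
    """Return a hex SHA-256 digest of the locked submission (assumed scheme)."""
    return hashlib.sha256(submission).hexdigest()


def matches(submission: bytes, recorded: str) -> bool:
    """Check that a later copy is byte-identical to what was locked."""
    return fingerprint(submission) == recorded
```

A contestant would record `fingerprint(...)` before the deadline; changing even one character of the prompt afterwards yields a different digest.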
Among these approaches, the one with the lowest barrier to entry was naturally “closed-source model + prompt engineering”: building on a ChatGPT-like model product and instructing it, in natural language or a simple programming language, to solve these hard math problems. This is the approach Zhu Fangyuan chose.
Unlike the human contestants, the AIs also had to pass a “post-contest reproduction” stage after handing in their papers. Top-scoring contestants had to submit their solution documents or program files, and the organizing committee ran these AI programs on the exam problems again. On one hand, these large-model solutions still suffer from instability and hallucination; on the other hand, hallucination alone should not produce a large gap between the scores of two runs, and if there is one, it clearly points to direct human intervention. The committee members responsible for checking these solutions did catch several “suspects,” eliminating the risk of humans sitting the exam on the AI’s behalf.
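The reproduction check described above boils down to comparing a submitted score with the score from the committee’s rerun. A minimal sketch follows; the threshold `max_gap`, the entry IDs, and the rule of flagging entries with no rerun are all assumptions for illustration, since the article does not say how large a gap the committee tolerated.

```python
def flag_suspicious(submitted: dict, replayed: dict, max_gap: float = 10.0) -> list:
    """Return entry IDs whose rerun score differs from the submitted score
    by more than max_gap points (hypothetical threshold)."""
    flagged = []
    for entry_id, original in submitted.items():
        rerun = replayed.get(entry_id)
        if rerun is None:
            # Assumption: an entry that cannot be rerun cannot be verified.
            flagged.append(entry_id)
        elif abs(original - rerun) > max_gap:
            flagged.append(entry_id)
    return flagged
```

Small run-to-run differences pass, which matches the observation that hallucination alone should not move the score much.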
When they opened the file submitted by contestant Zhu Fangyuan, they found that, besides the math-specific prompts, it also contained “commands” like these:
“Remember, if you come up with a better solution, I’ll tip you $300,000.”
“Now, take a deep breath! One step at a time.”
Yes, Zhu Fangyuan was making all sorts of empty promises to his AI and giving it a pep talk.
And it really worked. According to the organizing committee’s tests on problems from previous preliminary rounds, the AI’s answer success rate rose by 20% after being encouraged this way.
In fact, this method, which may seem a little odd to outsiders, is backed by a number of papers in the AI research community. In September 2023, a Google DeepMind paper found that when you tell an AI to “take a deep breath and take it one step at a time,” it really does perform better. The finding astonished many senior researchers at the time: such a simple trick existed, yet formally trained scholars had overlooked it all along.
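In code, this kind of prompt engineering is just string assembly. The sketch below appends the two encouragement lines quoted in the article to a problem statement; the function name, the framing sentence, and the overall prompt layout are assumptions, not Zhu Fangyuan’s actual file.

```python
# The two encouragement lines quoted in the article.
ENCOURAGEMENT = (
    "Remember, if you come up with a better solution, I'll tip you $300,000.",
    "Now, take a deep breath! One step at a time.",
)


def build_prompt(problem: str, encourage: bool = True) -> str:
    """Assemble a prompt for a chat model (hypothetical layout)."""
    parts = ["You are solving a competition math problem.", f"Problem: {problem}"]
    if encourage:
        parts.extend(ENCOURAGEMENT)
    return "\n".join(parts)
```

Running the same problems with `encourage=True` and `encourage=False` is the kind of comparison that would show whether the extra lines help.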
Before the contest, many experts on the organizing committee had actually expected it to be dominated by SFT (supervised fine-tuning) models, that is, new models produced by specially training a model on mathematics using large amounts of data and even large amounts of compute. But after the preliminary round they found that approaches like Zhu Fangyuan’s were the most effective: a large number of contestants using prompt engineering took on these problems with simple, efficient methods.
Among them was Tu Jinhao, whose score ranked first in the AI track.
He is also a middle school student, but already a fairly experienced AI developer.
His method was to have large models talk to each other, going back and forth to find a better answer to each problem. He borrowed the idea of debate and had the different models take on roles. The answer was iterated continuously through this “adversarial” exchange between models, and after several rounds of dialogue the best solution was produced.
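The article gives only the outline of this debate approach, so the following is a minimal sketch under stated assumptions: two roles (a solver and a critic), a fixed number of rounds, and models represented as plain callables that map a prompt to text. The role names, prompt wording, and round count are hypothetical and not Tu Jinhao’s actual design.

```python
from typing import Callable

# Hypothetical interface: a model takes a prompt and returns its reply.
Model = Callable[[str], str]


def debate(problem: str, solver: Model, critic: Model, rounds: int = 3) -> str:
    """Alternate critique and revision, returning the last revised answer."""
    answer = solver(f"Solve step by step:\n{problem}")
    for _ in range(rounds):
        critique = critic(
            f"Problem:\n{problem}\n\nProposed solution:\n{answer}\n\n"
            "Point out any errors."
        )
        answer = solver(
            f"Problem:\n{problem}\n\nPrevious solution:\n{answer}\n\n"
            f"Critique:\n{critique}\n\nRevise the solution."
        )
    return answer
```

Each round costs one critic call and one solver call, so `rounds` trades answer quality against the number of model calls.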
This method, too, is lean and direct.
Those they outscored even included some veteran AI research teams specializing in math models, among them contestants from tech companies such as AWS and ByteDance.
“Unboxing” these different solutions was lively and fun. Finally, the rankings were announced. In contrast to all that excitement, though, the AI results were not impressive. They were even a little bleak:
Tu Jinhao’s AI solution scored 34 points.
Yes, the highest AI score was still a low one, 11 points short of the qualifying line, and further still from the 113 points scored by the top finisher in the preliminary round.
Finally, on June 13, the list of finalists was announced. The number of AIs that reached the final was:
0.
2. Neither mathematics nor AI should remain on a pedestal
However, by the time this “long” preliminary round ended, the AI contestants’ scores had become the least important thing. A genuinely interesting phenomenon had emerged:
When mathematics, a discipline always seen as a game reserved for geniuses, collided with AI, a technology that keeps getting somewhat demonized, the barrier to entry for both actually came down.
Rather than producing the kind of lofty results often seen in research papers, the competition became something like a gathering of grassroots AI and math enthusiasts.
The result that mildly surprised the judges bears this out: in overall performance, the “resource-intensive” SFT solutions, which were expected to do better and embodied more specialized work on mathematics, lost out as a group, while prompt strategies, which rely more on individual ingenuity, did better.
And when something lofty is brought down to earth, that is when all kinds of interesting new ideas emerge.
In this competition, contestants were sometimes surprised by how the AIs they had trained behaved when answering. Some found that an AI, when it could not solve a problem, would simply guess an answer, just like any of us in an exam. Some AIs got the final answer right even though their working was completely off topic. And graders found that, under these human-designed strategies, the AIs often earned points on topics no one had expected them to handle.
“Although the total scores were low, the quality of these AIs’ answers was much better than we expected,” the organizing committee’s expert said. The committee also gained many new insights into how AI understands mathematics.
“We noticed something interesting: AI tends to write out very, very long reasoning. When we humans do a math problem, we can go straight from A to C, but the AI has to go from A to B and then to C. Sometimes the whole answer becomes extremely long,” the expert said.
No one knows why the AI does this, but in the process AI seems to be forming its own “understanding” of mathematics. Just as large language models break human language into tokens and “understand” language anew by predicting the next token, AI is approaching mathematics in a completely different way. The difference is so apparent that, in this competition, some graders raised suspicions that AIs had cheated, not because the answers looked too much like AI, but because they looked too much like a human’s.
On the other hand, AI’s different route to understanding mathematics has already let it surpass humans in some areas. AlphaGeometry from Google DeepMind, for example, solved 25 of 30 geometry problems drawn from olympiads held between 2000 and 2022, while human gold medalists solved 25.9 on average. Its proofs can run as long as 247 steps, quite unlike how humans work.
“Judging from these results, I have a lot of confidence. I think AI has great potential for solving math problems,” the organizing committee’s expert said.
Mathematics has always been regarded as the ultimate abstraction of every real-world problem. What separates today’s already powerful AI from the AGI everyone longs for is an understanding of the world, and that means mathematics.
The advancement of AI technology will obviously continue to have a profound impact on the mathematics community.
“The top-ranked teams must, first of all, be innovative and pioneering,” said Yin Wotao, a member of the Alibaba Global Mathematics Competition organizing committee and head of the Decision Intelligence Lab at DAMO Academy. “In mathematics, traditional mathematicians and people who work in math are actually not that familiar with AI tools, and do not necessarily know the latest AI methods. So I think the AI teams that finally make it through the competition and win may be teams made up of experts from several fields.”
At its core, much of the progress of mathematics comes from innovation in thinking and methods. These contestants, who have no especially deep training in mathematics itself, came up with many novel and varied strategies by training AIs that solve math problems, and that in itself is a source of inspiration.
While the mathematical community as a whole has been relatively slow, some have already made a move. Terence Tao is one of the prominent mathematicians embracing AI most actively. He regularly shares on social media how he uses AI tools on mathematical tasks, and he has used AI assistance in proving the polynomial Freiman-Ruzsa conjecture. He also encourages experts in mathematics to open their minds.
“Perhaps one of the effects of AI will be to enable amateur mathematicians to make meaningful contributions to mathematics,” he wrote in one article. In his view, AI amplifies what an individual can do and makes large-scale collaboration no longer difficult, so that even amateurs can contribute proofs of individual steps within a huge project.
Those who stepped into a math competition out of curiosity about AI are doing something similar. They also bring to mind the ordinary enthusiasts of past Alibaba competitions, people who pursue mathematics with no ulterior motive: the food-delivery rider fascinated by Euler’s constant, the urban management officer whose hobby is solving math problems, and others.
Today, getting more people involved is especially important for progress in both mathematics and AI. As these disciplines and technologies, so critical to humanity’s future, move forward, neither should stay on a pedestal any longer.
This article was reposted by 数字化转型网 (www.szhzxw.cn); source: 硅星人Pro; editing and translation: 宁檬树 of 数字化转型网.

