This article is part of the Digital Transformation Network (szhzxw.cn, www.szhzxw.cn) series on large models. It explains what a large language model is and traces the progression from NLP to large language models.
I. What Is a Large Language Model?
Large language models are also called large models (Large Language Model, LLM; plural: Large Language Models, LLMs).
A large language model (LLM) is a language model with hundreds of billions (or more) of parameters, trained on massive amounts of text data; examples include GPT-3, PaLM, Galactica, and LLaMA. Concretely, LLMs are built on the Transformer architecture, in which multi-head attention layers are stacked into a very deep neural network. Existing LLMs largely adopt the same model architecture (the Transformer) and pre-training objective (language modeling) as smaller language models. The key difference is that LLMs dramatically scale up model size, pre-training data, and total compute. As a result, they understand natural language better and generate higher-quality text conditioned on a given context (e.g., a prompt). This capability improvement can be partly described by scaling laws, under which performance rises roughly with large increases in model size. However, some abilities (e.g., in-context learning) are not predicted by scaling laws and are only observed once model size exceeds a certain threshold.
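The scaling-law relationship mentioned above can be written out explicitly. A commonly cited form is a power law in parameters, data, and compute; the symbols below (critical scales N_c, D_c, C_c and exponents α) are empirical fitting constants from the scaling-law literature, not values stated in this article:

```latex
% Pre-training loss L falls as a power law in model size N,
% dataset size D, and compute budget C (each varied with the
% others unconstrained); alpha and the critical scales are
% empirically fitted constants:
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

These smooth curves predict loss well, but, as the text notes, they do not predict emergent abilities such as in-context learning, which appear abruptly past a size threshold.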
On March 6, 2023, a team of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates vision and language for robot control. The researchers described it as the largest VLM built to date, able to perform a variety of tasks without retraining.
II. From NLP to Large Language Models
Zhang Junlin, a senior algorithm expert at Sina Weibo, argues that to explore where large language models are headed, we first need to review how the field got here. He divides the evolution from natural language processing to large language models into five stages: rules, statistical machine learning, deep learning, pre-training, and large language models.
Machine translation is the most difficult and most comprehensive task in NLP, so Zhang uses it as a running example to compare each stage's characteristics and the changes in technology stack and data, showing how NLP evolved step by step.
1. The rule stage
Roughly 1956 to 1992. Rule-based machine translation systems strung together modules with different functions. Humans first extracted knowledge from data, generalized it into rules, and wrote those rules out to teach the machine; the machine then executed the rules to complete a specific task.
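The "humans write rules, machines execute them" workflow can be made concrete with a toy sketch. The lexicon and the single reordering rule below are invented purely for illustration; real rule-based systems chained many such hand-written modules (morphology, syntax, transfer, generation):

```python
# Toy rule-based "translator": every piece of knowledge is hand-written
# by a human; the machine only executes the rules. The lexicon and the
# adjective-reordering rule below are invented for illustration.

LEXICON = {"the": "le", "cat": "chat", "black": "noir"}

def reorder(tokens):
    # Hand-written transfer rule: the adjective follows the noun
    # in the (hypothetical) target language.
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i] == "black" and out[i + 1] == "cat":
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def translate(sentence):
    tokens = reorder(sentence.lower().split())
    # Word-for-word substitution; unknown words pass through unchanged.
    return " ".join(LEXICON.get(t, t) for t in tokens)

print(translate("the black cat"))  # -> "le chat noir"
```

The brittleness is visible even at this scale: every new word or construction needs a new human-authored rule, which is why the field moved to learning from data.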
2. The statistical machine learning stage
Roughly 1993 to 2012. A machine translation system could be decomposed into a language model and a translation model; the language model here uses essentially the same technique as today's GPT-3/3.5. This stage was a sharp break from the previous one: instead of humans handing knowledge to the machine, the machine learned knowledge automatically from data. Mainstream techniques included SVM, HMM, MaxEnt, CRF, and language models (LM), and human-annotated data was on the order of a million examples.
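The language-model/translation-model decomposition mentioned above is the classic noisy-channel formulation: the best translation e of a source sentence f maximizes P(e|f) ∝ P(f|e)·P(e), where P(e) (the language model) scores fluency and P(f|e) (the translation model) scores adequacy. A minimal sketch, with all probabilities invented as toy numbers for illustration:

```python
import math

# Noisy-channel decoding used by statistical MT:
#   best_e = argmax_e P(e | f) = argmax_e P(f | e) * P(e)
# P(e)     -- language model: how fluent is the candidate?
# P(f | e) -- translation model: how well does it match the source?
# All probabilities below are invented toy values for illustration.

language_model = {"the cat sleeps": 0.04, "cat the sleeps": 0.0001}
translation_model = {  # P(source | candidate), assumed equal here
    "the cat sleeps": 0.3,
    "cat the sleeps": 0.3,
}

def decode(candidates):
    # Score in log space to avoid underflow on long sentences.
    return max(
        candidates,
        key=lambda e: math.log(translation_model[e]) + math.log(language_model[e]),
    )

print(decode(["the cat sleeps", "cat the sleeps"]))  # -> "the cat sleeps"
```

Both candidates translate the source equally well here, so the language model breaks the tie in favor of the fluent word order, which is exactly the role the n-gram LM played in these systems.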
3. The deep learning stage
Roughly 2013 to 2018. The break from the previous stage was smaller: discrete matching gave way to continuous matching via embeddings, and models grew larger. The typical technology stack included Encoder-Decoder, LSTM, Attention, and Embedding, and annotated data grew to the tens of millions.
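The shift from discrete to continuous matching is worth pinning down. Under discrete (symbolic) matching, two different word strings have similarity zero; under embeddings, related words sit close together in a vector space. A minimal sketch, using invented 3-dimensional vectors (real embeddings have hundreds of dimensions and are learned, not hand-set):

```python
import math

# Discrete matching treats words as atomic symbols: "car" vs. "automobile"
# scores 0. Embeddings map words into a continuous space where related
# words are nearby. The 3-d vectors below are invented for illustration.

EMB = {
    "car":        [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "banana":     [0.0, 0.9, 0.4],
}

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def discrete_match(w1, w2):
    return 1.0 if w1 == w2 else 0.0

print(discrete_match("car", "automobile"))           # symbols differ -> 0.0
print(cosine(EMB["car"], EMB["automobile"]))         # high: near-synonyms
print(cosine(EMB["car"], EMB["banana"]))             # low: unrelated words
```

This graded notion of similarity is what let neural encoder-decoder systems generalize across paraphrases where symbolic systems saw only mismatches.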
4. The pre-training stage
Roughly 2018 to 2022. The biggest change from earlier stages was the addition of self-supervised learning, which Zhang considers NLP's most outstanding contribution in this period: it extended the usable data from annotated data to unannotated data. Systems in this stage split into two phases, pre-training and fine-tuning, with pre-training data expanded by a factor of three to five. The typical technology stack included Encoder-Decoder, Transformer, and Attention.
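The reason self-supervision unlocks unannotated data is that the training labels come from the text itself: for a causal language model, each position's target is simply the next token, so no human annotation is needed. A sketch of how (input, target) pairs are built from raw text (tokenization here is naive whitespace splitting, for illustration only):

```python
# Self-supervised learning needs no human labels: the training signal
# is derived from the raw text itself. For a causal language model,
# each position's "label" is just the next token.

def make_lm_pairs(text):
    tokens = text.split()  # naive whitespace tokenizer, for illustration
    # Input is the prefix tokens[:i]; target is tokens[i].
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = make_lm_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

A 6-token sentence thus yields 5 training examples for free, which is why scaling to web-sized unannotated corpora became possible.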
5. The large language model stage
From 2023 onward, the goal is for machines to understand human commands and follow human values. Characteristically, the first phase merges the previous two phases (pre-training and fine-tuning) into a single pre-training phase, while the second phase becomes alignment with human values rather than adaptation to a target domain. This stage is a sharp break: the field has shifted from special-purpose tasks to general-purpose tasks, presented through a natural-language human-machine interface.
He then cited a research finding: on high-resource languages, ChatGPT's machine translation quality is close to that of commercial MT systems; on low-resource languages, it still lags well behind them.
From the changes in data, algorithms, and the human-machine relationship across these stages, we can observe the development trend of NLP.
1. Data
From small amounts of annotated data, to large amounts of annotated data, to massive unannotated data plus a little annotated data, and finally to massive unannotated data alone: more and more data is being put to use, with less and less human involvement. In the future, more text data and more data in other modalities will be used; further out, machines should learn knowledge or skills from any electronic data we can find.
2. Algorithms
Expressive power keeps growing, models keep getting larger, and autonomous learning ability keeps strengthening, moving from special-purpose toward general-purpose. Following this trend, the Transformer is expected to remain sufficient for some time, but new models that replace the Transformer will also be needed as the field moves step by step toward artificial general intelligence.
3. The human-machine relationship
The human role is gradually shifting from teacher to supervisor. The future may move from human-machine collaboration and machines learning from humans, to humans learning from machines, and finally to machines extending human capabilities.

This article was reposted by Digital Transformation Network (www.szhzxw.cn). Source: MBA百科. Editing/translation: 默然, Digital Transformation Network.

