
Large Model Column Article (3): What Emergent Abilities Do Large Language Models Have? What Are the Key Technologies of Large Language Models?

This article is part of the large model article series from Digital Transformation Network (szhzxw.cn). It covers the emergent abilities of large language models and the key technologies behind them.

I. What Emergent Abilities Do Large Language Models Have?


The emergent abilities of LLMs are formally defined as abilities that do not exist in small models but appear in large models; this is one of the most prominent features distinguishing LLMs from earlier PLMs. The appearance of such a new ability comes with a distinctive pattern: once the scale reaches a certain level, performance rises significantly above the random baseline.

By analogy, this pattern is closely related to the phenomenon of phase transitions in physics. In principle, emergent abilities can also be defined with respect to specific complex tasks, but people care more about general abilities that can be applied to solving a wide range of tasks. Three representative emergent abilities of LLMs are briefly introduced here:

1. In-context learning.

GPT-3 formally introduced in-context learning: given a natural language instruction and several task demonstrations, the language model can generate the expected output for a test instance by completing the word sequence of the input text, without additional training or gradient updates.
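To make this concrete, here is a minimal sketch of few-shot in-context learning, assuming a hypothetical sentiment task of my own choosing; the prompt itself carries all the "training" signal, and the resulting string can be sent to any text-completion model:

```python
# Minimal sketch of few-shot in-context learning: the demonstrations in the
# prompt carry the task signal; no training or gradient updates happen.
# The sentiment task and examples are illustrative, not from the article.

def build_icl_prompt(instruction, demonstrations, test_input):
    """Concatenate an instruction, solved demonstrations, and a test query."""
    lines = [instruction, ""]
    for x, y in demonstrations:
        lines += [f"Input: {x}", f"Output: {y}", ""]
    lines += [f"Input: {test_input}", "Output:"]  # the model completes from here
    return "\n".join(lines)

prompt = build_icl_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    demonstrations=[
        ("The plot was gripping from start to finish.", "positive"),
        ("I want those two hours of my life back.", "negative"),
    ],
    test_input="A surprisingly heartfelt performance by the whole cast.",
)
print(prompt)  # send this string to any text-completion model
```

Because no gradients are computed, the same frozen model can be repurposed for a new task simply by swapping the demonstrations.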

2. Instruction following.

By fine-tuning on a mixture of multi-task datasets formatted with natural language descriptions (i.e., instructions), LLMs perform well on unseen tasks that are likewise described in instruction form. With this ability, instruction tuning enables LLMs to execute new tasks by understanding task instructions without explicit examples, which can greatly improve generalization.
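As an illustration, the sketch below shows one common way such instruction-formatted data can look; the records and field names (instruction/input/output) are assumptions for illustration, not any specific dataset's schema:

```python
# Minimal sketch of flattening multi-task data into an instruction format
# for instruction tuning. Records and field names are illustrative
# assumptions, not a specific dataset's schema.

instruction_dataset = [
    {
        "instruction": "Translate the following sentence into French.",
        "input": "The weather is nice today.",
        "output": "Il fait beau aujourd'hui.",
    },
    {
        "instruction": "Summarize the text in one sentence.",
        "input": "Large language models acquire broad skills during pre-training...",
        "output": "LLMs learn general skills from large-scale pre-training.",
    },
]

def format_example(ex):
    """Render one record as the plain text a decoder-only model is tuned on."""
    return f"Instruction: {ex['instruction']}\nInput: {ex['input']}\nOutput: {ex['output']}"

for ex in instruction_dataset:
    print(format_example(ex), end="\n\n")
```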

3. Step-by-step reasoning.

Small language models usually struggle to solve complex tasks that involve multiple reasoning steps, such as mathematical word problems. With the chain-of-thought prompting strategy, however, LLMs can solve such tasks by using a prompting mechanism that includes intermediate reasoning steps for deriving the final answer. It is speculated that this ability may be acquired through training on code.
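A minimal sketch of a chain-of-thought prompt follows, using the well-known grade-school arithmetic style of demonstration (illustrative, not from this article); the worked example shows intermediate steps so the model continues in the same style:

```python
# Minimal sketch of chain-of-thought prompting: the demonstration includes
# intermediate reasoning steps, so the model is nudged to reason step by
# step before giving its final answer. The problems are illustrative.

cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. \
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. \
The answer is 11.

Q: The cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. \
How many apples does it have?
A:"""  # the model is expected to continue with its own reasoning steps

print(cot_prompt)  # send to any text-completion model
```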

II. What Are the Key Technologies of Large Language Models?


The key technologies of LLMs include scaling, training, ability eliciting, alignment tuning, and tool use.

1. Scaling.

Scaling is a key factor in increasing the model capacity of LLMs: GPT-3 first pushed the number of model parameters to 175 billion, and PaLM later raised it further to 540 billion. Large-scale parameters are essential for emergent abilities. Scaling concerns not only model size but also data size and total compute.
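As a rough illustration of why scaling is about compute and not just parameter count, the sketch below applies the widely used rule-of-thumb estimate of about 6ND training FLOPs for a dense transformer with N parameters trained on D tokens; the token counts are the figures reported for each model (roughly 300B for GPT-3 and 780B for PaLM):

```python
# Rough sketch of total training compute using the common rule-of-thumb
# C ~= 6 * N * D FLOPs for a dense transformer with N parameters trained
# on D tokens. Token counts are the figures reported for each model.

def train_flops(n_params, n_tokens):
    """Approximate training FLOPs for a dense decoder-only transformer."""
    return 6 * n_params * n_tokens

for name, n_params, n_tokens in [
    ("GPT-3 (175B params, ~300B tokens)", 175e9, 300e9),
    ("PaLM  (540B params, ~780B tokens)", 540e9, 780e9),
]:
    print(f"{name}: ~{train_flops(n_params, n_tokens):.2e} FLOPs")
```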

2. Training.

Because of their enormous scale, successfully training a highly capable LLM is very challenging. Distributed training algorithms are needed to learn the network parameters of LLMs, often combining several parallelism strategies. To support distributed training, optimization frameworks such as DeepSpeed and Megatron-LM are used to ease the implementation and deployment of parallel algorithms. Optimization tricks also matter for training stability and model performance, such as restarting training to overcome loss spikes and mixed-precision training. More recently, GPT-4 developed special infrastructure and optimization methods that use much smaller models to predict the performance of large models.
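As one concrete example of the optimization tricks above, here is a minimal sketch of mixed-precision training using PyTorch's AMP utilities; the tiny linear model and random data are placeholders, and real LLM training layers distributed parallelism on top of a loop like this:

```python
# Minimal sketch of mixed-precision training with PyTorch AMP, one of the
# stability/efficiency tricks mentioned above. The tiny model and random
# data are placeholders; real LLM training adds distributed parallelism.
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales grads to avoid fp16 underflow

for step in range(10):
    x = torch.randn(32, 512, device="cuda")
    target = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in reduced precision
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()    # backprop through the scaled loss
    scaler.step(optimizer)           # unscale gradients, then take a step
    scaler.update()                  # adapt the loss scale for the next step
```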

3. Ability eliciting.

After pre-training on large-scale corpora, LLMs are endowed with the potential ability to solve general tasks. However, these abilities may not be explicitly exhibited when an LLM performs a specific task. It is therefore useful to design suitable task instructions or specific in-context strategies to elicit these abilities; for example, chain-of-thought prompting helps solve complex reasoning tasks via intermediate reasoning steps. LLMs can further be instruction-tuned on natural language task descriptions to improve their generalization to unseen tasks.
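The contrast below is a minimal sketch of eliciting ability by prompt design alone, comparing a direct question with the same question plus a zero-shot chain-of-thought trigger phrase; both strings target the same frozen model, and the question is illustrative:

```python
# Minimal sketch of eliciting a latent ability purely by prompt design:
# the same question asked directly vs. with a zero-shot chain-of-thought
# trigger phrase. Both strings target the same frozen model.

question = "If a train travels 60 km in 45 minutes, what is its speed in km/h?"

direct_prompt = f"Q: {question}\nA:"
elicited_prompt = f"Q: {question}\nA: Let's think step by step."

for p in (direct_prompt, elicited_prompt):
    print(p, end="\n---\n")
```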

4. Alignment tuning.

Since LLMs are trained to capture the data characteristics of pre-training corpora (including both high-quality and low-quality data), they are likely to generate toxic, biased, or otherwise harmful text. To align LLMs with human values, InstructGPT designed an effective tuning approach that leverages reinforcement learning with human feedback, enabling LLMs to follow intended instructions. ChatGPT was developed with techniques similar to InstructGPT and shows strong alignment ability in producing high-quality, harmless responses.
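A small sketch of one ingredient of this recipe follows: the pairwise reward-model loss commonly described for InstructGPT-style RLHF pipelines. The scalar scores are placeholders standing in for a reward model's outputs on preferred vs. rejected responses:

```python
# Minimal sketch of the pairwise reward-model loss commonly described for
# InstructGPT-style RLHF: the reward model should score the human-preferred
# response above the rejected one. The scores below are placeholders for
# reward-model outputs on (prompt, response) pairs.
import torch
import torch.nn.functional as F

r_chosen = torch.tensor([1.3, 0.2, 2.1])    # scores for preferred responses
r_rejected = torch.tensor([0.4, 0.9, 1.0])  # scores for rejected responses

# Bradley-Terry style objective: maximize the log-sigmoid of the margin
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
print(f"reward-model loss: {loss.item():.4f}")
```

The trained reward model then supplies the reward signal for the reinforcement learning stage that tunes the LLM itself.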

5. Tool use.

LLMs are essentially text generators trained on large-scale plain-text corpora, so they perform less well on tasks that are poorly expressed as text, such as numerical computation. Their abilities are also bounded by the pre-training data and cannot capture up-to-date information. To address these problems, external tools have been proposed to compensate for the deficiencies of LLMs; for example, a calculator can be used for accurate computation, and a search engine can retrieve unknown information. ChatGPT goes further and uses external plugins to access the internet and learn new knowledge, a mechanism that can broadly extend the capability scope of LLMs.
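To illustrate the mechanism, here is a minimal sketch of tool dispatch, assuming a made-up TOOL:<name>:<args> call format that the model would be prompted to emit; only a calculator tool is implemented:

```python
# Minimal sketch of tool use: the model emits a structured tool call, a
# dispatcher executes it, and the result would be fed back into the model.
# The TOOL:<name>:<args> format and tool names are illustrative assumptions.
import operator

def eval_arith(expr):
    """Evaluate 'a <op> b' for a small whitelist of arithmetic operators."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    a, op, b = expr.split()
    return ops[op](float(a), float(b))

TOOLS = {"calculator": lambda expr: str(eval_arith(expr))}

def dispatch(model_output):
    """Parse a 'TOOL:<name>:<args>' line emitted by the model and run it."""
    _, name, args = model_output.split(":", 2)
    return TOOLS[name](args)

# A model that cannot multiply reliably emits a tool call instead:
print(dispatch("TOOL:calculator:1234 * 5678"))  # -> 7006652.0
```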

III. Four Key Technologies of ChatGPT

1. Large-scale pre-trained models.

Only when a model is large enough can it possess reasoning ability. Yan Rui, a tenured associate professor at the Gaoling School of Artificial Intelligence, Renmin University of China, noted that emergent intelligence is not deliberately designed in; rather, once a large model grows past a certain scale, it naturally exhibits such properties.

2. Pre-training on code.

Code may decompose a large problem into several smaller ones, and this step-by-step way of solving problems helps natural language reasoning. Compared with natural language models, code language models also need to handle longer-range context dependencies.

3. Prompt/Instruction Tuning.

GPT-3 is so large that full fine-tuning is no longer feasible, leaving prompting as the main interface. But without any tuning, the model remains a plain language model that cannot adapt to people; people have to adapt to the model. The way to make the model adapt to people is to express tasks as instructions and then fine-tune on them, which costs far less than pre-training. Instruction tuning can thus use a modest amount of data to steer the language-modeling task toward answering questions the way humans expect.

4. Reinforcement learning from human feedback (RLHF).

RLHF does not make an especially large difference to how good the results are, and it may even constrain the generative ability of the language model, but it can better align the model with human values such as safety and harmlessness. Once the model is online, it can also collect more user feedback.

More articles on large models, such as large model resources, success stories, and solution briefs, can be read in the Digital Transformation Network's large model column; readers who need a package of large-model materials can fill in the questionnaire via the highlighted link to get in touch, or join the reader group to exchange ideas with fellow readers.


This article was reprinted by Digital Transformation Network (www.szhzxw.cn). Source: MBA百科; edited and translated by Moran (默然) of Digital Transformation Network.
