
Academician Wu Hequan Publishes a Signed Article with the National Data Administration: Development and Governance of Data Elements in the AI Era

Academician Wu Hequan

The concept of artificial intelligence was proposed more than 60 years ago, but it did not become widely known until IBM's Deep Blue computer defeated the world chess champion in 1997; even then, the intelligence of experience-based expert systems remained limited. In 2016, AlphaGo's victory over the world's top Go players first demonstrated the power of big data, but this was still an algorithm perfecting itself within fixed rules; similar methods drove the development of natural language recognition and face recognition. The debut of ChatGPT at the end of 2022 marked AI's leap from discriminative to generative. Although today's large models are still tied to specific tasks and designated modalities, and remain some distance from artificial general intelligence, large language models give machines a rudimentary grasp of common sense, reasoning, and creation; they let humans and machines interact in a fairly natural way, and, combined with surrounding tools, exhibit human-like intelligence.

Unlike AlphaGo, which used data merely as a basis for lookup and discrimination, ChatGPT can be said to have read and digested its data, synthesizing it to produce conclusions that originate from the data yet go beyond it. Generative large models give data new vitality, and the value latent in big data will surface even further in the AI era. Data grows ever more important because of AI: data elements are representative of new productive forces, and data-mining capability has become a key dimension of national competitiveness in the new era.

I. Cultivating Data Resources and Promoting Open Sharing

Data is the record of production and daily life and the result of observing nature. In 2022, China accounted for 18% of the world's population, 21.5% of its Internet users, and 18.06% of its GDP. According to the Cyberspace Administration of China's Digital China Development Report (2022), China's data output in 2022 reached 8.1 ZB, up 22.7% year on year, a 10.5% global share and second in the world; its data storage reached 724.5 EB, up 21.1% year on year, a 14.4% global share. China's shares of data generated and stored are thus both lower than its shares of global population, Internet users, and economic output. Synergy Research Group statistics through Q3 2021 show that the United States hosted 49% of the world's hyperscale data centers, followed by China at 15%. The gap between China and the United States in data storage remains considerable, reflecting that China still lags the United States in social informatization and industrial digitization; accelerating the construction of Digital China should change this situation as quickly as possible.

Governments, research institutions, and enterprises all store large volumes of data. The government holds roughly 80% of society's data, much of it high quality, yet this data is mostly reserved for internal use, or even stored and used separately by small units within individual departments rather than shared, so utilization is low. Institutional rules are needed to define what is shared, under what permissions and responsibilities, so as to promote data sharing among government departments, grasp the overall social and economic situation more precisely, and improve cross-departmental coordination.

Compared with sharing, data openness is an even stronger hallmark of a digitized society. The public data held by governments, enterprises, and public institutions is deeply social in nature, and opening government data plays an important role in enhancing government credibility, lowering social costs, and driving the digital economy. Internationally, open government data is a key measure of digital government. According to the United Nations E-Government Survey 2022, China's online services index rose from 0.5294 in 2012 to 0.8876 in 2022, and its ranking among 193 countries climbed from 62nd to 15th; Estonia, Finland, and South Korea held the top three spots, with the United States 8th and Japan 9th.

China's standards for government data remain incomplete, its mechanisms for coordinated management of government data need improvement, and its capacity to secure government data needs strengthening. The remedy is to start from a basic institutional system for data circulation: accelerate data legislation, refine rules and norms, coordinate implementation, compile data catalogs, manage data by category and grade, consolidate sharing and openness mechanisms, and strengthen security. Beyond open government data, the open-sourcing of public social data also indicates the maturity of data circulation. Large models for AI-generated content (AIGC) are all trained on corpora. Some large Internet companies have collected and annotated massive corpora through e-commerce, social networking, search, and other businesses for training their own models. Enterprises and research institutions without such accumulations can obtain corpora from the web, but self-media content is of uneven quality, and using it as training corpus without cleaning and annotation yields worrying results. ChatGPT was trained partly on open-source corpora, yet Chinese tokens made up less than 0.1% of them, below even some minority languages, a consequence of the small number and scale of open-source Chinese corpora.
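
The cleaning step discussed above can be illustrated with a minimal sketch: strip markup residue, drop fragments, and deduplicate before text is admitted to a training corpus. The thresholds and the sample documents below are hypothetical, not taken from the article; real pipelines add near-duplicate detection, language identification, and quality scoring on top of this.

```python
import hashlib
import re

def clean_corpus(raw_docs, min_len=20):
    """Deduplicate and filter raw web text before using it as training corpus.

    A toy stand-in for a cleaning pipeline: exact-duplicate removal via
    content hashing, plus crude quality filters (length, markup residue).
    """
    seen = set()
    cleaned = []
    for doc in raw_docs:
        text = re.sub(r"<[^>]+>", " ", doc)       # strip leftover HTML tags
        text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
        if len(text) < min_len:                   # drop short fragments
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:                        # drop exact duplicates
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned

docs = [
    "<p>Large models are trained on corpora collected from the web.</p>",
    "Large models are trained on corpora collected from the web.",  # duplicate once tags are stripped
    "too short",
]
print(clean_corpus(docs))  # only the first document survives
```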

Chinese universities also hold corpora of hundreds of millions to billions of characters, but these have not been open-sourced. Some Chinese large language models are trained directly on foreign open-source corpora, which carries latent risks for value alignment; a corpus-provenance assessment is advisable for dialogue models opened to the public. For large models aimed at important application scenarios, annotation-free training data and unsupervised learning should not be over-emphasized; cleaned, annotated datasets and a human fine-tuning stage, i.e., supervised learning, should be retained. Training industry models faces its own challenges: specialized data is harder to obtain than general data, and enterprises within an industry are often unwilling to share it.

It is therefore necessary to build high-quality national-level knowledge bases, databases, and resource repositories for important industries, and to encourage the lawful flow and use of social data elements. The Opinions of the CPC Central Committee and the State Council on Building a Basic Data System to Better Leverage the Role of Data Elements proposes a development model of lawful regulation, joint participation, taking what one needs, and sharing the dividends, which will reasonably lower the threshold for market players to obtain data, enhance the shareability and inclusiveness of data elements, and stimulate innovation, entrepreneurship, and creation.

II. Large Models Drive Data Paradigm Innovation

Foundation models are usually trained on general corpora, giving them strong general knowledge, and chat interfaces make feedback and iterative optimization easy; but chat is hardly an essential need, and large models will show their value most when deployed in industries. Foundation models, however, lack domain expertise, so model providers must work with vertical industries to develop industry models. In one mode, the enterprise hands its data to the foundation-model provider for further training; once tuning is satisfactory, the model is shrunk through knowledge distillation, quantization, and scenario-specific transfer. Subsequent fine-tuning and cloud-edge-device deployment still require algorithm engineers, so enterprises with weak technical teams remain dependent on the provider; handing data to the provider risks leakage, yet withholding data degrades training.
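
Knowledge distillation, one of the model-shrinking steps just mentioned, can be sketched in a few lines: a small "student" model is trained to match the temperature-softened output distribution of the large "teacher" by minimizing their KL divergence. The logits and temperature below are illustrative values, not drawn from the article.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from teacher to student: the core distillation objective."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [3.0, 1.0, 0.2]   # hypothetical teacher logits for three classes
aligned = [2.9, 1.1, 0.3]   # a student close to the teacher
diverged = [0.1, 0.2, 3.0]  # a student far from the teacher
print(distillation_loss(teacher, aligned))   # small loss
print(distillation_loss(teacher, diverged))  # much larger loss
```

A higher temperature softens both distributions, exposing the teacher's relative preferences among wrong answers, which is the "dark knowledge" distillation transfers.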

In the other mode, the enterprise has its own algorithm engineers and fine-tunes the foundation model with proprietary data for specific business scenarios, producing an industry model or several smaller business-specific models. Ideally, vertical-industry data is injected as early as pre-training, with pre-training and instruction fine-tuning interleaved to improve the model's expression, understanding, transfer, and generalization of industry knowledge. Core enterprises in heavily regulated, data-security-sensitive industries, such as leading financial institutions, usually will not build domain models on a third party's foundation model; instead they keep both data and model private and build on a local private cloud, training domain models on private data in an encrypted environment, at the price of high cost and a high technical bar. In short, whether an industry model is self-built or co-developed, data security is the precondition, and people who combine large-model training skill with domain knowledge are the key.

Limited by capital, technology, and talent, small and medium-sized enterprises rarely have the capacity to co-develop industry models with foundation-model providers, and MaaS (Model as a Service) has emerged as a service mode aimed at them. MaaS is deployed on SMEs' local equipment or on a public cloud, slotted with a small footprint between PaaS and SaaS; it exposes interfaces for calling the foundation model and lets enterprises refine the model with their own data, thereby embedding large-model capability into SaaS products and resolving the long-standing tension in traditional SaaS between customer customization and the scale economics of standardized products. On top of MaaS, large models can select suitable mini-programs along with supporting low-code development and model-orchestration tools; PaaS can build low-code platforms on this basis, enrich the tooling, and customize data and functions. In this way MaaS lets SMEs move to the cloud while using personalized small models, providing intelligent solutions for digital transformation.

Today's large models are more than a technology: they are reshaping the data-element ecosystem and leading a paradigm shift in how industries research, develop, and apply technology, marking the move of informatization from network-driven to data-driven. Facing the large-model wave, China needs, under national strategy and planning, to coordinate government, industry, academia, research, and users, channel the "hundred models in parallel" into a joint force, avoid scattered resources and low-level duplication, and achieve whole-chain coordination across data collection and aggregation, processing, circulation and trading, and development and application.

III. Data Empowers the Informatization of Social Governance

General Secretary Xi Jinping has pointed out: "With the spread of the Internet and the rapid development of big data and other technologies, national governance is gradually shifting from offline to a combination of offline and online, and from mastering a small amount of 'sample data' to mastering massive 'whole-population data,' which provides favorable conditions for transforming governance models and modernizing national governance." From grassroots governance platforms built on grid-based management, refined services, and information support, to city brains that unify oversight and services on a single network, information technologies such as big data, AI, and the Internet of Things are used to sense social conditions, open communication channels, and support rapid response, making government decision-making more scientific, social governance more precise, and public services more efficient. The application of AIGC in particular is reconstructing the interaction between government and citizens: large models can strengthen the handling of complex, large-scale real-world systems, precisely prevent and control risks in social development, effectively safeguard political stability and social security, and further promote economic development and social progress.

AI, and generative large-model technology in particular, is a double-edged sword. Its reasoning is opaque and its answers contain an element of plausible self-justification; if a model used for social governance is trained on unvetted corpora, it may mislead the public or even provoke clashes of values. AI can also be abused or maliciously exploited to fabricate news, creating risks in social communication and endangering national security. We must both use AI to assist social governance and govern AI's behavior, but we should not restrict AI research and application merely because its use might get out of control; AI technology needs feedback and iteration through application. Today, cross-border data flows are inevitable in international trade, scientific cooperation, and personnel exchange, so the social governance of data also faces the challenge of openness. The answer is to advance AI regulatory institutions and AI research in parallel, coordinate development with security governance, and let regulatory innovation and technological development reinforce each other: countering algorithmic bias and ethical disorder in large models with both technical means and governance norms, and using laws and regulations to prevent data-security incidents and safeguard national security.


To this end, the first step is to follow the Opinions on Building a Basic Data System to Better Leverage the Role of Data Elements and promptly complete the systems for data property rights, data-element circulation and trading, data-element income distribution, and data-element governance, providing norms of conduct for AI's development and governance. The second is technological innovation in data regulation. APN6 (Application-aware Networking over IPv6) and iFIT (in-situ flow measurement over IPv6) can mark the attributes of IP flows, including data type, and trace flow paths, aiding the management of cross-border data flows; IPv6's multi-homing can divert sensitive data onto separate paths. Technologies such as secure multi-party computation can keep data "usable but invisible" when data from different owners is combined. Accelerating research on data-regulation and data-security technologies is now urgent, so that the missing technical support for data-management rules can be filled in as quickly as possible.
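
The "usable but invisible" property mentioned above can be illustrated with the simplest multi-party computation primitive, additive secret sharing: each data owner splits its private value into random shares, each party sums one share from every owner, and only the aggregate is reconstructed, so no party ever sees another's raw value. This is a toy sketch under an honest-but-curious assumption (the modulus and party values are illustrative), not production MPC.

```python
import random

MOD = 2**61 - 1  # a large prime modulus; individual shares look uniformly random

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it modulo MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def secure_sum(secrets):
    """Compute the joint sum while each party handles only random-looking shares."""
    n = len(secrets)
    all_shares = [share(s, n) for s in secrets]
    # Party i sums the i-th share of every owner; no party sees a raw secret.
    partials = [sum(col) % MOD for col in zip(*all_shares)]
    return sum(partials) % MOD

secrets = [123, 456, 789]   # private values held by three different owners
print(secure_sum(secrets))  # equals sum(secrets) without revealing any single value
```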

IV. Accelerating the Construction of Data Infrastructure

Training and inference for large models both require computing power. China's total computing power in 2022 was 180 EFLOPS, below the United States' 200 EFLOPS in 2021; of this, China's intelligent (AI) computing power in 2022 was 41 EFLOPS, short of the United States' 65 EFLOPS in 2021, reflecting China's gap in large-model training and inference capacity. Building computing power is a market activity, but national coordination will optimize resource use and industrial layout. The "Eastern Data, Western Computing" project is a national strategic deployment with Chinese characteristics, reflecting the country's regional economies, geography and climate, and energy distribution; the hand of government is anything but optional in allocating and effectively using data resources between east and west. The west's weaknesses are a thin data-center supporting industry and a shortage of talent, so data cleaning and annotation, data-center products, and related service industries must be planned and cultivated in step, extending the industrial chain upstream and downstream. While taking on the storage and processing of the east's warm and cold data, the west should also develop cloud services for local hot data, so that its data clusters grow in a virtuous cycle.

The layout of computing power must balance several relationships. First, the ratio of general-purpose to intelligent computing power: general-purpose computing, mainly CPU-based, suits data- and computation-intensive transactional tasks such as government affairs, smart cities, and intelligent customer service; intelligent computing, mainly GPU-based, suits large-model training. Since training also requires the involvement and fine-tuning of algorithm engineers, intelligent computing centers are best built where data sources and algorithm engineers concentrate, not everywhere at once, and large publicly funded intelligent computing centers should be planned with caution. Second, self-built versus cloud-native computing: many organizations are eager to build their own, but a McKinsey report found that servers in commercial and enterprise data centers rarely exceed 6% utilization, and as much as 30% of servers often sit powered on but idle.

SMEs should be encouraged to shift from buying AI servers and building their own data centers to purchasing cloud services, which cuts costs, raises utilization, strengthens resistance to DDoS attacks, and reduces carbon; county-level governments should be guided to use government clouds built centrally at the provincial and municipal level instead of procuring IT infrastructure independently. Third, the storage-to-compute ratio: storage must keep pace with computing, with a reasonable ratio of about 1 GB per Gflops, lest a storage shortfall leave computing waiting and drag down processing efficiency; according to a Huawei/Roland Berger report, in 2020 the ratio was 1:0.9 in the United States and 1:2.4 in China. Fourth, the ratio of disaster-recovery capacity to primary data-center storage: data centers need geo-redundant dual disaster-recovery backup, with critical data kept active-active locally. In 2020, disaster-recovery protection accounted for 27.4% of data-center storage investment globally on average, but only 7.8% in China, a gap that demands attention.
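
The 1 GB-per-Gflops guideline above can be turned into a small capacity check; the inputs below simply restate the 1:0.9 and 1:2.4 ratios quoted from the Huawei/Roland Berger report, normalized to 1 GB of storage.

```python
def storage_shortfall(storage_gb, compute_gflops, target=1.0):
    """Return how many GB of storage are missing relative to the target
    storage-to-compute ratio (1 GB per Gflops by default)."""
    required = compute_gflops * target
    return max(0.0, required - storage_gb)

# Ratios quoted in the text (storage GB : compute Gflops), per 1 GB of storage:
us_gap = storage_shortfall(storage_gb=1.0, compute_gflops=0.9)  # US 1:0.9
cn_gap = storage_shortfall(storage_gb=1.0, compute_gflops=2.4)  # China 1:2.4
print(us_gap)  # 0.0 -> US storage meets the 1 GB/Gflops guideline
print(cn_gap)  # about 1.4 GB short per 2.4 Gflops, matching the gap the text describes
```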

Treating data as a factor of production is an innovation in economic theory and practice. Data differs from traditional factors such as land, labor, and capital, and its development and governance raise many questions needing deeper study; for example, data's copyability and non-depleting use make the boundaries of data property rights and security management hard to delimit. The Party Central Committee's decision to establish the National Data Administration, responsible for coordinating the construction of basic data institutions, the integration, sharing, development, and use of data resources, and the planning and building of Digital China, the digital economy, and the digital society, will strongly promote technological innovation in, development and use of, and effective governance of data elements, supporting the construction of Digital China with national strength in data.



Reposted by the Digital Transformation Network (www.szhzxw.cn); original source: CDO研习社. Editing and translation by the szhzxw.cn editorial team.
