What is a lakehouse?

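As with digital transformation and the data middle platform, people understand the lakehouse in different ways. Nothing appears out of thin air, so let us first look at the three stages that lakehouse technology has gone through recently.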

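First generation: platforms built around the data warehouse. A data warehouse supports only structured data. It cannot handle other types of data such as video, audio, and documents.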

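Second generation: a two-tier architecture of data lake + data warehouse. Raw data goes into the data lake, and from the data lake into the data warehouse. This second hop requires an additional round of ETL, which raises the chance of data errors and adds cost and time.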
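To make the two hops concrete, here is a minimal Python sketch of a second-generation pipeline. The order records and their `user_id` and `amount` fields are hypothetical. The sketch only shows two things: the lake and the warehouse each hold their own copy, produced by a separate step, and the warehouse ETL silently drops whatever does not fit its schema.

```python
"""Hypothetical sketch of the two-tier flow: raw data -> data lake -> data warehouse."""
import copy


def load_into_lake(raw_records):
    # Hop 1: land the raw records in the lake unchanged, as a full copy.
    return copy.deepcopy(raw_records)


def etl_into_warehouse(lake_records):
    # Hop 2: the extra ETL. Keep only records that carry the fields the
    # (hypothetical) warehouse schema requires, cast to fixed types.
    warehouse = []
    for rec in lake_records:
        if "user_id" in rec and "amount" in rec:
            warehouse.append({"user_id": int(rec["user_id"]), "amount": float(rec["amount"])})
    return warehouse


raw = [
    {"user_id": "1", "amount": "9.90"},
    {"user_id": "2", "amount": "15"},
    {"note": "free-text feedback"},  # does not fit the schema, so hop 2 drops it
]
lake = load_into_lake(raw)            # copy no. 1
warehouse = etl_into_warehouse(lake)  # copy no. 2
print(len(lake), len(warehouse))
```

Every extra copy is another place where data can go stale or be transformed incorrectly, which is the cost the paragraph above describes.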

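Third generation: the lakehouse, in which lake and warehouse are deeply integrated. It can be seen as one fully packaged system that exposes a unified service to the outside.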

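Data warehouse: a data warehouse aims to build an integrated, analysis-oriented data environment. Valuable data is extracted from multiple sources, then transformed and moved within the warehouse. It is finally served to analytical tools such as BI to support enterprise decision-making.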
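The warehouse flow described above has three steps: extract from several sources, transform into one integrated schema, and serve aggregates to BI. It can be sketched in Python as follows. The two source systems, their field names, and the revenue metric are hypothetical. A production warehouse would typically express the same steps in SQL rather than in Python lists.

```python
from collections import defaultdict

# Two hypothetical source systems that describe orders with different field names.
crm_orders = [{"cust": "A", "total": 100}, {"cust": "B", "total": 50}]
shop_orders = [{"customer_id": "A", "price_cents": 2500}]


def extract_transform():
    """Map every source onto one integrated, analysis-oriented schema."""
    rows = []
    for o in crm_orders:
        rows.append({"customer": o["cust"], "revenue": float(o["total"])})
    for o in shop_orders:
        rows.append({"customer": o["customer_id"], "revenue": o["price_cents"] / 100})
    return rows


def revenue_by_customer(rows):
    """The kind of aggregate a BI dashboard would query from the warehouse."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["revenue"]
    return dict(totals)


report = revenue_by_customer(extract_transform())
print(report)  # {'A': 125.0, 'B': 50.0}
```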

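Data lake: to borrow an advertising slogan, a data lake is "nature's porter". Data from business systems (structured, semi-structured, and unstructured) is collected into the lake, where it can then be accessed, processed, analyzed, and transmitted. Why not call it a data sea? We usually think of the sea as beyond human control. The point of collecting data into a lake is precisely to be able to control and use it.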
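A toy model of the lake idea follows. Anything can be poured in without declaring a schema first, and the reader decides later how to interpret the bytes (an approach often called schema-on-read). The `DataLake` class, the object keys, and the `kind` tags are hypothetical illustrations, not a real storage API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy lake: stores objects as raw bytes plus minimal metadata; no schema is enforced on write."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, payload: bytes, kind: str) -> None:
        # `kind` is a hypothetical tag: "structured", "semi-structured" or "unstructured".
        self.objects[key] = {"kind": kind, "payload": payload}

    def get(self, key: str) -> bytes:
        return self.objects[key]["payload"]

    def keys_of_kind(self, kind: str) -> list:
        return [k for k, v in self.objects.items() if v["kind"] == kind]


lake = DataLake()
lake.put("orders.csv", b"id,amount\n1,9.9\n", "structured")
lake.put("event.json", json.dumps({"user": 1, "action": "click"}).encode(), "semi-structured")
lake.put("call.wav", b"RIFF", "unstructured")  # stand-in for audio bytes

# Schema-on-read: only the consumer decides to parse this object as JSON.
event = json.loads(lake.get("event.json"))
print(event["action"])  # click
```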

[Figure: data warehouse, data lake, and the two-tier lake + warehouse architecture (left) compared with the lakehouse (right)]

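Two-tier lake + warehouse architecture: the diagram on the left shows the lake and the warehouse kept separate. You could say that here 1+1=2, or even less than 2, because extra work such as ETL is needed, like the cat in the picture doing its "cat chores".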

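In this architecture, metadata is hard to unify. Each of the two storage systems, the data lake and the data warehouse, also has to be connected to its own compute engines, which duplicates data development effort. Keeping two storage systems side by side also risks data redundancy and inconsistency.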
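The metadata problem can be illustrated with two independent catalogs that have drifted apart, for example after a column was added on the lake side only. The catalog layout and the `find_inconsistencies` helper are hypothetical. The sketch just shows that two sources of truth need constant reconciliation.

```python
# Two independent metadata catalogs, one per storage system (hypothetical layout).
lake_catalog = {"orders": {"columns": ["id", "amount", "channel"], "rows": 1000}}
warehouse_catalog = {"orders": {"columns": ["id", "amount"], "rows": 980}}


def find_inconsistencies(a: dict, b: dict) -> list:
    """Report tables whose presence, schema, or row count differs between two catalogs."""
    issues = []
    for table in sorted(set(a) | set(b)):
        if table not in a or table not in b:
            issues.append((table, "missing in one catalog"))
            continue
        if a[table]["columns"] != b[table]["columns"]:
            issues.append((table, "schema differs"))
        if a[table]["rows"] != b[table]["rows"]:
            issues.append((table, "row count differs"))
    return issues


print(find_inconsistencies(lake_catalog, warehouse_catalog))
# [('orders', 'schema differs'), ('orders', 'row count differs')]
```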

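A lakehouse platform is like building many small warehouses along the shore of the lake. If we had to name them, they might be called the data analysis warehouse, the machine learning warehouse, the search engine warehouse, the data API service warehouse, and so on.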

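This architecture lets data and metadata flow seamlessly and freely between the data warehouse and the data lake, avoiding the problems of the 1+1 model. Under normal conditions, the cat in the picture can sleep unless an automatic anomaly alert fires.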
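By contrast, the lakehouse idea can be sketched as one stored copy plus one catalog. In this sketch, the "small warehouses" from the analogy become different readers of the same table. The `Lakehouse` class and the three consumer functions are hypothetical. Real lakehouse systems typically add transactions, open file formats, and query engines on top, all of which this toy omits.

```python
class Lakehouse:
    """Toy lakehouse: one copy of the data and one metadata catalog shared by every workload."""

    def __init__(self):
        self.tables = {}   # table name -> list of row dicts (the single stored copy)
        self.catalog = {}  # table name -> ordered column list (the single metadata source)

    def write(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)
        cols = self.catalog.setdefault(table, [])
        for row in rows:
            for col in row:
                if col not in cols:
                    cols.append(col)

    def read(self, table):
        return list(self.tables.get(table, []))


# Hypothetical "small warehouses" on the shore: each is only a reader of the same table.
def analytics_total(lh):
    return sum(r["amount"] for r in lh.read("orders"))


def ml_features(lh):
    return [[r["amount"]] for r in lh.read("orders")]


def api_lookup(lh, order_id):
    return next((r for r in lh.read("orders") if r["id"] == order_id), None)


lh = Lakehouse()
lh.write("orders", [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 20.0}])
print(analytics_total(lh), ml_features(lh), api_lookup(lh, 2))
# 29.5 [[9.5], [20.0]] {'id': 2, 'amount': 20.0}

# A new write is visible to every consumer at once; there is no ETL hop in between.
lh.write("orders", [{"id": 3, "amount": 0.5}])
print(analytics_total(lh))  # 30.0
```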

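In fact, the warehouses in the diagram would be better placed in the lake, or beneath its surface. To upper-layer applications (such as data development), the lakehouse can be understood as a pre-packaged collaborative platform. If it is an in-house product, it can even be treated as a black box.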

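Disclaimer: The content of this website (http://www.szhzxw.cn/) comes mainly from original work, partner media submissions, and third-party contributions. All information appearing on this website is for reference only. The website will do its best to ensure that the information provided is accurate and reliable, but it does not guarantee this. Readers should verify the information before use and are responsible for any decisions they make on their own. The website accepts no legal liability for errors, inaccuracies, or omissions in the material. Copyright in all content published on this website (including but not limited to text, images, logos, audio, video, software, and programs) belongs to the original authors. Any organization or individual who believes that content on this website may infringe their intellectual property, or is untrue, should notify the website promptly so that it can be removed. http://www.szhzxw.cn/21328.html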
