
1. Definition of a data lake
A data lake is a repository or system that stores data in its original format. It keeps data as-is, with no need to structure it beforehand. A data lake can hold structured data (such as tables in a relational database), semi-structured data (such as CSV, logs, XML, and JSON), unstructured data (such as emails, documents, and PDFs), and binary data (such as images, audio, and video).
2. The role of a data lake
A data lake makes it easier and cheaper to store data of different structures in one place, and it can also give machine learning a global view of the data. A data lake can be understood as a solution that combines big data integration, storage, processing, machine learning, and data mining.
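To make the "store as-is" idea concrete, here is a minimal sketch in Python that treats a local directory as a stand-in for a data lake. Everything in it is an assumption made for illustration: the `raw/<category>/` folder layout, the extension-to-category mapping, and the function names `classify` and `ingest` are hypothetical and do not come from any particular data lake product.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to the categories named in the text.
# Real lakes usually rely on catalog metadata rather than file extensions.
CATEGORY_BY_SUFFIX = {
    ".csv": "semi_structured",
    ".log": "semi_structured",
    ".xml": "semi_structured",
    ".json": "semi_structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(path: Path) -> str:
    """Return an assumed category for a file, based only on its extension."""
    return CATEGORY_BY_SUFFIX.get(path.suffix.lower(), "uncategorized")


def ingest(source: Path, lake_root: Path) -> Path:
    """Copy a file into the lake byte for byte, with no parsing or schema step."""
    target_dir = lake_root / "raw" / classify(source)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)  # contents are left untouched
    return target
```

Calling `ingest(Path("events.json"), Path("lake"))` would place an unchanged copy at `lake/raw/semi_structured/events.json`. The point of the sketch is that no structure is imposed at write time; any parsing or modeling is left to whatever later reads the data.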

This article was reposted by 数字化转型网 (www.szhzxw.cn) from an online source; edited and translated by 默然 of 数字化转型网.



