Feasting on High-Quality AI Data

AI is hungry for high-quality data. Learn how to feed it properly.

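At a Glance

Avoiding poor-quality data can boost AI models.
The quality of data affects the integrity, quality, and reliability of AI systems.
Rigorous data governance leads to higher-quality AI data in the long term.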

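"Garbage in, garbage out" is a concept that dates back to the earliest days of software development. Yet flawed, poor-quality data is also a threat that AI developers need to watch out for. If the data used to train an AI model is incomplete, inaccurate, inconsistent, or biased, its predictions and decisions will be flawed and most likely useless.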

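Maintaining data quality to feed AI systems is like nurturing a garden, says Noelle Russell, data and AI lead with Accenture Federal Services, in an email interview. “It requires diligence, intentional care, and a deep understanding of the ecosystem.”

Ensuring data quality requires a multi-faceted approach. “This includes establishing robust data governance frameworks, implementing comprehensive data validation and cleaning processes, and fostering a culture of data literacy within the organization,” Russell advises. By treating data as a valuable asset, organizations can ensure that their AI systems are fed with high-quality, relevant, and unbiased data.

It’s essential to approach data quality through an empathetic lens, Russell says. “This empowers data workers to see data in new ways and ask better questions about how the intended solution can serve more people.”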

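Successful AI initiatives, particularly generative AI initiatives, require clean, well-organized, and accessible data. “Building AI on messy data is like building a shiny new rocket ship that you intend to take to Mars, but you don’t have the fuel to even get it off the launchpad,” says Wendy Collins, chief AI officer at technology service and consulting firm NTT Data, via email.

Collins stresses the importance of paying the closest attention to the data that’s most important. “No organization is ever going to achieve the pinnacle of perfection when it comes to data quality, so focus on what matters most and start there.”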

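1. Multiple Benefits

Feeding AI quality data leads to multiple benefits. “It enhances the accuracy and reliability of AI predictions and decisions,” says Ed Marshall, CTO at technology consulting firm Hedgehog Lab, via email. Quality data also ensures that AI models are trained on accurate, comprehensive, and relevant datasets, leading to more effective outcomes. Perhaps most important, quality data reduces the risk of AI biases, which can result from incomplete or skewed data. “High-quality data can [also] improve the efficiency of AI systems by reducing the need for constant retraining and adjustments.”

Data excellence significantly improves the accuracy and reliability of AI predictions and decisions, leading to better business outcomes. “It also helps in building trust in AI systems among users and stakeholders by ensuring transparency and fairness in AI operations,” Russell says. “Quality data also reduces the risk of AI biases, which is critical for ethical AI practices.”

Collins recommends focusing near-term resources on the most valuable data elements. “We don’t believe in having a big bang approach to AI where you build and build and then the magic happens three years later,” she says. “Our philosophy is to incrementally build in value creation opportunities along the way.”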

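2. Doing It Right

IT leaders often take the wrong approach to AI data quality. “One common mistake is underestimating the importance of diverse and representative datasets,” Russell says. “This can lead to biased AI models that don’t perform well across various scenarios or different groups of people.” Additionally, many leaders underestimate the resources needed to build models that can scale responsibly. Russell recommends investing in well-tested and robust data pipelines, and focusing on input data model standardization as well as comprehensive data validation, to ensure that the data being fed into the generative model is high quality.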
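
As a rough illustration of the input standardization and validation Russell describes, a pipeline step might normalize each record and set aside the ones that fail basic checks. This is only a sketch: the record fields (`customer_id`, `age`, `country`), the age range, and the helper names are hypothetical and do not come from the article.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; the article names no fields.
REQUIRED_FIELDS = ("customer_id", "age", "country")


@dataclass
class ValidationReport:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (raw record, reason) pairs


def standardize(record: dict) -> dict:
    """Trim whitespace and normalize casing so equivalent values compare equal."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        cleaned[key.strip().lower()] = value
    if isinstance(cleaned.get("country"), str):
        cleaned["country"] = cleaned["country"].upper()
    return cleaned


def validate(records: list) -> ValidationReport:
    """Split records into valid ones and rejected ones with a reason."""
    report = ValidationReport()
    seen_ids = set()
    for raw in records:
        record = standardize(raw)
        # Completeness check: every required field must be present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            report.rejected.append((raw, "missing: " + ", ".join(missing)))
            continue
        # Accuracy check: an assumed plausible range for a human age.
        age = record["age"]
        if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 120:
            report.rejected.append((raw, "age out of range"))
            continue
        # Consistency check: each identifier may appear only once.
        if record["customer_id"] in seen_ids:
            report.rejected.append((raw, "duplicate customer_id"))
            continue
        seen_ids.add(record["customer_id"])
        report.valid.append(record)
    return report
```

Keeping the rejected records together with a reason, rather than silently dropping them, gives a team something concrete to review when deciding which data problems matter most.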

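The best way to maintain long-term AI data quality is by establishing a rigorous data governance framework, Marshall advises. This requires creating strict protocols for data collection, processing, and management. “The effectiveness of this approach lies in its ability to ensure that the data feeding into AI systems is accurate, consistent, and representative,” he explains. “By maintaining high-quality data, you reduce the risk of biases, errors, and anomalies in AI outputs, which is crucial for the reliability of AI-driven decisions.”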

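IT leaders often underestimate the importance of ensuring ongoing data quality management. “There’s a common misconception that once an AI system is trained and deployed, the focus can shift away from data quality,” Marshall states. That can be a big mistake, since AI systems are dynamic and require continuous feeding with high-quality data to maintain accuracy and relevance. “Neglecting this fact can lead to degraded AI performance over time and a failure to adapt to new patterns or changes in the operational environment.”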
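
One simple way to keep watching data quality after deployment is to compare each new batch of a numeric feature against a baseline captured at training time. The sketch below is an assumption-laden example, not a method described in the article: the function names and the thresholds (`max_missing`, `max_shift`) are hypothetical, and real monitoring would typically use richer statistics.

```python
import statistics


def missing_rate(values: list) -> float:
    """Fraction of entries that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def mean_shift(baseline: list, recent: list) -> float:
    """Shift of the recent mean, in units of the baseline standard deviation."""
    base = [v for v in baseline if v is not None]
    new = [v for v in recent if v is not None]
    if not base or not new:
        raise ValueError("both samples need at least one non-missing value")
    difference = abs(statistics.fmean(new) - statistics.fmean(base))
    spread = statistics.pstdev(base)
    if spread == 0:
        # A constant baseline: any change at all counts as an unbounded shift.
        return 0.0 if difference == 0 else float("inf")
    return difference / spread


def check_batch(baseline, recent, max_missing=0.05, max_shift=3.0):
    """Return human-readable warnings; an empty list means no alert.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    warnings = []
    rate = missing_rate(recent)
    if rate > max_missing:
        warnings.append(f"missing rate {rate:.0%} exceeds {max_missing:.0%}")
    shift = mean_shift(baseline, recent)
    if shift > max_shift:
        warnings.append(f"mean shifted by {shift:.1f} baseline standard deviations")
    return warnings
```

A check like this would run on every incoming batch, so that a rise in missing values or a shift in the data is noticed before it shows up as degraded model performance.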

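3. A Final Thought

Given the rapid pace of AI advancement, Russell believes that today’s IT leaders must continue to learn, not only through books and courses, but also via hands-on experiences. “Now is the time to embrace intellectual curiosity and empower those in every part of your organization to do the same.”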

Original English text:

At a Glance

Avoiding poor quality data can boost AI models.
The quality of data impacts the integrity, quality, and reliability of AI systems.
Rigorous data governance leads to higher quality AI data in the long term.

“Garbage in, garbage out” is a concept that dates back to the earliest days of software development. Yet flawed, poor-quality data is also a threat that AI developers need to watch out for. If the data used to train an AI model is incomplete, inaccurate, inconsistent, or biased, its predictions and decisions will be flawed and most likely useless.

Maintaining data quality to feed AI systems is like nurturing a garden, says Noelle Russell, data and AI lead with Accenture Federal Services, in an email interview. “It requires diligence, intentional care, and a deep understanding of the ecosystem.”

Ensuring data quality requires a multi-faceted approach. “This includes establishing robust data governance frameworks, implementing comprehensive data validation and cleaning processes, and fostering a culture of data literacy within the organization,” Russell advises. By treating data as a valuable asset, organizations can ensure that their AI systems are fed with high-quality, relevant, and unbiased data.

It’s essential to approach data quality through an empathetic lens, Russell says. “This empowers data workers to see data in new ways and ask better questions about how the intended solution can serve more people.”

Successful AI initiatives, particularly generative AI initiatives, require clean, well-organized, and accessible data. “Building AI on messy data is like building a shiny new rocket ship that you intend to take to Mars, but you don’t have the fuel to even get it off the launchpad,” says Wendy Collins, chief AI officer at technology service and consulting firm NTT Data, via email.

Collins stresses the importance of paying the closest attention to the data that’s most important. “No organization is ever going to achieve the pinnacle of perfection when it comes to data quality, so focus on what matters most and start there.”

1. Multiple Benefits

Feeding AI quality data leads to multiple benefits. “It enhances the accuracy and reliability of AI predictions and decisions,” says Ed Marshall, CTO at technology consulting firm Hedgehog Lab, via email. Quality data also ensures that AI models are trained on accurate, comprehensive, and relevant datasets, leading to more effective outcomes. Perhaps most important, quality data reduces the risk of AI biases, which can result from incomplete or skewed data. “High-quality data can [also] improve the efficiency of AI systems by reducing the need for constant retraining and adjustments.”

Data excellence significantly improves the accuracy and reliability of AI predictions and decisions, leading to better business outcomes. “It also helps in building trust in AI systems among users and stakeholders by ensuring transparency and fairness in AI operations,” Russell says. “Quality data also reduces the risk of AI biases, which is critical for ethical AI practices.”

Collins recommends focusing near-term resources on the most valuable data elements. “We don’t believe in having a big bang approach to AI where you build and build and then the magic happens three years later,” she says. “Our philosophy is to incrementally build in value creation opportunities along the way.”

2. Doing it Right

IT leaders often take the wrong approach to AI data quality. “One common mistake is underestimating the importance of diverse and representative datasets,” Russell says. “This can lead to biased AI models that don’t perform well across various scenarios or different groups of people.” Additionally, many leaders underestimate the resources needed to build models that can scale responsibly. Russell recommends investing in well-tested and robust data pipelines, and focusing on input data model standardization as well as comprehensive data validation, to ensure that the data being fed into the generative model is high quality.

The best way to maintain long-term AI data quality is by establishing a rigorous data governance framework, Marshall advises. This requires creating strict protocols for data collection, processing, and management. “The effectiveness of this approach lies in its ability to ensure that the data feeding into AI systems is accurate, consistent, and representative,” he explains. “By maintaining high-quality data, you reduce the risk of biases, errors, and anomalies in AI outputs, which is crucial for the reliability of AI-driven decisions.”

IT leaders often underestimate the importance of ensuring ongoing data quality management. “There’s a common misconception that once an AI system is trained and deployed, the focus can shift away from data quality,” Marshall states. That can be a big mistake, since AI systems are dynamic and require continuous feeding with high-quality data to maintain accuracy and relevance. “Neglecting this fact can lead to degraded AI performance over time and a failure to adapt to new patterns or changes in the operational environment.”

3. A Final Thought

Given the rapid pace of AI advancement, Russell believes that today’s IT leaders must continue to learn, not only through books and courses, but also via hands-on experiences. “Now is the time to embrace intellectual curiosity and empower those in every part of your organization to do the same.”

This article was republished by 数字化转型网 (www.szhzxw.cn) from INFORMATIONWEEK.COM; edited and translated by 宁檬树 of 数字化转型网.

