With data more critical than ever to companies’ success, Python is spreading beyond the realm of data professionals and being adopted by business analysts and other less technical users. But what are the opportunities if you’re relatively new to Python and what best practices should you be aware of to ensure your success?
Data professionals are a precious commodity and in many organizations the demands of the business have outgrown the resources and capacity of data teams. At the same time, business analysts are running into the limits of what BI tools can do for them and looking for ways to do more advanced analytics. Python is the key to success here.
Python usage is growing fast. In a survey of more than 20,000 developers earlier this year, Python ranked second only to JavaScript in terms of popularity, and Python added 3.3 million net new users over the previous six months to reach 15.7 million users worldwide.
In recent years, the Python community has created new frameworks and packages that make the language more accessible to non-professional developers for advanced analytics, machine learning, and app development. Examples include NumPy, an open source Python library for numerical data; Prophet, for running forecasts; and H3, a project begun at Uber for manipulating geospatial data.
Python’s spread to non-professional developers isn’t without precedent. A similar pattern played out with the rise of self-service BI tools, and with business people learning to script their own Excel macros. The expanded use of Python will be even more impactful because the language itself is so capable.
Getting started with Python analytics
Business users often understand better than professional developers what specific insights will be most helpful to their business units, and there are several entry-level use cases where they can start putting Python to work. Here are three examples:
Correlation matrices
A correlation matrix is a table that shows the correlation coefficient for each pair of variables in a data set. It lets you analyze different dimensions of the data to determine whether a person who exhibits behavior A, for example, is also likely to exhibit behavior B. Correlation matrices are useful for deciding which items to place near each other in a grocery store, or which additional items to offer when an ecommerce user is checking out.
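As a minimal sketch of the idea, the snippet below builds a correlation matrix with NumPy's `corrcoef`. The basket data and item names are invented for illustration: each row is one shopper and each column records whether that shopper bought the item. The helper `strongest_pair` is a hypothetical name, not part of any library.

```python
import numpy as np

# Hypothetical data: one row per shopper, one column per item (1 = bought).
items = ["chips", "salsa", "soap"]
baskets = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
])

# rowvar=False tells NumPy that columns, not rows, are the variables.
corr = np.corrcoef(baskets, rowvar=False)

def strongest_pair(corr_matrix, names):
    """Return the two distinct variables with the highest correlation."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if best is None or corr_matrix[i, j] > best[2]:
                best = (names[i], names[j], corr_matrix[i, j])
    return best

# Print the matrix row by row, then the most strongly related pair.
for name, row in zip(items, corr):
    print(name, np.round(row, 2))
print(strongest_pair(corr, items))
```

In this made-up data, chips and salsa are the pair that tends to be bought together, which is the kind of signal a shelf-placement or checkout-offer decision would start from. With tabular data already loaded in pandas, `DataFrame.corr()` produces the same kind of table.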
Principal component analysis
Another possible starting point is principal component analysis, which reduces the number of dimensions in a noisy data set by finding the combinations of attributes that account for most of the variation in the data. If a company sells mortgages, for example, a principal component analysis can show which demographic factors (income, ZIP code, marital status, etc.) carry the most information about its customers; those factors can then be tested as predictors of a sale, helping to target campaigns and offers.
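To make the technique concrete, here is a small sketch that performs principal component analysis with NumPy's singular value decomposition rather than a dedicated library. The customer figures and the `pca` helper are assumptions made up for this example. Note that the analysis looks only at the attributes themselves; linking the components to an outcome such as a sale is a separate modeling step.

```python
import numpy as np

# Hypothetical customer attributes: income (thousands), age, years at address.
X = np.array([
    [45.0, 25.0, 1.0],
    [60.0, 32.0, 3.0],
    [75.0, 41.0, 6.0],
    [90.0, 48.0, 9.0],
    [120.0, 59.0, 15.0],
    [52.0, 61.0, 2.0],
])

def pca(data, n_components):
    """Project standardized data onto its top principal components."""
    # Standardize so attributes measured on large scales do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal directions, ordered by variance explained.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = z @ vt[:n_components].T
    return scores, vt[:n_components], explained

scores, components, explained = pca(X, n_components=2)
print("share of variance per component:", np.round(explained, 3))
print("loadings of first component:", np.round(components[0], 3))
```

The loadings show how strongly each original attribute contributes to a component, and the variance shares indicate how many components are worth keeping.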
Forecasting
Another common problem for businesses is forecasting. Think of predicting customer demand, sales, or revenue, which all mature businesses need to do. Building forecasts with open source Python libraries such as Prophet or scikit-learn is a good way to explore predictive analytics.
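Libraries such as Prophet handle seasonality, holidays, and uncertainty intervals. The sketch below is a deliberately simple stand-in, not Prophet's method: it fits a least-squares straight line through past monthly sales and extends it forward, using only the standard library. The sales figures and the function names `fit_trend` and `forecast` are invented for illustration.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast(values, periods):
    """Extend the fitted trend line `periods` steps past the last observation."""
    slope, intercept = fit_trend(values)
    start = len(values)
    return [intercept + slope * x for x in range(start, start + periods)]

# Hypothetical monthly sales figures.
monthly_sales = [100.0, 104.0, 109.0, 115.0, 118.0, 124.0]
print([round(v, 1) for v in forecast(monthly_sales, periods=3)])
```

A straight-line trend is a useful baseline: if a more sophisticated model cannot beat it on held-out months, the extra complexity is probably not paying for itself.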
Great power, as they say, brings great responsibility, and there are best practices that new Python users should employ to ensure that the applications they build are robust and secure.
Python care and feeding
One issue is maintaining Python packages to ensure that dependencies are properly managed. Anaconda is helpful here, because it greatly simplifies package management and deployment. With Snowflake’s Snowpark for Python, we pre-install the most popular Python packages from the Anaconda defaults channel into our Python runtime so they don’t have to be installed manually. We’ve also integrated the Conda package manager into Snowpark to manage Python packages and their dependencies.
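One lightweight habit that complements a package manager such as Conda is recording which package versions an analysis actually ran against. The sketch below uses the standard library's `importlib.metadata` for this; it does not call any Conda or Snowpark API, and the package names in the list are only examples.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(package_names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Example package names; substitute the ones your analysis imports.
report = installed_versions(["numpy", "pandas", "scikit-learn"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Saving this report alongside the results makes it easier to reproduce an analysis later or to explain why two runs disagree.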
As with any data project, there are security and governance issues to be aware of, but modern cloud data platforms provide a runtime that is already set up and configured, and users can take advantage of the security and governance capabilities built into those platforms. For example, the Python runtime in Snowpark disallows external network access by default to protect against common security concerns such as data exfiltration. For novice Python users, a pre-configured secure Python runtime like Snowpark is much easier to use than creating and maintaining their own environments or containers.
It’s early days still, and over time I expect additional Python tools and resources aimed specifically at non-professional developers to emerge. One area that needs to evolve is the methods by which Python users can share the outputs of their work with colleagues who don’t want to learn the language themselves. Snowflake’s purchase of Streamlit was intended in part to address this. The open source tool allows data teams to build applications that bring data to life visually for non-technical users. Python itself is a powerful language for building applications, so its use in building data applications for end users will make the language even more widely adopted.
To get started, Real Python offers a comprehensive beginner’s guide to Python, and Full Stack Python links to many resources. The Python Software Foundation has an active community where experienced users provide advice and answer questions for all ability levels.
This article was translated by 数字化转型网 (www.szhzxw.cn). Author: Torsten Grabs. Translator: 郑亚茹 (数字化转型网). Translation reviewer: 默然 (数字化转型网).

