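Python has earned a reputation as a go-to language for working with data quickly and conveniently, performing data analysis, and getting things done. But because the Python ecosystem is so vast and powerful, many people who are just starting out with the language have a hard time sorting through it all. They ask, “Do I use NumPy or Pandas for this job?” or “What’s the difference between Plotly and Bokeh?” Sound familiar?
Python Tools for Scientists, by Lee Vaughan (No Starch Press, San Francisco), to be released in January 2023, is a guide for the Pythonically perplexed. As described in the introduction, the book is intended to be used as “a machete for hacking through the dense jungle of Python distributions, tools, and libraries.” In keeping with that goal, the book covers only one popular Python distribution for scientific work, Anaconda, and the common scientific computing tools and libraries that are packaged with it: the Spyder IDE, Jupyter Notebook, and JupyterLab, and the NumPy, Matplotlib, Pandas, Seaborn, and Scikit-learn libraries.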
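1. Setting up a Python workspace
The first part of the book deals with setting up a workspace, in this case by installing Anaconda and getting familiar with tools like Jupyter and Spyder. It also covers the details of creating virtual environments and managing packages within them, with many detailed command-line instructions and screenshots throughout.
A cautionary note: an unspoken assumption of this book is that you’re using only Anaconda as your primary Python installation. If you have other Python distributions in your development environment (aside from what’s bundled with the operating system), I suggest removing them to avoid problems.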
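2. Getting to know the Python language
For those who don’t know Python at all, the book’s second part is a compressed primer for the language. Aside from covering the basics (Python syntax, data and container types, flow control, functions and modules), it also provides detail on classes and object-oriented programming, writing self-documenting code, and working with files (text, pickled data, and JSON). If you need a more in-depth introduction, the preface points you toward more robust learning resources. That said, this section by itself is as detailed as some standalone “get started with Python” guides.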
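3. Unpacking Anaconda
Part three tours many of the libraries packaged with Anaconda for general scientific computing (SciPy), deep learning, computer vision, natural language processing, dashboards and visualization, geospatial data and geovisualization, and many more. The goal of this section isn’t to demonstrate the libraries in depth, but rather to lay out their differences and allow for informed choices between them. An example is the recommendation for how to choose a deep learning library:
If you’re brand new to deep learning, consider Keras, followed by PyTorch. […] If you’re working with large datasets and need speed and performance, choose either PyTorch or TensorFlow.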
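4. Demonstrations
Part four goes into depth with several key libraries: NumPy, Matplotlib, Pandas, Seaborn (for data visualization), and Scikit-learn. Each library is demonstrated with practical examples. In the case of Pandas, Seaborn, and Scikit-learn, there’s a fun project involving a dataset (the Palmer Penguins Project) that you can interact with as you read along.
This book does not cover some aspects of scientific computing with Python. For instance, Cython and Numba aren’t discussed, and there’s no mention of cross-integration with other scientific-computing languages like R or FORTRAN. Instead, the book stays focused on its main mission: guiding you through the thicket of scientific Python offerings available through Anaconda.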
Original text:
Python Tools for Scientists: An Introduction to Using Anaconda, JupyterLab, and Python’s Scientific Libraries
Python has earned a name as a go-to language for working quickly and conveniently with data, performing data analysis, and getting things done. But because the Python ecosystem is so vast and powerful, many people who are just starting with the language have a hard time sorting through it all. “Do I use NumPy or Pandas for this job?”, they ask, or “What’s the difference between Plotly and Bokeh?” Sound familiar?
Python Tools for Scientists, by Lee Vaughan (No Starch Press, San Francisco), to be released in January 2023, is a guide for the Pythonically perplexed. As described in the introduction, this book is intended to be used as “a machete for hacking through the dense jungle of Python distributions, tools, and libraries.” In keeping with that goal, the book is confined to one popular Python distribution for scientific work, Anaconda, and the common scientific computing tools and libraries that are packaged with it: the Spyder IDE, Jupyter Notebook, and JupyterLab, and the NumPy, Matplotlib, Pandas, Seaborn, and Scikit-learn libraries.
Setting up a Python workspace
The first part of the book deals with setting up a workspace, in this case by installing Anaconda and getting familiar with tools like Jupyter and Spyder. It also covers the details of creating virtual environments and managing packages within them, with many detailed command-line instructions and screenshots throughout.
A cautionary note
An unspoken assumption of this book is that you’re using only Anaconda as your primary Python installation. If you have other Python distributions in your development environment (aside from what’s bundled with the operating system), I suggest removing them to avoid problems.
Getting to know the Python language
For those who don’t know Python at all, the book’s second part is a compressed primer for the language. Aside from covering the basics (Python syntax, data and container types, flow control, functions and modules), it also provides detail on classes and object-oriented programming, writing self-documenting code, and working with files (text, pickled data, and JSON). If you need a more in-depth introduction, the preface points you toward more robust learning resources. That said, this section by itself is as detailed as some standalone “get started with Python” guides.
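To give a sense of the file-handling topics mentioned above (text, pickled data, and JSON), here is a minimal sketch using only the Python standard library. It is not taken from the book; the `round_trip` helper, the file names, and the sample record are all hypothetical and invented for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical sample record; the keys and values are invented for illustration.
record = {"sample_id": 7, "temperature_c": 21.5, "tags": ["demo", "example"]}


def round_trip(data, folder):
    """Write `data` as text, JSON, and pickle files in `folder`, then read each back."""
    folder = Path(folder)

    # Plain text: only strings can be written, so the record is converted with str().
    text_path = folder / "record.txt"
    text_path.write_text(str(data), encoding="utf-8")
    text_back = text_path.read_text(encoding="utf-8")

    # JSON: a human-readable format that restores dicts, lists, strings, and numbers.
    json_path = folder / "record.json"
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(data, f)
    with json_path.open(encoding="utf-8") as f:
        json_back = json.load(f)

    # Pickle: a binary, Python-specific format; only unpickle data you trust.
    pickle_path = folder / "record.pkl"
    with pickle_path.open("wb") as f:
        pickle.dump(data, f)
    with pickle_path.open("rb") as f:
        pickle_back = pickle.load(f)

    return text_back, json_back, pickle_back


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    text_back, json_back, pickle_back = round_trip(record, tmp)

print(type(text_back).__name__, json_back == record, pickle_back == record)
```

The text file hands back a plain string that would still need parsing, while JSON and pickle both restore the original dictionary. JSON is the usual choice when other tools or languages need to read the file; pickle is convenient for Python-only objects.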
Unpacking Anaconda
Part three tours many of the libraries packaged with Anaconda for general scientific computing (SciPy), deep learning, computer vision, natural language processing, dashboards and visualization, geospatial data and geovisualization, and many more. The goal of this section isn’t to demonstrate the libraries in depth, but rather to lay out their differences and allow for informed choices between them. An example is the recommendation for how to choose a deep learning library:
If you’re brand new to deep learning, consider Keras, followed by PyTorch. […] If you’re working with large datasets and need speed and performance, choose either PyTorch or TensorFlow.
Demonstrations
Part four goes into depth with several key libraries: NumPy, Matplotlib, Pandas, Seaborn (for data visualization), and Scikit-learn. Each library is demonstrated with practical examples. In the case of Pandas, Seaborn, and Scikit-learn, there’s a fun project involving a dataset (the Palmer Penguins Project) that you can interact with as you read along.
This book does not cover some aspects of scientific computing with Python. For instance, Cython and Numba aren’t discussed, and there’s no mention of cross-integration with other scientific-computing languages like R or FORTRAN. Instead, this book stays focused on its main mission: guiding you through the thicket of scientific Python offerings available using Anaconda.
This article was translated by 数字化转型网 (www.szhzxw.cn). Author: Serdar Yegulalp; translator: 郑亚茹 (数字化转型网); translation reviewer: 默然 (数字化转型网).

