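Python Data Science Handbook (2nd Edition, Reprint Edition)
Jake VanderPlas
Publication date: March 2023
Pages: 563
"This freshly updated edition offers clear, easy-to-follow examples that will help you successfully set up and use essential data science and machine learning tools."
Anne Bonner
Founder and CEO, Content Simplicity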

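Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only the new edition of this book brings them all together in one place: IPython, NumPy, pandas, Matplotlib, Scikit-Learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference well suited to tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.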
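With this handbook, you will learn about:
● IPython and Jupyter, which provide computational environments for scientists using Python
● NumPy, which includes the ndarray for efficient storage and manipulation of dense data arrays
● Pandas, which provides the DataFrame for efficient storage and manipulation of labeled/columnar data
● Matplotlib, which includes capabilities for a flexible range of data visualizations
● Scikit-Learn, which helps you build efficient and clean Python implementations of the most important and established machine learning algorithms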
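As a rough illustration of the first two data structures named in the list above, the following minimal sketch stores a few values in a NumPy ndarray and then groups them in a pandas DataFrame. It is not taken from the book; the sample values and the names `heights_cm`, `group`, and `group_means` are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, made up purely for illustration.
heights_cm = np.array([170.0, 182.5, 165.0, 190.0])

# ndarray: dense, fixed-type storage with vectorized arithmetic.
heights_m = heights_cm / 100.0
mean_height_m = heights_m.mean()

# DataFrame: labeled, columnar data; here one text column and one numeric column.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "height_cm": heights_cm})

# Split the rows by label, then aggregate each group.
group_means = df.groupby("group")["height_cm"].mean()

print(mean_height_m)
print(group_means)
```

The array handles the element-wise arithmetic without an explicit loop, while the DataFrame adds labels so that rows can be grouped and summarized by name.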
Preface
Part I. Jupyter: Beyond Normal Python
  1. Getting Started in IPython and Jupyter
    Launching the IPython Shell
    Launching the Jupyter Notebook
    Help and Documentation in IPython
    Keyboard Shortcuts in the IPython Shell
  2. Enhanced Interactive Features
    IPython Magic Commands
    Input and Output History
    IPython and Shell Commands
  3. Debugging and Profiling
    Errors and Debugging
    Profiling and Timing Code
    More IPython Resources
Part II. Introduction to NumPy
  4. Understanding Data Types in Python
    A Python Integer Is More Than Just an Integer
    A Python List Is More Than Just a List
    Fixed-Type Arrays in Python
    Creating Arrays from Python Lists
    Creating Arrays from Scratch
    NumPy Standard Data Types
  5. The Basics of NumPy Arrays
    NumPy Array Attributes
    Array Indexing: Accessing Single Elements
    Array Slicing: Accessing Subarrays
    Reshaping of Arrays
    Array Concatenation and Splitting
  6. Computation on NumPy Arrays: Universal Functions
    The Slowness of Loops
    Introducing Ufuncs
    Exploring NumPy’s Ufuncs
    Advanced Ufunc Features
    Ufuncs: Learning More
  7. Aggregations: min, max, and Everything in Between
    Summing the Values in an Array
    Minimum and Maximum
    Example: What Is the Average Height of US Presidents?
  8. Computation on Arrays: Broadcasting
    Introducing Broadcasting
    Rules of Broadcasting
    Broadcasting in Practice
  9. Comparisons, Masks, and Boolean Logic
    Example: Counting Rainy Days
    Comparison Operators as Ufuncs
    Working with Boolean Arrays
    Boolean Arrays as Masks
    Using the Keywords and/or Versus the Operators &/|
  10. Fancy Indexing
    Exploring Fancy Indexing
    Combined Indexing
    Example: Selecting Random Points
    Modifying Values with Fancy Indexing
    Example: Binning Data
  11. Sorting Arrays
    Fast Sorting in NumPy: np.sort and np.argsort
    Sorting Along Rows or Columns
    Partial Sorts: Partitioning
    Example: k-Nearest Neighbors
  12. Structured Data: NumPy’s Structured Arrays
    Exploring Structured Array Creation
    More Advanced Compound Types
    Record Arrays: Structured Arrays with a Twist
    On to Pandas
Part III. Data Manipulation with Pandas
  13. Introducing Pandas Objects
    The Pandas Series Object
    The Pandas DataFrame Object
    The Pandas Index Object
  14. Data Indexing and Selection
    Data Selection in Series
    Data Selection in DataFrames
  15. Operating on Data in Pandas
    Ufuncs: Index Preservation
    Ufuncs: Index Alignment
    Ufuncs: Operations Between DataFrames and Series
  16. Handling Missing Data
    Trade-offs in Missing Data Conventions
    Missing Data in Pandas
    Pandas Nullable Dtypes
    Operating on Null Values
  17. Hierarchical Indexing
    A Multiply Indexed Series
    Methods of MultiIndex Creation
    Indexing and Slicing a MultiIndex
    Rearranging Multi-Indexes
  18. Combining Datasets: concat and append
    Recall: Concatenation of NumPy Arrays
    Simple Concatenation with pd.concat
  19. Combining Datasets: merge and join
    Relational Algebra
    Categories of Joins
    Specification of the Merge Key
    Specifying Set Arithmetic for Joins
    Overlapping Column Names: The suffixes Keyword
    Example: US States Data
  20. Aggregation and Grouping
    Planets Data
    Simple Aggregation in Pandas
    groupby: Split, Apply, Combine
  21. Pivot Tables
    Motivating Pivot Tables
    Pivot Tables by Hand
    Pivot Table Syntax
    Example: Birthrate Data
  22. Vectorized String Operations
    Introducing Pandas String Operations
    Tables of Pandas String Methods
    Example: Recipe Database
  23. Working with Time Series
    Dates and Times in Python
    Pandas Time Series: Indexing by Time
    Pandas Time Series Data Structures
    Regular Sequences: pd.date_range
    Frequencies and Offsets
    Resampling, Shifting, and Windowing
    Example: Visualizing Seattle Bicycle Counts
  24. High-Performance Pandas: eval and query
    Motivating query and eval: Compound Expressions
    pandas.eval for Efficient Operations
    DataFrame.eval for Column-Wise Operations
    The DataFrame.query Method
    Performance: When to Use These Functions
    Further Resources
Part IV. Visualization with Matplotlib
  25. General Matplotlib Tips
    Importing Matplotlib
    Setting Styles
    show or No show? How to Display Your Plots
  26. Simple Line Plots
    Adjusting the Plot: Line Colors and Styles
    Adjusting the Plot: Axes Limits
    Labeling Plots
    Matplotlib Gotchas
  27. Simple Scatter Plots
    Scatter Plots with plt.plot
    Scatter Plots with plt.scatter
    plot Versus scatter: A Note on Efficiency
    Visualizing Uncertainties
  28. Density and Contour Plots
    Visualizing a Three-Dimensional Function
    Histograms, Binnings, and Density
    Two-Dimensional Histograms and Binnings
  29. Customizing Plot Legends
    Choosing Elements for the Legend
    Legend for Size of Points
    Multiple Legends
  30. Customizing Colorbars
    Customizing Colorbars
    Example: Handwritten Digits
  31. Multiple Subplots
    plt.axes: Subplots by Hand
    plt.subplot: Simple Grids of Subplots
    plt.subplots: The Whole Grid in One Go
    plt.GridSpec: More Complicated Arrangements
  32. Text and Annotation
    Example: Effect of Holidays on US Births
    Transforms and Text Position
    Arrows and Annotation
  33. Customizing Ticks
    Major and Minor Ticks
    Hiding Ticks or Labels
    Reducing or Increasing the Number of Ticks
    Fancy Tick Formats
    Summary of Formatters and Locators
  34. Customizing Matplotlib: Configurations and Stylesheets
    Plot Customization by Hand
    Changing the Defaults: rcParams
    Stylesheets
  35. Three-Dimensional Plotting in Matplotlib
    Three-Dimensional Points and Lines
    Three-Dimensional Contour Plots
    Wireframes and Surface Plots
    Surface Triangulations
    Example: Visualizing a Möbius Strip
  36. Visualization with Seaborn
    Exploring Seaborn Plots
    Categorical Plots
    Example: Exploring Marathon Finishing Times
    Further Resources
    Other Python Visualization Libraries
Part V. Machine Learning
  37. What Is Machine Learning?
    Categories of Machine Learning
    Qualitative Examples of Machine Learning Applications
    Summary
  38. Introducing Scikit-Learn
    Data Representation in Scikit-Learn
    The Estimator API
    Application: Exploring Handwritten Digits
    Summary
  39. Hyperparameters and Model Validation
    Thinking About Model Validation
    Selecting the Best Model
    Learning Curves
    Validation in Practice: Grid Search
    Summary
  40. Feature Engineering
    Categorical Features
    Text Features
    Image Features
    Derived Features
    Imputation of Missing Data
    Feature Pipelines
  41. In Depth: Naive Bayes Classification
    Bayesian Classification
    Gaussian Naive Bayes
    Multinomial Naive Bayes
    When to Use Naive Bayes
  42. In Depth: Linear Regression
    Simple Linear Regression
    Basis Function Regression
    Regularization
    Example: Predicting Bicycle Traffic
  43. In Depth: Support Vector Machines
    Motivating Support Vector Machines
    Support Vector Machines: Maximizing the Margin
    Example: Face Recognition
    Summary
  44. In Depth: Decision Trees and Random Forests
    Motivating Random Forests: Decision Trees
    Ensembles of Estimators: Random Forests
    Random Forest Regression
    Example: Random Forest for Classifying Digits
    Summary
  45. In Depth: Principal Component Analysis
    Introducing Principal Component Analysis
    PCA as Noise Filtering
    Example: Eigenfaces
    Summary
  46. In Depth: Manifold Learning
    Manifold Learning: “HELLO”
    Multidimensional Scaling
    Nonlinear Manifolds: Locally Linear Embedding
    Some Thoughts on Manifold Methods
    Example: Isomap on Faces
    Example: Visualizing Structure in Digits
  47. In Depth: k-Means Clustering
    Introducing k-Means
    Expectation–Maximization
    Examples
  48. In Depth: Gaussian Mixture Models
    Motivating Gaussian Mixtures: Weaknesses of k-Means
    Generalizing E–M: Gaussian Mixture Models
    Choosing the Covariance Type
    Gaussian Mixture Models as Density Estimation
    Example: GMMs for Generating New Data
  49. In Depth: Kernel Density Estimation
    Motivating Kernel Density Estimation: Histograms
    Kernel Density Estimation in Practice
    Selecting the Bandwidth via Cross-Validation
    Example: Not-so-Naive Bayes
  50. Application: A Face Detection Pipeline
    HOG Features
    HOG in Action: A Simple Face Detector
    Caveats and Improvements
    Further Machine Learning Resources
Index
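Title: Python Data Science Handbook (2nd Edition, Reprint Edition)
Author: Jake VanderPlas
Publisher in China: Southeast University Press
Publication date: March 2023
Pages: 563
ISBN: 978-7-5766-0658-4
Original title: Python Data Science Handbook, 2nd Edition
Original publisher: O'Reilly Media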
Jake VanderPlas
 
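Jake VanderPlas is a software engineer at Google Research, where he works on tools that support data-intensive research. He has created and developed a variety of Python tools for data-intensive science, including packages such as Scikit-Learn, SciPy, Astropy, Altair, and JAX.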
 
 
The animal on the cover of Python Data Science Handbook is a Mexican beaded lizard (Heloderma horridum), a reptile found in Mexico and parts of Guatemala. The Greek word heloderma translates to “studded skin,” referring to the distinctive beaded texture of the lizard’s skin. These bumps are osteoderms, which each contain a small piece of bone and serve as protective armor.
The Mexican beaded lizard is black with yellow patches and bands. It has a broad head and a thick tail that stores fat to help it survive the hot summer months when it is inactive. On average, these lizards are 22–36 inches long, and weigh around 1.8 pounds. As with most snakes and lizards, the tongue of the Mexican beaded lizard is its primary sensory organ. It will flick it out repeatedly to gather scent particles from the environment and detect prey (or, during mating season, a potential partner).
The Mexican beaded lizard and the Gila monster (a close relative) are the only venomous lizards in the world. When threatened, the Mexican beaded lizard will bite and clamp down, chewing, because it cannot release a large quantity of venom at once. This bite and the aftereffects of the venom are extremely painful, though rarely fatal to humans. The beaded lizard’s venom contains enzymes that have been synthesized to help treat diabetes, and further pharmacological research is in progress. The species is endangered by loss of habitat, poaching for the pet trade, and locals who kill it out of fear. It is protected by legislation in both countries where it lives.
Purchase options
Price: 148.00 yuan