Learning Data Science (Reprint Edition)
Sam Lau, Joseph Gonzalez, Deborah Nolan
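Publication date: March 2024
Pages: 594
"I truly wish this book had existed when I first used the term 'data scientist' to describe our work. If you want to work in data science/engineering, AI, or machine learning, this book is your starting point."
DJ Patil, PhD
First US Chief Data Scientist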

As an aspiring data scientist, you understand why organizations rely on data for their important decisions ...
  1. Preface
  2. Part I. The Data Science Lifecycle
  3. 1. The Data Science Lifecycle
  4. The Stages of the Lifecycle
  5. Examples of the Lifecycle
  6. Summary
  7. 2. Questions and Data Scope
  8. Big Data and New Opportunities
  9. Target Population, Access Frame, and Sample
  10. Instruments and Protocols
  11. Measuring Natural Phenomena
  12. Accuracy
  13. Summary
  14. 3. Simulation and Data Design
  15. The Urn Model
  16. Example: Simulating Election Poll Bias and Variance
  17. Example: Simulating a Randomized Trial for a Vaccine
  18. Example: Measuring Air Quality
  19. Summary
  20. 4. Modeling with Summary Statistics
  21. The Constant Model
  22. Minimizing Loss
  23. Summary
  24. 5. Case Study: Why Is My Bus Always Late?
  25. Question and Scope
  26. Data Wrangling
  27. Exploring Bus Times
  28. Modeling Wait Times
  29. Summary
  30. Part II. Rectangular Data
  31. 6. Working with Dataframes Using pandas
  32. Subsetting
  33. Aggregating
  34. Joining
  35. Transforming
  36. How Are Dataframes Different from Other Data Representations?
  37. Summary
  38. 7. Working with Relations Using SQL
  39. Subsetting
  40. Aggregating
  41. Joining
  42. Transforming and Common Table Expressions
  43. Summary
  44. Part III. Understanding the Data
  45. 8. Wrangling Files
  46. Data Source Examples
  47. File Formats
  48. File Encoding
  49. File Size
  50. The Shell and Command-Line Tools
  51. Table Shape and Granularity
  52. Summary
  53. 9. Wrangling Dataframes
  54. Example: Wrangling CO2 Measurements from the Mauna Loa Observatory
  55. Quality Checks
  56. Missing Values and Records
  57. Transformations and Timestamps
  58. Modifying Structure
  59. Example: Wrangling Restaurant Safety Violations
  60. Summary
  61. 10. Exploratory Data Analysis
  62. Feature Types
  63. What to Look For in a Distribution
  64. What to Look For in a Relationship
  65. Comparisons in Multivariate Settings
  66. Guidelines for Exploration
  67. Example: Sale Prices for Houses
  68. Summary
  69. 11. Data Visualization
  70. Choosing Scale to Reveal Structure
  71. Smoothing and Aggregating Data
  72. Facilitating Meaningful Comparisons
  73. Incorporating the Data Design
  74. Adding Context
  75. Creating Plots Using plotly
  76. Other Tools for Visualization
  77. Summary
  78. 12. Case Study: How Accurate Are Air Quality Measurements?
  79. Question, Design, and Scope
  80. Finding Collocated Sensors
  81. Wrangling and Cleaning AQS Sensor Data
  82. Wrangling PurpleAir Sensor Data
  83. Exploring PurpleAir and AQS Measurements
  84. Creating a Model to Correct PurpleAir Measurements
  85. Summary
  86. Part IV. Other Data Sources
  87. 13. Working with Text
  88. Examples of Text and Tasks
  89. String Manipulation
  90. Regular Expressions
  91. Text Analysis
  92. Summary
  93. 14. Data Exchange
  94. NetCDF Data
  95. JSON Data
  96. HTTP
  97. REST
  98. XML, HTML, and XPath
  99. Summary
  100. Part V. Linear Modeling
  101. 15. Linear Models
  102. Simple Linear Model
  103. Example: A Simple Linear Model for Air Quality
  104. Fitting the Simple Linear Model
  105. Multiple Linear Model
  106. Fitting the Multiple Linear Model
  107. Example: Where Is the Land of Opportunity?
  108. Feature Engineering for Numeric Measurements
  109. Feature Engineering for Categorical Measurements
  110. Summary
  111. 16. Model Selection
  112. Overfitting
  113. Train-Test Split
  114. Cross-Validation
  115. Regularization
  116. Model Bias and Variance
  117. Summary
  118. 17. Theory for Inference and Prediction
  119. Distributions: Population, Empirical, Sampling
  120. Basics of Hypothesis Testing
  121. Bootstrapping for Inference
  122. Basics of Confidence Intervals
  123. Basics of Prediction Intervals
  124. Probability for Inference and Prediction
  125. Summary
  126. 18. Case Study: How to Weigh a Donkey
  127. Donkey Study Question and Scope
  128. Wrangling and Transforming
  129. Exploring
  130. Modeling a Donkey’s Weight
  131. Summary
  132. Part VI. Classification
  133. 19. Classification
  134. Example: Wind-Damaged Trees
  135. Modeling and Classification
  136. Modeling Proportions (and Probabilities)
  137. A Loss Function for the Logistic Model
  138. From Probabilities to Classification
  139. Summary
  140. 20. Numerical Optimization
  141. Gradient Descent Basics
  142. Minimizing Huber Loss
  143. Convex and Differentiable Loss Functions
  144. Variants of Gradient Descent
  145. Summary
  146. 21. Case Study: Detecting Fake News
  147. Question and Scope
  148. Obtaining and Wrangling the Data
  149. Exploring the Data
  150. Modeling
  151. Summary
  152. Additional Material
  153. Data Sources
  154. Index
Purchase options
List price: 169.00 yuan
ISBN: 978-1098113001
Publisher: Southeast University Press