Beautiful Data (Reprint Edition)
Edited by Toby Segaran and Jeff Hammerbacher
Publication date: September 2010
Pages: 364
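“Data has in fact become the true core of the next generation of computer applications. In this book, industry leaders describe how their projects use new approaches to harness the power of data. It is a must-read for anyone interested in the future of data and in ways of solving problems.”
- Tim O’Reilly, founder and CEO of O’Reilly Media, Inc.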
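You will soon discover how broad and beautiful working with data can be. Through a series of personal stories, 39 of the field's best data practitioners explain how they developed simple and elegant solutions for a wide variety of projects, from the Mars lander to a Radiohead video, and more. With this book, you can:
· Explore the opportunities and challenges inherent in large online datasets
· Learn how to visualize urban crime trends using maps and data mashups
· Discover how crowdsourcing and transparency have advanced the state of drug research
· Understand how new data can alert users when it overlaps existing data
· Learn about the massive infrastructure required to process DNA data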

The following people also contributed to this book:
Nathan Yau
Jonathan Follett and Matthew Holm
J.M. Hughes
Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
Jeff Hammerbacher
Jason Dykes and Jo Wood
Jeff Jonas and Lisa Sokol
Jud Valeski
Alon Halevy and Jayant Madhavan
Aaron Koblin and Valdean Klump
Michal Migurski
Jeffrey Heer
Coco Krumme
Peter Norvig
Matt Wood and Ben Blackburne
Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
Brendan O’Connor and Lukas Biewald
Hadley Wickham, Deborah F. Swayne, and David Poole
Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
Toby Segaran
  PREFACE
  1 SEEING YOUR LIFE IN DATA
    by Nathan Yau
    Personal Environmental Impact Report (PEIR)
    your.flowingdata (YFD)
    Personal Data Collection
    Data Storage
    Data Processing
    Data Visualization
    The Point
    How to Participate
  2 THE BEAUTIFUL PEOPLE: KEEPING USERS IN MIND WHEN DESIGNING DATA COLLECTION METHODS
    by Jonathan Follett and Matthew Holm
    Introduction: User Empathy Is the New Black
    The Project: Surveying Customers About a New Luxury Product
    Specific Challenges to Data Collection
    Designing Our Solution
    Results and Reflection
  3 EMBEDDED IMAGE DATA PROCESSING ON MARS
    by J. M. Hughes
    Abstract
    Introduction
    Some Background
    To Pack or Not to Pack
    The Three Tasks
    Slotting the Images
    Passing the Image: Communication Among the Three Tasks
    Getting the Picture: Image Download and Processing
    Image Compression
    Downlink, or, It’s All Downhill from Here
    Conclusion
  4 CLOUD STORAGE DESIGN IN A PNUTSHELL
    by Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
    Introduction
    Updating Data
    Complex Queries
    Comparison with Other Systems
    Conclusion
  5 INFORMATION PLATFORMS AND THE RISE OF THE DATA SCIENTIST
    by Jeff Hammerbacher
    Libraries and Brains
    Facebook Becomes Self-Aware
    A Business Intelligence System
    The Death and Rebirth of a Data Warehouse
    Beyond the Data Warehouse
    The Cheetah and the Elephant
    The Unreasonable Effectiveness of Data
    New Tools and Applied Research
    MAD Skills and Cosmos
    Information Platforms As Dataspaces
    The Data Scientist
    Conclusion
  6 THE GEOGRAPHIC BEAUTY OF A PHOTOGRAPHIC ARCHIVE
    by Jason Dykes and Jo Wood
    Beauty in Data: Geograph
    Visualization, Beauty, and Treemaps
    A Geographic Perspective on Geograph Term Use
    Beauty in Discovery
    Reflection and Conclusion
  7 DATA FINDS DATA
    by Jeff Jonas and Lisa Sokol
    Introduction
    The Benefits of Just-in-Time Discovery
    Corruption at the Roulette Wheel
    Enterprise Discoverability
    Federated Search Ain’t All That
    Directories: Priceless
    Relevance: What Matters and to Whom?
    Components and Special Considerations
    Privacy Considerations
    Conclusion
  8 PORTABLE DATA IN REAL TIME
    by Jud Valeski
    Introduction
    The State of the Art
    Social Data Normalization
    Conclusion: Mediation via Gnip
  9 SURFACING THE DEEP WEB
    by Alon Halevy and Jayant Madhavan
    What Is the Deep Web?
    Alternatives to Offering Deep-Web Access
    Conclusion and Future Work
  10 BUILDING RADIOHEAD’S HOUSE OF CARDS
    by Aaron Koblin with Valdean Klump
    How It All Started
    The Data Capture Equipment
    The Advantages of Two Data Capture Systems
    The Data
    Capturing the Data, aka “The Shoot”
    Processing the Data
    Post-Processing the Data
    Launching the Video
    Conclusion
  11 VISUALIZING URBAN DATA
    by Michal Migurski
    Introduction
    Background
    Cracking the Nut
    Making It Public
    Revisiting
    Conclusion
  12 THE DESIGN OF SENSE.US
    by Jeffrey Heer
    Visualization and Social Data Analysis
    Data
    Visualization
    Collaboration
    Voyagers and Voyeurs
    Conclusion
  13 WHAT DATA DOESN’T DO
    by Coco Krumme
    When Doesn’t Data Drive?
    Conclusion
  14 NATURAL LANGUAGE CORPUS DATA
    by Peter Norvig
    Word Segmentation
    Secret Codes
    Spelling Correction
    Other Tasks
    Discussion and Conclusion
  15 LIFE IN DATA: THE STORY OF DNA
    by Matt Wood and Ben Blackburne
    DNA As a Data Store
    DNA As a Data Source
    Fighting the Data Deluge
    The Future of DNA
  16 BEAUTIFYING DATA IN THE REAL WORLD
    by Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
    The Problem with Real Data
    Providing the Raw Data Back to the Notebook
    Validating Crowdsourced Data
    Representing the Data Online
    Closing the Loop: Visualizations to Suggest New Experiments
    Building a Data Web from Open Data and Free Services
  17 SUPERFICIAL DATA ANALYSIS: EXPLORING MILLIONS OF SOCIAL STEREOTYPES
    by Brendan O’Connor and Lukas Biewald
    Introduction
    Preprocessing the Data
    Exploring the Data
    Age, Attractiveness, and Gender
    Looking at Tags
    Which Words Are Gendered?
    Clustering
    Conclusion
  18 BAY AREA BLUES: THE EFFECT OF THE HOUSING CRISIS
    by Hadley Wickham, Deborah F. Swayne, and David Poole
    Introduction
    How Did We Get the Data?
    Geocoding
    Data Checking
    Analysis
    The Influence of Inflation
    The Rich Get Richer and the Poor Get Poorer
    Geographic Differences
    Census Information
    Exploring San Francisco
    Conclusion
  19 BEAUTIFUL POLITICAL DATA
    by Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
    Example 1: Redistricting and Partisan Bias
    Example 2: Time Series of Estimates
    Example 3: Age and Voting
    Example 4: Public Opinion and Senate Voting on Supreme Court Nominees
    Example 5: Localized Partisanship in Pennsylvania
    Conclusion
  20 CONNECTING DATA
    by Toby Segaran
    What Public Data Is There, Really?
    The Possibilities of Connected Data
    Within Companies
    Impediments to Connecting Data
    Possible Solutions
    Conclusion
  CONTRIBUTORS
  INDEX
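Title: Beautiful Data (Reprint Edition)
Editors: Toby Segaran and Jeff Hammerbacher
Publisher in China: Southeast University Press
Publication date: September 2010
Pages: 364
ISBN: 978-7-5641-2272-0
Original publisher: O'Reilly Media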
Toby Segaran
 
Toby Segaran is a director of software development at Genstruct, a computational biology company, where he designs algorithms and applies data-mining techniques to help understand drug mechanisms. He also works with other companies and open source projects to help them analyze and find value in their collected datasets. In addition, he has built several free web applications, including the popular tasktoy and Lazybase. He enjoys snowboarding and wine tasting. His blog is located at blog.kiwitobes.com. He lives in San Francisco.
 
 
Jeff Hammerbacher
 
The cover image is a stock photo from Jupiter Images.