Beautiful Data (Reprint Edition)
Publication date: September 2010
Pages: 364
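“Data is already the real core of the next generation of computer applications. In this book, industry leaders describe how their projects use new approaches to harness the power of data. For anyone interested in the future of data and in new ways of solving problems, this is a must-read.”
Tim O’Reilly, founder and CEO of O’Reilly Media, Inc.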
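You’ll quickly discover how wide-ranging and beautiful working with data can be. Through a series of personal stories, 39 of the field’s best data practitioners explain how they developed simple, elegant solutions for a wide variety of projects, from a Mars lander to a Radiohead video and more. With this book, you will:
· Explore the opportunities and challenges inherent in the vast number of datasets available online
· Learn how to visualize trends in urban crime using maps and data mashups
· Discover how crowdsourcing and transparency have advanced the state of drug research
· Understand how new data can trigger alerts for users when it overrides earlier data
· Learn about the massive infrastructure required to process DNA data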
Contributors include:
Nathan Yau
Jonathan Follett and Matthew Holm
J.M. Hughes
Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
Jeff Hammerbacher
Jason Dykes and Jo Wood
Jeff Jonas and Lisa Sokol
Jud Valeski
Alon Halevy and Jayant Madhavan
Aaron Koblin and Valdean Klump
Michal Migurski
Jeffrey Heer
Coco Krumme
Peter Norvig
Matt Wood and Ben Blackburne
Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
Brendan O’Connor and Lukas Biewald
Hadley Wickham, Deborah F. Swayne, and David Poole
Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
Toby Segaran
- PREFACE
- 1 SEEING YOUR LIFE IN DATA
- by Nathan Yau
- Personal Environmental Impact Report (PEIR)
- your.flowingdata (YFD)
- Personal Data Collection
- Data Storage
- Data Processing
- Data Visualization
- The Point
- How to Participate
- 2 THE BEAUTIFUL PEOPLE: KEEPING USERS IN MIND WHEN DESIGNING DATA COLLECTION METHODS
- by Jonathan Follett and Matthew Holm
- Introduction: User Empathy Is the New Black
- The Project: Surveying Customers About a New Luxury Product
- Specific Challenges to Data Collection
- Designing Our Solution
- Results and Reflection
- 3 EMBEDDED IMAGE DATA PROCESSING ON MARS
- by J. M. Hughes
- Abstract
- Introduction
- Some Background
- To Pack or Not to Pack
- The Three Tasks
- Slotting the Images
- Passing the Image: Communication Among the Three Tasks
- Getting the Picture: Image Download and Processing
- Image Compression
- Downlink, or, It’s All Downhill from Here
- Conclusion
- 4 CLOUD STORAGE DESIGN IN A PNUTSHELL
- by Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
- Introduction
- Updating Data
- Complex Queries
- Comparison with Other Systems
- Conclusion
- 5 INFORMATION PLATFORMS AND THE RISE OF THE DATA SCIENTIST
- by Jeff Hammerbacher
- Libraries and Brains
- Facebook Becomes Self-Aware
- A Business Intelligence System
- The Death and Rebirth of a Data Warehouse
- Beyond the Data Warehouse
- The Cheetah and the Elephant
- The Unreasonable Effectiveness of Data
- New Tools and Applied Research
- MAD Skills and Cosmos
- Information Platforms As Dataspaces
- The Data Scientist
- Conclusion
- 6 THE GEOGRAPHIC BEAUTY OF A PHOTOGRAPHIC ARCHIVE
- by Jason Dykes and Jo Wood
- Beauty in Data: Geograph
- Visualization, Beauty, and Treemaps
- A Geographic Perspective on Geograph Term Use
- Beauty in Discovery
- Reflection and Conclusion
- 7 DATA FINDS DATA
- by Jeff Jonas and Lisa Sokol
- Introduction
- The Benefits of Just-in-Time Discovery
- Corruption at the Roulette Wheel
- Enterprise Discoverability
- Federated Search Ain’t All That
- Directories: Priceless
- Relevance: What Matters and to Whom?
- Components and Special Considerations
- Privacy Considerations
- Conclusion
- 8 PORTABLE DATA IN REAL TIME
- by Jud Valeski
- Introduction
- The State of the Art
- Social Data Normalization
- Conclusion: Mediation via Gnip
- 9 SURFACING THE DEEP WEB
- by Alon Halevy and Jayant Madhavan
- What Is the Deep Web?
- Alternatives to Offering Deep-Web Access
- Conclusion and Future Work
- 10 BUILDING RADIOHEAD’S HOUSE OF CARDS
- by Aaron Koblin with Valdean Klump
- How It All Started
- The Data Capture Equipment
- The Advantages of Two Data Capture Systems
- The Data
- Capturing the Data, aka “The Shoot”
- Processing the Data
- Post-Processing the Data
- Launching the Video
- Conclusion
- 11 VISUALIZING URBAN DATA
- by Michal Migurski
- Introduction
- Background
- Cracking the Nut
- Making It Public
- Revisiting
- Conclusion
- 12 THE DESIGN OF SENSE.US
- by Jeffrey Heer
- Visualization and Social Data Analysis
- Data
- Visualization
- Collaboration
- Voyagers and Voyeurs
- Conclusion
- 13 WHAT DATA DOESN’T DO
- by Coco Krumme
- When Doesn’t Data Drive?
- Conclusion
- 14 NATURAL LANGUAGE CORPUS DATA
- by Peter Norvig
- Word Segmentation
- Secret Codes
- Spelling Correction
- Other Tasks
- Discussion and Conclusion
- 15 LIFE IN DATA: THE STORY OF DNA
- by Matt Wood and Ben Blackburne
- DNA As a Data Store
- DNA As a Data Source
- Fighting the Data Deluge
- The Future of DNA
- 16 BEAUTIFYING DATA IN THE REAL WORLD
- by Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
- The Problem with Real Data
- Providing the Raw Data Back to the Notebook
- Validating Crowdsourced Data
- Representing the Data Online
- Closing the Loop: Visualizations to Suggest New Experiments
- Building a Data Web from Open Data and Free Services
- 17 SUPERFICIAL DATA ANALYSIS: EXPLORING MILLIONS OF SOCIAL STEREOTYPES
- by Brendan O’Connor and Lukas Biewald
- Introduction
- Preprocessing the Data
- Exploring the Data
- Age, Attractiveness, and Gender
- Looking at Tags
- Which Words Are Gendered?
- Clustering
- Conclusion
- 18 BAY AREA BLUES: THE EFFECT OF THE HOUSING CRISIS
- by Hadley Wickham, Deborah F. Swayne, and David Poole
- Introduction
- How Did We Get the Data?
- Geocoding
- Data Checking
- Analysis
- The Influence of Inflation
- The Rich Get Richer and the Poor Get Poorer
- Geographic Differences
- Census Information
- Exploring San Francisco
- Conclusion
- 19 BEAUTIFUL POLITICAL DATA
- by Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
- Example 1: Redistricting and Partisan Bias
- Example 2: Time Series of Estimates
- Example 3: Age and Voting
- Example 4: Public Opinion and Senate Voting on Supreme Court Nominees
- Example 5: Localized Partisanship in Pennsylvania
- Conclusion
- 20 CONNECTING DATA
- by Toby Segaran
- What Public Data Is There, Really?
- The Possibilities of Connected Data
- Within Companies
- Impediments to Connecting Data
- Possible Solutions
- Conclusion
- CONTRIBUTORS
- INDEX
Title: Beautiful Data (Reprint Edition)
Chinese publisher: Southeast University Press
ISBN: 978-7-5641-2272-0
Original publisher: O'Reilly Media
Toby Segaran
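Toby Segaran is the author of Programming Collective Intelligence and the founder of Incellico, a biotechnology software company. He is director of software development at Genstruct, a computational biology company, where he designs algorithms and applies data-mining techniques to help understand drug mechanisms. He also works with several other companies and open source projects, helping them analyze and extract value from the data they collect. In addition, he has built several free web applications, including the popular tasktoy and Lazybase. He enjoys skiing and wine tasting, blogs at blog.kiwitobes.com, and lives in San Francisco.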
Jeff Hammerbacher
The cover image is a stock photo from Jupiter Images.