Programming Collective Intelligence (Reprint Edition)
Toby Segaran
Publication date: March 2008
Pages: 334
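“Bravo! I cannot think of a better way to start learning these algorithms and methods, nor of a better way for me (an old AI hand) to quickly refresh my knowledge of their details.”
- Dan Russell, Uber Tech Lead, Google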

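“Toby's book does a great job of breaking down complex machine-learning algorithms into practical, easy-to-understand examples that can be applied directly to analyzing social interaction on the Web today. If I had had this book two years ago, it would have saved me a great deal of precious time wasted on fruitless paths.”
- Tim Wolters, CTO, Collective Intellect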

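Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book shows you how to build Web 2.0 applications that mine the enormous amount of data generated by participatory Internet applications. With the sophisticated algorithms in this book, you can write smart programs that access interesting datasets from other websites, collect data from the users of your own applications, and analyze and understand the data you find.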

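Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior from the information that you and others collect every day. Each algorithm is described clearly and concisely, with code that can be used immediately on your website, blog, wiki, or specialized application. This book covers the following topics: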

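* Collaborative filtering techniques that enable online retailers to recommend products or media
* Clustering methods for detecting groups of similar items in a large dataset
* Optimization algorithms that choose the best solution to a problem from millions of possibilities
* Bayesian filtering, used in spam filters that classify messages based on word types and other features
* Support-vector machines, used to match people on online dating sites
* Evolving intelligence for problem solving: how a computer improves its own code and develops its skills by playing the same game many times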

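Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you.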
Foreword
Preface
1. Introduction to Collective Intelligence
    What Is Collective Intelligence?
    What Is Machine Learning?
    Limits of Machine Learning
    Real-Life Examples
    Other Uses for Learning Algorithms
2. Making Recommendations
    Collaborative Filtering
    Collecting Preferences
    Finding Similar Users
    Recommending Items
    Matching Products
    Building a del.icio.us Link Recommender
    Item-Based Filtering
    Using the MovieLens Dataset
    User-Based or Item-Based Filtering?
    Exercises
3. Discovering Groups
    Supervised versus Unsupervised Learning
    Word Vectors
    Hierarchical Clustering
    Drawing the Dendrogram
    Column Clustering
    K-Means Clustering
    Clusters of Preferences
    Viewing Data in Two Dimensions
    Other Things to Cluster
    Exercises
4. Searching and Ranking
    What’s in a Search Engine?
    A Simple Crawler
    Building the Index
    Querying
    Content-Based Ranking
    Using Inbound Links
    Learning from Clicks
    Exercises
5. Optimization
    Group Travel
    Representing Solutions
    The Cost Function
    Random Searching
    Hill Climbing
    Simulated Annealing
    Genetic Algorithms
    Real Flight Searches
    Optimizing for Preferences
    Network Visualization
    Other Possibilities
    Exercises
6. Document Filtering
    Filtering Spam
    Documents and Words
    Training the Classifier
    Calculating Probabilities
    A Naïve Classifier
    The Fisher Method
    Persisting the Trained Classifiers
    Filtering Blog Feeds
    Improving Feature Detection
    Using Akismet
    Alternative Methods
    Exercises
7. Modeling with Decision Trees
    Predicting Signups
    Introducing Decision Trees
    Training the Tree
    Choosing the Best Split
    Recursive Tree Building
    Displaying the Tree
    Classifying New Observations
    Pruning the Tree
    Dealing with Missing Data
    Dealing with Numerical Outcomes
    Modeling Home Prices
    Modeling “Hotness”
    When to Use Decision Trees
    Exercises
8. Building Price Models
    Building a Sample Dataset
    k-Nearest Neighbors
    Weighted Neighbors
    Cross-Validation
    Heterogeneous Variables
    Optimizing the Scale
    Uneven Distributions
    Using Real Data—the eBay API
    When to Use k-Nearest Neighbors
    Exercises
9. Advanced Classification: Kernel Methods and SVMs
    Matchmaker Dataset
    Difficulties with the Data
    Basic Linear Classification
    Categorical Features
    Scaling the Data
    Understanding Kernel Methods
    Support-Vector Machines
    Using LIBSVM
    Matching on Facebook
    Exercises
10. Finding Independent Features
    A Corpus of News
    Previous Approaches
    Non-Negative Matrix Factorization
    Displaying the Results
    Using Stock Market Data
    Exercises
11. Evolving Intelligence
    What Is Genetic Programming?
    Programs As Trees
    Creating the Initial Population
    Testing a Solution
    Mutating Programs
    Crossover
    Building the Environment
    A Simple Game
    Further Possibilities
    Exercises
12. Algorithm Summary
    Bayesian Classifier
    Decision Tree Classifier
    Neural Networks
    Support-Vector Machines
    k-Nearest Neighbors
    Clustering
    Multidimensional Scaling
    Non-Negative Matrix Factorization
    Optimization
A. Third-Party Libraries
B. Mathematical Formulas
Index
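Title: Programming Collective Intelligence (Reprint Edition)
Author: Toby Segaran
Publisher in China: Southeast University Press
Publication date: March 2008
Pages: 334
ISBN: 978-7-5641-1139-7
Original publisher: O'Reilly Media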
Toby Segaran
 
Toby Segaran is a director of software development at Genstruct, a computational
biology company, where he designs algorithms and applies data-mining techniques
to help understand drug mechanisms. He also works with other companies and
open source projects to help them analyze and find value in their collected datasets.
In addition, he has built several free web applications including the popular tasktoy
and Lazybase. He enjoys snowboarding and wine tasting. His blog is located at
blog.kiwitobes.com. He lives in San Francisco.
 
 
The animals on the cover of Programming Collective Intelligence are King penguins
(Aptenodytes patagonicus). Although named for the Patagonia region, King penguins
no longer breed in South America; the last colony there was wiped out by 19th-century
sealers. Today, these penguins are found on sub-Antarctic islands such as
Prince Edward, Crozet, Macquarie, and Falkland Islands. They live on beaches and
flat glacial lands near the sea. King penguins are extremely social birds; they breed in
colonies of as many as 10,000 and raise their young in crèches.
Standing 30 inches tall and weighing up to 30 pounds, the King is one of the largest
types of penguin—second only to its close relative the Emperor penguin. Apart from
size, the major identifying feature of the King penguin is the bright orange patches on
its head that extend down to its silvery breast plumage. These penguins have a sleek
body frame and can run on land, instead of hopping like Emperor penguins. They are
well adapted to the sea, eating a diet of fish and squid, and can dive down 700 feet,
far deeper than most other penguins go. Because males and females are similar in size
and appearance, they are distinguished by behavioral clues such as mating rituals.
King penguins do not build nests; instead, they tuck their single egg under their
bellies and rest it on their feet. No other bird has a longer breeding cycle than these
penguins, who breed twice every three years and fledge a single chick. The chicks are
round, brown, and so fluffy that early explorers thought they were an entirely
different species of penguin, calling them “woolly penguins.” With a world population
of two million breeding pairs, King penguins are not a threatened species, and
the World Conservation Union has assigned them to the Least Concern category.