Mining the Social Web (Chinese title: 挖掘社交网络), reprint edition
Matthew A. Russell
Publication date: June 2011
Pages: 332
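Facebook, Twitter, and LinkedIn generate a wealth of valuable social data, but how can you find out who is connecting with whom through social media, what they are talking about, or where they are? This concise, practical book shows you how to answer these questions and more. You'll learn how to combine social web data with analysis techniques and visualization to find what you have been looking for in the social world, as well as useful information you didn't know existed.
Each standalone chapter introduces techniques for mining data in a different area of the social web, including blogs and email. All you need is some programming experience and a willingness to learn basic Python tools.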

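· Get a straightforward overview of the social web landscape
· Use easily adaptable scripts hosted on GitHub to harvest data from social network APIs such as Twitter, Facebook, and LinkedIn
· Learn how to apply convenient Python tools to slice and dice the data you collect
· Explore social connections in microformats with the XHTML Friends Network (XFN)
· Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
· Build interactive visualizations with web technologies based on HTML5 and JavaScript toolkits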
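To give a rough feel for two of the techniques named in the list above, here is a minimal, dependency-free Python sketch of TF-IDF weighting and cosine similarity. It is not the book's code: it assumes naive lowercase whitespace tokenization and the plain tf × log(N / df) weighting, and the function names `tf_idf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each document to a sparse {term: tf-idf weight} dict.

    Assumes naive lowercase whitespace tokenization and the weighting
    tf(t, d) * log(N / df(t)); real toolkits offer many variants.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency normalized by document length, times IDF.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "mining social web data with python",
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares "the", "sat", "on"
    print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

A term that appears in every document gets an IDF of log(1) = 0 under this weighting and so contributes nothing; production libraries typically add smoothing terms to the formula, so their scores will differ from this sketch.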

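Matthew A. Russell, Vice President of Engineering at Digital Reasoning Systems and Principal at Zaffra, is a computer scientist passionate about data mining, open source, and web application technologies. He is the author of Dojo: The Definitive Guide (O'Reilly).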

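“Mining the Social Web is the natural successor to Programming Collective Intelligence: a practical, hands-on approach to harvesting data from the social web with Python.”
Jeff Hammerbacher
Chief Scientist, Cloudera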

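“A rich, compact, and practical introduction to a range of tools, techniques, and theories for exploring structured and unstructured data.”
Alex Martelli
Senior Staff Engineer, Google
Author of Python in a Nutshell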
  Preface
  1. Introduction: Hacking on Twitter Data
    Installing Python Development Tools
    Collecting and Manipulating Twitter Data
    Tinkering with Twitter’s API
    Frequency Analysis and Lexical Diversity
    Visualizing Tweet Graphs
    Synthesis: Visualizing Retweets with Protovis
    Closing Remarks
  2. Microformats: Semantic Markup and Common Sense Collide
    XFN and Friends
    Exploring Social Connections with XFN
    A Breadth-First Crawl of XFN Data
    Geocoordinates: A Common Thread for Just About Anything
    Wikipedia Articles + Google Maps = Road Trip?
    Slicing and Dicing Recipes (for the Health of It)
    Collecting Restaurant Reviews
    Summary
  3. Mailboxes: Oldies but Goodies
    mbox: The Quick and Dirty on Unix Mailboxes
    mbox + CouchDB = Relaxed Email Analysis
    Bulk Loading Documents into CouchDB
    Sensible Sorting
    Map/Reduce-Inspired Frequency Analysis
    Sorting Documents by Value
    couchdb-lucene: Full-Text Indexing and More
    Threading Together Conversations
    Look Who’s Talking
    Visualizing Mail “Events” with SIMILE Timeline
    Analyzing Your Own Mail Data
    The Graph Your (Gmail) Inbox Chrome Extension
    Closing Remarks
  4. Twitter: Friends, Followers, and Setwise Operations
    RESTful and OAuth-Cladded APIs
    No, You Can’t Have My Password
    A Lean, Mean Data-Collecting Machine
    A Very Brief Refactor Interlude
    Redis: A Data Structures Server
    Elementary Set Operations
    Souping Up the Machine with Basic Friend/Follower Metrics
    Calculating Similarity by Computing Common Friends and Followers
    Measuring Influence
    Constructing Friendship Graphs
    Clique Detection and Analysis
    The Infochimps “Strong Links” API
    Interactive 3D Graph Visualization
    Summary
  5. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet
    Pen : Sword :: Tweet : Machine Gun (?!?)
    Analyzing Tweets (One Entity at a Time)
    Tapping (Tim’s) Tweets
    Who Does Tim Retweet Most Often?
    What’s Tim’s Influence?
    How Many of Tim’s Tweets Contain Hashtags?
    Juxtaposing Latent Social Networks (or #JustinBieber Versus #TeaParty)
    What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
    On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
    Which Gets Retweeted More Often: #JustinBieber or #TeaParty?
    How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?
    Visualizing Tons of Tweets
    Visualizing Tweets with Tricked-Out Tag Clouds
    Visualizing Community Structures in Twitter Search Results
    Closing Remarks
  6. LinkedIn: Clustering Your Professional Network for Fun (and Profit?)
    Motivation for Clustering
    Clustering Contacts by Job Title
    Standardizing and Counting Job Titles
    Common Similarity Metrics for Clustering
    A Greedy Approach to Clustering
    Hierarchical and k-Means Clustering
    Fetching Extended Profile Information
    Geographically Clustering Your Network
    Mapping Your Professional Network with Google Earth
    Mapping Your Professional Network with Dorling Cartograms
    Closing Remarks
  7. Google Buzz: TF-IDF, Cosine Similarity, and Collocations
    Buzz = Twitter + Blogs (???)
    Data Hacking with NLTK
    Text Mining Fundamentals
    A Whiz-Bang Introduction to TF-IDF
    Querying Buzz Data with TF-IDF
    Finding Similar Documents
    The Theory Behind Vector Space Models and Cosine Similarity
    Clustering Posts with Cosine Similarity
    Visualizing Similarity with Graph Visualizations
    Buzzing on Bigrams
    How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions
    Tapping into Your Gmail
    Accessing Gmail with OAuth
    Fetching and Parsing Email Messages
    Before You Go Off and Try to Build a Search Engine…
    Closing Remarks
  8. Blogs et al.: Natural Language Processing (and Beyond)
    NLP: A Pareto-Like Introduction
    Syntax and Semantics
    A Brief Thought Exercise
    A Typical NLP Pipeline with NLTK
    Sentence Detection in Blogs with NLTK
    Summarizing Documents
    Analysis of Luhn’s Summarization Algorithm
    Entity-Centric Analysis: A Deeper Understanding of the Data
    Quality of Analytics
    Closing Remarks
  9. Facebook: The All-in-One Wonder
    Tapping into Your Social Network Data
    From Zero to Access Token in Under 10 Minutes
    Facebook’s Query APIs
    Visualizing Facebook Data
    Visualizing Your Entire Social Network
    Visualizing Mutual Friendships Within Groups
    Where Have My Friends All Gone? (A Data-Driven Game)
    Visualizing Wall Data As a (Rotating) Tag Cloud
    Closing Remarks
  10. The Semantic Web: A Cocktail Discussion
    An Evolutionary Revolution?
    Man Cannot Live on Facts Alone
    Open-World Versus Closed-World Assumptions
    Inferencing About an Open World with FuXi
    Hope
  Index
Chinese publisher: Southeast University Press
ISBN: 978-7-5641-2686-5
Original title: Mining the Social Web
Original publisher: O'Reilly Media
Matthew A. Russell
 
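Matthew A. Russell, Vice President of Engineering at Digital Reasoning Systems and Principal at Zaffra, is a computer scientist passionate about data mining, open source, and web application technologies. He is also the author of Dojo: The Definitive Guide (O'Reilly). Connect with him on LinkedIn or follow @ptwobrussell on Twitter to keep up with his latest work.
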
The animal on the cover of Mining the Social Web is a groundhog (Marmota monax),
also known as a woodchuck (a name derived from the Algonquin name wuchak).
Groundhogs are famously associated with the US/Canadian holiday Groundhog Day,
held every February 2nd. Folklore holds that if the groundhog emerges from its burrow
that day and sees its shadow, winter will continue for six more weeks. Proponents say
that the rodents forecast accurately 75 to 90 percent of the time. Many cities host
famous groundhog weather prognosticators, including Punxsutawney Phil (of Punxsutawney,
PA and the 1993 Bill Murray film).
This legend perhaps originates from the fact that the groundhog is one of the few species
that enters true hibernation during the winter. Primarily herbivorous, groundhogs will
fatten up in the summer on vegetation, berries, nuts, insects, and the crops in human
gardens, causing many to consider them pests. They then dig a winter burrow, and
remain there from October to March (although they may emerge earlier in temperate
areas, or, presumably, if they will be the center of attention on their eponymous
holiday).
The groundhog is one of the largest members of the squirrel family, measuring around 16–26 inches long
and weighing 4–9 pounds. They are equipped with curved, thick claws ideal for digging,
and two coats of fur: a dense grey undercoat and a lighter-colored topcoat of longer
hairs, which provides protection against the elements.
Groundhogs range throughout most of Canada and northern regions of the United
States, in places where open space and woodlands meet. They are capable of climbing
trees and swimming, but are usually found on the ground, not far from the burrows
they dig for sleeping, rearing their young, and protection from predators. These burrows
typically have two to five entrances, and up to 46 feet of tunnels.