Web Operations (Reprint Edition)
Edited by John Allspaw and Jesse Robbins
Publication date: March 2011
Pages: 336
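“The Web is changing the way we live and touches every one of us. As people come to depend more and more on the Web, they also depend more and more on us. That is why web operations is such critical work.”
From the Foreword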
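A web application involves many specialists, and web operations staff must make sure every part of the application works properly throughout its lifetime. This is the expertise you need when a start-up is hit by an unexpected spike in traffic, or when a new feature causes a mature application to fail. In this collection of essays and interviews, web operations veterans Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer their insights into this fast-changing field. You will also learn what it takes to make a site thrive, firsthand from the builders of some of the largest sites on the Web.
· Learn the skills web operations requires, and why they come from experience rather than schooling
· Understand why it is important to collect metrics from both the application and the infrastructure
· Consider common approaches to database architecture and to the pitfalls that come with growing scale
· Learn how to handle the human side of outages and degradations
· Find out how to avoid disaster after a huge flood of traffic
· Discover what went wrong after a problem occurs, and how to keep it from happening again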
  Foreword
  Preface
  1 Web Operations: The Career
    Theo Schlossnagle
    Why Does Web Operations Have It Tough?
    From Apprentice to Master
    Conclusion
  2 How Picnik Uses Cloud Computing: Lessons Learned
    Justin Huff
    Where the Cloud Fits (and Why!)
    Where the Cloud Doesn’t Fit (for Picnik)
    Conclusion
  3 Infrastructure and Application Metrics
    John Allspaw, with Matt Massie
    Time Resolution and Retention Concerns
    Locality of Metrics Collection and Storage
    Layers of Metrics
    Providing Context for Anomaly Detection and Alerts
    Log Lines Are Metrics, Too
    Correlation with Change Management and Incident Timelines
    Making Metrics Available to Your Alerting Mechanisms
    Using Metrics to Guide Load-Feedback Mechanisms
    A Metrics Collection System, Illustrated: Ganglia
    Conclusion
  4 Continuous Deployment
    Eric Ries
    Small Batches Mean Faster Feedback
    Small Batches Mean Problems Are Instantly Localized
    Small Batches Reduce Risk
    Small Batches Reduce Overhead
    The Quality Defenders’ Lament
    Getting Started
    Continuous Deployment Is for Mission-Critical Applications
    Conclusion
  5 Infrastructure As Code
    Adam Jacob
    Service-Oriented Architecture
    Conclusion
  6 Monitoring
    Patrick Debois
    Story: “The Start of a Journey”
    Step 1: Understand What You Are Monitoring
    Step 2: Understand Normal Behavior
    Step 3: Be Prepared and Learn
    Conclusion
  7 How Complex Systems Fail
    John Allspaw and Richard Cook
    How Complex Systems Fail
    Further Reading
  8 Community Management and Web Operations
    Heather Champ and John Allspaw
  9 Dealing with Unexpected Traffic Spikes
    Brian Moon
    How It All Started
    Alarms Abound
    Putting Out the Fire
    Surviving the Weekend
    Preparing for the Future
    CDN to the Rescue
    Proxy Servers
    Corralling the Stampede
    Streamlining the Codebase
    How Do We Know It Works?
    The Real Test
    Lessons Learned
    Improvements Since Then
  10 Dev and Ops Collaboration and Cooperation
    Paul Hammond
    Deployment
    Shared, Open Infrastructure
    Trust
    On-call Developers
    Avoiding Blame
    Conclusion
  11 How Your Visitors Feel: User-Facing Metrics
    Alistair Croll and Sean Power
    Why Collect User-Facing Metrics?
    What Makes a Site Slow?
    Measuring Delay
    Building an SLA
    Visitor Outcomes: Analytics
    Other Metrics Marketing Cares About
    How User Experience Affects Web Ops
    The Future of Web Monitoring
    Conclusion
  12 Relational Database Strategy and Tactics for the Web
    Baron Schwartz
    Requirements for Web Databases
    How Typical Web Databases Grow
    The Yearning for a Cluster
    Database Strategy
    Database Tactics
    Conclusion
  13 How to Make Failure Beautiful: The Art and Science of Postmortems
    Jake Loomis
    The Worst Postmortem
    What Is a Postmortem?
    When to Conduct a Postmortem
    Who to Invite to a Postmortem
    Running a Postmortem
    Postmortem Follow-Up
    Conclusion
  14 Storage
    Anoop Nagwani
    Data Asset Inventory
    Data Protection
    Capacity Planning
    Storage Sizing
    Operations
    Conclusion
  15 Nonrelational Databases
    Eric Florenzano
    NoSQL Database Overview
    Some Systems in Detail
    Conclusion
  16 Agile Infrastructure
    Andrew Clay Shafer
    Agile Infrastructure
    So, What’s the Problem?
    Communities of Interest and Practice
    Trading Zones and Apologies
    Conclusion
  17 Things That Go Bump in the Night (and How to Sleep Through Them)
    Mike Christian
    Definitions
    How Many 9s?
    Impact Duration Versus Incident Duration
    Datacenter Footprint
    Gradual Failures
    Trust Nobody
    Failover Testing
    Monitoring and History of Patterns
    Getting a Good Night’s Sleep
  Contributors
  Index
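Title: Web Operations (Reprint Edition)
Editors: John Allspaw, Jesse Robbins
Chinese publisher: Southeast University Press
Publication date: March 2011
Pages: 336
ISBN: 978-7-5641-2502-8
Original title: Web Operations
Original publisher: O'Reilly Media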
John Allspaw
 
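John Allspaw is currently an engineering manager at Flickr.com, the site known for sharing photos uploaded by its users. He has built up extensive experience working on websites since 1999, including online news magazines (such as Salon.com, InfoWorld.com, and Macworld.com) and fast-growing social sites (such as Friendster and Flickr).
During John's time at Friendster, the site grew fivefold. He was responsible for moving Friendster from a data center with only a few dozen servers to two data centers with more than 400 servers, in support of a redesigned backend infrastructure. When he joined Flickr, it had a single tiny data center in Vancouver with little more than ten servers; it now has multiple data centers in the United States.
Before his web career, John worked as a mechanical engineer in modeling and simulation, contributing to car crash simulation experiments for the National Highway Traffic Safety Administration.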
 
 
Jesse Robbins