Eric Sammer
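“Eric Sammer's book offers practical, relevant, easy-to-understand, and richly detailed advice on every aspect of building and running a Hadoop cluster. Every Hadoop administrator should read it.”
Tom White
Apache Hadoop project committee member, Apache Software Foundation member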

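If you need to maintain large and complex Hadoop clusters, this book is a must. As Hadoop becomes the industry standard for large-scale data processing in the data center, demand for operations-focused material has grown sharply. In this book, Eric Sammer, Principal Solution Architect at Cloudera, shows you the details of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.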

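· Get an overview of HDFS and MapReduce: why they exist and how they work
· Plan a Hadoop deployment, from hardware and OS selection to network requirements
· Learn setup and configuration details with a list of critical properties
· Manage resources by sharing a cluster across multiple groups
· Get a runbook of the most common cluster maintenance tasks
· Monitor Hadoop clusters, and learn troubleshooting based on real-world examples
· Use basic tools and techniques to handle backup and catastrophic failure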

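Eric Sammer is a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, and develop on Hadoop and related projects at every scale. He has extensive experience developing and operating distributed, highly concurrent data ingest and processing systems.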

Chapter 1: Introduction
Chapter 2: HDFS
  · Goals and Motivation
  · Design
  · Daemons
  · Reading and Writing Data
  · Managing Filesystem Metadata
  · Namenode High Availability
  · Namenode Federation
  · Access and Integration
Chapter 3: MapReduce
  · The Stages of MapReduce
  · Introducing Hadoop MapReduce
  · YARN
Chapter 4: Planning a Hadoop Cluster
  · Picking a Distribution and Version of Hadoop
  · Hardware Selection
  · Operating System Selection and Preparation
  · Kernel Tuning
  · Disk Configuration
  · Network Design
Chapter 5: Installation and Configuration
  · Installing Hadoop
  · Configuration: An Overview
  · Environment Variables and Shell Scripts
  · Logging Configuration
  · HDFS
  · Namenode High Availability
  · Namenode Federation
  · MapReduce
  · Rack Topology
  · Security
Chapter 6: Identity, Authentication, and Authorization
  · Identity
  · Kerberos and Hadoop
  · Authorization
  · Tying It Together
Chapter 7: Resource Management
  · What Is Resource Management?
  · HDFS Quotas
  · MapReduce Schedulers
Chapter 8: Cluster Maintenance
  · Managing Hadoop Processes
  · HDFS Maintenance Tasks
  · MapReduce Maintenance Tasks
Chapter 9: Troubleshooting
  · Differential Diagnosis Applied to Systems
  · Common Failures and Problems
  · “Is the Computer Plugged In?”
  · Treatment and Care
  · War Stories
Chapter 10: Monitoring
  · An Overview
  · Hadoop Metrics
  · Health Monitoring
Chapter 11: Backup and Recovery
  · Data Backup
  · Namenode Metadata
Appendix: Deprecated Configuration Properties

Author: Eric Sammer
Original title: Hadoop Operations
Original publisher: O'Reilly Media
The animal on the cover of Hadoop Operations is a spotted cavy, or lowland paca. The large rodent goes by different names depending on where it lives: tepezcuintle in Mexico and Central America, pisquinte in Costa Rica, jaleb in the Yucatán Peninsula, conejo pintado in Panama, guanta in Ecuador, and so on. The name comes from the now extinct Tupian language of Brazil, meaning “awaken” and “alert.”

The paca has coarse fur and strong legs, with four digits on the front feet and five on the back; pacas use their nails as hooves. Pacas usually weigh about 13 to 26 pounds and have two litters per year.

Overall, this rodent keeps to itself and is often described as a quiet, solitary, nocturnal animal. Pacas live in burrows that they dig themselves, about seven feet into the ground. They prefer to live near water, which is where they tend to run for escape when threatened. Living in the tropical Americas means a diet of fruit such as avocado and mango as well as leaves, stems, roots, and seeds. These animals are great climbers and gather their own fruit. Considered a pest by farmers harvesting yam, sugar cane, corn, and cassava, the lowland paca is hunted for its delicious meat in Belize.