Hadoop: The Definitive Guide (Second Edition, Reprint Edition)
Tom White
Publication date: June 2011
Pages: 600
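Discover how Apache Hadoop can unleash the power of your data. This comprehensive book shows how to build and maintain reliable, scalable distributed systems with the Hadoop framework. Hadoop is an open source implementation of the MapReduce algorithm, a cornerstone on which Google built its empire. Programmers can explore how to analyze massive datasets, and administrators can learn how to set up and run Hadoop clusters.
This revised edition covers recent updates to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides case studies that show how Hadoop solves specific problems...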
  1. Foreword
  2. Preface
  3. 1. Meet Hadoop
  4. Data!
  5. Data Storage and Analysis
  6. Comparison with Other Systems
  7. RDBMS
  8. Grid Computing
  9. Volunteer Computing
  10. A Brief History of Hadoop
  11. Apache Hadoop and the Hadoop Ecosystem
  12. 2. MapReduce
  13. A Weather Dataset
  14. Data Format
  15. Analyzing the Data with Unix Tools
  16. Analyzing the Data with Hadoop
  17. Map and Reduce
  18. Java MapReduce
  19. Scaling Out
  20. Data Flow
  21. Combiner Functions
  22. Running a Distributed MapReduce Job
  23. Hadoop Streaming
  24. Ruby
  25. Python
  26. Hadoop Pipes
  27. Compiling and Running
  28. 3. The Hadoop Distributed Filesystem
  29. The Design of HDFS
  30. HDFS Concepts
  31. Blocks
  32. Namenodes and Datanodes
  33. The Command-Line Interface
  34. Basic Filesystem Operations
  35. Hadoop Filesystems
  36. Interfaces
  37. The Java Interface
  38. Reading Data from a Hadoop URL
  39. Reading Data Using the FileSystem API
  40. Writing Data
  41. Directories
  42. Querying the Filesystem
  43. Deleting Data
  44. Data Flow
  45. Anatomy of a File Read
  46. Anatomy of a File Write
  47. Coherency Model
  48. Parallel Copying with distcp
  49. Keeping an HDFS Cluster Balanced
  50. Hadoop Archives
  51. Using Hadoop Archives
  52. Limitations
  53. 4. Hadoop I/O
  54. Data Integrity
  55. Data Integrity in HDFS
  56. LocalFileSystem
  57. ChecksumFileSystem
  58. Compression
  59. Codecs
  60. Compression and Input Splits
  61. Using Compression in MapReduce
  62. Serialization
  63. The Writable Interface
  64. Writable Classes
  65. Implementing a Custom Writable
  66. Serialization Frameworks
  67. Avro
  68. File-Based Data Structures
  69. SequenceFile
  70. MapFile
  71. 5. Developing a MapReduce Application
  72. The Configuration API
  73. Combining Resources
  74. Variable Expansion
  75. Configuring the Development Environment
  76. Managing Configuration
  77. GenericOptionsParser, Tool, and ToolRunner
  78. Writing a Unit Test
  79. Mapper
  80. Reducer
  81. Running Locally on Test Data
  82. Running a Job in a Local Job Runner
  83. Testing the Driver
  84. Running on a Cluster
  85. Packaging
  86. Launching a Job
  87. The MapReduce Web UI
  88. Retrieving the Results
  89. Debugging a Job
  90. Using a Remote Debugger
  91. Tuning a Job
  92. Profiling Tasks
  93. MapReduce Workflows
  94. Decomposing a Problem into MapReduce Jobs
  95. Running Dependent Jobs
  96. 6. How MapReduce Works
  97. Anatomy of a MapReduce Job Run
  98. Job Submission
  99. Job Initialization
  100. Task Assignment
  101. Task Execution
  102. Progress and Status Updates
  103. Job Completion
  104. Failures
  105. Task Failure
  106. Tasktracker Failure
  107. Jobtracker Failure
  108. Job Scheduling
  109. The Fair Scheduler
  110. The Capacity Scheduler
  111. Shuffle and Sort
  112. The Map Side
  113. The Reduce Side
  114. Configuration Tuning
  115. Task Execution
  116. Speculative Execution
  117. Task JVM Reuse
  118. Skipping Bad Records
  119. The Task Execution Environment
  120. 7. MapReduce Types and Formats
  121. MapReduce Types
  122. The Default MapReduce Job
  123. Input Formats
  124. Input Splits and Records
  125. Text Input
  126. Binary Input
  127. Multiple Inputs
  128. Database Input (and Output)
  129. Output Formats
  130. Text Output
  131. Binary Output
  132. Multiple Outputs
  133. Lazy Output
  134. Database Output
  135. 8. MapReduce Features
  136. Counters
  137. Built-in Counters
  138. User-Defined Java Counters
  139. User-Defined Streaming Counters
  140. Sorting
  141. Preparation
  142. Partial Sort
  143. Total Sort
  144. Secondary Sort
  145. Joins
  146. Map-Side Joins
  147. Reduce-Side Joins
  148. Side Data Distribution
  149. Using the Job Configuration
  150. Distributed Cache
  151. MapReduce Library Classes
  152. 9. Setting Up a Hadoop Cluster
  153. Cluster Specification
  154. Network Topology
  155. Cluster Setup and Installation
  156. Installing Java
  157. Creating a Hadoop User
  158. Installing Hadoop
  159. Testing the Installation
  160. SSH Configuration
  161. Hadoop Configuration
  162. Configuration Management
  163. Environment Settings
  164. Important Hadoop Daemon Properties
  165. Hadoop Daemon Addresses and Ports
  166. Other Hadoop Properties
  167. User Account Creation
  168. Security
  169. Kerberos and Hadoop
  170. Delegation Tokens
  171. Other Security Enhancements
  172. Benchmarking a Hadoop Cluster
  173. Hadoop Benchmarks
  174. User Jobs
  175. Hadoop in the Cloud
  176. Hadoop on Amazon EC2
  177. 10. Administering Hadoop
  178. HDFS
  179. Persistent Data Structures
  180. Safe Mode
  181. Audit Logging
  182. Tools
  183. Monitoring
  184. Logging
  185. Metrics
  186. Java Management Extensions
  187. Maintenance
  188. Routine Administration Procedures
  189. Commissioning and Decommissioning Nodes
  190. Upgrades
  191. 11. Pig
  192. Installing and Running Pig
  193. Execution Types
  194. Running Pig Programs
  195. Grunt
  196. Pig Latin Editors
  197. An Example
  198. Generating Examples
  199. Comparison with Databases
  200. Pig Latin
  201. Structure
  202. Statements
  203. Expressions
  204. Types
  205. Schemas
  206. Functions
  207. User-Defined Functions
  208. A Filter UDF
  209. An Eval UDF
  210. A Load UDF
  211. Data Processing Operators
  212. Loading and Storing Data
  213. Filtering Data
  214. Grouping and Joining Data
  215. Sorting Data
  216. Combining and Splitting Data
  217. Pig in Practice
  218. Parallelism
  219. Parameter Substitution
  220. 12. Hive
  221. Installing Hive
  222. The Hive Shell
  223. An Example
  224. Running Hive
  225. Configuring Hive
  226. Hive Services
  227. The Metastore
  228. Comparison with Traditional Databases
  229. Schema on Read Versus Schema on Write
  230. Updates, Transactions, and Indexes
  231. HiveQL
  232. Data Types
  233. Operators and Functions
  234. Tables
  235. Managed Tables and External Tables
  236. Partitions and Buckets
  237. Storage Formats
  238. Importing Data
  239. Altering Tables
  240. Dropping Tables
  241. Querying Data
  242. Sorting and Aggregating
  243. MapReduce Scripts
  244. Joins
  245. Subqueries
  246. Views
  247. User-Defined Functions
  248. Writing a UDF
  249. Writing a UDAF
  250. 13. HBase
  251. HBasics
  252. Backdrop
  253. Concepts
  254. Whirlwind Tour of the Data Model
  255. Implementation
  256. Installation
  257. Test Drive
  258. Clients
  259. Java
  260. Avro, REST, and Thrift
  261. Example
  262. Schemas
  263. Loading Data
  264. Web Queries
  265. HBase Versus RDBMS
  266. Successful Service
  267. HBase
  268. Use Case: HBase at Streamy.com
  269. Praxis
  270. Versions
  271. HDFS
  272. UI
  273. Metrics
  274. Schema Design
  275. Counters
  276. Bulk Load
  277. 14. ZooKeeper
  278. Installing and Running ZooKeeper
  279. An Example
  280. Group Membership in ZooKeeper
  281. Creating the Group
  282. Joining a Group
  283. Listing Members in a Group
  284. Deleting a Group
  285. The ZooKeeper Service
  286. Data Model
  287. Operations
  288. Implementation
  289. Consistency
  290. Sessions
  291. States
  292. Building Applications with ZooKeeper
  293. A Configuration Service
  294. The Resilient ZooKeeper Application
  295. A Lock Service
  296. More Distributed Data Structures and Protocols
  297. ZooKeeper in Production
  298. Resilience and Performance
  299. Configuration
  300. 15. Sqoop
  301. Getting Sqoop
  302. A Sample Import
  303. Generated Code
  304. Additional Serialization Systems
  305. Database Imports: A Deeper Look
  306. Controlling the Import
  307. Imports and Consistency
  308. Direct-mode Imports
  309. Working with Imported Data
  310. Imported Data and Hive
  311. Importing Large Objects
  312. Performing an Export
  313. Exports: A Deeper Look
  314. Exports and Transactionality
  315. Exports and SequenceFiles
  316. 16. Case Studies
  317. Hadoop Usage at Last.fm
  318. Last.fm: The Social Music Revolution
  319. Hadoop at Last.fm
  320. Generating Charts with Hadoop
  321. The Track Statistics Program
  322. Summary
  323. Hadoop and Hive at Facebook
  324. Introduction
  325. Hadoop at Facebook
  326. Hypothetical Use Case Studies
  327. Hive
  328. Problems and Future Work
  329. Nutch Search Engine
  330. Background
  331. Data Structures
  332. Selected Examples of Hadoop Data Processing in Nutch
  333. Summary
  334. Log Processing at Rackspace
  335. Requirements/The Problem
  336. Brief History
  337. Choosing Hadoop
  338. Collection and Storage
  339. MapReduce for Logs
  340. Cascading
  341. Fields, Tuples, and Pipes
  342. Operations
  343. Taps, Schemes, and Flows
  344. Cascading in Practice
  345. Flexibility
  346. Hadoop and Cascading at ShareThis
  347. Summary
  348. TeraByte Sort on Apache Hadoop
  349. Using Pig and Wukong to Explore Billion-edge Network Graphs
  350. Measuring Community
  351. Everybody’s Talkin’ at Me: The Twitter Reply Graph
  352. Symmetric Links
  353. Community Extraction
  354. A. Installing Apache Hadoop
  355. B. Cloudera’s Distribution for Hadoop
  356. C. Preparing the NCDC Weather Data
  357. Index
Purchase Options
List price: 98.00 yuan
ISBN: 978-7-5641-2676-6
Publisher: Southeast University Press