Hadoop: The Definitive Guide (Second Edition, Reprint)

Publication date: June 2011

Pages: 600

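This book shows how Apache Hadoop can unleash the power of your data. This comprehensive resource demonstrates how to build and maintain reliable, scalable, distributed systems with the Hadoop framework. Hadoop is an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find out how to analyze large datasets, and administrators will learn how to set up and run Hadoop clusters.
This revised edition covers recent changes to Hadoop, including newer additions such as Hive, Sqoop, and Avro. It also provides case studies that show how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.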

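·Use the Hadoop Distributed File System (HDFS) to store large datasets, and run distributed computations over those datasets with MapReduce
·Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
·Discover common pitfalls and advanced features for writing real-world MapReduce programs
·Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
·Use Pig, a high-level query language, to process large-scale data
·Use Hive, Hadoop's data warehouse system, to analyze datasets
·Take advantage of HBase, Hadoop's database, to work with structured and semi-structured data
·Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems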

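"Congratulations on having this opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk."
-- Doug Cutting
Cloudera

The author, Tom White, has been an Apache Hadoop committer since 2007. He is a member of the Apache Software Foundation and an engineer at Cloudera. Tom has written for oreilly.com, java.net, and IBM's developerWorks, and speaks at industry conferences.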

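Cloudera is a leading provider of Hadoop-based software and services. Cloudera's Distribution for Hadoop (CDH) is a comprehensive data management platform based on Apache Hadoop. Cloudera provides the tools, platform, and support needed to use Hadoop.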

Suitable for readers with programming experience

Table of Contents

Foreword
Preface
1. Meet Hadoop
Data!
Data Storage and Analysis
Comparison with Other Systems
RDBMS
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Apache Hadoop and the Hadoop Ecosystem
2. MapReduce
A Weather Dataset
Data Format
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Map and Reduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Hadoop Streaming
Ruby
Python
Hadoop Pipes
Compiling and Running
3. The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
The Command-Line Interface
Basic Filesystem Operations
Hadoop Filesystems
Interfaces
The Java Interface
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Coherency Model
Parallel Copying with distcp
Keeping an HDFS Cluster Balanced
Hadoop Archives
Using Hadoop Archives
Limitations
4. Hadoop I/O
Data Integrity
Data Integrity in HDFS
LocalFileSystem
ChecksumFileSystem
Compression
Codecs
Compression and Input Splits
Using Compression in MapReduce
Serialization
The Writable Interface
Writable Classes
Implementing a Custom Writable
Serialization Frameworks
Avro
File-Based Data Structures
SequenceFile
MapFile
5. Developing a MapReduce Application
The Configuration API
Combining Resources
Variable Expansion
Configuring the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test
Mapper
Reducer
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Using a Remote Debugger
Tuning a Job
Profiling Tasks
MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
Running Dependent Jobs
6. How MapReduce Works
Anatomy of a MapReduce Job Run
Job Submission
Job Initialization
Task Assignment
Task Execution
Progress and Status Updates
Job Completion
Failures
Task Failure
Tasktracker Failure
Jobtracker Failure
Job Scheduling
The Fair Scheduler
The Capacity Scheduler
Shuffle and Sort
The Map Side
The Reduce Side
Configuration Tuning
Task Execution
Speculative Execution
Task JVM Reuse
Skipping Bad Records
The Task Execution Environment
7. MapReduce Types and Formats
MapReduce Types
The Default MapReduce Job
Input Formats
Input Splits and Records
Text Input
Binary Input
Multiple Inputs
Database Input (and Output)
Output Formats
Text Output
Binary Output
Multiple Outputs
Lazy Output
Database Output
8. MapReduce Features
Counters
Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters
Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins
Map-Side Joins
Reduce-Side Joins
Side Data Distribution
Using the Job Configuration
Distributed Cache
MapReduce Library Classes
9. Setting Up a Hadoop Cluster
Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java
Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties
User Account Creation
Security
Kerberos and Hadoop
Delegation Tokens
Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks
User Jobs
Hadoop in the Cloud
Hadoop on Amazon EC2
10. Administering Hadoop
HDFS
Persistent Data Structures
Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics
Java Management Extensions
Maintenance
Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades
11. Pig
Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases
Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parallelism
Parameter Substitution
12. Hive
Installing Hive
The Hive Shell
An Example
Running Hive
Configuring Hive
Hive Services
The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF
13. HBase
HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Implementation
Installation
Test Drive
Clients
Java
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
HBase
Use Case: HBase at Streamy.com
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Counters
Bulk Load
14. ZooKeeper
Installing and Running ZooKeeper
An Example
Group Membership in ZooKeeper
Creating the Group
Joining a Group
Listing Members in a Group
Deleting a Group
The ZooKeeper Service
Data Model
Operations
Implementation
Consistency
Sessions
States
Building Applications with ZooKeeper
A Configuration Service
The Resilient ZooKeeper Application
A Lock Service
More Distributed Data Structures and Protocols
ZooKeeper in Production
Resilience and Performance
Configuration
15. Sqoop
Getting Sqoop
A Sample Import
Generated Code
Additional Serialization Systems
Database Imports: A Deeper Look
Controlling the Import
Imports and Consistency
Direct-mode Imports
Working with Imported Data
Imported Data and Hive
Importing Large Objects
Performing an Export
Exports: A Deeper Look
Exports and Transactionality
Exports and SequenceFiles
16. Case Studies
Hadoop Usage at Last.fm
Last.fm: The Social Music Revolution
Hadoop at Last.fm
Generating Charts with Hadoop
The Track Statistics Program
Summary
Hadoop and Hive at Facebook
Introduction
Hadoop at Facebook
Hypothetical Use Case Studies
Hive
Problems and Future Work
Nutch Search Engine
Background
Data Structures
Selected Examples of Hadoop Data Processing in Nutch
Summary
Log Processing at Rackspace
Requirements/The Problem
Brief History
Choosing Hadoop
Collection and Storage
MapReduce for Logs
Cascading
Fields, Tuples, and Pipes
Operations
Taps, Schemes, and Flows
Cascading in Practice
Flexibility
Hadoop and Cascading at ShareThis
Summary
TeraByte Sort on Apache Hadoop
Using Pig and Wukong to Explore Billion-edge Network Graphs
Measuring Community
Everybody’s Talkin’ at Me: The Twitter Reply Graph
Symmetric Links
Community Extraction
A. Installing Apache Hadoop
B. Cloudera’s Distribution for Hadoop
C. Preparing the NCDC Weather Data
Index

Title: Hadoop: The Definitive Guide (Second Edition, Reprint)

Author: Tom White

Publisher in China: Southeast University Press

ISBN: 978-7-5641-2676-6

Original title: Hadoop: The Definitive Guide, Second Edition

Original publisher: O'Reilly Media

Tom White

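Tom White has been an Apache Hadoop committer since February 2007 and is a member of the Apache Software Foundation. He works at Cloudera, a company that provides Hadoop products, services, support, and training. Previously, Tom was an independent Hadoop consultant who helped many companies set up, use, and extend Hadoop. He has written numerous articles for O'Reilly.com, Java.net, and IBM's developerWorks, and speaks regularly about Hadoop at industry conferences. Tom holds a bachelor's degree in mathematics from the University of Cambridge and a master's degree in philosophy of science from the University of Leeds, UK. He currently lives in San Francisco with his family.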

The animal on the cover of Hadoop: The Definitive Guide is an African elephant. These
members of the genus Loxodonta are the largest land animals on earth (slightly larger
than their cousin, the Asian elephant) and can be identified by their ears, which have
been said to look somewhat like the continent of Africa. Males stand 12 feet tall at the
shoulder and weigh 12,000 pounds, but they can get as big as 15,000 pounds, whereas
females stand 10 feet tall and weigh 8,000–11,000 pounds. Even young elephants are
very large: at birth, they already weigh approximately 200 pounds and stand about 3
feet tall.
African elephants live throughout sub-Saharan Africa. Most of the continent’s elephants
live on savannas and in dry woodlands. In some regions, they can be found in
desert areas; in others, they are found in mountains.
The species plays an important role in the forest and savanna ecosystems in which they
live. Many plant species are dependent on passing through an elephant’s digestive tract
before they can germinate; it is estimated that at least a third of tree species in west
African forests rely on elephants in this way. Elephants grazing on vegetation also affect
the structure of habitats and influence bush fire patterns. For example, under natural
conditions, elephants make gaps through the rainforest, enabling the sunlight to enter,
which allows the growth of various plant species. This, in turn, supports a greater abundance
and diversity of smaller animals. As a result of the influence elephants have
over many plants and animals, they are often referred to as a keystone species because
they are vital to the long-term survival of the ecosystems in which they live.

Purchase options

Price: 98.00 yuan
