Programming Hive (Reprint Edition)

By Edward Capriolo, Dean Wampler, and Jason Rutherglen

Publication date: June 2013

Pages: 352

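Need to move a relational database application to Hadoop? This comprehensive guide introduces Apache Hive, the data warehouse platform for Hadoop. You will quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in the Hadoop Distributed File System.
This example-driven guide shows you how to set up and configure Hive in your environment, gives an overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. It also includes real-world case studies showing how companies that use Hive have solved the distinctive problems of working with petabytes of data.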

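· Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
· Customize data formats and storage options for files and external databases
· Load data into and extract data from tables, and use queries, grouping, filtering, joins, and other common query methods
· Learn best practices for creating user-defined functions
· Learn the Hive patterns you should use and the anti-patterns you should avoid
· Integrate Hive with other data processing programs
· Use storage handlers with NoSQL databases and other data stores
· Learn the advantages and disadvantages of running Hive on Amazon's Elastic MapReduce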

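Edward Capriolo is a system administrator at Media6degrees, a member of the Apache Software Foundation, and a committer on the Hadoop-Hive project.
Dean Wampler is a principal consultant at Think Big Analytics, where he specializes in big data problems and tools such as Hadoop and machine learning.
Jason Rutherglen is a software architect at Think Big Analytics, where he specializes in big data, Hadoop, search, and security.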

Table of Contents

Chapter 1: Introduction
An Overview of Hadoop and MapReduce
Hive in the Hadoop Ecosystem
Java Versus Hive: The Word Count Algorithm
What’s Next
Chapter 2: Getting Started
Installing a Preconfigured Virtual Machine
Detailed Installation
What Is Inside Hive?
Starting Hive
Configuring Your Hadoop Environment
The Hive Command
The Command-Line Interface
Chapter 3: Data Types and File Formats
Primitive Data Types
Collection Data Types
Text File Encoding of Data Values
Schema on Read
Chapter 4: HiveQL: Data Definition
Databases in Hive
Alter Database
Creating Tables
Partitioned, Managed Tables
Dropping Tables
Alter Table
Chapter 5: HiveQL: Data Manipulation
Loading Data into Managed Tables
Inserting Data into Tables from Queries
Creating Tables and Loading Them in One Query
Exporting Data
Chapter 6: HiveQL: Queries
SELECT … FROM Clauses
WHERE Clauses
GROUP BY Clauses
JOIN Statements
ORDER BY and SORT BY
DISTRIBUTE BY with SORT BY
CLUSTER BY
Casting
Queries that Sample Data
UNION ALL
Chapter 7: HiveQL: Views
Views to Reduce Query Complexity
Views that Restrict Data Based on Conditions
Views and Map Type for Dynamic Tables
View Odds and Ends
Chapter 8: HiveQL: Indexes
Creating an Index
Rebuilding the Index
Showing an Index
Dropping an Index
Implementing a Custom Index Handler
Chapter 9: Schema Design
Table-by-Day
Over Partitioning
Unique Keys and Normalization
Making Multiple Passes over the Same Data
The Case for Partitioning Every Table
Bucketing Table Data Storage
Adding Columns to a Table
Using Columnar Tables
(Almost) Always Use Compression!
Chapter 10: Tuning
Using EXPLAIN
EXPLAIN EXTENDED
Limit Tuning
Optimized Joins
Local Mode
Parallel Execution
Strict Mode
Tuning the Number of Mappers and Reducers
JVM Reuse
Indexes
Dynamic Partition Tuning
Speculative Execution
Single MapReduce MultiGROUP BY
Virtual Columns
Chapter 11: Other File Formats and Compression
Determining Installed Codecs
Choosing a Compression Codec
Enabling Intermediate Compression
Final Output Compression
Sequence Files
Compression in Action
Archive Partition
Compression: Wrapping Up
Chapter 12: Developing
Changing Log4J Properties
Connecting a Java Debugger to Hive
Building Hive from Source
Setting Up Hive and Eclipse
Hive in a Maven Project
Unit Testing in Hive with hive_test
The New Plugin Developer Kit
Chapter 13: Functions
Discovering and Describing Functions
Calling Functions
Standard Functions
Aggregate Functions
Table Generating Functions
A UDF for Finding a Zodiac Sign from a Day
UDF Versus GenericUDF
Permanent Functions
User-Defined Aggregate Functions
User-Defined Table Generating Functions
Accessing the Distributed Cache from a UDF
Annotations for Use with Functions
Macros
Chapter 14: Streaming
Identity Transformation
Changing Types
Projecting Transformation
Manipulative Transformations
Using the Distributed Cache
Producing Multiple Rows from a Single Row
Calculating Aggregates with Streaming
CLUSTER BY, DISTRIBUTE BY, SORT BY
GenericMR Tools for Streaming to Java
Calculating Cogroups
Chapter 15: Customizing Hive File and Record Formats
File Versus Record Formats
Demystifying CREATE TABLE Statements
File Formats
Record Formats: SerDes
CSV and TSV SerDes
ObjectInspector
Think Big Hive Reflection ObjectInspector
XML UDF
XPath-Related Functions
JSON SerDe
Avro Hive SerDe
Binary Output
Chapter 16: Hive Thrift Service
Starting the Thrift Server
Setting Up Groovy to Connect to HiveService
Connecting to HiveServer
Getting Cluster Status
Result Set Schema
Fetching Results
Retrieving Query Plan
Metastore Methods
Administrating HiveServer
Hive ThriftMetastore
Chapter 17: Storage Handlers and NoSQL
Storage Handler Background
HiveStorageHandler
HBase
Cassandra
DynamoDB
Chapter 18: Security
Integration with Hadoop Security
Authentication with Hive
Authorization in Hive
Chapter 19: Locking
Locking Support in Hive with Zookeeper
Explicit, Exclusive Locks
Chapter 20: Hive Integration with Oozie
Oozie Actions
A Two-Query Workflow
Oozie Web Console
Variables in Workflows
Capturing Output
Capturing Output to Variables
Chapter 21: Hive and Amazon Web Services (AWS)
Why Elastic MapReduce?
Instances
Before You Start
Managing Your EMR Hive Cluster
Thrift Server on EMR Hive
Instance Groups on EMR
Configuring Your EMR Cluster
Persistence and the Metastore on EMR
HDFS and S3 on EMR Cluster
Putting Resources, Configs, and Bootstrap Scripts on S3
Logs on S3
Spot Instances
Security Groups
EMR Versus EC2 and Apache Hive
Wrapping Up
Chapter 22: HCatalog
Introduction
MapReduce
Command Line
Security Model
Architecture
Chapter 23: Case Studies
m6d.com (Media6Degrees)
Outbrain
NASA’s Jet Propulsion Laboratory
Photobucket
SimpleReach
Experiences and Needs from the Customer Trenches
Glossary
Appendix: References

Title: Programming Hive (Reprint Edition)

Authors: Edward Capriolo, Dean Wampler, and Jason Rutherglen

Publisher in China: Southeast University Press

ISBN: 978-7-5641-4197-4

Original title: Programming Hive

Original publisher: O'Reilly Media

Edward Capriolo

System administrator at Media6degrees. He is a member of the Apache Software Foundation and a member of the Hadoop-Hive project.

Dean Wampler

Principal consultant at Think Big Analytics, specializing in big data problems, Hadoop, and machine learning.

Jason Rutherglen

Software architect at Think Big Analytics, specializing in big data, Hadoop, search, and security.

The animal on the cover of Programming Hive is a European hornet (Vespa crabro) and its hive. The European hornet is the only hornet in North America, introduced to the continent when European settlers migrated to the Americas. This hornet can be found throughout Europe and much of Asia, adapting its hive-building techniques to different climates when necessary.

The hornet is a social insect, related to bees and ants. The hornet’s hive consists of one queen, a few male hornets (drones), and a large quantity of sterile female workers. The chief purpose of drones is to reproduce with the hornet queen, and they die soon after. It is the female workers who are responsible for building the hive, carrying food, and tending to the hornet queen’s eggs.

The hornet’s nest itself is the consistency of paper, since it is constructed out of wood pulp in several layers of hexagonal cells. The end result is a pear-shaped nest attached to its shelter by a short stem. In colder areas, hornets will abandon the nest in the winter and take refuge in hollow logs or trees, or even human houses, where the queen and her eggs will stay until the warmer weather returns. The eggs form the start of a new colony, and the hive can be constructed once again.

Purchase Options

List price: 54.00 yuan