Programming Hive (English reprint edition)
Edward Capriolo, Dean Wampler, Jason Rutherglen
Publication date: June 2013
Pages: 352
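Do you need to migrate a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, the data warehouse platform for Hadoop. You will quickly learn how to use HiveQL, Hive's SQL dialect, to summarize, query, and analyze large datasets stored in the Hadoop Distributed File System (HDFS).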
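This example-driven guide shows you how to set up and configure Hive in your environment, gives an overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You will also find ......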
  1. Chapter 1: Introduction
  2. An Overview of Hadoop and MapReduce
  3. Hive in the Hadoop Ecosystem
  4. Java Versus Hive: The Word Count Algorithm
  5. What’s Next
  6. Chapter 2: Getting Started
  7. Installing a Preconfigured Virtual Machine
  8. Detailed Installation
  9. What Is Inside Hive?
  10. Starting Hive
  11. Configuring Your Hadoop Environment
  12. The Hive Command
  13. The Command-Line Interface
  14. Chapter 3: Data Types and File Formats
  15. Primitive Data Types
  16. Collection Data Types
  17. Text File Encoding of Data Values
  18. Schema on Read
  19. Chapter 4: HiveQL: Data Definition
  20. Databases in Hive
  21. Alter Database
  22. Creating Tables
  23. Partitioned, Managed Tables
  24. Dropping Tables
  25. Alter Table
  26. Chapter 5: HiveQL: Data Manipulation
  27. Loading Data into Managed Tables
  28. Inserting Data into Tables from Queries
  29. Creating Tables and Loading Them in One Query
  30. Exporting Data
  31. Chapter 6: HiveQL: Queries
  32. SELECT … FROM Clauses
  33. WHERE Clauses
  34. GROUP BY Clauses
  35. JOIN Statements
  36. ORDER BY and SORT BY
  37. DISTRIBUTE BY with SORT BY
  38. CLUSTER BY
  39. Casting
  40. Queries that Sample Data
  41. UNION ALL
  42. Chapter 7: HiveQL: Views
  43. Views to Reduce Query Complexity
  44. Views that Restrict Data Based on Conditions
  45. Views and Map Type for Dynamic Tables
  46. View Odds and Ends
  47. Chapter 8: HiveQL: Indexes
  48. Creating an Index
  49. Rebuilding the Index
  50. Showing an Index
  51. Dropping an Index
  52. Implementing a Custom Index Handler
  53. Chapter 9: Schema Design
  54. Table-by-Day
  55. Over Partitioning
  56. Unique Keys and Normalization
  57. Making Multiple Passes over the Same Data
  58. The Case for Partitioning Every Table
  59. Bucketing Table Data Storage
  60. Adding Columns to a Table
  61. Using Columnar Tables
  62. (Almost) Always Use Compression!
  63. Chapter 10: Tuning
  64. Using EXPLAIN
  65. EXPLAIN EXTENDED
  66. Limit Tuning
  67. Optimized Joins
  68. Local Mode
  69. Parallel Execution
  70. Strict Mode
  71. Tuning the Number of Mappers and Reducers
  72. JVM Reuse
  73. Indexes
  74. Dynamic Partition Tuning
  75. Speculative Execution
  76. Single MapReduce MultiGROUP BY
  77. Virtual Columns
  78. Chapter 11: Other File Formats and Compression
  79. Determining Installed Codecs
  80. Choosing a Compression Codec
  81. Enabling Intermediate Compression
  82. Final Output Compression
  83. Sequence Files
  84. Compression in Action
  85. Archive Partition
  86. Compression: Wrapping Up
  87. Chapter 12: Developing
  88. Changing Log4J Properties
  89. Connecting a Java Debugger to Hive
  90. Building Hive from Source
  91. Setting Up Hive and Eclipse
  92. Hive in a Maven Project
  93. Unit Testing in Hive with hive_test
  94. The New Plugin Developer Kit
  95. Chapter 13: Functions
  96. Discovering and Describing Functions
  97. Calling Functions
  98. Standard Functions
  99. Aggregate Functions
  100. Table Generating Functions
  101. A UDF for Finding a Zodiac Sign from a Day
  102. UDF Versus GenericUDF
  103. Permanent Functions
  104. User-Defined Aggregate Functions
  105. User-Defined Table Generating Functions
  106. Accessing the Distributed Cache from a UDF
  107. Annotations for Use with Functions
  108. Macros
  109. Chapter 14: Streaming
  110. Identity Transformation
  111. Changing Types
  112. Projecting Transformation
  113. Manipulative Transformations
  114. Using the Distributed Cache
  115. Producing Multiple Rows from a Single Row
  116. Calculating Aggregates with Streaming
  117. CLUSTER BY, DISTRIBUTE BY, SORT BY
  118. GenericMR Tools for Streaming to Java
  119. Calculating Cogroups
  120. Chapter 15: Customizing Hive File and Record Formats
  121. File Versus Record Formats
  122. Demystifying CREATE TABLE Statements
  123. File Formats
  124. Record Formats: SerDes
  125. CSV and TSV SerDes
  126. ObjectInspector
  127. Think Big Hive Reflection ObjectInspector
  128. XML UDF
  129. XPath-Related Functions
  130. JSON SerDe
  131. Avro Hive SerDe
  132. Binary Output
  133. Chapter 16: Hive Thrift Service
  134. Starting the Thrift Server
  135. Setting Up Groovy to Connect to HiveService
  136. Connecting to HiveServer
  137. Getting Cluster Status
  138. Result Set Schema
  139. Fetching Results
  140. Retrieving Query Plan
  141. Metastore Methods
  142. Administrating HiveServer
  143. Hive ThriftMetastore
  144. Chapter 17: Storage Handlers and NoSQL
  145. Storage Handler Background
  146. HiveStorageHandler
  147. HBase
  148. Cassandra
  149. DynamoDB
  150. Chapter 18: Security
  151. Integration with Hadoop Security
  152. Authentication with Hive
  153. Authorization in Hive
  154. Chapter 19: Locking
  155. Locking Support in Hive with Zookeeper
  156. Explicit, Exclusive Locks
  157. Chapter 20: Hive Integration with Oozie
  158. Oozie Actions
  159. A Two-Query Workflow
  160. Oozie Web Console
  161. Variables in Workflows
  162. Capturing Output
  163. Capturing Output to Variables
  164. Chapter 21: Hive and Amazon Web Services (AWS)
  165. Why Elastic MapReduce?
  166. Instances
  167. Before You Start
  168. Managing Your EMR Hive Cluster
  169. Thrift Server on EMR Hive
  170. Instance Groups on EMR
  171. Configuring Your EMR Cluster
  172. Persistence and the Metastore on EMR
  173. HDFS and S3 on EMR Cluster
  174. Putting Resources, Configs, and Bootstrap Scripts on S3
  175. Logs on S3
  176. Spot Instances
  177. Security Groups
  178. EMR Versus EC2 and Apache Hive
  179. Wrapping Up
  180. Chapter 22: HCatalog
  181. Introduction
  182. MapReduce
  183. Command Line
  184. Security Model
  185. Architecture
  186. Chapter 23: Case Studies
  187. m6d.com (Media6Degrees)
  188. Outbrain
  189. NASA’s Jet Propulsion Laboratory
  190. Photobucket
  191. SimpleReach
  192. Experiences and Needs from the Customer Trenches
  193. Glossary
  194. Appendix: References
List price: CNY 54.00
ISBN: 978-7-5641-4197-4
Publisher: Southeast University Press (东南大学出版社)