Natural Language Processing with Python (Reprint Edition)
Steven Bird, Ewan Klein, Edward Loper
Publication date: September 2010
Pages: 479
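This book offers a highly accessible introduction to natural language processing, the field that underpins a range of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you will learn how to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and come to understand the main algorithms for analyzing the content and structure of written communication.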
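Packed with examples and exercises, Natural Language Processing with Python will help you:
· Extract information from unstructured text, whether to guess the topic or to identify “named entities”
· Analyze linguistic structure in text, including parsing and semantic analysis
· Access popular linguistic databases, including WordNet and treebanks
· Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence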
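This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK). If you are interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages, or if you are simply curious to see how human language works from a programmer's perspective, you will find Natural Language Processing with Python both fascinating and immensely useful.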

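“Rarely does a book take on such a difficult computing problem with such a clear approach and such clean code... This is an excellent introduction to learning natural language processing.”
- Ken Getz, Senior Consultant, MCW Technologies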
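Steven Bird is Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and Senior Research Associate in the Linguistic Data Consortium at the University of Pennsylvania.
Ewan Klein is Professor of Language Technology in the School of Informatics at the University of Edinburgh.
Edward Loper recently completed a PhD on machine learning for natural language processing at the University of Pennsylvania, and is now a researcher at BBN Technologies in Boston.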
Preface
1. Language Processing and Python
  1.1 Computing with Language: Texts and Words
  1.2 A Closer Look at Python: Texts as Lists of Words
  1.3 Computing with Language: Simple Statistics
  1.4 Back to Python: Making Decisions and Taking Control
  1.5 Automatic Natural Language Understanding
  1.6 Summary
  1.7 Further Reading
  1.8 Exercises
2. Accessing Text Corpora and Lexical Resources
  2.1 Accessing Text Corpora
  2.2 Conditional Frequency Distributions
  2.3 More Python: Reusing Code
  2.4 Lexical Resources
  2.5 WordNet
  2.6 Summary
  2.7 Further Reading
  2.8 Exercises
3. Processing Raw Text
  3.1 Accessing Text from the Web and from Disk
  3.2 Strings: Text Processing at the Lowest Level
  3.3 Text Processing with Unicode
  3.4 Regular Expressions for Detecting Word Patterns
  3.5 Useful Applications of Regular Expressions
  3.6 Normalizing Text
  3.7 Regular Expressions for Tokenizing Text
  3.8 Segmentation
  3.9 Formatting: From Lists to Strings
  3.10 Summary
  3.11 Further Reading
  3.12 Exercises
4. Writing Structured Programs
  4.1 Back to the Basics
  4.2 Sequences
  4.3 Questions of Style
  4.4 Functions: The Foundation of Structured Programming
  4.5 Doing More with Functions
  4.6 Program Development
  4.7 Algorithm Design
  4.8 A Sample of Python Libraries
  4.9 Summary
  4.10 Further Reading
  4.11 Exercises
5. Categorizing and Tagging Words
  5.1 Using a Tagger
  5.2 Tagged Corpora
  5.3 Mapping Words to Properties Using Python Dictionaries
  5.4 Automatic Tagging
  5.5 N-Gram Tagging
  5.6 Transformation-Based Tagging
  5.7 How to Determine the Category of a Word
  5.8 Summary
  5.9 Further Reading
  5.10 Exercises
6. Learning to Classify Text
  6.1 Supervised Classification
  6.2 Further Examples of Supervised Classification
  6.3 Evaluation
  6.4 Decision Trees
  6.5 Naive Bayes Classifiers
  6.6 Maximum Entropy Classifiers
  6.7 Modeling Linguistic Patterns
  6.8 Summary
  6.9 Further Reading
  6.10 Exercises
7. Extracting Information from Text
  7.1 Information Extraction
  7.2 Chunking
  7.3 Developing and Evaluating Chunkers
  7.4 Recursion in Linguistic Structure
  7.5 Named Entity Recognition
  7.6 Relation Extraction
  7.7 Summary
  7.8 Further Reading
  7.9 Exercises
8. Analyzing Sentence Structure
  8.1 Some Grammatical Dilemmas
  8.2 What’s the Use of Syntax?
  8.3 Context-Free Grammar
  8.4 Parsing with Context-Free Grammar
  8.5 Dependencies and Dependency Grammar
  8.6 Grammar Development
  8.7 Summary
  8.8 Further Reading
  8.9 Exercises
9. Building Feature-Based Grammars
  9.1 Grammatical Features
  9.2 Processing Feature Structures
  9.3 Extending a Feature-Based Grammar
  9.4 Summary
  9.5 Further Reading
  9.6 Exercises
10. Analyzing the Meaning of Sentences
  10.1 Natural Language Understanding
  10.2 Propositional Logic
  10.3 First-Order Logic
  10.4 The Semantics of English Sentences
  10.5 Discourse Semantics
  10.6 Summary
  10.7 Further Reading
  10.8 Exercises
11. Managing Linguistic Data
  11.1 Corpus Structure: A Case Study
  11.2 The Life Cycle of a Corpus
  11.3 Acquiring Data
  11.4 Working with XML
  11.5 Working with Toolbox Data
  11.6 Describing Language Resources Using OLAC Metadata
  11.7 Summary
  11.8 Further Reading
  11.9 Exercises
Afterword: The Language Challenge
Bibliography
NLTK Index
General Index
Title: Natural Language Processing with Python (Reprint Edition)
Publisher (China): Southeast University Press
ISBN: 978-7-5641-2261-4
Original publisher: O'Reilly Media
The animal on the cover of Natural Language Processing with Python is a right whale,
the rarest of all large whales. It is identifiable by its enormous head, which can measure
up to one-third of its total body length. It lives in temperate and cool seas in both
hemispheres at the surface of the ocean. It’s believed that the right whale may have
gotten its name from whalers who thought that it was the “right” whale to kill for oil.
Even though it has been protected since the 1930s, the right whale is still the most
endangered of all the great whales.
The large and bulky right whale is easily distinguished from other whales by the calluses
on its head. It has a broad back without a dorsal fin and a long arching mouth that
begins above the eye. Its body is black, except for a white patch on its belly. Wounds
and scars may appear bright orange, often becoming infested with whale lice, or
cyamids. The calluses, which are also found near the blowholes, above the eyes, and
on the chin and upper lip, are black or gray. It has large flippers that are shaped like
paddles, and a distinctive V-shaped blow, caused by the widely spaced blowholes on
the top of its head, that rises to 16 feet above the ocean’s surface.
Right whales feed on planktonic organisms, including shrimp-like krill and copepods.
As baleen whales, they have a series of 225–250 fringed overlapping plates hanging
from each side of the upper jaw, where teeth would otherwise be located. The plates
are black and can be as long as 7.2 feet. Right whales are “grazers of the sea,” often
swimming slowly with their mouths open. As water flows into the mouth and through
the baleen, prey is trapped near the tongue.
Because females are not sexually mature until 10 years of age and they give birth to a
single calf after a year-long pregnancy, populations grow slowly. The young right whale
stays with its mother for one year.
Right whales are found worldwide but in very small numbers. They are commonly
found alone or in small groups of up to 3, but when courting, they may form
groups of up to 30. Like most baleen whales, they are seasonally migratory. They inhabit
colder waters for feeding and then migrate to warmer waters for breeding and calving.
Although they may move far out to sea during feeding seasons, right whales give birth
in coastal areas. Interestingly, many of the females do not return to these coastal breeding
areas every year, but visit the area only in calving years. Where they go in other
years remains a mystery.
The right whale’s only predators are orcas and humans. When danger lurks, a group
of right whales may come together in a circle, with their tails pointing outward, to deter
a predator. This defense is not always successful and calves are occasionally separated
from their mother and killed.
Right whales are among the slowest swimming whales, although they may reach speeds
up to 10 mph in short spurts. They can dive to at least 1,000 feet and can stay submerged
for up to 40 minutes. The right whale is extremely endangered, even after years of
protected status. Only in the past 15 years has there been evidence of a population recovery
in the Southern Hemisphere, and it is still not known if the right whale will survive at
all in the Northern Hemisphere. Although not presently hunted, current conservation
problems include collisions with ships, conflicts with fishing activities, habitat destruction,
oil drilling, and possible competition from other whale species. Right whales
have no teeth, so ear bones and, in some cases, eye lenses can be used to estimate the
age of a right whale at death. It is believed that right whales live at least 50 years, but
there is little data on their longevity.