Natural Language Annotation for Machine Learning (Reprint Edition)

Published: June 2013

Pages: 344

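“Language annotation is a key step in natural language processing, yet it is rarely covered in computational linguistics courses. This is the first book to give a hands-on walkthrough of annotation, covering everything from specification and design to the use of machine learning algorithms. It is bound to become a standard text for undergraduate and graduate courses in computational linguistics.”
Nancy Ide
Professor of Computer Science, Vassar College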

This book is the ideal companion to O’Reilly’s Natural Language Processing with Python.

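It is time to create your own natural language training corpus for machine learning. Whether you work with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle: the process of adding metadata to your training corpus so that machine learning algorithms work more effectively. You need no programming or linguistics experience to get started.
Through detailed examples at every step, you will learn how the annotation development process helps you model, annotate, train, test, evaluate, and revise your training corpus. You will also see a complete walkthrough of a real-world annotation project.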

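· Define a clear annotation goal before collecting your dataset (corpus)
· Learn tools for analyzing the linguistic content of your corpus
· Build a model and specification for your annotation project
· Examine the different annotation formats, from basic XML to the Linguistic Annotation Framework
· Create a gold standard corpus suitable for training and testing machine learning algorithms
· Select the machine learning algorithms that will process your annotated data
· Evaluate the test results and revise your annotation task
· Learn how to use lightweight software for annotating texts and adjudicating the annotations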

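James Pustejovsky is a professor at Brandeis University, where he teaches and conducts research in artificial intelligence and computational linguistics in the Computer Science Department.
Amber Stubbs recently received her Ph.D. in annotation methodology from Brandeis University. She is now a postdoctoral researcher at SUNY Albany.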


Chapter 1: The Basics
The Importance of Language Annotation
A Brief History of Corpus Linguistics
Language Data and Machine Learning
The Annotation Development Cycle
Summary
Chapter 2: Defining Your Goal and Dataset
Defining Your Goal
Background Research
Assembling Your Dataset
The Size of Your Corpus
Summary
Chapter 3: Corpus Analytics
Basic Probability for Corpus Analytics
Counting Occurrences
Language Models
Summary
Chapter 4: Building Your Model and Specification
Some Example Models and Specs
Adopting (or Not Adopting) Existing Models
Different Kinds of Standards
Summary
Chapter 5: Applying and Adopting Annotation Standards
Metadata Annotation: Document Classification
Text Extent Annotation: Named Entities
Linked Extent Annotation: Semantic Roles
ISO Standards and You
Summary
Chapter 6: Annotation and Adjudication
The Infrastructure of an Annotation Project
Specification Versus Guidelines
Be Prepared to Revise
Preparing Your Data for Annotation
Writing the Annotation Guidelines
Annotators
Choosing an Annotation Environment
Evaluating the Annotations
Creating the Gold Standard (Adjudication)
Summary
Chapter 7: Training: Machine Learning
What Is Learning?
Defining Our Learning Task
Classifier Algorithms
Sequence Induction Algorithms
Clustering and Unsupervised Learning
Semi-Supervised Learning
Matching Annotation to Algorithms
Summary
Chapter 8: Testing and Evaluation
Testing Your Algorithm
Evaluating Your Algorithm
Problems That Can Affect Evaluation
Final Testing Scores
Summary
Chapter 9: Revising and Reporting
Revising Your Project
Reporting About Your Work
Summary
Chapter 10: Annotation: TimeML
The Goal of TimeML
Related Research
Building the Corpus
Model: Preliminary Specifications
Annotation: First Attempts
Model: The TimeML Specification Used in TimeBank
Annotation: The Creation of TimeBank
TimeML Becomes ISO-TimeML
Modeling the Future: Directions for TimeML
Summary
Chapter 11: Automatic Annotation: Generating TimeML
The TARSQI Components
Improvements to the TTK
TimeML Challenges: TempEval-2
Future of the TTK
Summary
Chapter 12: Afterword: The Future of Annotation
Crowdsourcing Annotation
Handling Big Data
NLP Online and in the Cloud
And Finally...
Appendix: List of Available Corpora and Specifications
Corpora
Specifications, Guidelines, and Other Resources
Representation Standards
Appendix: List of Software Resources
Annotation and Adjudication Software
Machine Learning Resources
Appendix: MAE User Guide
Installing and Running MAE
Loading Tasks and Files
Saving Files
Defining Your Own Task
Frequently Asked Questions
Appendix: MAI User Guide
Installing and Running MAI
Loading Tasks and Files
Adjudicating
Saving Files
Appendix: Bibliography
References for Using Amazon’s Mechanical Turk/Crowdsourcing

Title: Natural Language Annotation for Machine Learning (Reprint Edition)

Authors: James Pustejovsky, Amber Stubbs

Publisher in China: Southeast University Press

ISBN: 978-7-5641-4281-0

Original title: Natural Language Annotation for Machine Learning

Original publisher: O'Reilly Media

James Pustejovsky

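James Pustejovsky is a professor in the Computer Science Department at Brandeis University, where he teaches and conducts research in artificial intelligence and computational linguistics.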

James Pustejovsky teaches and does research in Artificial Intelligence and Computational Linguistics in the Computer Science Department at Brandeis University. His main areas of interest include lexical meaning, computational semantics, temporal and spatial reasoning, and corpus linguistics. He is active in the development of standards for interoperability between language processing applications, and led the creation of the recently adopted ISO standard for time annotation, ISO-TimeML. He is currently heading the development of a standard for annotating spatial information in language. More information on publications and research activities can be found at his webpage: pusto.com.

Amber Stubbs

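Amber Stubbs received her Ph.D. from the Computer Science Department at Brandeis University in 2013, with a dissertation on natural language annotation methodology. She was subsequently a postdoctoral researcher at the State University of New York at Albany and is currently an assistant professor in the School of Library and Information Science and the computer science program at Simmons College in Boston.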

Amber Stubbs recently completed her Ph.D. in Computer Science at Brandeis University, and is currently a Postdoctoral Associate at SUNY Albany. Her dissertation focused on creating an annotation methodology to aid in extracting high-level information from natural language files, particularly biomedical texts. Her website can be found at http://pages.cs.brandeis.edu/~astubbs/

The animal on the cover of Natural Language Annotation for Machine Learning is the cockatiel (Nymphicus hollandicus). Their scientific name came about from European travelers who found the birds so beautiful, they named them for mythical nymphs. Hollandicus refers to “New Holland,” an older name for Australia, the continent to which these birds are native. In the wild, cockatiels can be found in arid habitats like brushland or the outback, yet they remain close to water. They are usually seen in pairs, though flocks will congregate around a single body of water.

Until six to nine months after hatching, female and male cockatiels are indistinguishable, as both have horizontal yellow stripes on the surface of their tail feathers and a dull orange patch on each cheek. When molting begins, males lose some white or yellow feathers and gain brighter yellow feathers. In addition, the orange patches on the face become much more prominent. The lifespan of a cockatiel in captivity is typically 15–20 years, but they generally live between 10 and 30 years in the wild.

The cockatiel was considered either a parrot or a cockatoo for some time, as scientists and biologists hotly debated which bird it actually was. It is now classified as part of the cockatoo family because they both have the same biological features—namely, upright crests, gallbladders, and powder down (a special type of feather where the tips of barbules disintegrate, forming a fine dust among the feathers).

Purchase Options

Price: 54.00 yuan
