Semantic Authoring, Annotation and Knowledge Markup Workshop
co-located with the 5th International Conference on Knowledge Capture (K-CAP 2009), Redondo Beach, California, USA, September 1, 2009
Invited Talk
Putting Interpretive Semantics into Web Content
Eduard Hovy
Information Sciences Institute
University of Southern California
http://www.isi.edu/~hovy
How can information (typically, text) on the web be 'enriched' with semantics to make it machine-interpretable?
People are good at inferring the semantics of what they read on the web, whether the material is structured or unstructured. But the less structured the material, the harder it is for machines to do the same. Despite excellent work in the Semantic Web community to exploit linkage structure and other structural elements of web information, raw text typically remains beyond our reach. Only one approach seems open to us: developing systems that add semantics automatically. Since building such systems by hand is time-consuming, one can instead have human annotators add semantics to selected material, on which machine learning algorithms can then be trained to mimic the human work and add semantics to the remaining material of the same general domain and genre.
Annotation, it turns out, is not as simple as one might think. Doing it properly—to ensure that the results are repeatable, trustworthy, consistent, etc.—requires attention to issues as varied as the nature and complexity of the semantics (the annotation labels), the choice of material to annotate, the training of annotators, the interface design, and so on. This part of the talk outlines the seven questions of an emerging 'science of annotation'.
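One common way to check whether annotation is repeatable and consistent is to have two annotators label the same items and measure their agreement corrected for chance. The abstract does not say which measure the talk uses; the sketch below assumes Cohen's kappa for two annotators, and the label names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

For example, with hypothetical labels `["GENE", "GENE", "OTHER", "OTHER"]` and `["GENE", "OTHER", "OTHER", "OTHER"]`, the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is 0.5. A low kappa usually points back to the issues listed above: unclear label definitions, insufficient annotator training, or material that is too hard to label reliably.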
Once annotation is done, which machine learning algorithm(s) are appropriate? Are some better than others? How can they be folded into the annotation procedure so as to avoid unnecessary annotation? This part of the talk surveys some algorithms and discusses various forms of Active Learning.
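Active Learning folds the learner into the annotation loop: instead of labeling material at random, annotators label the items the current model is least sure about. The abstract does not name the specific forms the talk covers; the sketch below assumes one common form, least-confident uncertainty sampling for a binary classifier. The `train` and `oracle` callables are hypothetical stand-ins for model training and for the human annotator.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Proba = Callable[[T], float]  # returns P(positive label) in [0, 1]


def least_confident(pool: Sequence[T], predict_proba: Proba, k: int) -> List[T]:
    """Return the k pool items on which the binary model is least confident."""
    def confidence(item: T) -> float:
        p = predict_proba(item)
        return max(p, 1.0 - p)  # lowest (0.5) when the model is maximally unsure
    return sorted(pool, key=confidence)[:k]


def active_learning(
    pool: Sequence[T],
    train: Callable[[List[Tuple[T, bool]]], Proba],  # hypothetical model trainer
    oracle: Callable[[T], bool],                     # hypothetical human annotator
    rounds: int,
    k: int,
) -> List[Tuple[T, bool]]:
    """Repeatedly retrain, then send the k most uncertain items for annotation."""
    unlabeled = list(pool)
    labeled: List[Tuple[T, bool]] = []
    for _ in range(rounds):
        if not unlabeled:
            break
        predict_proba = train(labeled)  # model built from annotations so far
        for item in least_confident(unlabeled, predict_proba, k):
            labeled.append((item, oracle(item)))  # annotator supplies the label
            unlabeled.remove(item)
    return labeled
```

With a toy model whose probability is just the item's value, `least_confident([0.1, 0.45, 0.9, 0.55, 0.3], lambda x: x, 2)` selects 0.45 and 0.55, the two items nearest the decision boundary. The design choice is that annotation effort goes where the model gains most, which is how Active Learning aims to avoid unnecessary annotation.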
Putting it all together, I show an example of the whole process, applied in a case study in Medical Informatics that is being performed at ISI on about 100,000 neuroscience research articles provided by Elsevier. Though this is not web text, the study does illustrate the issues, potential uses, and shortcomings of human addition of semantics into unstructured (web and other) text.
Eduard Hovy directs the Natural Language Research Group at USC's Information Sciences Institute and serves as Deputy Director of the Intelligent Systems Division and as research associate professor of the Computer Science Department. He also directs the DHS Center for Knowledge Integration and Discovery at the University of Southern California and is Director of Research of its Digital Government Research Center. He completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987. His research focuses on information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, machine translation, question answering, and digital government. Dr. Hovy regularly serves in an advisory capacity to funders of NLP research in the US and EU. He is the author or co-editor of five books and over 180 technical articles. In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL) and in 2001–03 as President of the International Association of Machine Translation (IAMT); he currently serves as President of the Digital Government Society of North America (DGSNA). Dr. Hovy regularly co-teaches a specialized course in the Computer Science Department of the University of Southern California, as well as occasional short courses on MT and other topics at universities and conferences. He actively advises Ph.D. students, student visitors, and faculty on sabbatical, and has served on the Ph.D. and M.S. committees for students from USC, Carnegie Mellon University, Taiwan National U, the Universities of Toronto, Karlsruhe, Pennsylvania, Stockholm, Waterloo, Nijmegen, Pretoria, and Ho Chi Minh City.
http://www.isi.edu/natural-language/nlp-at-isi.html