A Corpus-Based Approach for Building Semantic Lexicons

Ellen Riloff and Jessica Shepherd
Department of Computer Science
University of Utah
Salt Lake City, UT 84112
riloff@cs.utah.edu

Abstract
Semantic knowledge can be a great asset to
natural language processing systems, but
it is usually hand-coded for each application.
Although some semantic information
is available in general-purpose knowledge
bases such as WordNet and Cyc, many applications 
require domain-specific lexicons
that represent words and categories for a
particular topic. In this paper, we present
a corpus-based method that can be used
to build semantic lexicons for specific categories. 
The input to the system is a small
set of seed words for a category and a representative 
text corpus. The output is a
ranked list of words that are associated
with the category. A user then reviews the
top-ranked words and decides which ones
should be entered in the semantic lexicon.
In experiments with five categories, users
typically found about 60 words per category 
in 10-15 minutes to build a core semantic lexicon.
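
The input/output behavior described above (seed words plus a corpus in, a ranked candidate list out) can be illustrated with a minimal sketch. The abstract does not state how words are scored, so the scoring rule here is an assumption: a word's score is the fraction of its occurrences that fall within a small token window of a seed word. The function name `rank_candidates`, the parameters `window` and `min_count`, and the toy corpus are all hypothetical.

```python
from collections import Counter


def rank_candidates(sentences, seeds, window=1, min_count=2):
    """Rank non-seed words by how often they occur near a seed word.

    Assumed scoring (not taken from the text):
    score(w) = occurrences of w within `window` tokens of a seed
               / total occurrences of w in the corpus.
    """
    seeds = {s.lower() for s in seeds}
    total = Counter()      # corpus frequency of every token
    near_seed = Counter()  # frequency of each token close to a seed
    for sentence in sentences:
        tokens = sentence.lower().split()
        total.update(tokens)
        for i, tok in enumerate(tokens):
            if tok in seeds:
                continue
            lo, hi = max(0, i - window), i + window + 1
            if any(t in seeds for t in tokens[lo:hi]):
                near_seed[tok] += 1
    scored = [
        (word, near_seed[word] / total[word])
        for word in near_seed
        if total[word] >= min_count  # ignore very rare words
    ]
    # Highest score first; ties broken by corpus frequency, then alphabetically.
    return sorted(scored, key=lambda p: (-p[1], -total[p[0]], p[0]))


# Toy corpus and seed words, invented for illustration only.
corpus = [
    "the bomb and dynamite exploded",
    "police found dynamite and grenades",
    "the bomb and grenades were seized",
    "the mayor spoke about the budget",
]
ranked = rank_candidates(corpus, seeds=["bomb", "dynamite"], window=2, min_count=2)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

On this toy corpus the function word "and" ranks alongside the plausible category member "grenades", which suggests why a human review of the top-ranked words is part of the process.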



References
Berwick, Robert C. 1989. Learning Word Meanings from 
Examples. In Semantic Structures: Advances 
in Natural Language Processing. Lawrence
Erlbaum Associates, chapter 3, pages 89--124.
Brill, E. 1994. Some Advances in Rule-based Part of
Speech Tagging. In Proceedings of the Twelfth National 
Conference on Artificial Intelligence, pages
722--727. AAAI Press/The MIT Press.
Carbonell, J. G. 1979. Towards a Self-Extending
Parser. In Proceedings of the 17th Annual Meeting
of the Association for Computational Linguistics,
pages 3--7.
Cardie, C. 1993. A Case-Based Approach to
Knowledge Acquisition for Domain-Specific Sentence
Analysis. In Proceedings of the Eleventh National 
Conference on Artificial Intelligence, pages
798--803. AAAI Press/The MIT Press.
Church, K. 1989. A Stochastic Parts Program and
Noun Phrase Parser for Unrestricted Text. In Proceedings 
of the Second Conference on Applied Natural Language Processing.
Granger, R. H. 1977. FOULUP: A Program that
Figures Out Meanings of Words from Context. In
Proceedings of the Fifth International Joint Conference 
on Artificial Intelligence, pages 172--178.
Hastings, P. and S. Lytinen. 1994. The Ups and
Downs of Lexical Acquisition. In Proceedings of
the Twelfth National Conference on Artificial Intelligence, 
pages 754--759. AAAI Press/The MIT Press.
Jacobs, P. and U. Zernik. 1988. Acquiring Lexical
Knowledge from Text: A Case Study. In Proceedings 
of the Seventh National Conference on
Artificial Intelligence, pages 739--744.
Lehnert, W., C. Cardie, D. Fisher, J. McCarthy,
E. Riloff, and S. Soderland. 1992. University
of Massachusetts: Description of the CIRCUS
System as Used for MUC-4. In Proceedings
of the Fourth Message Understanding Conference
(MUC-4), pages 282--288, San Mateo, CA. Morgan Kaufmann.
Lenat, D. B., M. Prakash, and M. Shepherd. 1986.
CYC: Using Common Sense Knowledge to Overcome 
Brittleness and Knowledge-Acquisition Bottlenecks. AI Magazine, 6:65--85.
Miller, G. 1990. WordNet: An On-line Lexical
Database. International Journal of Lexicography,
3(4).
MUC-4 Proceedings. 1992. Proceedings of
the Fourth Message Understanding Conference
(MUC-4). Morgan Kaufmann, San Mateo, CA.
Weischedel, R., M. Meteer, R. Schwartz, L. Ramshaw, and
J. Palmucci. 1993. Coping with Ambiguity and
Unknown Words through Probabilistic Models.
Computational Linguistics, 19(2):359--382.
Yarowsky, D. 1992. Word sense disambiguation using 
statistical models of Roget's categories trained
on large corpora. In Proceedings of the Fourteenth
International Conference on Computational Linguistics (COLING-92), pages 454--460.

