Corpus-Based Identification of Non-Anaphoric Noun Phrases

David L. Bean and Ellen Riloff
Department of Computer Science
University of Utah
Salt Lake City, Utah 84112
{bean,riloff}@cs.utah.edu

Abstract
Coreference resolution involves finding antecedents
for anaphoric discourse entities, such as definite
noun phrases. But many definite noun phrases are
not anaphoric because their meaning can be understood
from general world knowledge (e.g., "the
White House" or "the news media"). We have
developed a corpus-based algorithm for automatically
identifying definite noun phrases that are
non-anaphoric, which has the potential to improve
the efficiency and accuracy of coreference resolution
systems. Our algorithm generates lists of
non-anaphoric noun phrases and noun phrase patterns
from a training corpus and uses them to recognize
non-anaphoric noun phrases in new texts. Using
1600 MUC-4 terrorism news articles as the training
corpus, our approach achieved 78% recall and 87%
precision at identifying such noun phrases in 50 test
documents.
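The recognition step described above (look up each definite noun phrase in learned lists and patterns) and the recall/precision evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used in this work: the class `NonAnaphoricRecognizer`, the function `evaluate`, the lowercase/whitespace normalization, and the idea of expressing noun phrase patterns as regular expressions are all hypothetical choices, and the lists are supplied by hand rather than learned from a corpus.

```python
import re
from typing import Iterable, List, Tuple


def normalize(np_text: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace, so that
    # all-uppercase text (as in MUC-4) matches lowercase list entries.
    return " ".join(np_text.lower().split())


class NonAnaphoricRecognizer:
    """Hypothetical recognizer: a definite NP is labeled non-anaphoric if it
    exactly matches a listed NP or fully matches any listed NP pattern."""

    def __init__(self, np_list: Iterable[str], patterns: Iterable[str]):
        # Exact-match list of (normalized) non-anaphoric noun phrases.
        self.np_list = {normalize(np) for np in np_list}
        # Assumption: patterns are regular expressions over the normalized NP.
        self.patterns = [re.compile(p) for p in patterns]

    def is_non_anaphoric(self, np_text: str) -> bool:
        key = normalize(np_text)
        if key in self.np_list:
            return True
        return any(p.fullmatch(key) for p in self.patterns)


def evaluate(recognizer: NonAnaphoricRecognizer,
             labeled_nps: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (recall, precision) for the non-anaphoric class, given
    (noun phrase, gold label) pairs where True means non-anaphoric."""
    tp = fp = fn = 0
    for np_text, gold in labeled_nps:
        predicted = recognizer.is_non_anaphoric(np_text)
        if predicted and gold:
            tp += 1
        elif predicted and not gold:
            fp += 1
        elif gold and not predicted:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    recognizer = NonAnaphoricRecognizer(
        np_list=["the White House", "the news media"],
        patterns=[r"the (\w+ )*government of \w+"],  # hypothetical pattern
    )
    test_nps = [
        ("THE WHITE HOUSE", True),
        ("the man", False),
        ("the news media", True),
        ("the government of Peru", True),
        ("the president", True),  # not covered by the toy lists
    ]
    print(evaluate(recognizer, test_nps))  # prints (0.75, 1.0)
```

In this toy run, three of the four gold non-anaphoric phrases are found and no anaphoric phrase is mislabeled, giving 75% recall and 100% precision. Recall measures how many of the truly non-anaphoric noun phrases the lists cover, and precision measures how many of the flagged phrases really are non-anaphoric.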

7 Case-sensitive text can have a significant positive effect
on performance because it helps to identify proper
nouns. Proper nouns can then be used to look for restrictive
premodification, something that our system cannot
take advantage of because the MUC-4 corpus is entirely
in uppercase.