Evaluating Multilingual Information Retrieval and Clustering at ULIS

Atsushi Fujii†,†† Tetsuya Ishikawa†
† University of Library and Information Science
1-2 Kasuga, Tsukuba, 305-8550, Japan
†† CREST, Japan Science and Technology Corporation
fujii@ulis.ac.jp

Abstract
This paper describes our retrieval system for the
NTCIR-2 Japanese/English CLIR and MLIR tasks. We
integrate query and document translation with monolingual
retrieval to improve retrieval accuracy, and
perform clustering to improve browsing efficiency. We
also introduce an entropy-driven technique for evaluating
clustering methods.



