PRIME: A System for Multilingual Patent Retrieval

Shigeto Higuchi†, Masatoshi Fukui†, Atsushi Fujii††,†††, and Tetsuya Ishikawa††
† PATOLIS Corporation
2-4-29 Shiohama, Koto-ku, 135-0043, Japan
†† University of Library and Information Science
1-2 Kasuga, Tsukuba, 305-8550, Japan
††† CREST, Japan Science and Technology Corporation
fujii@ulis.ac.jp

Abstract
Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages.
We propose a multilingual patent retrieval system, which translates a user query into the target language, searches a
multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation
and clustering. Our system also extracts new translations from patent families consisting of comparable patents, to enhance
the translation dictionary.



