LANGUAGE MODELING FOR MULTI-DOMAIN SPEECH-DRIVEN TEXT RETRIEVAL

Katunobu Itou¹, Atsushi Fujii², Tetsuya Ishikawa²
¹ National Institute of Advanced Industrial Science and Technology
1-1-1 Chuuou Daini Umezono, Tsukuba, 305-8568, Japan, Email: itou@ni.aist.go.jp
² University of Library and Information Science
1-2 Kasuga, Tsukuba, 305-8550, Japan, Email: {fujii,ishikawa}@ulis.ac.jp

ABSTRACT
We report experimental results for speech-driven text retrieval,
which makes it possible to retrieve information in multiple domains
using spoken queries. Because users' spoken queries relate to the
content of a target collection, we build the language models used
for speech recognition from that collection, in order to improve
both recognition and retrieval accuracy. Experiments using existing
test collections combined with dictated queries demonstrated the
effectiveness of our method.
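The core idea of building the recognizer's language model from the collection being searched can be illustrated with a minimal sketch. The sketch makes several assumptions. It uses whitespace-tokenized English toy text, whereas a Japanese system would first need morphological analysis. It uses a bigram model with add-k smoothing. It also treats perplexity as a rough proxy for how predictable a spoken query is to the recognizer. The class `BigramLM`, the value `k=0.1`, and the example sentences are hypothetical. This section does not state the N-gram order, smoothing method, or toolkit the authors actually used.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BigramLM:
    """Hypothetical add-k smoothed word bigram model (illustration only)."""
    k: float = 0.1
    context_counts: Counter = field(default_factory=Counter)
    bigram_counts: Counter = field(default_factory=Counter)
    # "</s>" can be predicted; "<s>" only ever appears as a context.
    vocab: set = field(default_factory=lambda: {"</s>"})

    def train(self, sentences):
        """Count bigrams over tokenized sentences padded with <s> ... </s>."""
        for tokens in sentences:
            padded = ["<s>", *tokens, "</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        return self

    def prob(self, prev, cur):
        """P(cur | prev) with add-k smoothing."""
        v = len(self.vocab) + 1  # +1: one shared slot for all out-of-vocabulary words
        numerator = self.bigram_counts[(prev, cur)] + self.k
        return numerator / (self.context_counts[prev] + self.k * v)

    def perplexity(self, tokens):
        """Per-token perplexity of one sentence, including the </s> transition."""
        padded = ["<s>", *tokens, "</s>"]
        pairs = list(zip(padded, padded[1:]))
        log_prob = sum(math.log(self.prob(p, c)) for p, c in pairs)
        return math.exp(-log_prob / len(pairs))


# Hypothetical "target collection" and unrelated general text.
target_collection = [
    "speech recognition for text retrieval".split(),
    "spoken queries retrieve newspaper articles".split(),
    "language models for speech recognition".split(),
]
general_text = [
    "the weather is nice today".split(),
    "we went to the park".split(),
    "the park is near the station".split(),
]

domain_lm = BigramLM().train(target_collection)
general_lm = BigramLM().train(general_text)

query = "speech recognition for spoken queries".split()
print("collection-based LM perplexity:", domain_lm.perplexity(query))
print("general LM perplexity:", general_lm.perplexity(query))
```

With this toy data, the query receives a lower perplexity under the model trained on the topically related collection. This illustrates why a collection-specific model can make in-domain query wording easier for a recognizer to predict.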

