Automating the Construction of Authority
Files in Digital Libraries: A Case Study

James C. French 1, Allison L. Powell 1, Eric Schulman 2, and John L. Pfaltz 1
Email: {french|alp4g|jlp}@cs.virginia.edu, eschulma@nrao.edu
1 Department of Computer Science
University of Virginia
Charlottesville, Virginia 22903 USA
2 National Radio Astronomy Observatory
520 Edgemont Road
Charlottesville, Virginia 22903-2475 USA

Abstract. Quality control has become increasingly important as more
online databases are integrated into digital libraries, because data
quality can have a dramatic effect on the search effectiveness of an
online system.
Authority work, the need to discover and reconcile variant forms
of strings in bibliographic entries, will become more difficult. Spelling
variants, misspellings, translation and transliteration differences all increase 
the difficulty of retrieving information. This paper is a case study
of our efforts to automate the creation of an authority file for authors' institutional 
affiliations in the Astrophysics Data System. The techniques
surveyed here for the detection and categorization of variant forms have
broader applicability and may be used to help automate authority work
for other bibliographic fields.
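
As an illustration of what detecting variant forms can look like in practice, the following is a minimal sketch and not the method used in this study. It assumes that variants are found by normalizing strings and comparing them with Levenshtein edit distance; the function names and the 0.2 threshold are hypothetical choices made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard dynamic program (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(s: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " "
                   for ch in s.lower())
    return " ".join(kept.split())


def group_variants(strings, max_ratio=0.2):
    """Greedy clustering (an assumed heuristic): attach each string to the
    first cluster whose representative lies within max_ratio of the longer
    string's length in edit distance; otherwise start a new cluster."""
    clusters = []  # list of (normalized representative, members)
    for s in strings:
        n = normalize(s)
        for rep, members in clusters:
            if edit_distance(n, rep) <= max_ratio * max(len(n), len(rep), 1):
                members.append(s)
                break
        else:
            clusters.append((n, [s]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    affiliations = [
        "University of Virginia",
        "Universty of Virginia",
        "National Radio Astronomy Observatory",
        "University of Virgina",
    ]
    for group in group_variants(affiliations):
        print(group)
```

A plain edit-distance threshold of this kind catches misspellings but not heavy abbreviation: "Univ. of Virginia" is six edits away from "University of Virginia" after normalization and falls outside the assumed threshold, so abbreviations would need separate handling.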



