Genetic Algorithms to Optimise CBR Retrieval

Jacek Jarmulak¹, Susan Craw¹, and Ray Rowe²

¹ School of Computer and Mathematical Sciences
The Robert Gordon University, Aberdeen, AB25 1HG, UK
{jacek,s.craw}@scms.rgu.ac.uk
² AstraZeneca, Silk Road Business Park
Macclesfield, Cheshire SK10 2NA, UK




Abstract. Knowledge in a case-based reasoning (CBR) system is often
more extensive than simply the cases, so knowledge engineering
may still be very demanding. This paper offers a first step towards an
automated knowledge acquisition and refinement tool for non-case CBR
knowledge. A data-driven approach is presented in which a Genetic Algorithm
learns effective feature selection for inducing a case-base index, and
feature weights for the similarity measure used in case retrieval. The optimisation
can be viewed as knowledge acquisition or maintenance, depending
on whether knowledge is being created or refined. Optimising CBR retrieval
is achieved using cases from the case-base and only minimal expert
input, and so can be easily applied to an evolving case-base or a changing
environment. Experiments with a real tablet formulation problem show
the gains of simultaneously optimising the index and similarity measure.
Provided that the available data represents the problem domain well, the
optimisation has good generalisation properties and the domain knowledge
extracted is comparable to expert knowledge.
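
The data-driven optimisation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric features, a nearest-neighbour retrieval with a masked, weighted squared distance, and leave-one-out retrieval accuracy over the case-base as the fitness. The induced index is not modelled; the binary mask simply switches features off in the distance. All names (`retrieve`, `fitness`, `optimise`) and all parameter values are hypothetical.

```python
import random

def retrieve(query, cases, mask, weights, skip):
    """Index of the nearest case under a masked, weighted squared distance."""
    best, best_d = None, float("inf")
    for i, (features, _) in enumerate(cases):
        if i == skip:          # leave the query case itself out
            continue
        d = sum(m * w * (a - b) ** 2
                for m, w, a, b in zip(mask, weights, query, features))
        if d < best_d:
            best, best_d = i, d
    return best

def fitness(chromosome, cases):
    """Leave-one-out retrieval accuracy on the case-base (higher is better)."""
    mask, weights = chromosome
    hits = 0
    for i, (features, solution) in enumerate(cases):
        j = retrieve(features, cases, mask, weights, skip=i)
        hits += int(cases[j][1] == solution)
    return hits / len(cases)

def crossover(a, b, rng):
    """One-point crossover applied to mask and weights at the same cut."""
    cut = rng.randrange(1, len(a[0]))
    return (a[0][:cut] + b[0][cut:], a[1][:cut] + b[1][cut:])

def mutate(c, rng, rate=0.1):
    """Flip mask bits and resample weights, each with probability `rate`."""
    mask = [1 - m if rng.random() < rate else m for m in c[0]]
    weights = [rng.random() if rng.random() < rate else w for w in c[1]]
    return (mask, weights)

def optimise(cases, pop_size=20, generations=30, seed=0):
    """Elitist GA over (feature mask, feature weights) chromosomes."""
    rng = random.Random(seed)
    n = len(cases[0][0])       # number of features, assumed >= 2
    # Seed the population with the unweighted, all-features baseline so
    # the result is never worse than the baseline on the fitness used.
    pop = [([1] * n, [1.0] * n)]
    pop += [([rng.randint(0, 1) for _ in range(n)],
             [rng.random() for _ in range(n)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, cases), reverse=True)
        elite = pop[: pop_size // 2]   # the better half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, cases))
```

Each case is assumed to be a `(features, solution)` pair, and `optimise(cases)` returns the best `(mask, weights)` pair found. Because the fitness is computed on the same cases used for the search, a held-out set of cases would be needed to check how well the result generalises.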
