LEARNING TO LEARN :  The acquisition, consolidation, and transfer 
                          of task knowledge within neural networks 

                                    A REFERENCE LIST 

COMPILED BY:   Daniel L. Silver, Lorien Y. Pratt, and Jonathan Baxter
 Maintained by Danny Silver, Department of Computer Science, 
 University of Western Ontario, London, Ontario N6A 5B7, Canada.
 phone: (519)473-6168, email: dsilver@csd.uwo.ca

MOTIVATION: 
 The majority of efforts in neural network research, and more generally 
 in inductive learning, have focussed on a "tabula rasa" approach:
 acquired concepts are based solely on a set of training examples.  These
 approaches do not take into account any previously learned representational
 information or search experience.  Recently, there have been a number
 of efforts in a variety of areas that consider how best to capitalize on
 background knowledge to learn faster, or to learn more accurately with 
 fewer examples.  This can be considered a major portion of the problem of 
 "learning to learn".
 
//////////////////////////////////////////////////////////////////////////

The following list of reference material is not meant to be complete.  
Please forgive any errors and send corrections and any new references to 
dsilver@csd.uwo.ca.


PROBLEMS WITH KNOWLEDGE/REPRESENTATION TRANSFER
========================================================================
Noel E. Sharkey and Amanda J.C. Sharkey, 1994: Understanding catastrophic 
interference in neural nets, Department of Computer Science Research Report 
CS-94-4, University of Sheffield, UK.

Noel E. Sharkey and Amanda J.C. Sharkey, 1994: Interference and discrimination 
in neural net memory, Department of Computer Science Research Report CS-94-?,
University of Sheffield, UK. 

Noel Sharkey, John Neary, and Amanda Sharkey, 1995: Searching weight space
for backpropagation solution types, Department of Computer Science Research
Report CS-95-?, University of Sheffield, UK.

Leonard Hamey, 1995:  Analysis of the error surface of the XOR network 
with two hidden nodes, Department of Computer Science Computing Report 95/167C,
Macquarie University, Australia, February, 1995.

John F. Kolen and Jordan Pollack, 1990: Scenes from Exclusive-Or: Back 
Propagation is Sensitive to Initial Conditions, Proceedings of the Twelfth 
Annual Conference of the Cognitive Science Society, July, 1990, Cambridge, MA.

Ken McRae and P. A. Hetherington, 1993:  Catastrophic interference is
eliminated in pretrained networks.  In Proceedings of the Fifteenth
Annual Conference of the Cognitive Science Society, p. 723-728.
Hillsdale NJ: Erlbaum.


ADVANCES IN TRANSFER TECHNIQUES 
========================================================================
Murre, J.M.J. (in press): Transfer of learning in backpropagation and in
related neural network models. To appear in J. Levy, D. Bairaktaris, J.
Bullinaria, & P. Cairns (Eds.), Connectionist Models of Memory and
Language. London: UCL Press. 

Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm, 1991: Direct Transfer
of Learned Information among Neural Networks. Proceedings of the Ninth
National Conference on Artificial Intelligence (AAAI-91). AAAI, July 1991.
Pages 584-589.  

L. Y. Pratt, 1993: Discriminability-Based transfer between neural networks.
In C. L. Giles, S. J. Hanson, and J. D. Cowan, editors, Advances in Neural 
Information Processing Systems 5, Morgan Kaufmann, San Mateo, CA, 1993. 
Pages 204-211.  

Lorien Y. Pratt, 1993: Non-Literal Transfer Among Neural Network Learners. 
In R.J. Mammone, editor, Artificial Neural Networks for Speech and Vision,
Chapman & Hall, 1994. Pages 143-169.

Lorien Y. Pratt, 1994: Experiments on the transfer of knowledge between
neural networks. In S. Hanson, G. Drastal, and R. Rivest, editors, 
Computational Learning Theory and Natural Learning Systems, Constraints
and Prospects, MIT Press, 1994. Pages 523-560.

L. Y. Pratt and A. N. Christensen, 1994: Relaxing the hyperplane assumption
in the analysis and modification of back-propagation networks. In Robert
Trappl, ed., Cybernetics and Systems '94. World Scientific, Singapore, 1994.
Pages 1711-1718. 

L. Y. Pratt and V. I. Gough, 1994: Improving discriminability based transfer
by modifying the IM metric to use sigmoidal activations. In Robert Trappl,
ed., Cybernetics and Systems '94. World Scientific, Singapore, 1994. Pages
1719-1726.  


KNOWLEDGE BASED METHODS 
========================================================================
Jude W. Shavlik and Geoffrey G. Towell, 1989: Combining Explanation-based 
learning and Artificial Neural Networks, June, 1989, Proceedings of the 
Sixth International Workshop on Machine Learning, Cornell University, Morgan 
Kaufmann, Palo Alto, CA 94303-9953.

Jude W. Shavlik and Geoffrey G. Towell, 1989: Combining Explanation-Based and 
Neural Learning: An algorithm and Empirical Results, June, 1989, Department of 
Computer Science, University of Wisconsin, Madison, Wisconsin.

Jude W. Shavlik, 1989: Acquiring Recursive Concepts with Explanation-Based 
Learning, August, 1989, Proceedings of the Eleventh International Joint 
Conference on Artificial Intelligence, Morgan Kaufmann, 2929 Campus Drive, 
Suite 260, San Mateo, CA  94403.

Jude Shavlik, 1992: Integrating Explanatory and Neural Approaches to Machine 
Learning; in Computational Learning Theory and Natural Learning Systems, 
Constraints and Prospects, editors: S. Hanson and G. Drastal and R. Rivest, 
MIT Press, 1992. 

Geoffrey G. Towell and Jude W. Shavlik, 1992: Interpretation of Artificial 
Neural Networks: Mapping Knowledge-Based Neural Networks into Rules,
Advances in Neural Information Processing Systems 4, San Mateo, CA,
Morgan Kaufmann, August 21, 1991, pp. 977-984.

G. G. Towell and J. W. Shavlik, 1993: Extracting Refined Rules from 
Knowledge-Based Neural Networks, Machine Learning,  13, p. 71-101, 
December 8, 1993.

Tom Mitchell and Sebastian Thrun, 1993:  Explanation based neural
network learning for robot control, NIPS 5 pp 287-294, 1993.

Tom Fawcett and Paul Utgoff, 1993: Automatic feature generation
for Problem Solving Systems, COINS Tech-Report 92-9, 1992.

Steven Suddarth and Y. Kergosien, 1990: Rule injection hints as a means of
improving network performance and learning time, Proceedings
of the EURASIP workshop on Neural Networks, 1990.


SEQUENTIAL/COMPOSITIONAL LEARNING 
========================================================================
Satinder P. Singh, 1992: Transfer of learning by composing solutions for 
elemental sequential tasks, Machine Learning, 1992.

Satinder P. Singh, 1994: The efficient learning of multiple task sequences,
Machine Learning, Dept. of Computer Science, Univ. of Mass.

R.A. Jacobs, 1990: Task decomposition through competition in a modular
connectionist architecture.  PhD thesis, COINS Department, University of
Massachusetts, Amherst, Mass.


META-LEARNING I - LEARNING REPRESENTATION 
========================================================================
James L. McClelland and Bruce L. McNaughton and Randall C. O'Reilly, 1994: 
Why there are complementary learning systems in the hippocampus and neocortex:
Insights from the successes and failures of connectionist models of learning 
and memory, Technical Report PDP.CNS.94.1, Department of Psychology, 
Carnegie Mellon University, Pittsburgh, PA.

Jonathan Baxter, 1995: Learning internal representations, PhD Thesis,
Department of Mathematics and Statistics, The Flinders University of South
Australia, 1995.

Jonathan Baxter, 1995: Learning internal representations, Proceedings of the
Eighth International Conference on Computational Learning Theory, Santa Cruz,
CA, 1995, ACM Press (to appear).

Jonathan Baxter, 1995: The canonical metric for vector quantisation.  Submitted
to Information and Computation, 1995.

Jonathan Baxter, 1992: The evolution of learning algorithms for artificial
neural networks, Complex Systems, IOS Press, 1992.

Daniel L. Silver, 1994:  The retention and transfer of classifier task 
knowledge in artificial neural networks.  Proceedings of the UWORCS Conference,
Department of Computer Science, University of Western Ontario,
September, 1994.

Daniel L. Silver, 1995:  Toward a model of consolidation: The retention and
transfer of neural net task knowledge.  Submitted to NIPS'95.

Robert M. French, 1991: Using semi-distributed representations to overcome
catastrophic forgetting in connectionist networks, CRCC Technical Report 
51-1991, Center for Research on Concepts and Cognition, Indiana University.

Robert M. French, 1994: Interactive tandem networks and the sequential
learning problem, CRCC Technical Report, Center for Research on Concepts 
and Cognition, Indiana University.

Robert M. French, 1994: Dynamically constraining connectionist networks to 
produce distributed, orthogonal representations to reduce catastrophic 
interference, Proceedings of the 16th Annual Cognitive Science Society
Conference, 1994.

Jurgen H. Schmidhuber, 1994: On learning how to learn learning strategies,
Technical Report FKI-198-94, Fakultat fur Informatik, Technische Universitat
Munchen, Germany, January, 1995.

J. H. Schmidhuber, 1993: A neural network that embeds its own meta-levels,
Proc. of the International Conference on Neural Networks '93, San Francisco,
IEEE, 1993.

J. H. Schmidhuber, 1993: A self-referential weight matrix,
Proceedings of the International Conference on Artificial Neural Networks, 
Amsterdam, Springer, 1993, p. 446-451.

J. H. Schmidhuber, 1987: Evolutionary principles in self-referential learning,
or on learning how to learn: the meta-meta-... hook, Institut fur Informatik,
Technical Report, Technische Universitat Munchen, 1987.


META-LEARNING II - LEARNING SEARCH 
========================================================================
Sebastian Thrun and Tom M. Mitchell, 1993:  Lifelong Robot Learning, 
Technical Report IAI-TR-93-7, Institute for Informatics III, University of
Bonn, Germany, July, 1993.

Sebastian Thrun, 1994: A Lifelong Learning Perspective for Mobile Robot Control,
Proceedings of the IEEE Conference on Intelligent Robots and Systems, IEEE,
September 12-16, 1994, (to appear).

Sebastian Thrun and Tom M. Mitchell, 1994:  Learning one more thing, Technical
Report CMU-CS-94-184, School of Computer Science, Carnegie Mellon University,
Pittsburgh, PA.

Sebastian Thrun and Anton Schwartz, 1994: Finding structure in reinforcement
learning, accepted at NIPS'94, Denver, CO, December, 1994.

D.K. Naik and Richard J. Mammone, 1993: Learning by learning in neural networks,
Artificial Neural Networks for Speech and Vision; ed: Richard J. Mammone,
Chapman and Hall, London.

D. K. Naik and R. J. Mammone and A. Agarwal, 1992: Meta-Neural Network 
approach to learning by learning, in Intelligence Engineering Systems 
through Artificial Neural Networks, The American Society of Mechanical 
Engineers, ASME Press, 1992, vol. 2, pp. 245-252.

Richard A. Caruana, 1993: Multitask Learning: A Knowledge-Based Source of 
Inductive Bias, Proceedings of the Tenth International Conference on Machine 
Learning, June, 1993, University of Massachusetts, p. 41-48.

MISCELLANEOUS
========================================================================
Kruschke, J. K. 1993: Human category learning: Implications for back
propagation models. Connection Science, v.5, pp.3-36. 

C. Lee Giles and Christian W. Omlin, 1993: Rule Refinement with Recurrent 
Neural Networks, Proceedings of the IEEE International Conference on 
Neural Networks, San Francisco, CA, March, 1993.