Learning Language in Logic (LLL)

[1]
James Cussens. Issues in learning language in logic. In Antonis C. Kakas and Fariba Sadri, editors, Computational Logic: Logic Programming and Beyond, volume 2408 of LNAI, pages 491-505. Springer, Berlin, 2002.
Selected issues concerning the use of logical representations in machine learning of natural language are discussed. It is argued that the flexibility and expressivity of logical representations are particularly useful in more complex natural language learning tasks. A number of inductive logic programming (ILP) techniques for natural language are analysed, including the CHILL system, abduction, and the incorporation of linguistic knowledge (including active learning). Hybrid approaches integrating ILP with manual development environments and probabilistic techniques are advocated.

[2]
Stephen Pulman and James Cussens. Grammar learning using Inductive Logic Programming. Oxford University Working Papers in Linguistics, Philology and Phonetics, 6:31-45, May 2001.
This paper gives a brief introduction to a particular machine learning method known as inductive logic programming. It is argued that this method, unlike many current statistically based machine learning methods, implies a view of grammar learning that bears close affinity to the views linguists have of the logical problem of language acquisition. Two experiments in grammar learning using this technique are described, using a unification grammar formalism, and positive-only data.

[3]
James Cussens and Stephen Pulman. Incorporating linguistic constraints into inductive logic programming. In Proceedings of CoNLL2000 and LLL2000, pages 184-193, Lisbon, September 2000. ACL.
We report work on effectively incorporating linguistic knowledge into grammar induction. We use a highly interactive bottom-up inductive logic programming (ILP) algorithm to learn `missing' grammar rules from an incomplete grammar. Using linguistic constraints on, for example, head features and gap threading, reduces the search space to such an extent that, in the small-scale experiments reported here, we can generate and store all candidate grammar rules together with information about their coverage and linguistic properties. This allows an appealingly simple and controlled method for generating linguistically plausible grammar rules. Starting from a base of highly specific rules, we apply least general generalisation and inverse resolution to generate more general rules. Induced rules are ordered, for example by coverage, for easy inspection by the user, and at any point the user can commit to a hypothesised rule and add it to the grammar. Related work in ILP and computational linguistics is discussed.
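As a toy illustration of the least general generalisation step mentioned above, here is a minimal Python sketch of Plotkin's lgg over ground first-order terms. The term encoding, variable-naming scheme and example terms are invented for illustration only; the paper's implementation works over Prolog grammar rules and is not reproduced here.

    # Terms are nested tuples ('functor', arg1, arg2, ...); anything else is a constant.
    def lgg(t1, t2, table=None):
        """Least general generalisation (Plotkin) of two terms."""
        if table is None:
            table = {}   # maps a pair of differing subterms to one shared variable
        if t1 == t2:
            return t1
        if (isinstance(t1, tuple) and isinstance(t2, tuple)
                and t1[0] == t2[0] and len(t1) == len(t2)):
            # same functor and arity: generalise argument-wise
            return (t1[0],) + tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
        # otherwise introduce a variable, reusing it for repeated pairs of subterms
        return table.setdefault((t1, t2), f"X{len(table)}")

    # lgg of np(n(dog), det(the)) and np(n(cat), det(the))
    print(lgg(('np', ('n', 'dog'), ('det', 'the')),
              ('np', ('n', 'cat'), ('det', 'the'))))
    # -> ('np', ('n', 'X0'), ('det', 'the'))

Applied to pairs of highly specific rules, this argument-wise generalisation is what produces the more general candidate rules described above.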

[4]
Sašo Džeroski, James Cussens, and Suresh Manandhar. An introduction to inductive logic programming and learning language in logic. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.
This chapter introduces Inductive Logic Programming (ILP) and Learning Language in Logic (LLL). No previous knowledge of logic programming, ILP or LLL is assumed. Elementary topics are covered and more advanced topics are discussed. For example, in the ILP section we discuss subsumption, inverse resolution, least general generalisation, relative least general generalisation, inverse entailment, saturation, refinement and abduction. We conclude with an overview of this volume and pointers to future work.

[5]
James Cussens and Stephen Pulman. Experiments in inductive chart parsing. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.
We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning. Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the ``missing'' grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. the ability to efficiently determine what is needed for a parse. For each unparsable sentence, we find actual edges and *needed edges*: those which are needed to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter are used as examples for the ILP algorithm. We demonstrate our approach with a number of experiments using context-free grammars and a feature grammar.

[6]
James Cussens and Stephen Pulman. Experiments in inductive chart parsing. In James Cussens, editor, Learning Language in Logic Workshop Notes (LLL99), pages 72-83, Bled, Slovenia, 1999.
We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning. Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the ``missing'' grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. the ability to efficiently determine what is needed for a parse. For each unparsable sentence, we find actual edges and *needed edges*: those which are needed to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter are used as examples for the ILP algorithm. We demonstrate our approach with a number of experiments using context-free grammars and a feature grammar.

[7]
James Cussens, Sašo Džeroski, and Tomaž Erjavec. Morphosyntactic tagging of Slovene using Progol. In Sašo Džeroski and Peter Flach, editors, Inductive Logic Programming: Proc. of the 9th International Workshop (ILP-99), Bled, Slovenia, June 1999. Springer-Verlag.
We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not so far produced a tagger comparable to other existing taggers in terms of accuracy.

[8]
James Cussens. Notes on inductive logic programming methods in natural language processing (European work). Manuscript, August 1998.
The aim of these notes is to analyse ILP methods which have been applied to NLP, drawing exclusively on work conducted in Europe.

[9]
James Cussens. Part-of-speech tagging using Progol. In Inductive Logic Programming: Proceedings of the 7th International Workshop (ILP-97). LNAI 1297, pages 93-108. Springer, 1997.
A system for `tagging' words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules which use a word's context to eliminate possible tags for a word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar, where the tags are terminals and predicates such as nounp (noun phrase) are nonterminals. Progol was altered to allow the caching of information about clauses generated during the induction process which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [DaeZavBerGil96-WVLC96,Bri94-AAAI94,CutKupPedSib92-ANLP92] which all have accuracies in the range 96-97%. The per-sentence accuracy was 49.5%.
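The two-component architecture described above (a lexicon plus context-based elimination rules) can be sketched roughly as follows. This is a hypothetical Python illustration with invented words, tags and a single hand-written rule; the paper's rules are definite clauses induced by Progol, not hand-written Python.

    # Lexicon: each word maps to its set of candidate POS tags (toy data).
    lexicon = {'the': {'det'}, 'dog': {'noun', 'verb'}, 'barks': {'verb', 'noun'}}

    def eliminate_verb_after_det(prev_tags, cand_tags):
        """Toy rule: a word directly after an unambiguous determiner is not a verb."""
        if prev_tags == {'det'}:
            cand_tags.discard('verb')

    def tag(sentence):
        candidates = [set(lexicon[w]) for w in sentence]
        for i in range(1, len(candidates)):
            eliminate_verb_after_det(candidates[i - 1], candidates[i])
        return candidates

    print(tag(['the', 'dog', 'barks']))   # [{'det'}, {'noun'}, {'verb', 'noun'}]

A word is fully disambiguated once every candidate tag but one has been eliminated; the induced theory reported above contains 885 such elimination clauses.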

[10]
James Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan. Using Inductive Logic Programming for Natural Language Processing. In W. Daelemans, T. Weijters, and A. van der Bosch, editors, ECML'97 -- Workshop Notes on Empirical Learning of Natural Language Tasks, pages 25-34, Prague, 1997. University of Economics. Invited keynote paper.
We summarise recent work on using Inductive Logic Programming (ILP) for Natural Language Processing (NLP). ILP performs learning in a first-order logical setting, and is thus well-suited to induce over the various structured representations used in NLP. We present Stochastic Logic Programs (SLPs) and demonstrate their use in ILP when learning from positive examples only. We also give accounts of work on learning grammars from children's books and part-of-speech tagging.

[11]
James Cussens. Part-of-speech disambiguation using ILP. Technical Report PRG-TR-25-96, Oxford University Computing Laboratory, 1996.


Inductive Logic Programming (non-LLL)

[1]
James Cussens. Using prior probabilities and density estimation for relational classification. In David Page, editor, Inductive Logic Programming: Proceedings of the 8th International Conference (ILP-98), volume 1446 of Lecture Notes in Artificial Intelligence, pages 106-115. Springer, 1998.

[2]
James Cussens. Bayesian Inductive Logic Programming with explicit probabilistic bias. Technical Report PRG-TR-24-96, Oxford University Computing Laboratory, 1996.

[3]
James Cussens. Review of ``Interactive Theory Revision: An Inductive Logic Programming Approach''. Journal of Logic and Computation, 1994.


Machine Learning (non-ILP)

[1]
David Page, Fenghuang Zhan, James Cussens, Michael Waddell, Johanna Hardin, Bart Barlogie, and John Shaughnessy, Jr. Comparative data mining for microarrays: A case study based on multiple myeloma. Technical Report 1453, Computer Sciences Department, University of Wisconsin, Nov 2002.
Supervised machine learning and data mining tools have become popular for the analysis of gene expression microarray data. They have the potential to uncover new therapeutic targets for diseases, to predict how patients will respond to specific treatments, and to uncover regulatory relationships among genes in normal and disease situations. Comparative experiments are needed to identify the advantages of the leading supervised learning algorithms for microarray data, as well as to give direction in methodological decisions. This paper compares support vector machines, Bayesian networks, decision trees, boosted decision trees, and voting (ensembles of decision stumps) on a new microarray data set for cancer with over 100 samples. The paper provides evidence for several important lessons for mining microarray data, including: (1) Bayes nets and ensembles perform at least as well as other approaches but arguably provide more direct insight; (2) the common practice of throwing out low or negative average differences, or those accompanied by an absent call, is a mistake; (3) looking for consistent differences in expression may be more important than large differences.
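As a generic, hedged sketch of the kind of comparison reported above (not the paper's pipeline), the snippet below cross-validates a few scikit-learn classifiers on synthetic data standing in for expression profiles; Gaussian naive Bayes is used here as a simple stand-in for the Bayesian networks studied in the paper, and all settings are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import AdaBoostClassifier

    # Synthetic stand-in for a >100-sample, high-dimensional microarray data set.
    X, y = make_classification(n_samples=105, n_features=500,
                               n_informative=20, random_state=0)

    models = {
        'linear SVM': SVC(kernel='linear'),
        'naive Bayes': GaussianNB(),
        'decision tree': DecisionTreeClassifier(random_state=0),
        'boosted stumps': AdaBoostClassifier(random_state=0),  # boosts depth-1 trees by default
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
        print(f'{name}: mean accuracy {scores.mean():.2f}')

Such a harness only addresses the accuracy comparison; the paper's further points about interpretability and feature pre-filtering require inspecting the fitted models and the preprocessing choices.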

[2]
James Cussens. Machine learning. IEE Journal of Computing and Control, 7(4):164-168, August 1996.

[3]
James Cussens. A Bayesian analysis of algorithms for learning finite functions. In Armand Prieditis and Stuart Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference (ML95), pages 142-149, San Francisco, CA, 1995. Morgan Kaufmann Publishers.
We consider algorithms for learning functions f: X → Y, where X and Y are finite, and there is assumed to be no noise in the data. Learning algorithms, A, are connected with G(A), the set of prior probability distributions for which they are optimal. A method for constructing G(A) from A is given and the relationship between the various G(A) is discussed. Improper algorithms are identified as those for which G(A) has zero volume. Improper algorithms are investigated using linear algebra and two examples of improper algorithms are given. This framework is then applied to the question of choosing between competing algorithms. ``Leave-one-out'' cross-validation is hence characterised as a crude method of ML-II prior selection. We conclude by examining how the mathematical results bear on practical problems and by discussing related work, as well as suggesting future work.

[4]
James Cussens. A Bayesian engineering approach to building limit protection systems. ISSAFE project report, Glasgow Caledonian University, August 1995.

[5]
James Cussens. Using MARS to learn the ILS task. ISSAFE project report ISSAFE 9039/1B/GCUC/3031/R/A, Glasgow Caledonian University, August 1995.

[6]
James Cussens. Industrial applications of decision trees. ISSAFE project report ISSAFE 9039/1B/GCUC/3034/R/A, Glasgow Caledonian University, May 1995.

[7]
James Cussens. Behaviour cloning approaches to limit protection. ISSAFE project report ISSAFE 9039/1/GCUC/3018/R/A, Glasgow Caledonian University, April 1995.

[8]
James Cussens. Some short descriptions of important learning algorithms. ISSAFE project report ISSAFE 9039/11/GCUC/3027/R/A, Glasgow Caledonian University, February 1995.

[9]
James Cussens. Approaches to `validation' in machine learning: A short survey. ISSAFE project report ISSAFE 9039/1B/GCUC/3017/R/A, Glasgow Caledonian University, July 1994.

[10]
James Cussens. Behaviour cloning in dynamic environments. ISSAFE project report, Glasgow Caledonian University, October 1994.

[11]
James Cussens. Using machine learning to control a pole and cart simulator. ISSAFE project report ISSAFE 9039/2/GCUC/3009/R/1, Glasgow Caledonian University, July 1994.

[12]
James Cussens. Bayes and pseudo-Bayes estimates of conditional probability and their reliability. In Pavel B. Brazdil, editor, Machine Learning: ECML-93, pages 136-152. Lecture Notes in Artificial Intelligence 667, Springer-Verlag, 1993.
Various ways of estimating probabilities, mainly within the Bayesian framework, are discussed. Their relevance and application to machine learning is given, and their relative performance empirically evaluated. A method of accounting for noisy data is given and also applied. The reliability of estimates is measured by a significance measure, which is also empirically tested. We briefly discuss the use of likelihood ratio as a significance measure.
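For orientation only (the exact estimators evaluated in the paper are not restated here), Bayes-flavoured estimates of a rule's conditional accuracy typically shrink the observed relative frequency towards a prior value. Two standard forms, for n_c correct cases out of n covered, k classes, prior guess p_0 and prior weight m, are

    \hat{p}_{\mathrm{Laplace}} = \frac{n_c + 1}{n + k},
    \qquad
    \hat{p}_{m} = \frac{n_c + m\,p_0}{n + m},

and pseudo-Bayes estimators are of the second kind but choose the weight m from the data itself rather than fixing it in advance.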

[13]
James Cussens. Estimating rule accuracies from training data. ECAI-92 Workshop Notes, 1992.


Uncertainty in AI

[1]
James Cussens. Individuals, relations and structures in probabilistic models. In Lise Getoor and David Jensen, editors, IJCAI Workshop on Learning Statistical Models from Relational Data (SRL2003), pages 32-36, Acapulco, Mexico, August 2003.
Relational data is equivalent to non-relational structured data. It is this equivalence which permits probabilistic models of relational data. Learning of probabilistic models for relational data is possible because one item of structured data is generally equivalent to many related data items. Succession and inclusion are two relations that have been well explored in the statistical literature. A description of the relevant statistical approaches is given. The representation of relational data via Bayesian nets is examined, and compared with PRMs. The paper ends with some cursory remarks on structured objects.

[2]
Vítor Santos Costa, David Page, Maleeha Qazi, and James Cussens. CLP(BN): Constraint logic programming for probabilistic knowledge. In Uffe Kjærulff and Christopher Meek, editors, Proceedings of the Nineteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2003), pages 517-524, Acapulco, Mexico, 2003. Morgan Kaufmann.
In Datalog, missing values are represented by Skolem constants. More generally, in logic programming missing values, or existentially-quantified variables, are represented by terms built from Skolem functors. In an analogy to probabilistic relational models (PRMs), we wish to represent the joint probability distribution over missing values in a database or logic program using a Bayesian network. This paper presents an extension of logic programs that makes it possible to specify a joint probability distribution over terms built from Skolem functors in the program. Our extension is based on constraint logic programming (CLP), so we call the extended language CLP(BN). We show that CLP(BN) subsumes PRMs; this greater expressivity carries both advantages and disadvantages for CLP(BN). We also show that algorithms from inductive logic programming (ILP) can be used with only minor modification to learn CLP(BN) programs. An implementation of CLP(BN) is publicly available as part of YAP Prolog at http://www.cos.ufrj.br/~vitor/Yap/clpbn.

[4]
Nicos Angelopoulos and James Cussens. Prolog issues of an MCMC algorithm. In U. Geske, O. Bartenstein, M. Hannebauer, and O. Yoshie, editors, Web-Knowledge Management and Decision Support - Selected Papers from the 14th International Conference on Applications of Prolog, volume 2543 of LNAI, pages 191-202. Springer, Berlin, 2002.

[5]
James Cussens. Integrating probabilistic and logical reasoning. In David Corfield and Jon Williamson, editors, Foundations of Bayesianism, volume 24 of Applied Logic Series, pages 241-260. Kluwer, Dordrecht, 2001. This is an updated version of the ETAI paper of the same name.

[6]
Nicos Angelopoulos and James Cussens. Prolog issues of an MCMC algorithm. In Proceedings of the 14th International Conference of Applications of Prolog, pages 246-253, Tokyo, October 2001.

[7]
Nicos Angelopoulos and James Cussens. Markov chain Monte Carlo using tree-based priors on model structure. In Jack Breese and Daphne Koller, editors, Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2001), Seattle, August 2001. Morgan Kaufmann.
We present a general framework for defining priors on model structure and sampling from the posterior using the Metropolis-Hastings algorithm. The key ideas are that structure priors are defined via a probability tree and that the proposal distribution for the Metropolis-Hastings algorithm is defined using the prior, thereby defining a cheaply computable acceptance probability. We have applied this approach to Bayesian net structure learning using a number of priors and proposal distributions. Our results show that these must be chosen appropriately for this approach to be successful.
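As a hedged illustration of why a prior-derived proposal makes the acceptance probability cheap to compute, consider the simplest case (an assumption here, not necessarily the exact scheme used in the paper) in which a candidate structure m' is proposed by an independent draw from the prior, so that q(m' | m) = p(m'). The prior and proposal terms then cancel in the Metropolis-Hastings ratio:

    \alpha(m \to m') = \min\left(1,
        \frac{p(m')\,L(D \mid m')\,q(m \mid m')}{p(m)\,L(D \mid m)\,q(m' \mid m)}\right)
      = \min\left(1, \frac{L(D \mid m')}{L(D \mid m)}\right),

leaving only a likelihood ratio to evaluate at each step; sampling such a proposal amounts to walking down the probability tree that defines the prior.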

[8]
James Cussens. Statistical aspects of stochastic logic programs. In Tommi Jaakkola and Thomas Richardson, editors, Artificial Intelligence and Statistics 2001: Proceedings of the Eighth International Workshop, pages 181-186, Key West, Florida, January 2001. Morgan Kaufmann.
Stochastic logic programs (SLPs) and the various distributions they define are presented with a stress on their characterisation in terms of Markov chains. Sampling, parameter estimation and structure learning for SLPs are discussed. The application of SLPs to Bayesian learning, computational linguistics and computational biology are considered. Lafferty's Gibbs-Markov models are compared and contrasted with SLPs.

[9]
James Cussens. Parameter estimation in stochastic logic programs. Machine Learning, 44(3):245-271, 2001.
Stochastic logic programs (SLPs) are logic programs with labelled clauses which define a log-linear distribution over refutations of goals. The log-linear distribution provides, by marginalisation, a distribution over variable bindings, allowing SLPs to compactly represent quite complex distributions. We analyse the fundamental statistical properties of SLPs addressing issues concerning infinite derivations, `unnormalised' SLPs and impure SLPs. After detailing existing approaches to parameter estimation for log-linear models and their application to SLPs, we present a new algorithm called failure-adjusted maximisation (FAM). FAM is an instance of the EM algorithm that applies specifically to normalised SLPs and provides a closed-form for computing parameter updates within an iterative maximisation approach. We empirically show that FAM works on some small examples and discuss methods for applying it to bigger problems.
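Schematically (a restatement in symbols assumed here, not quoted from the paper): if clause C_i of a normalised SLP carries label l_i, and a refutation r of a goal uses C_i a total of \nu_i(r) times, the log-linear distribution over refutations and the induced distribution over bindings x are

    P_\lambda(r) = \frac{1}{Z} \prod_i l_i^{\,\nu_i(r)},
    \qquad
    P_\lambda(x) = \sum_{r\ \mathrm{yielding}\ x} P_\lambda(r),

where Z sums the product over all refutations. FAM's EM updates re-estimate the labels l_i from expected clause-usage counts, corrected for the probability mass lost to failed derivations.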

[10]
James Cussens. Stochastic logic programs: Sampling, inference and applications. In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2000), pages 115-122, San Francisco, CA, 2000. Morgan Kaufmann.
Algorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based, respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hastings algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior.

[11]
James Cussens. Attribute-value and relational learning: A statistical viewpoint. In Luc De Raedt and Stefan Kramer, editors, Proceedings of the ICML-2000 Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries, pages 35-39, 2000.
In this extended abstract, rather than crossing the boundary between attribute-value and relational learning, we place ourselves above any such boundary and look down on the problem from the point of view of general principles of statistical inference. We do not pretend that this paper gives a full account of all relevant issues, but argue that starting from this generalised viewpoint and working down towards actual learning problems (e.g. decision tree learning, regression, ILP, etc) makes it easier to find the essential contrasts and similarities between different learning problems. Our primary goal (not achieved here) is to abstract away from superficial issues, such as the concrete syntactic representation of a problem or worse the sociological origin of an approach.

[12]
James Cussens. Loglinear models for first-order probabilistic reasoning. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 126-133, San Francisco, CA, 1999. Morgan Kaufmann Publishers.
Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.

[13]
James Cussens. Integrating probabilistic and logical reasoning. Electronic Transactions on Artificial Intelligence, 3(B):79-103, 1999. Selected Articles from the Machine Intelligence 16 Workshop.
We examine the vexed question of connections between logical and probabilistic reasoning. The reasons for making such a connection are examined. We give an account of recent work which uses loglinear models to make the connection. We conclude with an analysis of various existing approaches combining logic and probability.

[14]
James Cussens, Anthony Hunter, and Ashwin Srinivasan. Generating explicit orderings for non-monotonic logics. In Proc. of the Eleventh National Conference on Artificial Intelligence (AAAI-93), pages 420-425. MIT Press, 1993.

[15]
James Cussens and Anthony Hunter. Using maximum entropy in a defeasible logic with probabilistic semantics. In B. Bouchon-Meunier, L. Valverde, and R.R. Yager, editors, IPMU'92 - Advanced Techniques in Artificial Intelligence, pages 43-52. Lecture Notes in Computer Science 682, Springer-Verlag, 1993.

[16]
James Cussens and Anthony Hunter. Using defeasible logic for a window on a probabilistic database: some preliminary notes. In R. Kruse and P. Seigel, editors, Symbolic and Quantitative Approaches for Uncertainty, pages 146-152. Lecture Notes in Computer Science 548, Springer-Verlag, 1991.


Statistics


Philosophy

[1]
James Cussens. Deductive reasoning and statistical inference. In Brian Everitt and David C. Howell, editors, Encyclopedia of Statistics in Behavioral Science. Wiley, 2005. To appear.
Connections and contrasts between deductive reasoning and statistical inference are given. Bayesian statistical inference is analysed, including an account of its relation to deductive logic and the status of prior distributions. Classical statistical inference is investigated with a focus on hypothesis testing.

[2]
James Cussens. Leibniz and Boole on logic and probability. Unpublished and unsubmitted (so far!), 2002.
Combined logical-probabilistic frameworks are very old; often the result of grand attempts to formulate a calculus of general reasoning. A careful account of the assumptions and achievements of these systems can help us address issues in contemporary logical-probabilistic frameworks. This paper aims to provide such an account drawing on the pioneering work of Leibniz and Boole. In Leibniz, we find the first account of epistemic probability in terms of possible worlds. In Boole, we see the extent to which a combined logical-probabilistic calculus is possible using a propositional representation.

[3]
James Cussens. Deduction, induction and probabilistic support. Synthese, 108(1):1-10, July 1996.
Elementary results concerning the connections between deductive relations and probabilistic support are given. These are used to show that Popper-Miller's result is a special case of a more general result, and that their result is not ``very unexpected'' as claimed. According to Popper-Miller, a purely inductively supports b only if they are ``deductively independent''---but this means that ¬a ⊢ b. Hence, it is argued that viewing induction as occurring only in the absence of deductive relations, as Popper and Miller sometimes do, is untenable. Finally, it is shown that Popper-Miller's claim that deductive relations determine probabilistic support is untrue. In general, probabilistic support can vary greatly with fixed deductive relations as determined by the relevant Lindenbaum algebra.
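For readers unfamiliar with the result under discussion, the Popper-Miller (1983) decomposition of the probabilistic support s(h, e) = p(h | e) - p(h) can be written as

    s(h, e) = \bigl[p(h \vee e \mid e) - p(h \vee e)\bigr]
            + \bigl[p(h \vee \neg e \mid e) - p(h \vee \neg e)\bigr],

where the first bracket is the `deductive' component (h ∨ e follows from e) and the second, `purely inductive' component equals (p(h | e) - 1)(1 - p(e)) and so is never positive. The paper above concerns how far such deductive relations really fix probabilistic support.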

[4]
James Cussens. Interpretations of Probability, Nonstandard Analysis and Confirmation Theory. PhD thesis, King's College, London, 1991.
The first chapter presents Bayesian confirmation theory. We then construct infinitesimal numbers and use them to represent the probability of unrefuted hypotheses of standard probability zero. Popper's views on the nature of hypotheses, of probability and confirmation are criticised. It is shown that Popper conflates total confirmation with weight of evidence. It is argued that Popper's corroboration can be represented in a Bayesian formalism. Popper's propensity theory is discussed. A modified propensity interpretation is presented where probabilities are defined relative to descriptions of generating conditions. The logical interpretation is briefly discussed and rejected. A Bayesian account of estimating the values of objective probabilities is given, and some of its properties are proved. Belief functions are then compared with probabilities. It is concluded that belief functions offer a more elegant representation of the impact of evidence. Both measures are then discussed in relation to various betting procedures designed to elicit their values from an individual's belief state. De Finetti's arguments concerning `coherence' are discussed. It is then shown that it is not possible to use bets to derive belief function values unless the bettor is allowed to vary the amount of the stake. Hume's thinking on induction is discussed. It is argued that some of the problems of Popper's philosophy derive from Hume's. The Popper-Miller argument is presented and criticised. It is concluded that the core of the argument is valid, but of limited applicability. The correspondence between probabilistic support and deductive relations is discussed. There are two appendices. The first criticises Popper's view on the connection between the content and testability of a hypothesis. The second concerns a nonstandard probability measure proposed in 1967.

[5]
James Cussens. Review of ``Chance and Structure: An Essay on the Logical Foundations of Probability''. History and Philosophy of Logic, 11(1):116-117, 1990.