[Cussens, Forthcoming]
James Cussens. Leibniz on probability and statistics. In Maria Rosa Antognazza, editor, Oxford Handbook on Leibniz. Oxford University Press, Forthcoming.

[Cussens et al., 2013]
James Cussens, Mark Bartlett, Elinor M. Jones, and Nuala A. Sheehan. Maximum likelihood pedigree reconstruction using integer linear programming. Genetic Epidemiology, 37(1):69-83, January 2013.
Large population biobanks of unrelated individuals have been highly successful in detecting common genetic variants affecting diseases of public health concern. However, they lack the statistical power to detect more modest gene-gene and gene-environment interaction effects or the effects of rare variants for which related individuals are ideally required. In reality, most large population studies will undoubtedly contain sets of undeclared relatives, or pedigrees. Although a crude measure of relatedness might sometimes suffice, having a good estimate of the true pedigree would be much more informative if this could be obtained efficiently. Relatives are more likely to share longer haplotypes around disease susceptibility loci and are hence biologically more informative for rare variants than unrelated cases and controls. Distant relatives are arguably more useful for detecting variants with small effects because they are less likely to share masking environmental effects. Moreover, the identification of relatives enables appropriate adjustments of statistical analyses that typically assume unrelatedness. We propose to exploit an integer linear programming optimisation approach to pedigree learning, which is adapted to find valid pedigrees by imposing appropriate constraints. Our method is not restricted to small pedigrees and is guaranteed to return a maximum likelihood pedigree. With additional constraints, we can also search for multiple high-probability pedigrees and thus account for the inherent uncertainty in any particular pedigree reconstruction. The true pedigree is found very quickly by comparison with other methods when all individuals are observed. Extensions to more complex problems seem feasible.

[Cussens, 2012a]
James Cussens. Column generation for exact BN learning: Work in progress. In Proc. ECAI-2012 workshop on COmbining COnstraint solving with MIning and LEarning (CoCoMile 2012), 2012.

[Cussens, 2012b]
James Cussens. Online Bayesian inference for the parameters of PRISM programs. Machine Learning, 2012. http://dx.doi.org/10.1007/s10994-012-5305-8.
This paper presents a method for approximating posterior distributions over the parameters of a given PRISM program. A sequential approach is taken where the distribution is updated one datapoint at a time. This makes it applicable to online learning situations where data arrives over time. The method is applicable whenever the prior is a mixture of products of Dirichlet distributions. In this case the true posterior will be a mixture of very many such products. An approximation is effected by merging products of Dirichlet distributions. An analysis of the quality of the approximation is presented. Due to the heavy computational burden of this approach, the method has been implemented in the Mercury logic programming language. Initial results using a hidden Markov model and a probabilistic graph are presented.

[Cussens, 2012c]
James Cussens. An upper bound for BDeu local scores. In Proc. ECAI-2012 workshop on algorithmic issues for inference in graphical models (AIGM 2012), 2012.
An upper bound on BDeu log local scores is derived using an existing upper bound on the beta function with r variables. Two bounds on this bound are derived, one of which is suitable for pruning the search for optimal parent sets of a variable in Bayesian network learning. Empirical results concerning the tightness of bounds are given.

[Cussens, 2011a]
James Cussens. Approximate Bayesian computation for the parameters of PRISM programs. In P. Frasconi and F. A. Lisi, editors, Proc. 20th International Conference on Inductive Logic Programming (ILP 2010), pages 38-46, Firenze, 2011. Springer.

[Cussens, 2011b]
James Cussens. Bayesian network learning with cutting planes. In Fabio G. Cozman and Avi Pfeffer, editors, Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages 153-160, Barcelona, 2011. AUAI Press.
The problem of learning the structure of Bayesian networks from complete discrete data with a limit on parent set size is considered. Learning is cast explicitly as an optimisation problem where the goal is to find a BN structure which maximises log marginal likelihood (BDe score). Integer programming, specifically the SCIP framework, is used to solve this optimisation problem. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Finding good cutting planes is the key to the success of the approach; the search for such cutting planes is effected using a sub-IP. Results show that this is a particularly fast method for exact BN learning.
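A sketch (my notation, not taken verbatim from the paper) of the kind of integer program involved. With a binary variable $x_{W\to v}$ for each permitted parent set $W$ of each vertex $v$, and $s(v,W)$ the local score of that family, the problem is

$$\max \sum_{v}\sum_{W} s(v,W)\,x_{W\to v} \quad\text{s.t.}\quad \sum_{W} x_{W\to v} = 1 \ \ \text{for every } v, \qquad \sum_{v\in C}\ \sum_{W:\,W\cap C=\emptyset} x_{W\to v} \;\ge\; 1 \ \ \text{for every cluster } C,\ |C|\ge 2.$$

The cluster constraints state that every set of vertices contains at least one vertex whose chosen parents all lie outside the set, which excludes directed cycles; it is constraints of this kind that are added lazily as cutting planes when the current solution violates them.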

[Bartlett et al., 2010a]
Mark Bartlett, Iain Bate, and James Cussens. Instruction cache prediction using Bayesian networks. In Proc. ECAI 2010 - 19th European Conference on Artificial Intelligence, pages 1099-1100, 2010. Short paper.

[Bartlett et al., 2010b]
Mark Bartlett, Iain Bate, and James Cussens. Learning Bayesian networks for improved instruction cache analysis. In Proceedings of the Ninth International Conference on Machine Learning and Applications (ICMLA 2010), pages 417-423. IEEE, 2010.

[Cussens, 2010a]
James Cussens. Induction. In Claude Sammut and Geoffrey I. Webb, editors, Encyclopedia of Machine Learning, pages 519-522. Springer, 2010.

[Cussens, 2010b]
James Cussens. Maximum likelihood pedigree reconstruction using integer programming. In Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB-10), Edinburgh, July 2010.

[Liverani et al., 2010]
Silvia Liverani, James Cussens, and Jim Q. Smith. Searching a multivariate partition space using weighted MAX-SAT. In F. Masulli, L. Peterson, and R. Tagliaferri, editors, Proceedings of the Sixth International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB) 2009, LNBI 6160, pages 240-253. Springer, 2010.

[Cussens, 2009]
James Cussens. On generative parameterisations of Markov logic networks. In Proc. SRL 09, Leuven, July 2009.

[Fisher and Cussens, 2009]
Barnaby Fisher and James Cussens. Further Inductive Mercury Programming and IMP0.5. Technical Report YCS-2009-437, Dept. of Computer Science, University of York, March 2009.
We explore the use of Mercury for Inductive Logic Programming and present IMP0.5, the product of our research. Mercury is a compiled logic programming language with modern features, which requires the user to write type, mode and determinism declarations for each of their predicates. This information is used by the Mercury compiler to optimise generated code, which, amongst other things, enables Mercury to provide faster execution than Prolog. IMP0.5 is an ILP system, and an ILP software library, which contains re-usable modules of ILP related code. Our aim in creating IMP0.5 as a software library was to significantly reduce the effort required to implement new ILP algorithms. We feel this has been achieved and provide implementations of some typical ILP algorithms, the most notable being the Aleph default algorithm. Since Mercury is a purely declarative language, run-time assertion of induced hypotheses is prohibited. Therefore, hypotheses are represented as ground terms, and, to enable fast cover testing, interpreted with a problem-specific interpreter, which is generated at compile-time. We also use a cover set representation based on RL-Trees, which is very space efficient and thus beneficial for large searches. Empirical results are generally good, especially for complex background knowledge, and in one case show a 56-fold speed-up compared with Aleph.

[Angelopoulos and Cussens, 2008]
Nicos Angelopoulos and James Cussens. Bayesian learning of Bayesian networks with informative priors. Annals of Mathematics and Artificial Intelligence, 54(1):53-98, 2008.
This paper presents and evaluates an approach to Bayesian model averaging where the models are Bayesian nets (BNs). A comprehensive study of the literature on structural priors for BNs is conducted. A number of prior distributions are defined using stochastic logic programs and the MCMC Metropolis-Hastings algorithm is used to (approximately) sample from the posterior. We use proposals which are tightly coupled to the priors, which gives rise to cheaply computable acceptance probabilities. Experiments using data generated from known BNs have been conducted to evaluate the method. The experiments used 6 different BNs and varied: the structural prior, the parameter prior, the Metropolis-Hastings proposal and the data size. Each experiment was repeated three times with different random seeds to test the robustness of the MCMC-produced results. Our results show that with effective priors (i) robust results are produced and (ii) informative priors improve results significantly.

[Cussens, 2008]
James Cussens. Bayesian network learning by compiling to weighted MAX-SAT. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI 2008), pages 105-112, Helsinki, 2008. AUAI Press.
The problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents (`family scores') are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a `soft' weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing (for the `ancestor' encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the `true' BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.
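To make the encoding concrete, here is an illustrative fragment in my own notation (not the paper's exact clause set): each permissible family gets a propositional atom $F_{v,W}$; the soft part of the theory consists of the unit clauses $(F_{v,W})$, each carrying a weight derived from the family score $s(v,W)$; the hard part contains $\bigvee_{W} F_{v,W}$ for every variable $v$ (some family must be chosen), together with clauses over ancestor or total-order atoms that forbid cycles. An assignment maximising the total weight of satisfied soft clauses then corresponds to a highest-scoring acyclic structure, which is what MaxWalkSat searches for.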

[Santos Costa et al., 2008]
Vítor Santos Costa, David Page, and James Cussens. CLP(BN): Constraint logic programming for probabilistic knowledge. In Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors, Probabilistic Inductive Logic Programming, volume 4911 of Lecture Notes in Computer Science, pages 156-188. Springer, Berlin, 2008. doi:10.1007/978-3-540-78652-8_6.
In Datalog, missing values are represented by Skolem constants. More generally, in logic programming missing values, or existentially quantified variables, are represented by terms built from Skolem functors. The CLP(BN) language represents the joint probability distribution over missing values in a database or logic program by using constraints to represent Skolem functions. Algorithms from inductive logic programming (ILP) can be used with only minor modification to learn CLP(BN) programs. An implementation of CLP(BN) is publicly available as part of YAP Prolog at http://www.ncc.up.pt/~vsc/Yap.

[Cussens, 2007a]
James Cussens. Bayesian classification and regression trees. Expert Update, 9(3):37-42, 2007.

[Cussens, 2007b]
James Cussens. Logic-based formalisms for statistical relational learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, Cambridge, MA, 2007.

[Cussens, 2007c]
James Cussens. Model equivalence of PRISM programs. In Proceedings of the Dagstuhl seminar: Probabilistic, Logical and Relational Learning - A Further Synthesis, 2007.
The problem of deciding the probability model equivalence of two PRISM programs is addressed. In the finite case this problem can be solved (albeit slowly) using techniques from algebraic statistics, specifically the computation of elimination ideals and Gröbner bases. A very brief introduction to algebraic statistics is given. Consideration is given to cases where shortcuts to proving/disproving model equivalence are available.

[Fisher and Cussens, 2007]
Barnaby Fisher and James Cussens. Inductive Mercury programming. In Stephen Muggleton and Ramon Otero, editors, Inductive Logic Programming: Proceedings of the 16th International Conference (ILP-06), Santiago de Compostela, 2007. Springer.
We investigate using the Mercury language to design and implement ILP algorithms, presenting our own ILP system IMP. Mercury provides faster execution than Prolog. Since Mercury is a purely declarative language, run-time assertion of induced clauses is prohibited. Instead IMP uses a problem-specific interpreter of ground representations of induced clauses. The interpreter is used both for cover testing and bottom clause generation. The Mercury source for this interpreter is generated automatically from the user's background knowledge using MOOSE, a Mercury parser generator. Our experiments include some encouraging results on IMP's cover testing speed, but overall IMP is still generally a little slower than ALEPH.

[Angelopoulos and Cussens, 2006]
Nicos Angelopoulos and James Cussens. Exploiting independence for branch operations in Bayesian learning of C&RTs. In Luc De Raedt, Thomas Dietterich, Lise Getoor, and Stephen H. Muggleton, editors, Probabilistic, Logical and Relational Learning - Towards a Synthesis, number 05051 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2006. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany.

[Kazakov et al., 2006]
Dimitar Kazakov, James Cussens, and Suresh Manandhar. On the duality of semantics and syntax: The PP attachment case. Technical Report YCS-2006-409, University of York, 2006.

[Angelopoulos and Cussens, 2005a]
Nicos Angelopoulos and James Cussens. Exploiting informative priors for Bayesian classification and regression trees. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 641-646, Edinburgh, 2005.
A general method for defining informative priors on statistical models is presented and applied specifically to the space of classification and regression trees. A Bayesian approach to learning such models from data is taken, with the Metropolis-Hastings algorithm being used to approximately sample from the posterior. By only using proposal distributions closely tied to the prior, acceptance probabilities are easily computable via marginal likelihood ratios, whatever the prior used. Our approach is empirically tested by varying (i) the data, (ii) the prior and (iii) the proposal distribution. A comparison with related work is given.
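To see why proposals tied to the prior keep acceptance probabilities cheap, here is the standard calculation for the simplest case, in which the proposal is an independent draw from the tree prior $\pi$ (the paper's proposals are more refined, but the cancellation is the same in spirit):

$$\alpha(T\to T') \;=\; \min\!\left(1,\ \frac{p(D\mid T')\,\pi(T')\,q(T\mid T')}{p(D\mid T)\,\pi(T)\,q(T'\mid T)}\right) \;=\; \min\!\left(1,\ \frac{p(D\mid T')}{p(D\mid T)}\right) \quad\text{when } q(\cdot\mid T)=\pi(\cdot),$$

so only the marginal likelihood ratio mentioned in the abstract needs to be computed.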

[Angelopoulos and Cussens, 2005b]
Nicos Angelopoulos and James Cussens. Tempering for Bayesian C&RT. In Proceedings of the 22nd International Conference on Machine Learning (ICML05), pages 17-24, Bonn, 2005.
This paper concerns the experimental assessment of tempering as a technique for improving Bayesian inference for C&RT models. Full Bayesian inference requires the computation of a posterior over all possible trees. Since exact computation is not possible, Markov chain Monte Carlo (MCMC) methods are used to produce an approximation. C&RT posteriors have many local modes: tempering aims to prevent the Markov chain getting stuck in these modes. Our results show that a clear improvement is achieved using tempering.

[Cussens, 2005a]
James Cussens. Deductive reasoning and statistical inference. In Brian S. Everitt and David C. Howell, editors, Encyclopedia of Statistics in Behavioral Science, volume 1, pages 472-475. Wiley, Chichester, 2005.
Connections and contrasts between deductive reasoning and statistical inference are given. Bayesian statistical inference is analysed, including an account of its relation to deductive logic and the status of prior distributions. Classical statistical inference is investigated with a focus on hypothesis testing.

[Cussens, 2005b]
James Cussens. Integrating by separating: Combining probability and logic with ICL, PRISM and SLPs. APRIL project report, January 2005.
This report describes the close relationship that obtains between the ICL, PRISM and SLP frameworks. The common feature of these frameworks is that a purely probabilistic component and a purely logical component are connected to produce a hybrid model. A hidden Markov model (HMM) is used as a running example. The Uniqueness Condition, which allows these frameworks to represent statistical models, is discussed, and the consequences of using a weakened version of the Uniqueness Condition briefly explored. `Lazy' sampling, based on SLD-resolution, is discussed.

[Angelopoulos and Cussens, 2004a]
Nicos Angelopoulos and James Cussens. Extended stochastic logic programs for informative priors over C&RTs. In Rui Camacho, Ross King, and Ashwin Srinivasan, editors, Proceedings of the work-in-progress track of the Fourteenth International Conference on Inductive Logic Programming (ILP04), pages 7-11, Porto, September 2004.
A general method for defining informative priors on statistical models is presented and applied specifically to the space of classification and regression trees. Our aim is a Bayesian approach to learning such models from data, with the Metropolis-Hastings algorithm used to sample from the posterior. We present some preliminary results in which the methodology is empirically tested.

[Angelopoulos and Cussens, 2004b]
Nicos Angelopoulos and James Cussens. On the implementation of MCMC proposals over stochastic logic programs. In Colloquium on Implementation of Constraint and LOgic Programming Systems. Satellite workshop to ICLP'04, Saint-Malo, France, 2004.

[de Melo and Cussens, 2004]
Wolfgang David Cirilo de Melo and James Cussens. Leibniz on estimating the uncertain: An English translation of De incerti aestimatione with commentary. Leibniz Review, 14:31-53, 2004.
Leibniz's De incerti aestimatione, which contains his solution to the division problem, has not received much attention, let alone much appreciation. This is surprising because it is in this work that the definition of probability in terms of equally possible cases appears for the first time. The division problem is used to establish and test probability theory; it can be stated as follows: if two players agree to play a game in which one has to win a certain number of rounds in order to win the pool, but if they break the game off before either of them has won the required number of rounds, how should the pool be distributed? Our article has two aims: it provides the readers with the first English translation of De incerti aestimatione, and it also gives them a brief commentary that explains Leibniz's philosophical and mathematical concepts necessary in order to understand this work. The translation is as literal as possible throughout; it shows how Leibniz struggled at times to find a solution to the division problem and how he approached it from different angles. The commentary discusses Leibniz's views on four key concepts: fairness, hope, authority and possibility. The commentary then outlines how Leibniz attempted to solve the problem of division.
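For orientation, a worked instance of the division problem in modern terms (the classical Pascal-Fermat style answer, not a quotation of Leibniz's own reasoning): suppose the pool goes to the first player to win three rounds of a fair game, and play is broken off with A on two wins and B on one. A needs one more win and B needs two, so

$$P(\text{A would have won}) \;=\; \tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2} \;=\; \tfrac{3}{4},$$

and the fair division gives A three quarters of the pool and B one quarter.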

[Cussens, 2003]
James Cussens. Individuals, relations and structures in probabilistic models. In Lise Getoor and David Jensen, editors, IJCAI Workshop on Learning Statistical Models from Relational Data (SRL2003), pages 32-36, Acapulco, Mexico, August 2003.
Relational data is equivalent to non-relational structured data. It is this equivalence which permits probabilistic models of relational data. Learning of probabilistic models for relational data is possible because one item of structured data is generally equivalent to many related data items. Succession and inclusion are two relations that have been well explored in the statistical literature. A description of the relevant statistical approaches is given. The representation of relational data via Bayesian nets is examined, and compared with PRMs. The paper ends with some cursory remarks on structured objects.

[Santos Costa et al., 2003]
Vítor Santos Costa, David Page, Maleeha Qazi, and James Cussens. CLP(BN): Constraint logic programming for probabilistic knowledge. In Uffe Kjærulff and Christopher Meek, editors, Proceedings of the Nineteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2003), pages 517-524, Acapulco, Mexico, 2003. Morgan Kaufmann.
In Datalog, missing values are represented by Skolem constants. More generally, in logic programming missing values, or existentially-quantified variables, are represented by terms built from Skolem functors. In an analogy to probabilistic relational models (PRMs), we wish to represent the joint probability distribution over missing values in a database or logic program using a Bayesian network. This paper presents an extension of logic programs that makes it possible to specify a joint probability distribution over terms built from Skolem functors in the program. Our extension is based on constraint logic programming (CLP), so we call the extended language CLP(BN). We show that CLP(BN) subsumes PRMs; this greater expressivity carries both advantages and disadvantages for CLP(BN). We also show that algorithms from inductive logic programming (ILP) can be used with only minor modification to learn CLP(BN) programs. An implementation of CLP(BN) is publicly available as part of YAP Prolog at http://www.cos.ufrj.br/~vitor/Yap/clpbn.

[Angelopoulos and Cussens, 2002]
Nicos Angelopoulos and James Cussens. Prolog issues of an MCMC algorithm. In U. Geske, O. Bartenstein, M. Hannebauer, and O. Yoshie, editors, Web-Knowledge Management and Decision Support - Selected Papers from the 14th International Conference on Applications of Prolog, volume 2543 of LNAI, pages 191-202. Springer, Berlin, 2002.

[Cussens, 2002a]
James Cussens. Issues in learning language in logic. In Antonis C. Kakas and Fariba Sadri, editors, Computational Logic: Logic Programming and Beyond, volume 2408 of LNAI, pages 491-505. Springer, Berlin, 2002.
Selected issues concerning the use of logical representations in machine learning of natural language are discussed. It is argued that the flexibility and expressivity of logical representations are particularly useful in more complex natural language learning tasks. A number of inductive logic programming (ILP) techniques for natural language are analysed, including the CHILL system, abduction and the incorporation of linguistic knowledge, including active learning. Hybrid approaches integrating ILP with manual development environments and probabilistic techniques are advocated.

[Cussens, 2002b]
James Cussens. Leibniz and Boole on logic and probability. Unpublished and unsubmitted, so far!, 2002.
Combined logical-probabilistic frameworks are very old; often the result of grand attempts to formulate a calculus of general reasoning. A careful account of the assumptions and achievements of these systems can help us address issues in contemporary logical-probabilistic frameworks. This paper aims to provide such an account drawing on the pioneering work of Leibniz and Boole. In Leibniz, we find the first account of epistemic probability in terms of possible worlds. In Boole, we see the extent to which a combined logical-probabilistic calculus is possible using a propositional representation.

[Page et al., 2002]
David Page, Fenghuang Zhan, James Cussens, Michael Waddell, Johanna Hardin, Bart Barlogie, and John Shaughnessy, Jr. Comparative data mining for microarrays: A case study based on multiple myeloma. Technical Report 1453, Computer Sciences Department, University of Wisconsin, November 2002.
Supervised machine learning and data mining tools have become popular for the analysis of gene expression microarray data. They have the potential to uncover new therapeutic targets for diseases, to predict how patients will respond to specific treatments, and to uncover regulatory relationships among genes in normal and disease situations. Comparative experiments are needed to identify the advantages of the leading supervised learning algorithms for microarray data, as well as to give direction in methodological decisions. This paper compares support vector machines, Bayesian networks, decision trees, boosted decision trees, and voting (ensembles of decision stumps) on a new microarray data set for cancer with over 100 samples. The paper provides evidence for several important lessons for mining microarray data, including: (1) Bayes nets and ensembles perform at least as well as other approaches but arguably provide more direct insight; (2) the common practice of throwing out low or negative average differences, or those accompanied by an absent call, is a mistake; (3) looking for consistent differences in expression may be more important than large differences.

[Angelopoulos and Cussens, 2001a]
Nicos Angelopoulos and James Cussens. Markov chain Monte Carlo using tree-based priors on model structure. In Jack Breese and Daphne Koller, editors, Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2001), pages 16-23, Seattle, August 2001. Morgan Kaufmann.
We present a general framework for defining priors on model structure and sampling from the posterior using the Metropolis-Hastings algorithm. The key ideas are that structure priors are defined via a probability tree and that the proposal distribution for the Metropolis-Hastings algorithm is defined using the prior, thereby defining a cheaply computable acceptance probability. We have applied this approach to Bayesian net structure learning using a number of priors and proposal distributions. Our results show that these must be chosen appropriately for this approach to be successful.

[Angelopoulos and Cussens, 2001b]
Nicos Angelopoulos and James Cussens. Prolog issues of an MCMC algorithm. In Proceedings of the 14th International Conference of Applications of Prolog, pages 246-253, Tokyo, October 2001.

[Cussens, 2001a]
James Cussens. Integrating probabilistic and logical reasoning. In David Corfield and Jon Williamson, editors, Foundations of Bayesianism, volume 24 of Applied Logic Series, pages 241-260. Kluwer, Dordrecht, 2001. This is an updated version of the ETAI paper of the same name.

[Cussens, 2001b]
James Cussens. Parameter estimation in stochastic logic programs. Machine Learning, 44(3):245-271, 2001.
Stochastic logic programs (SLPs) are logic programs with labelled clauses which define a log-linear distribution over refutations of goals. The log-linear distribution provides, by marginalisation, a distribution over variable bindings, allowing SLPs to compactly represent quite complex distributions. We analyse the fundamental statistical properties of SLPs addressing issues concerning infinite derivations, `unnormalised' SLPs and impure SLPs. After detailing existing approaches to parameter estimation for log-linear models and their application to SLPs, we present a new algorithm called failure-adjusted maximisation (FAM). FAM is an instance of the EM algorithm that applies specifically to normalised SLPs and provides a closed-form for computing parameter updates within an iterative maximisation approach. We empirically show that FAM works on some small examples and discuss methods for applying it to bigger problems.
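The distribution in question can be written compactly (notation mine, not the paper's): if clause $i$ carries label $l_i$ and is used $\nu_i(r)$ times in a refutation $r$, then

$$p(r) \;=\; \frac{1}{Z}\prod_i l_i^{\,\nu_i(r)}, \qquad p(x) \;=\; \sum_{r\,\vdash\,x} p(r),$$

where $Z$ normalises over refutations and the second sum marginalises over the refutations yielding the binding $x$. Roughly speaking, FAM is EM for the labels $l_i$ in which the expected clause-usage counts are adjusted for the probability mass lost to failed derivations, hence the name.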

[Cussens, 2001c]
James Cussens. Statistical aspects of stochastic logic programs. In Tommi Jaakkola and Thomas Richardson, editors, Artificial Intelligence and Statistics 2001: Proceedings of the Eighth International Workshop, pages 181-186, Key West, Florida, January 2001. Morgan Kaufmann.
Stochastic logic programs (SLPs) and the various distributions they define are presented with a stress on their characterisation in terms of Markov chains. Sampling, parameter estimation and structure learning for SLPs are discussed. The application of SLPs to Bayesian learning, computational linguistics and computational biology is considered. Lafferty's Gibbs-Markov models are compared and contrasted with SLPs.

[Pulman and Cussens, 2001]
Stephen Pulman and James Cussens. Grammar learning using Inductive Logic Programming. Oxford University Working Papers in Linguistics, Philology and Phonetics, 6:31-45, May 2001.
This paper gives a brief introduction to a particular machine learning method known as inductive logic programming. It is argued that this method, unlike many current statistically based machine learning methods, implies a view of grammar learning that bears close affinity to the views linguists have of the logical problem of language acquisition. Two experiments in grammar learning using this technique are described, using a unification grammar formalism, and positive-only data.

[Cussens, 2000a]
James Cussens. Attribute-value and relational learning: A statistical viewpoint. In Luc De Raedt and Stefan Kramer, editors, Proceedings of the ICML-2000 Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries, pages 35-39, 2000.
In this extended abstract, rather than crossing the boundary between attribute-value and relational learning, we place ourselves above any such boundary and look down on the problem from the point of view of general principles of statistical inference. We do not pretend that this paper gives a full account of all relevant issues, but argue that starting from this generalised viewpoint and working down towards actual learning problems (e.g. decision tree learning, regression, ILP, etc.) makes it easier to find the essential contrasts and similarities between different learning problems. Our primary goal (not achieved here) is to abstract away from superficial issues, such as the concrete syntactic representation of a problem or, worse, the sociological origin of an approach.

[Cussens, 2000b]
James Cussens. Stochastic logic programs: Sampling, inference and applications. In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2000), pages 115-122, San Francisco, CA, 2000. Morgan Kaufmann.
Algorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based, respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hastings algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior.

[Cussens and Dzeroski, 2000]
James Cussens and Saso Dzeroski, editors. Learning Language in Logic, volume 1925 of LNAI. Springer, Berlin, September 2000. LNAI State-of-the-Art Survey.

[Cussens and Frisch, 2000]
James Cussens and Alan Frisch, editors. Proceedings of the 10th International Conference on Inductive Logic Programming (ILP 2000), volume 1866 of LNAI, London, July 2000. Springer.

[Cussens and Pulman, 2000a]
James Cussens and Stephen Pulman. Experiments in inductive chart parsing. In James Cussens and Saso Dzeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.
We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning. Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the ``missing'' grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. the ability to efficiently determine what is needed for a parse. For each unparsable sentence, we find actual edges and needed edges: those which are needed to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter are used as examples for the ILP algorithm. We demonstrate our approach with a number of experiments using context-free grammars and a feature grammar.

[Cussens and Pulman, 2000b]
James Cussens and Stephen Pulman. Incorporating linguistics constraints into inductive logic programming. In Proceedings of CoNLL2000 and LLL2000, pages 184-193, Lisbon, September 2000. ACL.
We report work on effectively incorporating linguistic knowledge into grammar induction. We use a highly interactive bottom-up inductive logic programming (ILP) algorithm to learn `missing' grammar rules from an incomplete grammar. Using linguistic constraints on, for example, head features and gap threading, reduces the search space to such an extent that, in the small-scale experiments reported here, we can generate and store all candidate grammar rules together with information about their coverage and linguistic properties. This allows an appealingly simple and controlled method for generating linguistically plausible grammar rules. Starting from a base of highly specific rules, we apply least general generalisation and inverse resolution to generate more general rules. Induced rules are ordered, for example by coverage, for easy inspection by the user and, at any point, the user can commit to a hypothesised rule and add it to the grammar. Related work in ILP and computational linguistics is discussed.

[Dzeroski et al., 2000]
Saso Dzeroski, James Cussens, and Suresh Manandhar. An introduction to inductive logic programming and learning language in logic. In James Cussens and Saso Dzeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.
This chapter introduces Inductive Logic Programming (ILP) and Learning Language in Logic (LLL). No previous knowledge of logic programming, ILP or LLL is assumed. Elementary topics are covered and more advanced topics are discussed. For example, in the ILP section we discuss subsumption, inverse resolution, least general generalisation, relative least general generalisation, inverse entailment, saturation, refinement and abduction. We conclude with an overview of this volume and pointers to future work.

[Cussens, 1999a]
James Cussens. Integrating probabilistic and logical reasoning. Electronic Transactions on Artificial Intelligence, 3(B):79-103, 1999. Selected Articles from the Machine Intelligence 16 Workshop.
We examine the vexed question of connections between logical and probabilistic reasoning. The reasons for making such a connection are examined. We give an account of recent work which uses loglinear models to make the connection. We conclude with an analysis of various existing approaches combining logic and probability.

[Cussens, 1999b]
James Cussens. Loglinear models for first-order probabilistic reasoning. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 126-133, San Francisco, CA, 1999. Morgan Kaufmann Publishers.
Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.

[Cussens and Pulman, 1999]
James Cussens and Stephen Pulman. Experiments in inductive chart parsing. In James Cussens, editor, Learning Language in Logic Workshop Notes (LLL99), pages 72-83, Bled, Slovenia, 1999.
We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning. Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the ``missing'' grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. the ability to efficiently determine what is needed for a parse. For each unparsable sentence, we find actual edges and needed edges: those which are needed to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter are used as examples for the ILP algorithm. We demonstrate our approach with a number of experiments using context-free grammars and a feature grammar.

[Cussens et al., 1999]
James Cussens, Saso Dzeroski, and Tomaz Erjavec. Morphosyntactic tagging of Slovene using Progol. In Saso Dzeroski and Peter Flach, editors, Inductive Logic Programming: Proc. of the 9th International Workshop (ILP-99), Bled, Slovenia, June 1999. Springer-Verlag.
We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not so far produced a tagger comparable to other existing taggers in terms of accuracy.

[Cussens, 1998a]
James Cussens. Notes on inductive logic programming methods in natural language processing (European work). Manuscript, August 1998.
The aim of these notes is to analyse ILP methods which have been applied to NLP, drawing exclusively on work conducted in Europe.

[Cussens, 1998b]
James Cussens. Using prior probabilities and density estimation for relational classification. In David Page, editor, Inductive Logic Programming: Proceedings of the 8th International Conference (ILP-98), volume 1446 of Lecture Notes in Artificial Intelligence, pages 106-115. Springer, 1998.

[Cussens, 1997]
James Cussens. Part-of-speech tagging using Progol. In Inductive Logic Programming: Proceedings of the 7th International Workshop (ILP-97), volume 1297 of LNAI, pages 93-108. Springer, 1997.
A system for `tagging' words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules which use a word's context to eliminate possible tags for a word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar, where the tags are terminals and predicates such as nounp (noun phrase) are nonterminals. Progol was altered to allow the caching of information about clauses generated during the induction process which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [DaeZavBerGil96-WVLC96,Bri94-AAAI94,CutKupPedSib92-ANLP92] which all have accuracies in the range 96-97%. The per-sentence accuracy was 49.5%.

[Cussens et al., 1997]
James Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan. Using Inductive Logic Programming for Natural Language Processing. In W. Daelemans, T. Weijters, and A. van der Bosch, editors, ECML'97 -- Workshop Notes on Empirical Learning of Natural Language Tasks, pages 25-34, Prague, 1997. University of Economics. Invited keynote paper.
We summarise recent work on using Inductive Logic Programming (ILP) for Natural Language Processing (NLP). ILP performs learning in a first-order logical setting, and is thus well-suited to induce over the various structured representations used in NLP. We present Stochastic Logic Programs (SLPs) and demonstrate their use in ILP when learning from positive examples only. We also give accounts of work on learning grammars from children's books and part-of-speech tagging.

[Cussens, 1996a]
James Cussens. Bayesian Inductive Logic Programming with explicit probabilistic bias. Technical Report PRG-TR-24-96, Oxford University Computing Laboratory, 1996.

[Cussens, 1996b]
James Cussens. Deduction, induction and probabilistic support. Synthese, 108(1):1-10, July 1996.
Elementary results concerning the connections between deductive relations and probabilistic support are given. These are used to show that Popper-Miller's result is a special case of a more general result, and that their result is not ``very unexpected'' as claimed. According to Popper-Miller, a purely inductively supports b only if they are ``deductively independent''---but this means that ¬a ⊢ b. Hence, it is argued that viewing induction as occurring only in the absence of deductive relations, as Popper-Miller sometimes do, is untenable. Finally, it is shown that Popper-Miller's claim that deductive relations determine probabilistic support is untrue. In general, probabilistic support can vary greatly with fixed deductive relations as determined by the relevant Lindenbaum algebra.
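For readers unfamiliar with the result at issue, the standard formulation (notation mine): writing $s(h,e) = p(h\mid e) - p(h)$ for the support of hypothesis $h$ by evidence $e$, Popper and Miller rely on the identity

$$s(h,e) \;=\; s(h\vee e,\;e) \;+\; s(h\vee\neg e,\;e), \qquad s(h\vee\neg e,\;e) \;\le\; 0,$$

reading $h\vee\neg e$ as the part of $h$ that goes beyond the deductive consequence $h\vee e$ of $e$; since that part is never positively supported, they conclude that probabilistic support is wholly deductive. The paper argues that this is a special case of a more general relationship between deductive relations and probabilistic support.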

[Cussens, 1996c]
James Cussens. Effective sample size in a dichotomous process with noise. Communications in Statistics: Theory and Methods, 25(6):1233-1246, 1996.
The effect of noise in a dichotomous process is studied from the Bayesian viewpoint. Winkler's approximation to the posterior distribution in the presence of noise is shown to break down badly near the limits of its application. Information loss is measured using effective sample size. An account of the relationship between effective sample size/information loss and sampling data is given which differs sharply from that of previous work in this area.

[Cussens, 1996d]
James Cussens. Machine learning. IEE Journal of Computing and Control, 7(4):164-168, August 1996.

[Cussens, 1996e]
James Cussens. Part-of-speech disambiguation using ILP. Technical Report PRG-TR-25-96, Oxford University Computing Laboratory, 1996.

[Cussens, 1995a]
James Cussens. A Bayesian analysis of algorithms for learning finite functions. In Armand Prieditis and Stuart Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference (ML95), pages 142-149, San Francisco, CA, 1995. Morgan Kaufmann Publishers.
We consider algorithms for learning functions f: X → Y, where X and Y are finite, and there is assumed to be no noise in the data. Learning algorithms, alg, are connected with galg, the set of prior probability distributions for which they are optimal. A method for constructing galg from alg is given and the relationship between the various galg is discussed. Improper algorithms are identified as those for which galg has zero volume. Improper algorithms are investigated using linear algebra and two examples of improper algorithms are given. This framework is then applied to the question of choosing between competing algorithms. ``Leave-one-out'' cross-validation is hence characterised as a crude method of ML-II prior selection. We conclude by examining how the mathematical results bear on practical problems and by discussing related work, as well as suggesting future work.

[Cussens, 1995b]
James Cussens. A Bayesian engineering approach to building limit protection systems. ISSAFE project report, Glasgow Caledonian University, August 1995.

[Cussens, 1995c]
James Cussens. Behaviour cloning approaches to limit protection. ISSAFE project report ISSAFE 9039/1/GCUC/3018/R/A, Glasgow Caledonian University, April 1995.

[Cussens, 1995d]
James Cussens. Industrial applications of decision trees. ISSAFE project report ISSAFE 9039/1B/GCUC/3034/R/A, Glasgow Caledonian University, May 1995.

[Cussens, 1995e]
James Cussens. Some short descriptions of important learning algorithms. ISSAFE project report ISSAFE 9039/11/GCUC/3027/R/A, Glasgow Caledonian University, February 1995.

[Cussens, 1995f]
James Cussens. Using MARS to learn the ILS task. ISSAFE project report ISSAFE 9039/1B/GCUC/3031/R/A, Glasgow Caledonian University, August 1995.

[Cussens, 1994a]
James Cussens. Approaches to `validation' in machine learning: A short survey. ISSAFE project report ISSAFE 9039/1B/GCUC/3017/R/A, Glasgow Caledonian University, July 1994.

[Cussens, 1994b]
James Cussens. Behaviour cloning in dynamic environments. ISSAFE project report, Glasgow Caledonian University, October 1994.

[Cussens, 1994c]
James Cussens. Review of ``Interactive Theory Revision: An Inductive Logic Programming Approach''. Journal of Logic and Computation, 1994.

[Cussens, 1994d]
James Cussens. Using machine learning to control a pole and cart simulator. ISSAFE project report ISSAFE 9039/2/GCUC/3009/R/1, Glasgow Caledonian University, July 1994.

[Cussens, 1993]
James Cussens. Bayes and pseudo-Bayes estimates of conditional probability and their reliability. In Pavel B. Brazdil, editor, Machine Learning: ECML-93, pages 136-152. Lecture Notes in Artificial Intelligence 667, Springer-Verlag, 1993.
Various ways of estimating probabilities, mainly within the Bayesian framework, are discussed. Their relevance and application to machine learning is given, and their relative performance empirically evaluated. A method of accounting for noisy data is given and also applied. The reliability of estimates is measured by a significance measure, which is also empirically tested. We briefly discuss the use of likelihood ratio as a significance measure.
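As a reminder of the kind of estimator at issue (a standard textbook example, not necessarily one of the specific estimators evaluated in the paper): with a Dirichlet prior with parameters $\alpha_1,\dots,\alpha_k$ over the $k$ classes, and $n_i$ of the $n$ examples covered by a rule belonging to class $i$, the Bayes (posterior mean) estimate of the rule's conditional probability for class $i$ is

$$\hat{p}_i \;=\; \frac{n_i + \alpha_i}{\,n + \sum_j \alpha_j\,},$$

which reduces to the Laplace estimate $(n_i+1)/(n+k)$ when every $\alpha_j = 1$; pseudo-Bayes estimators instead choose the strength of the prior from the data itself.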

[Cussens and Hunter, 1993]
James Cussens and Anthony Hunter. Using maximum entropy in a defeasible logic with probabilistic semantics. In B. Bouchon-Meunier, L. Valverde, and R.R. Yager, editors, IPMU'92 - Advanced Techniques in Artificial Intelligence, pages 43-52. Lecture Notes in Computer Science 682, Springer-Verlag, 1993.

[Cussens et al., 1993]
James Cussens, Anthony Hunter, and Ashwin Srinivasan. Generating explicit orderings for non-monotonic logics. In Proc. of the Eleventh National Conference on Artificial Intelligence (AAAI-93), pages 420-425. MIT Press, 1993.

[Cussens, 1992]
James Cussens. Estimating rule accuracies from training data. ECAI-92 Workshop Notes, 1992.

[Cussens, 1991]
James Cussens. Interpretations of Probability, Nonstandard Analysis and Confirmation Theory. PhD thesis, King's College, London, 1991.
The first chapter presents Bayesian confirmation theory. We then construct infinitesimal numbers and use them to represent the probability of unrefuted hypotheses of standard probability zero. Popper's views on the nature of hypotheses, of probability and confirmation are criticised. It is shown that Popper conflates total confirmation with weight of evidence. It is argued that Popper's corroboration can be represented in a Bayesian formalism. Popper's propensity theory is discussed. A modified propensity interpretation is presented where probabilities are defined relative to descriptions of generating conditions. The logical interpretation is briefly discussed and rejected. A Bayesian account of estimating the values of objective probabilities is given, and some of its properties are proved. Belief functions are then compared with probabilities. It is concluded that belief functions offer a more elegant representation of the impact of evidence. Both measures are then discussed in relation to various betting procedures designed to elicit their values from an individual's belief state. De Finetti's arguments concerning `coherence' are discussed. It is then shown that it is not possible to use bets to derive belief function values unless the better is allowed to vary the amount of the stake. Hume's thinking on induction is discussed. It is argued that some of the problems of Popper's philosophy derive from Hume's. The Popper-Miller argument is presented and criticised. It is concluded that the core of the argument is valid, but of limited applicability. The correspondence between probabilistic support and deductive relations is discussed. There are two appendices. The first criticises Popper's view on the connection between the content and testability of a hypothesis. The second concerns a nonstandard probability measure proposed in 1967.

[Cussens and Hunter, 1991]
James Cussens and Anthony Hunter. Using defeasible logic for a window on a probabilistic database: some preliminary notes. In R. Kruse and P. Siegel, editors, Symbolic and Quantitative Approaches for Uncertainty, pages 146-152. Lecture Notes in Computer Science 548, Springer-Verlag, 1991.

[Cussens, 1990]
James Cussens. Review of ``Chance and Structure: An Essay on the Logical Foundations of Probability''. History and Philosophy of Logic, 11(1):116-117, 1990.