- [Cussens, 2011a]
- James
Cussens.
Approximate
Bayesian computation for the parameters of PRISM programs.
In P. Frasconi and F. A. Lisi, editors, Proc. 20th International
Conference on Inductive Logic Programming (ILP 2010), pages 38-46,
Firenze, 2011. Springer.
- [Cussens,
2011b]
- James Cussens.
Bayesian network
learning with cutting planes.
In Fabio G. Cozman and Avi Pfeffer, editors, Proceedings of the 27th
Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages
153-160, Barcelona, 2011. AUAI Press.
- [Bartlett
et al., 2010a]
- Mark Bartlett, Iain Bate, and James Cussens.
Instruction
cache prediction using Bayesian networks.
In Proc. ECAI 2010 - 19th European Conference on Artificial
Intelligence, pages 1099-1100, 2010.
Short paper.
- [Bartlett
et al., 2010b]
- Mark Bartlett, Iain Bate, and James Cussens.
Learning
Bayesian networks for improved instruction cache analysis.
In Proceedings of the Ninth International Conference on Machine Learning
and Applications (ICMLA 2010), pages 417-423. IEEE, 2010.
- [Cussens, 2010a]
- James Cussens.
Induction.
In Claude Sammut and Geoffrey I. Webb, editors, Encyclopedia of Machine
Learning, pages 519-522. Springer, 2010.
- [Cussens, 2010b]
- James Cussens.
Maximum
likelihood pedigree reconstruction using integer programming.
In Proceedings of the Workshop on Constraint Based Methods for
Bioinformatics (WCB-10), Edinburgh, July 2010.
- [Liverani et al.,
2010]
- Silvia Liverani, James Cussens, and Jim Q. Smith.
Searching a multivariate partition space using weighted MAX-SAT.
In F. Masulli, L. Peterson, and R. Tagliaferri, editors, Proceedings of
the Sixth International Meeting on Computational Intelligence Methods for
Bioinformatics and Biostatistics (CIBB) 2009, LNBI 6160, pages
240-253. Springer, 2010.
- [Cussens,
2009]
- James Cussens.
On
generative parameterisations of Markov logic networks.
In Proc. SRL 09, Leuven, July 2009.
- [Fisher and Cussens,
2009]
- Barnaby Fisher and James Cussens.
Further
Inductive Mercury Programming and IMP0.5.
Technical Report YCS-2009-437, Dept. of Computer Science, University of York,
March 2009.
We explore the use of Mercury for Inductive Logic Programming and
present IMP0.5, the product of our research. Mercury is a compiled logic
programming language with modern features, which requires the user to write
type, mode and determinism declarations for each of their predicates. This
information is used by the Mercury compiler to optimise generated code,
which, amongst other things, enables Mercury to provide faster execution than
Prolog. IMP0.5 is an ILP system, and an ILP software library, which contains
re-usable modules of ILP related code. Our aim in creating IMP0.5 as a
software library was to significantly reduce the effort required to implement
new ILP algorithms. We feel this has been achieved and provide
implementations of some typical ILP algorithms, the most notable being the
Aleph default algorithm. Since Mercury is a purely declarative language
run-time assertion of induced hypotheses is prohibited. Therefore, hypotheses
are represented as ground terms, and, to enable fast cover testing,
interpreted with a problem specific interpreter, which is generated at
compile-time. We also use a cover set representation based on RL-Trees, which
is very space efficient and thus beneficial for large searches. Empirical
results are generally good, especially for complex background knowledge, and
in one case shows a 56 times speed-up compared with Aleph.
- [Angelopoulos and
Cussens, 2008]
- Nicos Angelopoulos and James Cussens.
Bayesian
learning of Bayesian networks with informative priors.
Annals of Mathematics and Artificial Intelligence, 54(1):53-98,
2008.
This paper presents and evaluates an approach to Bayesian model
averaging where the models are Bayesian nets (BNs). A comprehensive study of
the literature on structural priors for BNs is conducted. A number of prior
distributions are defined using stochastic logic programs and the MCMC
Metropolis-Hastings algorithm is used to (approximately) sample from the
posterior. We use proposals which are tightly coupled to the priors which
give rise to cheaply computable acceptance probabilities. Experiments using
data generated from known BNs have been conducted to evaluate the method. The
experiments used 6 different BNs and varied: the structural prior, the
parameter prior, the Metropolis-Hasting proposal and the data size. Each
experiment was repeated three times with different random seeds to test the
robustness of the MCMC-produced results. Our results show that with effective
priors (i) robust results are produced and (ii) informative priors improve
results significantly.
- [Cussens, 2008]
- James
Cussens.
Bayesian
network learning by compiling to weighted MAX-SAT.
In Proceedings of the 24th Conference on Uncertainty in Artificial
Intelligence (UAI 2008), pages 105-112, Helsinki, 2008. AUAI Press.
The problem of learning discrete Bayesian networks from data is
encoded as a weighted MAX-SAT problem and the MaxWalkSat local search
algorithm is used to address it. For each dataset, the per-variable summands
of the (BDeu) marginal likelihood for different choices of parents (`family
scores') are computed prior to applying MaxWalkSat. Each permissible choice
of parents for each variable is encoded as a distinct propositional atom and
the associated family score encoded as a `soft' weighted single-literal
clause. Two approaches to enforcing acyclicity are considered: either by
encoding the ancestor relation or by attaching a total order to each graph
and encoding that. The latter approach gives better results. Learning
experiments have been conducted on 21 synthetic datasets sampled from 7 BNs.
The largest dataset has 10,000 datapoints and 60 variables producing (for the
`ancestor' encoding) a weighted CNF input file with 19,932 atoms and 269,367
clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu
score than the `true' BN. The effect of adding prior information is assessed.
It is further shown that Bayesian model averaging can be effected by
collecting BNs generated during the search.
- [Santos Costa et al., 2008]
- Vítor
Santos Costa, David Page, and James Cussens.
CLP( cal BN):
Constraint logic programming for probabilistic knowledge.
In Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton,
editors, Probabilistic Inductive Logic Programming, volume 4911
of Lectures Notes in Computer Science, pages 156-188. Springer,
Berlin, 2008.
doi = 10.1007/978-3-540-78652-8_6.
In Datalog, missing values are represented by Skolem constants.
More generally, in logic programming missing values, or existentially
quantified variables, are represented by terms built from Skolem functors.
The CLP( cal BN ) language represents the joint probability distribution
over missing values in a database or logic program by using constraints to
represent Skolem functions. Algorithms from inductive logic programming (ILP)
can be used with only minor modification to learn CLP( cal BN ) programs.
An implementation of CLP( cal BN ) is publicly available as part of YAP
Prolog at http://www.ncc.up.pt/~vsc/Yap.
- [Cussens, 2007a]
- James Cussens.
Bayesian classification and regression trees.
Expert Update, 9(3):37-42, 2007.
- [Cussens,
2007b]
- James Cussens.
Logic-based
formalisms for statistical relational learning.
In Lise Getoor and Ben Taskar, editors, Introduction to Statistical
Relational Learning. MIT Press, Cambridge, MA, 2007.
- [Cussens, 2007c]
- James
Cussens.
Model
equivalence of PRISM programs.
In Proceedings of the Dagstuhl seminar: Probabilistic, Logical and
Relational Learning - A Further Synthesis, 2007.
The problem of deciding the probability model equivalence of two
PRISM programs is addressed. In the finite case this problem can be solved
(albeit slowly) using techniques from algebraic statistics,
specifically the computation of elimination ideals and Gröbner bases. A
very brief introduction to algebraic statistics is given. Consideration is
given to cases where shortcuts to proving/disproving model equivalence are
available.
- [Fisher and Cussens,
2007]
- Barnaby Fisher and James Cussens.
Inductive
Mercury programming.
In Stephen Muggleton and Ramon Otero, editors, Inductive Logic
Programming: Proceedings of the 16th International Conference
(ILP-06), Santiago de Compostela, 2007. Springer.
We investigate using the Mercury language to implement and design
ILP algorithms, presenting our own ILP system IMP. Mercury provides faster
execution than Prolog. Since Mercury is a purely declarative language,
run-time assertion of induced clauses is prohibited. Instead IMP uses a
problem-specific interpreter of ground representations of induced clauses.
The interpreter is used both for cover testing and bottom clause generation.
The Mercury source for this interpreter is generated automatically from the
user's background knowledge using MOOSE, a Mercury parser generator. Our
results include some encouraging results on IMP's cover testing speed, but
overall IMP is still generally a little slower than ALEPH.
- [Angelopoulos and Cussens,
2006]
- Nicos Angelopoulos and James Cussens.
Exploiting independence for branch operations in Bayesian
learning of C&RTs.
In Luc De Raedt, Thomas Dietterich, Lise Getoor, and Stephen H. Muggleton,
editors, Probabilistic, Logical and Relational Learning - Towards a
Synthesis, number 05051 in Dagstuhl Seminar Proceedings, Dagstuhl,
Germany, 2006. Internationales Begegnungs- und Forschungszentrum fuer
Informatik (IBFI), Schloss Dagstuhl, Germany.
[date of citation:
2006-01-01].
- [Kazakov et al.,
2006]
- Dimitar Kazakov, James Cussens, and Suresh Manandhar.
On the
duality of semantics and syntax: The PP attachment case.
Technical Report YCS-2006-409, University of York, 2006.
- [Angelopoulos
and Cussens, 2005a]
- Nicos Angelopoulos and James Cussens.
Exploiting
informative priors for Bayesian classification and regression trees.
In Proceedings of the Nineteenth International Joint Conference on
Artificial Intelligence (IJCAI-05), pages 641-646, Edinburgh, 2005.
A general method for defining informative priors on statistical
models is presented and applied specifically to the space of classification
and regression trees. A Bayesian approach to learning such models from data
is taken, with the Metropolis-Hastings algorithm being used to approximately
sample from the posterior. By only using proposal distributions closely tied
to the prior, acceptance probabilities are easily computable via marginal
likelihood ratios, whatever the prior used. Our approach is empirically
tested by varying (i) the data, (ii) the prior and (iii) the proposal
distribution. A comparison with related work is given.
- [Angelopoulos and Cussens,
2005b]
- Nicos Angelopoulos and James Cussens.
Tempering
for Bayesian C&RT.
In Proceedings of the 22nd International Conference on Machine Learning
(ICML05), pages 17-24, Bonn, 2005.
This paper concerns the experimental assessment of tempering
as a technique for improving Bayesian inference for C&RT models. Full
Bayesian inference requires the computation of a posterior over all possible
trees. Since exact computation is not possible Markov chain Monte Carlo
(MCMC) methods are used to produce an approximation. C&RT posteriors have
many local modes: tempering aims to prevent the Markov chain getting stuck in
these modes. Our results show that a clear improvement is achieved using
tempering.
- [Cussens, 2005a]
- James Cussens.
Deductive reasoning and statistical inference.
In Brian S. Everitt and David C. Howell, editors, Encyclopedia of
Statistics in Behavioral Science, volume 1, pages 472-475. Wiley,
Chichester, 2005.
Connections and contrasts between deductive reasoning and
statistical inference are given. Bayesian statistical inference is analysed,
including an account of its relation to deductive logic and the status of
prior distributions. Classical statistical inference is investigated with a
focus on hypothesis testing.
- [Cussens, 2005b]
- James
Cussens.
Integrating
by separating: Combining probability and logic with ICL, PRISM and
SLPs.
APRIL project report, January 2005.
This report describes the close relationship that obtains between
the ICL, PRISM and SLP frameworks. The common feature of these frameworks is
that a purely probabilistic component and a purely logical component are
connected to produce a hybrid model. A hidden Markov model (HMM) is used as a
running example. The Uniqueness Condition, which allows these frameworks to
represent statistical models is discussed, and the consequences of using a
weakened version of the Uniqueness Condition briefly explored. `Lazy'
sampling, based on SLD-resolution, is discussed.
- [Angelopoulos and
Cussens, 2004a]
- Nicos Angelopoulos and James Cussens.
Extended
stochastic logic programs for informative priors over C&RTs.
In Rui Camacho, Ross King, and Ashwin Srinivasan, editors, Proceedings of
the work-in-progress track of the Fourteenth International Conference on
Inductive Logic Programming (ILP04), pages 7-11, Porto, September
2004.
A general method for defining informative priors on statistical
models is presented and applied specifically to the space of classification
and regression trees. Our aim is towards a Bayesian approach to learning such
models from data, with the Metropolis-Hastings algorithm used to sample from
the posterior. We present some preliminary results where we empirically
tested the methodology.
- [Angelopoulos and Cussens,
2004b]
- Nicos Angelopoulos and James Cussens.
On
the implementation of MCMC proposals over stochastic logic programs.
In Colloquium on Implementation of Constraint and LOgic Programming
Systems. Satellite workshop to ICLP'04, Saint-Malo, France, 2004.
- [de Melo and Cussens,
2004]
- Wolfgang David Cirilo de Melo and James Cussens.
Leibniz on estimating the uncertain: An English translation of De
incerti aestimatione with commentary.
Leibniz Review, 14:31-53, 2004.
Leibniz's De incerti aestimatione, which contains his
solution to the division problem, has not received much attention, let
alone much appreciation. This is surprising because it is in this work that
the definition of probability in terms of equally possible cases appears for
the first time. The division problem is used to establish and test
probability theory; it can be stated as follows: if two players agree to play
a game in which one has to win a certain number of rounds in order to win the
pool, but if they break the game off before either of them has won the
required number of rounds, how should the pool be distributed? Our article
has two aims: it provides the readers with the first English translation of
De incerti aestimatione, and it also gives them a brief commentary
that explains Leibniz's philosophical and mathematical concepts necessary in
order to understand this work. The translation is as literal as possible
throughout; it shows how Leibniz struggled at times to find a solution to the
division problem and how he approached it from different angles. The
commentary discusses Leibniz's views on four key concepts: fairness, hope,
authority and possibility. The commentary then outlines how Leibniz attempted
to solve the problem of division.
- [Cussens, 2003]
- James Cussens.
Individuals,
relations and structures in probabilistic models.
In Lise Getoor and David Jensen, editors, IJCAI Workshop on Learning
Statistical Models from Relational Data (SRL2003), pages 32-36,
Acapulco, Mexico, August 2003.
Relational data is equivalent to non-relational structured data. It
is this equivalence which permits probabilistic models of relational data.
Learning of probabilistic models for relational data is possible because one
item of structured data is generally equivalent to many related data items.
Succession and inclusion are two relations that have been well explored in
the statistical literature. A description of the relevant statistical
approaches is given. The representation of relational data via Bayesian nets
is examined, and compared with PRMs. The paper ends with some cursory remarks
on structured objects.
- [Santos Costa et al., 2003]
- Vítor
Santos Costa, David Page, Maleeha Qazi, and James Cussens.
CLP( cal BN):
Constraint logic programming for probabilistic knowledge.
In Uffe Kjærulff and Christopher Meek, editors, Proceedings of the
Nineteenth Annual Conference on Uncertainty in Artificial Intelligence
(UAI-2003), pages 517-524, Acapulco, Mexico, 2003. Morgan Kaufmann.
In Datalog, missing values are represented by Skolem constants.
More generally, in logic programming missing values, or
existentially-quantified variables, are represented by terms built from
Skolem functors. In an analogy to probabilistic relational models (PRMs), we
wish to represent the joint probability distribution over missing values in a
database or logic program using a Bayesian network. This paper presents an
extension of logic programs that makes it possible to specify a joint
probability distribution over terms built from Skolem functors in the
program. Our extension is based on constraint logic programming (CLP), so we
call the extended language CLP( cal BN). We show that CLP( cal BN)
subsumes PRMs; this greater expressivity carries both advantages and
disadvantages for CLP( cal BN). We also show that algorithms from
inductive logic programming (ILP) can be used with only minor modification to
learn CLP( cal BN) programs. An implementation of CLP( cal BN) is
publicly available as part of YAP Prolog at
http://www.cos.ufrj.br/~vitor/Yap/clpbn.
- [Angelopoulos and Cussens,
2002]
- Nicos Angelopoulos and James Cussens.
Prolog issues of an MCMC algorithm.
In U. Geske, O. Bartenstein, M. Hannebauer, and O. Yoshie, editors,
Web-Knowledge Management and Decision Support - Selected Papers from
the 14th International Conference on Applications of Prolog, volume
2543 of LNAI, pages 191-202. Springer, Berlin, 2002.
- [Cussens,
2002a]
- James Cussens.
Issues
in learning language in logic.
In Antonis C. Kakas and Fariba Sadri, editors, Computational Logic: Logic
Programming and Beyond, volume 2408 of LNAI, pages
491-505. Springer, Berlin, 2002.
Selected issues concerning the use of logical representations in
machine learning of natural language are discussed. It is argued that the
flexibility and expressivity of logical representations are particularly
useful in more complex natural language learning tasks. A number of inductive
logic programming (ILP) techniques for natural language are analysed
including the CHILL system, abduction and the incorporation of linguistic
knowledge, including active learning. Hybrid approaches integrating ILP with
manual development environments and probabilistic techniques are
advocatd.
- [Cussens, 2002b]
- James
Cussens.
Leibniz
and Boole on logic and probability.
Unpublished and unsubmitted, so far!, 2002.
Combined logical-probabilistic frameworks are very old; often the
result of grand attempts to formulate a calculus of general reasoning. A
careful account of the assumptions and achievements of these systems can help
us address issues in contemporary logical-probabilistic frameworks. This
paper aims to provide such an account drawing on the pioneering work of
Leibniz and Boole. In Leibniz, we find the first account of epistemic
probability in terms of possible worlds. In Boole, we see the extent to which
a combined logical-probabilistic calculus is possible using a propositional
representation.
- [Page et al., 2002]
- David Page,
Fenghuang Zhan, James Cussens, Michael Waddell, Johanna Hardin, Bart
Barlogie, and John Shaughnessy, Jr.
Comparative
data mining for microarrays: A case study based on multiple myeloma.
Technical Report 1453, Computer Sciences Department, University of Wisconsin,
Nov 2002.
Supervised machine learning and data mining tools have become
popular for the analysis of gene expression microarray data. They have the
potential to uncover new therapeutic targets for diseases, to predict how
patients will respond to specific treatments, and to uncover regulatory
relationships among genes in normal and disease situations. Comparative
experiments are needed to identify the advantages of the leading supervised
learning algorithms for microarray data, as well as to give direction in
methodological decisions. This paper compares support vector machines,
Bayesian networks, decision trees, boosted decision trees, and voting
(ensembles of decision stumps) on a new microarray data set for cancer with
over 100 samples. The paper provides evidence for several important lessons
for mining microarray data, including: (1) Bayes nets and ensembles perform
at least as well as other approaches but arguably provide more direct
insight; (2) the common practice of throwing out low or negative average
differences, or those accompanied by an absent call, is a mistake; (3)
looking for consistent differences in expression may be more important than
large differences.
- [Angelopoulos and
Cussens, 2001a]
- Nicos Angelopoulos and James Cussens.
Markov
chain Monte Carlo using tree-based priors on model structure.
In Jack Breese and Daphne Koller, editors, Proceedings of the Seventeenth
Annual Conference on Uncertainty in Artificial Intelligence
(UAI-2001), pages 16-23, Seattle, August 2001. Morgan Kaufmann.
We present a general framework for defining priors on model
structure and sampling from the posterior using the Metropolis-Hastings
algorithm. The key ideas are that structure priors are defined via a
probability tree and that the proposal distribution for the
Metropolis-Hastings algorithm is defined using the prior, thereby defining a
cheaply computable acceptance probability. We have applied this approach to
Bayesian net structure learning using a number of priors and proposal
distributions. Our results show that these must be chosen appropriately for
this approach to be successful.
- [Angelopoulos and Cussens,
2001b]
- Nicos Angelopoulos and James Cussens.
Prolog issues of an MCMC algorithm.
In Proceedings of the 14th International Conference of Applications of
Prolog, pages 246-253, Tokyo, October 2001.
- [Cussens,
2001a]
- James Cussens.
Integrating probabilistic and logical reasoning.
In David Corfield and Jon Williamson, editors, Foundations of
Bayesianism, volume 24 of Applied Logic Series, pages
241-260. Kluwer, Dordrecht, 2001.
This is an updated version of the ETAI paper of the same name.
- [Cussens, 2001b]
- James Cussens.
Parameter
estimation in stochastic logic programs.
Machine Learning, 44(3):245-271, 2001.
Stochastic logic programs (SLPs) are logic programs with labelled
clauses which define a log-linear distribution over refutations of goals. The
log-linear distribution provides, by marginalisation, a distribution over
variable bindings, allowing SLPs to compactly represent quite complex
distributions. We analyse the fundamental statistical properties of SLPs
addressing issues concerning infinite derivations, `unnormalised' SLPs and
impure SLPs. After detailing existing approaches to parameter estimation for
log-linear models and their application to SLPs, we present a new algorithm
called failure-adjusted maximisation (FAM). FAM is an instance of the
EM algorithm that applies specifically to normalised SLPs and provides a
closed-form for computing parameter updates within an iterative maximisation
approach. We empirically show that FAM works on some small examples and
discuss methods for applying it to bigger problems.
- [Cussens,
2001c]
- James Cussens.
Statistical
aspects of stochastic logic programs.
In Tommi Jaakkola and Thomas Richardson, editors, Artificial Intelligence
and Statistics 2001: Proceedings of the Eighth International Workshop,
pages 181-186, Key West, Florida, January 2001. Morgan Kaufmann.
Stochastic logic programs (SLPs) and the various distributions they
define are presented with a stress on their characterisation in terms of
Markov chains. Sampling, parameter estimation and structure learning for SLPs
are discussed. The application of SLPs to Bayesian learning, computational
linguistics and computational biology are considered. Lafferty's Gibbs-Markov
models are compared and contrasted with SLPs.
- [Pulman and Cussens,
2001]
- Stephen Pulman and James Cussens.
Grammar
learning using Inductive Logic Programming.
Oxford University Working Papers in Linguistics, Philology and
Phonetics, 6:31-45, May 2001.
This paper gives a brief introduction to a particular machine
learning method known as inductive logic programming. It is argued that this
method, unlike many current statistically based machine learning methods,
implies a view of grammar learning that bears close affinity to the views
linguists have of the logical problem of language acquisition. Two
experiments in grammar learning using this technique are described, using a
unification grammar formalism, and positive-only data.
- [Cussens,
2000a]
- James Cussens.
Attribute-value
and relational learning: A statistical viewpoint.
In Luc De Raedt and Stefan Kramer, editors, Proceedings of the
ICML-2000 Workshop on Attribute-Value and Relational Learning: Crossing the
Boundaries, pages 35-39, 2000.
In this extended abstract, rather than crossing the boundary
between attribute-value and relational learning, we place ourselves above any
such boundary and look down on the problem from the point of view of general
principles of statistical inference. We do not pretend that this paper gives
a full account of all relevant issues, but argue that starting from this
generalised viewpoint and working down towards actual learning problems
(e.g. decision tree learning, regression, ILP, etc) makes it easier to find
the essential contrasts and similarities between different learning
problems. Our primary goal (not achieved here) is to abstract away from
superficial issues, such as the concrete syntactic representation of a
problem or worse the sociological origin of an approach.
- [Cussens, 2000b]
- James Cussens.
Stochastic
logic programs: Sampling, inference and applications.
In Proceedings of the Sixteenth Annual Conference on Uncertainty in
Artificial Intelligence (UAI-2000), pages 115-122, San Francisco,
CA, 2000. Morgan Kaufmann.
Algorithms for exact and approximate inference in stochastic logic
programs (SLPs) are presented, based respectively, on variable elimination
and importance sampling. We then show how SLPs can be used to represent prior
distributions for machine learning, using (i) logic programs and (ii) Bayes
net structures as examples. Drawing on existing work in statistics, we apply
the Metropolis-Hasting algorithm to construct a Markov chain which samples
from the posterior distribution. A Prolog implementation for this is
described. We also discuss the possibility of constructing explicit
representations of the posterior.
- [Cussens and Dzersoki,
2000]
- James Cussens and Saso Dzersoki, editors.
Learning
Language in Logic, volume 1925 of LNAI.
Springer, Berlin, September 2000.
LNAI State-of-the-Art Survey.
- [Cussens and Frisch,
2000]
- James Cussens and Alan Frisch, editors.
Proceedings
of the 10th Interrnational Conference on Inductive Logic Programming (ILP
2000), volume 1866 of LNAI, London, July 2000.
Springer.
- [Cussens and Pulman,
2000a]
- James Cussens and Stephen Pulman.
Experiments in inductive chart parsing.
In James Cussens and Saso Dzeroski, editors, Learning Language in
Logic, volume 1925 of LNAI. Springer, 2000.
We use Inductive Logic Programming (ILP) within a chart-parsing
framework for grammar learning. Given an existing grammar G, together with
some sentences which G can not parse, we use ILP to find the ``missing''
grammar rules or lexical items. Our aim is to exploit the inductive
capabilities of chart parsing, i.e. the ability to efficiently determine
what is needed for a parse. For each unparsable sentence, we find actual
edges and *needed edges*: those which are needed to allow a parse. The former
are used as background knowledge for the ILP algorithm (P-Progol) and the
latter are used as examples for the ILP algorithm. We demonstrate our
approach with a number of experiments using context-free grammars and a
feature grammar.
- [Cussens and
Pulman, 2000b]
- James Cussens and Stephen Pulman.
Incorporating
linguistics constraints into inductive logic programming.
In Proceedings of CoNLL2000 and LLL2000, pages 184-193, Lisbon,
September 2000. ACL.
We report work on effectively incorporating linguistic knowledge
into grammar induction. We use a highly interactive bottom-up inductive logic
programming (ILP) algorithm to learn `missing' grammar rules from an
incomplete grammar. Using linguistic constraints on, for example, head
features and gap threading, reduces the search space to such an extent that,
in the small-scale experiments reported here, we can generate and store all
candidate grammar rules together with information about their coverage and
linguistic properties. This allows an appealingly simple and controlled
method for generating linguistically plausible grammar rules. Starting from a
base of highly specific rules, we apply least general generalisation and
inverse resolution to generate more general rules. Induced rules are ordered,
for example by coverage, for easy inspection by the user and at any point,
the user can commit to a hypothesised rule and add it to the grammar. Related
work in ILP and computational linguistics is discussed.
- [Dzeroski
et al., 2000]
- Saso Dzeroski, James Cussens, and Suresh
Manandhar.
An introduction to inductive logic programming and learning language in logic.
In James Cussens and Saso Dzeroski, editors, Learning Language in
Logic, volume 1925 of LNAI. Springer, 2000.
This chapter introduces Inductive Logic Programming (ILP) and
Learning Language in Logic (LLL). No previous knowledge of logic programming,
ILP or LLL is assumed. Elementary topics are covered and more advanced topics
are discussed. For example, in the ILP section we discuss subsumption,
inverse resolution, least general generalisation, relative least general
generalisation, inverse entailment, saturation, refinement and abduction. We
conclude with an overview of this volume and pointers to future
work.
- [Cussens,
1999a]
- James Cussens.
Integrating probabilistic and
logical reasoning.
Electronic Transactions on Artificial Intelligence, 3(B):79-103,
1999.
Selected Articles from the Machine Intelligence 16 Workshop.
We examine the vexed question of connections between logical and
probabilistic reasoning. The reasons for making such a connection are
examined. We give an account of recent work which uses loglinear models to
make the connection. We conclude with an analysis of various existing
approaches combining logic and probability.
- [Cussens, 1999b]
- James Cussens.
Loglinear
models for first-order probabilistic reasoning.
In Proceedings of the Fifteenth Annual Conference on Uncertainty in
Artificial Intelligence (UAI-99), pages 126-133, San Francisco, CA,
1999. Morgan Kaufmann Publishers.
Recent work on loglinear models in probabilistic constraint logic
programming is applied to first-order probabilistic reasoning. Probabilities
are defined directly on the proofs of atomic formulae, and by
marginalisation on the atomic formulae themselves. We use Stochastic Logic
Programs (SLPs) composed of labelled and unlabelled definite clauses to
define the proof probabilities. We have a conservative extension of
first-order reasoning, so that, for example, there is a one-one mapping
between logical and random variables. We show how, in this framework,
Inductive Logic Programming (ILP) can be used to induce the features of a
loglinear model from data. We also compare the presented framework with other
approaches to first-order probabilistic reasoning.
- [Cussens and Pulman,
1999]
- James Cussens and Stephen Pulman.
Experiments
in inductive chart parsing.
In James Cussens, editor, Learning Language in Logic Workshop Notes
(LLL99), pages 72-83, Bled, Slovenia, 1999.
We use Inductive Logic Programming (ILP) within a chart-parsing
framework for grammar learning. Given an existing grammar G, together with
some sentences which G can not parse, we use ILP to find the ``missing''
grammar rules or lexical items. Our aim is to exploit the inductive
capabilities of chart parsing, i.e. the ability to efficiently determine
what is needed for a parse. For each unparsable sentence, we find actual
edges and *needed edges*: those which are needed to allow a parse. The former
are used as background knowledge for the ILP algorithm (P-Progol) and the
latter are used as examples for the ILP algorithm. We demonstrate our
approach with a number of experiments using context-free grammars and a
feature grammar.
- [Cussens et al.,
1999]
- James Cussens, Saso Dzeroski, and Tomaz
Erjavec.
Morphosyntactic
tagging of Slovene using Progol.
In Saso Dzeroski and Peter Flach, editors, Inductive Logic
Programming: Proc. of the 9th International Workshop (ILP-99), Bled,
Slovenia, June 1999. Springer-Verlag.
We consider the task of tagging Slovene words with morphosyntactic
descriptions (MSDs). MSDs contain not only part-of-speech information but
also attributes such as gender and case. In the case of Slovene there are
2,083 possible MSDs. P-Progol was used to learn morphosyntactic
disambiguation rules from annotated data (consisting of 161,314 examples)
produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36
hours. Using simple grammatical background knowledge, e.g. looking for case
disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These
rules have proved effective at detecting and explaining incorrect MSD
annotations in an independent test set, but have not so far produced a tagger
comparable to other existing taggers in terms of accuracy.
- [Cussens, 1998a]
- James Cussens.
Notes
on inductive logic programming methods in natural language processing
(European work).
Manuscript, August 1998.
The aim of these notes is to analyse ILP methods which have been
applied to NLP, drawing exclusively on work conducted in
Europe.
- [Cussens, 1998b]
- James Cussens.
Using prior probabilities and density estimation for relational classification.
In David Page, editor, Inductive Logic Programming: Proceedings of the
8th International Conference (ILP-98), volume 1446 of Lecture
Notes in Artificial Intelligence, pages 106-115. Springer, 1998.
- [Cussens, 1997]
- James Cussens.
Part-of-speech
tagging using Progol.
In Inductive Logic Programming: Proceedings of the 7th International
Workshop (ILP-97). LNAI 1297, pages 93-108. Springer, 1997.
A system for `tagging' words with their part-of-speech (POS) tags
is constructed. The system has two components: a lexicon containing the set
of possible POS tags for a given word, and rules which use a word's context
to eliminate possible tags for a word. The Inductive Logic Programming
(ILP) system Progol is used to induce these rules in the form of definite
clauses. The final theory contained 885 clauses. For background knowledge,
Progol uses a simple grammar, where the tags are terminals and predicates
such as nounp (noun phrase) are nonterminals. Progol was altered to
allow the caching of information about clauses generated during the induction
process which greatly increased efficiency. The system achieved a per-word
accuracy of 96.4% on known words drawn from sentences without quotation
marks. This is on a par with other tagging systems induced from the same data [DaeZavBerGil96-WVLC96,Bri94-AAAI94,CutKupPedSib92-ANLP92] which all
have accuracies in the range 96-97%. The per-sentence accuracy was
49.5%.
- [Cussens et al., 1997]
- James
Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan.
Using
Inductive Logic Programming for Natural Language Processing.
In W. Daelemans, T. Weijters, and A. van der Bosch, editors, ECML'97 --
Workshop Notes on Empirical Learning of Natural Language Tasks, pages
25-34, Prague, 1997. University of Economics.
Invited keynote paper.
We summarise recent work on using Inductive Logic Programming (ILP)
for Natural Language Processing (NLP). ILP performs learning in a first-order
logical setting, and is thus well-suited to induce over the various
structured representations used in NLP. We present Stochastic Logic Programs
(SLPs) and demonstrate their use in ILP when learning from positive examples
only. We also give accounts of work on learning grammars from children's
books and part-of-speech tagging.
- [Cussens, 1996a]
- James Cussens.
Bayesian
Inductive Logic Programming with explicit probabilistic bias.
Technical Report PRG-TR-24-96, Oxford University Computing Laboratory, 1996.
- [Cussens, 1996b]
- James Cussens.
Deduction, induction and probabilistic support.
Synthese, 108(1):1-10, July 1996.
Elementary results concerning the connections between deductive
relations and probabilistic support are given. These are used to show that
Popper-Miller's result is a special case of a more general result, and that
their result is not ``very unexpected'' as claimed. According to
Popper-Miller, a purely inductively supports b only if they are
``deductively independent''---but this means that neg a vdash b.
Hence, it is argued that viewing induction as occurring only in the absence
of deductive relations, as Popper-Miller sometimes do, is untenable. Finally,
it is shown that Popper-Miller's claim that deductive relations determine
probabilistic support is untrue. In general, probabilistic support can vary
greatly with fixed deductive relations as determined by the relevant
Lindenbaum algebra.
- [Cussens, 1996c]
- James Cussens.
Effective sample size in a dichotomous process with noise.
Communications in Statistics: Theory and Methods,
25(6):1233-1246, 1996.
The effect of noise in a dichotomous process is studied from the
Bayesian viewpoint. Winkler's approximation to the posterior distribution in
the presence of noise is shown to break down badly near the limits of its
application. Information loss is measured using effective sample size. An
account of the relationship between effective sample size/information loss
and sampling data is given which differs sharply from that of previous work
in this area.
- [Cussens, 1996d]
- James Cussens.
Machine learning.
IEE Journal of Computing and Control, 7(4):164-168, August
1996.
- [Cussens, 1996e]
- James Cussens.
Part-of-speech
disambiguation using ILP.
Technical Report PRG-TR-25-96, Oxford University Computing Laboratory, 1996.
- [Cussens, 1995a]
- James Cussens.
A
Bayesian analysis of algorithms for learning finite functions.
In Armand Prieditis and Stuart Russell, editors, Machine Learning:
Proceedings of the Twelfth International Conference (ML95), pages
142-149, San Francisco, CA, 1995. Morgan Kaufmann Publishers.
We consider algorithms for learning functions f: X rightarrow Y,
where X and Y are finite, and there is assumed to be no noise in the
data. Learning algorithms, alg , are connected with galg , the set of prior
probability distributions for which they are optimal. A method for
constructing galg from alg is given and the relationship between the
various galg is discussed. Improper algorithms are identified as
those for which galg has zero volume. Improper algorithms are investigated
using linear algebra and two examples of improper algorithms are given. This
framework is then applied to the question of choosing between competing
algorithms. ``Leave-one-out'' cross-validation is hence characterised as a
crude method of ML-II prior selection. We conclude by examining how the
mathematical results bear on practical problems and by discussing related
work, as well as suggesting future work.
- [Cussens, 1995b]
- James Cussens.
A Bayesian engineering approach to building limit protection systems.
ISSAFE project report, Glasgow Caledonian University, August 1995.
- [Cussens, 1995c]
- James Cussens.
Behaviour cloning approaches to limit protection.
ISSAFE project report ISSAFE 9039/1/GCUC/3018/R/A, Glasgow Caledonian
University, April 1995.
- [Cussens, 1995d]
- James Cussens.
Industrial applications of decision trees.
ISSAFE project report ISSAFE 9039/1B/GCUC/3034/R/A, Glasgow Caledonian
University, May 1995.
- [Cussens, 1995e]
- James Cussens.
Some short descriptions of important learning algorithms.
ISSAFE project report ISSAFE 9039/11/GCUC/3027/R/A, Glasgow Caledonian
University, February 1995.
- [Cussens, 1995f]
- James
Cussens.
Using MARS to learn the ILS task.
ISSAFE project report ISSAFE 9039/1B/GCUC/3031/R/A, Glasgow Caledonian
University, August 1995.
- [Cussens, 1994a]
- James Cussens.
Approaches to `validation' in machine learning: A short survey.
ISSAFE project report ISSAFE 9039/1B/GCUC/3017/R/A, Glasgow Caledonian
University, July 1994.
- [Cussens, 1994b]
- James Cussens.
Behaviour cloning in dynamic environments.
ISSAFE project report, Glasgow Caledonian University, October 1994.
- [Cussens, 1994c]
- James Cussens.
Review of ``Interactive Theory Revision: An Inductive Logic Programming
Approach''.
Journal of Logic and Computation, 1994.
- [Cussens, 1994d]
- James Cussens.
Using machine learning to control a pole and cart simulator.
ISSAFE project report ISSAFE 9039/2/GCUC/3009/R/1, Glasgow Caledonian
University, July 1994.
- [Cussens, 1993]
- James Cussens.
Bayes
and pseudo-Bayes estimates of conditional probability and their
reliability.
In Pavel B. Brazdil, editor, Machine Learning: ECML-93, pages
136-152. Lecture Notes in Artificial Intelligence 667, Springer-Verlag,
1993.
Various ways of estimating probabilities, mainly within the
Bayesian framework, are discussed. Their relevance and application to machine
learning is given, and their relative performance empirically evaluated. A
method of accounting for noisy data is given and also applied. The
reliability of estimates is measured by a significance measure, which is also
empirically tested. We briefly discuss the use of likelihood ratio as a
significance measure.
- [Cussens and Hunter, 1993]
- James Cussens and
Anthony Hunter.
Using
maximum entropy in a defeasible logic with probabilistic semantics.
In B. Bouchon-Meunier, L. Valverde, and R.R. Yager, editors, IPMU'92 -
Advanced Techniques in Artificial Intelligence, pages 43-52. Lecture
Notes in Computer Science 682, Springer-Verlag, 1993.
- [Cussens et al., 1993]
- James Cussens, Anthony
Hunter, and Ashwin Srinivasan.
Generating
explicit orderings for non-monotonic logics.
In Proc. of the Eleventh National Conference on Artificial Intelligence
(AAAI-93), pages 420-425. MIT Press, 1993.
- [Cussens, 1992]
- James Cussens.
Estimating
rule accuracies from training data.
ECAI-92 Workshop Notes, 1992.
- [Cussens,
1991]
- James Cussens.
Interpretations
of Probability, Nonstandard Analysis and Confirmation Theory.
PhD thesis, King's College, London, 1991.
The first chapter presents Bayesian confirmation theory. We then
construct infinitesimal numbers and use them to represent the
probability of unrefuted hypotheses of standard probability zero. Popper's
views on the nature of hypotheses, of probability and confirmation are
criticised. It is shown that Popper conflates total confirmation with
weight of evidence. It is argued that Popper's corroboration
can be represented in a Bayesian formalism. Popper's propensity theory is
discussed. A modified propensity interpretation is presented where
probabilities are defined relative to descriptions of generating
conditions. The logical interpretation is briefly discussed and rejected. A
Bayesian account of estimating the values of objective probabilities is
given, and some of its properties are proved. Belief functions are
then compared with probabilities. It is concluded that belief functions offer
a more elegant representation of the impact of evidence. Both measures are
then discussed in relation to various betting procedures designed to
elicit their values from an individual's belief state. De Finetti's arguments
concerning `coherence' are discussed. It is then shown that it is not
possible to use bets to derive belief function values unless the better is
allowed to vary the amount of the stake. Hume's thinking on induction is
discussed. It is argued that some of the problems of Popper's philosophy
derive from Hume's. The Popper-Miller argument is presented and
criticised. It is concluded that the core of the argument is valid, but of
limited applicability. The correspondence between probabilistic support and
deductive relations is discussed. There are two appendices. The first
criticises Popper's view on the connection between the content and
testability of a hypothesis. The second concerns a nonstandard probability
measure proposed in 1967.
- [Cussens and Hunter, 1991]
- James Cussens
and Anthony Hunter.
Using defeasible logic for a window on a probabilistic database: some
preliminary notes.
In R. Kruse and P. Seigel, editors, Symbolic and Quantitative Approaches
for Uncertainty, pages 146-152. Lecture Notes in Computer Science
548, Springer-Verlag, 1991.
- [Cussens, 1990]
- James Cussens.
Review of ``Chance and Structure: An Essay on the Logical Foundations of
Probability''.
History and Philosophy of Logic, 11(1):116-117, 1990.