DLK/1
Project 1: Evolutionary simulation of inherited kinship-driven altruism
[CS]
The aim of the project is to assess the role of a hypothetically inherited
feature (gene) promoting altruism between relatives as a factor for survival.
The approach will simulate an environment in which two species, A and B,
are randomly seeking for food, competing for the same food resources. Food
consumption will result in increased individual energy level. Falling below
a certain energy level would mean death. An encounter of two individuals
of the same species results in the creation of an offspring if (1) their
energy levels are sufficient and (2) they meet each other's criteria based
on present and past energy levels. The initial energy level of the offspring
is subtracted from that of the parents.
Some of the individuals of species A will be carriers of a gene forcing
them to share their energy with the individuals they meet in proportion
to the degree of kinship (shared genes). For example, identical twins would
always make their energy levels equal etc. The gene will be passed with
a certain probability from parent to child.
Evaluation will assess the progressive change of the following two parameters:
-
the ratio between individuals of species A and B;
-
the proportion of gene carriers within species A.
In the simulation, spatial phenomena (food discovery, encounter of another
individual) will be represented as random processes with a certain probability.
In this case, physical distance between individuals is ignored, and the
encounter of each pair is equally probable. Genetic algorithms are most
suited to the implementation of this method.
Note: The project does not aim to imply the existence of such
gene in reality, and indeed nothing of the said above would change if we
assumed altruistic behaviour being inherited not as a gene, but through
upbringing.
Reading:
-
W. D. Hamilton. The genetic evolution of social behaviour. The Journal
of Theoretical Biology, 7, 1-16, 1964.
-
Christopher R. Badcock. PsychoDarwinism: The New Synthesis of Darwin and
Freud, 1995.
-
Jean-Louis Dessalles. Altruism, status and the origin of relevance. Technical
report 96 C 006. ENST, Paris, 1996.
-
David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning, Addison-Wesley, 1989.
-
J. Barton. Kinship-Driven Altruism in Multi-Agent Systems. Third year project.
CS Dept. Univ. of York. To appear.
DLK/2
Project 2: WordNet-based Optimal Lexical Semantic Tagging [CS]
Aims: The aims of the project are to:
-
develop algorithm A that takes as input a text file T in
which words are tagged with their part of speech (PoS -noun, verb,
preposition, etc.), and assigns an optimal semantic tag (class)
to each noun and verb;
-
use these semantic classes to compare two documents and point out (1) "the
things in common", (2) the differences.
A semantic tag will be considered optimal if it belongs to the optimal
set of semantic tags. A set of semantic tags SST is considered
optimal if it consists of those semantic tags which have minimal entropy
among the set of all relevant semantic tags, and cover the required minimum
of words in the text. For the purposes of the project, the lexical resource
WordNet
will be used as a source of semantic tags.
Background: Natural Language Processing (NLP) aims to
develop tools that are able to retrieve, extract and classify linguistic
information, and, to a certain level, understand language
and provide translation or relevant feedback. The large area of
natural language study is traditionally divided into sub-domains,
e.g. morphology, syntax, semantics, etc. This division can be reflected
in the range and functionality of tools being used. For instance, part-of-speech
tagging serves to assign the corresponding parts of speech:
John/proper_noun likes/verb_3rd_person chocolate/common_noun ./full_stop
Once the parts of speech are assigned to words, the sentence can be
parsed, i.e. syntactically analysed. (Compare with programming languages
- IPL, remember? - where each token is assigned a type: NUM, ID, 'IF',
etc.) However, the PoSs are often not sufficient.
John/noun loves/verb bananas/noun with/preposition chocolate/noun
John/noun loves/verb bananas/noun with/preposition passion/noun
The sentences above have exactly the same PoSs, but different parse
trees. If the grammar is based on PoSs, there is no way of discriminating
between the two sentences. A way around this is to use the word itself
(e.g.
chocolate) instead of the PoS (noun) in the
grammar rule. This technique, called lexicalisation, is not a panacea
though, as it results in a large number of grammar rules with very low
coverage. Semantic tags (classes) can provide the means to describe grammar
rules that do not generate ambiguity yet have sufficient coverage:
NP --> noun/food prep/with noun/food
Method: The semantic classes of words contained in a document
can be used as a crude description of its content. The problem is that
each word can correspond to several semantic classes. The output of the
lexical database WordNet for the word 'bank' is shown in Figure 1 (note
the US spelling).
Overview of noun bank
1. depository financial institution, bank, banking concern, banking
company -- (a financial institution that accepts deposits and channels
the money into lending activities; "he cashed a check at the bank"; "that
bank holds the mortgage on my home")
2. bank -- (sloping land (especially the slope beside a body of
water); "they pulled the canoe up on the bank"; "he sat on the bank of
the river and watched the currents")
3. bank -- (a supply or stock held in reserve especially for future
use (especially in emergencies))
4. bank, bank building -- (a building in which commercial banking
is transacted; "the bank is on the corner of Nassau and Witherspoon")
5. bank -- (an arrangement of similar objects in a row or in tiers;
"he operated a bank of switches")
6. savings bank, coin bank, money box, bank -- (a container (usually
with a slot in the top) for keeping money at home; "the coin bank was empty")
7. bank -- (a long ridge or pile; "a huge bank of earth")
8. bank -- (the funds held by a gambling house or the dealer in
some gambling games; "he tried to break the bank at Monte Carlo")
9. bank, cant, camber -- (a slope in the turn of a road or track;
the outside is higher than the inside in order to reduce the effects of
centrifugal force)
10. bank -- (a flight maneuver; aircraft tips laterally about its
longitudinal axis (especially in turning))
Overview of verb bank
1. bank -- (tip laterally; of boats and aircraft)
2. bank -- (enclose with a bank; "bank roads")
3. bank -- (do business with a bank or keep an account at a bank;
"Where do you bank in this town?")
4. bank -- (be in the banking business)
5. deposit, bank -- (put into a bank account)
6. bank -- (cover with ashes, of fires, to control the rate of
burning)
7. trust, swear, rely, bank -- (have confidence or faith in; "We
can trust in God"; "Rely on your friends"; "bank on your good education")
|
Fig. 1: WordNet output for the word 'bank'
The semantic classes of each word can form a hierarchy (see Fig. 2):
depository financial institution, bank, banking concern, banking
company
=> financial institution, financial
organization
=> institution,
establishment
=> organization, organisation
=> social group
=> group, grouping
|
Fig. 2: WordNet hierarchy of semantic classes
Lexical semantic tags can be obtained by manually annotating the text
- a slow and costly procedure. Here we expect to use measures from Information
Theory, such as entropy (defining the numerical measure of the uncertainty
of an outcome), to find the most informative semantic classes serving to
distinguish among nouns (resp. verbs) present in our text. Tags with the
lowest entropy will be selected and used to
-
estimate the correct semantic tag for each noun and verb in the text
-
compare different texts and provide a summary of their shared and/or specific
content
Reading:
DLK/3
Project 3: Segmentation of Text with Genetic Algorithms [CS]
Background: Although very complex in nature, natural
language is yet another information code, and as
such should be subject to the general principles
of information theory (IT). Unsupervised learning
of language studies the models of language extracted from raw text,
which are based on statistical techniques such as Hidden Markov Models
(HMM) or language biases otherwise related to IT -
Maximum Likelihood or Minimal Description Length (MDL).
Other relevant techniques include de Saussure's analogy or
models based on finite state automata. The impoverished
data renders the task difficult. The prize to win though is
a representation or theory backed by the IT principles,
which could also be of interest to psycholinguists.
On a more practical level, unsupervised learning of language can
provide a cheap alternative to learning from annotated corpora.
The segmentation of text, or other sequential data, is often
used as the first step in data processing. In the area of Natural Language
Processing (NLP), this is done to segment text into tokens (words) as well
as to produce the word constituents (morphemes) of each word. Segmentation
methods vary from sets of hand-written rules to ones learned from annotated
examples where the correct segmentation is provided, to unsupervised learning
of segmentation, in which case only text with no annotation whatsoever
is used.
Unsupervised learning combines a language, or data representation
formalism, with a bias that allows for the definition
of the best representation of a given input in this language, and a search
technique to find this representation. For instance:
Language: represent words as a concatenation of a stem and suffix
Bias : minimise the size of stem and suffix lexicons
Search technique: genetic algorithm (GA)
Then the list of words work, works, worked, bird, birds might
be segmented as work+, work+s, work+ed, bird+, bird+s as
it results in a prefix lexicon {work, bird} and suffix lexicon {'',-s,
-ed} which contain less characters in comparison with any other segmentation
(Kazakov, 1997).
Aims: The project aims at the implementation of a genetic algorithm
(C++) that generates the optimal segmentation of a given text with an MDL-based
bias. The method efficiency will be tested (1) directly on annotated
data or (2) indirectly, evaluating the generative power of the segmentations
produced (e.g. using de Saussure's analogy principle: work+s, work+ing,
and sleep+ing are correct segmentations, because sleep+s
is a valid word).
Method: The method described in (Kazakov, 1997) has two limitations:
the number of segments per word is limited and it cannot be efficiently
applied on more than about a thousand words at a time. Changes in the data
representation formalism should tackle the former issue; for the improvement
of the latter, the GA fitness function (reflecting the bias) would be computed
in a much more efficient manner.
Reading:
-
Dimitar Kazakov. Unsupervised learning of naive morphology with genetic
algorithms. ECML Workshop, Prague, 1997. URL: ftp://ftp.cs.unimaas.nl/pub/ecml97/kazakov-ftp.ps.gz
-
David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley, 1989.
DLK/4
Project 4: Pretty Print of Graphs [CS/M, CS]
Aim: The aim of the project is to generate a suitable 2D representation
of graphs from a textual description of their vertices and edges. The output
should be in the internal format of xfig or any other comparable
graph editor accepting text input. The task is to minimise the number of
crosspoints between edges. This is possible in the case of the so-called
planar
graphs. Otherwise, the number of crosspoints should be reasonably reduced
(sub-optimal and heuristic-based strategies are allowed).
Method: Use theory of graphs, namely planar graphs, planarity
testing and planar representations. Implement in C or Perl.
Reading:
DLK/5
Project 5: Behaviour Learning in Game Environments (Student-defined
project) [Alex Champandard, CS]
Aim: The project will focus on the use of machine learning methods,
such as Inductive Logic Programming (ILP) and Genetic Algorithms (GA) in
the design of virtual actors for computer games.
Method: Use Quake II game engine, ILP engine Progol, and genetic
algorithms. Implement in C++.
Reading:
-
David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley, 1989.
-
S. Muggleton. CProgol4.4:
a tutorial introduction. In Inductive Logic Programming and Knowledge
Discovery in Databases. Springer-Verlag, 2000. To appear.
DLK/6
Project 6: Machine Learning of Motion Control in 3D Virtual Environment
(Student-defined project) [Chris Peake, CS]
Aim: The project should develop a mechanism to learn the low-level
motor co-ordination of a virtual character given its initial position,
and high-level task in a simulated 3D environment. Real world properties,
such as gravity and friction could be added to make the simulation more
realistic.
Method: Design and model a virtual character and its environment,
and the underlying data structures...
Reading:
DLK/7
Project 7: Speech Driven Character Animation (Student-defined
project) [Andrew Coates, CS]
Aim:
Method:
-
construct a computer model with a predefined set of actions. The character
need not be complicated, but must have a resting state with several sets
of actions that it can perform starting and ending in this resting state.
-
Use IBM voice recognition SW to recoginse voice commands.
-
Bring steps one and two together in order to control the character in real
time using IBM speech recognition.
Reading:
-
IBM speech recognition software
-
Matthew Brands' Voice Pupperty
-
Computer character animations
-
Essex's speech recognition page
-
Natural Language Understanding
DLK/8
Project 8: Modelling the constraints that influence cycle commuter
choices [CS, CS/M, MScIP]
Aim:
Research in the Department of Sociology has been
investigating the ways in which different factors affect people's willingness
or ability to cycle to work. Influences can come
from a range of different sources - the workplace
itself, domestic circumstances, personal preferences, bicycle technology
and sources external to all of these.
Provisional data on these factors already exists for 100 individuals,
with a further survey planned at two companies
which should yield additional data for up to 500 further individuals.
The objective will be to develop a system which can anticipate whether
or not somebody is able to cycle to work, and if not,
which are the most significant constraints for them.
The system will need not just to rank different constraints but to
incorporate a number of 'packages' of constraints that interact
together. For example, with the relationship between
journey distance (d), the availability of
shower facilities (s), personal fitness (f) and office dress code
(dc), slight changes in the patterning can make a big difference:
-
where d=long, s=no, f=good, dc=casual, somebody should
be able to cycle
-
where d=long, s=no, f=good, dc=smart, this may
form a barrier to cycling
Providing cycle promoters in local authorities with information about
how such patterns are formed, and where they occur across a workforce,
can help make their work more effective.
Method:
Use machine learning algorithms, such as C4.5 and
Progol to analyse data and form hypotheses in the form of decision trees
(in the case of C4.5) and first order logic rules (when Progol is
used). Both C4.5 and Progol can help the
collection of additional data by selecting
the attributes/features, which are relevant
to the hypotheses formed. In addition,
Progol can be provided with potentially
relevant higher-level concepts that are based on
the information provided in the questionnaire, e.g.:
fit_and_well_dressed(Person):-
fit(Person),
well_dressed(Person).
The above condition is only satisfied for those people
who are both fit and well dressed. If Progol includes this composite feature
in the theories learned (and never makes use of
the features `fit' and `well-dressed' separately), that might
be a reason to add this feature to the questionnaire in
this very form, and not as two separate questions.
References:
-
Curtis, Carey & Peter Headicar (1997) 'Targeting
travel awareness campaigns: which individuals are more
likely to switch from car to other transport for the journey
to work?' in Transport Policy 4: 57-65.
-
Davies, D.G., M.E. Halliday, M. Mayes & R.L. Pocock (1997)
Attitudes to Cycling: A Qualitative Study and Conceptual
Methodology Crowthorne, Berks.: Transport Research Laboratory, TRL Report
266
-
Joshi, Mary Sissons & Victoria Senior (1998) 'Journey
to work: the potential for modal shift?' in Health
Education Journal 57: 212-223.
-
J. Quinlan. Induction of decision trees. Machine Learning, 1(1), 1986.
-
S. Muggleton. CProgol4.4: a tutorial introduction. In Inductive Logic
Programming and Knowledge
Discovery in Databases. Springer-Verlag,
2000. To appear.
DLK/9
Project 9: Evolutionary simulation of inherited kinship-driven altruism
[MScIP]
The aim of the project is to assess the role of a hypothetically inherited
feature (gene) promoting altruism between relatives as a factor for survival.
The approach will simulate an environment in which two species, A and B,
are randomly seeking for food, competing for the same food resources. Food
consumption will result in increased individual energy level. Falling below
a certain energy level would mean death. An encounter of two individuals
of the same species results in the creation of an offspring if (1) their
energy levels are sufficient and (2) they meet each other's criteria based
on present and past energy levels. The initial energy level of the offspring
is subtracted from that of the parents.
Some of the individuals of species A will be carriers of a gene forcing
them to share their energy with the individuals they meet in proportion
to the degree of kinship (shared genes). For example, identical twins would
always make their energy levels equal etc. The gene will be passed with
a certain probability from parent to child.
Evaluation will assess the progressive change of the following two parameters:
-
the ratio between individuals of species A and B;
-
the proportion of gene carriers within species A.
An existing tool would be used for the simulations.
Note: The project does not aim to imply the existence of such
gene in reality, and indeed nothing of the said above would change if we
assumed altruistic behaviour being inherited not as a gene, but through
upbringing.
Reading:
-
W. D. Hamilton. The genetic evolution of social behaviour. The Journal
of Theoretical Biology, 7, 1-16, 1964.
-
Christopher R. Badcock. PsychoDarwinism: The New Synthesis of Darwin and
Freud, 1995.
-
Jean-Louis Dessalles. Altruism, status and the origin of relevance. Technical
report 96 C 006. ENST, Paris, 1996.
-
David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning, Addison-Wesley, 1989.
-
J. Barton. Kinship-Driven Altruism in Multi-Agent Systems. Third year project.
CS Dept. Univ. of York. To appear.
DLK/10 (Joint supervision with Guiem Bernat)
Project 10: Behavioural cloning of football players [Student-defined: Jamie
Hodkinson]
Aim: The project will use machine learning techniques (Inductive
Logic Programming, Decision Trees) to model (copy) the recorded behaviour
of single players within a team or the team as a whole. The project will
use the setting of Robocup as a testbed.
Last change: 28 Feb 2000
kazakov@cs.york.ac.uk