Projects
Indect

Task 4.1. Learning relationships between people and organizations through websites and social networks. We envisage that a relationship mining system will generate a labelled graph in which the nodes correspond to people, the edges correspond to weighted relationship types, and the edge attributes correspond to supporting evidence (e.g. a link to a web page, email or news report). A key research objective will be to explore mechanisms for automatically learning such connections.
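As a rough illustration of the labelled graph described in Task 4.1, the sketch below stores people as nodes and keeps a relationship type, a weight and a list of evidence links on each edge. The class names, the example people, the relationship labels and the URLs are all hypothetical; this is not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """One labelled, weighted edge between two people."""
    source: str
    target: str
    rel_type: str   # hypothetical label, e.g. "colleague"
    weight: float   # assumed confidence score between 0.0 and 1.0
    evidence: list[str] = field(default_factory=list)  # links to supporting documents


class RelationshipGraph:
    """Minimal container: a set of people plus a list of edges."""

    def __init__(self) -> None:
        self.people: set[str] = set()
        self.edges: list[Relationship] = []

    def add(self, rel: Relationship) -> None:
        # Register both endpoints as nodes, then store the edge.
        self.people.update((rel.source, rel.target))
        self.edges.append(rel)

    def relationships_of(self, person: str, min_weight: float = 0.0) -> list[Relationship]:
        # Return every edge touching `person` whose weight passes the threshold.
        return [
            e for e in self.edges
            if person in (e.source, e.target) and e.weight >= min_weight
        ]


graph = RelationshipGraph()
graph.add(Relationship("Alice", "Bob", "colleague", 0.9, ["https://example.org/staff-page"]))
graph.add(Relationship("Alice", "Carol", "co-author", 0.4, ["https://example.org/news-report"]))

# Only relationships supported with weight >= 0.5 are kept.
strong = graph.relationships_of("Alice", min_weight=0.5)
```

Keeping the evidence on the edge rather than on the node means each claimed relationship can be traced back to the documents that support it.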
Task 4.3. Development of high-precision tools to find highly specific information on the Internet. We plan to build an enhanced search tool that can conduct searches using (1) syntactic and semantic information patterns (e.g. based on Schank's Conceptual Dependency framework) and (2) Latent Semantic Analysis. Such a tool is intended to allow ‘meaning-based querying’ by permitting both conventional text queries and, more importantly, structured queries (for example, over semantic or syntactic relations).
Go to the Indect project's page.
Word Sense Induction

Using word senses instead of word forms is essential in many applications such as information retrieval (IR) and machine translation (MT). Word senses are a prerequisite for word sense disambiguation (WSD) algorithms. However, they are usually represented as a fixed list of definitions taken from a manually constructed lexical database. This fixed-list-of-senses paradigm has several disadvantages. Firstly, lexical databases often contain general definitions and miss many domain-specific senses. Secondly, they lack explicit semantic and topical relations between concepts. Thirdly, they often do not reflect the exact content of the context in which the target word appears. Word sense induction (WSI) aims to overcome these limitations of hand-constructed lexicons.
Go to the Word Sense Induction project's page.
Persuasive Dialogue

In the field of natural language dialogue, a new trend is exploring persuasive argumentation theories. Applying these theories to human-computer dialogue management could lead to a more comfortable experience for the user and open the way to new applications. This research intends to investigate whether, by using the long-studied theories of rhetoric in philosophy and linguistics, it will be possible to build more human-like automated dialogue systems. Up to now, persuasive aspects of dialogue have been taken into account only in very specialised areas such as law. The aim of this project is therefore to study how rhetoric and persuasion theories could be applied in less restricted types of dialogue.
Go to the Persuasive Dialogue project's page.
YourQA

It has been argued that providing a QA system with a dialogue interface would encourage and accommodate the submission of multiple related questions and handle the user's requests for clarification. However, information-seeking dialogue applications of QA are still at an early stage and often relate to closed domains. The YourQA system is able to provide both factoid and complex answers such as definitions and descriptions. The dialogue interface's role is to enable an information-seeking, cooperative, inquiry-oriented conversation to support the question answering component.
Go to the YourQA project's page.
YorkQA

This was our first entry at TREC and the system we presented was, due to time constraints, an incomplete prototype. Our main aims were to verify the usefulness of syntactic analysis for QA and to experiment with different semantic distance metrics in view of a more complete and fully integrated future system. To this end we made use of a part-of-speech tagger and NP chunker in conjunction with entity recognition and semantic distance metrics. We also envisaged experimenting with a shallow best-first parser, but time factors meant that integration with the rest of the system was not achieved. Unfortunately, due to time constraints, no testing or parameter tuning was carried out prior to TREC. As a result, a number of small bugs negatively influenced our results and the system did not achieve optimal performance. Nevertheless, we obtained reasonable results, the best score being 18.1% of the questions correct (with lenient judgements).
Go to the YorkQA project's page.
Ubiquitous Digital Assistant

The Ubiquitous Digital Assistant (UDA) is a DTI Nextwave project and part of the AMADEUS Centre portfolio of projects at York. UDA aims to build lightweight, device-specific natural language question-answering systems which are automatically generated from training data using machine learning techniques. Although it is possible to construct such systems manually, there are two associated problems. Firstly, there is a high cost involved in building and evaluating hand-crafted systems. Secondly, the range of vocabulary and linguistic coverage of existing systems is very low. The UDA project will address both these issues by employing large QA (question-answer) datasets generated using a broad-coverage, industrial-strength system such as Lexicle’s natural language understanding system. Lexicle’s system will be modified to automatically generate variations of essentially the same question. It will also be used to automate the building of device-specific question-answer pairs. Mature machine learning techniques will then be employed to learn, from these datasets, device-specific (probabilistic) automata that can be deployed on low-power devices.
Go to the Ubiquitous Digital Assistant project's page.
Intelligent Tutoring System

ITS is a joint project between Psyton Associates Ltd and the University of York, partially funded by the Department of Trade and Industry (DTI). The project aims to develop an interactive, user-friendly, web-based tool to assist the development of senior managers in the police service. There are 43 Home Office Police Forces in England and Wales and 7 in Scotland. These forces range in size from 950 to 15,000 police officers, with many additional operational support staff. The web-based tool aims to target the 25% of roles within each force that are middle to senior management. The project involves the following key stages:
- Identification of case studies required for building the content.
- Building of initial prototype authoring system.
- Conducting user trials.
- Enhancement of the prototype system based on user feedback.
- Integration of voice-recognition system.
Go to the Intelligent Tutoring System project's page.
Stochastic Constraint Programming

Many decision problems contain uncertainty. Data about events in the past may not be known exactly due to errors in measuring or difficulties in sampling, whilst data about events in the future may simply not be known with certainty. For example, when scheduling power stations, we need to cope with uncertainty in future energy demands. As a second example, nurse rostering in an accident and emergency department requires us to anticipate variability in workload. As a final example, when constructing a balanced bond portfolio, we must deal with uncertainty in the future price of bonds. To deal with such situations, an extension of constraint programming called stochastic constraint programming has been proposed, in which we distinguish between decision variables, which we are free to set, and stochastic (or observed) variables, which follow some probability distribution.
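The distinction between decision and stochastic variables can be illustrated with a toy version of the power station example. The sketch below assumes a single decision variable (the capacity to schedule), a single stochastic variable (demand) and a constraint that must hold with at least a chosen probability. The demand values, their probabilities and the threshold `THETA` are invented for illustration, and the brute-force enumeration stands in for a real solver.

```python
# Stochastic (observed) variable: future energy demand.
# The values and probabilities below are invented for illustration.
demand_distribution = {80: 0.2, 100: 0.5, 120: 0.3}  # demand -> probability

# Decision variable: the generating capacity we choose to schedule.
capacity_domain = [80, 100, 120]

# Hypothetical threshold: demand must be met with probability >= THETA.
THETA = 0.6


def probability_demand_met(capacity: int) -> float:
    """Sum the probability of every scenario in which capacity covers demand."""
    return sum(p for demand, p in demand_distribution.items() if capacity >= demand)


# Keep the capacities that satisfy the probabilistic constraint, then pick
# the smallest one (assuming cost grows with capacity).
feasible = [c for c in capacity_domain if probability_demand_met(c) >= THETA]
best_capacity = min(feasible)
```

Enumerating every scenario is only practical for tiny domains like this one; it is used here purely to make the roles of the two kinds of variable concrete.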
Go to the Stochastic Constraint Programming project's page.
Lexicle

Together with Patrick Olivier, I founded Lexicle, which built the first commercial embodied conversational agent (ECA) system. The ECA system enables organisations to employ virtual customer service agents: talking, gesturing 3D characters that you engage in conversation to get the information you require, just as you would with a real agent. You ask her questions, she answers; you refer to something in her response and she understands. You drive the conversation to get the answers you require in your language.
Go to the Lexicle project's page.