Dr Victoria Hodge, Senior Researcher / Developer in AI and Machine Learning.

Aloft: Self-Adaptive Drone Controller Testbed.

Calum Imrie, et al.

In 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS'24), April 15–16, 2024, Lisbon, Portugal. ACM, New York.

A virtual machine containing the artifact can be found here: Aloft GitHub repo

Conference Papers

Abstract

Aerial drones are increasingly being considered as a valuable tool for inspection in safety-critical contexts. Nowhere is this more true than in mining operations, which present a dynamic and dangerous environment for human operators. Drones can be deployed in a number of contexts including efficient surveying as well as search and rescue missions. Operating in these dynamic contexts is challenging, however, and requires the drone's control software to detect and adapt to conditions at run-time. To help in the development of such systems we present Aloft, a simulation-supported testbed for investigating self-adaptive controllers for drones in mines. Aloft utilises the Robot Operating System (ROS) and a model environment using Gazebo to provide physics-based testing. The simulation environment is constructed from a 3D point cloud collected in a physical mock-up of a mine and contains features expected to be found in real-world contexts. Aloft allows members of the research community to deploy their own self-adaptive controllers into the control loop of the drone to evaluate the effectiveness and robustness of controllers in a challenging environment. To demonstrate our system we provide a self-adaptive drone controller and operating scenario as an exemplar. The self-adaptive drone controller provided utilises a two-layered architecture with a MAPE-K feedback loop. The scenario is an inspection task during which we inject a communications failure. The aim of the controller is to detect this loss of communication and autonomously perform a return home behaviour. Limited battery life presents a constraint on the mission, which therefore means that the drone should complete its mission as fast as possible. Humans, however, might also be present within the environment. This poses a safety risk and the drone must be able to avoid collisions during autonomous flight.
In this paper we describe the controller framework and the simulation environment and provide information on how a user might construct and evaluate their own controllers in the presence of disruptions at run-time.

"© {C. Imrie et al. | ACM} 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SEAMS'24, https://doi.org/10.1145/3643915.3644107"

Autonomous Emergency Triage Support System.

Ol’Tunde Ashaolu, William Lyons, Ioannis Stefanakos, Radu Calinescu, Ibrahim Habli, Victoria Hodge, Chiara Picardi, Katherine Plant and Beverley Townsend

Proceedings of the 2023 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 2023.

Conference Papers

Abstract

Medical staff shortages and growing healthcare demands due to an ageing population mean that many patients face delays in receiving critical care in the emergency departments (EDs) of hospitals worldwide. As such, the use of autonomous, robotics and AI technologies to help streamline the triage of ED patients is of utmost importance. In this paper, we present our ongoing work to develop an autonomous emergency triage support system intended to alleviate the current pressures faced by hospital emergency departments. By employing a combination of robotic and AI techniques, our solution aims to speed up the initial stages of ED triage. Its preliminary evaluation using synthetic patient datasets generated with ED medic input suggests that our solution has the potential to improve the ED triage process, supporting the timely and accurate delivery of patient care in emergency settings.

“© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”

Medical practitioner perspectives on AI in Emergency Triage.

Bev Townsend, Katherine Plant, Victoria Hodge, Ol'Tunde Ashaolu, and Radu Calinescu

Frontiers in Digital Health, 5:1297073. doi: 10.3389/fdgth.2023.1297073, Dec 2023.

Journal Articles

Abstract

Background: A proposed Diagnostic AI System for Robot-Assisted Triage (‘DAISY’) is under development to support Emergency Department (‘ED’) triage following increasing reports of overcrowding and shortage of staff in ED care experienced within National Health Service, England (‘NHS’) but also globally. DAISY aims to reduce ED patient wait times and medical practitioner overload.
Objective: The objective of this study was to explore NHS health practitioners’ perspectives and attitudes towards the future use of AI-supported technologies in ED triage.

Time to Die 2: Improved in-game death prediction in Dota 2.

Charles Ringer, Sondess Missaoui, Victoria Hodge, et al.

Machine Learning with Applications, 100466, Elsevier, April 2023.

Journal Articles

Abstract

Esports games are generally fast paced, and due to the virtual nature of these games, camera positioning can be limited. Therefore, knowing ahead of time where to position cameras, and what to focus a broadcast and associated commentary on, is a key challenge in esports reporting. This gives rise to moment-to-moment prediction within esports matches which can empower broadcasters to better observe and process esports matches. In this work we focus on this moment-to-moment prediction and in particular present techniques for predicting if a player will die within a set number of seconds for the esports title Dota 2. A player death is one of the most consequential events in Dota 2.

Sensors and Data in Mobile Robotics for Localisation.

Victoria Hodge

J. Wang (ed.), Encyclopedia of Data Science and Machine Learning, Chapter 133, (Hershey, PA: IGI Global), 2023, pp. 2223–2238.

Book Chapters

Abstract

The industrial robotics market is predicted to grow to USD 75.3 billion by 2026, at a rate of 12.3% per year. A key driver of this growth is Industry 4.0 digitization, often known as the next industrial (or data) revolution. Industry 4.0 digitization requires smart, flexible, and safe technologies including automation using robots in ever increasing numbers. Industry 4.0 needs autonomous mobile robots with intelligent navigation capabilities and needs to use big data processing techniques to allow these robots to navigate safely and flexibly. This article reviews the techniques used and challenges of one particular aspect of robot navigation: localisation. It focuses on robotic sensors and their data, and how information can be extracted to enable localisation.

Analysing Ultra-Wide Band Positioning for Geofencing in a Safety Assurance Context.

Victoria Hodge, Richard Hawkins, James Hilder, and Ibrahim Habli

ArXiv pre-print, arXiv:2203.05830, 2022. DOI: 10.48550/arXiv.2203.05830

ArXiv E–print

Abstract

There is a desire to move towards more flexible and automated factories. To enable this, we need to assure the safety of these dynamic factories. This safety assurance must be achieved in a manner that does not unnecessarily constrain the systems and thus negate the benefits of flexibility and automation. We previously developed a modular safety assurance approach, using safety contracts, as a way to achieve this. In this case study we show how this approach can be applied to Autonomous Guided Vehicles (AGV) operating as part of a dynamic factory and why it is necessary. We empirically evaluate commercial, indoor fog/edge localisation technology to provide geofencing for hazardous areas in a laboratory. The experiments determine how factors such as AGV speeds, tag transmission timings, control software and AGV capabilities affect the ability of the AGV to stop outside the hazardous areas. We describe how this approach could be used to create a safety case for the AGV operation.

A Survey of Horse Racing Opinions and Perceptions.

Henrietta Patterson and Victoria Hodge

SportRxiv, 2022. DOI: 10.51224/SRXIV.98

SportRxiv E–print

Abstract

With a global reach of 584 million households, horse racing is a globally important sport with 14 million potential UK customers. Although it is the UK’s second-most attended sport, attendances fell by 500,000+ from 2015 to 2019, with particular problems engaging and retaining younger audiences. This study focuses on the Millennial and Gen-Z demographics to discover why audiences show a reduced interest. We analyse the determinants underlying engagement using focus groups and a questionnaire. Our empirical results identify the key factors determining attendance and viewing. Horse racing is exciting and social but there are ethical concerns around horse injuries and horses’ fates. Concerns are far higher than for other competitive sports, and increase systematically as participants get younger. Participants would engage more if openness was increased with this willingness increasing as participants get younger. Horse racing lacks easily identifiable figures and there are concerns around betting, terminology and attendance costs.

Win Prediction in Multi-Player Esports: Live Professional Match Prediction.

Victoria Hodge, Sam Devlin, Nick Sephton, Florian Block, Peter Cowling, and Anders Drachen

IEEE Transactions on Games, 13(4), IEEE, December 2021, pp. 368–379.

Journal Articles

Abstract

Esports are competitive videogames watched by audiences. Most esports generate detailed data for each match that are publicly available. Esports analytics research is focused on predicting match outcomes. Previous research has emphasised pre-match prediction and used data from amateur games, which are more easily available than professional level. However, the commercial value of win prediction exists at the professional level. Furthermore, predicting real-time data is unexplored, as is its potential for informing audiences. Here we present the first comprehensive case study on live win prediction in a professional esport. We provide a literature review for win prediction in a multi-player online battle arena (MOBA) esport. The paper evaluates the first professional-level prediction models for live DotA 2 matches, one of the most popular MOBA games, and trials them at a major international esports tournament. Using standard machine learning models, feature engineering and optimization, our model is up to 85% accurate after 5 minutes of gameplay. Our analyses highlight the need for algorithm evaluation and optimization. Finally, we present implications for the esports/game analytics domains, describe commercial opportunities, practical challenges, and propose a set of evaluation criteria for research on esports win prediction.

Deep reinforcement learning for drone navigation using sensor data.

Victoria Hodge, Richard Hawkins and Rob Alexander.

Neural Computing and Applications, 33, Springer Nature, March 2021, pp. 2015–2033

Journal Articles

Abstract

Mobile robots such as unmanned aerial vehicles (drones) can be used for surveillance, monitoring and data collection in buildings, infrastructure and environments. The importance of accurate and multifaceted monitoring is well known to identify problems early and prevent them escalating. This motivates the need for flexible, autonomous and powerful decision-making mobile robots. These systems need to be able to learn through fusing data from multiple sources. Until very recently, they have been task specific. In this paper, we describe a generic navigation algorithm that uses data from sensors on-board the drone to guide the drone to the site of the problem. In hazardous and safety-critical situations, locating problems accurately and rapidly is vital. We use the proximal policy optimisation deep reinforcement learning algorithm coupled with incremental curriculum learning and long short-term memory neural networks to implement our generic and adaptable navigation algorithm. We evaluate different configurations against a heuristic technique to demonstrate its accuracy and efficiency. Finally, we consider how safety of the drone could be assured by assessing how safely the drone would perform using our navigation algorithm in real-world scenarios.

On the use of AI for Generation of Functional Music to Improve Mental Health.

Duncan Williams, Victoria Hodge, and Chia-Yu Wu.

Frontiers in Artificial Intelligence, vol. 3, Nov 2020.

Journal Articles

Abstract

Increasingly music has been shown to have both physical and mental health benefits including improvements in cardiovascular health, a link to reduction of cases of dementia in elderly populations, and improvements in markers of general mental well-being such as stress reduction. Here, we describe short case studies addressing general mental well-being (anxiety, stress-reduction) through AI-driven music generation. Engaging in active listening and music-making activities (especially for at risk age groups) can be particularly beneficial, and the practice of music therapy has been shown to be helpful in a range of use cases across a wide age range. However, access to music-making can be prohibitive in terms of access to expertise, materials, and cost. Furthermore, the use of existing music for functional outcomes (such as targeted improvement in physical and mental health markers suggested above) can be hindered by issues of repetition and subsequent over-familiarity with existing material. In this paper, we describe machine learning approaches which create functional music informed by biophysiological measurement across two case studies, with target emotional states at opposing ends of a Cartesian affective space (a dimensional emotion space with descriptors ranging from relaxation to fear). Galvanic skin response is used as a marker of psychological arousal and as an estimate of emotional state to be used as a control signal in the training of the machine learning algorithm. This algorithm creates a non-linear time series of musical features for sound synthesis “on-the-fly”, using a perceptually informed musical feature similarity model. We find an interaction between familiarity and perceived emotional response.
We also report on subsequent psychometric evaluation of the generated material, and consider how these - and similar techniques - might be useful for a range of functional music generation tasks, for example, in nonlinear sound-tracking such as that found in interactive media or video games.

How the Business Model of Customisable Card Games Influences Player Engagement.

Victoria Hodge, Nick Sephton, Sam Devlin, Peter I. Cowling, Nikolaos Goumagias, Jianhua Shao, Kieran Purvis, Ignazio Cabras, Kiran Fernandes and Feng Li.

IEEE Transactions on Games, 11(4), IEEE, December 2019, pp. 374–385

Journal Articles

Abstract

In this article, we analyse the game play data of three popular customisable card games where players build decks prior to game play. We analyse the data from a player engagement perspective, how the business model affects players, how players influence the business model and provide strategic insights for players themselves. Sifa et al. found a lack of cross-game analytics while Marchand and Hennig-Thurau identified a lack of understanding of how a game's business model and strategies affect players. We address both issues. The three games have similar business models but differ in one aspect: the distribution model for the cards used in the game. Our longitudinal analysis highlights this variation's impact. A uniform distribution creates a spread of decks with slowly emerging trends while a random distribution creates stripes of deck building activity that switch suddenly each update. Our method is simple, easily understandable, independent of the specific game's structure and able to compare multiple games. It is applicable to games that release updates and enables comparison across games. Optimising a game's updates strategy is key as it affects player engagement and retention which directly influence businesses' revenues and profitability in the $95 billion global games market.

Time to Die: Death Prediction in Dota 2 using Deep Learning.

Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, and James Alfred Walker

Proceedings of IEEE Conference on Games (CoG 2019), London, UK, Aug. 20–23, 2019.

Conference Papers

Abstract

Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are however of perennial interest across esports commentators and audience, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe the following impact of events. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.

Time to Die: Death Prediction in Dota 2 using Deep Learning.

Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, and James Alfred Walker

ArXiv e–prints, 1906.03939, 2019, (arXiv:1906.03939 [cs.LG])

ArXiv E–print

AI and Automatic Music Generation for Mindfulness.

Duncan Williams, Victoria Hodge, Lina Gega, Damian Murphy, Peter Cowling, and Anders Drachen

Audio Engineering Society: International Conference on Immersive and Interactive Audio, March 27–29, 2019, York, UK

Conference Papers

Abstract

This paper presents an architecture for the creation of emotionally congruent music using machine learning aided sound synthesis. Our system can generate a small corpus of music using Hidden Markov Models; we can label the pieces with emotional tags using data elicited from questionnaires. This produces a corpus of labelled music underpinned by perceptual evaluations. We then analyse participants’ galvanic skin response (GSR) while listening to our generated music pieces and the emotions they describe in a questionnaire conducted after listening. These analyses reveal that there is a direct correlation between the calmness/scariness of a musical piece, the users’ GSR reading and the emotions they describe feeling. From these, we will be able to estimate an emotional state using biofeedback as a control signal for a machine-learning algorithm, which generates new musical structures according to a perceptually informed musical feature similarity model. Our case study suggests various applications including in gaming, automated soundtrack generation, and mindfulness.

A Psychometric Evaluation of Emotional Responses to Horror Music.

Duncan Williams, Chia-Yu Wu, Victoria Hodge, Damian Murphy and Peter Cowling

Audio Engineering Society: 146th International Pro Audio Convention, March 20–23, 2019, Dublin, Ireland

Conference Papers

Abstract

This research explores and designs an effective experimental interface to evaluate people's emotional responses to horror music. We studied methodological approaches by using traditional psychometric techniques to measure emotional responses, including self-reporting, and galvanic skin response (GSR). GSR correlates with psychological arousal. It can help circumvent a problem in self-reporting where people are unwilling to report particular felt responses, or confuse perceived and felt responses. We also consider the influence of familiarity. Familiarity can induce learned emotional responses rather than listeners describing how it actually makes them feel. The research revealed different findings in self-reports and GSR data. Both measurements had an interaction between music and familiarity but show inconsistent results from the perspective of simple effects.

Narrative Bytes: Data-Driven Content Production in Esports.

Florian Block, Victoria Hodge, Stephen Hobson, Nick Sephton, Sam Devlin, Marian F. Ursu, Anders Drachen and Peter I. Cowling

Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video (TVX 18). ACM, New York, NY, USA, 29–41. DOI: https://doi.org/10.1145/3210825.3210833

Conference Papers

Abstract

Esports – video games played competitively that are broadcast to large audiences – are a rapidly growing new form of mainstream entertainment. Esports borrow from traditional TV, but are a qualitatively different genre, due to the high flexibility of content capture and availability of detailed gameplay data. Indeed, in esports, there is access to both real–time and historical data about any action taken in the virtual world. This aspect motivates the research presented here, the question asked being: can the information buried deep in such data, unavailable to the human eye, be unlocked and used to improve the live broadcast compilations of the events? In this paper, we present a large–scale case study of a production tool called Echo, which we developed in close collaboration with leading industry stakeholders. Echo uses live and historic match data to detect extraordinary player performances in the popular esport Dota 2, and dynamically translates interesting data points into audience–facing graphics. Echo was deployed at one of the largest yearly Dota 2 tournaments, which was watched by 25 million people. An analysis of 40 hours of video, over 46,000 live chat messages, and feedback of 98 audience members showed that Echo measurably affected the range and quality of storytelling, increased audience engagement, and invoked rich emotional response among viewers.

An Evaluation of Classification and Outlier Detection Algorithms.

Victoria Hodge and Jim Austin

ArXiv e–prints, 1805.00811, 2018, (arXiv:1805.00811 [stat.ML])

ArXiv E–print

Abstract

This paper evaluates algorithms for classification and outlier detection accuracies in temporal data. We focus on algorithms that train and classify rapidly and can be used for systems that need to incorporate new data regularly. Hence, we compare the accuracy of six fast algorithms using a range of well-known time-series datasets. The analyses demonstrate that the choice of algorithm is task and data specific but that we can derive heuristics for choosing. Gradient Boosting Machines are generally best for classification but there is no single winner for outlier detection though Gradient Boosting Machines (again) and Random Forest are better. Hence, we recommend running evaluations of a number of algorithms using our heuristics.

Win Prediction in Esports: Mixed–Rank Match Prediction in Multi-player Online Battle Arena Games.

Victoria Hodge, Sam Devlin, Nick Sephton, Florian Block, Anders Drachen, and Peter Cowling

ArXiv e–prints, 1711.06498, 2017, (arXiv:1711.06498 [cs.AI])

ArXiv E–print

Abstract

Esports has emerged as a popular genre for players as well as spectators, supporting a global entertainment industry. Esports analytics has evolved to address the requirement for data-driven feedback, and is focused on cyber-athlete evaluation, strategy and prediction. Towards the latter, previous work has used match data from a variety of player ranks from hobbyist to professional players. However, professional players have been shown to behave differently than lower ranked players. Given the comparatively limited supply of professional data, a key question is thus whether mixed-rank match datasets can be used to create data-driven models which predict winners in professional matches and provide a simple in-game statistic for viewers and broadcasters. Here we show that, although there is a slightly reduced accuracy, mixed-rank datasets can be used to predict the outcome of professional matches, with suitably optimized configurations.

Using Association Rule Mining to Predict Opponent Deck Content in Android: Netrunner.

N. Sephton, P. Cowling, S. Devlin, V. Hodge, and N. Slaven

Proceedings of IEEE Computational Intelligence and Games Conference (CIG 2016), Santorini, Greece, Sept. 20–23, 2016, pp. 102–109.

Conference Papers

Abstract

As part of their design, card games often include information that is hidden from opponents and represents a strategic advantage if discovered. A player that can discover this information will be able to alter their strategy based on the nature of that information, and therefore become a more competent opponent. In this paper, we employ association rule-mining techniques for predicting item multisets, and show them to be effective in predicting the content of Netrunner decks. We then apply different modifications based on heuristic knowledge of the Netrunner game, and show the effectiveness of techniques which consider this knowledge during rule generation and prediction.

A Conceptual Framework of Business Model Emerging Resilience.

N. Goumagias, K. Fernandes, I. Cabras, F. Li, J. Shao, S. Devlin, V. Hodge, P. Cowling and D. Kudenko

32nd European Group for Organization Studies (EGOS) Colloquium, Naples, Italy, July 7–9, 2016.

Conference Papers

Abstract

In this paper we introduce an environmentally driven conceptual framework of Business Model change. Business models acquired substantial momentum in academic literature during the past decade. Several studies focused on what exactly constitutes a Business Model (role model, recipe, architecture etc.) triggering a theoretical debate about the Business Model’s components and their corresponding dynamics and relationships. In this paper, we argue that Business Models, as cognitive structures, are highly influenced in terms of relevance by the context of application, which consequently enriches their functionality. As a result, the Business Model can be used either as a role model (benchmarking) or a recipe (strategy). For that purpose, we assume that the Business Model is embedded within the economic (task) environment, and consequently affected by it. Through a typology of the environmental impact on the Business Model productivity, we introduce a conceptual framework that aims to capture the salient features of Business Model emergent resilience as a reaction to two types of impact: productivity constraining and disturbing.

A Strategic Roadmap for Business Model Change for the Video-games Industry.

N. Goumagias, K. Purvis, K. Fernandes, I. Cabras, F. Li, J. Shao, S. Devlin, V. Hodge, P. Cowling and D. Kudenko

R&D Management Conference, Cambridge, UK, July 3–6, 2016.

Conference Papers

Abstract

The global video games industry has experienced exponential growth in terms of socioeconomic impact during the last 50 years. Surprisingly, little academic interest is directed towards the industry, particularly in the context of Business Model (BM) Change. As a technologically intensive creative industry, developing studios and publishers experience substantial internal and external forces to identify, and sustain, their competitive advantage. To achieve that, managers are called to systematically explore and exploit alternative BMs that are compatible with the company’s strategy. We build on empirical analysis of the video–games industry to construct a Toolkit that i) will help practitioners and academics to describe the industrial ecosystem of BMs more accurately, and ii) can be used as a strategic roadmap for managers to navigate through alternatives for entrepreneurial and growth purposes.

A Hadoop Neural Network for Parallel and Distributed Feature Selection.

Victoria Hodge, Simon O’Keefe and Jim Austin

Neural Networks, 78, Elsevier, June 2016, pp. 24–35.

Journal Articles

Abstract

In this paper, we introduce a theoretical basis for a Hadoop–based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative–memory neural network in Hadoop.

A Digital Repository and Execution Platform for Interactive Scholarly Publications in Neuroscience.

Victoria Hodge, Mark Jessop, Martyn Fletcher, Michael Weeks, Aaron Turner, Tom Jackson, Colin Ingram, Leslie Smith and Jim Austin.

NeuroInformatics, 14(1), Springer, January 2016, pp. 23–40.

Journal Articles

Abstract

The CARMEN Virtual Laboratory (VL) is a cloud-based platform which allows neuroscientists to store, share, develop, execute, reproduce and publicise their work. This paper describes new functionality in the CARMEN VL: an interactive publications repository. This new facility allows users to link data and software to publications. This enables other users to examine data and software associated with the publication and execute the associated software within the VL using the same data as the authors used in the publication. The cloud–based architecture and SaaS (Software as a Service) framework allow vast data sets to be uploaded and analysed using software services. Thus, this new interactive publications facility allows others to build on research results through reuse. This aligns with recent developments by funding agencies, institutions, and publishers with a move to open access research. Open access provides reproducibility and verification of research resources and results. Publications and their associated data and software will be assured of long–term preservation and curation in the repository. Further, analysing research data and the evaluations described in publications frequently requires a number of execution stages, many of which are iterative. The VL provides a scientific workflow environment to combine software services into a processing tree. These workflows can also be associated with publications and executed by users. The VL also provides a secure environment where users can decide the access rights for each resource to ensure copyright and privacy restrictions are met.

Wireless Sensor Networks for Condition Monitoring in the Railway Industry: a Survey.

Victoria Hodge, Simon O’Keefe, Michael Weeks and Anthony Moulds

IEEE Transactions on Intelligent Transportation Systems, 16(3), IEEE, June 2015, pp. 1088–1106

Journal Articles

Abstract

In recent years, the range of sensing technologies has expanded rapidly, whereas sensor devices have become cheaper. This has led to a rapid expansion in condition monitoring of systems, structures, vehicles, and machinery using sensors. Key factors are the recent advances in networking technologies such as wireless communication and mobile ad hoc networking coupled with the technology to integrate devices. Wireless sensor networks (WSNs) can be used for monitoring the railway infrastructure such as bridges, rail tracks, track beds, and track equipment along with vehicle health monitoring such as chassis, bogies, wheels, and wagons. Condition monitoring reduces human inspection requirements through automated monitoring, reduces maintenance through detecting faults before they escalate, and improves safety and reliability. This is vital for the development, upgrading, and expansion of railway networks. This paper surveys wireless sensor network technology for monitoring in the railway industry for analyzing systems, structures, vehicles, and machinery. This paper focuses on practical engineering solutions, principally, which sensor devices are used and what they are used for; and the identification of sensor configurations and network topologies. It identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review.

Short–Term Prediction of Traffic Flow Using a Binary Neural Network.

Victoria Hodge, Rajesh Krishnan, Jim Austin, John Polak and Tom Jackson

Neural Computing and Applications, 25(7–8), Springer, December 2014, pp. 1639–1655

Journal Articles

Abstract

This paper introduces a binary neural network–based prediction algorithm incorporating both spatial and temporal characteristics into the prediction process. The algorithm is used to predict short–term traffic flow by combining information from multiple traffic sensors (spatial lag) and time series prediction (temporal lag). It extends previously developed Advanced Uncertain Reasoning Architecture (AURA) k–nearest neighbour (k–NN) techniques. Our task was to produce a fast and accurate traffic flow predictor. The AURA k–NN predictor is comparable to other machine learning techniques with respect to recall accuracy but is able to train and predict rapidly. We incorporated consistency evaluations to determine whether the AURA k–NN has an ideal algorithmic configuration or an ideal data configuration or whether the settings needed to be varied for each data set. The results agree with previous research in that settings must be bespoke for each data set. This configuration process requires rapid and scalable learning to allow the predictor to be set up for new data. The fast processing abilities of the AURA k–NN ensure this combinatorial optimisation will be computationally feasible for real–world applications. We intend to use the predictor to proactively manage traffic by predicting traffic volumes to anticipate traffic network problems.

Outlier Detection in Big Data.

Victoria Hodge

J. Wang (ed.), Encyclopedia of Business Analytics and Optimization, Chapter 157, (Hershey, PA: IGI Global), 2014, pp. 1762–1771.

Book Chapters

Abstract

Outlier detection (or anomaly detection) is a fundamental task in data mining. Outliers are data that deviate from the norm and outlier detection is often compared to “finding a needle in a haystack.” However, the outliers may generate high value if they are found, value in terms of cost savings, improved efficiency, compute time savings, fraud reduction and failure prevention. Detection can identify faults before they escalate with potentially catastrophic consequences. Big Data refers to large, dynamic collections of data. These vast and complex data appear problematic for traditional outlier detection methods to process, but Big Data provides considerable opportunity to uncover new outliers and data relationships. This chapter highlights some of the research issues for outlier detection in Big Data and covers the solutions used and research directions taken along with an analysis of some current outlier detection approaches for Big Data applications.

A HADOOP–Based Framework for Parallel and Distributed Feature Selection.

Victoria Hodge, Tom Jackson and Jim Austin

Technical Report YCS–2013–485, Department of Computer Science, University of York, UK, Sept. 2013.

Tech Reports

Abstract

In this paper, we introduce a theoretical basis for a Hadoop-based framework for parallel and distributed feature selection. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of four feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop MapReduce. Hadoop allows parallel and distributed processing so each feature selector can be processed in parallel and multiple feature selectors can be processed together in parallel allowing multiple feature selectors to be compared. We identify commonalities among the four feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all four feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative–memory neural network in Hadoop.

A Survey of Outlier Detection Methodologies.

Victoria Hodge and Jim Austin

S. Babones (Ed.), Fundamentals of Regression Modeling, SAGE Publications, 2013. ISBN: 9781446208281

Chapter reprint of original work from 2004.

Book Chapters

Abstract

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set, and so purify the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

A metric for pattern–matching applications to traffic management.

Richard Mounce, Garry Hollier, Mike Smith, Victoria Hodge, Tom Jackson and Jim Austin

Transportation Research Part C: Emerging Technologies, 29, Elsevier Science, April 2013, pp. 148–155.

Journal Articles

Abstract

This paper considers signal plan selection; the main topic is the design of a system for utilising pattern matching to assist the timely selection of sound signal control plan changes. In this system, historical traffic flow data is continually searched, seeking traffic flow patterns similar to today’s. If, in one of these previous similar situations, (a) the signal plan utilised was different to that being utilised today and (b) it appears that the performance achieved was better than the performance likely to be achieved today, then the system recommends an appropriate signal plan switch. The heart of the system is “similarity”. Two traffic flow patterns (two time series of traffic flows arising from two different days) are said to be “similar” if the distance between them is small; similarity thus depends on how the metric or distance between two time series of traffic flows is defined. A simple example is given which suggests that utilising the standard Euclidean distance between the two sequences comprising cumulatives of traffic flow may be better than utilising the standard Euclidean distance between the original two sequences of traffic flow data. The paper also gives measured on–street public transport benefits which have arisen from using a simple rule–based (traffic–responsive) signal plan selection system, compared with a time–tabled signal plan selection system.
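As a rough illustration of the two distance definitions compared in the abstract above, the following sketch computes the Euclidean distance between raw flow sequences and between their cumulatives. The flow values and function names are hypothetical and are not taken from the paper; the example only shows one way in which cumulatives can separate patterns that raw flows treat as equally distant.

```python
import math
from itertools import accumulate


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cumulative_distance(a, b):
    """Euclidean distance between the running totals of two flow sequences."""
    return euclidean(list(accumulate(a)), list(accumulate(b)))


# Hypothetical vehicle counts per time interval (illustrative values only).
today = [10, 30, 10, 10]   # peak in the second interval
day_a = [10, 10, 30, 10]   # same peak, one interval later
day_b = [10, 10, 10, 30]   # same peak, two intervals later

raw_a = euclidean(today, day_a)
raw_b = euclidean(today, day_b)
cum_a = cumulative_distance(today, day_a)
cum_b = cumulative_distance(today, day_b)
```

In this toy case the raw distances to `day_a` and `day_b` are identical, whereas the cumulative distance is smaller for `day_a`, whose peak is closer in time to today's.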

The CARMEN software as a service infrastructure.

Michael Weeks, Mark Jessop, Martyn Fletcher, Victoria Hodge, Tom Jackson and Jim Austin

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1983), 2013

Journal Articles

Abstract

The CARMEN platform allows neuroscientists to share data, metadata, services and workflows, and to execute these services and workflows remotely via a Web portal. This paper describes how we implemented a service-based infrastructure into the CARMEN Virtual Laboratory. A Software as a Service framework was developed to allow generic new and legacy code to be deployed as services on a heterogeneous execution framework. Users can submit analysis code typically written in Matlab, Python, C/C++ and R as non–interactive standalone command–line applications and wrap them as services in a form suitable for deployment on the platform. The CARMEN Service Builder tool enables neuroscientists to quickly wrap their analysis software for deployment to the CARMEN platform as a service, without knowledge of the service framework or the CARMEN system. A metadata schema describes each service in terms of both system and user requirements. The search functionality allows services to be quickly discovered from the many services available. Within the platform, services may be combined into more complicated analyses using the workflow tool. CARMEN and the service infrastructure are targeted towards the neuroscience community; however, it is a generic platform, and can be targeted towards any discipline.

A Binary Neural Network Framework for Attribute Selection and Prediction.

Victoria Hodge, Tom Jackson and Jim Austin

Proceedings of the 4th International Conference on Neural Computation Theory and Applications (NCTA 2012), Barcelona, Spain, October 5–7, 2012, pp. 510–515.

Conference Papers

Abstract

In this paper, we introduce an implementation of the attribute selection algorithm, Correlation-based Feature Selection (CFS) integrated with our k–nearest neighbour (k–NN) framework. Binary neural networks underpin our k–NN and allow us to create a unified framework for attribute selection, prediction and classification. We apply the framework to a real world application of predicting bus journey times from traffic sensor data and show how attribute selection can both speed our k–NN and increase the prediction accuracy by removing noise and redundant attributes from the data.

Enhancing YouShare: the online collaboration research environment for sharing data and services.

Victoria Hodge, Aaron Turner, Martyn Fletcher, Mark Jessop, Michael Weeks, Tom Jackson and Jim Austin

Presented at Digital Research 2012, Oxford, UK, September 10–12, 2012.

Conference Papers

Abstract

This paper describes recent enhancements to the YouShare platform, the online collaboration environment, which allows researchers to share data and software applications and perform compute–intensive analysis tasks quickly and securely. The enhancements to the platform are a result of user feedback on the current system and technology advancements. These fall into four groups – better handling of searching, use of synonyms, the addition of a workflow tool and enhancements to the infrastructure. The paper outlines these improvements.

Discretisation of Data in a Binary Neural k–Nearest Neighbour Algorithm.

Victoria Hodge and Jim Austin

Technical Report YCS–2012–473, Department of Computer Science, University of York, UK, June 2012.

Tech Reports

Abstract

This paper evaluates several methods of discretisation (binning) within a k–Nearest Neighbour predictor. Our k–NN is constructed using binary neural networks which require continuous-valued data to be discretised to allow it to be mapped to the binary neural framework. Our approach uses discretisation coupled with robust encoding to map data sets onto the binary neural network. In this paper, we compare seven unsupervised discretisation methods for retrieval accuracy (prediction accuracy) across a range of well–known prediction data sets comprising time–series data. We analyse whether there is an optimal discretisation configuration for our k–NN. The analyses demonstrate that the configuration is data specific. Hence, we recommend running evaluations of a number of configurations, varying both the discretisation methods and the number of discretisation bins, using a test data set. This evaluation will pinpoint the optimum configuration for new data sets.
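To make the idea of unsupervised discretisation concrete, here is a minimal sketch of two widely used binning schemes, equal-width and equal-frequency. These two are chosen as an assumption for illustration; the abstract does not name the seven methods compared, and the function names and data below are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign values to n_bins so each bin holds roughly the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical skewed readings (illustrative values only).
readings = [1.0, 2.0, 3.0, 4.0, 50.0, 100.0]
by_width = equal_width_bins(readings, 3)
by_frequency = equal_frequency_bins(readings, 3)
```

On skewed data such as this, equal-width binning places most readings in the first bin, while equal-frequency binning spreads them evenly. This is one reason the best configuration may differ from one data set to another.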

Intelligent Decision Support using Pattern Matching.

Victoria Hodge, Tom Jackson and Jim Austin

Proceedings of the 1st International Workshop on Future Internet Applications for Traffic Surveillance and Management (FIATS–M 2011), Sofia, Bulgaria, October 2011, pp. 44–54.

Conference Papers

Abstract

The aim of our work is to develop Intelligent Decision Support (IDS) tools and techniques to convert traffic data into intelligence to assist network managers, operators and to aid the travelling public. The IDS system detects traffic problems, identifies the likely cause and recommends suitable interventions which are most likely to mitigate congestion of that traffic problem. In this paper, we propose to extend the existing tools to include dynamic hierarchical and distributed processing; algorithm optimisation using natural computation techniques; and, using a meta–learner to short–circuit the optimisation by learning the best settings for specific data set characteristics and using these settings to initialise the GA.

Outlier and Anomaly Detection: A Survey of Outlier and Anomaly Detection Methods

Victoria Hodge

Lambert Academic Publishing | 2011 | ISBN: 978–3–8465–4822–6.

Books

Abstract

An outlier or anomaly is a data point that is inconsistent with the rest of the data population. Outlier or anomaly detection has been used for centuries to detect and remove anomalous observations from data. It is used to monitor vital infrastructure such as utility distribution networks, transportation networks, machinery or computer networks for faults. Detection can identify faults before they escalate with potentially catastrophic consequences. Today, principled and systematic detection techniques are used, drawn from the full gamut of Computer Science and Statistics. The book forms a survey of techniques covering statistical, proximity–based, density–based, neural, natural computation, machine learning, distributed and hybrid systems. It identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review. It aims to provide the reader with a feel of the diversity and multiplicity of techniques available. The survey should be useful to advanced undergraduate and postgraduate computer and library/information science students and researchers analysing and developing outlier and anomaly detection systems.

Cumulatives and Errors in Pattern Matching for Intelligent Transport Systems.

Garry Hollier, Mike Smith, Victoria Hodge and Jim Austin

Proceedings of 2nd International Conference on Models and Technologies for Intelligent Transportation Systems, Leuven, Belgium, June 22–24, 2011.

Conference Papers

Abstract

Pattern recognition often relies on the distance between patterns, and the Euclidean distance is frequently used. In traffic studies, the sequence of flow cumulatives is more directly related to the character of the flow (e.g., congested or free), and might be expected to supply more meaningful traffic information than would a sequence of unsummed flows. However, if there is noise in the detection procedure, an error in the first count of a sequence would be carried forward to subsequent cumulatives, and would therefore, like errors in general, have a greater effect on the Euclidean distance between cumulatives than it would on that between the raw flows. This paper aims at providing a quantitative measure of the classification errors caused by noise on the Euclidean distance between sequences of cumulatives compared to those arising from distance calculations between raw sequences, and thus supplying a guideline for the situation when cumulatives or raw flows are to be used in the presence of noise.
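The error propagation described in the abstract above can be seen in a small sketch. The counts and the size of the error are hypothetical, chosen only to show how a single miscount in the first interval is carried into every later cumulative.

```python
import math
from itertools import accumulate


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical true counts, and the same counts with an error of +5
# vehicles in the first interval only.
true_flows = [10, 20, 30, 20]
noisy_flows = [15, 20, 30, 20]

# Distance between the raw sequences: only one term differs.
raw_error = euclidean(true_flows, noisy_flows)

# Distance between the cumulatives: every running total is off by 5.
cumulative_error = euclidean(
    list(accumulate(true_flows)), list(accumulate(noisy_flows))
)
```

Here the raw distance is 5, while the distance between cumulatives is 10, because the same error of 5 appears in all four running totals (the square root of 4 × 25).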

Splitting Rate Modelling for Intelligent Transport Systems.

Mike Smith, Richard Mounce, Garry Hollier, Victoria Hodge and Jim Austin

Proceedings of 2nd International Conference on Models and Technologies for Intelligent Transportation Systems, Leuven, Belgium, June 22–24, 2011.

Conference Papers

Abstract

This paper considers models of within day and day-to-day driver (or traveller) decisions; and focusses on a splitting rate method of modelling these decisions sequentially. The paper begins with a dynamical system proposed in Smith (1984), using route-swapping, and shows how this same system may be applied within a splitting rate model. Some stability and convergence results are to be presented. Finally the paper suggests how signal control may be introduced in a simple and helpful way. This combined routeing / signal control model may be utilized quite easily to yield initial “suggested” signal control interventions corresponding to specific incidents.

Short–Term Traffic Prediction Using a Binary Neural Network.

Victoria Hodge, Rajesh Krishnan, Tom Jackson, Jim Austin and John Polak

43rd Annual UTSG Conference, Open University, Milton Keynes, UK, January 5–7, 2011

Conference Papers

Abstract

This paper presents a binary neural network algorithm for short-term traffic flow prediction. The algorithm can process both univariate and multivariate data from a single traffic sensor using time series prediction (temporal lags) and can combine information from multiple traffic sensors with time series prediction (spatial–temporal lags). The algorithm provides Intelligent Decision Support (IDS) for road network managers to proactively manage problems on the network as the predictions generated may be used to determine if traffic control interventions need to be applied. The algorithm can operate in near-real-time and dynamically, using data from UTC or UTMC systems. It is based on the Advanced Uncertain Reasoning Architecture (AURA) k–nearest neighbour prediction algorithm, which is designed for scalability and fast performance. The AURA k–NN predictor outperforms other machine learning techniques with respect to prediction accuracy and is able to train and predict rapidly. The basic AURA k–NN time series prediction algorithm was extended by incorporating average daily profiles and variable weighting into the prediction in this paper. The average daily profile of a variable is calculated as the average reading of the variable for a particular time of day and day of the week after removing outliers. When data vectors are matched in the AURA k–NN, the daily profile adds an extra dimension to the match. This process was further enhanced by weighting the profile using variable weighting to vary the profile’s significance. It is shown that incorporating these two additional aspects improves the accuracy of the prediction compared to the standard AURA k–NN, resulting in a very fast and accurate traffic prediction tool.

Integrating Information Retrieval with Artificial Neural Networks: Implementing a Modular Information Retrieval System using Artificial Neural Networks

Victoria Hodge

Lambert Academic Publishing | 2010 | ISBN: 978–3–8433–7966–3.

Books

Abstract

Information Retrieval (IR) is a field of computer science investigating the automated storage and retrieval of information, particularly documents. The amount of stored information has expanded rapidly and now, vast repositories of information on almost every conceivable subject are available to be searched. Effective IR involves: understanding the needs of users; handling the vagaries and ambiguities of language and human errors; and developing an efficient and accurate storage and search system. This book provides an analysis of IR techniques and systems assessing their strengths and weaknesses. This analysis then provides the motivation for the proposed system of three integrated modules: a novel spell checker based on a binary neural network; a thesaurus generated from a dynamic growing neural network; and, an efficient word–to–document index. The book provides a detailed description of the implementation and evaluation of the proposed system. The IR analyses and system development should be useful to advanced undergraduate and postgraduate computer and library/information science students and researchers analysing and developing IR systems.

Intelligent Decision Support for Traffic Management.

Rajesh Krishnan, Victoria Hodge, Jim Austin, John Polak, Tom Jackson, Mike Smith and Tzu–Chang Lee

Proceedings of 17th ITS World Congress: (CD–ROM), Busan, Korea, October 25–29, 2010.

Conference Papers

Abstract

Urban traffic control systems, such as the widely deployed SCOOT system, incrementally respond to changing traffic conditions. Such systems are often complemented by traffic control centres where road network managers intervene manually to mitigate rapidly developing congestion events. An Intelligent Decision Support (IDS) system developed by the authors within the UK FREEFLOW (FF) project to aid the network managers is presented in this paper. The primary objective of the FF IDS system is to identify traffic congestion in near-real-time and to recommend appropriate traffic control intervention measures. The FF–IDS consists of multiple internal components. A state estimation component monitors live traffic sensor data and determines if there is a congestion problem on the road network. If a problem is identified, a binary neural pattern-matching component is used to identify past time periods with similar congestion events. This is able to rapidly search large historic traffic datasets finding sets of traffic control interventions carried out during similar historical time periods. The effectiveness of each intervention is evaluated using a Performance Index (PI) and the intervention that resulted in the highest improvement in PI is recommended to network managers. The FF-IDS system can also present traffic incidents and equipment faults that occurred during these historical time periods to the network manager as potential causes of the problem. This paper describes the FF–IDS system in detail. The system is currently under development. An early version of the FF–IDS system was trialled using off-line data from London, yielding encouraging preliminary results.

AURA–Alert: The use of binary associative memories for condition monitoring applications.

Jim Austin, Grant Brewer, Tom Jackson and Victoria Hodge

Proceedings of 7th International Conference on Condition Monitoring and Machinery Failure Prevention Technologies: (CM 2010 and MFPT 2010), Stratford–upon–Avon, England, June 22–24, 2010, pp. 699–711.

Conference Papers

Abstract

Many Condition Monitoring (CM) domains are suffering from the dual challenges of substantial increases in the volumes of data being produced and collected by sensing systems, and the challenges of modelling increasing complexity in the remote monitored systems. These two issues give rise to the problem that fast and reliable data mining of CM data is a computationally demanding task for real–time (or near real–time) applications. We present the use of AURA [1], a class of binary associative network built on correlation matrix memories (CMMs), as an underpinning technology for efficient, scalable pattern recognition in complex and large scale CM applications. AURA is a class of binary neural network. However, it has a number of advantages over standard neural network techniques for CM pattern classification tasks. These include: high levels of data compression, one-pass training for on–line training, a scalable architecture that can be readily mapped onto high performance computing platforms, and a sound theoretical basis to determine the bounds of the system operation. We describe applications illustrating how the AURA system can be optimised to create an extremely efficient and scalable k–nearest neighbour classifier for multi–variate models. We will also illustrate how the one-pass training capability of the AURA system can be used as the basis of normality and exception modelling in complex CM systems. This latter application has particularly powerful advantages for fault detection models in domains which are characterised by highly dynamic trends or drifting in the standard operational mode of a system, and which, as a result, are extremely difficult to accurately model. The application of the AURA techniques will be illustrated with industry led exemplars in the transport and energy sectors.

A computationally efficient method for online identification of traffic incidents and network equipment failures.

Victoria Hodge, Rajesh Krishnan, Jim Austin and John Polak.

Transport Science and Technology Congress: TRANSTEC 2010, Delhi, April 4–7, 2010.

Conference Papers

Abstract

Despite the vast wealth of traffic data available, currently there is only limited integration, analysis and utilisation of data in the transport domain. Yet, accurate congestion and incident detection is vital for traffic network operators to allow them to mitigate the cost of traffic incidents. Recurrent (cyclical) traffic congestion tends to be managed using timetabled control measures or through the use of adaptive traffic control systems such as SCOOT and SCATS. However, for non-recurrent congestion with rapid onset, such as the congestion caused by a traffic incident or traffic equipment failure, traffic network operators have to quickly detect the problem and then determine the likely cause before selecting the most appropriate action to both manage the traffic network and mitigate the congestion. This is a complex task requiring specialist knowledge where assistance from automated tools will help facilitate the operator tasks. Automated detection is becoming an increasingly viable option due to the increased use of traffic sensors in the road network. Therefore, the aim of the FREEFLOW project is to provide an Intelligent Decision Support (IDS) tool which is designed to complement existing fixed-time traffic control systems and adaptive systems SCOOT and SCATS. IDS will use traffic sensor data to rapidly identify traffic problems, recommend appropriate interventions that worked in the past for similar problems and assist the traffic network operators to pinpoint the cause of the problem. Recommendations will be displayed to the network operator who will use this knowledge to select the most appropriate course of action. This paper describes and analyses the components of the IDS tool used for identifying incidents and faulty equipment.

On Identifying Spatial Traffic Patterns using Advanced Pattern Matching Techniques.

Rajesh Krishnan, Victoria Hodge, Jim Austin, John Polak and Tzu–Chang Lee.

Proceedings of Transportation Research Board (TRB) 89th Annual Meeting, Washington, D.C., January 10–14, 2010. (DVD–ROM: 2010 TRB 89th Annual Meeting: Compendium of Papers).

Conference Papers

Abstract

The k-nearest neighbor algorithm (k–NN) has been used in the literature for traffic state estimation and prediction over the last decade or so. A number of such multivariate methods use input data from more than one traffic sensor. While a significant amount of discussion can be found in the literature aiming towards optimising the parameters of the k–NN for better accuracy of such models, limited research is available on configuring the k–NN to differentiate between different spatial patterns in the multivariate models. This paper presents an approach to distinguish spatial patterns from one another reliably in traffic variables observed using a number of point–based sensors in a neighbourhood of road links. The application of the proposed approach is demonstrated using AURA, a fast binary pattern matching tool based on neural networks. Two different spatial patterns of traffic congestion plus non–congested situations are simulated using a PARAMICS micro–simulation model. The AURA software is used to identify similar time periods of congestion using data from a congested time period as input using conventional and proposed distance metrics. It is shown that the proposed distance metrics can identify different spatial congestion patterns better than conventional methods. This method will be useful for traffic estimation and prediction methods that use the k–nearest neighbor algorithm or its variants.

A Computationally Efficient Method for Online Identification of Traffic Control Intervention Measures.

Rajesh Krishnan, Victoria Hodge, Jim Austin and John Polak

42nd Annual UTSG Conference, Centre for Sustainable Transport, University of Plymouth, UK: January 5–7, 2010

Conference Papers

Abstract

Adaptive traffic control systems such as SCOOT and SCATS are designed to respond to changes in traffic conditions and provide heuristically optimised traffic signal settings. However, these systems make gradual changes to signal settings in response to changing traffic conditions. In the EPSRC and TSB funded FREEFLOW project, a tool is being designed to rapidly identify severe traffic problems using traffic sensor data and recommend traffic signal plans and UTC parameters that have worked well in the past under similar traffic conditions for immediate implementation. This paper will present an overview of this tool, called the Intelligent Decision Support (IDS), that is designed to complement adaptive traffic control systems. The IDS is essentially a learning based system. It requires an historic database of traffic sensor data and traffic control intervention data for the application area as a knowledge base. The IDS, when deployed online, will monitor traffic sensor data to determine if the network is congested using traffic state estimation models. When IDS identifies congestion in the network, the historic database is queried for similar congestion events, where the similarity is based on both the severity and the spatial pattern of congestion. Traffic control interventions implemented during similar congestion events in the historic database are then evaluated for their effectiveness to mitigate congestion. The most effective traffic control interventions are recommended by IDS for implementation, along with an associated confidence indicator. The IDS is designed to work online against large historic datasets, and is based on traffic state estimation models developed at Imperial College London and pattern matching tools developed at the University of York. The IDS is tested offline using Inductive Loop Detector (ILD) data obtained from the ASTRID system and traffic control intervention data obtained from the UTC system at Transport for London (TfL) during its development. This paper presents the preliminary results using TfL data and outlines future research avenues in the development of IDS.

Data, Intelligent Decision Support and Pattern Matching.

Victoria Hodge, Mike Smith and Jim Austin

Procs of the XIII Meeting of the Euro Working Group on Transportation (EWGT’2009), Advances in Transportation Systems Analysis, Padua, Italy, Sept. 23–25, 2009.

Conference Papers

Abstract

Although a substantial amount of research has examined the constructs of warmth and competence, far less has examined how these constructs develop and what benefits may accrue when warmth and competence are cultivated. Yet there are positive consequences, both emotional and behavioral, that are likely to occur when brands hold perceptions of both. In this paper, we shed light on when and how warmth and competence are jointly promoted in brands, and why these reputations matter.

Optimising Activation of Bus Pre-signals.

Victoria Hodge, Tom Jackson and Jim Austin

Models and Technologies for Intelligent Transportation Systems, Proceedings of the International Conference: (G. Fusco, Ed.), Rome, June 22–23, 2009, pp. 344–353

Conference Papers

Abstract

This report describes preliminary analysis of strategies to activate and deactivate a bus pre-signal using vehicle count data. The bus pre–signal currently operates during preset times to regulate access to a length of road controlled at the other end by vehicle-actuated traffic signals. However, vehicle flows at the pre–signal vary on a daily basis so a more demand-based approach would be more effective. There has been much research performed to optimise pre–signal cycle times and bus priority at pre–signals. We focus on identifying the optimal strategy to activate and deactivate the bus pre–signal using vehicle demand rather than the current fixed time strategy. The ideal strategy should be stable, robust, consistent and timely. We investigate strategies using vehicle counts, queueing theory and estimation and prediction. Our recommended strategy combines aspects of all three areas.

Intelligent Car Park Routeing for Road Traffic.

Victoria Hodge, Mike Smith and Jim Austin

Models and Technologies for Intelligent Transportation Systems, Proceedings of the International Conference: (G. Fusco, Ed.), Rome, June 22–23, 2009, pp. 344–353

Conference Papers

Abstract

The twin problems of congestion and inner-city parking limitations affect many cities. One solution is to promote the use of Park and Ride sites. However, for effective use of the sites, drivers need to know where the sites are and which is the "best" site to use. This work introduces a methodology to pinpoint and guide drivers to the best Park and Ride site from their current location. While drivers may be able to obtain traffic, car park location and free space data individually, the information is not usually coordinated. By fusing up–to–date details of traffic jams, roadworks and accidents coupled with free parking spaces and combining this with a novel route weighting methodology, we are able to ensure that intelligent information is displayed to guide drivers. The method uses optimised data structures and proprietary scoring measures to ensure it is fast and accurate. The method provides a simple and low cost solution through the use of existing technologies to display information to drivers.

A Binary Neural Shape Matcher using Johnson Counters and Chain Codes.

Victoria Hodge, Simon O’Keefe and Jim Austin

NeuroComputing, 72(2009), Elsevier Science, 2009, pp. 693–703.

Journal Articles

Abstract

In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content–based image retrieval systems. Chain codes describe shapes using sequences of numbers. They are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative–memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.
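
The pairing of chain codes with Johnson counter codes can be illustrated with a small sketch. It assumes an 8-direction (Freeman-style) chain code and a 4-bit Johnson counter; the function names and the exact bit layout are illustrative assumptions, not the encoding used in the paper.

```python
# Illustrative sketch only: names, direction numbering and bit layout are assumptions.

# 8-direction chain code: map a unit step (dx, dy) to a direction number 0..7.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Describe a boundary as the sequence of directions between successive points."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def johnson_code(value, bits=4):
    """Johnson (twisted-ring) counter code: `bits` bits represent 2 * bits values."""
    if not 0 <= value < 2 * bits:
        raise ValueError("value out of range for this counter width")
    if value <= bits:
        # Ones fill in from the left: 0000, 1000, 1100, 1110, 1111
        return [1] * value + [0] * (bits - value)
    # Zeros then fill in from the left: 0111, 0011, 0001
    zeros = value - bits
    return [0] * zeros + [1] * (bits - zeros)


def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_shape(points):
    """Concatenate the Johnson codes of every chain code element into one binary vector."""
    encoded = []
    for direction in chain_code(points):
        encoded.extend(johnson_code(direction))
    return encoded


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))
print(encode_shape(square))
```

In this layout, neighbouring directions (including the wrap-around from 7 back to 0) differ in exactly one bit, so similar chain code elements produce similar binary patterns, which is one plausible reason such a code suits a binary associative memory.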

Identifying Perceptual Structures In Trademark Images.

Victoria Hodge, Garry Hollier, Jim Austin and John Eakins

Proceedings of Fifth IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2008), Innsbruck, Austria, February 13–15, 2008.

Conference Papers

Abstract

In this paper we focus on identifying image structures at different levels in figurative (trademark) images to allow higher level similarity between images to be inferred. To identify image structures at different levels, it is desirable to be able to achieve multiple views of an image at different scales and then extract perceptually–relevant shapes from the different views. The three aims of this work are: to generate multiple views of each image in a principled manner, to identify structures and shapes at different levels within images and to emulate the Gestalt principles to guide shape finding. The proposed integrated approach is able to meet all three aims.

Inducing a Perceptual Relevance Shape Classifier.

Victoria Hodge, John Eakins and Jim Austin

Proceedings of ACM CIVR 2007: The 6th International Conference on Image and Video Retrieval, University of Amsterdam, The Netherlands, July 9–11 2007.

Conference Papers

Abstract

In this paper, we develop a system to classify the outputs of image segmentation algorithms as perceptually relevant or perceptually irrelevant with respect to human perception. The work is aimed at figurative images. We previously investigated human visual perception of trademark images and established a body of ground truth data in the form of trademark images and their respective human segmentations. The work indicated that there is a core set of segmentations for each image that people perceive. Here we use this core set of segmentations to train a classifier to classify closed shapes output from an image segmentation algorithm so that the method returns the image segments that match those produced by people. We demonstrate that a perceptual relevance classifier is attainable and identify a good methodology to achieve this. The paper compares MLP, SVM, Bayes and regression classifiers for classifying shapes. MLPs perform best with an overall accuracy of 96.4%.

Layout Indexing of Trademark Images.

Reinier H. van Leuken, M. Fatih Demirci, Victoria Hodge, Jim Austin and Remco C. Veltkamp

Proceedings of ACM CIVR 2007: The 6th International Conference on Image and Video Retrieval, University of Amsterdam, The Netherlands, July 9–11 2007.

Conference Papers

Abstract

Ensuring the uniqueness of trademark images and protecting their identities are the most important objectives for the trademark registration process. To prevent trademark infringement, each new trademark must be compared to a database of existing trademarks. Given a newly designed trademark image, trademark retrieval systems are not only concerned with finding images with similar shapes but also locating images with similar layouts. Performing a linear search, i.e., computing the similarity between the query and each database entry and selecting the closest one, is inefficient for large database systems. An effective and efficient indexing mechanism is, therefore, essential to select a small collection of candidates. This paper proposes a framework in which a graph-based indexing schema will be applied to facilitate efficient trademark retrieval based on spatial relations between image components, regardless of mutual shape similarity. Our framework starts by segmenting trademark images into distinct shapes using a shape identification algorithm. Identified shapes are then encoded automatically into an attributed graph whose vertices represent shapes and whose edges show spatial relations (both directional and topological) between the shapes. Using a graph–based indexing schema, the topological structure of the graph as well as that of its subgraphs are represented as vectors in which the components correspond to the sorted Laplacian eigenvalues of the graph or subgraphs. Having established the signatures, the indexing amounts to a nearest neighbour search in a model database. For a query graph and a large graph data set, the indexing problem is reformulated as that of fast selection of candidate graphs whose signatures are close to the query signature in the vector space. An extensive set of recognition trials, including a comparison with manually constructed graphs, shows the efficacy of both the automatic graph construction process and the indexing schema.

A Binary Neural Shape Matcher using Johnson Counters and Chain Codes.

Victoria Hodge, Simon O’Keefe and Jim Austin

In Second International Conference on Brain Inspired Cognitive Systems 2006 (BICS 2006), Island of Lesvos, Greece. October 10–14, 2006.

Conference Papers

Abstract

In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes using sequences of numbers. They are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative–memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.

A Binary Neural Decision Table Classifier.

Victoria Hodge, Simon O’Keefe and Jim Austin

NeuroComputing, 69(16–18), October, 2006, pp. 1850–1859.

Journal Articles

Abstract

In this paper, we introduce a neural network–based decision table algorithm. We focus on the implementation details of the decision table algorithm when it is constructed using the neural network. Decision tables are simple supervised classifiers which, Kohavi demonstrated, can outperform state–of–the–art classifiers such as C4.5. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. Initially, we demonstrate how the binary associative-memory neural network can form the decision table index to map between attribute values and data records and subsequently we show how two attribute selection algorithms can be used to pre–select attributes for this decision table. The attribute selection algorithms are easily implemented within the same binary associative–memory framework producing a tightly coupled, two–tier system allowing attribute selection and decision table indexing. The first attribute selector uses mutual information between attributes and classes to select the attributes that classify best. The second attribute selector uses a probabilistic approach to evaluate randomly selected attribute subsets.
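
The first attribute selector ranks attributes by their mutual information with the class. The sketch below shows that ranking criterion with plain frequency counting; it does not reproduce the binary associative-memory implementation described in the paper, and the function names and toy data are assumptions for illustration.

```python
# Illustrative sketch only: plain counting stands in for the associative-memory framework.
from collections import Counter
from math import log2


def mutual_information(attribute_values, class_labels):
    """I(A; C) in bits, estimated from co-occurrence counts of attribute value and class."""
    n = len(class_labels)
    joint = Counter(zip(attribute_values, class_labels))
    attr = Counter(attribute_values)
    cls = Counter(class_labels)
    mi = 0.0
    for (a, c), count in joint.items():
        p_ac = count / n
        # p(a, c) / (p(a) * p(c)) simplifies to count * n / (attr[a] * cls[c])
        mi += p_ac * log2(count * n / (attr[a] * cls[c]))
    return mi


def rank_attributes(records, class_labels):
    """Return attribute indices ordered from most to least informative about the class."""
    n_attributes = len(records[0])
    scores = [
        mutual_information([record[i] for record in records], class_labels)
        for i in range(n_attributes)
    ]
    return sorted(range(n_attributes), key=lambda i: -scores[i])


# Hypothetical toy data: attribute 0 determines the class, attribute 1 is unrelated to it.
records = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(rank_attributes(records, labels))
```

A decision table would then be indexed on only the top-ranked attributes.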

Eliciting Perceptual Ground Truth for Image Segmentation

Victoria Hodge, Garry Hollier, John Eakins and Jim Austin

Proceedings of Image and Video Retrieval, 5th International Conference, CIVR 2006, Tempe, AZ, USA, July 13–15, 2006. H. Sundaram, M.R. Naphade, J.R. Smith, Y. Rui (Eds.), Lecture Notes in Computer Science 4071, pp. 320–329.

Conference Papers

Abstract

In this paper, we investigate human visual perception and establish a body of ground truth data elicited from human visual studies. We aim to build on the formative work of Ren, Eakins and Briggs who produced an initial ground truth database. Human participants were asked to draw and rank their perceptions of the parts of a series of figurative images. These rankings were then used to score the perceptions, identify the preferred human breakdowns and thus allow us to induce perceptual rules for human decomposition of figurative images. The results suggest that the human breakdowns follow well–known perceptual principles, in particular the Gestalt laws.

Eliciting Perceptual Ground Truth for Image Segmentation

Victoria Hodge, John Eakins and Jim Austin

Technical Report YCS–2006–401, Department of Computer Science, University of York, UK, January 2006

Tech Reports

Abstract

In this paper, we investigate human visual perception and establish a body of ground truth data elicited from human visual studies. We aim to build on the formative work of Ren, Eakins and Briggs who produced an initial ground truth database. Human subjects were asked to draw and rank their perceptions of the parts of a series of figurative images. These rankings were then used to score the perceptions, identify the preferred human breakdowns and thus allow us to induce perceptual rules for human decomposition of figurative images. The results suggest that the human breakdowns follow well–known perceptual principles, in particular the Gestalt laws.

A Binary Neural k–Nearest Neighbour Technique.

Victoria Hodge and Jim Austin

Knowledge and Information Systems, 8(3), Springer–Verlag London Ltd, September 2005, pp. 276–292.

Journal Articles

Abstract

K-Nearest Neighbour (k–NN) is a widely used technique for classifying and clustering data. k–NN is effective but is often criticised for its polynomial run–time growth as k–NN calculates the distance to every other record in the data set for each record in turn. This paper evaluates a novel k–NN classifier with linear growth and faster run-time built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and real-valued data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall the k–best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations.

A Survey of Outlier Detection Methodologies.

Victoria Hodge and Jim Austin

Artificial Intelligence Review, 22(2), Kluwer Academic Publishers, October 2004, pp. 85–126.

Journal Articles

Abstract

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

A Binary Neural Decision Table Classifier.

Victoria Hodge, Simon O’Keefe and Jim Austin

Proceedings Brain Inspired Cognitive Systems 2004 (BICS 2004), University of Stirling, UK, August 29–September 1, 2004

Conference Papers

Abstract

In this paper, we introduce a neural network-based decision table algorithm. We focus on the implementation details of the decision table algorithm when it is constructed using the neural network. Decision tables are simple supervised classifiers which, Kohavi demonstrated, can outperform state–of–the–art classifiers such as C4.5. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We demonstrate how the binary associative-memory neural network can form the decision table index to map between attribute values and data records. We also show how two attribute selection algorithms, which may be used to pre–select the attributes for the decision table, can easily be implemented within the binary associative–memory neural framework. The first attribute selector uses mutual information between attributes and classes to select the attributes that classify best. The second attribute selector uses a probabilistic approach to evaluate randomly selected attribute subsets.

A High Performance k–NN Approach Using Binary Neural Networks.

Victoria Hodge, Ken Lees and Jim Austin

Neural Networks, 17(3), Elsevier Science, April 2004, pp. 441–458.

Journal Articles

Abstract

This paper evaluates a novel k–nearest neighbour (k–NN) classifier built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and numeric data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall a candidate set of matching records, which are then processed by a conventional k–NN approach to determine the k–best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations.
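
The two-stage idea in this abstract, a fast recall of candidate records followed by a conventional k–NN over only those candidates, can be sketched as follows. The coarse binning filter is a hypothetical stand-in for the binary neural network recall, and the parameter names are assumptions.

```python
# Illustrative sketch only: the binning filter is a stand-in, not the binary neural network.
from math import dist


def bin_codes(record, bin_width=1.0):
    """Quantise each numeric attribute into an integer bin."""
    return tuple(int(value // bin_width) for value in record)


def candidate_set(query, records, bin_width=1.0, min_shared=1):
    """Stage 1: keep records that share at least `min_shared` attribute bins with the query."""
    query_bins = bin_codes(query, bin_width)
    candidates = []
    for index, record in enumerate(records):
        shared = sum(a == b for a, b in zip(query_bins, bin_codes(record, bin_width)))
        if shared >= min_shared:
            candidates.append(index)
    return candidates


def knn(query, records, k, bin_width=1.0, min_shared=1):
    """Stage 2: conventional k-NN (Euclidean distance) restricted to the candidate set."""
    candidates = candidate_set(query, records, bin_width, min_shared)
    ranked = sorted(candidates, key=lambda index: dist(query, records[index]))
    return ranked[:k]


# Hypothetical toy data.
records = [(0.1, 0.2), (0.4, 0.9), (5.0, 5.0), (0.9, 0.1)]
print(knn((0.2, 0.2), records, k=2))
```

The distance calculation only touches the records that survive the first stage, which is where the speed-up over an exhaustive scan would come from.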

A Comparison of Standard Spell Checking Algorithms and a Binary Neural Approach.

Victoria Hodge and Jim Austin

IEEE Transactions on Knowledge and Data Engineering, 15(5), September-October 2003, pp. 1073–1081.

Journal Articles

Abstract

In this paper, we propose a simple, flexible, and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning, and associative matching in the AURA neural system. We integrate Hamming Distance and n–gram algorithms that have high recall for typing errors and a phonetic spell–checking algorithm in a single novel architecture. Our approach is suitable for any spell–checking application though aimed toward isolated word error correction, particularly spell checking user queries in a search engine. We use a novel scoring scheme to integrate the retrieved words from each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. We evaluate our approach against several benchmark spell–checking algorithms for recall accuracy. Our proposed hybrid methodology has the highest recall rate of the techniques evaluated. The method has a high recall rate and low computational cost.
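
To illustrate how Hamming-style and n–gram matches can be combined into one ranking, here is a minimal sketch. The normalisation and the simple sum used to merge the two scores are assumptions made for illustration; they are not the paper's scoring scheme, and the phonetic module is omitted.

```python
# Illustrative sketch only: the score combination is an assumption, not the paper's scheme.

def hamming_score(query, word):
    """Fraction of positions where the two words have the same letter."""
    matches = sum(a == b for a, b in zip(query, word))
    return matches / max(len(query), len(word))


def ngrams(word, n=2):
    """Set of contiguous n-letter substrings of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_score(query, word, n=2):
    """Shared n-grams as a fraction of all distinct n-grams in either word."""
    query_grams, word_grams = ngrams(query, n), ngrams(word, n)
    return len(query_grams & word_grams) / max(len(query_grams | word_grams), 1)


def rank_candidates(query, lexicon, n=2):
    """Rank lexicon words by the sum of the two scores, best match first."""
    scored = [(hamming_score(query, word) + ngram_score(query, word, n), word)
              for word in lexicon]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


# Hypothetical toy lexicon.
print(rank_candidates("helo", ["world", "halo", "help", "hello"]))
```

Position-based matching favours substitution errors, while n–gram overlap tolerates insertions and deletions, so a combined score covers both kinds of typing error.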

Improved AURA k–Nearest Neighbour Approach.

Michael Weeks, Vicky Hodge, Simon O’Keefe, Jim Austin and Ken Lees

Proceedings of International Work–conference on Artificial and Natural Neural Networks (IWANN–2003), Menorca, Spain, June 3–6, 2003. Lecture Notes in Computer Science (LNCS) 2687, Springer Verlag, Berlin.

Conference Papers

Abstract

The k-Nearest Neighbour (kNN) approach is a widely-used technique for pattern classification. Ranked distance measurements to a known sample set determine the classification of unknown samples. Though effective, kNN, like most classification methods, does not scale well with increased sample size. This is due to there being a relationship between the unknown query and every other sample in the data space. In order to make this operation scalable, we apply AURA to the kNN problem. AURA is a highly–scalable associative-memory based binary neural-network intended for high-speed approximate search and match operations on large unstructured datasets. Previous work has seen AURA methods applied to this problem as a scalable, but approximate kNN classifier. This paper continues this work by using AURA in conjunction with kernel-based input vectors, in order to create a fast scalable kNN classifier, whilst improving recall accuracy to levels similar to standard kNN implementations.

A Comparison of a Novel Neural Spell Checker and Standard Spell Checking Algorithms.

Victoria Hodge and Jim Austin

Pattern Recognition, 35(11), Elsevier Science, 2002, pp. 2571–2580.

Journal Articles

Abstract

In this paper, we propose a simple and flexible spell checker using efficient associative matching in the AURA modular neural system. Our approach aims to provide a pre-processor for an information retrieval (IR) system allowing the user’s query to be checked against a lexicon and any spelling errors corrected, to prevent wasted searching. IR searching is computationally intensive, so preventing futile searches minimises computational cost. We evaluate our approach against several commonly used spell checking techniques for memory–use, retrieval speed and recall accuracy. The proposed methodology has low memory use, high speed for word presence checking, reasonable speed for spell checking and a high recall rate.

Hierarchical Word Clustering: automatic thesaurus generation.

Victoria Hodge and Jim Austin

NeuroComputing, 48(1–4), Elsevier Science, 2002, pp. 819–846.

Journal Articles

Abstract

In this paper, we propose a hierarchical, lexical clustering neural network algorithm that automatically generates a thesaurus (synonym abstraction) using purely stochastic information derived from unstructured text corpora and requiring no prior word classifications. The lexical hierarchy overcomes the Vocabulary Problem by accommodating paraphrasing through using synonym clusters and overcomes Information Overload by focusing search within cohesive clusters. We describe existing word categorisation methodologies, identifying their respective strengths and weaknesses and evaluate our proposed approach against an existing neural approach using a benchmark statistical approach and a human generated thesaurus for comparison. We also evaluate our word context vector generation methodology against two similar approaches to investigate the effect of word vector dimensionality and the effect of the number of words in the context window on the quality of word clusters produced. We demonstrate the effectiveness of our approach and its superiority to existing techniques.

Scalability of a Distributed Neural Information Retrieval System.

Michael Weeks, Victoria Hodge and Jim Austin

Presented at the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC–2002), Edinburgh, UK, July 24–26, 2002.

Conference Papers

Abstract

AURA (Advanced Uncertain Reasoning Architecture) is a generic family of techniques and implementations intended for high–speed approximate search and match operations on large unstructured datasets. AURA technology is fast, economical, and offers unique advantages for finding near-matches not available with other methods. AURA is based upon a high–performance binary neural network called a correlation matrix memory (CMM). Typically, several CMM elements are used in combination to solve soft or fuzzy pattern–matching problems. AURA takes large volumes of data and constructs a special type of compressed index. AURA finds exact and near–matches between indexed records and a given query, where the query itself may have omissions and errors. The degree of nearness required during matching can be varied through thresholding techniques. The PCI-based PRESENCE (Parallel Structured Neural Computing Engine) card is a hardware-accelerator architecture for the core CMM computations needed in AURA–based applications. The card is designed for use in low–cost workstations and incorporates 128 MByte of low–cost DRAM for CMM storage. To investigate the scalability of the distributed AURA system, we implement a word–to–document index of an AURA–based information retrieval system, called MinerTaur, over a distributed PRESENCE CMM.
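
The behaviour of a binary correlation matrix memory, including how thresholding controls the nearness of a match, can be sketched in a few lines. This is a generic textbook-style CMM written for illustration; the class and method names are assumptions and do not correspond to the AURA library or the PRESENCE hardware interface.

```python
# Illustrative sketch only: a generic binary CMM, not the AURA or PRESENCE API.

class BinaryCMM:
    """Binary matrix that associates input bit patterns with output bit patterns."""

    def __init__(self, n_inputs, n_outputs):
        self.rows = [[0] * n_inputs for _ in range(n_outputs)]

    def train(self, input_bits, output_bits):
        """Store one association by OR-ing the outer product into the matrix."""
        for r, out in enumerate(output_bits):
            if out:
                for c, inp in enumerate(input_bits):
                    if inp:
                        self.rows[r][c] = 1

    def recall(self, input_bits, threshold):
        """Sum the matching bits per output row and keep rows that reach the threshold."""
        sums = [sum(w & x for w, x in zip(row, input_bits)) for row in self.rows]
        return [1 if s >= threshold else 0 for s in sums]


# Hypothetical toy index: each output bit stands for one stored record.
cmm = BinaryCMM(n_inputs=6, n_outputs=3)
cmm.train([1, 1, 0, 0, 1, 0], [1, 0, 0])
cmm.train([0, 1, 1, 0, 0, 1], [0, 1, 0])
cmm.train([1, 0, 0, 1, 0, 1], [0, 0, 1])

print(cmm.recall([1, 1, 0, 0, 1, 0], threshold=3))  # exact query, strict threshold
print(cmm.recall([1, 1, 0, 0, 0, 0], threshold=2))  # query with one bit omitted
print(cmm.recall([1, 1, 0, 0, 0, 0], threshold=1))  # looser threshold, more near-matches
```

Lowering the threshold admits records that share fewer bits with the query, which is how a query with omissions or errors can still retrieve its intended record.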

A Hardware Accelerated Novel IR System.

Michael Weeks, Victoria Hodge and Jim Austin

Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network–based Processing (PDP–2002), Las Palmas de Gran Canaria, Spain, January 9–11, 2002.

Conference Papers

Abstract

AURA (Advanced Uncertain Reasoning Architecture) is a generic family of techniques and implementations intended for high-speed approximate search and match operations on large unstructured datasets. This paper continues the AURA II (Advanced Uncertain Reasoning Architecture) project’s research into distributed binary Correlation Matrix Memory (CMM) based upon the PRESENCE (PaRallEl Structured Neural Computing Engine) hardware architecture [14]. Previous work has described how CMMs can be seamlessly implemented onto multiple hardware PRESENCE cards to accelerate core CMM operations. To demonstrate the system, this paper describes how a novel CMM-based information retrieval (IR) system, called MinerTaur, was implemented using multiple PRESENCE cards distributed across a cluster.

Integrating Information Retrieval & Neural Networks.

Victoria Hodge

PhD Thesis, Department of Computer Science, University of York, Heslington, York, UK, Sept. 2001.

Thesis

Abstract

Due to the proliferation of information in databases and on the Internet, users are overwhelmed leading to Information Overload. It is impossible for humans to index and search such a wealth of information by hand so automated indexing and searching techniques are required. In this dissertation, we explore current Information Retrieval (IR) techniques and their shortcomings and we consider how more sophisticated approaches can be developed to aid retrieval. Current techniques can be slow due to the sheer volume of the search space although faster ones are being developed. Matching is often poor, as the quantity of retrievals does not necessarily indicate quality retrievals. Many current approaches simply return the documents containing the greatest number of ‘query words’. A methodology is desired to: process documents unsupervised; generate an index using a data structure that is memory efficient, speedy, incremental and scalable; identify spelling mistakes in the query and suggest alternative spellings; handle paraphrasing of documents and synonyms for both indexing and searching; focus retrieval by minimising the search space; and, finally, calculate the query-document similarity from statistics autonomously derived from the text corpus. We describe our IR system named MinerTaur, developed using both the AURA modular neural system and a hierarchical, growing self-organising neural technique based on Growing Cell Structures which we call TreeGCS. We integrate three modules in MinerTaur: a spell checker; a hierarchical thesaurus generated from corpus statistics inferred by the system; and, a word-document matrix to efficiently store the associations between the documents and their constituent words. We describe each module individually and evaluate each against comparative data structures and benchmark implementations. We identify improved memory usage, spelling recall accuracy, cluster quality and training and recall times for the modules.
Finally we compare MinerTaur against a benchmark IR system, SMART developed at Cornell University, and reveal superior recall and precision for MinerTaur versus SMART.

An Evaluation of Phonetic Spell Checkers.

Victoria Hodge and Jim Austin

Technical Report YCS 338(2001), Department of Computer Science, University of York, UK, Sept. 2001.

Tech Reports

Abstract

In the work reported here, we describe a phonetic spell-checking algorithm integrating aspects of Soundex and Phonix. We increase the number of letter codes compared to Soundex and Phonix. We also integrate phonetic rules but use far fewer than Phonix, where retrieval may be slow due to the computational cost of comparing the input to a large list of transformation rules. Our algorithm aims to repair spelling errors where the user has substituted homophones in place of the correct spelling. We evaluate our algorithm by comparing it to three alternative spell-checking algorithms and three benchmark spell checkers (MS Word 97 & 2000 and UNIX ‘ispell’) using a list of phonetic spelling errors. We find that our approach has superior recall (percentage of correct matches retrieved) to the alternative approaches although the higher recall is at the expense of precision (number of possible matches retrieved). We intend our phonetic spell checker to be integrated into an existing spell checker, so the precision will be improved by integration; thus, high recall is the aim for our approach in this paper.

A Novel Binary Spell Checker.

Victoria Hodge and Jim Austin

Proceedings of the International Conference on Artificial Neural Networks (ICANN’2001), Vienna, Austria, 25–29 August, 2001. In Dorffner, Bischof & Hornik (Eds), Lecture Notes in Computer Science (LNCS) 2130, Springer Verlag, Berlin.

Conference Papers

Abstract

In this paper, we propose a simple, flexible and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning and associative matching in the AURA neural system. We evaluate our approach against several benchmark spell-checking algorithms for recall accuracy. Our proposed hybrid methodology has the joint highest top 10 recall rate of the techniques evaluated. The method has a high recall rate and low computational cost.

An Integrated Neural IR System.

Victoria Hodge and Jim Austin

Proceedings of the 9th European Symposium on Artificial Neural Networks (ESANN’2001), Bruges, Belgium, 25–27 April 2001, pp. 265–270.

Conference Papers

Abstract

Over the years the amount and range of electronic text stored on the WWW has expanded rapidly, overwhelming both users and tools designed to index and search the information. It is impossible to index the WWW dynamically at query time due to the sheer volume so the index must be pre–compiled and stored in a compact but incremental data structure as the information is ever–changing. Much of the text is unstructured so a data structure must be constructed from such text, storing associations between words and the documents that contain them. The index must be able to index fine–grained word-based associations and also handle more abstract concepts such as synonym groups. A search tool is also required to link to the index and enable the user to pinpoint their required information. We describe such a system we have developed in an integrated hybrid neural architecture and evaluate our system against the benchmark SMART system for retrieval accuracy: recall and precision.

An Evaluation of Standard Retrieval Algorithms and a Binary Neural Approach.

Victoria Hodge and Jim Austin

Neural Networks, 14(3), Elsevier Science, April 2001, pp. 287–303.

Journal Articles

Abstract

In this paper we evaluate a selection of data retrieval algorithms for storage efficiency, retrieval speed and partial matching capabilities using a large Information Retrieval dataset. We evaluate standard data structures, for example inverted file lists and hash tables, but also a novel binary neural network that incorporates: single–epoch training, superimposed coding and associative matching in a binary matrix data structure. We identify the strengths and weaknesses of the approaches. From our evaluation, the novel neural network approach is superior with respect to training speed and partial match retrieval time. From the results, we make recommendations for the appropriate usage of the novel neural approach.

Hierarchical Growing Cell Structures: TreeGCS.

Victoria Hodge and Jim Austin

IEEE Transactions on Knowledge and Data Engineering, Special Issue on Connectionist Models for Learning in Structured Domains, 13(2), March 2001, pp. 207–218.

Journal Articles

Abstract

We propose a hierarchical clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of B. Fritzke (1993). Our algorithm refines and builds upon the GCS base, overcoming an inconsistency in the original GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. Our algorithm is unsupervised, flexible, and dynamic and we have imposed no additional parameters on the underlying GCS algorithm. Our ultimate aim is a hierarchical clustering neural network that is both consistent and stable and identifies the innate hierarchical structure present in vector-based data. We demonstrate improved stability of the GCS foundation and evaluate our algorithm against the hierarchy generated by an ascendant hierarchical clustering dendrogram. Our approach emulates the hierarchical clustering of the dendrogram. It demonstrates the importance of the parameter settings for GCS and how they affect the stability of the clustering.

Hierarchical Growing Cell Structures: TreeGCS.

Victoria Hodge and Jim Austin

Proceedings of the Fourth International Conference on Knowledge–Based Intelligent Engineering Systems (KES’2000), Brighton, UK, August 30–Sept. 1, 2000

Conference Papers

Abstract

We propose a hierarchical, unsupervised clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm overcomes an inconsistency in the GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. We demonstrate improved stability of the GCS foundation by alternating the input vector order on each presentation. We evaluate our automatically produced cluster hierarchy against that generated by an ascendant hierarchical clustering dendrogram. We use a small dataset to illustrate how our approach emulates the hierarchical clustering of the dendrogram, regardless of the input vector order.

An Evaluation of Standard Retrieval Algorithms and a Weightless Neural Approach.

Victoria Hodge and Jim Austin

Proceedings of the IEEE–INNS–ENNS International Joint Conference on Neural Networks (IJCNN’2000), Italy, July 24–27, 2000

Conference Papers

Abstract

Many computational processes require efficient algorithms: those that both store and retrieve data efficiently and rapidly. In this paper we evaluate a selection of data structures for storage efficiency, retrieval speed and partial matching capabilities using a large information retrieval dataset. We evaluate standard data structures, for example inverted file lists and hash tables, but also a novel binary neural network that incorporates superimposed coding, associative matching and row-based retrieval. We identify the strengths and weaknesses of the approaches. The novel neural network approach is superior with respect to training speed and partial match retrieval time.

Papers from the AAAI Workshop.

Victoria Hodge and Jim Austin, Co-Chairs

Technical Report WS–99–04, Association for the Advancement of Artificial Intelligence (AAAI) Press, Palo Alto, USA, 63 pp., ISBN 978–1–57735–088–0.

Tech Reports

Abstract

Current AI methods lack the flexibility and reliability of biological information processing systems and, although a great deal is known about the construction of biological systems, this knowledge has had little impact on mainstream AI. If we are to progress toward building machines with the abilities of natural computing systems, closer collaboration between those studying biological information processing systems and those working in AI and neural computing is essential. This workshop was specifically designed to bring these two groups together, with the aim of providing indicators on how the brain may organize and process information, so that this knowledge may initiate new ways to think about computation. The workshop focused on topics of common interest to neurobiologists and those working in neural networks and other approaches to intelligent systems. It focused on the low-level mechanisms involved in biological systems and how these may be exploited by the brain to bring about intelligent behavior.