Degree

  • Bachelor of Science (B.Sc. Hons) in Computer Science

    University of York

Doctorate

  • Ph.D. in Computer Science

    Integrating Information Retrieval & Neural Networks

    University of York


Research Projects

  • Reconfigurable Sensor Drones Project

    The project will combine AI & robotics to develop a reconfigurable sensor drone for monitoring infrastructure and environments, and for search & rescue.

    The Adaptive and Autonomous Robotics Module (AARM) Project will develop a mobile robotics module and monitoring software for drones. AARM can analyse buildings, infrastructure and environments, and perform search & rescue.

    The module is a set of hot-swappable sensor and processing plates that clip together, attach to the drone, communicate with each other and send their data for on-site analysis. The plates can be configured into a module either by a user or by a robotic arm. The hot-swappable plates can be reconfigured or faulty plates replaced as needed.

    The plates stream their data for analysis by the latest artificial intelligence (AI) technologies. Our cutting-edge AI software detects anomalies and guides the drone.
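    The project's detection methods are not described here; as a rough sketch of the kind of streaming anomaly check a plate's data might feed, a rolling z-score detector can flag readings that deviate sharply from the recent baseline (the window size, threshold and sensor values below are all illustrative assumptions, not project details):

```python
from collections import deque
import math

def make_anomaly_detector(window=50, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the
    rolling mean of the previous `window` readings (illustrative only)."""
    history = deque(maxlen=window)

    def check(reading):
        if len(history) >= 10:  # wait for a minimal baseline
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(reading - mean) > threshold * std
        else:
            anomalous = False
        history.append(reading)
        return anomalous

    return check

check = make_anomaly_detector()
# Hypothetical stream: steady strain-gauge readings, then a sudden spike.
stream = [20.0 + 0.1 * (i % 5) for i in range(40)] + [35.0]
flags = [check(r) for r in stream]  # only the final spike is flagged
```

    A real deployment would tune the window and threshold per sensor type; this sketch only shows the shape of an on-line check that needs no training data.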

    AARM has the potential to:

    • work in conjunction with UAVs (drones) and UGVs (buggies),
    • operate in swarms of robots and drones,
    • incorporate telepresence to allow remote interactive operation, and
    • generate real-time augmented reality (AR) displays of buildings, infrastructure and environments for human operators, overlaying sensor analyses, heatmaps and alerts onto video images from the drone in real time.


  • Win Prediction in esports

    Esports games are variable, fast paced and often complex. If we can predict which team is most likely to win, we can create a simple overall statistic for the esports audience.

    Esports are competitive video games, usually played by teams. Over 320 million people worldwide watched or played esports in 2016. If esports fans were a country, they would be the fourth largest country in the world (just behind USA). Fans follow teams, watch games online and even attend competitions in large arenas around the world.

    Esports games are variable and fast paced. In esports, the tactics and game balance change much faster than in traditional sports. However, some esports are complex and have a high barrier to entry for new viewers. A big challenge in esports is making games understandable to broadcasters and the audience. During esports games, many statistics are displayed to the audience. For most viewers, the number of statistics and the meaning of these statistics can be confusing. It is often difficult to tell who is leading, as the statistics can be contradictory. To make games understandable, we can use game analytics. Game analytics produces understandable information that is similar to the traditional sports statistics familiar to most viewers. If we can predict which team is leading and most likely to win, then we can create a simple overall statistic for broadcasters and the audience.

    By far the most popular games with viewers are professional games. However, there are not enough data from professional games to predict the likely winners. Therefore, a key question for us is, “can data from non-professional players be used to predict winners in professional games?” We have found that our machine learning methods can predict the winners of professional games using non-professional data with sufficient accuracy. We just need to spend time tuning our methods to ensure high enough accuracy.
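    As a toy illustration of the idea only (not the project's actual models or features), a predictor can be trained on synthetic "non-professional" matches and then evaluated on synthetic "professional" ones; the single gold-lead feature and all data below are invented:

```python
import math
import random

def train_logistic(samples, labels, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression (an illustrative
    stand-in for the project's machine learning methods)."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

random.seed(0)
# Hypothetical feature: team A's normalised gold lead mid-game.
# "Non-professional" matches for training ...
amateur = [[random.uniform(-1, 1)] for _ in range(500)]
amateur_wins = [1 if x[0] > 0 else 0 for x in amateur]
# ... "professional" matches, with noisier outcomes, for testing.
pro = [[random.uniform(-1, 1)] for _ in range(200)]
pro_wins = [1 if x[0] + random.gauss(0, 0.1) > 0 else 0 for x in pro]

w, b = train_logistic(amateur, amateur_wins)
accuracy = sum(predict(w, b, x) == y for x, y in zip(pro, pro_wins)) / len(pro)
```

    The point of the sketch is the transfer setup: fit on plentiful lower-rank data, measure accuracy on the scarcer professional-style data.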

    In the future, we plan to see if our methods can predict winners in other videogames.

  • NEMOG Project

    The NEMOG Project focused on data mining for collective game intelligence. NEMOG used game analytics to unlock the potential for scientific and social benefits in digital games.

    The NEMOG project aimed to unlock the potential for scientific and social benefits in digital games. Given the number of games sold and the number of game hours played, we only need to persuade a small fraction of the games industry to consider the potential for social and scientific benefit in order to achieve a massive benefit for society, and potentially to start a movement that will lead to mainstream distribution of games aimed at scientific and social benefits.

    NEMOG analysed the current state of the digital games industry, by engaging directly with games companies and with industry network associations. It conducted research into sustainable business models for digital games, and particularly for games with scientific and social goals. These showed how businesses can start up and grow to develop a new generation of games with the potential to improve society.

    Every action in an online game, from an in-game purchase to a simple button push, generates a piece of network data. This is a truly immense source of information about player behaviours and preferences. We explored what online data is available now and might become available in the future, investigated the issues around gathering such data, and developed new data mining algorithms to better understand game players as an avenue for making better games, societal impact and scientific research.

    The NEMOG Project was an inter-disciplinary collaboration between research teams from York, Durham and Northumbria Universities and CASS Business School at City University London in partnership with industrial collaborators.

    ** Image from: The Science in Video Games!

  • NEWTON Project

    NEWTON aimed to develop an autonomous, wireless sensor network and intelligent system for condition monitoring and anomaly detection in structures and infrastructure.

    NEWTON produced and evaluated a low–cost wireless sensor network and intelligent monitoring system for corrosion and stress. The system had specific applications in condition monitoring and anomaly detection for railways and in–service Non–Destructive Evaluation (NDE) for nuclear applications. It also provided rapid inspection of anomalies in large structures, new materials, ageing infrastructure, safety-critical systems for aircraft, power plants, oil/gas/water pipelines and offshore infrastructure.

    The NEWTON Project work was undertaken jointly by a cross–disciplinary research team from Newcastle, Sheffield and York Universities, in collaboration with industrial partners.

    NEWTON investigated:

    • low cost and low power consumption sensor technologies,
    • novel approaches of RFID–based passive sensing networks,
    • spectrally efficient and reliable communications,
    • autonomous data fusion and feature extraction,
    • signal processing methods for nonlinear system identification and analysis,
    • outlier and anomaly detection in wireless sensor data,
    • cloud–based computing architectures and decision making,
    • intelligent system management.

    York NEWTON work:

    • Hadoop architecture and support for networking. At University of York, we developed a cloud–based software architecture for data storage, management and analysis using Apache Hadoop.
    • Outlier and anomaly detection. At University of York, we investigated finding outliers in wireless sensor data to allow anomaly detection and condition monitoring of infrastructure.
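    The specific detectors used in the project are not listed here; as one sketch of a robust batch method for wireless sensor data, the modified z-score (based on the median absolute deviation) flags readings far from the median without being skewed by the outlier itself (the readings and threshold are illustrative assumptions):

```python
import statistics

def mad_outliers(readings, threshold=3.5):
    """Flag outliers using the modified z-score, 0.6745 * (x - median) / MAD.
    A robust stand-in for the project's detectors, not their actual method."""
    med = statistics.median(readings)
    mad = statistics.median(abs(x - med) for x in readings)
    if mad == 0:
        return [False] * len(readings)
    return [abs(0.6745 * (x - med) / mad) > threshold for x in readings]

# Hypothetical corrosion-sensor batch with one faulty spike.
readings = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 9.7, 5.1]
flags = mad_outliers(readings)  # only the 9.7 reading is flagged
```

    Unlike a mean/standard-deviation test, the median-based statistic stays stable even when the batch contains the very anomaly being searched for.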

    ** Metal Rail Bridge image from: FreeDigitalPhotos.net

  • FREEFLOW Project

    FREEFLOW developed data mining and decision support tools for traffic operators and individual travellers. It aimed to improve traffic management and operation by turning data into intelligence.

    Traffic monitoring systems collect ever more detailed and timely data about transport networks. FREEFLOW integrated and mined these data using tools that detected patterns and anomalies in traffic data, and used pattern matching to find similar historical patterns. We then took suitable measures to assist traffic network operators and to improve the current traffic flow using both historical knowledge and traffic modelling.

    The FREEFLOW Project involved 15 partners from academia, industry and government.

    The FREEFLOW Project operated on three sites.

    • In London, FREEFLOW analysed Park Lane and Hyde Park Corner, a key London route which is as busy as many UK motorways, and selected appropriate traffic signal settings and other measures to mitigate congestion.
    • In Maidstone, the project focused on managing the urban / inter–urban interface and displaying sets of messages on variable message signs to improve travel times.
    • Work in York focused on bus punctuality and selecting the most appropriate traffic signal plans and messages to display to maintain punctuality.

    FREEFLOW Project – matching traffic data in a binary neural network, where data are stored and matched.


    At University of York, we developed a k-nearest neighbour based pattern matching tool using neural networks. The tool was based on Correlation Matrix Memories (CMMs) which are binary associative neural networks. CMMs can store large amounts of data and allow fast searches.

    We converted traffic data variables (such as data from sensors embedded in the road or from buses) into vectors using a quantisation process. These vectors were then stored in a historical database of vectors in the CMM.

    As new traffic data were generated, we turned these new data into query vectors using the quantisation process. We applied the query vector to the neural network to find the k best matching historical time periods using kernel–based vector similarity and incorporating spatio-temporal aspects.

    Finally, we provided advice to the traffic operator by cross-referencing operator logs for traffic control interventions made during the k best matching time periods; calculated a quality score for each of these interventions (how well it worked); and, thus, recommended to the operator the intervention likely to be most effective for the current situation. We also used the neural network to predict variable values to plug gaps in the data; to overcome a sensor failure; or to look ahead and anticipate congestion problems. We extrapolated and produced a prediction of the future traffic value by averaging the variable value across the set of matches retrieved by the neural network.
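    A minimal sketch of the storage-and-recall idea (not the AURA implementation itself; the bin widths, sensors and traffic values below are hypothetical) quantises each variable into a one-hot code, ORs the outer product of input and identifier vectors into a binary matrix, and recalls the k best-matching stored periods by summed column scores:

```python
def quantise(value, lo, hi, bins):
    """One-hot encode a scalar into `bins` equal-width bins."""
    i = min(bins - 1, max(0, int((value - lo) / (hi - lo) * bins)))
    v = [0] * bins
    v[i] = 1
    return v

class CMM:
    """Binary correlation matrix memory: store pairs by OR-ing in their
    outer product; recall scores every stored pattern in one pass."""
    def __init__(self, in_len, out_len):
        self.m = [[0] * out_len for _ in range(in_len)]

    def store(self, x, y):
        for i, xi in enumerate(x):
            if xi:
                for j, yj in enumerate(y):
                    if yj:
                        self.m[i][j] = 1

    def recall(self, q, k):
        scores = [sum(self.m[i][j] for i, qi in enumerate(q) if qi)
                  for j in range(len(self.m[0]))]
        return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Hypothetical history: two sensors' flow values per time period, 4 bins each.
bins, lo, hi = 4, 0.0, 400.0
history = [(120.0, 310.0), (130.0, 300.0), (390.0, 40.0)]
cmm = CMM(in_len=2 * bins, out_len=len(history))
for t, (a, b) in enumerate(history):
    x = quantise(a, lo, hi, bins) + quantise(b, lo, hi, bins)
    y = [1 if j == t else 0 for j in range(len(history))]  # period id
    cmm.store(x, y)

query = quantise(125.0, lo, hi, bins) + quantise(305.0, lo, hi, bins)
matches = cmm.recall(query, k=2)  # indices of the 2 best-matching periods
```

    Because recall is a single binary matrix operation over all stored periods at once, this structure scales to large historical databases, which is what made the approach attractive for fast k-nearest-neighbour search.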

    ** Image shows: Traffic at night by Petr Kratochvil

  • CARMEN Project

    The CARMEN project developed a web–based portal for users to run compute-intensive research while recording what they had done. It used a three–tier architecture and a cloud-based infrastructure.

    The CARMEN project developed a web-based portal for users to run compute-intensive research while recording what they had done. I worked on all aspects of the portal’s three–tier architecture. The back-end was built in Java and SQL to manage data storage, security and service enactment across a cloud-based infrastructure. I also collaborated on the front-end web interface, developed in JavaScript.

    The CARMEN Virtual Laboratory (VL) is a cloud–based platform which allows neuroscientists to store, share, develop, execute, reproduce and publicise their work.

    CARMEN produced an interactive publications repository. The repository allows users to link data and software to publications, so that other users can examine the data and software associated with a publication and execute the associated software within the VL using the same data as the authors used.

    The cloud–based architecture and SaaS (Software as a Service) framework allows users to upload and analyse vast data sets using software services. This new interactive publications facility allows others to build on research results through reuse. It aligns with recent developments by funding agencies, institutions, and publishers with a move to open access research. Open access provides reproducibility and verification of research resources and results. Publications and their associated data and software will be assured of long-term preservation and curation in the repository.

    Further, analysing research data and the evaluations described in publications frequently requires a number of execution stages, many of which are iterative. The CARMEN VL provides a scientific workflow environment to combine software services into a processing tree. These workflows can also be associated with publications and executed by users.
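    As an illustrative sketch only (the CARMEN services and workflow engine are not shown; the "services" below are toy stand-ins for real analysis steps), a processing tree can chain services so that one stage's output feeds the next:

```python
def bandpass_stub(signal):
    """Toy 'service' standing in for a filtering stage: remove the mean."""
    return [x - sum(signal) / len(signal) for x in signal]

def spike_count(signal, threshold):
    """Toy 'service': count upward threshold crossings in a recording."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < threshold <= b)

class Node:
    """A workflow node: runs its service on the result of its child, if any,
    so nested nodes form a processing tree executed leaf-first."""
    def __init__(self, service, child=None):
        self.service, self.child = service, child

    def run(self, data):
        if self.child is not None:
            data = self.child.run(data)
        return self.service(data)

# Filter, then analyse: equivalent to spike_count(bandpass_stub(signal), 0.5).
workflow = Node(lambda s: spike_count(s, 0.5), child=Node(bandpass_stub))
signal = [0.0, 1.0, 0.0, 1.0, 0.2, 0.9, 0.0]
result = workflow.run(signal)
```

    Representing a pipeline this way is what lets a workflow be saved, attached to a publication and re-executed later on the same data.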

    The VL provides a secure environment where users can decide the access rights for each resource to ensure copyright and privacy restrictions are met.


Publications

Win Prediction in Multi-Player Esports: Live Professional Match Prediction.

Victoria Hodge, Sam Devlin, Nick Sephton, Florian Block, Peter Cowling, and Anders Drachen
Accepted for IEEE Transactions on Games, IEEE, October 2019.
Journal Paper

Abstract

Esports are competitive videogames watched by audiences. Most esports generate detailed data for each match that are publicly available. Esports analytics research is focused on predicting match outcomes. Previous research has emphasised pre-match prediction and used data from amateur games, which are more easily available than professional-level data. However, the commercial value of win prediction exists at the professional level. Furthermore, prediction from real-time data is unexplored, as is its potential for informing audiences. Here we present the first comprehensive case study on live win prediction in a professional esport. We provide a literature review for win prediction in a multi-player online battle arena (MOBA) esport. The paper evaluates the first professional-level prediction models for live DotA 2 matches, one of the most popular MOBA games, and trials them at a major international esports tournament. Using standard machine learning models, feature engineering and optimization, our model is up to 85% accurate after 5 minutes of gameplay. Our analyses highlight the need for algorithm evaluation and optimization. Finally, we present implications for the esports/game analytics domains, describe commercial opportunities and practical challenges, and propose a set of evaluation criteria for research on esports win prediction.

Time to Die: Death Prediction in Dota 2 using Deep Learning.

Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen and James Alfred Walker
Proceedings of IEEE Conference on Games (CoG 2019), London, UK, Aug. 20–23, 2019.
Conference Papers

Abstract

Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are, however, of perennial interest to esports commentators and audiences, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe the following impact of events. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.

Time to Die: Death Prediction in Dota 2 using Deep Learning.

Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen and James Alfred Walker
ArXiv e–prints, 1906.03939, 2019, (arXiv:1906.03939 [cs.LG])
ArXiv E–print

Abstract

Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are, however, of perennial interest to esports commentators and audiences, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe the following impact of events. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and a professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.

AI and Automatic Music Generation for Mindfulness.

Duncan Williams, Victoria Hodge, Lina Gega, Damian Murphy, Peter Cowling and Anders Drachen
Audio Engineering Society: International Conference on Immersive and Interactive Audio, March 27–29, 2019, York, UK
Conference Papers

Abstract

This paper presents an architecture for the creation of emotionally congruent music using machine learning aided sound synthesis. Our system can generate a small corpus of music using Hidden Markov Models; we can label the pieces with emotional tags using data elicited from questionnaires. This produces a corpus of labelled music underpinned by perceptual evaluations. We then analyse participants’ galvanic skin response (GSR) while listening to our generated music pieces and the emotions they describe in a questionnaire conducted after listening. These analyses reveal that there is a direct correlation between the calmness/scariness of a musical piece, the users’ GSR reading and the emotions they describe feeling. From these, we will be able to estimate an emotional state using biofeedback as a control signal for a machine-learning algorithm, which generates new musical structures according to a perceptually informed musical feature similarity model. Our case study suggests various applications including in gaming, automated soundtrack generation, and mindfulness.

A Psychometric Evaluation of Emotional Responses to Horror Music.

Duncan Williams, Chia-Yu Wu, Victoria Hodge, Damian Murphy and Peter Cowling
Audio Engineering Society: 146th International Pro Audio Convention, March 20–23, 2019, Dublin, Ireland
Conference Papers

Abstract

This research explores and designs an effective experimental interface to evaluate people's emotional responses to horror music. We studied methodological approaches by using traditional psychometric techniques to measure emotional responses, including self-reporting, and galvanic skin response (GSR). GSR correlates with psychological arousal. It can help circumvent a problem in self-reporting where people are unwilling to report particular felt responses, or confuse perceived and felt responses. We also consider the influence of familiarity. Familiarity can induce learned emotional responses rather than reflecting how the music actually makes listeners feel. The research revealed different findings in self-reports and GSR data. Both measurements had an interaction between music and familiarity but show inconsistent results from the perspective of simple effects.

Narrative Bytes: Data-Driven Content Production in Esports.

Florian Block, Victoria Hodge, Stephen Hobson, Nick Sephton, Sam Devlin, Marian F. Ursu, Anders Drachen and Peter I. Cowling
Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video (TVX 18). ACM, New York, NY, USA, 29–41. DOI: https://doi.org/10.1145/3210825.3210833
Conference Papers

Abstract

Esports – video games played competitively that are broadcast to large audiences – are a rapidly growing new form of mainstream entertainment. Esports borrow from traditional TV, but are a qualitatively different genre, due to the high flexibility of content capture and availability of detailed gameplay data. Indeed, in esports, there is access to both real–time and historical data about any action taken in the virtual world. This aspect motivates the research presented here, the question asked being: can the information buried deep in such data, unavailable to the human eye, be unlocked and used to improve the live broadcast compilations of the events? In this paper, we present a large–scale case study of a production tool called Echo, which we developed in close collaboration with leading industry stakeholders. Echo uses live and historic match data to detect extraordinary player performances in the popular esport Dota 2, and dynamically translates interesting data points into audience–facing graphics. Echo was deployed at one of the largest yearly Dota 2 tournaments, which was watched by 25 million people. An analysis of 40 hours of video, over 46,000 live chat messages, and feedback of 98 audience members showed that Echo measurably affected the range and quality of storytelling, increased audience engagement, and invoked rich emotional response among viewers.

An Evaluation of Classification and Outlier Detection Algorithms.

Victoria J. Hodge and Jim Austin
ArXiv e–prints, 1805.00811, 2018, (arXiv:1805.00811 [stat.ML])
ArXiv E–print

Abstract

This paper evaluates algorithms for classification and outlier detection accuracies in temporal data. We focus on algorithms that train and classify rapidly and can be used for systems that need to incorporate new data regularly. Hence, we compare the accuracy of six fast algorithms using a range of well-known time-series datasets. The analyses demonstrate that the choice of algorithm is task and data specific but that we can derive heuristics for choosing. Gradient Boosting Machines are generally best for classification but there is no single winner for outlier detection though Gradient Boosting Machines (again) and Random Forest are better. Hence, we recommend running evaluations of a number of algorithms using our heuristics.

How the Business Model of Customisable Card Games Influences Player Engagement.

Victoria J. Hodge, Nick Sephton, Sam Devlin, Peter I. Cowling, Nikolaos Goumagias, Jianhua Shao, Kieran Purvis, Ignazio Cabras, Kiran Fernandes and Feng Li.
IEEE Transactions on Games, IEEE, March 2018, online preprint.
Journal Paper

Abstract

In this article, we analyse the game play data of three popular customisable card games where players build decks prior to game play. We analyse the data from a player engagement perspective, how the business model affects players, how players influence the business model and provide strategic insights for players themselves. Sifa et al. found a lack of cross-game analytics while Marchand and Hennig-Thurau identified a lack of understanding of how a game's business model and strategies affect players. We address both issues. The three games have similar business models but differ in one aspect: the distribution model for the cards used in the game. Our longitudinal analysis highlights this variation's impact. A uniform distribution creates a spread of decks with slowly emerging trends while a random distribution creates stripes of deck building activity that switch suddenly each update. Our method is simple, easily understandable, independent of the specific game's structure and able to compare multiple games. It is applicable to games that release updates and enables comparison across games. Optimising a game's updates strategy is key as it affects player engagement and retention which directly influence businesses' revenues and profitability in the $95 billion global games market.

Win Prediction in Esports: Mixed–Rank Match Prediction in Multi-player Online Battle Arena Games.

Victoria Hodge, Sam Devlin, Nick Sephton, Florian Block, Anders Drachen, and Peter Cowling
ArXiv e–prints, 1711.06498, 2017, (arXiv:1711.06498 [cs.AI])
ArXiv E–print

Abstract

Esports has emerged as a popular genre for players as well as spectators, supporting a global entertainment industry. Esports analytics has evolved to address the requirement for data-driven feedback, and is focused on cyber-athlete evaluation, strategy and prediction. Towards the latter, previous work has used match data from a variety of player ranks from hobbyist to professional players. However, professional players have been shown to behave differently than lower ranked players. Given the comparatively limited supply of professional data, a key question is thus whether mixed-rank match datasets can be used to create data-driven models which predict winners in professional matches and provide a simple in-game statistic for viewers and broadcasters. Here we show that, although there is a slightly reduced accuracy, mixed-rank datasets can be used to predict the outcome of professional matches, with suitably optimized configurations.

Using Association Rule Mining to Predict Opponent Deck Content in Android: Netrunner.

N. Sephton, P. Cowling, S. Devlin, V. Hodge, and N. Slaven
Proceedings of IEEE Computational Intelligence and Games Conference (CIG 2016), Santorini, Greece, Sept. 20–23, 2016, pp. 102–109.
Conference Papers

Abstract

As part of their design, card games often include information that is hidden from opponents and represents a strategic advantage if discovered. A player that can discover this information will be able to alter their strategy based on the nature of that information, and therefore become a more competent opponent. In this paper, we employ association rule-mining techniques for predicting item multisets, and show them to be effective in predicting the content of Netrunner decks. We then apply different modifications based on heuristic knowledge of the Netrunner game, and show the effectiveness of techniques which consider this knowledge during rule generation and prediction.

A Conceptual Framework of Business Model Emerging Resilience.

N. Goumagias, K. Fernandes, I. Cabras, F. Li, J. Shao, S. Devlin, V. Hodge, P. Cowling and D. Kudenko
32nd European Group for Organization Studies (EGOS) Colloquium, Naples, Italy, July 7–9, 2016.
Conference Papers

Abstract

In this paper we introduce an environmentally driven conceptual framework of Business Model change. Business models acquired substantial momentum in academic literature during the past decade. Several studies focused on what exactly constitutes a Business Model (role model, recipe, architecture etc.), triggering a theoretical debate about the Business Model’s components and their corresponding dynamics and relationships. In this paper, we argue that Business Models, as cognitive structures, are highly influenced in terms of relevance by the context of application, which consequently enriches their functionality. As a result, the Business Model can be used either as a role model (benchmarking) or a recipe (strategy). For that purpose, we assume that the Business Model is embedded within the economic (task) environment, and consequently affected by it. Through a typology of the environmental impact on Business Model productivity, we introduce a conceptual framework that aims to capture the salient features of Business Model emergent resilience as a reaction to two types of impact: productivity constraining and disturbing.

A Strategic Roadmap for Business Model Change for the Video-games Industry.

N. Goumagias, K. Purvis, K. Fernandes, I. Cabras, F. Li, J. Shao, S. Devlin, V. Hodge, P. Cowling and D. Kudenko
R&D Management Conference, Cambridge, UK, July 3–6, 2016.
Conference Papers

Abstract

The global video games industry has experienced exponential growth in terms of socioeconomic impact during the last 50 years. Surprisingly, little academic interest is directed towards the industry, particularly in the context of BM Change. As a technologically intensive creative industry, development studios and publishers experience substantial internal and external forces to identify, and sustain, their competitive advantage. To achieve that, managers are called to systematically explore and exploit alternative BMs that are compatible with the company’s strategy. We build on empirical analysis of the video–games industry to construct a Toolkit that i) will help practitioners and academics to describe the industrial ecosystem of BMs more accurately, and ii) can be used as a strategic roadmap for managers to navigate through alternatives for entrepreneurial and growth purposes.

A Hadoop Neural Network for Parallel and Distributed Feature Selection.

Victoria J. Hodge, Simon O’Keefe and Jim Austin
Neural Networks, 78, Elsevier, June 2016, pp. 24–35.
Journal Paper

Abstract

In this paper, we introduce a theoretical basis for a Hadoop–based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative–memory neural network in Hadoop.

A Digital Repository and Execution Platform for Interactive Scholarly Publications in Neuroscience.

Victoria J. Hodge, et al.
NeuroInformatics, 14(1), Springer, January 2016, pp. 23–40.
Journal Paper

Abstract

The CARMEN Virtual Laboratory (VL) is a cloud-based platform which allows neuroscientists to store, share, develop, execute, reproduce and publicise their work. This paper describes new functionality in the CARMEN VL: an interactive publications repository. This new facility allows users to link data and software to publications. This enables other users to examine data and software associated with the publication and execute the associated software within the VL using the same data as the authors used in the publication. The cloud–based architecture and SaaS (Software as a Service) framework allows vast data sets to be uploaded and analysed using software services. Thus, this new interactive publications facility allows others to build on research results through reuse. This aligns with recent developments by funding agencies, institutions, and publishers with a move to open access research. Open access provides reproducibility and verification of research resources and results. Publications and their associated data and software will be assured of long–term preservation and curation in the repository. Further, analysing research data and the evaluations described in publications frequently requires a number of execution stages many of which are iterative. The VL provides a scientific workflow environment to combine software services into a processing tree. These workflows can also be associated with publications and executed by users. The VL also provides a secure environment where users can decide the access rights for each resource to ensure copyright and privacy restrictions are met.

Wireless Sensor Networks for Condition Monitoring in the Railway Industry: a Survey.

Victoria J. Hodge, Simon O’Keefe, Michael Weeks and Anthony Moulds
IEEE Transactions on Intelligent Transportation Systems, 16(3), IEEE, June 2015, pp. 1088–1106
Journal Paper

Abstract

In recent years, the range of sensing technologies has expanded rapidly, whereas sensor devices have become cheaper. This has led to a rapid expansion in condition monitoring of systems, structures, vehicles, and machinery using sensors. Key factors are the recent advances in networking technologies such as wireless communication and mobile ad hoc networking coupled with the technology to integrate devices. Wireless sensor networks (WSNs) can be used for monitoring the railway infrastructure such as bridges, rail tracks, track beds, and track equipment along with vehicle health monitoring such as chassis, bogies, wheels, and wagons. Condition monitoring reduces human inspection requirements through automated monitoring, reduces maintenance through detecting faults before they escalate, and improves safety and reliability. This is vital for the development, upgrading, and expansion of railway networks. This paper surveys wireless sensor network technology for condition monitoring in the railway industry, analyzing systems, structures, vehicles, and machinery. It focuses on practical engineering solutions: principally, which sensor devices are used and what they are used for, and the identification of sensor configurations and network topologies. It identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review.

Short–Term Prediction of Traffic Flow Using a Binary Neural Network.

Victoria J. Hodge, Rajesh Krishnan, Jim Austin, John Polak and Tom Jackson
Neural Computing and Applications, 25(7–8), Springer, December 2014, pp. 1639–1655
Journal Paper

Abstract

This paper introduces a binary neural network–based prediction algorithm incorporating both spatial and temporal characteristics into the prediction process. The algorithm is used to predict short–term traffic flow by combining information from multiple traffic sensors (spatial lag) and time series prediction (temporal lag). It extends previously developed Advanced Uncertain Reasoning Architecture (AURA) k–nearest neighbour (k–NN) techniques. Our task was to produce a fast and accurate traffic flow predictor. The AURA k–NN predictor is comparable to other machine learning techniques with respect to recall accuracy but is able to train and predict rapidly. We incorporated consistency evaluations to determine whether the AURA k–NN has an ideal algorithmic configuration or an ideal data configuration or whether the settings needed to be varied for each data set. The results agree with previous research in that settings must be bespoke for each data set. This configuration process requires rapid and scalable learning to allow the predictor to be set up for new data. The fast processing abilities of the AURA k–NN ensure this combinatorial optimisation will be computationally feasible for real–world applications. We intend to use the predictor to proactively manage traffic by predicting traffic volumes to anticipate traffic network problems.
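
The idea of combining spatial and temporal lags into a single k–NN prediction can be sketched as follows. This is an illustrative plain–Euclidean sketch, not the AURA binary neural implementation described in the paper; the function names and lag layout are hypothetical.

```python
import numpy as np

def build_lagged_features(flows, temporal_lags=3):
    """Build feature vectors combining readings from several sensors
    (spatial lag) and several past time steps (temporal lag).
    `flows` is an array of shape (time_steps, n_sensors)."""
    X, y = [], []
    for t in range(temporal_lags, len(flows) - 1):
        # concatenate the last temporal_lags+1 readings of every sensor
        X.append(flows[t - temporal_lags:t + 1].ravel())
        y.append(flows[t + 1, 0])  # predict the next flow at the target sensor
    return np.array(X), np.array(y)

def knn_predict(X, y, query, k=3):
    """Plain Euclidean k-NN: average the targets of the k nearest rows."""
    d = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(d)[:k]
    return y[nearest].mean()
```

A query vector built the same way from the latest readings then yields a short–term forecast; AURA replaces the exhaustive distance scan with a fast binary pattern match.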

Outlier Detection in Big Data.

Victoria J. Hodge
J. Wang (ed.), Encyclopedia of Business Analytics and Optimization, Chapter 157, (Hershey, PA: IGI Global), 2014, pp. 1762–1771.
Book Chapter

Abstract

Outlier detection (or anomaly detection) is a fundamental task in data mining. Outliers are data that deviate from the norm and outlier detection is often compared to “finding a needle in a haystack.” However, the outliers may generate high value if they are found, value in terms of cost savings, improved efficiency, compute time savings, fraud reduction and failure prevention. Detection can identify faults before they escalate with potentially catastrophic consequences. Big Data refers to large, dynamic collections of data. These vast and complex data appear problematic for traditional outlier detection methods to process but, Big Data provides considerable opportunity to uncover new outliers and data relationships. This chapter highlights some of the research issues for outlier detection in Big Data and covers the solutions used and research directions taken along with an analysis of some current outlier detection approaches for Big Data applications.

A HADOOP–Based Framework for Parallel and Distributed Feature Selection.

Victoria J. Hodge, Tom Jackson and Jim Austin
Technical Report YCS–2013–485, Department of Computer Science, University of York, UK, Sept. 2013.
Tech report

Abstract

In this paper, we introduce a theoretical basis for a Hadoop-based framework for parallel and distributed feature selection. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of four feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop MapReduce. Hadoop allows parallel and distributed processing so each feature selector can be processed in parallel and multiple feature selectors can be processed together in parallel allowing multiple feature selectors to be compared. We identify commonalities among the four feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all four feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative–memory neural network in Hadoop.

A Survey of Outlier Detection Methodologies.

Victoria J. Hodge and Jim Austin
S. Babones (Ed.), Fundamentals of Regression Modeling, SAGE Publications, 2013. ISBN: 9781446208281
Chapter reprint of original work from 2004.
Book Chapter

Abstract

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

A metric for pattern–matching applications to traffic management.

Richard Mounce, Garry Hollier, Mike Smith, Victoria J. Hodge, Tom Jackson and Jim Austin
Transportation Research Part C: Emerging Technologies, 29, Elsevier Science, April 2013, pp. 148–155.
Journal Paper

Abstract

This paper considers signal plan selection; the main topic is the design of a system for utilising pattern matching to assist the timely selection of sound signal control plan changes. In this system, historical traffic flow data is continually searched, seeking traffic flow patterns similar to today’s. If, in one of these previous similar situations, (a) the signal plan utilised was different to that being utilised today and (b) it appears that the performance achieved was better than the performance likely to be achieved today, then the system recommends an appropriate signal plan switch. The heart of the system is “similarity”. Two traffic flow patterns (two time series of traffic flows arising from two different days) are said to be “similar” if the distance between them is small; similarity thus depends on how the metric or distance between two time series of traffic flows is defined. A simple example is given which suggests that utilising the standard Euclidean distance between the two sequences comprising cumulatives of traffic flow may be better than utilising the standard Euclidean distance between the original two sequences of traffic flow data. The paper also gives measured on–street public transport benefits which have arisen from using a simple rule–based (traffic–responsive) signal plan selection system, compared with a time–tabled signal plan selection system.
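
The two candidate similarity measures the abstract compares can be written down directly. This is a minimal sketch of the general idea, assuming simple equal-length flow sequences; the function names are illustrative, not from the paper.

```python
import numpy as np

def raw_distance(f1, f2):
    # Euclidean distance between the original traffic flow sequences
    return np.linalg.norm(np.asarray(f1, float) - np.asarray(f2, float))

def cumulative_distance(f1, f2):
    # Euclidean distance between the sequences of flow cumulatives:
    # each element is the running total of flow up to that time slot
    return np.linalg.norm(np.cumsum(f1) - np.cumsum(f2))
```

For example, the sequences [3, 0] and [0, 3] carry the same total flow but differ in timing; the raw distance treats the two slots independently, while the cumulative distance reflects how far the running totals diverge over the period.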

The CARMEN software as a service infrastructure.

Michael Weeks, Mark Jessop, Martyn Fletcher, Victoria Hodge, Tom Jackson and Jim Austin
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1983), 2013
Journal Paper

Abstract

The CARMEN platform allows neuroscientists to share data, metadata, services and workflows, and to execute these services and workflows remotely via a Web portal. This paper describes how we implemented a service-based infrastructure into the CARMEN Virtual Laboratory. A Software as a Service framework was developed to allow generic new and legacy code to be deployed as services on a heterogeneous execution framework. Users can submit analysis code typically written in Matlab, Python, C/C++ and R as non–interactive standalone command–line applications and wrap them as services in a form suitable for deployment on the platform. The CARMEN Service Builder tool enables neuroscientists to quickly wrap their analysis software for deployment to the CARMEN platform, as a service without knowledge of the service framework or the CARMEN system. A metadata schema describes each service in terms of both system and user requirements. The search functionality allows services to be quickly discovered from the many services available. Within the platform, services may be combined into more complicated analyses using the workflow tool. CARMEN and the service infrastructure are targeted towards the neuroscience community; however, it is a generic platform, and can be targeted towards any discipline.

A Binary Neural Network Framework for Attribute Selection and Prediction.

Victoria J. Hodge, Tom Jackson and Jim Austin
Proceedings of the 4th International Conference on Neural Computation Theory and Applications (NCTA 2012), Barcelona, Spain, October 5–7, 2012, pp. 510–515.
Conference Papers

Abstract

In this paper, we introduce an implementation of the attribute selection algorithm, Correlation-based Feature Selection (CFS) integrated with our k–nearest neighbour (k–NN) framework. Binary neural networks underpin our k–NN and allow us to create a unified framework for attribute selection, prediction and classification. We apply the framework to a real world application of predicting bus journey times from traffic sensor data and show how attribute selection can both speed our k–NN and increase the prediction accuracy by removing noise and redundant attributes from the data.
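
The CFS heuristic rewards subsets whose features correlate strongly with the class but weakly with each other. A minimal sketch of the standard CFS merit score follows; it uses absolute Pearson correlation for illustration, whereas the paper's binary neural implementation differs in detail.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset (list of column indices):
    merit = k * r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean
    feature-class correlation and r_ff the mean feature-feature correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A search procedure (forward selection, for instance) would then grow the subset greedily, keeping whichever extension raises the merit most.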

Enhancing YouShare: the online collaboration research environment for sharing data and services.

Victoria Hodge, Aaron Turner, Martyn Fletcher, Mark Jessop, Michael Weeks, Tom Jackson and Jim Austin
Presented at, Digital Research 2012, Oxford, UK, September 10–12, 2012.
Conference Papers

Abstract

This paper describes recent enhancements to the YouShare platform, the online collaboration environment, which allows researchers to share data and software applications and perform compute–intensive analysis tasks quickly and securely. The enhancements to the platform are a result of user feedback on the current system and technology advancements. These fall into four groups – better handling of searching, use of synonyms, the addition of a workflow tool and enhancements to the infrastructure. The paper outlines these improvements.

Discretisation of Data in a Binary Neural k–Nearest Neighbour Algorithm.

Victoria J. Hodge and Jim Austin
Technical Report YCS–2012–473, Department of Computer Science, University of York, UK, June 2012.
Tech report

Abstract

This paper evaluates several methods of discretisation (binning) within a k–Nearest Neighbour predictor. Our k–NN is constructed using binary neural networks which require continuous-valued data to be discretised to allow it to be mapped to the binary neural framework. Our approach uses discretisation coupled with robust encoding to map data sets onto the binary neural network. In this paper, we compare seven unsupervised discretisation methods for retrieval accuracy (prediction accuracy) across a range of well–known prediction data sets comprising time–series data. We analyse whether there is an optimal discretisation configuration for our k–NN. The analyses demonstrate that the configuration is data specific. Hence, we recommend running evaluations of a number of configurations, varying both the discretisation methods and the number of discretisation bins, using a test data set. This evaluation will pinpoint the optimum configuration for new data sets.
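
Two of the simplest unsupervised discretisation methods the paper's family of evaluations covers are equal-width and equal-frequency binning. The sketch below is illustrative only (the paper compares seven methods and couples them with robust encoding); the function names are hypothetical.

```python
import numpy as np

def equal_width_bins(values, n_bins):
    """Equal-width discretisation: split the value range into n_bins
    intervals of equal width and return each value's bin index."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # digitize against the interior edges; clip keeps the maximum in the last bin
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)

def equal_frequency_bins(values, n_bins):
    """Equal-frequency discretisation: choose bin edges at quantiles so
    each bin holds (roughly) the same number of values."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
```

The resulting bin indices can then be mapped onto the binary input vectors the neural k–NN requires, one bit (or group of bits) per bin.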

Intelligent Decision Support using Pattern Matching.

Victoria J. Hodge, Tom Jackson and Jim Austin
Proceedings of the 1st International Workshop on Future Internet Applications for Traffic Surveillance and Management (FIATS–M 2011), Sofia, Bulgaria, October 2011, pp. 44–54.
Conference Papers

Abstract

The aim of our work is to develop Intelligent Decision Support (IDS) tools and techniques to convert traffic data into intelligence to assist network managers, operators and to aid the travelling public. The IDS system detects traffic problems, identifies the likely cause and recommends suitable interventions which are most likely to mitigate congestion of that traffic problem. In this paper, we propose to extend the existing tools to include dynamic hierarchical and distributed processing; algorithm optimisation using natural computation techniques; and, using a meta–learner to short–circuit the optimisation by learning the best settings for specific data set characteristics and using these settings to initialise the genetic algorithm (GA).

Outlier and Anomaly Detection: A Survey of Outlier and Anomaly Detection Methods

Victoria J. Hodge
Lambert Academic Publishing | 2011 | ISBN: 978–3–8465–4822–6.
Book
image

Abstract

An outlier or anomaly is a data point that is inconsistent with the rest of the data population. Outlier or anomaly detection has been used for centuries to detect and remove anomalous observations from data. It is used to monitor vital infrastructure such as utility distribution networks, transportation networks, machinery or computer networks for faults. Detection can identify faults before they escalate with potentially catastrophic consequences. Today, principled and systematic detection techniques are used, drawn from the full gamut of Computer Science and Statistics. The book forms a survey of techniques covering statistical, proximity–based, density–based, neural, natural computation, machine learning, distributed and hybrid systems. It identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review. It aims to provide the reader with a feel of the diversity and multiplicity of techniques available. The survey should be useful to advanced undergraduate and postgraduate computer and library/information science students and researchers analysing and developing outlier and anomaly detection systems.

Cumulatives and Errors in Pattern Matching for Intelligent Transport Systems.

Garry Hollier, Mike Smith, Victoria J. Hodge and Jim Austin
Proceedings of 2nd International Conference on Models and Technologies for Intelligent Transportation Systems, Leuven, Belgium, June 22–24, 2011.
Conference Papers

Abstract

Pattern recognition often relies on the distance between patterns, and the Euclidean distance is frequently used. In traffic studies, the sequence of flow cumulatives is more directly related to the character of the flow (e.g., congested or free), and might be expected to supply more meaningful traffic information than would a sequence of unsummed flows. However, if there is noise in the detection procedure, an error in the first count of a sequence would be carried forward to subsequent cumulatives, and would therefore, like errors in general, have a greater effect on the Euclidean distance between cumulatives than it would on that between the raw flows. This paper aims at providing a quantitative measure of the classification errors caused by noise on the Euclidean distance between sequences of cumulatives compared to those arising from distance calculations between raw sequences, and thus supplying a guideline for the situation when cumulatives or raw flows are to be used in the presence of noise.

Splitting Rate Modelling for Intelligent Transport Systems.

Mike Smith, Richard Mounce, Garry Hollier, Victoria J. Hodge and Jim Austin
Proceedings of 2nd International Conference on Models and Technologies for Intelligent Transportation Systems, Leuven, Belgium, June 22–24, 2011.
Conference Papers

Abstract

This paper considers models of within day and day-to-day driver (or traveller) decisions; and focusses on a splitting rate method of modelling these decisions sequentially. The paper begins with a dynamical system proposed in Smith (1984), using route-swapping, and shows how this same system may be applied within a splitting rate model. Some stability and convergence results are to be presented. Finally the paper suggests how signal control may be introduced in a simple and helpful way. This combined routeing / signal control model may be utilized quite easily to yield initial “suggested” signal control interventions corresponding to specific incidents.

Short–Term Traffic Prediction Using a Binary Neural Network.

Victoria J. Hodge, Rajesh Krishnan, Tom Jackson, Jim Austin and John Polak
43rd Annual UTSG Conference, Open University, Milton Keynes, UK, January 5–7, 2011
Conference Papers

Abstract

This paper presents a binary neural network algorithm for short-term traffic flow prediction. The algorithm can process both univariate and multivariate data from a single traffic sensor using time series prediction (temporal lags) and can combine information from multiple traffic sensors with time series prediction (spatial–temporal lags). The algorithm provides Intelligent Decision Support (IDS) for road network managers to proactively manage problems on the network as the predictions generated may be used to determine if traffic control interventions need to be applied. The algorithm can operate in near-real-time and dynamically, using data from UTC or UTMC systems. It is based on the Advanced Uncertain Reasoning Architecture (AURA) k–nearest neighbour prediction algorithm, which is designed for scalability and fast performance. The AURA k–NN predictor outperforms other machine learning techniques with respect to prediction accuracy and is able to train and predict rapidly. The basic AURA k–NN time series prediction algorithm was extended by incorporating average daily profiles and variable weighting into the prediction in this paper. The average daily profile of a variable is calculated as the average reading of the variable for a particular time of day and day of the week after removing outliers. When data vectors are matched in the AURA k–NN, the daily profile adds an extra dimension to the match. This process was further enhanced by weighting the profile using variable weighting to vary the profile's significance. It is shown that incorporating these two additional aspects improves the accuracy of the prediction compared to the standard AURA k–NN, resulting in a very fast and accurate traffic prediction tool.
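
The daily-profile extension can be illustrated in a few lines. This is a simplified sketch, assuming profiles grouped by time of day only (the paper also groups by day of week) and a 3-sigma trim standing in for the paper's outlier removal; the function names are hypothetical.

```python
import numpy as np

def daily_profile(readings, slots_per_day):
    """Average reading per time-of-day slot, trimming outliers:
    values more than 3 standard deviations from their slot mean are excluded."""
    days = np.asarray(readings, dtype=float).reshape(-1, slots_per_day)
    mu, sd = days.mean(axis=0), days.std(axis=0)
    trimmed = np.where(np.abs(days - mu) <= 3 * sd + 1e-12, days, np.nan)
    return np.nanmean(trimmed, axis=0)

def profiled_vector(recent, profile_value, weight=0.5):
    """Append the weighted profile value as an extra dimension of the match,
    so the k-NN compares both recent readings and typical behaviour."""
    return np.append(recent, weight * profile_value)
```

Raising or lowering `weight` varies how strongly the typical daily behaviour influences which historical vectors are matched.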

Integrating Information Retrieval with Artificial Neural Networks: Implementing a Modular Information Retrieval System using Artificial Neural Networks

Victoria J. Hodge
Lambert Academic Publishing | 2010 | ISBN: 978–3–8433–7966–3.
Book
image

Abstract

Information Retrieval (IR) is a field of computer science investigating the automated storage and retrieval of information, particularly documents. The amount of stored information has expanded rapidly and now, vast repositories of information on almost every conceivable subject are available to be searched. Effective IR involves: understanding the needs of users; handling the vagaries and ambiguities of language and human errors; and developing an efficient and accurate storage and search system. This book provides an analysis of IR techniques and systems assessing their strengths and weaknesses. This analysis then provides the motivation for the proposed system of three integrated modules: a novel spell checker based on a binary neural network; a thesaurus generated from a dynamic growing neural network; and, an efficient word–to–document index. The book provides a detailed description of the implementation and evaluation of the proposed system. The IR analyses and system development should be useful to advanced undergraduate and postgraduate computer and library/information science students and researchers analysing and developing IR systems.

Intelligent Decision Support for Traffic Management.

Rajesh Krishnan, Victoria J. Hodge, Jim Austin, John Polak, Tom Jackson, Mike Smith and Tzu–Chang Lee
Proceedings of 17th ITS World Congress: (CD–ROM), Busan, Korea, October 25–29, 2010.
Conference Papers

Abstract

Urban traffic control systems such as the widely deployed SCOOT system, incrementally respond to changing traffic conditions. Such systems are often complemented by traffic control centres where road network managers intervene manually to mitigate rapidly developing congestion events. An Intelligent Decision Support (IDS) system developed by the authors within the UK FREEFLOW (FF) project to aid the network managers is presented in this paper. The primary objective of the FF IDS system is to identify traffic congestion in near-real-time and to recommend appropriate traffic control intervention measures. The FF–IDS consists of multiple internal components. A state estimation component monitors live traffic sensor data and determines if there is a congestion problem on the road network. If a problem is identified, a binary neural pattern-matching component is used to identify past time periods with similar congestion events. This is able to rapidly search large historic traffic datasets finding sets of traffic control interventions carried out during similar historical time periods. The effectiveness of each intervention is evaluated using a Performance Index (PI) and the intervention that resulted in the highest improvement in PI is recommended to network managers. The FF-IDS system can also present traffic incidents and equipment faults that occurred during these historical time periods to the network manager as potential causes of the problem. This paper describes the FF–IDS system in detail. The system is currently under development. An early version of the FF–IDS system was trialled using off-line data from London, yielding encouraging preliminary results.

AURA–Alert: The use of binary associative memories for condition monitoring applications.

Jim Austin, Grant Brewer, Tom Jackson and Victoria J. Hodge
Proceedings of 7th International Conference on Condition Monitoring and Machinery Failure Prevention Technologies: (CM 2010 and MFPT 2010), Stratford–upon–Avon, England, June 22–24, 2010, pp. 699–711.
Conference Papers

Abstract

Many Condition Monitoring (CM) domains are suffering from the dual challenges of substantial increases in the volumes of data being produced and collected by sensing systems, and the challenges of modelling increasing complexity in the remote monitored systems. These two issues give rise to the problem that fast and reliable data mining of CM data is a computationally demanding task for real–time (or near real–time) applications. We present the use of AURA [1], a class of binary associative network built on correlation matrix memories (CMMs), as an underpinning technology for efficient, scalable pattern recognition in complex and large scale CM applications. AURA is a class of binary neural network. However, it has a number of advantages over standard neural network techniques for CM pattern classification tasks. These include: high levels of data compression, one–pass training for on–line learning, a scalable architecture that can be readily mapped onto high performance computing platforms, and a sound theoretical basis to determine the bounds of the system operation. We describe applications illustrating how the AURA system can be optimised to create an extremely efficient and scalable k–nearest neighbour classifier for multi–variate models. We will also illustrate how the one-pass training capability of the AURA system can be used as the basis of normality and exception modelling in complex CM systems. This latter application has particularly powerful advantages for fault detection models in domains which are characterised by highly dynamic trends or drifting in the standard operational mode of a system, and which, as a result, are extremely difficult to accurately model. The application of the AURA techniques will be illustrated with industry led exemplars in the transport and energy sectors.
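
The correlation matrix memory at the heart of AURA can be demonstrated with a toy sketch: one-pass training superimposes binary input/output pairs into a single weight matrix, and recall thresholds the matrix-vector product. This is a minimal illustration of the general CMM idea, not the AURA implementation itself.

```python
import numpy as np

class CMM:
    """Toy binary correlation matrix memory with one-pass Hebbian training."""

    def __init__(self, in_bits, out_bits):
        self.W = np.zeros((out_bits, in_bits), dtype=np.uint8)

    def train(self, x, y):
        # one-pass: OR the outer product of the binary pair into the weights
        self.W |= np.outer(y, x).astype(np.uint8)

    def recall(self, x, threshold):
        # count matched input bits per output line, then threshold
        return (self.W @ x >= threshold).astype(np.uint8)
```

Because training is a single pass of bitwise ORs, new patterns can be stored on-line without retraining, which is the property the abstract highlights for normality and exception modelling.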

A computationally efficient method for online identification of traffic incidents and network equipment failures.

Victoria J. Hodge, Rajesh Krishnan, Jim Austin and John Polak.
Transport Science and Technology Congress: TRANSTEC 2010, Delhi, April 4–7, 2010.
Conference Papers

Abstract

Despite the vast wealth of traffic data available, currently there is only limited integration, analysis and utilisation of data in the transport domain. Yet, accurate congestion and incident detection is vital for traffic network operators to allow them to mitigate the cost of traffic incidents. Recurrent (cyclical) traffic congestion tends to be managed using timetabled control measures or through the use of adaptive traffic control systems such as SCOOT and SCATS. However, for non-recurrent congestion with rapid onset, such as the congestion caused by a traffic incident or traffic equipment failure, traffic network operators have to quickly detect the problem and then determine the likely cause before selecting the most appropriate action to both manage the traffic network and mitigate the congestion. This is a complex task requiring specialist knowledge where assistance from automated tools will help facilitate the operator tasks. Automated detection is becoming an increasingly viable option due to the increased use of traffic sensors in the road network. Therefore, the aim of the FREEFLOW project is to provide an Intelligent Decision Support (IDS) tool which is designed to complement existing fixed-time traffic control systems and adaptive systems SCOOT and SCATS. IDS will use traffic sensor data to rapidly identify traffic problems, recommend appropriate interventions that worked in the past for similar problems and assist the traffic network operators to pinpoint the cause of the problem. Recommendations will be displayed to the network operator who will use this knowledge to select the most appropriate course of action. This paper describes and analyses the components of the IDS tool used for identifying incidents and faulty equipment.

On Identifying Spatial Traffic Patterns using Advanced Pattern Matching Techniques.

Rajesh Krishnan, Victoria J. Hodge, Jim Austin, John Polak and Tzu–Chang Lee.
Proceedings of Transportation Research Board (TRB) 89th Annual Meeting, Washington, D.C., January 10–14, 2010. (DVD–ROM: 2010 TRB 89th Annual Meeting: Compendium of Papers).
Conference Papers

Abstract

The k-nearest neighbor algorithm (k–NN) has been used in the literature for traffic state estimation and prediction over the last decade or so. A number of such multivariate methods use input data from more than one traffic sensor. While a significant amount of discussion can be found in the literature aiming towards optimising the parameters of the k–NN for better accuracy of such models, limited research is available on configuring the k–NN to differentiate between different spatial patterns in the multivariate models. This paper presents an approach to distinguish spatial patterns from one another reliably in traffic variables observed using a number of point–based sensors in a neighbourhood of road links. The application of the proposed approach is demonstrated using AURA, a fast binary pattern matching tool based on neural networks. Two different spatial patterns of traffic congestion plus non–congested situations are simulated using a PARAMICS micro–simulation model. The AURA software is used to identify similar time periods of congestion using data from a congested time period as input using conventional and proposed distance metrics. It is shown that the proposed distance metrics can identify different spatial congestion patterns better than conventional methods. This method will be useful for traffic estimation and prediction methods that use the k–nearest neighbor algorithm or its variants.

A Computationally Efficient Method for Online Identification of Traffic Control Intervention Measures.

Rajesh Krishnan, Victoria J. Hodge, Jim Austin and John Polak
42nd Annual UTSG Conference, Centre for Sustainable Transport, University of Plymouth, UK: January 5–7, 2010
Conference Papers

Abstract

Adaptive traffic control systems such as SCOOT and SCATS are designed to respond to changes in traffic conditions and provide heuristically optimised traffic signal settings. However, these systems make gradual changes to signal settings in response to changing traffic conditions. In the EPSRC and TSB funded FREEFLOW project, a tool is being designed to rapidly identify severe traffic problems using traffic sensor data and recommend traffic signal plans and UTC parameters that have worked well in the past under similar traffic conditions for immediate implementation. This paper will present an overview of this tool, called the Intelligent Decision Support (IDS), that is designed to complement adaptive traffic control systems. The IDS is essentially a learning based system. It requires an historic database of traffic sensor data and traffic control intervention data for the application area as a knowledge base. The IDS, when deployed online, will monitor traffic sensor data to determine if the network is congested using traffic state estimation models. When IDS identifies congestion in the network, the historic database is queried for similar congestion events, where the similarity is based on both the severity and the spatial pattern of congestion. Traffic control interventions implemented during similar congestion events in the historic database are then evaluated for their effectiveness to mitigate congestion. The most effective traffic control interventions are recommended by IDS for implementation, along with an associated confidence indicator. The IDS is designed to work online against large historic datasets, and is based on traffic state estimation models developed at Imperial College London and pattern matching tools developed at the University of York.
The IDS is tested offline using Inductive Loop Detector (ILD) data obtained from the ASTRID system and traffic control intervention data obtained from the UTC system at Transport for London (TfL) during its development. This paper presents the preliminary results using TfL data and outlines future research avenues in the development of IDS.

Data, Intelligent Decision Support and Pattern Matching.

Victoria J. Hodge, Mike Smith and Jim Austin
Procs of the XIII Meeting of the Euro Working Group on Transportation (EWGT’2009), Advances in Transportation Systems Analysis, Padua, Italy, Sept. 23–25, 2009.
Conference Papers

Abstract

Although a substantial amount of research has examined the constructs of warmth and competence, far less has examined how these constructs develop and what benefits may accrue when warmth and competence are cultivated. Yet there are positive consequences, both emotional and behavioral, that are likely to occur when brands hold perceptions of both. In this paper, we shed light on when and how warmth and competence are jointly promoted in brands, and why these reputations matter.
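The FREEFLOW IDS abstract describes querying a historic database for congestion events whose severity and spatial pattern resemble the current conditions. A hypothetical sketch of that matching step is below; the per-link severity vectors, the Euclidean distance measure and the function names are illustrative assumptions, not the project's actual AURA-based pattern matcher.

```python
import numpy as np

def similar_events(current, historic, top_n=3):
    """Rank historic congestion events by closeness to the current pattern.

    Each event is a vector of per-link congestion severities, so the
    comparison reflects both how severe the congestion is and where it is.
    """
    current = np.asarray(current, dtype=float)
    order = sorted(range(len(historic)),
                   key=lambda i: np.linalg.norm(current - np.asarray(historic[i], dtype=float)))
    return order[:top_n]
```

The recommended interventions would then be drawn from whatever was applied during the returned events.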

Optimising Activation of Bus Pre-signals.

Victoria J. Hodge, Tom Jackson and Jim Austin
Models and Technologies for Intelligent Transportation Systems, Proceedings of the International Conference: (G. Fusco, Ed.), Rome, June 22–23, 2009, pp. 344–353
Conference Papers

Abstract

This report describes preliminary analysis of strategies to activate and deactivate a bus pre-signal using vehicle count data. The bus pre-signal currently operates during preset times to regulate access to a length of road controlled at the other end by vehicle-actuated traffic signals. However, vehicle flows at the pre-signal vary on a daily basis, so a more demand-based approach would be more effective. There has been much research performed to optimise pre-signal cycle times and bus priority at pre-signals. We focus on identifying the optimal strategy to activate and deactivate the bus pre-signal using vehicle demand rather than the current fixed-time strategy. The ideal strategy should be stable, robust, consistent and timely. We investigate strategies using vehicle counts, queueing theory, and estimation and prediction. Our recommended strategy combines aspects of all three areas.

Intelligent Car Park Routeing for Road Traffic.

Victoria J. Hodge, Mike Smith and Jim Austin
Models and Technologies for Intelligent Transportation Systems, Proceedings of the International Conference: (G. Fusco, Ed.), Rome, June 22–23, 2009, pp. 344–353
Conference Papers

Abstract

The twin problems of congestion and inner-city parking limitations affect many cities. One solution is to promote the use of Park and Ride sites. However, for effective use of the sites, drivers need to know where the sites are and which is the "best" site to use. This work introduces a methodology to pinpoint and guide drivers to the best Park and Ride site from their current location. While drivers may be able to obtain traffic, car park location and free space data individually, the information is not usually coordinated. By fusing up–to–date details of traffic jams, roadworks and accidents coupled with free parking spaces and combining this with a novel route weighting methodology, we are able to ensure that intelligent information is displayed to guide drivers. The method uses optimised data structures and proprietary scoring measures to ensure it is fast and accurate. The method provides a simple and low cost solution through the use of existing technologies to display information to drivers.

A Binary Neural Shape Matcher using Johnson Counters and Chain Codes.

Victoria J. Hodge, Simon O’Keefe and Jim Austin
Neurocomputing, 72, Elsevier Science, 2009, pp. 693–703.
Journal Paper

Abstract

In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content–based image retrieval systems. Chain codes describe shapes using sequences of numbers. They are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative–memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.
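The abstract pairs Freeman chain codes (shape outlines written as sequences of direction numbers 0–7) with Johnson Counter codes. As an illustrative sketch, not the paper's AURA implementation, a four-bit Johnson (twisted-ring) counter yields eight states in which neighbouring states differ by a single bit, so adjacent chain-code directions map to similar binary patterns, a property that suits approximate binary matching:

```python
def johnson_codes(n_bits):
    """Generate the 2 * n_bits states of a Johnson (twisted-ring) counter."""
    state = [0] * n_bits
    codes = []
    for _ in range(2 * n_bits):
        codes.append(tuple(state))
        # shift right, feeding back the complement of the last bit
        state = [1 - state[-1]] + state[:-1]
    return codes

def encode_chain_code(directions, n_bits=4):
    """Map each 8-way Freeman chain-code direction (0-7) to a Johnson state."""
    table = johnson_codes(n_bits)  # exactly 8 states when n_bits = 4
    return [table[d] for d in directions]
```

With four bits, direction 3 and direction 4 differ in one bit only, whereas a plain binary encoding (011 vs 100) would flip every bit.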

Identifying Perceptual Structures In Trademark Images.

Victoria J. Hodge, Garry Hollier, Jim Austin and John Eakins
Proceedings of Fifth IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2008), Innsbruck, Austria, February 13–15, 2008.
Conference Papers

Abstract

In this paper we focus on identifying image structures at different levels in figurative (trademark) images to allow higher level similarity between images to be inferred. To identify image structures at different levels, it is desirable to be able to achieve multiple views of an image at different scales and then extract perceptually–relevant shapes from the different views. The three aims of this work are: to generate multiple views of each image in a principled manner, to identify structures and shapes at different levels within images and to emulate the Gestalt principles to guide shape finding. The proposed integrated approach is able to meet all three aims.

Inducing a Perceptual Relevance Shape Classifier.

Victoria J. Hodge, John Eakins and Jim Austin
Proceedings of ACM CIVR 2007: The 6th International Conference on Image and Video Retrieval, University of Amsterdam, The Netherlands, July 9–11 2007.
Conference Papers

Abstract

In this paper, we develop a system to classify the outputs of image segmentation algorithms as perceptually relevant or perceptually irrelevant with respect to human perception. The work is aimed at figurative images. We previously investigated human visual perception of trademark images and established a body of ground truth data in the form of trademark images and their respective human segmentations. The work indicated that there is a core set of segmentations for each image that people perceive. Here we use this core set of segmentations to train a classifier to classify closed shapes output from an image segmentation algorithm so that the method returns the image segments that match those produced by people. We demonstrate that a perceptual relevance classifier is attainable and identify a good methodology to achieve this. The paper compares MLP, SVM, Bayes and regression classifiers for classifying shapes. MLPs perform best with an overall accuracy of 96.4%.

Layout Indexing of Trademark Images.

Reinier H. van Leuken, M. Fatih Demirci, Victoria J. Hodge, Jim Austin and Remco C. Veltkamp
Proceedings of ACM CIVR 2007: The 6th International Conference on Image and Video Retrieval, University of Amsterdam, The Netherlands, July 9–11 2007.
Conference Papers

Abstract

Ensuring the uniqueness of trademark images and protecting their identities are the most important objectives for the trademark registration process. To prevent trademark infringement, each new trademark must be compared to a database of existing trademarks. Given a newly designed trademark image, trademark retrieval systems are not only concerned with finding images with similar shapes but also locating images with similar layouts. Performing a linear search, i.e., computing the similarity between the query and each database entry and selecting the closest one, is inefficient for large database systems. An effective and efficient indexing mechanism is, therefore, essential to select a small collection of candidates. This paper proposes a framework in which a graph-based indexing schema will be applied to facilitate efficient trademark retrieval based on spatial relations between image components, regardless of mutual shape similarity. Our framework starts by segmenting trademark images into distinct shapes using a shape identification algorithm. Identified shapes are then encoded automatically into an attributed graph whose vertices represent shapes and whose edges show spatial relations (both directional and topological) between the shapes. Using a graph–based indexing schema, the topological structure of the graph as well as that of its subgraphs are represented as vectors in which the components correspond to the sorted Laplacian eigenvalues of the graph or subgraphs. Having established the signatures, the indexing amounts to a nearest neighbour search in a model database. For a query graph and a large graph data set, the indexing problem is reformulated as that of fast selection of candidate graphs whose signatures are close to the query signature in the vector space. An extensive set of recognition trials, including a comparison with manually constructed graphs, shows the efficacy of both the automatic graph construction process and the indexing schema.
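The indexing step above reduces each graph to a vector of sorted Laplacian eigenvalues and then performs a nearest-neighbour search among signatures. A minimal sketch under the assumption of small dense adjacency matrices (the trademark segmentation and attributed-graph construction are not shown, and the function names are illustrative):

```python
import numpy as np

def laplacian_signature(adj, k):
    """Signature = the k largest eigenvalues of the graph Laplacian L = D - A,
    sorted in descending order and zero-padded to a fixed length k."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.sort(np.linalg.eigvalsh(lap))[::-1]  # descending order
    sig = np.zeros(k)
    sig[:min(k, len(eig))] = eig[:k]
    return sig

def nearest(query_sig, db_sigs):
    """Index of the database signature closest to the query (Euclidean)."""
    dists = [np.linalg.norm(query_sig - s) for s in db_sigs]
    return int(np.argmin(dists))
```

Zero-padding lets graphs of different sizes share one vector space, so the search reduces to ordinary nearest-neighbour lookup.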

A Binary Neural Shape Matcher using Johnson Counters and Chain Codes.

Victoria J. Hodge, Simon O’Keefe and Jim Austin
In Second International Conference on Brain Inspired Cognitive Systems 2006 (BICS 2006), Island of Lesvos, Greece. October 10–14, 2006.
Conference Papers

Abstract

In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes using sequences of numbers. They are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative–memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.

A Binary Neural Decision Table Classifier.

Victoria J. Hodge, Simon O’Keefe and Jim Austin
Neurocomputing, 69(16–18), October 2006, pp. 1850–1859.
Journal Paper

Abstract

In this paper, we introduce a neural network–based decision table algorithm. We focus on the implementation details of the decision table algorithm when it is constructed using the neural network. Decision tables are simple supervised classifiers which, Kohavi demonstrated, can outperform state–of–the–art classifiers such as C4.5. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. Initially, we demonstrate how the binary associative-memory neural network can form the decision table index to map between attribute values and data records and subsequently we show how two attribute selection algorithms can be used to pre–select attributes for this decision table. The attribute selection algorithms are easily implemented within the same binary associative–memory framework producing a tightly coupled, two–tier system allowing attribute selection and decision table indexing. The first attribute selector uses mutual information between attributes and classes to select the attributes that classify best. The second attribute selector uses a probabilistic approach to evaluate randomly selected attribute subsets.
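The first attribute selector described above scores attributes by their mutual information with the class. A minimal sketch of that selection step follows; the function names and column-wise data layout are assumptions, and the binary associative-memory implementation is not reproduced:

```python
from collections import Counter
from math import log2

def mutual_information(attr_values, class_labels):
    """I(A;C) estimated from co-occurrence counts of one attribute column
    and the class labels."""
    n = len(attr_values)
    pa = Counter(attr_values)
    pc = Counter(class_labels)
    pac = Counter(zip(attr_values, class_labels))
    return sum((nac / n) * log2((nac / n) / ((pa[a] / n) * (pc[c] / n)))
               for (a, c), nac in pac.items())

def select_attributes(columns, labels, k):
    """Indices of the k columns with the highest mutual information."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:k]
```

An attribute that determines the class perfectly scores H(C) bits; an attribute independent of the class scores zero, so it is never pre-selected for the decision table index.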

Eliciting Perceptual Ground Truth for Image Segmentation

Victoria J. Hodge, Garry Hollier, John Eakins and Jim Austin
Proceedings of Image and Video Retrieval, 5th International Conference, CIVR 2006, Tempe, AZ, USA, July 13–15, 2006. H. Sundaram, M.R. Naphade, J.R. Smith, Y. Rui (Eds.), Lecture Notes in Computer Science 4071, pp. 320–329.
Conference Papers

Abstract

In this paper, we investigate human visual perception and establish a body of ground truth data elicited from human visual studies. We aim to build on the formative work of Ren, Eakins and Briggs who produced an initial ground truth database. Human participants were asked to draw and rank their perceptions of the parts of a series of figurative images. These rankings were then used to score the perceptions, identify the preferred human breakdowns and thus allow us to induce perceptual rules for human decomposition of figurative images. The results suggest that the human breakdowns follow well–known perceptual principles, in particular the Gestalt laws.

Eliciting Perceptual Ground Truth for Image Segmentation

Victoria J. Hodge, John Eakins and Jim Austin
Technical Report YCS–2006–401, Department of Computer Science, University of York, UK, January 2006
Tech report

Abstract

In this paper, we investigate human visual perception and establish a body of ground truth data elicited from human visual studies. We aim to build on the formative work of Ren, Eakins and Briggs who produced an initial ground truth database. Human subjects were asked to draw and rank their perceptions of the parts of a series of figurative images. These rankings were then used to score the perceptions, identify the preferred human breakdowns and thus allow us to induce perceptual rules for human decomposition of figurative images. The results suggest that the human breakdowns follow well–known perceptual principles, in particular the Gestalt laws.

A Binary Neural k–Nearest Neighbour Technique.

Victoria J. Hodge and Jim Austin
Knowledge and Information Systems, 8(3), Springer–Verlag London Ltd, September 2005, pp. 276–292.
Journal Paper

Abstract

K-Nearest Neighbour (k–NN) is a widely used technique for classifying and clustering data. k–NN is effective but is often criticised for its polynomial run–time growth as k–NN calculates the distance to every other record in the data set for each record in turn. This paper evaluates a novel k–NN classifier with linear growth and faster run-time built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and real-valued data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall the k–best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations.
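For contrast, the conventional k-NN baseline that the paper compares against can be sketched in a few lines; the AURA robust binary encoding itself is not reproduced here, and the function name is an illustrative choice:

```python
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Plain k-NN: majority label among the k training records closest to
    the query (squared Euclidean distance)."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], query)))
    top = [labels[i] for i in order[:k]]
    return Counter(top).most_common(1)[0][0]
```

The full sort over every record is exactly the polynomial cost the abstract criticises; the binary neural approach replaces it with a single pattern-match recall.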

A Survey of Outlier Detection Methodologies.

Victoria J. Hodge and Jim Austin
Artificial Intelligence Review, 22(2), Kluwer Academic Publishers, October 2004, pp. 85–126.
Journal Paper

Abstract

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
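As one concrete example of the classical statistical techniques such a survey covers, a minimal z-score detector flags observations that lie far from the mean; the threshold value and function name here are illustrative choices, not from the paper:

```python
def zscore_outliers(data, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return [x for x in data if std > 0 and abs(x - mean) / std > threshold]
```

This single-pass statistical test is simple but assumes roughly normal data; the survey's point is that many other families of methods (distance-based, density-based, neural) relax such assumptions.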

A Binary Neural Decision Table Classifier.

Victoria J. Hodge, Simon O’Keefe and Jim Austin
Proceedings Brain Inspired Cognitive Systems 2004 (BICS 2004), University of Stirling, UK, August 29–September 1, 2004
Conference Papers

Abstract

In this paper, we introduce a neural network-based decision table algorithm. We focus on the implementation details of the decision table algorithm when it is constructed using the neural network. Decision tables are simple supervised classifiers which, Kohavi demonstrated, can outperform state–of–the–art classifiers such as C4.5. We couple this power with the efficiency and flexibility of a binary associative–memory neural network. We demonstrate how the binary associative-memory neural network can form the decision table index to map between attribute values and data records. We also show how two attribute selection algorithms, which may be used to pre–select the attributes for the decision table, can easily be implemented within the binary associative–memory neural framework. The first attribute selector uses mutual information between attributes and classes to select the attributes that classify best. The second attribute selector uses a probabilistic approach to evaluate randomly selected attribute subsets.

A High Performance k–NN Approach Using Binary Neural Networks.

Victoria J. Hodge, Ken Lees and Jim Austin
Neural Networks, 17(3), Elsevier Science, April 2004, pp. 441–458.
Journal Paper

Abstract

This paper evaluates a novel k–nearest neighbour (k–NN) classifier built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and numeric data sets onto a binary neural network. The binary neural network uses high speed pattern matching to recall a candidate set of matching records, which are then processed by a conventional k–NN approach to determine the k–best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance with respect to speed and memory requirements of the binary approach compared to the standard approach and we pinpoint the optimal configurations.

A Comparison of Standard Spell Checking Algorithms and a Binary Neural Approach.

Victoria J. Hodge and Jim Austin
IEEE Transactions on Knowledge and Data Engineering, 15(5), September-October 2003, pp. 1073–1081.
Journal Paper

Abstract

In this paper, we propose a simple, flexible, and efficient hybrid spell-checking methodology based upon phonetic matching, supervised learning, and associative matching in the AURA neural system. We integrate Hamming Distance and n–gram algorithms that have high recall for typing errors and a phonetic spell-checking algorithm in a single novel architecture. Our approach is suitable for any spell-checking application, though aimed toward isolated word error correction, particularly spell checking user queries in a search engine. We use a novel scoring scheme to integrate the retrieved words from each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. We evaluate our approach against several benchmark spell-checking algorithms for recall accuracy. Our proposed hybrid methodology has the highest recall rate of the techniques evaluated. The method has a high recall rate and low computational cost.
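A minimal sketch of the n-gram component of such a hybrid checker, using Dice similarity over character bigrams, is shown below. The Hamming-distance and phonetic components and the paper's novel scoring scheme are not reproduced, and the function names and padding convention are assumptions:

```python
def bigrams(word):
    """Unique character bigrams, padded so edge letters also form pairs."""
    w = f"#{word}#"
    return {w[i:i + 2] for i in range(len(w) - 1)}

def ngram_score(query, candidate):
    """Dice coefficient over character bigrams, in the range 0..1."""
    q, c = bigrams(query), bigrams(candidate)
    return 2 * len(q & c) / (len(q) + len(c))

def rank_candidates(query, lexicon):
    """Rank lexicon words by descending bigram similarity to the query."""
    return sorted(lexicon, key=lambda w: ngram_score(query, w), reverse=True)
```

In the full hybrid system a score like this would be combined with the phonetic and Hamming-distance scores before ranking.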

Improved AURA k–Nearest Neighbour Approach.

Michael Weeks, Vicky Hodge, Simon O’Keefe, Jim Austin and Ken Lees
Proceedings of International Work–conference on Artificial and Natural Neural Networks (IWANN–2003), Menorca, Spain, June 3–6, 2003. Lecture Notes in Computer Science (LNCS) 2687, Springer Verlag, Berlin.
Conference Papers

Abstract

The k-Nearest Neighbour (kNN) approach is a widely-used technique for pattern classification. Ranked distance measurements to a known sample set determine the classification of unknown samples. Though effective, kNN, like most classification methods, does not scale well with increased sample size. This is due to there being a relationship between the unknown query and every other sample in the data space. In order to make this operation scalable, we apply AURA to the kNN problem. AURA is a highly-scalable associative-memory based binary neural network intended for high-speed approximate search and match operations on large unstructured datasets. Previous work has seen AURA methods applied to this problem as a scalable, but approximate, kNN classifier. This paper continues this work by using AURA in conjunction with kernel-based input vectors, in order to create a fast scalable kNN classifier, whilst improving recall accuracy to levels similar to standard kNN implementations.

A Comparison of a Novel Neural Spell Checker and Standard Spell Checking Algorithms.

Victoria J. Hodge and Jim Austin
Pattern Recognition, 35(11), Elsevier Science, 2002, pp. 2571–2580.
Journal Paper

Abstract

In this paper, we propose a simple and flexible spell checker using efficient associative matching in the AURA modular neural system. Our approach aims to provide a pre-processor for an information retrieval (IR) system allowing the user’s query to be checked against a lexicon and any spelling errors corrected, to prevent wasted searching. IR searching is computationally intensive, so much so that if we can prevent futile searches we can minimise computational cost. We evaluate our approach against several commonly used spell checking techniques for memory–use, retrieval speed and recall accuracy. The proposed methodology has low memory use, high speed for word presence checking, reasonable speed for spell checking and a high recall rate.

Hierarchical Word Clustering: automatic thesaurus generation.

Victoria J. Hodge and Jim Austin
Neurocomputing, 48(1–4), Elsevier Science, 2002, pp. 819–846.
Journal Paper

Abstract

In this paper, we propose a hierarchical, lexical clustering neural network algorithm that automatically generates a thesaurus (synonym abstraction) using purely stochastic information derived from unstructured text corpora and requiring no prior word classifications. The lexical hierarchy overcomes the Vocabulary Problem by accommodating paraphrasing through using synonym clusters and overcomes Information Overload by focusing search within cohesive clusters. We describe existing word categorisation methodologies, identifying their respective strengths and weaknesses and evaluate our proposed approach against an existing neural approach using a benchmark statistical approach and a human generated thesaurus for comparison. We also evaluate our word context vector generation methodology against two similar approaches to investigate the effect of word vector dimensionality and the effect of the number of words in the context window on the quality of word clusters produced. We demonstrate the effectiveness of our approach and its superiority to existing techniques.

Scalability of a Distributed Neural Information Retrieval System.

Michael Weeks, Victoria J. Hodge and Jim Austin
Presented at the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC–2002), Edinburgh, UK, July 24–26, 2002.
Conference Papers

Abstract

AURA (Advanced Uncertain Reasoning Architecture) is a generic family of techniques and implementations intended for high–speed approximate search and match operations on large unstructured datasets. AURA technology is fast, economical, and offers unique advantages for finding near-matches not available with other methods. AURA is based upon a high–performance binary neural network called a correlation matrix memory (CMM). Typically, several CMM elements are used in combination to solve soft or fuzzy pattern–matching problems. AURA takes large volumes of data and constructs a special type of compressed index. AURA finds exact and near–matches between indexed records and a given query, where the query itself may have omissions and errors. The degree of nearness required during matching can be varied through thresholding techniques. The PCI-based PRESENCE (Parallel Structured Neural Computing Engine) card is a hardware-accelerator architecture for the core CMM computations needed in AURA–based applications. The card is designed for use in low–cost workstations and incorporates 128 MByte of low–cost DRAM for CMM storage. To investigate the scalability of the distributed AURA system, we implement a word–to–document index of an AURA–based information retrieval system, called MinerTaur, over a distributed PRESENCE CMM.

A Hardware Accelerated Novel IR System.

Michael Weeks, Victoria J. Hodge and Jim Austin
Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network–based Processing (PDP–2002), Las Palmas de Gran Canaria, Spain, January 9–11, 2002.
Conference Papers

Abstract

AURA (Advanced Uncertain Reasoning Architecture) is a generic family of techniques and implementations intended for high-speed approximate search and match operations on large unstructured datasets. This paper continues the AURA II (Advanced Uncertain Reasoning Architecture) project’s research into distributed binary Correlation Matrix Memory (CMM) based upon the PRESENCE (PaRallEl Structured Neural Computing Engine) hardware architecture [14]. Previous work has described how CMMs can be seamlessly implemented onto multiple hardware PRESENCE cards to accelerate core CMM operations. To demonstrate the system, this paper describes how a novel CMM-based information retrieval (IR) system, called MinerTaur, was implemented using multiple PRESENCE cards distributed across a cluster.

Integrating Information Retrieval & Neural Networks.

Victoria J. Hodge
PhD Thesis, Department of Computer Science, University of York, Heslington, York, UK, Sept. 2001.
Thesis

Abstract

Due to the proliferation of information in databases and on the Internet, users are overwhelmed, leading to Information Overload. It is impossible for humans to index and search such a wealth of information by hand so automated indexing and searching techniques are required. In this dissertation, we explore current Information Retrieval (IR) techniques and their shortcomings and we consider how more sophisticated approaches can be developed to aid retrieval. Current techniques can be slow due to the sheer volume of the search space, although faster ones are being developed. Matching is often poor, as the quantity of retrievals does not necessarily indicate quality retrievals. Many current approaches simply return the documents containing the greatest number of ‘query words’. A methodology is desired to: process documents unsupervised; generate an index using a data structure that is memory efficient, speedy, incremental and scalable; identify spelling mistakes in the query and suggest alternative spellings; handle paraphrasing of documents and synonyms for both indexing and searching; focus retrieval by minimising the search space; and, finally, calculate the query-document similarity from statistics autonomously derived from the text corpus. We describe our IR system named MinerTaur, developed using both the AURA modular neural system and a hierarchical, growing self-organising neural technique based on Growing Cell Structures which we call TreeGCS. We integrate three modules in MinerTaur: a spell checker; a hierarchical thesaurus generated from corpus statistics inferred by the system; and a word-document matrix to efficiently store the associations between the documents and their constituent words. We describe each module individually and evaluate each against comparative data structures and benchmark implementations. We identify improved memory usage, spelling recall accuracy, cluster quality and training and recall times for the modules. Finally, we compare MinerTaur against a benchmark IR system, SMART, developed at Cornell University, and reveal superior recall and precision for MinerTaur versus SMART.

An Evaluation of Phonetic Spell Checkers.

Victoria J. Hodge and Jim Austin
Technical Report YCS 338(2001), Department of Computer Science, University of York, UK, Sept. 2001.
Tech report

Abstract

In the work reported here, we describe a phonetic spell-checking algorithm integrating aspects of Soundex and Phonix. We increase the number of letter codes compared to Soundex and Phonix. We also integrate phonetic rules, but use far fewer than Phonix, where retrieval may be slow due to the computational cost of comparing the input to a large list of transformation rules. Our algorithm aims to repair spelling errors where the user has substituted homophones in place of the correct spelling. We evaluate our algorithm by comparing it to three alternative spell-checking algorithms and three benchmark spell checkers (MS Word 97 & 2000 and UNIX ‘ispell’) using a list of phonetic spelling errors. We find that our approach has superior recall (percentage of correct matches retrieved) to the alternative approaches, although the higher recall is at the expense of precision (number of possible matches retrieved). We intend our phonetic spell checker to be integrated into an existing spell checker, so precision will be improved by the integration; thus, high recall is the aim of our approach in this paper.
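For reference, the classic Soundex coding that the paper extends can be sketched as follows. This is standard four-character Soundex with the h/w rule, not the paper's enhanced algorithm with its extra letter codes and selected phonetic rules:

```python
# Standard Soundex consonant groups: letters in the same group share a digit.
SOUNDEX_MAP = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        SOUNDEX_MAP[ch] = digit

def soundex(word):
    """Classic 4-character Soundex code: first letter plus up to 3 digits."""
    word = word.lower()
    code = word[0].upper()
    prev = SOUNDEX_MAP.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_MAP.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]
```

Homophone pairs such as "Robert" and "Rupert" collapse to the same code, which is exactly the behaviour a phonetic spell checker exploits; the paper's extension adds finer-grained codes to reduce false matches.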

A Novel Binary Spell Checker.

Victoria J. Hodge and Jim Austin
Proceedings of the International Conference on Artificial Neural Networks (ICANN’2001), Vienna, Austria, 25–29 August, 2001. In Dorffner, Bischof & Hornik (Eds.), Lecture Notes in Computer Science (LNCS) 2130, Springer Verlag, Berlin
Conference Papers

Abstract

In this paper we propose a simple, flexible and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning and associative matching in the AURA neural system. We evaluate our approach against several benchmark spell-checking algorithms for recall accuracy. Our proposed hybrid methodology has the joint highest top 10 recall rate of the techniques evaluated. The method has a high recall rate and low computational cost.

An Integrated Neural IR System.

Victoria J. Hodge and Jim Austin
Proceedings of the 9th European Symposium on Artificial Neural Networks (ESANN’2001), Bruges, Belgium, 25–27 April 2001, pp. 265–270.
Conference Papers

Abstract

Over the years the amount and range of electronic text stored on the WWW has expanded rapidly, overwhelming both users and tools designed to index and search the information. It is impossible to index the WWW dynamically at query time due to the sheer volume so the index must be pre–compiled and stored in a compact but incremental data structure as the information is ever–changing. Much of the text is unstructured so a data structure must be constructed from such text, storing associations between words and the documents that contain them. The index must be able to index fine–grained word-based associations and also handle more abstract concepts such as synonym groups. A search tool is also required to link to the index and enable the user to pinpoint their required information. We describe such a system we have developed in an integrated hybrid neural architecture and evaluate our system against the benchmark SMART system for retrieval accuracy: recall and precision.

An Evaluation of Standard Retrieval Algorithms and a Binary Neural Approach.

Victoria J. Hodge and Jim Austin
Neural Networks, 14(3), Elsevier Science, April 2001, pp. 287–303.
Journal Paper

Abstract

In this paper we evaluate a selection of data retrieval algorithms for storage efficiency, retrieval speed and partial matching capabilities using a large Information Retrieval dataset. We evaluate standard data structures, for example inverted file lists and hash tables, but also a novel binary neural network that incorporates: single–epoch training, superimposed coding and associative matching in a binary matrix data structure. We identify the strengths and weaknesses of the approaches. From our evaluation, the novel neural network approach is superior with respect to training speed and partial match retrieval time. From the results, we make recommendations for the appropriate usage of the novel neural approach.

Hierarchical Growing Cell Structures: TreeGCS.

Victoria J. Hodge and Jim Austin
IEEE Transactions on Knowledge and Data Engineering, Special Issue on Connectionist Models for Learning in Structured Domains, 13(2), March 2001, pp. 207–218.
Journal Paper

Abstract

We propose a hierarchical clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of B. Fritzke (1993). Our algorithm refines and builds upon the GCS base, overcoming an inconsistency in the original GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. Our algorithm is unsupervised, flexible, and dynamic and we have imposed no additional parameters on the underlying GCS algorithm. Our ultimate aim is a hierarchical clustering neural network that is both consistent and stable and identifies the innate hierarchical structure present in vector-based data. We demonstrate improved stability of the GCS foundation and evaluate our algorithm against the hierarchy generated by an ascendant hierarchical clustering dendrogram. Our approach emulates the hierarchical clustering of the dendrogram. It demonstrates the importance of the parameter settings for GCS and how they affect the stability of the clustering.

Hierarchical Growing Cell Structures: TreeGCS.

Victoria J. Hodge and Jim Austin
Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems (KES’2000), Brighton, UK, August 30–Sept. 1, 2000
Conference Paper

Abstract

We propose a hierarchical, unsupervised clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm remedies an inconsistency in the GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. We demonstrate improved stability of the GCS foundation by alternating the input vector order on each presentation. We evaluate our automatically produced cluster hierarchy against that generated by an ascendant hierarchical clustering dendrogram. We use a small dataset to illustrate how our approach emulates the hierarchical clustering of the dendrogram, regardless of the input vector order.
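For context, the "ascendant" (agglomerative) hierarchical clustering used as the benchmark above repeatedly merges the two closest clusters until one remains. This toy single-linkage version on 1-D points is an illustrative assumption for exposition, not the TreeGCS algorithm itself; the function name and data are hypothetical.

```python
def agglomerate(points):
    """Return the merge history (a crude dendrogram) for a list of 1-D points,
    merging the two clusters with the smallest single-link distance each step."""
    clusters = [((p,), p) for p in points]  # (members, representative)
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters at minimum single-link distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i][0] for b in clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  (clusters[i][1], clusters[j][1]))
        history.append((clusters[i][1], clusters[j][1], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Two tight pairs and one outlier: the pairs merge first, then the groups.
print(agglomerate([0.0, 0.1, 1.0, 1.1, 5.0]))
```

Note that the merge order here depends only on pairwise distances, not on the order in which points are presented, which is exactly the consistency property the paper seeks from the neural approach.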

An Evaluation of Standard Retrieval Algorithms and a Weightless Neural Approach.

Victoria J. Hodge and Jim Austin
Proceedings of the IEEE–INNS–ENNS International Joint Conference on Neural Networks (IJCNN’2000), Italy, July 24–27, 2000
Conference Paper

Abstract

Many computational processes require efficient algorithms that both store and retrieve data rapidly and compactly. In this paper we evaluate a selection of data structures for storage efficiency, retrieval speed and partial matching capabilities using a large information retrieval dataset. We evaluate standard data structures, for example inverted file lists and hash tables, but also a novel binary neural network that incorporates superimposed coding, associative matching and row-based retrieval. We identify the strengths and weaknesses of the approaches. The novel neural network approach is superior with respect to training speed and partial match retrieval time.

Papers from the AAAI Workshop.

Victoria J. Hodge and Jim Austin, Co-Chairs
Technical Report WS–99–04, Association for the Advancement of Artificial Intelligence (AAAI) Press, Palo Alto, USA, 63 pp., ISBN 978–1–57735–088–0.
Technical Report

Abstract

Current AI methods lack the flexibility and reliability of biological information processing systems and, although a great deal is known about the construction of biological systems, this knowledge has had little impact on mainstream AI. If we are to progress toward building machines with the abilities of natural computing systems, closer collaboration between those studying biological information processing systems and those working in AI and neural computing is essential. This workshop was specifically designed to bring these two groups together, with the aim of providing indicators on how the brain may organize and process information, so that this knowledge may initiate new ways to think about computation. The workshop focused on topics of common interest to neurobiologists and those working in neural networks and other approaches to intelligent systems. It focused on the low-level mechanisms involved in biological systems and how these may be exploited by the brain to bring about intelligent behavior.

  • 17 Oct 2019
    New journal article on win prediction in esports.
    image

    IEEE Transactions on Games

Esports are competitive video games watched by audiences. Most esports generate detailed, publicly available data for each match. Esports analytics research has focused on predicting match outcomes, emphasising pre-match prediction and using data from amateur games, which are more easily available than professional-level data. However, the commercial value of win prediction lies at the professional level. Furthermore, prediction from live in-match data is largely unexplored, as is its potential for informing audiences. Here we present the first comprehensive case study on live win prediction in a professional esport. We provide a literature review of win prediction in multi-player online battle arena (MOBA) esports. The paper evaluates the first professional-level prediction models for live Dota 2 matches, one of the most popular MOBA games, and trials them at a major international esports tournament. ...

  • 23 Aug 2019
    New conference paper on Deep Learning for Micro-Prediction in esports.
    image

    IEEE COG 2019 PROCEEDINGS

Micro-predictions are of perennial interest to esports commentators and audiences because they make it possible to surface events that might otherwise be missed: esports games are highly complex, with fast-moving action where the balance of a game can change in a matter of seconds and where events can unfold in multiple areas of the playing field at the same time. It is easy for commentators and viewers alike to miss an event and only observe its subsequent impact. In Dota 2, a player being killed by the opposing team is a key event of interest to commentators and audiences. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window.
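To illustrate what "death within a five-second window" means as a prediction target, the sketch below derives such labels from a match event log. The function name, the one-tick-per-second sampling and the data are illustrative assumptions, not the paper's pipeline.

```python
WINDOW_S = 5  # prediction horizon in seconds

def label_ticks(num_ticks, death_ticks):
    """Label each game tick 1 if the player dies within the next WINDOW_S
    ticks (here, one tick per second), else 0."""
    deaths = set(death_ticks)
    labels = []
    for t in range(num_ticks):
        dies_soon = any(t < d <= t + WINDOW_S for d in deaths)
        labels.append(1 if dies_soon else 0)
    return labels

# A 20-second clip where the player dies at t = 8: ticks 3..7 are positive.
print(label_ticks(20, [8]))
```

A model trained against labels like these is scored on whether it raises the alarm in the short window before a kill, which is the moment a commentator would want to cut to that fight.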

  • 13 Jun 2019
    VentureBeat Feature Article - This Dota 2 AI predicts player death within a 5-second window.
    image

If there’s one insight that might be gleaned from continuing AI research, it’s that many events once assumed unknowable are, in fact, predictable with relatively high accuracy. Case in point? A paper (“Time to Die: Death Prediction in Dota 2 using Deep Learning”) published by researchers at the University of York describes a system that can reliably anticipate (within a 5-second window) which player characters won’t survive Dota 2 matches.

    Click here to find out more ...

  • 22 March 2019
    Conference presentation: A Psychometric Evaluation of Emotional Responses to Horror Music.
    image

Our research explores and designs an effective experimental interface to evaluate people's emotional responses to horror music. We studied methodological approaches using traditional psychometric techniques to measure emotional responses, including self-reporting and galvanic skin response (GSR). GSR correlates with physiological arousal and can help circumvent a problem with self-reporting, where people are unwilling to report particular felt responses, or confuse perceived and felt responses. We also consider the influence of familiarity: familiar music can induce learned emotional responses, so listeners may report what they have learned to associate with the music rather than how it actually makes them feel. The research revealed different findings in the self-report and GSR data: both measurements showed an interaction between music and familiarity, but the simple effects were inconsistent.

  • 09 Feb 2019
    Press Feature Article - Drones can change the future of search and rescue.
    image

    An academic from York has called for understanding on the use of drone technology, following negative reports around the UK.

    Click here to find out more ...

  • Editorial Boards



At My Office

You can find me at my office at the University of York, UK, on the Heslington East Campus in the Ron Cooke Hub. I am usually in my office during the day, but you may want to send me an e-mail to arrange an appointment.