Research

In theory there is no difference between theory and practice. In practice there is.

Yogi Berra
My research interests are in model-based software and systems engineering, in mining software repositories and communities to extract actionable insights, and in technologies for persisting and analysing large volumes of heterogeneous data.

Model-Based Software Engineering

Model-based software engineering (MBSE) is the practice of raising models to first-class artefacts of the software engineering process, of using such models to analyse, simulate and reason about properties of the system under development, and often, eventually, of automatically generating part of its implementation.

MBSE brings well-understood and long-established principles and practices of trustworthy systems engineering to software engineering and adapts them to it: it is unthinkable, for example, to start constructing a bridge or an aircraft without first designing and analysing several models of it. MBSE is used extensively in organisations that produce business- or safety-critical software (e.g. in the aerospace, automotive and robotics industries), where defects can have catastrophic effects (e.g. loss of life) or be very expensive to remedy (e.g. a large-scale product recall). MBSE is also increasingly used for non-critical systems because of the productivity and consistency benefits it delivers, largely through automated code generation (e.g. JHipster for microservice architectures).

I have authored many highly cited peer-reviewed papers on topics related to MBSE, and I lead the development of Epsilon, an open-source MBSE platform under the Eclipse Foundation with a wide user base that includes engineers at organisations such as NASA, IBM, BAE Systems and THALES. I am on the Program Committee of the ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, and I was the Technical Director of MONDO (€2.67M), a large European Commission project that investigated techniques for scaling up MBSE technologies to very large systems. I am currently involved in knowledge transfer projects with Rolls-Royce, Smith & Nephew and IBM, which aim to apply the results of our MBSE research to problems of interest to our industry partners.

Big Data Persistence and Analytics Architectures

The need for levels of availability and scalability beyond those supported by relational databases has led to the emergence of a new generation of purpose-specific databases, grouped under the term NoSQL. In general, NoSQL databases are designed with horizontal scalability as a primary concern and deliver increased availability and fault tolerance at the cost of temporary inconsistency and reduced durability of data. To balance the requirements for data consistency and availability, organisations are increasingly migrating towards hybrid data persistence architectures that comprise both relational and NoSQL databases. The consensus is that this trend will only become stronger in the future: critical data will continue to be stored in ACID (predominantly relational) databases, while non-critical data will be progressively migrated to high-availability NoSQL databases.

TYPHON (2018-2020, €4.4M) is a European Commission H2020 project, of which I am the Technical Director. It aims to provide a methodology and an integrated technical offering for designing, developing, querying and evolving scalable architectures for the persistence, analytics and monitoring of large volumes of hybrid (relational, graph-based, document-based, natural-language, etc.) data.

TYPHON brings together: research partners with a long track record of conducting internationally leading research on software modelling, domain-specific languages, text mining and data migration, and of delivering research results as robust and widely used open-source software; industrial partners active in the automotive, earth observation, banking and motorway operation domains; an industrial advisory board of world-class experts in databases, business intelligence and analytics, and large-scale data management; and a global consortium that includes more than 400 organisations from all sectors of IT.

Mining Software Repositories and Communities

Deciding whether an open-source software (OSS) product or component meets the required standards for adoption, in terms of quality, maturity, development activity and user support, is not straightforward. It involves exploring several sources of information: its source code repositories, to identify how actively the code is developed, which programming languages are used, how well the code is commented, whether there are unit tests, etc.; its communication channels, such as newsgroups, forums and mailing lists, to identify whether user questions are answered in a timely and satisfactory manner and to estimate the number of experts and users of the software; its bug tracking system, to identify whether the software has many open bugs and at what rate bugs are fixed; and other relevant metadata, such as the number of downloads, the license(s) under which it is made available, its release history, etc.

Having been involved in open-source software development for more than a decade, I have developed an interest in automatically analysing software repositories and communities to guide and support future software development. I was the Technical Director of OSSMETER (2011-14, €2.7M), a European Commission FP7 project that developed a platform for the incremental analysis of source code repositories, bug trackers and communication channels, to support decision makers in discovering, comparing, assessing and monitoring the health, quality, impact and activity of open-source software.

I am currently a principal investigator in a follow-up project (CROSSMINER, 2017-20, €4.4M) which is investigating techniques for mining information from different sources and making it available within the Eclipse IDE. CROSSMINER builds on the results of OSSMETER and is the driving force behind the new Eclipse CROSSMETER project, where I am a committer.