Mining Software Repositories and Communities

Deciding whether an open-source software (OSS) product or component meets the standards required for adoption, in terms of quality, maturity, development activity and user support, is not a straightforward process. It involves exploring several sources of information. Its source code repositories show how actively the code is developed, which programming languages are used, how well the code is commented and whether there are unit tests. Its communication channels, such as newsgroups, forums and mailing lists, show whether user questions are answered in a timely and satisfactory manner and help estimate the number of experts and users of the software. Its bug tracking system shows whether the software has many open bugs and at what rate bugs are fixed. Other relevant metadata include the number of downloads, the license(s) under which the software is made available and its release history.

Having been involved in open-source software development for more than a decade, I have developed an interest in automatically analysing software repositories and communities to guide and support future software development. I was the Technical Director of OSSMETER (2011-14, €2.7M), a European Commission FP7 project that developed a platform for incremental analysis of source code repositories, bug trackers and communication channels, to support decision makers in the process of discovering, comparing, assessing and monitoring the health, quality, impact and activity of open-source software.

I am currently a principal investigator in a follow-up project (CROSSMINER, 2017-20, €4.4M) which is investigating techniques for mining information from different sources and making it available within the Eclipse IDE. CROSSMINER builds on the results of OSSMETER and is the driving force behind the new Eclipse SCAVA project, where I am a committer.