Search-Based Synthesis of Probabilistic Models

for Quality-of-Service Software Engineering

Simos Gerasimou, Giordano Tamburrelli, Radu Calinescu
The formal verification of finite-state probabilistic systems supports the engineering of software with strict quality-of-service (QoS) requirements. However, its use in software design is currently a tedious process of manual multiobjective optimisation. Software designers must build and verify probabilistic models for numerous alternative architectures and instantiations of the system parameters. When successful, they end up with feasible but often suboptimal models. The EvoChecker search-based software engineering approach and tool introduced in our paper employ multiobjective optimisation genetic algorithms to automate this process and considerably improve its outcome. We evaluate EvoChecker for six variants of two software systems from the domains of dynamic power management and foreign exchange trading. These systems are characterised by different types of design parameters and QoS requirements, and their design spaces comprise between 2E+14 and 7.22E+86 relevant alternative designs. Our results provide strong evidence that EvoChecker significantly outperforms the current practice and yields actionable insights for software designers.

The main contributions of this work are:

Our paper has been accepted at the 30th Intl. Conference on Automated Software Engineering (ASE 2015)

We have also made available the camera-ready version of our paper.

EvoChecker

EvoChecker Modelling Language

EvoChecker allows software designers to explore different system designs using probabilistic model templates specified using an extension of the PRISM modelling language with constructs that support:
  1. Evolvable modules, i.e., the specification of n ≥ 2 alternative module implementations
    evolve module mod implementation1 endmodule
    ...
    evolve module mod implementationn endmodule
  2. Evolvable parameters, i.e., model parameters of type 'int' and 'double', and acceptable ranges for them
    evolve int param [min..max];
    evolve double param [min..max];
  3. Evolvable probability distributions, i.e., an n-element discrete probability distribution, and ranges for the n ≥ 2 probabilities of the distribution
    evolve distribution dist [min1..max1] ... [minn..maxn];
    where [mini..maxi] ⊆ [0,1] for all 1 ≤ i ≤ n is used to declare the n-element probability distribution
Probabilistic model template (T): any valid PRISM probabilistic model (M) augmented with EvoChecker constructs 1-3
Design space of T (DST): the set of all probabilistic models that are instances of the probabilistic model template T

Probabilistic Model Synthesis Problem

Consider a software system with a set of QoS requirements comprising n1≥0 constraints and n2≥1 optimisation objectives:
ps

Given a probabilistic model template T with design space DST, and the QoS requirements above, the probabilistic model synthesis problem involves finding the Pareto optimal design set PS of models from DST that satisfy the n1 constraints and are non-dominated with respect to the n2 optimisation objectives:
ps

EvoChecker Tool

approach
High-level EvoChecker architecture
The EvoChecker tool uses the implementations supported by JMetal for the multiobjective optimisation Genetic algorithms and integrates an embedded instance of the probabilistic model checker PRISM. The template parser was developed using the Antlr parser generator.
The EvoChecker source code is available on GitHub  

Software Systems Used to Evaluate EvoChecker

Dynamic Power Management System

approach A software-controlled dynamic power management (DPM) system consisting of a service provider that handles requests generated by a service requester and stored in two request queues of different priorities. The service provider has four states associated with different power usage, i.e., busy, idle, standby, sleep. The figure above depicts the power usage of each state (in watts), the possible transitions between states, and the energy consumed by each transitions (in joules). The task of a software power manager is to reduce power use while maintaining an acceptable service level for the system.

A DPM system designer must select:
  • the capacity of the request queues, QmaxH and QmaxL
  • one of several alternative power managers
  • the parameters associated with the selected power manager
such that the following QoS requirements are satisfied:

ID Description Type
R1 The steady-state utilisation of the high-priority queue should be less than 90% Constraint
R2 The steady-state utilisation of the low-priority queue should be less than 90% Constraint
R3 The system should operate with minimum steady-state power utilisation Objective
R4 The number of messages lost at the steady state should be minimised Objective
R5 The capacity of both queues should minimised Objective


We consider three different variants of the DPM system. We provide the DPM probabilistic model templates and the CSL formulae associated with the QoS requirements for these DPM system variants.

Foreign Exchange Trading System

approach An FX customer (called a trader) can use the system in two operation modes. In the expert mode, FX executes a loop that analyses market activity, identifies patterns that satisfy the trader's objectives, and automatically carries out trades. Thus, the Market watch service extracts real-time exchange rates (bid/ask price) of selected currency pairs. This data is used by a Technical analysisi> service that evaluates the current trading conditions, predicts future price movement, and decides if the trader's objectives are: (i) "satisfied" (causing the invocation of an Order service to carry out a trade); (ii) "unsatisfied" (resulting in a new Market watch invocation); or (iii) "unsatisfied with high variance" (triggering an Alarm service invocation to notify the trader about discrepancies/opportunities not covered by the trading objectives). In the normal mode, FX assesses the economic outlook of a country using a Fundamental analysis service that collects, analyses and evaluates information such as news reports, economic data and political events, and provides an assessment on the country's currency. If satisfied with this assessment, the trader can use the Order service to sell or buy currency, in which case a Notification service confirms the completion of the trade.

The FX designer has to select third-party implementations for each of the n ≥ 1 services from the workflow above for which in-house implementations are not available, in order to meet the following QoS requirements:

ID Description Type
R1 Workflow executions must complete successfully with probability at least 0.9 Constraint
R2 The total service response time per workflow execution should be minimised Objective
R3 The probability of a service failure during a workflow execution should be minimised Objective
R4 The total cost of the third-party services used by a workflow execution should be minimised Objective


We consider three different variants of the FX system. We provide the FX probabilistic model templates and the PCTL formulae associated with the QoS requirements for these FX system variants.


We would like to thank our anonymous industrial collaborator for providing the model and QoS requirements for the FX system.

Evaluation

Research Questions

We evaluate the effectiveness of EvoChecker by answering the following research questions:
RQ1 (Validation): How does EvoChecker perform compared to random search?
RQ2 (Comparison): How do EvoChecker instances using different multiobjective optimisation genetic algorithms (NSGA-II, SPEA2, MOCell) perform compared to each other?
RQ3 (Insights): Can EvoChecker provide insights into the trade-offs between the QoS attributes of alternative software architectures and configurations?

System Variants

We evaluate EvoChecker using the system variants from the table below.

Note: The last two columns in this table contain links to the probabilistic model template T and the probabilistic temporal logic formulae for the QoS attributes of these system variants.

Variant Description Size* T ** QoS attributes
DPM_Small QmaxH,L ∈ {1, ..., 10}, m = 2 2E+14 TDPMsmall CSLDPMsmall
DPM_Medium QmaxH,L ∈ {1, ..., 15}, m = 2 4.5E+14 TDPMmedium CSLDPMmedium
DPM_Large QmaxH,L ∈ {1, ..., 20}, m = 2 8E+14 TDPMlarge CSLDPMlarge
FX_Small n = 4, n1 = n2 = n3 = n4= 3 4.98E+31 TFXsmall PCTLFXsmall
FX_Medium n = 6, n1 = n2 = ... = n6 = 4 1.39E+65 TFXmedium PCTLFXmedium
FX_Large n = 8, n1 = n2 = ... = n8 = 4 7.222E+86 TFXlarge PCTLFXlarge
*     Assuming two-decimal precision for double-valued parameters of the probabilistic model templates
**   Probabilistic Model Template

Quality Indicators

We evaluate the multiobjective optimisation algorithms used in EvoChecker (NSGA-II, SPEA2, MOCell, Random) using the established Pareto-front quality indicators below.
  • Unary additive epsilon (Iε), which measures the minimum additive term by which the elements of the objective vectors from a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto front approximations with better convergence.
  • Hypervolume (IHV), which measures the volume in the objective space covered by a Pareto front approximation with respect to the reference front. Larger values denote Pareto front approximations with better convergence and diversity.
  • Inverted Generational Distance (IIGD), which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front approximation. Smaller values denote Pareto front approximations with better convergence and diversity.

Results

DPM results
The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the DPM system variants, and the Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant NSGA-II SPEA2 MOCell Random
Iε (Epsilon)
DPM_Small 0.0209 0.0130 0.0242 0.1403 +
DPM_Medium 0.0225 0.0123 0.0489 0.1996 +
DPM_Large 0.0229 0.0147 0.0884 0.2497 +
IHV (Hypervolume)
DPM_Small 0.4455 0.4458 0.4396 0.4022 +
DPM_Medium 0.4487 0.4499 0.4386 0.3946 +
DPM_Large 0.4528 0.4549 0.4395 0.3947 +
IIGD (Inverted Generational Distance)
DPM_Small 23E-4 18E-4 16E-4 62E-5 +
DPM_Medium 24E-5 19E-5 28E-5 91E-5 +
DPM_Large 24E-5 20E-5 38E-5 11E-5 +
approach
 
approach
FX results (with fixed services characteristics)
The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the FX system variants, and the Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant NSGA-II SPEA2 MOCell Random
Iε (Epsilon)
FX_Small 0.6258 0.5083 0.6745 2.2274 +
FX_Medium 1.6379 2.0105 2.0486 6.1529 +
FX_Large 3.8528 5.2777 4.6366 13.023 +
IHV (Hypervolume)
FX_Small 0.611 0.628 0.608 0.593 +
FX_Medium 0.719 0.725 0.702 0.606 +
FX_Large 0.657 0.675 0.633 0.555 +
IIGD (Inverted Generational Distance)
FX_Small 0.00123 0.00129 0.00125 0.00145 +
FX_Medium 0.00192 0.00207 0.00200 0.00316 +
FX_Large 0.00244 0.00255 0.00272 0.00395 +
approach
 
approach
FX results (with random services characteristics)
We carried out a second study for the FX system, where the characteristics of each concrete service (i.e., cost, success probability, response time) per experiment were randomly generated. With this study, we examine whether EvoChecker is capable of synthesising good probabilistic model sets irrespective of the characteristics of the considered problem. We generated 30 random experiments per system variant and performed 10 independent runs per experiment per system variant (i.e., a total of 300 runs per system variant and 900 runs in total). To allow for a fair comparison across the experiments and to avoid undesired scaling effects, we normalise the results obtained for each quality indicator per experiment within the range [0,1].

The Table (left) reports the mean quality indicator values, and the Figure (right) depicts the boxplots for the FX system variants.

Variant NSGA-II SPEA2 MOCell Random
Iε (Epsilon)
FX_Small 0.2212 0.2009 0.2272 0.6200 +
FX_Medium 0.3393 0.3664 0.3645 0.7568 +
FX_Large 0.3396 0.3764 0.3625 0.7970 +
IHV (Hypervolume)
FX_Small 0.9374 0.9914 0.9337 0.9016 +
FX_Medium 0.9514 0.9848 0.9219 0.8138 +
FX_Large 0.9467 0.9804 0.8962 0.7868 +
IIGD (Inverted Generational Distance)
FX_Small 0.6365 0.5348 0.6390 0.8000 +
FX_Medium 0.5919 0.5790 0.6114 0.7957 +
FX_Large 0.5887 0.5622 0.6561 0.8884 +
approach

Comparison with Model Repair

Network Virus Infection
Paper: Model Repair for Markov Decision Processes. T. Chen, E. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. In 7th Intl. Symposium on Theoretical Aspects of Software Engineering (TASE'13), pages 85–92, 2013.

In this case study we established that EvoChecker can replicate/outperform the result obtained in the paper Model Repair for Markov Decision Processes by Chen et al. This work performed model repair on the Network Virus Infection case study which considers a parametric model of a computer virus infecting a network. The network is a grid of N by N nodes, with each node connected to four neighbours (the nodes above, below, to the left and to the right), except for the nodes on the border, for which some of the neighbours are not present. The model is an MDP that represents the situation where the virus spawns/multiplies. More concretely, once a node is infected, the virus remains at that node and repeatedly tries to infect uninfected neighbouring nodes. The goal is to repair the model such that the minimal expected number of attacks required by the virus until an infection of the node (1, 1) starting at (N,N). The repairing action should have a minimum cost (i.e., a minimal modification to the original model).

The cost for the model repair solution obtained by Chen et al., and reported in their paper, is 0.02044.

EvoChecker generated a model repair solution with cost 0.0201473 in less than 3000 evaluations.

We provide the probabilistic model template for this case study and the PCTL formulae associated with its QoS requirements.
IPv4 Zeroconf Protocol
Paper: Model Repair for Probabilistic Systems. E. Bartocci, R. Grosu, P. Katsaros, C.R. Ramakrishnan, and S. A. Smolka. In 7th Intl. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'11), pages 326-340, 2011.

The Zeroconf protocol is used for assigning IP addresses in devices joining a network. When a new host joins the network it randomly selects an IP address among K possible ones. With m hosts in the network, the collision probability is q = m/K. A new host asks the other hosts whether the randomly selected IP address is already used and waits for an answer. The probability that the new host does not get any answer is p, in which case it repeats the query. If after n tries there is still no answer, the host will erroneously consider the chosen address as valid.

In paper Model repair for probabilistic systems, Bartocci et al. use a Max Profit algorithm to repair the model given an initial q. For n = 3, p = 0.1, and initial q = 0.6, they reduced the expected number of steps to termination from 6.15 to 5.1 setting q = 0.5002.

To replicate this in EvoChecker, we define q as an evolvable parameter (double) as follows:
evolve double q [0.5 .. 0.7]

EvoChecker found the optimal q = 0.50000964 and determined that the expected steps to termination are 5.105. EvoChecher obtained these results in less than 1000 evaluations.

We provide the probabilistic model template for this case study and the PCTL formula for the QoS requirement.