Observation-Based Markov Chain Refinement

Department of Computer Science, University of York.


Despite major advances in software performance engineering, it remains very challenging to ensure that performance models are sufficiently accurate to support the design and verification of real systems. Our paper addresses this challenge for continuous-time Markov chains (CTMCs), a class of stochastic state-transition models used to analyse non-functional software properties both at design time and at runtime. To this end, we introduce an Observation-based Markov chaiN refInement (OMNI) method that significantly improves the accuracy with which non-functional software properties can be analysed using CTMCs. OMNI starts from a high-level abstract CTMC whose states correspond to operations performed by the analysed software, and uses observed execution times for these operations to refine the CTMC. The execution time observations can be obtained by unit testing individual components before the software system is built or, for existing systems, from their logs.

OMNI comprises two stages. The first stage makes the CTMC more realistic through the addition of states and transitions that model the fact that software operations have non-zero minimum execution times. The CTMC is then further refined in the second OMNI stage by using phase-type distributions to model the variations in the execution time of the operations. The refined CTMC supports the accurate analysis of a wide range of non-functional software properties expressed in continuous stochastic logic. This analysis can be carried out using existing probabilistic model checkers.

The main contributions of this work are:

  • The two-stage OMNI method for refining CTMC software performance models based on the observed behaviour of the software components.
  • Rigorous techniques for synthesising the Erlang distributions used to model the minimum execution times of software operations, and the phase-type distributions used to model the variations in these execution times with far greater accuracy than the exponential distributions typically used in CTMC models.
  • A tool that implements the OMNI refinement method, producing refined CTMC models that can be analysed with the probabilistic model checker PRISM.
  • A case study that uses OMNI to accurately analyse non-functional properties of a web application assembled from real third-party services, and shows how software engineers can use OMNI to avoid multiple invalid design decisions suggested by traditional CTMC analysis.

Motivating Example

We motivate our CTMC refinement method and illustrate its use by considering the development of a travel web application.

The application must handle two types of request:

  • Requests from users who plan to meet and entertain a visitor arriving by train. These requests are expected to occur with probability p1.
  • Requests from users looking for a possible destination for a day trip by train. These requests are expected to occur with probability 1-p1.

The high-level abstract CTMC below models the handling of a request by the application.

The initial state s1 corresponds to finding the location of the train station. For the first request type, this operation is followed by finding the train arrival time (state s2), identifying suitable restaurants in the area (state s4), obtaining a traffic report for the route from the user's location to the station (state s6), and returning the response to the user (state s7).

For the second request type, locating the train station (state s1) is followed by consulting the train departures for a possible tourist destination (state s3), and obtaining a weather forecast for this destination (state s5). With probability p2 the weather is unsuitable, and a new destination needs to be selected and to have its weather forecast checked. Once a suitable destination is selected, the traffic report is obtained for travel to the station (state s6) and the response is returned to the user (state s7).
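The branching structure described above can be captured in a minimal sketch. The probabilities p1 and p2 are placeholders chosen for illustration (the paper leaves them as parameters), and the state names follow the description above:

```python
# Successor distributions of the abstract CTMC described above.
# p1: probability of a "meet a visitor" request; p2: probability that a
# weather forecast is unsuitable. Both values are placeholders.
p1, p2 = 0.5, 0.3

transitions = {
    "s1": {"s2": p1, "s3": 1 - p1},   # locate station, then branch on request type
    "s2": {"s4": 1.0},                # arrivals -> restaurant search
    "s3": {"s5": 1.0},                # departures -> weather forecast
    "s4": {"s6": 1.0},                # restaurant search -> traffic report
    "s5": {"s3": p2, "s6": 1 - p2},   # bad weather: select a new destination
    "s6": {"s7": 1.0},                # traffic report -> response returned
}

# Sanity check: outgoing probabilities sum to 1 in every state.
for state, succ in transitions.items():
    assert abs(sum(succ.values()) - 1.0) < 1e-9
```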

The real services used in the application and their observed average rates after testing are listed below.

Label       Third-party service               Rate (/s)
location    Bing location service*            9.62
arrivals    Thales rail arrivals board        19.88
departures  Thales rail departures board      19.46
search      Bing web search*                  1.85
weather     WebServiceX.NET weather service   1.11
traffic     Bing traffic service*             2.51
*Note that selecting the links to the RESTful Bing services in a web browser will show an empty webpage - this is normal. A description of these services is available at https://msdn.microsoft.com/en-us/library/ff701713.aspx.

The CTMC was expressed in the PRISM modelling language and used to analyse three non-functional properties of the application:

  • P1 The fraction of user requests handled in under T seconds, for 0 < T <= 4.
  • P2 The fraction of ``day trip'' user requests handled in under T seconds, for 0 < T <= 4.
  • P3 The expected profit per request handled, assuming that 2 cents are charged for requests handled within T seconds and a 1 cent penalty is paid for requests not handled within 3 seconds, for 0 < T <= 3.
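As a rough illustration (not the exact property files used in the paper), properties of this kind might be written in PRISM's property syntax along the following lines, assuming a hypothetical label "done" marking the response state s7 and a hypothetical reward structure "profit":

```
// P1: probability that a request is served within T seconds (sketch)
P=? [ F<=T "done" ]

// P3: expected profit accumulated until a request is served (sketch)
R{"profit"}=? [ F "done" ]
```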

The PRISM CTMC model and the properties, specified in continuous stochastic logic (CSL), are available as a PRISM properties file.

The following graphs show the actual and predicted values for each of the properties with actual figures calculated from observed values in the trace data. The errors reported give the area difference between the actual and predicted property values where Tmax is 4 for the first two properties and 3 for property P3.
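The area-difference error can be approximated numerically from property values sampled over [0, Tmax]; a minimal sketch using the trapezoidal rule (the sampled values below are invented purely for illustration):

```python
def area_error(ts, actual, predicted):
    """Trapezoidal approximation of the area between two property curves
    sampled at the (sorted) time points ts."""
    total = 0.0
    for i in range(len(ts) - 1):
        # average absolute gap over the interval, times the interval width
        gap0 = abs(actual[i] - predicted[i])
        gap1 = abs(actual[i + 1] - predicted[i + 1])
        total += 0.5 * (gap0 + gap1) * (ts[i + 1] - ts[i])
    return total

# Illustrative values only (not the paper's data):
ts        = [0.0, 1.0, 2.0, 3.0, 4.0]
actual    = [0.00, 0.20, 0.55, 0.80, 0.92]
predicted = [0.05, 0.35, 0.65, 0.85, 0.93]

print(round(area_error(ts, actual, predicted), 4))
```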

This CTMC model makes the typical assumption that operation execution times are exponentially distributed. However, this assumption is almost always invalid, for two reasons. First, each software operation i has a minimum execution time ti > 0 such that its probability of completion within ti time units is zero. In contrast, modelling the execution time of the operation as exponentially distributed with rate λi yields a probability 1 - e^(-λi ti) > 0 of completion within ti time units. Second, even the ``shifted'' execution times t1-ti, t2-ti, ..., tn-ti of an operation i are rarely exponentially distributed.
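The magnitude of the first problem is easy to see numerically. Using the observed rate of the location service (9.62/s) and its minimum execution time (49 ms), an exponential model assigns a substantial probability to completion within a window in which completion is actually impossible:

```python
import math

rate = 9.62    # observed rate of the location service (per second)
t_min = 0.049  # minimum execution time of the operation (seconds)

# Under the exponential assumption, P(completion within t_min) = 1 - e^(-rate*t_min),
# even though the true probability of completing within t_min is zero.
p = 1.0 - math.exp(-rate * t_min)
print(round(p, 3))
```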

The problem addressed by OMNI is to overcome these issues by producing a refined CTMC model that supports the accurate analysis of non-functional properties of software, and that can still be analysed using existing model checkers.

OMNI achieves this in two stages. In the first stage, the high-level CTMC is refined to better model the minimum execution times of software operations. In the second stage, the CTMC is further refined through the use of phase-type distributions to more accurately model the variations in the execution time of each operation.

Stage I: Delay Extraction

In this stage, the CTMC is extended with additional states and transitions that model the minimum execution times of the software operations by means of Erlang distributions, i.e., sums of several independent exponential distributions with the same rate. Each state si is replaced with a sequence of delay-modelling states si1, si2, ..., siki that encode an Erlang-ki distribution with rate λiE, and a state s'i with outgoing transitions of rates pijλ'i, 1 <= j <= mi, to the same next states as si.

The calculation of the parameters ki, λiE and λ'i of this delay-extraction refinement involves the application of theoretical results described in the full paper.

For our motivating example we obtained the values below for these parameters.

si   Label        ti (ms)   λiE (/s)   λ'i (/s)
s1   location     49        5285       18.21
s2   arrivals     45        5756       188.81
s3   departures   45        5756       156.76
s4   search       209       1239       3.01
s5   weather      706       367        5.14
s6   traffic      179       1447       4.57
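The tabulated parameters are consistent with a simple mean-preserving reading, which we state here as an assumption on our part (the exact derivation is in the full paper): λiE = ki/ti gives an Erlang-ki distribution with mean ti, and λ'i = 1/(1/λi - ti) gives an exponential distribution with the delay-shifted mean. With ki = 259 (the kSize value also mentioned under RQ2), this reproduces the table to within rounding:

```python
# Mean-preserving reconstruction (an assumption, not the paper's derivation):
#   lambdaE = k / t_min           -> Erlang-k with mean t_min
#   lambda' = 1 / (1/rate - t_min) -> exponential with the shifted mean
k = 259  # consistent with the tabulated lambdaE values

services = {  # label: (observed rate in /s, minimum execution time in s)
    "location": (9.62, 0.049),
    "search":   (1.85, 0.209),
    "weather":  (1.11, 0.706),
    "traffic":  (2.51, 0.179),
}

for label, (rate, t_min) in services.items():
    lambda_e = k / t_min                    # close to the tabulated lambdaE
    lambda_p = 1.0 / (1.0 / rate - t_min)   # close to the tabulated lambda'
    print(label, round(lambda_e), round(lambda_p, 2))
```

The small residual differences from the table presumably stem from the paper using unrounded observed means rather than the two-decimal rates listed above.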

Our resultant PRISM model then includes modules for the delay associated with each component and may be downloaded as Simple_Exp_With_Delay_mod.pm.

The number of states used to model a delay is controlled through a parameter kSize, and each delay module takes the form:

  module locDelay
    iL : [1..kSize];
    [locStart] iL < kSize -> kSize/locMean : (iL'=iL+1);
    [locFin]   iL = kSize -> kSize/locMean : (iL'=1);
  endmodule
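The module above encodes an Erlang-kSize delay: kSize sequential exponential steps, each with rate kSize/locMean, so the total delay has mean locMean and standard deviation locMean/sqrt(kSize). A quick simulation sketch of this behaviour (not part of the OMNI tool; the parameter values are illustrative):

```python
import random

random.seed(1)
k_size, loc_mean = 50, 0.104  # illustrative: 50 phases, 104 ms mean delay

def erlang_delay():
    # kSize sequential exponential steps, each with rate kSize/locMean
    return sum(random.expovariate(k_size / loc_mean) for _ in range(k_size))

samples = [erlang_delay() for _ in range(20000)]
mean = sum(samples) / len(samples)
stdev = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
# mean is close to loc_mean; the relative spread shrinks as 1/sqrt(kSize),
# which is why larger kSize values approximate a fixed delay more closely.
```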

PRISM was then used to analyse the three properties of this model, with the results shown below. As anticipated, the error decreased significantly for all three properties, reducing the margin for engineering decision errors. Nevertheless, there are still time points T where the actual and predicted property values differ noticeably.

Stage II: Holding-Time Modelling

This OMNI stage further refines the CTMC model obtained in the delay extraction stage by using phase-type distributions (PHDs) to model the "holding times" of the system operations. A PHD is defined as the distribution of the time to absorption in a CTMC with one absorbing state. OMNI uses a PHD fitting heuristic that leverages state-of-the-art cluster-based PHD fitting algorithms to refine the CTMC component associated with each software operation.
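As a concrete (if simplistic) illustration of the definition, a phase-type distribution can be sampled by running the underlying absorbing CTMC until absorption; an Erlang distribution is the special case of a chain of identical phases. A minimal sketch with hypothetical parameters (not the fitted PHDs from the paper):

```python
import random

random.seed(7)

# A PHD given by an initial phase distribution and, per phase, the total exit
# rate and the distribution over successor phases (None = the absorbing state).
# Hypothetical two-phase hyperexponential example:
alpha = {0: 0.4, 1: 0.6}                  # initial phase probabilities
rates = {0: 2.0, 1: 10.0}                 # total exit rate of each phase
succ  = {0: {None: 1.0}, 1: {None: 1.0}}  # both phases go straight to absorption

def sample_phd():
    """Sample one time-to-absorption of the absorbing CTMC."""
    phase = random.choices(list(alpha), weights=alpha.values())[0]
    t = 0.0
    while phase is not None:
        t += random.expovariate(rates[phase])
        phase = random.choices(list(succ[phase]), weights=succ[phase].values())[0]
    return t

mean = sum(sample_phd() for _ in range(50000)) / 50000
# Analytic mean of this mixture: 0.4/2.0 + 0.6/10.0 = 0.26
```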

An error measure α is used in the heuristic to control the stopping criterion: smaller values of α lead to more accurate models, which are generally larger in size.

We used the OMNI tool to perform Stage II of our refinement method on the CTMC model of the web application from our running example. The refinement was performed for each of the six services invoked by the application.

The refined CTMC obtained after both OMNI stages had 1766 states and 1797 transitions, and the graphs below compare the actual values of properties P1-P3 from our running example with the values predicted by the analysis of this CTMC. Both the visual assessment and the error values associated with these predictions (which are significantly lower than the error values before the refinement and after the first stage) show that this CTMC supports the accurate analysis of the three properties.


We evaluated the effectiveness of OMNI by performing a set of experiments to answer the following research questions:

RQ1 (Accuracy):

How effective are OMNI models at predicting non-functional property values for system runs other than the one used to collect the refinement datasets?

For any stochastic process, a sufficiently large training dataset should capture the behaviour of the system, and models based on this dataset should be robust when evaluated against datasets from other system runs (i.e., there should be no overfitting). To assess whether OMNI models have this property, we used three four-hour runs of the application, carried out at different times of day over a period of two days, to collect three testing datasets of the same size as the training dataset.

The differences between actual and predicted values for each additional dataset are presented below. The models used in this analysis are the initial CTMC (labelled ``exponential'' in the diagrams) and the refined CTMCs obtained after each OMNI stage for the training dataset.

A visual inspection of the results shows that each OMNI stage significantly improves the accuracy of the analysis.

To eliminate the risk that these improvements were due to an advantageous partition of the data into training and testing sets, we carried out 30 more experiments in which the observations were randomly partitioned into four new datasets, one used for training and the others for assessing fitting errors. The errors from these experiments, summarised as box plots, show that OMNI consistently outperforms traditional CTMC modelling irrespective of the choice of training data, confirming the accuracy and robustness of OMNI.
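The random partitioning used in these experiments can be sketched as a generic shuffle-and-split (the observation values here are stand-ins, and this is not necessarily the paper's exact procedure):

```python
import random

random.seed(42)
observations = list(range(1000))  # stand-in for the observed execution times

shuffled = observations[:]
random.shuffle(shuffled)

# One training dataset and three testing datasets of equal size.
n = len(shuffled) // 4
datasets = [shuffled[i * n:(i + 1) * n] for i in range(4)]
training, testing = datasets[0], datasets[1:]
```

Repeating this split with fresh shuffles (30 times in the paper's experiments) removes the dependence of the reported errors on any single advantageous partition.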

RQ2 (Refinement Granularity):

What is the effect of varying the refinement granularity on the model accuracy, size and verification time?

We ran experiments to evaluate the accuracy of OMNI predictions when varying: (i) the size ki of the Erlang delay model; and (ii) the threshold α from the Stage II fitting algorithm. These parameters determine the granularity of the refinement, with larger ki values and smaller α values corresponding to finer-grained refinement (and larger refined model sizes). The experimental results are summarised below. For each experiment, the cumulative prediction error for each property is provided.

The experimental results that assess only Stage I of our approach show that increasing ki initially reduces the analysis error for all properties, with significant improvements obtained even for small ki values. However, increasing ki beyond a certain value (approximately 100 for the system from our analysis) has little effect on the accuracy of the model. In Stage II, the analysis error decreases when smaller values are used for α. However, this decrease is much less significant when changing from α=0.1 to α=0.05 than when changing from α=0.2 to α=0.1. Both results show the presence of a point of "diminishing returns" in continuing the refinement beyond a certain level of granularity, although this level is necessarily application and training dataset dependent.

The results also reveal a coupling effect between α and ki: Stage II raises the limit up to which increasing ki still improves model accuracy, with Stage II model accuracy continuing to improve as ki is increased from 100 to 259.

RQ3 (Comparison to PHD):

How does OMNI compare to a single-stage PHD fitting approach?

As PHDs can fit any positive continuous distribution, we assessed the benefit of having a delay extraction stage in OMNI. To this end, we switched off OMNI's delay extraction and produced PHD-only refined CTMCs for α = {0.1, 0.05, 0.01}. The table below compares these models to OMNI-refined CTMCs of equivalent size (i.e., small, medium or large) that we constructed by using α = 0.1, with additional states used to model the delay. The experimental results show that for all three model sizes, the OMNI models yield more accurate results (with similar or better verification times, and much smaller refinement times) than the PHD models. This confirms existing theoretical results which show that PHD fitting of deterministic delays requires an excessive number of states, and that an Erlang distribution (as used in the first OMNI stage) is the best fit for a fixed delay.

Tool Support

We provide a tool that uses HyperStar for the second OMNI stage, and produces models that can be analysed with PRISM. An executable version of the tool is available here.

A how-to document describing the installation and usage of the OMNI tool is available here.

Data files used in this work

Artefact                    Description
Trace data                  The data obtained by calling the web services of the running example.
Initial CTMC model          PRISM model of the abstract CTMC using the simple six-state representation.
CTMC model after Stage I    PRISM model with delays replaced by Erlang-k distributions.
CTMC model after Stage II   PRISM model as refined by OMNI.
CSL properties              CSL properties in PRISM format used to assess the CTMC models.
Larger models               A set of larger input models, including configuration files.