Self-Adaptive Software with Decentralised Control Loops

Radu Calinescu, Simos Gerasimou, Alec Banks

DECIDE (DEcentralised Control In Distributed sElf-adaptive software) is a rigorous approach to decentralising the control loops of distributed self-adaptive software used in mission-critical applications. DECIDE uses quantitative verification at run time, first to agree individual component contributions to meeting system-level QoS requirements, and then to ensure that components achieve their agreed contributions in the presence of changes and failures. All verification operations are carried out locally, using component-level models, and communication between components is infrequent. We applied DECIDE on a distributed embedded system built on a widely used platform for developing autonomous applications on unmanned marine vehicles.

The main contributions of this work are:

  • A theoretical framework for the decentralisation of control loops in distributed self-adaptive software systems
  • A runtime quantitative verification method for devising component QoS capability summaries in a distributed self- adaptive system
  • A method for the decentralised selection of component contribution-level agreements that guarantee system-level compliance with performance, reliability and other QoS requirements at system level
  • The validation of the DECIDE approach for a distributed embedded system comprising homogeneous unmanned underwater vehicles (UUVs)
Full information on DECIDE are described in our research paper presented at FASE'15.
We have also made available the proofs for the theorems introduced in the paper.

DECIDE: DEcentralised Control In Distributed sElf-adaptive software

approach
The decentralised control workflow executed by a DECIDE component comprises the following stages:
  1. Local capability analysis.
  2. When joining a distributed self-adaptive system and after major environment and internal changes, the DECIDE component uses runtime quantitative verification locally, to establish its expected range of QoS attributes. The result of this analysis is a capability summary, i.e., a finite set of possible contributions that the component could make towards achieving the QoS requirements of the system. These contributions are associated with different modes of operation that the component can assume, and with different costs.

  3. Receipt of peer capability summaries.
  4. On start-up and after major changes experienced by peer components, the component receives peer capability summaries. DECIDE capability summary calculation and communication overheads are made acceptable for a given system by fine tuning the local capability analysis carried out by each component. Informally, this analysis is sufficiently conservative to ensure that "major changes" occur with a low frequency at which these overheads represent only small fractions of the component compute resources and system bandwidth, respectively.

  5. Selection of local contribution-level agreement.
  6. This DECIDE stage is performed by a component each time when there is a change to a capability summary, whether the local one or that of a peer. As a result, the component selects one of the possible contributions in its local capability summary as its contribution-level agreement (CLA). The CLA selection is carried out locally by each component, such that the system complies with its QoS requirements as long as each component achieves its CLA.

  7. Execution of local control loop.
  8. Most of the time, this is the only stage of the DECIDE workflow that a component needs to execute. Its purpose is to ensure compliance with the selected component CLA.

Distributed multi-UUV embedded system

System characteristics
We consider an n-UUV system deployed on a data collection mission, with the following characteristics:
  • Each UUV is equipped with ni on-board sensors that can take periodic measurements of the ocean environment
  • The j-th sensor of UUV i operates with rate rij≥0 and the measurements taken are sufficiently accurate with probability pij, which depends on the UUV speed spi ∈ [0, spimax]
  • For each measurement taken an amount of energy eij is consumed
  • The sensors can be switched on and off, and these operations consume eijon and eijoff, respectively
Mission requirements
The n-UUV system to complete its mission successully should comply with the following QoS requirements:
System-level requirements:
  • R1: The n UUVs should take at least 1000 measurements of sufficient accuracy per 60 seconds of mission time
  • R2: At least two UUVs should have switched-on sensors
  • R3: If requirements R1 - R2 are satisfied by multiple configurations, the system should use one of these configurations that minimises the energy consumption (so that the mission can continue for longer)
Component-level requirements:
  • R4: The energy consumption of its sensors should not exceed eimax Joules per 60 seconds of mission time
  • R5: The UUV should use only sensors whose measurements are accurate with probability at least pimin
  • R6: If local requirements R4 - R5 are satisfied by multiple configurations, the i-th UUV should use one of these configurations that minimises the local cost, given by the equation: w1 × ei + w2 × spi
Adaptation
The system-level and the UUV-specific QoS requirements should be satisfied at all times. In such a dynamic environment, each UUV i should adapt to changes in the operating rate of its sensors and to sensor failures, by continually adjusting:
  • the UUV speed spi
  • the sensor configurations xi1, xi2, . . . , xini , where xij = 1 if the j-th sensor is on and xij = 0 otherwise

Example: Applying DECIDE to the distributed multi-UUV system

Assume a three-UUV embedded system with UUV names "Apollo", "Hermes", and "Zeus". Each UUV is equipped with three on-board sensors and the average operating rate of each sensor is shown in the table right.
Image 01
We describe below the execution of DECIDE during system start-up:
  1. Local capability analysis
  2. The three UUVs carry out the local capability analysis step and assemble their capability summaries.

    * Each tuple (α, β) denotes the values of capability summary csik for system-level QoS requirements R1 and R3, respectively.
  3. Receipt of peer capability summaries
  4. Apollo, Hermes, and Zeus exchange their capability summaries within a pre-specified time window. These time windows begin by the receipt of an update, and a CLA selection stage for all updates received within the window is triggered when the window ends.

Image 01

Image 01

  1. Selection of local contribution-level agreement
  2. Each UUV solves the optimisation problem:
    Image 01

    which corresponds to the minimisation form of a multiple-choice knapsack problem (MCKP).

    The algrotihm we implemented solves this problem by first transforming the minimisation form of MCKP to its equivalent maximisation problem by finding for each UUV i the following values:
    Image 01
    and updating each capability summary CSi, 1 ≤ i ≤ n, of the n-UUV system as
    Image 01
    where bound1 is the constraint specified for system-level QoS requirement R1 and C is the knapsack capacity. Then using an efficient dynamic programming algorithm, we obtain the optimal solution to the problem solving the following equation recursively
    Image 01
    where DP is a suitable data structure of size (n+1) × C.
  • After solving this optimisation problem, the selected CLAs for the 3-UUV system are:

Image 01
  1. Local control loop
  2. Every 5s each UUV executes a local control loop and selects a configuration that satisfies its CLA1, CLA2, CLA3 and its local requirements R4, R5, R6.

Image 01

Distributed UUV system simulator

We implemented a fully-fledged simulator for the multi-UUV self-adaptive system using MOOS-IvP, a widely used platform for the implementation of autonomous applications on unmanned marine vehicles. We designed and developed a DECIDE MOOS component that implements the DECIDE workflow.

To illustrate the applicability of DECIDE, we carried out a broad range of experiments using MOOV-IvP and DECIDE MOOS. To examine the impact of different types of failure, the experiments were seeded with failure patterns that consisted of failures of sensors, sudden significant reductions in sensor measurement rates (i.e., not following the distribution stated in the sensor specification) and failures of entire UUVs.
Scenario characteristics
In this particular scenario, a three UUV-system is deployed on a 5000s mission. Each UUV is equipped with three on-board sensors and the requirements for the mission are identical to those described before.
Mission scenario
Image 01
Timeline
Image 01 * Image 01 Image 01 etc. correspond to DECIDE stages
Demo
  • Green circle indicates an operating sensor
  • White circle indicates an idle sensor
  • Red circle indicates a sensor with abnormal behaviour
  • Orange circle indicates a sensor that is tested whether it has recovered

Evaluation

To evaluate the effectiveness of DECIDE, we carried out a broad range of experiments using the multi-UUV system implementation and MOOS-IvP simulator. The system characteristics varied in these experiments included the number of UUVs n and the number of UUV sensors ni, 1 ≤ i ≤ n, as well as the confidence level α used to assemble the UUV capability summaries in the local capability analysis stage. The experiments were also in line with the demo and were seeded with failure patterns including complete UUV failures and abrupt reductions in sensor measurement rates. The data collected while running the experiments can be found on Google Drive.
CPU usage and communication overheads
As shown in the table (right), the CPU overheads for each stage of DECIDE never exceeded 40ms, when executing the local control loop with 10s intervals.

The communication overhead is 71 bytes per message per UUV which is within the 0.5 - 5 Kbps bandwidth allowed between UUVs that operate underwater and communicate using acoustic channels.
Image 01
Adaptation efficiency
To assess the efficiency of DECIDE , we compared the number of measurements taken and the energy consumed using DECIDE and three different α-confidence values against an ideal system in which the sensor rates are fixed and the globally optimal set of sensors is enforced at all times. The results shown in the table right are averaged over 10 experiments.

In the worst case DECIDE consumes 21% more energy and takes 13% more measurements. This loss is mitigated by the need for significantly less CPU to execute DECIDE and the capability to run RQV to systems that was unfeasible with traditional RQV-based approaches..
Image 01
Scalability
To examine how well DECIDE can cope with large systems , we evaluated the approach with systems comprising up to 32 components. As shown in the diagram (right), the local capability analysis and local control loop stages require the same small amount of CPU time irrespective of the size of the UUV system. The CLA selection algorithm uses less than 200ms for systems up to 32 UUVs. Although the CPU time required by the knapsack systel CLA selection algorithm depends on the size of the knapsack, it is upper bounded to O(n2) and is negligible compared to a centralised control loop that requires 4200s for n=2 and is unfeasible for n ≥ 3.
Image 01