[High Integrity Compilation] Susan Stepney.
High Integrity Compilation : a case study .

Prentice-Hall, 1993.

7. Tosca---States and Environments

7.1 Introduction

There are two obvious ways to organize the specification of each of the four semantics for each of Tosca's various language constructs: either specify each separate semantics completely in turn, or specify each construct completely in turn. After experimenting with both organizations, the latter has been chosen. Specifying all the semantics of each language construct in one place makes it easier to understand that construct as a whole. This approach does have one drawback, however. It becomes necessary to specify quite a few auxiliary Z types and functions before their use may be apparent. These auxiliary definitions are gathered together in this chapter. It may be best to skim through on first reading, noting what things have been defined, but waiting until the next chapter to find out how they are used.

7.2 Semantics---general

7.2.1 Environment and state

A variable that has been declared but not yet initialized has no associated value. So modelling the state as a mapping from variable names to their values is not appropriate. (Using a partial function for the mapping does not help, because it does not distinguish variables not in the domain that have not been declared, from those not in the domain that have been declared, but not yet given a value.) It is conventional to introduce intermediate locations , and use an environment to map names to locations, and a store to map locations to values. A declaration changes the environment by assigning a new variable to a new location. But that location does not become part of the store's domain until the variable is first assigned a value. The evolving state of a computation consists of this store, and the input and output streams.

In two of the static semantics, declaration-before-use and type checking, the state of a variable does not change: once declared it stays declared, once typed it retains that type. So for these an environment, but no state, is required. With the other two semantics, initialization-before-use and dynamic, the state of a variable can change. A variable starts out uninitialized when declared, and may later be initialized. In the dynamic case, a variable may take on many values as the computation progresses. So both these definitions require an environment and a state.

The domains of all these various environments are a set of variable names. Since NAME is a syntactic, as well as a semantic, domain, when an environment function is applied to a name it is written env[|x|] rather than env(x) , to highlight this point.

7.2.2 Check status

The purpose of a static semantics is to check whether a construct is okay or wrong.

    CHECK ::= checkOK | checkWrong

Although CHECK has only two values, Z has no boolean type, so CHECK cannot be declared as a boolean flag. Even if Z did have such a type, it would be better specification style to make CHECK a free type definition as above. If later it were decided to modify the specification by adding an extra check status, for example, checkWrong , it would be relatively straightforward to extend the definition given above.

Check results need to be combined. Join is an infix function that combines check results 'pessimistically'; the combination of the check results is checkOK only if both the arguments are checkOK :

    |    _ |><|  _ : CHECK x CHECK -|-> CHECK
    |--------
    |    A c1, c2 : CHECK @
    |        c1 |><| c2 =
    |            if c1 = checkOK /\ c2 = checkOK
    |            then checkOK else checkWrong

7.2.3 Memory locations

Store locations are modelled as integers.

    Locn == Z

These numbers could be interpreted as representing memory addresses, for example. Negative locations have been allowed: these could be interpreted as 'special' addresses, for example, registers. The interpretation chosen does not affect Tosca's semantics; it is important only when mapping to a particular low level language.

7.3 Declaration-before-use semantics

For this non-standard interpretation, the declaration environment is a mapping from the names that are referenced in the program to their check status (representing declared or not declared):

    EnvD == NAME -|-> CHECK

7.4 Type checking semantics

The TYPE definition given in the abstract syntax is augmented with a typeWrong component:

    TYPE  ::=  ...
           |   typeWrong

For this non-standard interpretation, the type environment describes the mapping from identifiers to their types:

    EnvT == NAME -|-> TYPE

7.5 Initialization-before-use semantics

A use value is either checkWrong (for a variable whose value is used before being initialized, or an expression that uses a checkWrong variable in a subexpression) or checkOK (for a variable that has been initialized before being used, or an expression that uses only checkOK variables in its subexpressions). The StoreU is the mapping from store locations to the current use state of the variable stored there:

    StoreU == Locn -|-> CHECK

7.5.1 The initialization-before-use state

The StateU has two components, the StoreU and a tag noting the check status. This tag propagates through the definition and becomes the final check result:

    StateU == StoreU x CHECK

The generic Z functions first and second extract components of an ordered pair. The query functions storeOfU and checkOfU are more meaningful names for these, for use when extracting the store and check components of the state pair:

    storeOfU == first[StoreU, CHECK]

    checkOfU == second[StoreU, CHECK]

[+] u updates the store component of a use state:

    |    _ [+]u _ : StateU x StoreU --> StateU
    |--------
    |    A su1, su2 : StoreU; c1 : CHECK @
    |        (su1, c1) [+]u su2 = (su1 (+) su2, c1)

updateUseU changes the use check component of the store to the worse of its current value and that of the input:

    |    updateUseU : CHECK --> StateU --> StateU
    |--------
    |    A c1, c2 : CHECK; su : StoreU @
    |        updateUseU c1(su, c2) = (su, c1 |><| c2 )

7.5.2 The initialization-before-use environment

The use environment is a mapping from names (identifiers) to what they denote, locations:

    EnvU == NAME >-|-> Locn

The mapping is an injective (one-to-one) function, which ensures that no two names map to the same location.

7.5.3 Worse use store

The function worseStore takes two use stores, su 1 and su 2 , and returns a use store that combines the worst properties of each.

Where the domains of the two use stores coincide, on dom su 1 n dom su 2 , worseStore is defined by the check status combining function, |><| .

Where the domains do not overlap, on dom su 1 <-: su 2 and dom su 2 <-: su 1 , it means that one store has a location that maps to some check status, and the other does not have that location. If that unmatched location maps to checkWrong , the worse behaviour is to continue to map to checkWrong . If that unmatched location maps to checkOK (if, for example, a variable is initialized in one branch of a choice but not in the other), the worse behaviour is to remove the location from the domain of the result; it is worse to have a variable uninitialized than set to something. In either case, this can be achieved by range restricting the non-overlapping parts of the stores to checkWrong :

    |    worseStore  :  StoreU x StoreU --> StoreU
    |--------
    |    A su1, su2 : StoreU @
    |        worseStore(su1, su2) =
    |            { l : Locn | l e dom su1 n dom su2 @
    |                l |--> (su1 l |><| su2 l) }
    |            u dom su2 <-: su1 :-> {checkWrong}
    |            u dom su1 <-: su2 :-> {checkWrong}

:-> is Z's range restriction: the relation S :-> r is that subset of the relation S that has its range restricted to the elements in r . <-: is Z's domain anti-restriction: the relation d <-: S is that subset of the relation S that has its domain restricted to the elements not in d . For example,

    {a |--> 1, b |--> 2, c |--> 3 } :-> {1, 2} = {a |--> 1, b |--> 2}
    {a} <-: {a |--> 1, b |--> 2, c |--> 3} = {b |--> 2, c |--> 3}
    {a} <-: {a |--> 1, b |--> 2, c |--> 3} :-> {1, 2} = {b |--> 2}

The definition of worseStore is one of the more complicated-looking definitions, so, to see how it works, consider two stores:

    su1 = {l1 |--> checkOK, l2 |--> checkWrong, l3 |--> checkOK}
    su2 = {l2 |--> checkOK, l3 |--> checkOK, l4 |--> checkWrong}

The overlapping domain is dom su 1 n dom su 2 = { l 2, l 3} . l 2 maps to checkWrong in one store, checkOK in the other, and so to checkWrong in the composed store. l 3 will continue mapping to checkOK .

su 1 has one location not in su 2 , l 1 . This maps to checkOK , so will not appear in the combined store:

    dom su2 <-: su1 = {l1 |--> checkOK}
    dom su2 <-: su1 :-> {checkWrong} = (/)

su 2 has an unmatched location, l 4 . It maps to checkWrong , which carries over:

    dom su1 <-: su2 = {l4 |--> checkWrong}
    dom su1 <-: su2 :-> {checkWrong} = {l4 |--> checkWrong}

So the worse store that results from combining these is

    worseStore(su1, su2) 
            = {l2 |--> checkWrong, l3 |--> checkOK}
                u (/) u {l4 |--> checkWrong}
            = {l2 |--> checkWrong, l3 |--> checkOK, l4 |--> checkWrong}

7.5.4 Worse use state

The function worseState takes two use states and returns a use state that combines the worst properties of each:

    |    worseState : StateU x StateU --> StateU
    |--------
    |    A su1, su2 : StoreU; c1, c2 : CHECK @
    |       worseState((su1, c1), (su2, c2)) =
    |            (worseStore(su1, su2), c1 |><| c2)

7.6 Dynamic semantics

The Store is modelled as a mapping from store locations to values:

    Store == Locn -|-> VALUE

7.6.1 Input and output

Input and output are modelled as lists of integers:

    Input == seq Integer
    Output == seq Integer
7.6.1.1 The dynamic state

The state of a computation has three components, the store mapping, the input, and the output:

    State == Store x Input x Output 

Z has no generic functions analogous to its first and second to extract components of an ordered triple. The query functions storeOf and outOf need to be defined to extract the store and output sequence components of the state tuple:

    |    storeOf : State --> Store
    |    outOf : State --> Output
    |--------
    |    A s : Store; in : Input; out : Output @
    |            storeOf(s, in, out) = s
    |            /\ outOf(s, in, out) = out

[+] updates the store component of a dynamic state:

    |    _ [+] _ : State x Store --> State
    |--------
    |    A s1, s2 : Store; in : Input; out : Output @
    |            (s1, in, out) [+] s2 = (s1 (+) s2, in, out)
7.6.1.2 The dynamic environment

The environment is a mapping from names (identifiers) to what they denote, locations:

    Env == NAME >-|-> Locn

The mapping is an injective (one-to-one) function, which ensures that no two names map to the same location.

7.7 Aside---using generic definitions

Tosca's syntax includes sequences of declarations and sequences of commands. The meaning of a sequence of such constructs is related to the meaning of a single construct in a uniform manner; all the specifications look boringly similar (see, for example, section 8.2.2).

Thus, it might be thought appropriate to define a generic Z function that maps a meaning function for a single construct to the corresponding meaning function for a sequence of constructs, and instantiate this generic with various actual types when used. But one of the motivations for specifying and building a high integrity compiler in the manner being described here is that the translation from the formal Z specification to the Prolog implementation is as clear as possible; the use of some sorts of generic function can obscure this translation. So the Tosca specification has the definition of each separate function written out fully. To illustrate this point further, a generic form for sequences of declarations is given below, for comparison with the explicit forms given in section 8.2.2.

The meaning of a declaration in all four of Tosca's semantics is an appropriate environment change (expressed as a mapping from environment to environment). The meaning of a sequence of declarations is also an environment change. An empty sequence has no effect on the environment, and hence its meaning is the identity function. A sequence consisting of a single declaration has the same meaning as that declaration. The meaning of a longer sequence can be split up as the meaning of shorter subsequences, joined together by functional composition. So an appropriate definition, generic in the environment variable, that maps the meaning function for a single declaration to one for a sequence of declarations, is

    |===[E]========================================
    |    SEQ : (DECL -|-> E -|-> E) --> (seq DECL -|-> E -|-> E)
    |--------
    |    A d : DECL; D1, D2 : seq1 DECL; F : DECL -|-> E -|-> E @
    |        (SEQ F)< > = id E
    |        /\ (SEQ F)<d> = F d
    |        /\ (SEQ F)(D1 ^ D2) = (SEQ F)D1 o (SEQ F)D2
    |-----------------------------------------------

Each of the four meaning functions for sequences of declarations could be concisely defined, by instantiating SEQ with the appropriate environment, and applying it to the appropriate meaning function for single declarations, thus

    DD* == SEQ[EnvD] DD
    TD* == SEQ[EnvT] TD
    UD* == SEQ[EnvU] UD
    MD* == SEQ[Env] MD

However, the link to the Prolog implementation would be less clear in this form. The style of this particular Z specification is guided by a balance between clarity of specification, ease of proof, and clarity of translation into Prolog. If the same specification were written for a different reason, it might well be written in a different style.