
Douglas B. Lenat, R. V. Guha.
Building Large Knowledge-Based Systems: representation and inference in the Cyc project.
Addison-Wesley. 1990

rating : 3 : worth reading
review : 13 March 1991

This book describes the half-way state of the 10-year Cyc project to build a massive (hundreds of millions of facts) "common-sense" knowledge base, as a basis for less brittle (i.e. less stupid) expert systems. The authors have thought quite carefully about what to represent and how to represent it. They describe this activity as ontological engineering, as opposed to knowledge engineering. Below I summarize the parts of their work that are most interesting [to me, from the perspective of OO modelling]. The book itself is pretty terse in places (it has a lot to say), so my summary probably verges on incomprehensibility in places -- when it does, read the book!

Cyc's emphases are different from those of most knowledge engineering, because its goals are different. In particular, a lot of effort has gone into modelling "thorny" real-world things like matter, space and time, which often aren't considered much in more traditional data modelling.

review written for the ORCA project

What is a Thing?

Cyc defines an ontology to describe a domain as containing

To build this, you must start with the different types of primitive entity (fundamental abstractions); look for relations that hold between elements of these; and generate further composite relations and entities.

Desirable properties of an ontology are:

Things

Everything in Cyc is a thing (shades of Smalltalk, you might think, but Cyc takes the idea even further!).

A thing is anything about which you can state a fact, and to which you can ascribe a name. A thing should possess some interesting properties, and should be capable of playing a direct role, as a whole, in some situation of interest. So each thing should not only have a name, but should also deserve to have that name. It deserves the name if:

Things are represented in Cyc by frames (analogous to OO classes) with slots (analogous to OO attributes). Although everything is a thing, in order to have an "adequate ontology", it is important to have a rather larger set of concepts. So the category Thing (the set of all things) is successively specialised into more useful categories (subsets of Thing).

In what follows, remember that the English sentence "X is a Y" can mean either "the category X is a subset of the category Y" (e.g. an engineer is a person) or "the instance X is a member of the category Y" (e.g. Casey Jones is an engineer).
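The two readings of "X is a Y" correspond roughly to subclass checks and instance checks in an OO language. The following is a minimal Python sketch of that distinction only. The class names come from the examples above, but using Python classes to stand in for Cyc categories is my own illustration, not how Cyc represents them.

```python
# Illustrative only: Python classes stand in for Cyc categories.
class Person:
    pass


class Engineer(Person):      # "an engineer is a person": category subset
    pass


casey_jones = Engineer()     # "Casey Jones is an engineer": instance membership

# Subset reading: every Engineer is a Person.
subset_holds = issubclass(Engineer, Person)
# Membership reading: this individual belongs to the category Engineer.
member_holds = isinstance(casey_jones, Engineer)
# Membership follows along the subset link: Casey Jones is also a Person.
derived_holds = isinstance(casey_jones, Person)
```

Note that the subset reading relates two categories, whereas the membership reading relates an individual to a category; confusing the two is an easy way to build a muddled hierarchy.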

Some of the specialisations of Thing that Cyc identifies are (some of which are described in more detail later, and in a lot more detail in the book):

Relationships and Slots

Relationships are Things, and a Slot is a specialised Relationship (a slot specifies a relation between the Thing with the Slot and the Thing in the Slot). Every such relationship has a corresponding inverse, which is maintained automatically.
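To make the idea of an automatically maintained inverse concrete, here is a toy sketch in Python. The FrameStore class and its method names are hypothetical, invented for this illustration; the slot pair topography / topographyOf is taken from the Texas example later in this review. It is a sketch of the idea, not a description of Cyc's implementation.

```python
from collections import defaultdict


class FrameStore:
    """Toy frame store (hypothetical): asserting a fact also records its inverse."""

    def __init__(self):
        # frames[thing][slot] -> set of values
        self.frames = defaultdict(lambda: defaultdict(set))
        self.inverse_of = {}

    def declare_inverse(self, slot, inverse_slot):
        # Register the pair in both directions.
        self.inverse_of[slot] = inverse_slot
        self.inverse_of[inverse_slot] = slot

    def assert_fact(self, thing, slot, value):
        self.frames[thing][slot].add(value)
        inverse_slot = self.inverse_of.get(slot)
        if inverse_slot is not None:
            # Maintain the inverse entry on the value's own frame.
            self.frames[value][inverse_slot].add(thing)

    def get(self, thing, slot):
        return set(self.frames[thing][slot])


store = FrameStore()
store.declare_inverse("topography", "topographyOf")
store.assert_fact("Texas", "topography", "Hilly")

# The inverse was never asserted directly, yet it can be queried.
regions_with_hilly_topography = store.get("Hilly", "topographyOf")
```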

Slots are specialised into bookkeeping slots, used by Cyc in storing facts and executing queries (for example, creationTime, lastEditor, cacheMe), and defining slots, which are further specialised into:

Transitive relations are one interesting kind of relation (many of the taxonomic relations are transitive). Such relations usually have both an immediate-only form (e.g. parent) and a transitive-closure form (e.g. ancestor).
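The relationship between the immediate-only form and the transitive-closure form can be sketched in a few lines of Python. The function name and the family data below are invented for illustration; only the parent / ancestor pairing comes from the text.

```python
def transitive_closure(immediate, start):
    """Return everything reachable from `start` in one or more steps."""
    seen = set()
    frontier = list(immediate.get(start, ()))
    while frontier:
        node = frontier.pop()
        if node not in seen:
            seen.add(node)
            frontier.extend(immediate.get(node, ()))
    return seen


# Hypothetical data: the immediate-only relation "parent".
parent = {
    "Carol": {"Bob"},
    "Bob": {"Alice"},
}

# The transitive-closure form "ancestor" is derived, not stored.
ancestors_of_carol = transitive_closure(parent, "Carol")
```

Here only the parent facts are stated; Alice turns up as an ancestor of Carol because the closure follows the parent link twice.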

Analogies with OO models

Extensional slots are analogous to the relationship lines between class boxes in OO models, whereas taxonomic slots are analogous to the relationship lines in meta-models.

Instances of taxonomic relations (the actual hierarchical relationships in the model) are given by class definitions (instances of inheritance relations), and pattern diagrams (instances of subpatterns, augments relations). This might indicate that instances of other Cyc taxonomic relations might have interesting OO counterparts yet to be identified.

The OO analogues of intrinsic slots are supplied by the Class definitions. For example, EntryIsA is supplied by the type of the attribute; makesSenseFor is implicitly given by where the attribute is first defined in the class hierarchy (the attribute makes sense for that class and all its subclasses, but not for any of its superclasses or sibling classes).

Thinking about "subrelationing" the various relations could be interesting from an OO modelling perspective. How does subrelationing fit with inheritance and subclassing? What are useful subrelations of the taxonomic relations? The partOf relation could be viewed as a subrelation of uses, with the extra constraint that the part cannot be used by any other "external" object (it can be used by another part of the same object). The inheritance relation could be partitioned into conformant and non-conformant inheritance, or into more degrees of conformance. Are there any others?

Individual Objects versus Collections

Things are partitioned into Individual Objects and Collections.

A Collection is not structured: it has a set of instances, and nothing more.

FredTheSkeletonsBones 
instanceOf(Collection) 
instances(FredsLeftThighBone FredsLeftKneeBone ...)
setOfPartsOf(FredTheSkeleton) 
...

An Individual Object is any Thing that is not a Collection. A (composite) Individual Object consists of its parts and the structured interrelationships between them.

FredTheSkeleton 
instanceOf(IndividualObject) 
parts(FredsLeftThighBone FredsLeftKneeBone ...)
setOfParts(FredTheSkeletonsBones)
structureConstraints(
    (Connected FredsLeftThighBone FredsLeftKneeBone) 
     ...) 
...

Hence assembling the parts differently results in a different individual object.
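A small Python sketch of this point: two individual objects can share exactly the same collection of parts and still be different individuals, because their structure constraints differ. The IndividualObject class, the connected helper and the third bone (FredsLeftShinBone) are assumptions added for illustration; they are not from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndividualObject:
    parts: frozenset                   # the unstructured collection of parts
    structure_constraints: frozenset   # how those parts are related


def connected(a, b):
    # Connection is symmetric, so the pair is stored as an unordered set.
    return ("Connected", frozenset({a, b}))


thigh, knee, shin = "FredsLeftThighBone", "FredsLeftKneeBone", "FredsLeftShinBone"
bones = frozenset({thigh, knee, shin})

fred = IndividualObject(
    parts=bones,
    structure_constraints=frozenset({connected(thigh, knee), connected(knee, shin)}),
)
misassembled = IndividualObject(
    parts=bones,
    structure_constraints=frozenset({connected(thigh, shin), connected(shin, knee)}),
)

same_collection = fred.parts == misassembled.parts   # identical set of parts
same_individual = fred == misassembled               # structure differs
```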

Stuff

If you take a piece of wood, and chop it up, you have lots of pieces of wood. If you take a table and chop it up, you don't have lots of tables. A table is an Individual Object, a piece of wood is Stuff. (Brooms are usually Individual Objects, but one once became Stuff.)

The rule to recognise Stuff is: "every portion of a piece of Stuff is a piece of the same type of Stuff". Stuff has granularity: you can't necessarily sub-divide it for ever. Once the size of an instance becomes comparable to the size of the granule, it stops being stuff-like.

Time, and Temporal Subabstractions

Time is Stuff-like: a bit of a time interval is a time interval.

An Event is a Thing that has temporal extent. All Tangible Objects are Events (they come into existence at some time, continue, and cease to exist at a later time). The temporal extent need not be continuous: for example, the Event of reading a book might start and stop many times. Hence the distinction between the interval during which the Event exists, and the duration of the event.
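The difference between the interval and the duration can be shown with a little arithmetic. In this Python sketch an event is a list of (start, end) sessions; the function names and the reading times (in hours) are hypothetical, and the sessions are assumed not to overlap.

```python
def extent_interval(sessions):
    """Interval during which the event exists: first start to last end."""
    starts = [start for start, _ in sessions]
    ends = [end for _, end in sessions]
    return min(starts), max(ends)


def duration(sessions):
    """Total time the event is actually under way (non-overlapping sessions)."""
    return sum(end - start for start, end in sessions)


# Hypothetical: reading a book in three sittings, times in hours.
reading_sessions = [(0, 2), (5, 6), (9, 12)]

reading_interval = extent_interval(reading_sessions)
reading_duration = duration(reading_sessions)
```

With these figures the event exists over the interval from hour 0 to hour 12, but its duration is only 2 + 1 + 3 = 6 hours.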

A Process is a stuff-like Event; any temporal slice of an instance of a process is an instance of the same process. For example, Walking, Eating, Breathing. Remember: stuff has granularity, and the grain size, or cycle time, of a process could be quite large.

Consider a particular Event (for example, a person). Some of its properties are probably changing over time. It might be valid to split the event into two events (for example, one before a particular time when a property changes, and another after that time). This fits in perfectly with the requirement for a Thing given above. For example, Child and Adult, as specialisations of Person, have this relationship: a particular person (space-time event) starts off as a child (shorter space-time event) and becomes an adult (on reaching a certain age or after performing a certain ritual). Child and Adult are each a temporal subabstraction of Person, as are PersonWhileEating, PersonAtWork, etc. [Cyc's Temporal Subabstractions are a bit like the Entity Life Histories of classical data modelling, but with more structure, and with inheritance properties.]

Cyc stores static facts: these do not change "state" (except during editing, or caching information during a query). Cyc doesn't need just the "current state": it has to remember history, too. If it needs to store information about the many states an abstraction goes through, each relevant state is represented as one of a temporal sequence of subabstractions. A change of state is a new fact.
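One way to picture "a change of state is a new fact" is an append-only store in which each state of a thing is its own subabstraction with a time span. The Python below is my own minimal sketch under that reading; the class, the function names, the individual "Pat", the ages and the facts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subabstraction:
    whole: str     # the Thing this is a temporal slice of
    name: str
    start: int     # inclusive, in years
    end: int       # exclusive, in years
    facts: tuple   # static facts that hold throughout this slice


knowledge_base = []


def assert_subabstraction(sub):
    # Nothing is overwritten: a change of state adds a new entry.
    knowledge_base.append(sub)


def subabstractions_at(whole, time):
    return [s.name for s in knowledge_base
            if s.whole == whole and s.start <= time < s.end]


assert_subabstraction(
    Subabstraction("Pat", "PatAsChild", 0, 18, (("livesWith", "Parents"),)))
assert_subabstraction(
    Subabstraction("Pat", "PatAsAdult", 18, 80, (("occupation", "Engineer"),)))

state_at_ten = subabstractions_at("Pat", 10)
state_at_thirty = subabstractions_at("Pat", 30)
```

Because the earlier subabstraction is never edited, the history remains available for queries about any past time.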

Inheritance

Via Specialisation

Specialisation is like OO subclassing: specialised things "inherit" all the Slots of their superclasses, and can add new slots of their own (ones that don't makeSenseFor their ancestors). They also inherit any default values of these slots, which can be overridden.
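Default values with overriding can be sketched as a lookup that walks up the generalisation chain until it finds a value. This Python sketch is illustrative only: Ship and Battleship are borrowed from an example later in this review, while the dictionary layout, the "generalisation" key and the slots floats and armed are my own assumptions.

```python
# Hypothetical frames: each names its generalisation and its own defaults.
frames = {
    "Ship": {
        "generalisation": None,
        "defaults": {"floats": True, "armed": False},
    },
    "Battleship": {
        "generalisation": "Ship",
        "defaults": {"armed": True},   # overrides the default inherited from Ship
    },
}


def lookup_default(category, slot):
    """Return the nearest default for `slot`, searching upwards from `category`."""
    while category is not None:
        frame = frames[category]
        if slot in frame["defaults"]:
            return frame["defaults"][slot]
        category = frame["generalisation"]
    raise KeyError(f"slot {slot!r} is not defined for this category")


battleship_floats = lookup_default("Battleship", "floats")   # inherited from Ship
battleship_armed = lookup_default("Battleship", "armed")     # overridden locally
ship_armed = lookup_default("Ship", "armed")
```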

Via other Slots

Things can inherit attributes along any slot, not just their generalisation (superclass) slot. Textual indentation is used to show inheritance. For example, in the following

Texas 
instanceOf(GeopoliticalRegion) 
topography(DesertLike, Hilly) 
soilQuality(Sandy, Rocky) 
... 

Hilly 
instanceOf(TopographyTrait) 
topographyOf(Appalachia Texas ...) 
    soilQuality(Rocky) 
...

Texas "inherits" its Rocky soil quality from Hilly, and its Sandy soil quality from DesertLike. Any thing related to Hilly by topographyOf will automatically inherit the slot soilQuality, with default value Rocky.
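The Texas example can be mimicked with a small lookup that follows a named slot to find inherited values. This Python sketch is an assumption-laden illustration: the dictionary layout, the inheritance_rules table and the function name are invented, the DesertLike frame is implied by the text rather than shown in it, and here Texas's soil qualities are derived rather than stored locally.

```python
# Frames as plain dictionaries (hypothetical layout).
frames = {
    "Texas": {"topography": ["DesertLike", "Hilly"]},
    "Hilly": {"soilQuality": ["Rocky"]},
    "DesertLike": {"soilQuality": ["Sandy"]},
}

# (via_slot, inherited_slot): values of inherited_slot flow along via_slot.
inheritance_rules = [("topography", "soilQuality")]


def inherited_values(thing, slot):
    """Local values for `slot`, plus defaults inherited along other slots."""
    values = list(frames.get(thing, {}).get(slot, []))
    for via_slot, inherited_slot in inheritance_rules:
        if inherited_slot != slot:
            continue
        for source in frames.get(thing, {}).get(via_slot, []):
            for value in frames.get(source, {}).get(slot, []):
                if value not in values:
                    values.append(value)
    return values


texas_soil = inherited_values("Texas", "soilQuality")
```

The inheritance here runs along topography, not along a superclass link, which is the point of this section.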

Specialising the Specialisation Relation

Any relation is a specialisation of Thing, and so can be further specialised. In particular, the Specialisation relation itself can be further specialised.

We have already seen an example of this: there is something more in the relationship "Child is a specialisation of Person" than in "Battleship is a specialisation of Ship", because every instance of Person is or has been an instance of Child. The first is an example of a Temporal Subabstraction, which is a specialisation of the Specialisation relation. In Cyc there is a specialised version of the Specialisation slot, called subAbstractionType. This would point from Person to Child.

Structures

A composite IndividualObject may need multiple descriptions of its internal structure. The relationships between its parts may differ according to the description (for example, TelephoneStructure-Physical versus TelephoneStructure-Functional), and the component parts themselves might also depend on the "view" or "model" being used, particularly when the object is very complex.

The Cyc project is searching for "broad but useful structural cliches". They intend to collect a few hundred widely used cliches, and use these to define primitives that will make it easier to represent structures. [Cyc's structures and structural cliches look very like OO's systems, frameworks and patterns -- they have analogues of framework specialisation and sub-frameworks.]

One particularly interesting cliche is a sequential collection of parts of the same type. In the simplest case part n+1 is related to part n. A more complex, Fibonacci-like structure would have part n+1 related to parts n and n-1.
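The two sequential structures differ only in how far back each part looks. A short Python sketch, with a hypothetical function name and made-up part names, generates the "related to" pairs for both cases:

```python
def sequential_relations(parts, lookback=1):
    """Relate each part to the `lookback` parts immediately preceding it."""
    relations = []
    for i in range(len(parts)):
        for k in range(1, lookback + 1):
            if i - k >= 0:
                relations.append((parts[i], parts[i - k]))
    return relations


parts = ["p0", "p1", "p2", "p3"]   # hypothetical parts of the same type

# Simplest case: part n+1 is related to part n.
simple_chain = sequential_relations(parts, lookback=1)

# Fibonacci-like case: part n+1 is related to parts n and n-1.
fibonacci_like = sequential_relations(parts, lookback=2)
```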