Traditional Human-Computer Interaction

For human-computer interaction to take place, both the human and the computer must present information to the other in a form that can be readily recognised.

From the user's point of view, presenting information means inputting data into the computer. Traditionally this has been achieved through a keyboard and mouse, but it is slowly expanding to include pen, voice and tactile input devices. However, even these input media may be of little use to people with limited physical abilities, perhaps resulting from neuro-muscular disorders.

The portrayal of information by the computer is more often than not achieved visually, through a graphical user interface. However, for many users the visual medium is a poor choice. Many elderly people with visual impairments (a growing proportion of the population), along with blind people, find the heavy reliance on the visual medium a major problem.

Even people who can interpret visual information may need to rely on another medium for certain tasks; for example, aircraft pilots may be visually overloaded by a large number of dials. Presenting some of this information aurally may help to reduce the overload and allow the pilot to focus visual attention on the external view. Consequently, auditory display is increasingly recognised as an important part of human-computer interaction.

Traditional Auditory Approaches

There are numerous ways of transmitting information through the auditory channel, the most obvious being speech. Many blind users have benefited considerably from speech-driven interfaces, helped greatly by the growing availability of speech recognition tools. Speech synthesis also plays an important role in presenting information to blind people: text-based interfaces can be read aloud, as can the text found in graphical interfaces.
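
As a minimal sketch only, assuming a general-purpose text-to-speech library such as pyttsx3 is available (the read_aloud helper and the example text are purely illustrative), interface text could be read aloud along these lines:

    # Minimal sketch: reading interface text aloud through a speech synthesiser.
    # Assumes the pyttsx3 library; the read_aloud helper is illustrative and not
    # part of any particular screen reader.
    import pyttsx3

    def read_aloud(text, rate=170):
        engine = pyttsx3.init()             # use the platform's default synthesiser
        engine.setProperty("rate", rate)    # speaking rate in words per minute
        engine.say(text)
        engine.runAndWait()                 # block until the utterance has been spoken

    if __name__ == "__main__":
        read_aloud("File menu. Three items: New, Open, Save.")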

However, speech can be very unwieldy for presenting certain types of information, and it relies on the speech-processing ability of the user. The heavily visual nature of many graphical interfaces makes describing them in speech particularly cumbersome. To overcome these limitations, a number of auditory devices have been developed which make use of non-speech sounds.

These non-speech audio methods tend to associate particular sounds with specific events or objects, as discussed in another section. Such associations are often arbitrary and must be learnt, and their descriptive power is usually limited to simple forms of message.
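
A minimal sketch of this kind of association, with event names and sound files invented purely for illustration, shows both its simplicity and its limits:

    # Minimal sketch: binding non-speech sound cues to interface events.
    # The event names and .wav file names are illustrative assumptions only;
    # real systems (earcons, auditory icons) use carefully designed sound sets.
    EVENT_SOUNDS = {
        "file_deleted": "crumple.wav",
        "mail_arrived": "chime.wav",
        "error":        "thud.wav",
    }

    def sound_for(event):
        """Return the sound cue bound to an event, or None if it has no cue."""
        return EVENT_SOUNDS.get(event)

    print(sound_for("mail_arrived"))   # chime.wav
    print(sound_for("window_moved"))   # None: the arbitrary mapping only covers what was defined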

A Potential Alternative Approach?

Information is often highly structured. Even simple command-based instructions have a degree of structure and grammaticality associated with them. Other structured systems, often open-ended ones, are found in human-computer interaction too; the most obvious is natural language, which underlies most textual information.

The structure of language utterances can be described by a set of grammar rules, and similar rules exist to describe the structures of music. Therefore, if structured information (such as natural language) is used in an interaction and needs to be communicated aurally, music may be a suitable way to present it. Moreover, the grammaticality of music could be extremely useful for intuitively detecting correctly or incorrectly formed structures.
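
As a minimal sketch, with deliberately toy-sized rule sets assumed for illustration, the same rewrite-rule formalism can describe both a simple sentence and a short melodic phrase:

    # Minimal sketch: one rewrite-rule formalism describing a sentence grammar
    # and a musical-phrase grammar. Both rule sets are illustrative assumptions.
    import random

    LANGUAGE_RULES = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"]],
        "VP": [["V", "NP"]],
        "N":  [["pilot"], ["display"]],
        "V":  [["watches"], ["hears"]],
    }

    MUSIC_RULES = {
        "Phrase":  [["Opening", "Cadence"]],
        "Opening": [["C4", "E4", "G4"]],
        "Cadence": [["F4", "G4", "C4"]],
    }

    def generate(symbol, rules):
        """Expand a symbol using its rules; symbols with no rule are terminals."""
        if symbol not in rules:
            return [symbol]
        expansion = random.choice(rules[symbol])
        return [t for part in expansion for t in generate(part, rules)]

    print(" ".join(generate("S", LANGUAGE_RULES)))     # e.g. "the pilot hears the display"
    print(" ".join(generate("Phrase", MUSIC_RULES)))   # "C4 E4 G4 F4 G4 C4"

A sequence that cannot be derived from the rules is, by the same token, detectably ill-formed, which is the sense in which grammaticality could help flag incorrect structures.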

By studying the structures of language and music, and the grammar rules which describe them, it is hoped that basic mappings between similar structures can be designed. This in turn should lead to effective methods of communicating structured information between humans and computers through sound, and a number of applications of such an approach could follow.
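
A minimal sketch of one such mapping, assuming for illustration that syntactic categories are simply assigned fixed melodic fragments (the category-to-motif table is invented, not a proposed standard), might look like this:

    # Minimal sketch: mapping syntactic categories to melodic fragments so that
    # the structure of an utterance is heard rather than seen. The motif table
    # is an illustrative assumption only.
    CATEGORY_MOTIFS = {
        "NP": ["C4", "E4"],   # noun phrases rise
        "VP": ["G4", "E4"],   # verb phrases fall
        "PP": ["D4", "D4"],   # prepositional phrases stay level
    }

    def sonify(parsed_phrases):
        """Turn a flat list of (category, words) pairs into a note sequence."""
        notes = []
        for category, _words in parsed_phrases:
            notes.extend(CATEGORY_MOTIFS.get(category, ["C4"]))  # default note for unknown categories
        return notes

    # A crude, pre-parsed rendering of "the pilot watches the display".
    parse = [("NP", "the pilot"), ("VP", "watches"), ("NP", "the display")]
    print(sonify(parse))   # ['C4', 'E4', 'G4', 'E4', 'C4', 'E4']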

For example, people with speech disabilities could use music to interact with a computer, either through a MIDI instrument connected directly to the machine or through a separate instrument picked up by a microphone. Whistling or even singing may also be possible for people with speech problems. Once the information has been transferred to the computer, it could be converted into natural language as either text or speech, forming the basis of a simple translation device.
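
As a minimal sketch of the translation idea, assuming the notes have already been captured (for instance from a MIDI port via a library such as mido, or from a pitch tracker on whistled input) and that the note-pattern vocabulary is entirely invented for illustration:

    # Minimal sketch of the "translation device" idea: short note patterns are
    # looked up and emitted as words. The pattern vocabulary is an invented
    # assumption; in practice the MIDI note numbers could arrive from an input
    # port or from pitch-tracked audio.
    PATTERN_WORDS = {
        (60, 64, 67): "hello",   # C-E-G
        (60, 62, 64): "yes",     # C-D-E
        (64, 62, 60): "no",      # E-D-C
    }

    def translate(notes, pattern_length=3):
        """Greedily translate a stream of MIDI note numbers into words."""
        words = []
        for i in range(0, len(notes) - pattern_length + 1, pattern_length):
            pattern = tuple(notes[i:i + pattern_length])
            words.append(PATTERN_WORDS.get(pattern, "?"))
        return " ".join(words)

    played = [60, 64, 67, 64, 62, 60]   # two three-note gestures
    print(translate(played))            # "hello no"

The resulting text could then be passed to a speech synthesiser, as in the earlier sketch, to complete a simple music-to-speech translation.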

It is speculated that language-impaired users may also gain from a musical description of language. Since music and language are thought to activate different parts of the brain, it might be possible to design aids which reinforce linguistic ability through mappings onto musical expressions that can be processed successfully. The viability of such music-based aids is unknown, but it would be fascinating to see how far language modelling of this kind could go.


These pages originally belonged to John Hankinson, but are now maintained by Alistair Edwards (alistair@cs.york.ac.uk)

21st February 2002