Many human-computer interaction (HCI) researchers are interested in the benefits of using sound in computer interfaces. They often wish to test new interface ideas on a group of people in order to judge the worth of those ideas. However, poor test results are usually attributed to a poor interface, when they may instead stem from the poor performance of individuals whose audio capabilities are low. The focus is on the computer side of HCI, not on the human.

It is therefore also important to look at the audio capabilities of the people who use a new interface before we judge the interface itself. Only in this way can we be sure that any comparisons between audio interfaces are made on an equal basis.

Existing Measures of Human Capability

Some researchers already attempt to control for variations across their population of test participants. This is usually done by classifying participants into two distinct groups: musicians and non-musicians. The reasoning is that 'musicians' must have better audio capabilities than 'non-musicians' because they 'appreciate' sounds more. This reasoning is flawed, as many 'non-musicians' actively enjoy listening to sounds and can distinguish subtle differences of sound just as well as (sometimes even better than) their 'musician' counterparts.

Of course, it all depends on how you define the words 'appreciate', 'musician' and 'non-musician'. The attributes used to define the set of musicians may include people who:

  • have 'musical ability' or good 'musicianship'
  • play an instrument
  • actively listen to music
  • have musical qualifications

Throughout the audio-interface research literature you will find different researchers using any number of these attributes to define their musician set, sometimes in conjunction with a time period (e.g. people who have played an instrument for at least x years, or people who sing at least twice a week). This means the classifications used in the research community are arbitrary and inconsistent, so there is no standard definition of a 'musician'. Furthermore, dividing participants into groups in this way becomes pointless: it no longer controls for variation in individuals' ability, and it actually increases variation across the results of different studies. It is inappropriate and ineffective. It also adds confusion and bias to the interpretation of results, and misleads other researchers who may wish to evaluate the success of a particular audio system.

However, before we rush to standardise on one of these definitions, we should question whether classifying people as 'musicians' or 'non-musicians' is a good idea in the first place. It can be argued that many people who have played an instrument almost all their lives are still not very good with sound. Conversely, it is entirely possible for someone who has never played a musical instrument, sung properly or gained any qualifications to notice subtle nuances in the sounds around them. What we really need is a suitable standard pretest which can provide an indication of an individual's sound capabilities.

In Search of a Suitable Standard Pretest

A suitable pretest is required to measure individuals' sound abilities. It needs to measure the human qualities which are relevant to interpreting the sounds used in audio-interface research. Furthermore, it ideally needs to be a comparative (i.e. numerical) measure so that comparisons across individuals can be made.

There are a number of existing psycho-acoustic tests which can be used to measure simple perceptual abilities, such as the ability to distinguish between two sounds of very close frequency or similar intensity. If these sorts of tasks are typical of the auditory tasks present in an audio interface, then these existing psycho-acoustic tests should be used as the pretest for interface testing.

However, some audio interfaces make use of more structured sound, and for these an alternative pretest is required. A number of musical aural tests were investigated as potential pretests, but none was suitable. The main problems with tests such as the UK's Associated Board of the Royal Schools of Music aural tests, Bentley's Musical Ability tests and software-based tools such as EarMaster and EarPower are:

  • Understanding of Musical Terminology and Knowledge
    All the tests mentioned above rely on the participant's understanding of musical terms. Examples include:
    • Knowledge of musical chord descriptions.
    • Understanding of terms such as interval, crotchet and common-time.
    • Knowledge of the layout of notes on a piano keyboard, guitar fingerings and/or music notation.
  • Reliance on Performance Ability
    Many of the tests involve singing back a melody, clapping out a rhythm, beating out a meter or even playing an instrument.

Consequently, there is a need for a pretest which tests the perception of structured sounds without relying on participants' existing knowledge or their performance ability (as an ability to perform is not the same as having a 'good ear'). However, before development of a new test can begin, it is necessary to decide which 'musical' tasks will be appropriate. As music is multi-dimensional, is it better to test music as a whole or to test its component parts? Some may argue that music exists as a whole and therefore any tests should present 'full musical material'. Others prefer to test the component parts of music, as they believe a number of distinct skills are involved. For example, Seashore looked for component talents in his measures, whilst Bentley restricted himself to the basic abilities he believed were important to music performance.

In order to develop a component-based test of musical aptitude, it can be useful to examine psychological knowledge of musical development in children. Bentley used this approach in the design of his tests and it is important to highlight some of the key points here.

Musical Development of Children

As young children develop, they begin to respond to different sound qualities (timbres). This is closely followed by attention to rhythmical patterns and subsequently melody. Their memory of pitch changes develops from simple up/down movements to more precise interval recognition and recall. As rhythmical ability starts to develop before pitch accuracy, there is usually a tendency to match against rhythm rather than pitch when responding to melody.

Memory span also increases with repeated exposure to familiar tunes. This in turn encourages an ability to compare musical stimuli against learnt tunes. Clarification then develops, where a child becomes interested in detail and can spot errors in either pitch or time intervals between notes. To be able to accomplish this, they must already have gained an ability to distinguish between different pitches and note durations.

Individuals progress through this musical development at different rates and stop at different points. The quality and span of their musical memories can also vary considerably. It would therefore be of interest to incorporate various memory elements into the musical aptitude tests whilst testing individuals' pitch and rhythm abilities (both separately and combined).

The Design of the Musical Aptitude Tests

After careful consideration, a collection of eight tasks has been designed to measure a number of component musical abilities. The purpose of each task is to judge, for a given participant, the extent to which the ability in question has developed. Where appropriate, the tasks also attempt to determine the participant's memory capabilities within the context of the task.

The general form of each task is as follows. A test stimulus is presented to the participant twice, after which a number of specimen stimuli are made available for inspection. The participant's task is to identify the one specimen which is identical to the test stimulus. To prevent direct, one-by-one comparisons of each specimen against the test stimulus, the participant is not able to hear the test stimulus again after its initial two playings.

The specimens are chosen in such a way that differences between the test and specimen stimuli range from gross differences to small, subtle distinctions. As the tasks progress, the distinctions between specimens and test sounds become finer and the number of specimens may increase. It is therefore possible not only to see the proportion of correct answers a participant can provide, but also to find the level of subtlety at which they begin to confuse similar sounds.
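
The two measures just described (proportion correct and the point at which confusion begins) might be computed along the following lines. This is only a sketch under stated assumptions: each trial is assumed to carry a hypothetical integer subtlety level (1 for gross differences, larger numbers for finer ones), and the names `Trial` and `score_task` are illustrative rather than taken from the MAT software.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trial:
    level: int   # assumed subtlety level: 1 = gross difference, higher = finer
    chosen: int  # index of the specimen the participant selected
    target: int  # index of the specimen identical to the test stimulus


def score_task(trials: List[Trial]) -> Tuple[float, Optional[int]]:
    """Return (proportion correct, lowest level at which an error was made).

    The second value is None when the participant made no errors.
    """
    if not trials:
        raise ValueError("a task needs at least one trial")
    correct = sum(1 for t in trials if t.chosen == t.target)
    error_levels = [t.level for t in trials if t.chosen != t.target]
    first_confusion = min(error_levels) if error_levels else None
    return correct / len(trials), first_confusion


# Usage: four trials of increasing subtlety, with one error at level 3.
example = [Trial(1, 0, 0), Trial(2, 1, 1), Trial(3, 2, 0), Trial(4, 1, 1)]
proportion, confusion_level = score_task(example)
```

Taking the lowest level with an error is one possible reading of "begin to confuse"; a stricter definition (for example, requiring repeated errors at a level) could be substituted.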

More detailed overviews and demonstrations of each task are available.

Five Measurable Abilities

The results of the eight tasks are collected together to form five scores (pitch, rhythm, harmony, pitch/rhythm combined, dynamics). These scores are expressed as a ratio to the mean of the scores produced from a 30-user pilot study. From this, a five-quotient profile is generated which details an individual's ability in the five areas listed above.

For example, someone with a profile of '+,-,+,-,+' would have an above-average aptitude for pitch, harmony and dynamics, but below-average rhythm and pitch/rhythm skills. Conversely, a person with a profile of '-,+,-,-,-' would be above average only in rhythm.
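
As an illustration, the quotient calculation and the '+'/'-' profile could be sketched as below. The function names and data layout are hypothetical, the pilot means are assumed to be positive, and treating a score exactly equal to the pilot mean as '-' is an assumption, since that case is not specified here.

```python
from statistics import mean
from typing import Dict, List

# The five abilities, in the order used by the profile string.
ABILITIES = ["pitch", "rhythm", "harmony", "pitch/rhythm", "dynamics"]


def quotients(scores: Dict[str, float],
              pilot: List[Dict[str, float]]) -> Dict[str, float]:
    """Express each score as a ratio to the pilot-study mean for that ability."""
    return {a: scores[a] / mean(p[a] for p in pilot) for a in ABILITIES}


def profile(q: Dict[str, float]) -> str:
    """Build the '+'/'-' profile; '+' means above the pilot mean (ratio > 1)."""
    # Assumption: a ratio of exactly 1.0 is reported as '-'.
    return ",".join("+" if q[a] > 1.0 else "-" for a in ABILITIES)


# Usage with made-up numbers (a two-person stand-in for the pilot study).
pilot_scores = [
    {"pitch": 10, "rhythm": 10, "harmony": 10, "pitch/rhythm": 10, "dynamics": 10},
    {"pitch": 20, "rhythm": 20, "harmony": 20, "pitch/rhythm": 20, "dynamics": 20},
]
participant = {"pitch": 18, "rhythm": 12, "harmony": 16,
               "pitch/rhythm": 9, "dynamics": 20}
participant_profile = profile(quotients(participant, pilot_scores))
```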

Application of the MAT Profiles

As an example of a typical application of the MAT profiles, imagine a new audio interface has been developed by a team of researchers. The researchers run a user study to test how effectively the users can use the interface. This produces a wide range of performance scores. By also carrying out the MAT tests on each study participant, the researchers can compare the scores from the interface study with the MAT profile of each individual.

Consequently, it is possible to analyse the results for any strong correlations between the study scores and any of the five general abilities measured by the MAT tests. This may reveal, for example, that the performance of a participant using the new auditory interface relies heavily on their rhythmical ability.
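
One way such an analysis might be carried out is sketched below. The choice of Pearson's correlation coefficient is an assumption (no particular statistic is prescribed here), and the function names and data are invented for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(vx * vy)


def correlate_with_abilities(interface_scores: Sequence[float],
                             mat_quotients: List[Dict[str, float]]
                             ) -> Dict[str, float]:
    """Correlate interface-study scores with each MAT ability in turn.

    mat_quotients[i] holds the ability quotients of participant i, in the
    same order as interface_scores.
    """
    abilities = mat_quotients[0].keys()
    return {a: pearson(interface_scores, [q[a] for q in mat_quotients])
            for a in abilities}


# Usage with made-up data for four participants and two abilities.
study_scores = [1.0, 2.0, 3.0, 4.0]
participants = [
    {"rhythm": 0.5, "pitch": 2.0},
    {"rhythm": 1.0, "pitch": 1.5},
    {"rhythm": 1.5, "pitch": 1.0},
    {"rhythm": 2.0, "pitch": 0.5},
]
correlations = correlate_with_abilities(study_scores, participants)
```

In this invented data set the interface scores rise with rhythm and fall with pitch, so the rhythm coefficient is strongly positive and the pitch coefficient strongly negative. With real data, sample size and significance would also need to be considered before drawing conclusions.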

MAT scores could also be useful when a number of different styles of auditory interface are available to a user. For the sake of argument, assume a user has a strong aptitude for pitch. Suppose also that there are three new interfaces which have been thoroughly tested with a number of users, and that they have been shown to be most effective with users who have strong rhythm, pitch and dynamics abilities respectively. The second of these interfaces is then likely to be the most effective for the hypothetical user with strong pitch.
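
The selection in this scenario could be expressed as a simple lookup, sketched below. The interface names and the mapping from each interface to the ability it favours are hypothetical, and choosing by the user's single highest relevant quotient is just one possible rule.

```python
from typing import Dict


def recommend_interface(user_quotients: Dict[str, float],
                        favoured_ability: Dict[str, str]) -> str:
    """Pick the interface whose favoured ability is the user's strongest.

    favoured_ability maps an interface name to the MAT ability with which
    that interface was found to work best.
    """
    return max(favoured_ability,
               key=lambda name: user_quotients[favoured_ability[name]])


# Usage: three hypothetical interfaces and a user with strong pitch.
interfaces = {"first": "rhythm", "second": "pitch", "third": "dynamics"}
user = {"rhythm": 0.9, "pitch": 1.4, "dynamics": 1.0}
choice = recommend_interface(user, interfaces)
```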


The MAT tests as outlined above have been developed to provide a standard, effective way of determining some of the sound capabilities of individuals. These tests have been evaluated during a 30-user pilot study and have been shown to be usable. However, further improvements can be made, such as shortening the overall test and reducing the number of 'extreme' questions (i.e. those deemed far too easy or far too difficult).

Ideally, a larger user study is required before the suite of tests can be released as a 'standard testing procedure for auditory interface research'. This requires a large amount of assistance and feedback from the auditory research community in general. Any suggestions, offers of help (either in participating in a future study or organising a regional contribution to a larger study) or any other assistance are welcome. Please feel free to contact Alistair Edwards in this regard.