Computer Systems Research Institute
University of Toronto
Toronto, Ontario, Canada
M5S 1A4
Following an age-old technique, the point of departure for much
recent work has been to attempt to impose some structure on the problem
domain. Perhaps the most significant difference between this work and earlier
efforts is the weight placed on considerations falling outside the scope
of conventional computer science. The traditional problem-reduction paradigm
is being replaced by a holistic approach which views the problem as an
integration of issues from computer science, electrical engineering, industrial
design, cognitive psychology, psychophysics, linguistics, and kinesthetics.
In the main body of this paper, we examine some of the taxonomies which
have been proposed and illustrate how they can serve as useful structures
for relating studies in user interface problems. In so doing, we attempt
to augment the power of these structures by developing their ability to
take into account the effect of gestural and positional factors on the overall effectiveness of the user interface.
One of the benefits of such a taxonomy is that it can serve as the
basis for systems analysis in the design process. It also helps us categorize
various user interface studies so as to avoid "apples and bananas" type
of comparisons. For example, the studies of Ledgard, Whiteside, Singer
and Seymour (1980) and Barnard, Hammond, Morton and Long (1981) both address
issues at the syntactic level. They can, therefore, be compared (which is quite interesting since they give highly contradictory results) [2]. On
the other hand, by recognizing the "keystroke" model of Card, Moran and
Newell (1980b) as addressing the lexical level, we have a good way of understanding
its limitations and comparing it to related studies (such as Embley, Lan,
Leinbaugh and Nagy, 1978), or relating it to studies which address different
levels (such as the two studies in syntax mentioned above).
While the taxonomy presented by Foley and Van Dam has proven to be a useful tool, in our opinion it has one major shortcoming: the grain of the lexical level is too coarse to permit the full benefit of the model to be derived. As defined, the authors lump together issues as diverse as how tokens are spelt (for example "add" vs "append" vs "a" vs some graphical icon) and how tokens are physically articulated (the gestures and devices used to enter them).
These issues are sufficiently different to warrant separate treatment.
Grouping them under a single heading risks generating confusion comparable to that which would result if no distinction were made between
the semantic and syntactic levels. Therefore, taking our cue from work
in language understanding research in the AI community, we chose to subdivide
Foley and Van Dam's lexical level into the following two components:
- lexical: issues of how tokens are spelt, that is, how they are composed from the available symbols
- pragmatic: issues of gesture, space and devices, that is, how tokens are physically articulated
To illustrate the distinction, in the Keystroke model the number
of key pushes would be a function of the lexical structure while the homing
time and pointing time would be a function of pragmatics.
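As a rough illustration of this split (our own sketch, not drawn from the original papers), the Keystroke-Level Model sums per-operator times; the approximate operator values below are those published by Card, Moran and Newell. Note how the keystroke count depends on how a token is spelt (lexical), while homing and pointing depend on the devices and gestures used (pragmatic).

    # Sketch: approximate Keystroke-Level Model operator times in seconds,
    # as published by Card, Moran and Newell (1980b).
    K = 0.28  # one keystroke, average typist
    P = 1.10  # point at a target with a pointing device
    H = 0.40  # home the hand between keyboard and pointing device

    def command_time(keystrokes, points=0, homes=0):
        """Estimated execution time for one command (mental operators omitted)."""
        return keystrokes * K + points * P + homes * H

    print(command_time(keystrokes=6))          # "append" typed in full: ~1.7 s
    print(command_time(keystrokes=1))          # "a": ~0.3 s
    print(command_time(0, points=1, homes=1))  # icon picked by pointing: ~1.5 s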
Factoring out these two levels helps us focus on the fact that the issues affecting each are different, as is their influence on the overall effect of the user interface. This is illustrated in examples which are presented later in this paper.
It should be pointed out that our isolation of what we have called pragmatic issues is not especially original. We see a similar view in the Command Language Grammar of Moran (1981), which is the second main taxonomy which we present. Moran represents the domain of the user interface in terms of three components, each of which is sub-divided into two levels. These are as follows:
- conceptual component: task level and semantic level
- communication component: syntactic level and interaction level
- physical component: spatial level and device level
The interaction level relates the user's physical actions to the
conventions of the interactions in the dialogue. The spatial level then
encompasses issues related to how information is laid out on the display,
while the device level covers issues such as what types of devices are
used and their properties (for example, the effect on user performance
if the locator used is a mouse vs an isometric joystick vs step-keys).
(A representative discussion of such issues can be found in Card, English
and Burr, 1978).
One subtle but important emphasis in Moran's paper is on the point that
it is the effect of the user interface as a whole (that is, all levels
combined) which constitutes the user's model. The other main difference
of his taxonomy, when compared to that of Foley and Van Dam, is his emphasis
on the importance of the physical component. A shortcoming, however, lies
in the absence of a slot which encapsulates the lexical level as we have
defined it above. Like the lexical level (as defined by Foley and Van Dam),
the interaction level of Moran appears a little too broad in scope when
compared to the other levels in the taxonomy.
While the paper makes some important points, it has a serious defect
in that it does not point out the limitations of the technique. The approach
does tell us something about the cognitive burden involved in the learning
of a query language. But it does not tell us everything. In particular,
the technique is totally incapable of taking into account the effect that the means and medium of doing something have on our ability to remember how to do it. To paraphrase McLuhan, the medium does affect the message.
Issues of syntax are not independent of pragmatics, but pencil-and-paper tests cannot take such dependencies into account. For example, consider the role of "muscle memory" in recalling how to perform various tasks. The strength of its influence can be seen in my ability to type quite effectively, even though I am incapable of telling you where the various characters are on my QWERTY keyboard, or in my ability to open a lock whose combination I cannot recite. Yet, this effect will never show up in a pencil-and-paper test. Another example is seen in the technique's inability to take into account the contribution that appropriate feedback and help mechanisms can provide in developing mnemonics and other memory and learning aids.
We are not trying to claim that such pencil-and-paper tests are not
of use (although Barnard et al., 1981, point out some important dangers
in using such techniques). We are simply trying to illustrate some of their
limitations, and demonstrate that lack of adequate emphasis on pragmatics
can result in readers (and authors) drawing false or misleading conclusions
from their work. Furthermore, we conjecture that if pragmatics were isolated
as a separate level in a taxonomy such as that of Foley and Van Dam, they
would be less likely to be ignored.
The basis of the technique is that the complexity of the grammar
is a good metric for the cognitive burden of learning and using the system.
Grammar complexity is measured in terms of number of productions and production
length. There is a problem, however, which limits our ability to reap the
full benefits of the technique. This has to do with the technique's current
inability to take into account what we call chunking. By this we mean the
phenomenon where two or more actions fuse together into a single gesture
(in a manner analogous to the formation of a compound word in language).
In many cases, the cognitive burden of the resulting aggregate may be the
equivalent of a single token. In terms of formal language theory, a non-terminal
when effected by an appropriate compound gesture may carry the cognitive
burden of a single terminal.
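To make the metric and its blind spot concrete, consider the following small sketch (ours, not from the cited work): a grammar is scored by its number of productions plus the summed length of their right-hand sides, and the comment marks where chunking should intervene.

    # Hypothetical sketch of a grammar-complexity metric.
    grammar = {
        "command": [["verb", "object"]],
        "verb":    [["'add'"], ["'delete'"]],
        "object":  [["'point'"], ["'line'"]],
    }

    def complexity(grammar):
        productions = [rhs for alts in grammar.values() for rhs in alts]
        # A compound gesture that fuses several actions (chunking) should
        # arguably be counted as a single terminal, lowering this score.
        return len(productions) + sum(len(rhs) for rhs in productions)

    print(complexity(grammar))  # 5 productions + 6 symbols = 11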
Such chunking may be either sequential, parallel or both. Sequentially, it should be recognized that some actions have different degrees of closure than others. For example, take two events, each of which is to be triggered by the change of state of a switch. If a foot-switch similar to the high/low beam switch in some cars is used, the down action of a down/up gesture triggers each event. The point to note is that there is no kinesthetic connection between the gesture that triggers one event and that which triggers the other. Each action is complete in itself and, as with driving a car, the operator is free to initiate other actions before changing the state of the switch again.
On the other hand, the same binary function could be controlled by a foot pedal which functions like the sustain pedal of a piano. In this case, one state change occurs on depression, a second on release. Here, the point to recognize is that the second action is a direct consequent of its predecessor. The syntax is implicit, and the cognitive burden of remembering what to do after the first action is minimal.
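The two behaviours can be sketched in a few lines (the handler names are our own invention): the beam-style switch fires one complete event on each down stroke, while the sustain-pedal style pairs a state change on depression with a second on release.

    # Hypothetical sketch: two foot-switch styles for the same binary function.
    class BeamSwitch:
        """High/low-beam style: each down stroke triggers one complete event."""
        def __init__(self, on_toggle):
            self.on_toggle = on_toggle
        def pedal_down(self):
            self.on_toggle()   # action complete in itself
        def pedal_up(self):
            pass               # release is kinesthetically unconnected

    class SustainPedal:
        """Piano-sustain style: press and release are two halves of one gesture."""
        def __init__(self, on_press, on_release):
            self.on_press, self.on_release = on_press, on_release
        def pedal_down(self):
            self.on_press()
        def pedal_up(self):
            self.on_release()  # the second action follows implicitly from the first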
There are many cases where this type of kinesthetic connectivity can be bound to a sequence of tokens which are logically connected. One example given by Buxton (1982) is in selecting an item from a graphics menu and "dragging" it into position in a work space. A button-down action (while pointing at an item) "picks it up". For as long as the button is depressed, the item tracks the motion of the pointing device. When the button is released, the item is anchored in its current position. Hence, the interface is designed to force the user to follow proper syntax: select then position. There is no possibility for syntactic error, and cognitive resources are not consumed in trying to remember "what do I do next?". Thus, by recognizing and exploiting such cases, interfaces can be constructed which are "natural" and easy to learn.
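The way the button state alone sequences this dialogue can be sketched as follows (a minimal sketch with invented names; the original system's code is not given in the paper):

    # Hypothetical sketch of the pick/drag/anchor interaction described above.
    class Item:
        def move_to(self, x, y):
            self.x, self.y = x, y

    class DragController:
        def __init__(self):
            self.held = None                # item currently "picked up", if any

        def button_down(self, item_at_cursor):
            self.held = item_at_cursor      # select: button-down picks it up

        def cursor_moved(self, x, y):
            if self.held is not None:       # position: item tracks the pointer
                self.held.move_to(x, y)

        def button_up(self):
            self.held = None                # release: item anchored where it lies

Because positioning is only possible while an item is held, the select-then-position syntax is enforced by the gesture itself.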
There is a similar type of chunking which can take place when two or more gestures are articulated at one time. Again we can take an example from driving a car, where in changing gears the actions on the clutch, accelerator and gear-shift reinforce one another and are coordinated into a single gesture. Choosing appropriate gestures for such coordinated actions can accelerate their bonding into what the user thinks of as a single act, thereby freeing up cognitive resources to be applied to more important tasks. What we are arguing here is that by matching appropriate gestures with tasks, we can help render complex skills routine and gain benefits similar to those seen at a different level in Card, Moran and Newell (1980a).
In summary, there are three main points which we wish to make with this example:
From the application programmer's perspective, this is a valuable
feature. However, for the purposes of specifying systems from the user's
point of view, these abstractions are of very limited benefit. As Baecker
(1980b) has pointed out, the effectiveness of a particular user interface
is often due to the use of a particular device, and that effectiveness would be lost if that device were replaced by some other of the same logical
class. For example, we have a system (Fedorkow, Buxton & Smith, 1978)
whose interface depends on the simultaneous manipulation of four joysticks.
Now in spite of tablets and joysticks both being "locator" devices, it
is clear that they are not interchangeable in this situation. We cannot
simultaneously manipulate four tablets. Thus, for the full potential of
device independence to be realized, such pragmatic considerations must
be incorporated into our overall specification model so that appropriate
equivalencies can be determined in a methodological way. (That is, in specifying
a generic device, we must also include the required pragmatic attributes.
But to do so, we must develop a taxonomy of such attributes, just as we
have developed a taxonomy of virtual devices.)
Figure 1: Taxonomy of Input Devices.
Continuous manual input devices are categorized. The first order categorization is property sensed (rows) and number of dimensions (columns). Subrows distinguish between devices that have a mechanical intermediary (such as a stylus) between the hand and the sensing mechanism (indicated by "M"), and those which are touch sensitive (indicated by "T"). Subcolumns distinguish devices that use comparable motor control for their operation.
To begin with, the tableau deals only with continuous hand controlled
devices. (Pedals, for example, are not included for simplicity's sake.)
Therefore the first (but implicit) questions in our structure are:
- is the device continuous or discrete?
- is it operated by the hand?
Note that the primary rows and columns of the matrix are subdivided,
as indicated by the dotted lines. The sub-columns exist to isolate devices
whose control motion is roughly similar. These groupings can be seen in
examining the two-dimensional devices. Here the tableau implies that tablets
and mice utilize similar types of hand control and that this control is
different from that shared in using a light-pen or touch-screen. Furthermore,
it is shown that joysticks and trackballs share a common control motion which is, in turn, different from that shared by the other subclasses of two-dimensional devices.
The rows for position and motion sensing devices are subdivided in order to differentiate between transducers which sense via mechanical vs touch-sensitive means. Thus, we see that the light-pen and touch-screen are closely related, except that the light-pen employs a mechanical transducer. Similarly, we see that the trackball and the TASA touch-pad [3] provide comparable signals from comparable gestures (the 4" by 4" dimensions of the TASA device compare to a 3 1/2" diameter trackball).
The tableau is useful for many purposes by virtue of the structure which it imposes on the domain of input devices. First, it helps in finding appropriate equivalencies. This is important in terms of dealing with some of the problems which arose in our discussion of device independence. For example, we saw a case where four tablets would not be suitable for replacing four joysticks. By using the tableau, we see that four trackballs will probably do.
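The equivalency lookup which the tableau supports can be sketched as a simple data structure. The device names and attributes below follow the discussion above; the encoding and the control-motion group labels are our own illustration.

    # Hypothetical encoding of part of the tableau. Rows: property sensed;
    # columns: dimensions; sub-columns: devices sharing similar control motion.
    DEVICES = {
        # name:          (property sensed, dimensions, control-motion group)
        "tablet":        ("position", 2, "desk-surface"),
        "mouse":         ("motion",   2, "desk-surface"),
        "light-pen":     ("position", 2, "on-screen"),
        "touch-screen":  ("position", 2, "on-screen"),
        "joystick":      ("position", 2, "fixed-base"),
        "trackball":     ("motion",   2, "fixed-base"),
    }

    def substitutes(device):
        """Plausible replacements: same dimensionality and similar control motion."""
        _, dims, group = DEVICES[device]
        return [name for name, (_, d, g) in DEVICES.items()
                if (d, g) == (dims, group) and name != device]

    print(substitutes("joystick"))  # ['trackball']: four trackballs will probably do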
The tableau makes it easy to relate different devices in terms of metaphor.
For example, a tablet is to a mouse what a joystick is to a trackball.
Furthermore, if the taxonomy defined by the tableau can suggest new transducers
in a manner analogous to the periodic table of Mendeleev predicting new
elements, then we can have more confidence in its underlying premises.
We make this claim for the tableau and cite the "torque sensing" one-dimensional
pressure-sensitive transducer as an example. To our knowledge, no such
device exists commercially. Nevertheless, it is a potentially useful device, an approximation of which has been demonstrated by Herot and Weinzapfel (1978).
Finally, the tableau is useful in helping quantify the generality of various physical devices. In cases where the work station is limited to one or two input devices, then it is often in the user's interest to choose the least constraining devices. For this reason, many people claim that tablets are the preferred device since they can emulate many of the other transducers (as is demonstrated by Evans, Tanner and Wein, 1981). The tableau is useful in determining the degree of this generality by "filling in" the squares which can be adequately covered by the tablet.
Before leaving the topic of the tableau, it is worth commenting on why a primary criterion for grouping devices was whether they were sensitive to position, motion or pressure. The reason is that what is sensed has a very strong effect on the nature of the dialogues that the system can support with any degree of fluency. As an example, let us compare how the user interface of an instrumentation console can be affected by the choice between motion and position sensitive transducers. For such consoles, one design philosophy follows the traditional model that for every function there should be a device. One of the rationales behind this approach is to avoid the use of "modes" which result when a single device must serve more than one function. Another philosophy takes the point of view that the number of devices required in a console need only be on the order of the control bandwidth of the human operator. Here, the rationale is that careful design can minimize the "mode" problem, and that the resulting simpler consoles are more cost-effective and less prone to breakdown (since they have fewer devices).
One consequence of the second philosophy is that the same transducer must be made to control different functions, or parameters, at different times. This context switching introduces something known as the nulling problem. The point which we are going to make is that this problem can be completely avoided if the transducer in question is motion rather than position sensitive. Let us see why.
Imagine that you have a sliding potentiometer which controls parameter A. Both the potentiometer and the parameter are at their minimum values. You then raise A to its maximum value by pushing up the position of the potentiometer's handle. You now want to change the value of parameter B. Before you can do so using the same potentiometer, the handle of the potentiometer must be repositioned to a position corresponding to the current value of parameter B. The necessity of having to perform this normalizing function is the nulling problem.
Contrast the difficulty of performing the above interaction using a
position-sensitive device with the ease of doing so using one which senses
motion. If a thumb-wheel or a treadmill-like device was used, the moment
that the transducer is connected to the parameter it can be used to "push"
the value up or "pull" it down. Furthermore, the same transducer can be
used to simultaneously change the value of a group of parameters, all of
whose instantaneous values are different.
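The contrast can be made concrete in a few lines (our own illustrative sketch): an absolute binding requires the handle to be nulled to the new parameter's value before it is safe to reattach, whereas a relative binding consumes only increments and can be reattached to any parameter at any time.

    # Hypothetical sketch of the nulling problem.
    params = {"A": 1.0, "B": 0.7}    # current values of two parameters

    def absolute_update(param, handle_position):
        # Position sensing: the parameter jumps to wherever the handle sits,
        # so after switching from A to B the handle must first be re-positioned
        # ("nulled") to B's current value to avoid a discontinuous jump.
        params[param] = handle_position

    def relative_update(param, delta):
        # Motion sensing: only the change is applied, so no nulling is needed
        # when the transducer is reattached to a different parameter.
        params[param] += delta

    relative_update("B", +0.1)       # push B from 0.7 to 0.8 immediately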
The example is not isolated. In fact, just as strong an argument
could be made for adopting a model based on a vertical structure as the
horizontal ones which we have discussed. Models based on interaction techniques
such as those described in Martin (1973) and Foley, Wallace and Chan (1981)
are examples. With them, the primary gestalt is the transaction, or interaction.
The user model is described in terms of the set and style of the interactions
which take place over time. Syntactic, lexical and pragmatic questions
become sub-issues.
Neither the horizontal nor the vertical view is "correct". The point is that both must be kept in mind during the design process. A major challenge is to adapt our models so that this is done in a well-structured way. That we still have problems in doing so can be seen in Moran's taxonomy: much of the difficulty in understanding his model stems from the problems of integrating vertically oriented concepts (the interaction level) into an otherwise horizontal structure.
In spite of such difficulties, both views must be considered. This is
an important cautionary bell to ring given the current trend towards delegating design responsibilities according to horizontal stratification. The design
of a system's data-base, for example, has a very strong effect on the semantics
of the interactions that can be supported. If the computing environment
is selected by one person, the data-base managed by another, the semantics
or functional capability by another, and the "user interface" by yet another,
there is an inherent danger that the decisions of one will adversely affect
another. This is not to say that such an organizational structure cannot
work. It is just imperative that we be aware of the pitfalls so that they
can be avoided. Decisions made at all levels affect one another and all
decisions potentially have an effect on the user model.
The work reported has made some contribution towards an understanding
of the effect of issues which we have called pragmatics. It is, however,
a very small step. While there is a great deal of work still to be done
right at the device level, perhaps the biggest challenge is to develop
a better understanding of the interplay among the different levels in the
strata of a system. When we have developed a methodology which allows us
to determine the gesture that best suits the expression of a particular
concept, then we will be able to build the user interfaces which today
are only a dream.