Evolution of a Reactive Environment
Jeremy R. Cooperstock
Dept. of Electrical & Computer Engineering
University of Toronto
Toronto, Ontario M5S 1A4
+1-416-978-6619
jer@dgp.toronto.edu

Koichiro Tanikoshi*, Garry Beirne, Tracy Narine, William Buxton†
Computer Systems Research Institute
University of Toronto
Toronto, Ontario M5S 1A4
+1-416-978-0778
{tanikosi, garry, tracyn, buxton}@dgp.toronto.edu

*Author is visiting from Hitachi Research Laboratory, Hitachi Ltd., Japan.
†Author is Principal Scientist, User Interface Research, Alias Research Inc., Toronto, Ontario.
ABSTRACT
A basic tenet of "ubiquitous computing" (Weiser, 1993 [13]) is that technology should be distributed throughout the environment (ubiquitous), yet invisible, or transparent. In practice, resolving the seeming paradox arising from the joint demands of ubiquity and transparency is far from simple. This paper documents a case study of attempting to do just that. We describe our experience in developing a working conference room equipped to support a broad class of meetings and media. After laying the groundwork and establishing the context in the Introduction, we describe the evolution of the room. Throughout, we attempt to document the rationale and motivation. While derived from a limited domain, we believe that the issues that arise are of general importance and have strong implications for future research.

KEYWORDS: case studies, CSCW, intelligent systems, reactive environments, home automation, design rationale, office applications
After laying the groundwork and establishing the context, we describe the evolution of the room in more-or-less chronological order. We trace the development of the room from manual control, to manually-driven computer control, to context-sensitive reactive automation -- all the while striving towards the goal of simultaneous ubiquity and invisibility. What we present is not a simple "show and tell" story. Throughout, we attempt to document the rationale and motivation. Since these derive from the observations of the specifics of use, our story is somewhat bound in the details of the application and technology of the case study. While derived from a limited domain, we believe that the issues that arise are of general importance, and have strong implications for future research.
In contrast to work such as Colab (Stefik, Foster, Bobrow, Kahn, Lanning & Suchman, 1987 [10]), our research has focused mainly on supporting social transactions centered on the offices of individuals, rather than meeting rooms. Increasingly, however, we have been working towards providing an integrated foundation to support both meeting-room and office-based collaborative work.
FIGURE 1. Conference Room Equipment. Electronic attendees are given a choice of locations appropriate to the variety of social roles in meeting scenarios.
In the process, we were strongly influenced by the emerging ideas
of ubiquitous computing and augmented reality. In particular, we were interested
in exploring their affordances in preserving the fidelity of conventional
social distance/place/function relationships. An early example of this
was the use of video "surrogates" employed in our four-way round-the-table
video conferencing system, Hydra (Sellen, Buxton & Arnott, 1992 [9]).
The driving function here is the notion that for each location in architectural
space for which there is a distinct social function, the affordances should
be provided to enable that function to be undertaken from that location
by any party, be they attending physically or electronically.
We will see examples of this in practice later in the paper. For our purposes now, note the physical distribution of technology that this implies; hence, our interest in the Ubicomp approach. The inevitable problem that arises, however, is as follows: once the equipment implied by this approach is deployed, how can its use and potential ever be accessed within the user community's "threshold of frustration"? To a large extent, the rest of this paper documents our efforts to answer this question. In our attempts, we learned a lot about what worked and what did not, as well as methods of testing ideas and designs.
Even if all of these issues are resolved for the local audience, what does the remote person see, and how can one be sure that this is what the presenter intended? These, and a myriad of related problems, confront the baffled user. We have not even addressed the basic issue of how the user turned on all of the equipment in the room in the first place. Where are all the switches and controls?
While our usage studies indicated that we were trying to incorporate the correct functionality and deploying the components in more-or-less the right locations, our work had not really begun. Regardless of the tremendous potential existing in the room, if the complexity of its use was above the threshold of the typical user, the functionality simply did not exist in any practical sense.
In this section we describe the design motivation behind each iteration, discuss the solution taken, and evaluate the results. It should be noted that our evaluation was informal, based on personal experiences and anecdotal evidence.
Through "breakdowns" in meetings conducted with this implementation, we realized that modifications were required. For example, due to the placement of the video surrogate at the front of the room, remote attendees often spent the whole meeting watching the back of the presenter's head. At the same time, local attendees were distracted from the presenter due to the inappropriate location of the remote participant(s) at the front of the room, in the speaker's space. (Note that this situation is the norm in an embarrassingly large number of videoconferencing rooms.) It was clear that different locations of video surrogates were needed for the different social roles of meeting attendees.
At this stage, the user interface consisted of the set of physical connections between devices themselves. This meant that in order for a presenter to realize a goal, such as "record my presentation," it was first necessary to determine which devices to activate, and then make the appropriate connections between them. Figure 2 depicts the user interaction with various devices. The cognitive effort required by the user in order to achieve the high-level goal through direct device manipulation is considerable.
FIGURE 2. Complexity of First Iteration Interface. The inter-device lines represent physical patchbay connections, which the user was required to make.
FIGURE 3. Matrix-based interface for controlling equipment (virtual graphical patchbay).
Each row corresponds to a source device (e.g., camera, VCR output) and each column to a destination (e.g., visitor view of room, video monitor). By clicking the mouse on entry (i, j), an audio or video switch would make a connection between source i and destination j. This resulted in considerable time savings, because the user could now establish connections through a graphical user interface rather than physical wire. However, as depicted in Figure 4, since the user was still responsible for all device connections, the cognitive effort remained high.
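The behaviour of this matrix interface can be modelled as a simple crosspoint router. The following is a minimal sketch, not the original implementation; the class and device names are illustrative assumptions.

```python
# Sketch of the matrix-based "virtual patchbay": any source row can be
# routed to any destination column, with one source per destination,
# as in a crosspoint audio/video switch. All names are hypothetical.

class VirtualPatchbay:
    def __init__(self, sources, destinations):
        self.sources = list(sources)
        self.destinations = list(destinations)
        self.routes = {}  # destination -> currently routed source

    def connect(self, source, destination):
        # Clicking entry (i, j) routes source i to destination j;
        # a destination can display only one source at a time.
        if source not in self.sources or destination not in self.destinations:
            raise ValueError("unknown device")
        self.routes[destination] = source

    def source_for(self, destination):
        return self.routes.get(destination)

patchbay = VirtualPatchbay(
    sources=["camera", "vcr-out", "doc-camera"],
    destinations=["visitor-view", "monitor-5"],
)
patchbay.connect("camera", "visitor-view")
patchbay.connect("doc-camera", "monitor-5")
```

Note that the cognitive load the paper describes survives this model: the user must still know every (source, destination) pair a goal requires.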
FIGURE 4. Complexity of Second Iteration Interface. The solid lines represent user interaction and the dashed lines represent tasks performed by the user interface. Note that the user is still responsible for inter-device connections, now made through the graphical user interface.
FIGURE 5. Presets Menu (DAN). As shown, the Hi-8 video is currently being viewed and the user is considering the selection of the desktop video deck instead.
FIGURE 6. Complexity of Third Iteration Interface, using presets. Now, the user can ignore details of device representation and location. However, presets can be confusing, especially when there is more than one way to accomplish a subgoal.
At this stage, our work was addressing the problems of control at
essentially the same level as many commercial room control systems, such
as ADCOM's iVue (ADCOM Electronics Inc., 1994 [1]) and AMX's AXCESS systems
(AMX Corporation, 1993 [2]).
To illustrate by example, suppose we wish to view a remote participant on monitor 5, and provide this surrogate with the output of our document camera. Pressing the button associated with the surrogate and the button associated with monitor 5 would establish the first connection. The second connection would be formed by pressing the surrogate button and the document camera button. Since the computer knows that monitor 5 is an output-only device and the document camera is input-only, there is no ambiguity as to which connections are intended.
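The disambiguation rule in this example can be sketched as a small function. This is an illustrative reconstruction under stated assumptions, not the system's actual code; device names and the capability table are hypothetical.

```python
# Sketch of the two-button connection logic: given two device button
# presses, infer which device is the source and which the destination
# from each device's I/O capability (names are illustrative).

DEVICE_DIRECTION = {
    "monitor-5": "output-only",    # can only display a signal
    "doc-camera": "input-only",    # can only produce a signal
    "surrogate": "bidirectional",  # remote participant: sends and receives
}

def infer_connection(device_a, device_b):
    """Return (source, destination), or raise if the pairing is ambiguous."""
    dir_a = DEVICE_DIRECTION[device_a]
    dir_b = DEVICE_DIRECTION[device_b]
    if dir_a == "input-only" or dir_b == "output-only":
        return (device_a, device_b)
    if dir_b == "input-only" or dir_a == "output-only":
        return (device_b, device_a)
    raise ValueError("ambiguous: both devices are bidirectional")
```

Under this rule, pairing the surrogate with monitor 5 yields a surrogate-to-monitor connection, while pairing the surrogate with the document camera yields a camera-to-surrogate connection, matching the example above.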
This implementation partly addresses the diagnostics problem of previous iterations through the use of different light states. While the system is working to effect a change, the flashing light indicates to the user that the action is being performed. If a light continues to flash long after a connection has been attempted, a problem exists at a lower level of the system. Obviously, it would benefit us to add diagnostics at these levels as well.
A possible disadvantage of these modules is that they require the user to walk around the room in order to make connections. As an enhancement to this approach, we envision using a laser pointer to point to sources and "drag" them to their destination devices (see Figure 7). As a simple example, one could point to a VCR to select it as input, then drag it to one of the monitors for output. Most of the standard connections necessary during presentations could be accomplished in this manner. In order to provide this capability, we will be installing two calibrated laser detectors to cover the front and back of the conference room. This pointer-based connection process, shown in Figure 7, could provide efficient device selection without the need for the presenter to change location.
FIGURE 7. Conference Room in use. The speaker is using a laser pointer to select a camera view for the remote visitor.
While the buttons and lights modules offer a substantial gain in simplicity, they cannot adequately replace the high-level control of presets provided in the previous iteration. Users may be reluctant to press five buttons (or point to three devices) in order to play a video tape to local and remote conference participants, when a single preset selection would suffice.
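A preset can be thought of as a named bundle of the individual connections a high-level goal requires. The sketch below illustrates that idea; the preset and device names are assumptions, not those of the actual system.

```python
# Sketch: a preset bundles the several connections needed for one
# high-level goal (e.g. "play tape to everyone") into a single
# selection. All names are hypothetical.

PRESETS = {
    "play-tape": [
        ("vcr-out", "front-monitor"),    # local audience sees the tape
        ("vcr-out", "visitor-view"),     # remote participants see it too
        ("vcr-audio", "room-speakers"),  # tape audio into the room
    ],
}

def apply_preset(name, connect):
    """Apply every (source, destination) pair in the named preset
    via the supplied connect(source, destination) callback."""
    for source, destination in PRESETS[name]:
        connect(source, destination)

made = []
apply_preset("play-tape", lambda s, d: made.append((s, d)))
```

One preset selection here replaces three individual button-pair operations, which is exactly the trade-off discussed above: fewer actions, but an extra layer whose contents the user can no longer see directly.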
To provide a mechanism for such behaviour, the integration of sensors with various devices was required. The output of these sensors allows the computer to determine when certain actions should be taken by the environment, or, in other words, how the environment should react. We call this resulting system a reactive environment. Our reactive environment consists of a set of tools, each of which reacts to user-initiated events. For each tool incorporated into our environment, we must keep in mind the following issues of invisibility, seamlessness, and diagnostics:
FIGURE 8. The Xerox PARC Tab.
Through various sensors, the room can detect most actions that will precede the use of a remote control, and issue the appropriate commands itself, using the infrared transmitter.
Since the user does not need to interact with the computer, nor manipulate remote controls to turn on or configure equipment appropriately, the tool which performs these tasks is completely invisible. In our prototype environment, manual use of remote control units is unnecessary, except on rare occasions where the user wishes to override normal system behaviour.
Knowledge of whether or not remote participants are involved in a conference is obtained by checking the status of the outside line. VCR functions (e.g., play, record, stop) are monitored by polling the VCR interface for user-initiated commands. When a function is selected, our environment can react by establishing the required connections between video sources and VCR inputs, or video destinations and VCR outputs, as appropriate. From the user's perspective, the interface is invisible, since no explicit action beyond pressing the VCR's play or record button is required.
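The reaction to a polled VCR command can be sketched as follows. This is a minimal illustration of the polling-and-routing idea, assuming hypothetical device names and a simplified connection callback; it is not the original system's code.

```python
# Sketch of the reaction to a user-initiated VCR transport command,
# as obtained by polling the VCR interface. Names are illustrative.

def react_to_vcr(command, remote_line_active, connect):
    """Establish the routes implied by a VCR transport command."""
    if command == "play":
        connect("vcr-out", "front-monitor")   # local audience sees the tape
        if remote_line_active:
            connect("vcr-out", "visitor-view")  # remote participants too
    elif command == "record":
        connect("room-camera", "vcr-in")      # record the presentation
    # "stop" and unrecognized commands require no routing changes

made = []
react_to_vcr("play", remote_line_active=True,
             connect=lambda s, d: made.append((s, d)))
```

The status of the outside line enters the decision exactly as described above: the same play command produces different routings depending on whether remote participants are present.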
Fortunately, selection of the document camera view can be automated easily. Using basic image analysis, we can determine whether or not a document is presently under the camera, and whether or not there is motion under the lens. When either a document or motion is detected, the environment reacts by sending the output from the document camera to the display monitor as well as to any remote participants. If no document is detected over a certain timeout period, then the camera is deselected. Again, the tool is invisible. The simple act of placing a document under the camera is sufficient to activate the device. To provide a mechanism for seamless manual override, we also wanted a method to force the "re-selection" of the document camera. Our solution was very simple. Whenever document motion is detected after a period of inactivity, the document camera is again selected, regardless of its (assumed) current state.
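One common way to implement this kind of detection is simple frame differencing with a quiet-period timeout. The sketch below illustrates that approach under assumed thresholds; the constants and state representation are ours, not the paper's.

```python
# Sketch of the document-camera reaction: frame differencing detects
# motion under the lens; after a quiet timeout the camera is
# deselected. Thresholds and names are illustrative assumptions.

TIMEOUT = 10.0       # seconds of inactivity before deselection
DIFF_THRESHOLD = 12  # mean pixel change that counts as "motion"

def frame_difference(prev, curr):
    """Mean absolute pixel difference between two grayscale frames,
    each given as a flat list of 0-255 intensity values."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def update(state, prev, curr, now):
    """Return the camera selection state after seeing a new frame.
    state is a dict with keys 'selected' and 'last_activity'."""
    if frame_difference(prev, curr) > DIFF_THRESHOLD:
        # Motion always (re-)selects the camera, even if we believe it
        # is already selected -- this doubles as the manual override.
        return {"selected": True, "last_activity": now}
    if state["selected"] and now - state["last_activity"] > TIMEOUT:
        return {"selected": False, "last_activity": state["last_activity"]}
    return state
```

Because selection is re-asserted on every burst of motion rather than toggled, the forced "re-selection" described above falls out of the same rule with no extra mechanism.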
Once again, this selection can be automated trivially with the help of a contact sensor on the light pen. Whenever the pen is held, the environment reacts by selecting the Macintosh display automatically and sending this view to remote conference participants as appropriate.
FIGURE 9. The Digital White Board in use.
We have adopted a more elegant solution, which requires no additional equipment beyond a video camera and monitor on the remote end, yet which allows the remote participant far more control over the received view. We treat the remote monitor as a window through which the local room can be viewed. Applying a head-tracking algorithm to the video signal, we can determine the position of the remote participant's face in relation to his or her monitor. This position is then used to drive a motorized video camera locally. When the remote participant peers to the left or right, the local camera pans accordingly. Similarly, when the remote participant moves closer to or further from the monitor, the local camera zooms in or out.
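The mapping from tracked head position to camera commands can be sketched as a pair of proportional controls. The frame size, gains, and nominal face width below are illustrative assumptions, not measured parameters of the actual system.

```python
# Sketch of the "monitor as window" control: the tracked face position
# in the remote video frame drives the local motorized camera.
# All numeric parameters are illustrative assumptions.

def camera_command(face_x, face_size, frame_width=320,
                   pan_gain=30.0, zoom_gain=2.0):
    """Map a tracked face to (pan_degrees, zoom_factor).

    face_x: horizontal face centre in pixels; face_size: apparent
    face width in pixels (larger means the viewer is closer)."""
    # Peering left or right of centre pans the local camera accordingly.
    offset = (face_x - frame_width / 2) / (frame_width / 2)  # -1 .. 1
    pan = pan_gain * offset
    # Moving closer to the monitor (a larger face) zooms the camera in.
    nominal_face = 40.0
    zoom = max(1.0, zoom_gain * face_size / nominal_face)
    return pan, zoom

pan, zoom = camera_command(face_x=240, face_size=80)
```

A real controller would also smooth the tracker output and rate-limit the camera, since raw head-tracking positions are noisy frame to frame.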
A standard issue, shared by those working on intelligent agents, is how to deal with exceptions. How do different users specify different behaviours, or how can the system adapt to the evolving desires or expectations of the user?
In another direction, if the room is to make informed decisions based on context, can there be an application- independent architecture for accommodating the shared cross-application knowledge base and heuristics according to which such decisions are made?
Ubiquitous computing holds great promise. For the first time, we have a model that does not confront us with the classic strength vs. generality tradeoff. We no longer have to choose between strong-specific systems and weak-general ones. With Ubicomp, we have the promise of both strength and generality by virtue of the combined power of a family of strong-specific systems working in concert. But the risk is that while any member of the family is easy to use due to its specific nature, complexity and cognitive load may remain the same or increase, by virtue of coordination overhead. In this case, load is simply transferred, not off-loaded.
Our case study attempts to solve this problem. By appropriate design, the complexity of coordination can be relegated to the background, away from conscious action. The intent of this exercise is to begin laying the foundation for an existence proof that useful background processing can be carried out by context-sensitive reactive systems. That being the case, our hope is that this work will stimulate research that will make this capability available sooner rather than later.
We also thank Rich Gold, Roy Want, and Norman Adams of Xerox PARC for help with the PARC Tab and Mike Ruicci of CSRI for his outstanding technical support. Special thanks are due to Sidney Fels for the design of the buttons and lights modules, as well as many hours of insightful discussion. Finally, we are greatly indebted to the members of the various research groups who make up the user community of the room. Their patience and feedback have been essential to our work.
This research has been undertaken as part of the Ontario Telepresence
Project. Support has come from the Government of Ontario, the Information
Technology Research Centre of Ontario, the Telecommunications Research
Institute of Ontario, the Natural Sciences and Engineering Research Council
of Canada, Xerox PARC, Bell Canada, Alias Research, Sun Microsystems, Hewlett
Packard, Hitachi Corp., the Arnott Design Group and Adcom Electronics.
This support is gratefully acknowledged.
1. ADCOM Electronics Inc. (1994). iVue, Product Information.
2. AMX Corporation (1993). Advanced Remote Control Systems, Product Information.
3. Bly, S., Harrison, S. & Irwin, S. (1993). Media Spaces: bringing people together in a video, audio and computing environment. Communications of the ACM, 36(1), 28-47.
4. Buxton, W. & Moran, T. (1990) EuroPARC's Integrated Interactive Intermedia Facility (iiif): Early Experience, In S. Gibbs & A.A. Verrijn-Stuart (Eds.). Multi-user interfaces and applications, Proceedings of the IFIP WG 8.4 Conference on Multi-user Interfaces and Applications, Heraklion, Crete. Amsterdam: Elsevier Science Publishers B.V. (North-Holland), 11-34.
5. Elrod, S., Bruce, R., Gold, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pedersen, E., Pier, K., Tang, J. & Welch, B. (1992). Liveboard: A large interactive display supporting group meetings, presentations and remote collaboration, Proceedings of CHI'92, 599-607.
6. Elrod, S., Hall, G., Costanza, R., Dixon, M. & Des Rivieres, J. (1993) Responsive office environments. Communications of the ACM, 36(7), 84-85.
7. Mantei, M., Baecker, R., Sellen, A., Buxton, W., Milligan, T. & Wellman, B. (1991). Experiences in the use of a media space. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 203- 208. Reprinted in D. Marca & G. Bock (Eds.)(1992). Groupware: software for computer-supported collaborative work. Los Alamitos, CA.: IEEE Computer Society Press, 372 - 377.
8. Riesenbach, R. (1994). The Ontario Telepresence Project, CHI'94 Conference Companion, 173-174.
9. Sellen, A., Buxton, W. & Arnott, J. (1992). Using spatial cues to improve videoconferencing. Proceedings of CHI '92, 651-652. Also videotape in CHI '92 Video Proceedings.
10. Stefik, M., Foster, G., Bobrow, D., Kahn, K., Lanning, S. & Suchman, L. (1987). Beyond the chalkboard: Computer support for collaboration and problem solving in meetings. Communications of the ACM, 30(1), 32-47.
11. Vicente, K. & Rasmussen J. (1990). The Ecology of Human-Machine Systems II: Mediating "Direct Perception" in Complex Work Domains, Ecological Psychology, 2(3), 207-249.
12. Want, R., Hopper, A., Falcao, V., & Gibbons, J. (1992). The Active Badge Location System. ACM Transactions on Information Systems, 10(1):91-102.
13. Weiser, M. (1993). Some Computer Science Issues in Ubiquitous Computing, Communications of the ACM, 36(7), 75-83.
14. Wellner, P., Mackay, W. & Gold, R. (Eds.)(1993). Computer-Augmented Environments: Back to the real world. Special Issue of the Communications of the ACM, 36(7).