The Future of Human-Computer Interaction

or

How I learned to stop worrying and love my Intelligent Room

Michael H. Coen, MIT Artificial Intelligence Lab

Predicting the future is notoriously difficult. Suppose 100 years ago someone suggested that every bedroom in the United States would soon have a bell that anyone in the world could ring anytime, day or night. Would you have believed it? Nevertheless, the telephone caught on and has become a technology conspicuous only by its absence.

Now, I find myself in a similar position. Having spent the past four years immersed in the future as part of the MIT AI Lab's Intelligent Room project, I have gained an inkling of what is to come in the next fifty. I have taken to heart the advice of Alan Kay: "the best way to predict the future is to invent it." Of course, this is not a solo performance, and the cast of fellow prognosticators (i.e., researchers and inventors) who work on similarly futuristic environments such as the Intelligent Room has grown markedly. (See the Proceedings of the 1998 AAAI Spring Symposium on Intelligent Environments, for example.1) It is interesting to note that this cast is divided roughly equally between industrial and academic research labs. There are, I think, two reasons for this: (1) intelligent rooms are an incredibly exciting testbed in which to do research (read the rest of this essay to find out why), and (2) there is a lot of money to be made. Intelligent Rooms promise to have the ubiquity of televisions and the upgradability of PCs; you do the math.

My starting – and I think quite uncontroversial – premise for my research is that computers are not particularly useful. Of course, scientists and engineers adore them, as do a growing cadre of web surfers, e-mailers, and online socialites. And certainly, computation is a fundamental component of our society's technical, financial, and industrial infrastructures; machines are outstanding data-processing slaves. But in terms of raising our quality of life, computers have a long road to travel before becoming as essential as the lightbulb, indoor plumbing, or the pillow.

The reason is obvious. Computers are generally used for things that are computational, such as reading e-mail, and most of us spend the majority of our lives doing non-computational things, such as taking baths and eating dinner. Most people spend their time in the real world, not in the much-ballyhooed realms of cyberspace, and as a description of the current utility of computation, I propose the following observation: the value of a computer decreases with the square of your distance from its monitor. Consequently, computation will not become sociologically essential until computers are connected to the human-level events going on in the real world around them – until they are ingrained in and conversant with our ordinary state of affairs. Borrowing once more from Kay, "the computer revolution hasn't happened yet."

In the Intelligent Room project, we are interested in creating spaces in which computation is seamlessly used to enhance ordinary, everyday activities. We want to incorporate computers into the real world by embedding them in regular environments, such as homes and offices, and to allow people to interact with them the way they do with other people. The user interfaces of these systems are not menus, mice, and keyboards but instead gesture, speech, affect, context, and movement. Their applications are not word processors and spreadsheets, but smart homes and personal assistants. Instead of making computer-interfaces for people, it is of more fundamental value to make people-interfaces for computers. The need for doing this is not new; it has simply become more urgent:

"The great creators of technics [i.e., technology], among which you are one of the most successful, have put mankind into a perfectly new situation, to which it has as yet not at all adapted itself."

            Albert Einstein in a tribute to Thomas Edison, October 21, 1929.

It is time for technology to start adapting to us. This may sound trite, but only because it is so obviously true.

Sounds Great, but How?

We have built two Intelligent Rooms in our laboratory, where our approach has been to give the rooms cameras for eyes and microphones for ears to make accessible the real-world phenomena occurring within them. A multitude of computer vision and speech understanding systems then help interpret human-level phenomena, such as what people are saying, where they are standing, etc. By embedding user interfaces this way, the fact that people, for example, tend to point at what they are speaking about is no longer meaningless from a computational viewpoint, and we can build (and have built) systems that make use of this information.
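To make the pointing example concrete, here is a minimal, purely illustrative sketch of how an utterance like "turn that on" might be resolved against a pointing gesture. Everything here is hypothetical – the object map, coordinates, function names, and the 15-degree tolerance are invented for illustration, not the Room's actual interface:

```python
import math

# Hypothetical object map: names of devices in the room and their
# (x, y) floor positions, in meters. Purely illustrative values.
ROOM_OBJECTS = {
    "desk lamp": (4.0, 1.0),
    "projector": (0.0, 3.0),
    "stereo": (4.2, 3.9),
}

def resolve_pointing(origin, direction, objects=ROOM_OBJECTS):
    """Return the object whose bearing from the user best matches the
    pointing direction (in radians), or None if nothing is close enough."""
    best, best_err = None, math.radians(15)  # assumed tolerance: 15 degrees
    for name, (x, y) in objects.items():
        bearing = math.atan2(y - origin[1], x - origin[0])
        # Smallest angular difference between bearing and pointing ray.
        err = abs(math.atan2(math.sin(bearing - direction),
                             math.cos(bearing - direction)))
        if err < best_err:
            best, best_err = name, err
    return best

# A vision system (assumed) reports the user at (1, 1), pointing along
# the +x axis, i.e., roughly toward the lamp.
target = resolve_pointing(origin=(1.0, 1.0), direction=0.0)
# -> "desk lamp"
```

The point of the sketch is only that once a tracker supplies a position and a pointing vector, the deictic "that" in a spoken command becomes computationally meaningful.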

Coupled with their natural interfaces is the expectation that these systems are not only highly interactive—they talk back when spoken to—but more importantly, that they are useful during ordinary activities. They enable tasks historically outside the normal range of human-computer interaction by connecting computers to phenomena (such as someone sneezing or walking into a room) that have traditionally been outside the purview of contemporary user-interfaces. Thus, in the future, you can imagine that elderly people's homes would call an ambulance if they saw anyone fall down. Similarly, you can also imagine kitchen cabinets that automatically lock when young children approach them. Some sample scenarios with the Intelligent Room that run today, reflecting both our interests and our funders’, are presented in the sidebar.

Some pictures of the Intelligent Room: the layout of my office, and laser pointing with an interactive map.

The most important factor in making intelligent rooms possible in recent years has been the newfound viability of real-time computer vision and speech understanding. AI, and computer science more generally, have experienced something of a renaissance in the past decade, with many research areas blossoming almost entirely due to the sudden and unexpected availability of inexpensive but powerful processors. It is now entirely within the realm of possibility to have literally a dozen computer vision systems as components of a larger project, a suggestion that would surely have raised more than a few skeptical eyebrows in the not-too-distant past.

Computer vision and speech understanding are among the preeminent members of a class of research problems known as AI-hard; namely, they are as difficult to solve as anything else we don't yet know how to do. When we started working on our lab's first Intelligent Room, we expected that the vision and natural language subsystems would require the overwhelming majority of our intellectual effort. What was not obvious from the start was the amount of forethought that would be required to integrate the room's myriad subsystems and produce from them a coherent whole. Building a computational system – one with literally dozens of hardware and software components – whose subsystems not only interoperate but leverage off one another eventually emerged as the Intelligent Room's chief research problem.2 In this, we have not been alone; finding some way of managing similar systems and moving data among their components was the foremost difficulty raised at the previously mentioned Intelligent Environments Symposium.

In some regards, our solution to this management crisis has been a bit drastic: we created a new programming environment, called Metaglue, to meet the room's unusual computational needs; in it, the room's 80 software components (at last count) are distributed among a dozen workstations.3 We also formulated some general principles for creating Intelligent Rooms, which we now adhere to with religious fervor.4 These include:

  1. Scenario-based development.

    We initially spent a great deal of time designing overly complex sensory systems for the room, without much thought about how we would eventually use them. Not a good idea. In the end, when we started thinking up room demos, i.e., scenarios, to show off our work, what we wanted to do our sensors didn't support, and what our sensors supported, we didn't want to do. In particular, the sensing our desired demos required was actually simpler, albeit different, than what the complex computer vision systems we had created provided. Much time could have been saved had we thought ahead.

  2. Do not scavenge.

    There is an enormous temptation when designing an Intelligent Room to try to incorporate all your friends' thesis work into it. How can you resist adding the latest state-of-the-art system that does [your favorite idea]? However, there is a good chance that systems not intended to work together will refuse to do so smoothly, and they will almost surely be unable to take advantage of each other's capabilities. (See the next point.) How components will integrate into the overall system must be taken into account when the individual components themselves are being designed.
  3. Systems should leverage off each other.

    Since nothing is perfect, particularly in AI, throw everything you can at a problem. In the Intelligent Room today, the computer vision systems communicate with the speech recognition systems. A strange mix, you might think, but we have found that as you approach something, such as a projected map, you are more likely to talk about it. Thus, by visually locating a person in the room, we can cue the speech recognition system with information about what that person is likely to say, and in this way we get higher speech recognition accuracy. This principle generalizes, and many subsystems in the Intelligent Room communicate with one another for similar reasons.
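This kind of cross-modal cueing can be sketched in a few lines. The following is a hypothetical illustration, not Metaglue or the Room's actual speech interface: assume a vision system reports which region of the room the speaker occupies, and the recognizer emits scored hypotheses; the region names, phrase sets, and boost value are all invented:

```python
# Hypothetical context grammars: phrases a speaker is likely to say
# while standing in a given region of the room.
CONTEXT_GRAMMARS = {
    "map_wall": {"zoom in", "show boston", "where is mit"},
    "video_corner": {"play", "pause", "rewind"},
}

def rank_hypotheses(hypotheses, region):
    """Re-rank (score, text) recognizer hypotheses, boosting any that
    appear in the grammar active for the speaker's current region."""
    active = CONTEXT_GRAMMARS.get(region, set())
    boosted = [(score + (0.2 if text in active else 0.0), text)
               for score, text in hypotheses]
    return max(boosted)[1]

# The recognizer is torn between two acoustically similar phrases;
# knowing the speaker is near the projected map tips the decision.
hyps = [(0.55, "show boston"), (0.60, "show a button")]
choice = rank_hypotheses(hyps, "map_wall")
# -> "show boston"
```

The same structure works in reverse: an utterance about the map could just as well cue the vision system about where to look next.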

 

Big Brother, post-1984?

It is quite easy to trace almost all work in intelligent environments back to the very influential Digital Desk project at Xerox PARC in the late 80s and early 90s, which was among the first user interfaces to directly observe people manipulating real objects. Using a video camera, it watched people reading real paper documents on the surface of a real desk – highlight something with your finger, and the system would scan in the delineated text. Surprisingly, however, the very first intelligent environment was proposed in the late 18th century by the well-known British philosopher and would-be prison warden, Jeremy Bentham. Bentham designed the Panopticon, an Orwellian structure in which a hive of rooms (or cells) could be kept under the constant scrutiny of an unseen Observer.5 Denizens of the Panopticon, which Bentham proposed could house inmates, the insane, or students, would never be precisely sure when they were being observed by the central Observer – the warden, doctor, or graduate advisor. Order would be maintained exactly because observed infractions might be harshly punished at some unknown later date.

As pointed out by Bentham, and subsequently elaborated upon by Foucault, enormous power is conferred on the Observer by the observed, and this may seem the most serious objection to the widespread introduction of intelligent environments.6 The potential for abuse is frightening and should certainly give any technology enthusiast or futurist pause. However, this need not be a fatal objection, and I think it is premature to address at present. Given that we have no clear idea what types of new sensing technologies will be developed and deployed in the future, worrying about security now is somewhat pointless – whatever privacy-guaranteeing techniques are developed for today's technology will almost surely be irrelevant for tomorrow's. More importantly, fear of misuse is no reason not to push for something with the potential to so greatly revolutionize our lives.

In the end, it will come as no great surprise if the widespread acceptance of intelligent rooms, homes, cars, etc., comes as much from clever marketing as from clever security. If someone were to propose filling your home with microphones that anyone in the world could listen to anytime, day or night, you might very likely shudder in horror. Yet your home is already filled with microphones: every telephone has one. But perhaps you object, "Phones can't be abused like that! Other people can hear me only when I let them!" Aha! It sounds like you've already been socialized to accept telephones; it is now quite difficult to view them as a realistic threat. I think it is quite reasonable to expect that someday your children will be similarly comfortable inside their intelligent homes.

 

  1. Coen, M. (ed.) Proceedings of the 1998 AAAI Spring Symposium on Intelligent Environments. AAAI TR SS-98-02. 1998.
  2. Coen, M. Building Brains for Rooms: Designing Distributed Software Agents. In Proceedings of the Ninth Conference on Innovative Applications of Artificial Intelligence. (IAAI97). Providence, R.I. 1997.
  3. Phillips, B. Metaglue: A Programming Language for Multi-Agent Systems. M.Eng. Thesis. Massachusetts Institute of Technology. Cambridge, MA. 1999.
  4. Coen, M. Design Principles for Intelligent Environments. In Proceedings of The Fifteenth National Conference on Artificial Intelligence. (AAAI98). Madison, Wisconsin. 1998.
  5. Bentham, J. The Panopticon Writings. Verso, London. 1995.
  6. Foucault, M. "The Eye of Power," Power/Knowledge. Pantheon Books, New York, 1981.