"Multimodal Interaction with an Autonomous Forklift"

Matthew Walter
Postdoctoral Associate
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

The following documents the presentation that I gave at the 2010 Human-Robot Interaction (HRI) Conference in Osaka, Japan. The presentation focused on the command interface and bystander interaction capabilities of MIT's Agile Robotics project, which Andrew Correa, Luke Fletcher, Jim Glass, Seth Teller, Randall Davis, and I discuss in our corresponding 2010 HRI paper.

Correa, A., Walter, M.R., Fletcher, L., Glass, J., Teller, S., and Davis, R., Multimodal Interaction with an Autonomous Forklift. Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), Osaka, Japan, March 2010.
[bibtex] [pdf]

Abstract

In this talk, I describe a multimodal framework for interacting with an autonomous robotic forklift. A key element enabling effective interaction is a wireless, handheld tablet with which a human supervisor can command the forklift using speech and sketch. Most current sketch interfaces treat the canvas as a blank slate. In contrast, our interface uses live and synthesized camera images from the forklift as a canvas, and augments them with object and obstacle information from the world. This connection allows users to "draw on the world," enabling a simpler set of sketched gestures. Our interface supports commands that include summoning the forklift and directing it to lift, transport, and place loads of palletized cargo. We describe an exploratory evaluation of the system designed to identify areas for detailed study.
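To make the "draw on the world" idea concrete, the following minimal Python sketch shows one way a circling gesture over a camera image could be resolved to a detected object. This is an illustration only, not the project's implementation; the stroke representation, the DetectedObject type, and the resolve_circling_gesture function are all assumptions.

# A minimal, hypothetical sketch of gesture resolution: the stroke is a
# list of (x, y) pixel coordinates drawn over a camera image, and each
# candidate object carries the pixel centroid of its projection into
# that same image. None of these names come from the actual system.

from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class DetectedObject:
    name: str        # illustrative label, e.g. "pallet_3"
    centroid: Point  # centroid of the object's projection in the image

def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: does the closed polygon contain pt?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def resolve_circling_gesture(stroke: List[Point],
                             objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Treat the stroke as a closed polygon and return the first
    detected object whose image-space centroid it encloses."""
    for obj in objects:
        if point_in_polygon(obj.centroid, stroke):
            return obj
    return None

# Circling the region around pixel (320, 240) selects pallet_3.
stroke = [(250, 180), (390, 180), (390, 300), (250, 300)]
objects = [DetectedObject("pallet_3", (320, 240)),
           DetectedObject("pallet_7", (520, 260))]
print(resolve_circling_gesture(stroke, objects))

Because the camera image is annotated with the robot's own object detections, a loose circle suffices; the interface does not need the user to trace the pallet's outline precisely.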

Our framework incorporates external signaling to interact with humans near the vehicle. The robot uses audible and visual annunciation to convey its current state and intended actions. The system also provides seamless autonomy handoff: any human can take control of the robot by entering its cabin, at which point the forklift can be operated manually until the human exits.
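The autonomy handoff can be thought of as a small state machine driven by cabin occupancy. The sketch below is a hypothetical simplification; the Mode values, HandoffController class, and occupancy callback are illustrative assumptions, not the vehicle's actual interface.

# Assumed handoff logic: a human entering the cabin switches the
# forklift to manual operation, and vacating the cabin returns
# control to the autonomous system.

from enum import Enum, auto

class Mode(Enum):
    AUTONOMOUS = auto()
    MANUAL = auto()

class HandoffController:
    def __init__(self) -> None:
        self.mode = Mode.AUTONOMOUS

    def on_cabin_occupancy(self, occupied: bool) -> Mode:
        """Invoked whenever the cabin occupancy sensor changes state."""
        if occupied and self.mode is Mode.AUTONOMOUS:
            self.mode = Mode.MANUAL      # a human entered: cede control
        elif not occupied and self.mode is Mode.MANUAL:
            self.mode = Mode.AUTONOMOUS  # cabin vacated: resume autonomy
        return self.mode

ctrl = HandoffController()
assert ctrl.on_cabin_occupancy(True) is Mode.MANUAL
assert ctrl.on_cabin_occupancy(False) is Mode.AUTONOMOUS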

Slides

Presentation slides (see below for links to referenced movies)

pdf (15MB)


The slides are made available under a Creative Commons license.

Media

(2009_11_30_agile_short.mp4)

[mp4 (h264, 67MB)]

This video demonstrates the multimodal interaction mechanisms whereby a human supervisor conveys task-level commands to the robot via a handheld tablet. These task-level commands include: directing the robot to pick up a desired pallet by circling it in an image from one of the robot's cameras; summoning the robot to a particular destination by speaking to the tablet; and directing the robot to place a pallet by circling the desired location on the ground or a truck.
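As a rough illustration of how these three commands might travel from tablet to robot, the sketch below defines hypothetical message types. The class names, fields, and describe helper are assumptions for exposition, not the project's actual message format.

from dataclasses import dataclass
from typing import List, Tuple, Union

Stroke = List[Tuple[float, float]]  # sketch points in image coordinates

@dataclass
class PickupCommand:
    camera_id: str  # which camera's image the pallet was circled in
    stroke: Stroke  # the circling gesture around the pallet

@dataclass
class SummonCommand:
    destination: str  # named area recognized from speech

@dataclass
class PlaceCommand:
    camera_id: str  # image in which the placement region was circled
    stroke: Stroke  # gesture over the ground or the truck bed

Command = Union[PickupCommand, SummonCommand, PlaceCommand]

def describe(cmd: Command) -> str:
    """Render a command as text, e.g. for audible confirmation."""
    if isinstance(cmd, PickupCommand):
        return f"pick up the pallet circled in camera {cmd.camera_id}"
    if isinstance(cmd, SummonCommand):
        return f"drive to {cmd.destination}"
    return f"place the load in the region circled in camera {cmd.camera_id}"

print(describe(SummonCommand("storage")))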

(2009_06_09_summon_truck_placement)

[mp4 (h264, 20MB)]

This video demonstrates an "end-to-end" scenario in which the robot, carrying a loaded pallet, is directed to the Issue area via a spoken command and is subsequently requested to place the pallet on a flatbed truck. A standard forklift operator commands the robot.

Last Modified: February 21, 2013
