The Brain

 

Aryan's brain helps it exhibit intelligent behaviors by coordinating all of its non-intelligent parts discussed before. Its brain is nothing but a program running on a standard PC with a 200 MHz Intel Pentium processor under RedHat Linux. I wrote the whole code in the C language. The program runs in the X Window environment to ease monitoring of Aryan's perception. Linux and C were chosen for their flexibility and fast execution.

A simple and modular brain architecture is proposed for Aryan. Although it is lightweight and easy to implement, it covers the essential capabilities that Aryan needs for exhibiting simple emotional and interactive behaviors. A block-diagram representation of this architecture is shown below.

[Figure: block diagram of Aryan's brain architecture]

Visual sensors capture visual information and represent it in an appropriate form for digital processing. The Linux kernel provides an easy-to-use programming interface called "Video for Linux", or "V4L" for short, which supports a large set of capture cards, including mine. It can also access multiple cards simultaneously.
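
As a rough illustration of how a program talks to a capture card, here is a minimal sketch using the classic V4L1 interface. The header <linux/videodev.h> and the device node /dev/video0 are my assumptions, not details taken from Aryan's code.

    /* Minimal sketch (not Aryan's actual code): query a capture card
       through the classic V4L1 interface. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev.h>

    int main(void)
    {
        struct video_capability cap;
        int fd = open("/dev/video0", O_RDONLY);  /* device node is an assumption */
        if (fd < 0) { perror("open"); return 1; }

        /* VIDIOCGCAP reports the card's name and frame-size limits. */
        if (ioctl(fd, VIDIOCGCAP, &cap) < 0) {
            perror("VIDIOCGCAP");
            close(fd);
            return 1;
        }
        printf("card: %s, frames from %dx%d up to %dx%d\n",
               cap.name, cap.minwidth, cap.minheight,
               cap.maxwidth, cap.maxheight);

        close(fd);
        return 0;
    }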

The ultimate goal of low-level vision is to detect and track regions that are likely (not certainly) to be a face or hand, based on a rough inspection. This is achieved by first detecting motion and skin color. These cues are then fused to extract salient regions. A bounding box is superimposed on each salient patch, and these boxes are tracked over time. Low-level vision is decomposed into submodules, as shown below, to efficiently process the massive visual data.

[Figure: submodules of low-level vision]
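
The fragment below sketches one plausible way the motion and skin-color cues could be fused. The frame-differencing test and the RGB skin rule are common techniques, and the thresholds are my assumptions, not the exact tests Aryan performs.

    /* Sketch of motion/skin fusion: a pixel is salient when it is both
       moving (frame difference) and skin-colored (RGB rule of thumb). */
    #include <stdlib.h>

    typedef struct { unsigned char r, g, b; } Pixel;

    static int is_skin(Pixel p)
    {
        /* A widely used RGB skin rule (assumption, not Aryan's test). */
        return p.r > 95 && p.g > 40 && p.b > 20 &&
               p.r > p.g && p.r > p.b &&
               abs(p.r - p.g) > 15;
    }

    void mark_salient(const Pixel *cur, const Pixel *prev,
                      unsigned char *salient, int npixels, int motion_thresh)
    {
        int i;
        for (i = 0; i < npixels; i++) {
            /* Sum of per-channel differences against the previous frame. */
            int diff = abs(cur[i].r - prev[i].r) +
                       abs(cur[i].g - prev[i].g) +
                       abs(cur[i].b - prev[i].b);
            salient[i] = (diff > motion_thresh) && is_skin(cur[i]);
        }
    }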

High-level vision explores the tracked boxes and attempts to recognize their content. We employed neural networks for this purpose because of their automatic learning, good generalization and fast recall. Due to the key role of the hand, face and facial features in natural human communication, we trained Aryan's neural networks on samples of these.
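
To illustrate why recall is fast, the following sketch runs a forward pass through one fully connected layer with sigmoid units. The topology and weights of Aryan's actual networks are not given here, so this only shows the cost of recognition: a few multiply-adds per unit.

    /* Forward pass through one fully connected sigmoid layer;
       the layer sizes and weights are placeholders, not Aryan's. */
    #include <math.h>

    void layer_forward(const float *in, int n_in,
                       const float *weights,   /* n_out x n_in, row-major */
                       const float *bias, float *out, int n_out)
    {
        int i, j;
        for (i = 0; i < n_out; i++) {
            float sum = bias[i];
            for (j = 0; j < n_in; j++)
                sum += weights[i * n_in + j] * in[j];
            out[i] = 1.0f / (1.0f + expf(-sum));  /* sigmoid activation */
        }
    }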

Some recognition results are shown below. The recognition result is indicated by a small letter, either F or H, corresponding to face and hand. Moreover, if a box is recognized as a face, the facial features are shown in different colors: yellow for eyebrows, cyan for eyes and magenta for mouth.

[Figure: recognition results]

Attention enables Aryan to respond in real time by concentrating its limited computational resources on the information of interest. Psychophysical studies show that attention is not limited to a single object. Therefore, we provided the ability to track multiple items. However, the most motivating item determines the focus of attention.

High-level vision is a slow and time-consuming module, but attention can manage it efficiently. This is achieved by introducing a curiosity factor, which is high at the beginning and decays exponentially over time. Thus, when a new item is detected, high attention is paid to it; if recognition fails repeatedly, longer and longer delays take place between successive recognition attempts.
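
A small sketch of how such a curiosity factor might be implemented follows. Only the overall behavior (exponential decay, growing delays after failed attempts) comes from the description above; the constants are my assumptions.

    /* Curiosity decays exponentially with each failed recognition, and
       the delay before the next attempt grows as curiosity falls. */
    typedef struct {
        float curiosity;       /* 1.0 when the item first appears      */
        int   frames_to_wait;  /* frames left before the next attempt  */
    } Item;

    #define DECAY      0.7f    /* decay per failed attempt (assumption)          */
    #define BASE_DELAY 2       /* frames between attempts at full curiosity
                                  (assumption)                                   */

    void on_recognition_failed(Item *it)
    {
        it->curiosity *= DECAY;                 /* exponential decay            */
        /* Lower curiosity -> proportionally longer delay before retrying. */
        it->frames_to_wait = (int)(BASE_DELAY / it->curiosity);
    }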

Aryan's motivation deals with instantaneous high-level goals. Currently the robot’s only motivation is to behave socially. To achieve this, Aryan's attention must be biased toward the item that best satisfies this goal.

Emotions play a key role in improving believability in an interactive robot. Aryan's emotion system is a simple state machine that takes it to the appropriate emotional state based on the previous state, an internal counter value and its visual stimulation. This is shown below.

[Figure: Aryan's emotion state machine]

Surprise decays over time, while the rest of the states persist as long as their corresponding stimulation survives. When Aryan is left alone, it becomes angry. In this state it occasionally issues an involuntary saccade that makes a large change in its sight, hoping to find a mate in the new view.
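
The fragment below is a sketch of such a state machine, encoding only the transitions just described: surprise at a new item, decay of surprise over time, persistence of the other states under stimulation, and anger after being left alone. The state names beyond surprised and angry, and the counter limits, are my assumptions.

    /* Sketch of the emotion state machine; the real transition table
       is given by the figure above, not reproduced here. */
    typedef enum { CALM, SURPRISED, HAPPY, ANGRY } Emotion;

    #define SURPRISE_SPAN 30    /* frames before surprise fades (assumption) */
    #define ALONE_SPAN    300   /* frames alone before anger (assumption)    */

    Emotion next_emotion(Emotion prev, int *counter, int new_item, int any_item)
    {
        if (new_item) {                   /* a novel stimulus triggers surprise */
            *counter = 0;
            return SURPRISED;
        }
        (*counter)++;
        if (prev == SURPRISED && *counter > SURPRISE_SPAN) {
            *counter = 0;                 /* surprise decays over time */
            return any_item ? HAPPY : CALM;
        }
        if (!any_item && *counter > ALONE_SPAN)
            return ANGRY;                 /* left alone for too long */
        return prev;                      /* other states persist with stimulation */
    }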

 


By directing the robot's gaze to the visual target, the person interacting with the robot can use the robot's gaze as an accurate indicator of what the robot is attending to. This greatly facilitates the interpretation and readability of the robot's behavior, since the robot reacts specifically to the thing that it is looking at.

First, a coarse estimate of the eyes' vergence is computed analytically from the image of the middle camera, using simple trigonometric manipulation. Starting from this coarse estimate, a finer one is obtained using the images of each eye. The joint tilt angle of the eyes is computed similarly to the coarse vergence.
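
The coarse step might look like the following triangulation. This is only my reconstruction of the "simple trigonometric manipulation"; the focal length, eye baseline and depth estimate are assumed parameters, not values from the robot.

    /* Coarse vergence by triangulation from the middle camera
       (a reconstruction, not Aryan's actual computation). */
    #include <math.h>

    void coarse_vergence(double x_off,     /* pixels from image center   */
                         double focal_px,  /* focal length in pixels     */
                         double baseline,  /* eye separation, meters     */
                         double depth,     /* estimated distance, meters */
                         double *left_pan, double *right_pan)
    {
        double bearing   = atan(x_off / focal_px);         /* direction to target */
        double half_verg = atan((baseline / 2.0) / depth); /* convergence angle   */
        /* Positive angles mean a rightward pan. */
        *left_pan  = bearing + half_verg;  /* left eye turns slightly inward  */
        *right_pan = bearing - half_verg;  /* right eye turns slightly inward */
    }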

Since the range of eye movement is limited, the eyes alone cannot track an item that makes large movements. Thus, compensatory neck movements must be performed. Currently, neck motions are of a fixed amount, about 30 degrees. This is shown in the following figure.

[Figure: compensatory neck movement]
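
A minimal sketch of the compensation rule follows; the 30 degree neck step comes from the text above, while the eyes' mechanical pan limit is an assumption.

    /* When the commanded eye pan exceeds the eyes' range, step the neck
       a fixed 30 degrees toward the target and reduce the eye command
       by the same amount. */
    #define EYE_RANGE_DEG 20.0   /* mechanical pan limit (assumption) */
    #define NECK_STEP_DEG 30.0   /* fixed neck step from the text     */

    void compensate(double *eye_pan_deg, double *neck_pan_deg)
    {
        if (*eye_pan_deg > EYE_RANGE_DEG) {
            *neck_pan_deg += NECK_STEP_DEG;
            *eye_pan_deg  -= NECK_STEP_DEG;
        } else if (*eye_pan_deg < -EYE_RANGE_DEG) {
            *neck_pan_deg -= NECK_STEP_DEG;
            *eye_pan_deg  += NECK_STEP_DEG;
        }
    }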

