Long term prediction with layers of recurrent neural network systems

To: address@hidden
Subject: Long term prediction with layers of recurrent neural network systems
From: arnoud <address@hidden>
Date: Sat, 27 Sep 2003 12:59:23 -0700
Sender: address@hidden
Hello all,

I have an idea about how to make long-term prediction possible, and I would
like some feedback on it, for instance whether it has been tried or thought
of before.

The idea in its simplest form is:
A system consisting of 2 neural networks tries to predict its input at each
time t:
1. Abstraction(C, I) -> new C
2. Prediction(new C, I) -> predicted next I

C and I are vectors of bits. I is the external input, which is set each cycle.
C is the internal state (C = context). Prediction uses it to predict the next
input I. Abstraction contains one layer of neurons; it is a recurrent net.
Prediction has one hidden layer and an output layer, and is feedforward. The
error is the difference between the predicted I and the real I. Prediction is
trained with backpropagation, and Abstraction with a version of RTRL
(real-time recurrent learning).
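
A minimal sketch of the two-step cycle described above, in plain Python. It
shows only the forward pass; training (backpropagation and RTRL) is left out.
The class name TwoNetworkSystem, the helper names, the layer sizes, the
sigmoid units and the mean squared error are assumptions made for
illustration, and the activations are real values in (0, 1) rather than
thresholded bits.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Random weights, one row per output unit, with a trailing bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def apply_layer(weights, inputs):
    """One layer of sigmoid units; the last weight in each row is the bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in weights]


class TwoNetworkSystem:
    """Hypothetical container for the Abstraction and Prediction networks."""

    def __init__(self, n_input, n_context, n_hidden, seed=0):
        rng = random.Random(seed)
        self.context = [0.0] * n_context  # C, the internal state
        # Abstraction: a single recurrent layer fed with C and I.
        self.abstraction = make_layer(n_context + n_input, n_context, rng)
        # Prediction: feedforward, one hidden layer and one output layer.
        self.hidden = make_layer(n_context + n_input, n_hidden, rng)
        self.output = make_layer(n_hidden, n_input, rng)

    def step(self, inp):
        # 1. Abstraction(C, I) -> new C
        self.context = apply_layer(self.abstraction, self.context + inp)
        # 2. Prediction(new C, I) -> predicted next I
        hidden = apply_layer(self.hidden, self.context + inp)
        return apply_layer(self.output, hidden)


def prediction_error(predicted, actual):
    """Mean squared difference between the predicted I and the real I."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


system = TwoNetworkSystem(n_input=4, n_context=3, n_hidden=5)
predicted = system.step([1.0, 0.0, 1.0, 0.0])
```

The prediction returned by step would be compared with the input of the next
cycle to obtain the error.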

A known problem with recurrent neural networks is that the gradients vanish
too fast. In practice it is impossible to train such a system on long causal
connections (>100 cycles/steps or so). Improved systems such as LSTM have
been developed that can handle longer durations between cause and effect
(>1000), but I don't think that there will ever be a system that can predict
raw input over a time span of years, weeks, days, or even hours in realistic
environments.

What is wrong with the strategies I have read about is that the systems are
always designed to predict the details, the raw input. Their representations
are too precise for long-term prediction.
What people do, however, is predict in global, general terms, and in vaguer
terms the longer the term. Questions like 'what color will I see in 10^7 s?'
or 'how high off the ground will my foot be in 10^7 s?' are not asked. Most
of the time they cannot be answered, and furthermore the answers are usually
unimportant for people to achieve their long-term goals. People need to know
things like whether they will have a job in the coming years, when they need
to be at the train station to pick someone up next week, etc.
People handle the future in abstract, general terms. The farther away the
time a prediction is about, the more general the terms in which it is stated,
on average. What is predicted is not detailed perception, but the class into
which a sequence of future perceptions will fall (more distant future ->
bigger class).
Only for the very near future do people need to predict in detail. When
walking, at each moment of a step one needs to know exactly how the muscles
will move the leg, in order to go forward and to keep balance.

So, back to the system. What is needed is a classification mechanism. Similar
sequences of inputs should lead to the same classification.
In the neural network system there is already such a mechanism. Input
sequences that lead to the same state of C can be seen as similar. A state of
C can be seen as a classification of the system's recent past input.
C is the basis for prediction (together with the most recent input). If
prediction succeeds, C can be seen as an adequate lossy data compression of
the past inputs. The mechanism of making C out of the input sequence can be
seen as abstraction, since the information that is important is kept and the
information that is unimportant is filtered out. The training of the network
Abstraction can be seen as concept formation.

So C is a classification. Now we can start to predict the states of this C.
For this, an identically structured system is built on top of the low-level
system. C will be its input: C base level -> I second level. This system will
also try to predict and classify its input. And so we will get more abstract
predictions and more abstract classifications.
This second system will run 'slower': C is passed up only after a prediction
failure at the base level. C can be seen to contain all the information about
the future (together with its coupled detail information I) until a
prediction failure. A prediction is seen as failed when the error is higher
than the average error of the system (or some similar criterion).
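
The failure criterion above can be sketched as a small gate, in Python. This
is only one possible reading: the names FailureGate and contexts_passed_up
are made up for illustration, the average is taken over all errors seen
before the current one, and the very first step is never counted as a
failure.

```python
class FailureGate:
    """Flags a prediction as failed when its error exceeds the running mean."""

    def __init__(self):
        self.count = 0
        self.mean_error = 0.0

    def failed(self, error):
        # Compare against the mean of the errors seen before this one.
        is_failure = self.count > 0 and error > self.mean_error
        # Incremental update of the running mean.
        self.count += 1
        self.mean_error += (error - self.mean_error) / self.count
        return is_failure


def contexts_passed_up(errors, contexts):
    """Return the base-level contexts that the second level gets as input."""
    gate = FailureGate()
    return [c for e, c in zip(errors, contexts) if gate.failed(e)]


errors = [0.1, 0.1, 0.9, 0.1, 0.8]
contexts = ["c0", "c1", "c2", "c3", "c4"]
passed = contexts_passed_up(errors, contexts)  # ['c2', 'c4']
```

Here the second level is activated twice in five base-level steps, at the two
steps where the error jumps above the average.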
If the base level predicts well for, say, 10 steps in a row on average, the
second level will run 10 times slower than the first. The second level will
then have a farther time horizon. If it can also predict 10 steps in a row,
that will equal 100 steps at the base level. Of course such a prediction is
more general and vaguer; it is less detailed. It predicts only the class into
which the future input sequences will fall.

Since the second system is functionally identical to the first, the trick can
be applied again and again, recursively, by building system upon system,
getting more and more abstract classifications and abstract predictions.
If there are 9 systems in a hierarchical structure that each predict 10 steps
in a row successfully on average, the time horizon of the top-level prediction
is 10^9 steps. If a step is 0.01 sec the range is 10^7 sec (about 116 days),
thus being in the order of a year.
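
The arithmetic above, written out in Python as a check. The function name
time_horizon_seconds is made up for illustration; the numbers (9 levels, 10
steps per level, 0.01 sec per step) are the ones from the text.

```python
def time_horizon_seconds(levels, steps_per_level, step_seconds):
    """Each level multiplies the horizon of the level below by steps_per_level."""
    return steps_per_level ** levels * step_seconds


horizon = time_horizon_seconds(levels=9, steps_per_level=10, step_seconds=0.01)
days = horizon / 86400  # 86400 seconds in a day
```

This gives 10^7 sec, which is close to 116 days, or roughly a third of a
year.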

So this seems to me to make long-term prediction possible, albeit not at the
detail level. The novel aspect of my system seems to me to be the
abstraction mechanism and concept formation. I haven't read anything like it
in the AI literature (but maybe other people have?). But this is a necessary
ingredient for an intelligent AI agent, I think, since it is a necessary
ingredient for the intelligence of people. It is what makes people so much
more intelligent than present AI systems.

The abstract states can be the input for action systems (also neural
networks), one at each level, that calculate actions. They can calculate a
goal that a lower level must achieve: the lower level must get its C to be
equal to that goal. The actions of the lowest level are concrete actions in
the environment through effectors ('muscles'). The whole agent can be trained
by an external reinforcement signal. This signal is passed up the hierarchy;
each level receives its average over the waiting period until that level's
next activation.
 
A subsystem:
Abstraction(C, I, O) -> new C
Action(C, I, O, G) -> new O
Prediction(new C, new O, I) -> predicted I
Evaluation(C) -> reinforcement signal

where O is the action of the subsystem and G is the goal it gets from the
subsystem above. Action and Evaluation are feedforward neural networks.
Evaluation is trained to predict how state C is evaluated. It is used to
train Action when there is no higher level or when the higher level is not
yet functioning (well). Otherwise Action is trained to get C equal to G
(using the other networks to calculate the effects of the action O).
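
The order in which the four networks of a subsystem are called can be
sketched in Python as follows. This is an illustration only: the class name
Subsystem and its step method are hypothetical, and the four networks are
stood in for by plain functions. Two readings are assumed here: Action gets
the old C, as the signature above literally says, and Evaluation is applied
to the new C.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Subsystem:
    abstraction: Callable[[Vector, Vector, Vector], Vector]     # (C, I, O) -> new C
    action: Callable[[Vector, Vector, Vector, Vector], Vector]  # (C, I, O, G) -> new O
    prediction: Callable[[Vector, Vector, Vector], Vector]      # (new C, new O, I) -> predicted I
    evaluation: Callable[[Vector], float]                       # C -> reinforcement estimate
    context: Vector  # C
    output: Vector   # O

    def step(self, inp, goal):
        new_context = self.abstraction(self.context, inp, self.output)
        # Assumption: Action sees the old C, following the signature literally.
        new_output = self.action(self.context, inp, self.output, goal)
        predicted = self.prediction(new_context, new_output, inp)
        self.context, self.output = new_context, new_output
        # Assumption: Evaluation is applied to the new C.
        return predicted, self.evaluation(new_context)


# Trivial stand-in functions, not trained networks.
sub = Subsystem(
    abstraction=lambda c, i, o: [(a + b) / 2 for a, b in zip(c, i)],
    action=lambda c, i, o, g: [gv - cv for gv, cv in zip(g, c)],
    prediction=lambda c, o, i: list(i),
    evaluation=lambda c: sum(c) / len(c),
    context=[0.0, 0.0],
    output=[0.0, 0.0],
)
predicted, value = sub.step([1.0, 0.0], goal=[1.0, 1.0])
```

In a full hierarchy, the goal argument would come from the Action network of
the subsystem one level up.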


The most important problem for AI in building intelligent agents is to get
them to plan successfully for the long term, in other words to get their
intelligence to a humanlike level. Other AI problems have been more or less
solved, I think. My idea seems to me to be a promising solution to that last
problem.

Any comments?

bye,
Arnoud Michel