Long term prediction with layers of recurrent neural network systems
Hello all,
I have an idea about how to make long term prediction possible. I would like
to have some feedback about it. If it has been tried/thought of before and so
on.
The idea in its simplest form is:
A system consisting of two neural networks tries to predict its input at each
time t:
1. Abstraction(C, I) -> new C
2. Prediction(new C, I) -> predicted next I
C and I are vectors of bits. I is the external input, set each cycle; C is
the internal state (C = context). Prediction uses the new C, together with the
current input, to predict the next input I. Abstraction contains one layer of
neurons and is a recurrent net. Prediction has one hidden layer and an output
layer, and is feedforward. The error is the difference between the predicted I
and the real I. Prediction is trained with backpropagation, and Abstraction
with a version of RTRL.
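To make the cycle concrete, here is a minimal sketch of the two networks. The layer sizes, the tanh nonlinearity, and the weight matrices W_* are my own illustrative assumptions, not part of the proposal; training (backpropagation/RTRL) is omitted, only the forward cycle is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
N_C, N_I, N_H = 16, 8, 12          # assumed sizes: context, input, hidden layer

# Abstraction: one recurrent layer mapping (C, I) -> new C
W_abs = rng.normal(0, 0.1, (N_C, N_C + N_I))
# Prediction: feedforward, one hidden layer, (new C, I) -> predicted next I
W_hid = rng.normal(0, 0.1, (N_H, N_C + N_I))
W_out = rng.normal(0, 0.1, (N_I, N_H))

def abstraction(C, I):
    return np.tanh(W_abs @ np.concatenate([C, I]))

def prediction(C, I):
    h = np.tanh(W_hid @ np.concatenate([C, I]))
    return np.tanh(W_out @ h)

C = np.zeros(N_C)
I = rng.integers(0, 2, N_I).astype(float)    # external input bits
for t in range(5):
    C = abstraction(C, I)                    # 1. update internal state
    I_pred = prediction(C, I)                # 2. predict the next input
    I_next = rng.integers(0, 2, N_I).astype(float)
    error = np.mean((I_pred - I_next) ** 2)  # training signal for BP / RTRL
    I = I_next
```

The error at each step would drive the weight updates of both networks.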
A known problem with recurrent neural networks is that the gradients vanish
too fast. It is impossible to train such a system on long causal connections
(>100 cycles/steps or so). Improved systems such as LSTM have been developed
that can handle longer gaps between cause and effect (>1000 steps), but I
don't think there will ever be a system that can predict over time spans of
hours, days, weeks, let alone years, in realistic environments.
What is wrong with the strategies I have read about is that the systems are
always designed to predict the details, the raw input. They all predict in
representations that are too precise for long term prediction.
What people do, however, is predict in global, general terms; the longer the
term, the vaguer the prediction. Questions like 'what color will I see in 10^7
s?' or 'how high off the ground will my foot be in 10^7 s?' are not asked;
most of the time they cannot be answered, and furthermore the answers are
usually unimportant for people's long term goals. People need to know things
like whether they will have a job in the coming years, or when they need to be
at the train station to pick someone up next week, etc.
People handle the future in abstract, general terms. The farther away the
prediction, the more general the terms it is stated in, on average. No
detailed perception is predicted, but rather which class a sequence of future
perceptions will fall into (the farther away the future, the bigger the
class). Only about the very near future do people need to predict in detail.
When walking, at each moment of a step one needs to know exactly how the leg
muscles will move, in order to go forward and keep balance.
So, back to the system. What is needed is a classification mechanism. Similar
sequences of inputs should lead to the same classification.
In the neural network system there is already such a mechanism. Input
sequences that lead to the same state of C can be seen as similar. A state of
C can be seen as a classification of the system's recent past input.
C is the basis for prediction (together with the most recent input). If
prediction succeeds, C can be seen as an adequate lossy data compression of
the past inputs. The mechanism of making C out of the input sequence can be
seen as abstraction, since the important information is kept and the
unimportant information is filtered out. The training of the network
Abstraction can be seen as concept formation.
So C is a classification. Now we can start to predict the states of this C.
For this an identically structured system is built on top of the low level
system. C will be its input: the base level's C becomes the second level's I.
This system will also try to predict and classify its input, and so we get
more abstract predictions and more abstract classifications.
This second system will run 'slower': only after a prediction failure on the
base level is C passed upward. Until a prediction failure, C can be seen to
contain all the information about the future (together with its coupled
detail information I). A prediction is considered failed when the error is
higher than the average error of the system (or some similar criterion).
If the base level predicts well, say 10 steps in a row on average, the second
level will run 10 times slower than the first and will therefore have a
farther time horizon. If it can also predict 10 steps in a row, that equals
100 steps at the base level. Of course such a prediction is more
general/vaguer, less detailed: only the class the future input sequences will
fall into is predicted.
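The scheduling of the slower upper level can be sketched as follows. This is purely illustrative: the base-level error here is a random stand-in rather than a real network's error, and the 'failure = error above the running average' rule is the criterion suggested above:

```python
import random

random.seed(1)

def run_two_levels(steps):
    """Tick the base level; pass C upward only on a prediction failure."""
    avg_err, n, upper_ticks = 0.0, 0, []
    for t in range(steps):
        err = random.random()           # stand-in for the base-level error
        n += 1
        avg_err += (err - avg_err) / n  # running average error
        if err > avg_err:               # prediction failure on the base level
            upper_ticks.append(t)       # this is when C is passed upward
    return upper_ticks

ticks = run_two_levels(1000)
```

The upper level is activated once per base-level failure, so it automatically runs slower the better the base level predicts.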
Since the second system is functionally identical to the first, the trick can
be applied again and again, recursively, by building system upon system,
getting more and more abstract classifications and abstract predictions.
If there are 9 systems in a hierarchical structure that each predict 10 steps
in a row successfully on average, the time horizon of the top level prediction
is 10^9 steps. If a step is 0.01 sec, the range is 10^7 sec, which is on the
order of a year.
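Checking that arithmetic (with the same assumed numbers: 9 levels, 10 successful steps per level, 0.01 s per base step):

```python
levels, steps_per_level, step_sec = 9, 10, 0.01
horizon_steps = steps_per_level ** levels   # 10^9 base-level steps
horizon_sec = horizon_steps * step_sec      # 10^7 seconds
years = horizon_sec / (365 * 24 * 3600)     # roughly a third of a year
```

10^7 s is about 4 months, so 'on the order of a year' holds in the order-of-magnitude sense (a year is about 3.15 * 10^7 s).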
So this seems to me to make long term prediction possible, albeit not at the
detail level. The novel aspect of my system seems to me to be the abstraction
mechanism and concept formation. I haven't read anything like it in the AI
literature (but maybe other people have?). But I think this is a necessary
ingredient for an intelligent AI agent, since it is a necessary ingredient of
the intelligence of people. It is what makes people so much more intelligent
than present AI systems.
The abstract states can be the input for action systems (also neural
networks), one at each level, that calculate actions. They can calculate a
goal that a lower level must achieve: the lower level must get its C to equal
that goal. The actions of the lowest level are concrete actions in the
environment through effectors ('muscles'). The whole agent can be trained by
an external reinforcement signal. This signal is passed up the hierarchy as an
average over each level's waiting period until its next activation.
A subsystem:
Abstraction(C, I, O) -> new C
Action(C, I, O, G) -> new O
Prediction(new C, new O, I) -> predicted I
Evaluation(C) -> reinforcement signal
where O is the action of the subsystem and G is the goal it gets from the
subsystem above. Action and Evaluation are feedforward neural networks.
Evaluation is trained to predict how state C is evaluated. It is used to
train Action when there is no higher level or when the higher level is not
yet functioning (well). Otherwise Action is trained to get C equal to G
(using the other networks to calculate the effects of the action O).
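One subsystem's forward pass could look like this. The four networks are reduced to single linear-plus-tanh layers here for brevity (an assumption; the proposal leaves the architectures open apart from Abstraction being recurrent and the others feedforward), and all sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
N_C, N_I, N_O, N_G = 8, 4, 3, 8   # context, input, action, goal sizes (assumed)

W_abs  = rng.normal(0, 0.1, (N_C, N_C + N_I + N_O))        # Abstraction
W_act  = rng.normal(0, 0.1, (N_O, N_C + N_I + N_O + N_G))  # Action
W_pred = rng.normal(0, 0.1, (N_I, N_C + N_O + N_I))        # Prediction
W_eval = rng.normal(0, 0.1, (1, N_C))                      # Evaluation

def step(C, I, O, G):
    C_new  = np.tanh(W_abs  @ np.concatenate([C, I, O]))       # new context
    O_new  = np.tanh(W_act  @ np.concatenate([C, I, O, G]))    # new action
    I_pred = np.tanh(W_pred @ np.concatenate([C_new, O_new, I]))
    r_hat  = float((W_eval @ C)[0])  # predicted reinforcement for state C
    return C_new, O_new, I_pred, r_hat

C, I = np.zeros(N_C), np.zeros(N_I)
O, G = np.zeros(N_O), np.zeros(N_G)
C, O, I_pred, r_hat = step(C, I, O, G)
```

G would come from the Action network one level up, and r_hat (the Evaluation output) would train Action whenever no functioning higher level exists.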
The most important problem for AI in building intelligent agents is to get
them to plan successfully over the long term; in other words, to get their
intelligence to a humanlike level. Other AI problems have been more or less
solved, I think. My idea seems to me a promising solution to that last
problem.
Any comments?
bye,
Arnoud Michel