Available HERE starting at 5/8/2012 at 10am.
Due 5/9/2012 at noon in the TA's departmental mailbox or via email to the TA. NO LATE EXAMS WILL BE ACCEPTED!
Q: What system are we referring to for 2c? Just the general HJB equation for LQR with Gaussian noise?
A: Yes, I'm referring to the HJB equation for LQR with Gaussian noise.
Q: On Q1 section 2, are we supposed to represent m_t in terms of x_t, or in terms of y_t and z_t? Does "linearized about m_t" just mean that A(m_t) and c(m_t) are linear functions of m_t?
A: No, m_t is the state about which you are linearizing the system. And no, "linearized about m_t" means that you calculate A(m_t) and c(m_t) using a Taylor expansion about m_t.
Q: I am a little puzzled regarding Q7 (the pdf question). The p(x) in the diagram has a maximum value of 2, but shouldn't p(x) always be <= 1? Please clarify.
A: Since this is a probability density function (a pdf) over a real-valued space, the density must integrate to one -- this does not mean that a pdf cannot take values greater than one.
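A quick numerical illustration of the point above, using a made-up density rather than the one in the exam figure: the uniform density on [0, 0.5] equals 2 everywhere on that interval, yet its total area is 1. The helper names `uniform_pdf` and `integrate` are hypothetical, and the integral is only a midpoint-rule approximation.

```python
# A density may exceed 1 as long as its total area is 1.
# Illustrative example only (not the exam's figure): Uniform(0, 0.5).


def uniform_pdf(x, low=0.0, high=0.5):
    """Density of Uniform(low, high): 1/(high - low) inside the interval, 0 outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


peak = uniform_pdf(0.25)                  # the density value, greater than 1
area = integrate(uniform_pdf, -1.0, 1.0)  # total probability, approximately 1
print(peak, round(area, 3))
```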
Q: For Q3 (cliff and states), when calculating the expected reward, should we consider the infinite-horizon case (i.e., with equilibrium reached) or the instantaneous case (i.e., considering just the immediate transitions)? Please clarify.
A: This is a good question. I forgot to say in this problem that the expected reward is to be calculated over a three-timestep planning horizon. That is, tell me what the expected reward is in each state, assuming that you will execute the next three steps completely at random. You get a reward of +1 whenever the system moves into the goal state.
Q: In Question 3, does execution completely stop when the system moves into a cliff state? For example, if the system is in state 1 and walks into a cliff on the next step, would the probability of all other states in this case be 0? So what I am really asking is: for each state, do we have to factor in the probability that we did not walk into a cliff on the previous step? Also, the problem does not say which state execution actually begins in. Can it start in any of the three states?
A: Yes, execution stops when the system moves into the cliff state. But, you do need to factor in the probability of reaching the goal state on the second step if you did not walk into a cliff on the first step. For example, suppose you start in the middle state. You have a 0.5 chance of moving either upwards or downwards into the cliff. Once this happens, there is no longer any chance of reaching the goal state. The question is to find the expected reward as a function of state over a three time-step planning horizon (see answer to the last question on the Q/A webpage). One way to think about this is to calculate the probability of reaching the goal state within three time steps.
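The answers above describe a three-step backup with absorbing cliff and goal states, which can be computed exactly. The sketch below assumes a hypothetical layout because the exam's figure is not reproduced on this page: non-goal states 0, 1, 2 in a row (numbered from 0 here), the goal just right of state 2, cliff cells directly above and below every non-goal state, and a wall to the left of state 0 that leaves the system in place. Each of the four moves is taken with probability 1/4. The printed numbers apply only to this made-up layout; the function names are also hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL layout (not the exam's figure): states 0, 1, 2 in a row,
# goal to the right of state 2, cliff above and below every non-goal state,
# wall to the left of state 0 (moving left there keeps the system in place).
ACTIONS = ("up", "down", "left", "right")
GOAL, CLIFF = "goal", "cliff"
N_STATES = 3


def step(s, a):
    """Successor of state s under action a in the hypothetical layout."""
    if a in ("up", "down"):
        return CLIFF
    if a == "left":
        return max(s - 1, 0)
    return GOAL if s == N_STATES - 1 else s + 1


def expected_reward(horizon):
    """v[s] = expected reward over `horizon` uniformly random steps from s.
    Entering the goal pays +1; entering the goal or a cliff ends execution."""
    v = [Fraction(0)] * N_STATES
    p = Fraction(1, len(ACTIONS))
    for _ in range(horizon):
        new_v = []
        for s in range(N_STATES):
            total = Fraction(0)
            for a in ACTIONS:
                nxt = step(s, a)
                if nxt == GOAL:
                    total += p            # reward collected, execution ends
                elif nxt != CLIFF:
                    total += p * v[nxt]   # continue with one fewer step left
            new_v.append(total)
        v = new_v
    return v


print(expected_reward(3))
```

With three steps remaining, this layout gives 1/64, 1/16, and 17/64 for states 0, 1, and 2; each value is also the probability of reaching the goal within three steps, as suggested in the answer above.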
Q: You say that you get a reward of 1 when moving into the goal state, and then execution ends, but what about the possibility that you start in the goal state? Assuming this is possible, should we account for the "right" movement possibility, i.e., imagine there is another column of cliff to the right of the current map? Or should we adjust the probabilities to a 1/3 chance each of going up, down, and left, and ignore right?
A: You don't have to worry about the possibility of starting in the goal state because I asked you for the expected reward as a function of the other states. If the system starts in the goal state, it gets reward and execution immediately ends. You only need to consider the possibility of starting in one of the other states and arriving in the goal state.
Q: Is it safe to assume that the log in the question is base 10?
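A: It doesn't matter: changing the base only rescales the log by a positive constant (log_b x = ln x / ln b for b > 1). Feel free to assume base 10.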
Q: I'm really confused about what you're looking for in Question 2. For part a, are we supposed to write down the formulas and say they're not the same? I think I missed the day we went over LQR with Gaussian process noise, so I'm kind of confused.
A: For a little bit of information on LQR for stochastic systems, there's a slide in Pieter Abbeel's slides here. You should be able to figure out the rest (or look it up on the Internet).
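As a complement to the slides, here is a sketch of the standard finite-horizon LQR backward recursion with additive zero-mean Gaussian process noise. It is discrete-time (the exam asks about the HJB equation, i.e., continuous time) and uses a scalar system with made-up coefficients. The recursion shows the well-known certainty-equivalence property: the noise variance enters only the additive constant c_t of the cost-to-go V_t(x) = p_t x^2 + c_t, not the Riccati term p_t or the feedback gains k_t. The function name `lqr_backward` and all numbers are hypothetical.

```python
# Finite-horizon scalar LQR with additive Gaussian process noise (a sketch).
# HYPOTHETICAL model: x_{t+1} = a*x_t + b*u_t + w_t,  w_t ~ N(0, w_var),
# cost = sum_t (q*x_t^2 + r*u_t^2) + qf*x_T^2.
# Cost-to-go V_t(x) = p_t*x^2 + c_t; optimal control u_t = -k_t*x_t.


def lqr_backward(a, b, q, r, qf, w_var, horizon):
    """Backward Riccati recursion. Returns (gains k_0..k_{T-1}, p_0, c_0)."""
    p, c = qf, 0.0                          # p_T = qf, c_T = 0
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # gain computed from p_{t+1}
        c = c + p * w_var                   # noise adds E[p_{t+1} * w^2] = p_{t+1} * w_var
        p = q + a * p * a - a * p * b * k   # Riccati update from p_{t+1} to p_t
        gains.append(k)
    gains.reverse()                         # gains were built from t = T-1 down to t = 0
    return gains, p, c


k_det, p_det, c_det = lqr_backward(1.0, 1.0, 1.0, 1.0, 1.0, w_var=0.0, horizon=5)
k_sto, p_sto, c_sto = lqr_backward(1.0, 1.0, 1.0, 1.0, 1.0, w_var=2.0, horizon=5)
print(k_det == k_sto, p_det == p_sto, c_det, c_sto)
```

The gains and the quadratic term are identical with and without noise. Only the constant term grows with the noise variance, so the noise raises the expected cost without changing the optimal policy.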
Q: How do we deal with the nonlinear update equation z_{t+1} = z_t - z_t^2 / (z_t + 0.5)? How do we determine the function that combines both update equations when one of them (the one above) is nonlinear?
A: This question is essentially asking you to find a first-order Taylor expansion of the non-linear dynamics. This is what I mean when I ask you to "linearize" the system about m_t.
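For the z-update quoted in the question, the first-order Taylor expansion can be written out directly. Writing f(z) = z - z^2/(z + 0.5), the derivative simplifies to f'(z) = 1 - (z^2 + z)/(z + 0.5)^2 = 0.25/(z + 0.5)^2, so about an operating point m we get f(z) ≈ f(m) + f'(m)(z - m) = A(m) z + c(m). The sketch covers only the z component (the y-update is not reproduced on this page), and the helper names are hypothetical. The expansion is undefined at z = -0.5.

```python
# First-order Taylor linearization of the z-update about an operating point m:
#   f(z) ≈ f(m) + f'(m) (z - m) = A(m) * z + c(m)
# Only the z component is handled here; the y-update is not included.


def f(z):
    """Nonlinear z-update from the question: z_{t+1} = f(z_t)."""
    return z - z * z / (z + 0.5)


def f_prime(z):
    """Analytic derivative: 1 - (z^2 + z)/(z + 0.5)^2 simplifies to 0.25/(z + 0.5)^2."""
    return 0.25 / (z + 0.5) ** 2


def linearize(m):
    """Return (A, c) such that f(z) ≈ A*z + c for z near m."""
    A = f_prime(m)
    c = f(m) - A * m
    return A, c


A, c = linearize(0.5)
print(A, c)  # 0.25 0.125
```

At m = 0.5 the linear model reproduces f exactly at z = m (0.25 * 0.5 + 0.125 = 0.25 = f(0.5)), which is a quick sanity check for any expansion point.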
Q: On Question 6, the constraint x_T != x_1 with ending time tau seems weird, since we can minimize the objective by letting tau = 2 and v_1 be near, but not equal to, 0 (so it does not even seem like an optimization problem). Is there a typo?
A: Okay, now I get your point. The question specifies a particular x_1 and x_T. As long as x_T != x_1, there is a tradeoff between a small velocity with a long time and a short time with a large velocity.
Q: For Question 4, can I just find the second derivative of -log(x) and then show, e.g., from its graph, that the second derivative of f(x) is greater than or equal to zero for all x > 0? Since the second derivative takes values greater than 0, it would follow that f is a convex function.
A: Since the question asks for the second derivative, you need to give that. If you can find the first derivative, why can't you just differentiate once more?
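Differentiating twice gives, for f(x) = -log_b(x) = -ln(x)/ln(b), the first derivative f'(x) = -1/(x ln b) and the second derivative f''(x) = 1/(x^2 ln b), which is positive for every x > 0 when b > 1 (for b = e it is simply 1/x^2). The sketch below only cross-checks that closed form against a central finite difference at a few points; it is a sanity check, not a substitute for the derivation. The helper names are hypothetical.

```python
import math

# Cross-check f''(x) = 1 / (x^2 * ln b) for f(x) = -log_b(x), x > 0, b > 1.


def neg_log(x, base=math.e):
    return -math.log(x, base)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


for base in (math.e, 10):
    for x in (0.5, 1.0, 3.0):
        exact = 1.0 / (x * x * math.log(base))
        approx = second_derivative(lambda t: neg_log(t, base), x)
        print(base, x, exact, approx)
```

Changing the base only rescales f'' by the positive constant 1/ln b, so the sign (and hence convexity) does not depend on the base.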
Q: On Q9, does the expression "to within a constant factor" mean representing the function with constant parameters? On Q9 b), can we use a delta function to represent the probability density function, in case there is a peak value that a pdf cannot express in any other way?
A: No, that phrase means that you don't need to solve for the normalization constant explicitly. You only need to find an expression proportional to the pdf -- not the pdf itself. Yes, using a delta function is fine. It's also fine to express the pdf using an if-otherwise expression.
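To make "proportional to the pdf" concrete: if g(x) is proportional to p(x), then p(x) = g(x)/Z with Z equal to the integral of g, and Z never needs to be written down for an answer "to within a constant factor". The sketch uses a made-up density, g(x) = x on [0, 1] and 0 otherwise (written as an if-otherwise expression), for which Z = 1/2 and p(x) = 2x. The helper names are hypothetical, and Z is computed numerically only to show what was left out.

```python
# "To within a constant factor": g(x) ∝ p(x), with p(x) = g(x) / Z and Z = ∫ g(x) dx.
# Illustrative density only (not the exam's): g(x) = x on [0, 1], 0 otherwise.


def g(x):
    """Unnormalized density: proportional to the pdf, not equal to it."""
    return x if 0.0 <= x <= 1.0 else 0.0


def normalizer(f, a, b, n=10_000):
    """Midpoint-rule estimate of Z = ∫_a^b f(x) dx."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


Z = normalizer(g, 0.0, 1.0)
print(Z)                # approximately 0.5, so p(x) = 2x on [0, 1]
print(g(0.8) / g(0.4))  # ratios of g equal ratios of p; Z cancels
```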
Q: Question 1b: I understand that we need to linearize about m_t. However, what I am asking is how we get the function that we apply the Taylor expansion to. How do we get that function from the nonlinear equations we have? Once we have the function, the rest (i.e., the partial derivatives) is pretty straightforward.
A: Sorry if I was being unclear. I have given you the non-linear process dynamics in terms of y_t and z_t. You need to combine these equations into a single equation in terms of x_t. Then you need to take the Taylor expansion.
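One way to picture the combining step is to stack the two components into a state vector x_t = (y_t, z_t), so that x_{t+1} = F(x_t), and then take the first-order Taylor expansion F(x) ≈ F(m) + J(m)(x - m) = A(m) x + c(m), where J is the Jacobian. In the sketch below, the z-update is the one quoted earlier on this page, but the y-update is a made-up placeholder (the exam's y-update is not reproduced here) that must be replaced by the real one. The Jacobian is estimated by central differences; the function names are hypothetical.

```python
# Stack x_t = (y_t, z_t) and linearize x_{t+1} = F(x_t) about m:
#   F(x) ≈ F(m) + J(m) (x - m) = A(m) x + c(m),  A(m) = J(m),  c(m) = F(m) - J(m) m


def F(x):
    y, z = x
    y_next = y + 0.1 * z              # HYPOTHETICAL placeholder: substitute the exam's y-update
    z_next = z - z * z / (z + 0.5)    # z-update quoted earlier on this page
    return [y_next, z_next]


def jacobian(func, m, h=1e-6):
    """Central-difference Jacobian: rows index outputs, columns index inputs."""
    n = len(m)
    cols = []
    for j in range(n):
        plus = list(m)
        plus[j] += h
        minus = list(m)
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # transpose to rows


def linearize(func, m):
    """Return (A, c) with func(x) ≈ A x + c near m."""
    A = jacobian(func, m)
    fm = func(m)
    c = [fm[i] - sum(A[i][j] * m[j] for j in range(len(m))) for i in range(len(m))]
    return A, c


A, c = linearize(F, [1.0, 0.5])
print(A, c)
```

With the placeholder y-update, the expansion about m = (1, 0.5) gives A ≈ [[1, 0.1], [0, 0.25]] and c ≈ [0, 0.125]; the z row matches the hand-derived A(m) = 0.25/(m + 0.5)^2 for the z-update.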
Q: In-person question
A: For Question 1b, you're only looking for a first-order Taylor expansion.
Q: In-person question
A: Question 7c: The question is to provide pseudocode. You need to list the three or four steps in importance sampling that you would need to execute in order to estimate the expectation.
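For reference, here is one possible sketch of those steps (the self-normalized variant, which also works when the target density is known only up to a constant): (1) draw samples from a proposal q, (2) weight each sample by p(x)/q(x), (3) normalize the weights, (4) return the weighted average of f. The target p, proposal q, and f below are made-up stand-ins, not the ones in Question 7: p is a standard normal with its normalizer dropped, q is N(0, 2^2), and f(x) = x^2, so the true value is E_p[X^2] = 1. The function names are hypothetical.

```python
import math
import random


def importance_sampling(f, p_unnorm, q_sample, q_pdf, n, rng):
    """Self-normalized importance sampling estimate of E_p[f(X)]."""
    xs = [q_sample(rng) for _ in range(n)]                # 1. sample from the proposal q
    ws = [p_unnorm(x) / q_pdf(x) for x in xs]             # 2. weight each sample by p/q
    total = sum(ws)                                       # 3. normalize the weights
    return sum(w * f(x) for w, x in zip(ws, xs)) / total  # 4. weighted average of f


SIGMA_Q = 2.0  # hypothetical proposal standard deviation


def p_unnorm(x):
    """Standard normal density with its normalizing constant dropped."""
    return math.exp(-0.5 * x * x)


def q_pdf(x):
    """Density of the proposal N(0, SIGMA_Q^2)."""
    return math.exp(-0.5 * (x / SIGMA_Q) ** 2) / (SIGMA_Q * math.sqrt(2 * math.pi))


def q_sample(rng):
    return rng.gauss(0.0, SIGMA_Q)


est = importance_sampling(lambda x: x * x, p_unnorm, q_sample, q_pdf, 100_000, random.Random(0))
print(est)  # close to E_p[X^2] = 1
```

The proposal is chosen wider than the target so that the weights p/q stay bounded; a proposal with lighter tails than p can make the estimate very noisy.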
Q: In-person question
A: Question 9a: This first part is really similar to part of what you had to do in homework 5.
Q: Question 8: Are we to assume what actions the agent will take, similar to what you gave in Assignment 4?
A: Yes, in this question, I told you what the actions are.