Due 2/13/2012.
This assignment is to code the Cliff Walking example (Example 6.6) in Sutton and Barto, "Reinforcement Learning: An Introduction", Chapter 6. You should solve the problem using both SARSA and Q-learning. Use epsilon-greedy exploration with epsilon=0.1 (the agent takes a random action 10 percent of the time in order to explore).
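As a starting point, here is a minimal Python/NumPy sketch of the two tabular updates on the cliff gridworld, assuming the standard 4x12 layout from Example 6.6 (start at the bottom-left, goal at the bottom-right, cliff between them; -1 per step, -100 for stepping into the cliff, which returns the agent to the start). The helper names (`step`, `eps_greedy`, `run`) and the learning-rate choice are illustrative, not prescribed by the assignment; your MATLAB solution can follow the same structure.

```python
import numpy as np

ROWS, COLS = 4, 12
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START, GOAL = (3, 0), (3, 11)
ALPHA, GAMMA, EPSILON = 0.5, 1.0, 0.1        # alpha is an illustrative choice

def step(state, a):
    """Apply action a, clip to the grid, and handle the cliff and goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 0 < c < 11:                # stepped into the cliff
        return START, -100.0, False          # reward -100, back to start
    return (r, c), -1.0, (r, c) == GOAL      # usual step reward of -1

def eps_greedy(Q, state, rng):
    """Take a random action 10% of the time, otherwise the greedy one."""
    if rng.random() < EPSILON:
        return int(rng.integers(4))
    return int(np.argmax(Q[state]))

def run(method="sarsa", episodes=500, seed=0):
    """Run one training experiment; returns Q and the per-episode reward."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, 4))
    returns = []
    for _ in range(episodes):
        s, total, done = START, 0.0, False
        a = eps_greedy(Q, s, rng)
        while not done:
            s2, rwd, done = step(s, a)
            total += rwd
            a2 = eps_greedy(Q, s2, rng)
            if method == "sarsa":            # on-policy: bootstrap from the
                nxt = Q[s2][a2]              # action actually taken next
            else:                            # Q-learning, off-policy:
                nxt = Q[s2].max()            # bootstrap from the greedy action
            target = rwd + GAMMA * nxt * (not done)
            Q[s][a] += ALPHA * (target - Q[s][a])
            s, a = s2, a2
        returns.append(total)
    return Q, returns
```

The only line that differs between the two methods is the bootstrap target, which is exactly what produces the different paths in the required plot: SARSA's target includes exploratory actions, so it learns the safer path away from the cliff, while Q-learning's greedy target learns the optimal path along the cliff edge. The `returns` list from `run` is what you would plot for item 2 below.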
The programming should be done in MATLAB. Students may get access to MATLAB here. Alternatively, students may code in Python (using NumPy). If you would rather code in a different language, please see Dr. Platt.
Students should submit their homework via email to the TA (zihechen@buffalo.edu) in the form of a ZIP file that includes the following:
1. A PDF of a plot of the gridworld that illustrates the paths found by Q-learning and SARSA. It should look like the diagram in Figure 6.13 in Sutton and Barto.
2. A PDF of a plot of reward per episode for SARSA and Q-learning.
3. A text file showing output from a sample run of your code.
4. A directory containing all source code for your project.