f(r(s,a)) = r(s,a) + c for a constant c
f(r(s,a)) = c * r(s,a) for a constant c
f(r(s,a)) = log(r(s,a))
and so on. By interesting, I mean that you want
the property that
if r(s,a) < r(s,a') then f(r(s,a)) < f(r(s,a'))
If you don't have this property then clearly
the optimal policy will be changed (in general).
You could also consider transformations that
involve other rewards.
/*** Parameters for simulation ***/
#define GRAVITY 9.8
#define MASSCART 1.0
#define MASSPOLE 0.1
#define TOTAL_MASS (MASSPOLE + MASSCART)
#define LENGTH 0.5 /* actually half the pole's length */
#define POLEMASS_LENGTH (MASSPOLE * LENGTH)
#define FORCE_MAG 10.0
#define TAU 0.02 /* seconds between state updates */
#define FOURTHIRDS 1.3333333333333
x is the cart's position (float)
x_dot is the cart's velocity (float)
theta is the pole's angle (float)
theta_dot is the angular velocity of the pole (float)
Here is a routine that updates the state variables to
what they would be TAU seconds later. It calls sin and cos, so
include <math.h>. The name update_state and the motion argument
(0 = stationary, positive = moving forward, negative = moving
backwards) are placeholders; adapt them to your own code.
void update_state(int motion)
{
    float xacc, thetaacc, force, costheta, sintheta, temp;

    /*** Force on the cart, chosen from how you are moving. ***/
    if (motion == 0)          /* stationary */
        force = 0.0;
    else if (motion > 0)      /* moving forward */
        force = FORCE_MAG;
    else                      /* moving backwards */
        force = -FORCE_MAG;

    costheta = cos(theta);
    sintheta = sin(theta);
    temp = (force + POLEMASS_LENGTH * theta_dot * theta_dot * sintheta) / TOTAL_MASS;
    thetaacc = (GRAVITY * sintheta - costheta * temp)
        / (LENGTH * (FOURTHIRDS - MASSPOLE * costheta * costheta / TOTAL_MASS));
    xacc = temp - POLEMASS_LENGTH * thetaacc * costheta / TOTAL_MASS;

    /*** Update the four state variables, using Euler's method. ***/
    x += TAU * x_dot;
    x_dot += TAU * xacc;
    theta += TAU * theta_dot;
    theta_dot += TAU * thetaacc;
}
There is a Bayes net that will satisfy both of these. Use the example in Section 6.9.1 for guidance. Also, don't worry if the edges you put in the Bayes net do not really correspond to the causality that you would expect. If you satisfy the two conditions above then you have given the Bayes belief network that represents the conditional independence assumptions of the naive Bayes classifier.
Be sure to give the conditional probability table associated with the node Wind.
P(a|f,t)P(f)P(t)+P(a|f,!t)P(f)P(!t)+P(a|!f,t)P(!f)P(t)+P(a|!f,!t)P(!f)P(!t)
= .5*.01*.02 + .99*.01*.98 + .85*.99*.02 + .0001*.99*.98
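If you want to check the arithmetic, here is a minimal C sketch of the same sum. The function name marginal_a and the array layout are my own choices for illustration; the numbers in the usage note are the ones from the line above.

```c
/* P(a) obtained by summing P(a|f,t) P(f) P(t) over the four settings
   of f and t.  p_a_given[i][j] holds P(a | f=i, t=j), where index 1
   means true and index 0 means false. */
double marginal_a(double p_f, double p_t, double p_a_given[2][2])
{
    double total = 0.0;
    for (int f = 0; f <= 1; f++) {
        for (int t = 0; t <= 1; t++) {
            double pf = f ? p_f : 1.0 - p_f;   /* P(f) or P(!f) */
            double pt = t ? p_t : 1.0 - p_t;   /* P(t) or P(!t) */
            total += p_a_given[f][t] * pf * pt;
        }
    }
    return total;
}
```

With P(f) = .01, P(t) = .02 and the table {{.0001, .85}, {.99, .5}}, the four terms are .0001, .009702, .01683 and .00009702, which sum to about .0267.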
CC = /pkg/gnu/bin/gcc
Then to compile it use the command
/pkg/gnu/bin/make
Also, in svm_base.c you may need to change the call to sqrtf to sqrt.
The files README and INSTALL give additional guidance.
CC = /pkg/gnu/bin/gcc
Then to compile it use the command
/pkg/gnu/bin/make
Finally, I have edited the training and testing data so it has the appropriate paths for you. Save trainset.zip into the same directory where you put the code and faces_4.tar before you used tar xvf faces_4.tar. Then use
unzip trainset
and you will have the training and testing sets ready. You should now be able to follow the directions given. Note that xv is found in pkg/X11/bin/xv. You should be able to use it by just typing xv followed by one of the images.
Each of the hidden units has a single real-valued output. When visualizing what the hidden units are doing, you can represent each hidden unit as a 30x32 grid of weights which correspond to the weights from the inputs to the hidden unit and by a vector of 30 weights which correspond to the weights to the output units. However, each hidden unit itself produces a single output. There are different options as to whether you directly use the dot product of the vectors w and x or if you threshold it (or in some other way guarantee that the outputs are between -1 and 1).
To help be sure this is clear, let's compute the total number of weights in the ALVINN system:
# weights from input layer to hidden layer = 960*4 = 3840
# weights from hidden layer to output layer = 4*30 = 120
# "w_0" weights (one per hidden and output unit) = 34
So the total number of weights is 3840+120+34 = 3994.
What would happen if the hidden units were removed and instead the input and output layers were directly connected? Then you would need
960*30 + 30 = 28,830 weights
since each input would be connected to each output (960*30) and there are 30 "w_0" weights. This reduction is the value of the hidden units. As we talked about, the more hidden units you add, the more expensive the training, but you gain the ability to create more "intermediate" features, and hence if they are needed you can obtain better accuracy. So you want to have as few hidden units as you need to represent the target.
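The two counts can be reproduced with a short C sketch. The function names are made up for illustration; the layer sizes in the usage note are the ALVINN numbers used above.

```c
/* Weights in a network with one hidden layer: input-to-hidden,
   hidden-to-output, plus one "w_0" per hidden unit and per output unit. */
int weights_with_hidden(int n_in, int n_hidden, int n_out)
{
    return n_in * n_hidden + n_hidden * n_out + (n_hidden + n_out);
}

/* Weights when inputs connect directly to outputs: one weight per
   input/output pair plus one "w_0" per output unit. */
int weights_direct(int n_in, int n_out)
{
    return n_in * n_out + n_out;
}
```

Calling weights_with_hidden(960, 4, 30) gives 3994 and weights_direct(960, 30) gives 28830, matching the totals above.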
As I mentioned, each weight will be initialized to a random value between -1 and 1 (or sometimes a smaller range like -.1 to .1 is used). Next class we will talk about how to adapt what we saw today for updating the weights for a single neuron to do the update for a full neural network.
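One simple way to do this initialization in C is sketched below. The function name init_weights is hypothetical, and using rand() is an assumption made to keep the example short; any uniform random number generator would do.

```c
#include <stdlib.h>

/* Fill w[0..n-1] with random values between -range and range.
   Pass range = 1.0 for the interval -1 to 1, or 0.1 for -.1 to .1.
   Call srand() beforehand if you want a different sequence each run. */
void init_weights(double *w, int n, double range)
{
    for (int i = 0; i < n; i++)
        w[i] = range * (2.0 * rand() / RAND_MAX - 1.0);
}
```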
Take a look at Figure 4.1 (on page 84), which shows the final weights for one of the hidden units of ALVINN, and Figure 4.10 (on page 113), which shows the weights for all three hidden units of the face recognition network after 1 iteration and then after 100 iterations of training. (In Figure 4.10, they use the top left corner of the 30x32 weights from the input to hidden layer to show "w_0". For the weights from the hidden layer to the outputs, "w_0" is shown as the leftmost weight followed by the 3 weights to the output units.) I think looking at this will help you understand the role of the hidden units.
w_i = w_i + eta (V_train(b) - V_hat(b)) x_i / Z
The only change is that x_i in the formula given in the text
is being replaced by x_i / Z. (That is, each x_i value is
being divided by Z.)
Let me briefly explain why this should be done. The idea of the LMS rule is that after the update the value of V_hat(b) should be
V_hat(b) + eta (V_train(b) - V_hat(b)).
So as an example, if eta = .1, V_hat(b)=10 and V_train(b)=100 then
you want the update to modify the weights so that with the new weights
V_hat(b) = 10 + .1 * 90 = 19.
Furthermore, the idea is to adjust the weights based on the relative values of x_i. That is why you want the x_i in the update rule. Without the normalization constant Z, the weights can grow too large.
However, with the adjustment notice that the sum over all i of x_i / Z = 1, and this is what guarantees that the total change in the value of V_hat will just be eta (V_train(b) - V_hat(b)). Hence V_hat will always move closer to the correct value, by a fraction eta of the error, but can never overshoot it. This is what is wanted.
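Here is a minimal C sketch of the normalized update, assuming Z is the sum of the x_i values (which is what makes the x_i / Z sum to 1). The function names are mine. The example in the usage note uses features that are all 0 or 1, where the change in V_hat is exactly eta (V_train(b) - V_hat(b)); for other feature values the change works out to that amount times (sum of x_i^2) / Z.

```c
/* V_hat as the weighted sum of the features; x[0] is the constant 1
   that goes with w_0. */
double v_hat(const double *w, const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += w[i] * x[i];
    return sum;
}

/* One normalized LMS step: w_i = w_i + eta (V_train - V_hat) x_i / Z.
   Assumption: Z is the sum of the x_i. */
void lms_update_normalized(double *w, const double *x, int n,
                           double eta, double v_train)
{
    double z = 0.0;
    for (int i = 0; i < n; i++)
        z += x[i];
    if (z == 0.0)
        return;                       /* nothing to normalize by */

    double err = v_train - v_hat(w, x, n);   /* computed before any change */
    for (int i = 0; i < n; i++)
        w[i] += eta * err * x[i] / z;
}
```

With w = {4, 3, 5, 3} and x = {1, 1, 0, 1}, V_hat starts at 10. One step with eta = .1 and V_train = 100 adds 3 to each weight whose feature is 1, so the new V_hat is 7 + 6 + 0 + 6 = 19, as in the example above.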
V_hat = w_0 + w_1 * x_1 + ... + w_n * x_n
where n is the number of features. Another way to write this is
V_hat = w_0 * 1 + w_1 * x_1 + ... + w_n * x_n
Here you can see that 1 fills the role of x_0.