Lab 3: Linear Regression and Functions with Multiple Outputs
In class on Tuesday we talked about matrix operations. Today we will practice some of them and then apply them to the problem of linear regression. Regression (also called fitting, or least squares) is a method of fitting a line (or other function) through a set of points. In today's problems we will create sample data and go through the steps of fitting a line to it. You will then be asked to write a function that fits a line to a collection of sample points, and to plot the best fit line through them.

Problems

  1. First, let's create a collection of points where we know how the x and y values are related. Keep track of these points with 2 arrays (called x and y), each with the same number of elements. Create a list of points (x,y) where x varies from 1 to 20 and y is defined to be 2*x.
  2. Now, to make it interesting, let's add a little bit of random noise to our points. Make an array n the same size as y using the "rand" command.
  3. Use "help rand" to look up what kind of random numbers the rand function creates. We want the noise to have a mean of approximately zero (otherwise the noise would be called biased). Change the command you used in step 2 so that it makes unbiased noise.
  4. Now make an array yPrime by adding your y array to your noise array, and plot your x values vs. your yPrime values.
  5. Now, given the points x,yPrime, we'd like to solve for the linear relationship between these values. Solve for the "best fit" value a such that yPrime = a*x, using the MATLAB "/" operator. (When x and yPrime are row vectors, yPrime/x gives the least-squares value of a.)
  6. Now we'd like to draw the line y = a*x. Create values "yLine" using the a value and the x vector, then plot these values on the same plot as the (noisy) points. Check with the TA and show your plot.
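The steps above can be sketched roughly as follows. This is only one possible solution, assuming x and y are row vectors; the variable aCheck is not part of the lab and is added here only to show that the "/" operator agrees with the closed-form least-squares slope sum(x.*yPrime)/sum(x.^2).

```matlab
% Sketch of Part I, assuming row vectors throughout.
x = 1:20;                  % x values 1, 2, ..., 20
y = 2*x;                   % exact relationship y = 2x
n = rand(size(y)) - 0.5;   % rand is uniform on (0,1); subtracting 0.5 gives zero mean
yPrime = y + n;            % noisy observations

a = yPrime / x;            % least-squares slope for yPrime = a*x
aCheck = sum(x .* yPrime) / sum(x.^2);   % same slope from the closed-form formula (illustration only)

yLine = a*x;               % points on the best fit line
plot(x, yPrime, 'b.');     % noisy points
hold on;
plot(x, yLine, 'r-');      % fitted line on the same axes
hold off;
```

Because the noise is small compared with the y values, the fitted a should come out close to 2.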
Part II

Now we are going to write a function that computes the first-order fit and returns both the best fit value a and the mean error of the fit. The syntax for writing a function that has two outputs is:
function [a, err] = regression(x,y)
 
Using steps like the ones above, write a function that:
  1. computes the best fit a (so that y is approximately a*x),
  2. uses that best fit to create "predicted y values" for each x value,
  3. computes the "error", the absolute value of the difference between the y values and the predicted y values,
  4. computes the "mean error", which is the mean of all the errors.
  5. When your function finishes, it will "return" whatever values are in the variables "a" and "err", so make sure you have assigned something to these variables.
Check your function with the values of x,y that you created in the first part of this lab.
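A minimal sketch of such a function is shown here, assuming it is saved in its own file named regression.m and that x and y are row vectors of the same length. The intermediate names yPred and absErr are choices made for this example, not required by the lab.

```matlab
function [a, err] = regression(x, y)
% REGRESSION  Fit y ~ a*x and report the mean absolute error of the fit.
%   Assumes x and y are row vectors of equal length.
  a      = y / x;            % least-squares slope
  yPred  = a * x;            % predicted y value for each x
  absErr = abs(y - yPred);   % absolute error at each point
  err    = mean(absErr);     % mean of the absolute errors
end
```

As a quick check, calling [a, err] = regression([1 2 3], [2 4 6]) should give a equal to 2 and err equal to 0 (up to rounding), since those points lie exactly on the line y = 2*x.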

Now we are going to use this function to check the "small angle" approximation. This is a common approximation used in physics classes: for small angles theta (when theta is close to zero), sin(theta) is approximately theta.
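To get a feel for how good the approximation is before fitting anything, a quick numerical comparison like the one below may help. The particular angles are arbitrary choices for illustration, and all angles are in radians.

```matlab
% Compare sin(theta) with theta for a few angles (in radians).
theta  = [0.01 0.1 0.5 1.0];
relErr = abs(sin(theta) - theta) ./ abs(sin(theta));   % relative error of the approximation
disp([theta; relErr]);   % first row: angles, second row: relative errors
```

The relative error is below 0.2% at theta = 0.1 but grows to roughly 19% at theta = 1, so the approximation is only useful close to zero.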

You might want to put this code into a script so that you can test, change, and re-run parts of it (to open a script, you might type "edit testSmallAngle"), and then write the lines below there. For a script you don't put any of the "function ..." business at the beginning.

for ix = 1:10
  w = ix./10;                   % half-width of the interval, from 0.1 to 1.0
  x  = "make 100 points between -w and w";   % replace this placeholder with your own code
  y  = "compute the sin of those points";    % replace this placeholder with your own code
  [a, err] = regression(x,y);   % compute the regression line and the error
  plot(x,y,'b.');               % draw the points on the figure
  hold on;
  plot(x,a*x,'r-');             % draw the best fit line
  hold off;
  disp([a, err]);               % show the fitted slope and the fitting error
end
Check with the TA and show your plot.
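For comparison once you have your own version working, here is one possible way to fill in the two placeholder lines. It is a sketch only: the fit is written inline (y / x) rather than calling regression so that the snippet runs on its own, plotting is left out, and the names halfWidth, slopes, and errs are chosen for this example.

```matlab
% One possible way to fill in the placeholders, with plotting left out.
slopes = zeros(1, 10);
errs   = zeros(1, 10);
for ix = 1:10
  halfWidth = ix / 10;                        % interval is [-halfWidth, halfWidth]
  x = linspace(-halfWidth, halfWidth, 100);   % 100 evenly spaced points
  y = sin(x);                                 % sine of those points
  a = y / x;                                  % inline least-squares slope
  slopes(ix) = a;
  errs(ix)   = mean(abs(y - a*x));            % mean absolute fitting error
end
disp([slopes; errs]);   % first row: fitted slopes, second row: mean errors
```

If the small angle approximation holds, the fitted slope should stay close to 1 for narrow intervals, and the fitting error should grow as the interval widens.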