CS 527A, Spring 2002, Homework 4


You are expected to complete 100 points worth of homework problems. This homework is due on Wednesday March 27 with the standard late policy applying.

A signed cover sheet for Homework 4 must be submitted with your homework.


  1. (20 pts) In this problem we look at instance-based learning.

  2. (20 pts) Suggest a lazy version of the eager decision tree learning algorithm ID3. Be sure to give a very clear description of your lazy algorithm. What are the advantages and disadvantages of your lazy learning algorithm as compared to the original eager algorithm? I expect a well-thought-out discussion of this.

  3. (10 pts) Consider the example application of Bayes rule in Section 6.2.1 of the text. Suppose the doctor decides to order a second laboratory test for the sample patient, and suppose the second test returns a positive result as well. What are the posterior probabilities of cancer and !cancer following these two tests? Assume that the two tests are independent.

  4. (10 pts) In the example of Section 6.2.1 we computed the posterior probability of cancer by normalizing the quantities

    P(+|cancer)*P(cancer) and P(+ | !cancer) * P(!cancer)

    so that they summed to one. Use Bayes theorem and the theorem of total probability (see Table 6.1) to prove that this method is valid (i.e. that normalizing in this way yields the correct value for P(cancer|+)).
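    The normalization step described above can be sketched in a few lines of Python. This is only an illustration of the mechanics, not a proof: the function name normalize_posteriors and the probabilities below are hypothetical and are deliberately not the values from Section 6.2.1.

```python
def normalize_posteriors(priors, likelihoods):
    """Return P(h | obs) for each hypothesis h by normalizing P(obs | h) * P(h)."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    # When the hypotheses are mutually exclusive and exhaustive, this sum
    # is P(obs) by the theorem of total probability.
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical numbers for illustration only (not the textbook's values).
priors = {"cancer": 0.01, "!cancer": 0.99}       # P(h)
likelihoods = {"cancer": 0.9, "!cancer": 0.1}    # P(+ | h)

posteriors = normalize_posteriors(priors, likelihoods)
print(posteriors)
```

    The returned values always sum to one, because each product is divided by the same total.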

  5. (10 pts) Draw the Bayesian belief network that represents the conditional independence assumptions of the naive Bayes classifier for the PlayTennis problem of Section 6.9.1. Give the conditional probability table associated with the node Wind.

  6. (10 pts) Consider the concept learning algorithm FindG, which outputs a maximally general consistent hypothesis (e.g. some maximally general member of the version space).

  7. (20 pts) Consider the Minimum Description Length (MDL) principle applied to the hypothesis space H consisting of conjunctions of up to n boolean attributes (i.e. monotone monomials). Assume that each hypothesis is encoded simply by listing the attributes present in the hypothesis, where the number of bits needed to encode any one of the n boolean attributes is log2 n. Suppose the encoding of an example given the hypothesis uses zero bits if the example is consistent with the hypothesis and uses log2 m bits otherwise (to indicate which of the m examples was misclassified--the correct classification can be inferred to be the opposite of that predicted by the hypothesis).
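    As a rough sketch of the encoding just described, the total description length of a hypothesis is (number of attributes listed) * log2 n plus (number of misclassified examples) * log2 m. The function name description_length and the eight training examples below are made up for illustration and are not part of the assignment.

```python
import math


def description_length(hypothesis, examples, n):
    """Bits to encode a monotone conjunction plus the examples it misclassifies.

    hypothesis: set of attribute indices (0..n-1) that must all be true.
    examples:   list of (attribute_tuple, label) pairs.
    """
    m = len(examples)
    hypothesis_bits = len(hypothesis) * math.log2(n)
    # An example is misclassified when the conjunction's prediction differs
    # from its label; each such example costs log2(m) bits to point out.
    errors = sum(
        1 for x, label in examples
        if all(x[i] for i in hypothesis) != label
    )
    return hypothesis_bits + errors * math.log2(m)


# Hypothetical data: n = 4 attributes, m = 8 examples.
examples = [
    ((1, 1, 0, 0), True), ((1, 1, 1, 0), True),
    ((1, 1, 0, 1), True), ((1, 1, 1, 1), True),
    ((1, 0, 1, 1), False), ((0, 1, 0, 1), False),
    ((0, 0, 1, 0), False), ((1, 0, 0, 0), False),
]

for h in (set(), {0}, {0, 1}):
    print(sorted(h), description_length(h, examples, n=4))
```

    On this made-up data the two-attribute conjunction is consistent with every example, so it pays only for its own encoding, while the shorter hypotheses pay more for their errors than they save.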

  8. (100 pts) In this problem you use some provided code to explore how the naive Bayes learning algorithm can be applied to text categorization. Here's the provided code. Here's the assignment from Tom Mitchell to help guide you.

    After running the provided install program you will need to edit the Makefile to modify the line that begins with "CC =" to be

    CC = /pkg/gnu/bin/gcc
    
    Then to compile it use the command
    /pkg/gnu/bin/make
    
    Also, in svm_base.c you may need to change the call to sqrtf to sqrt. There are many interesting variations on this problem. For example, you can combine NB with EM to make use of unlabeled data. If you are interested in any of these variations, please talk to me.

  9. (20 pts) In this problem you will compute posterior probabilities based on a given Bayesian belief network and some partial observations.

    Consider the Fire Alarm example from the following applet, but remove the attribute "reporting." For each of the following three sets of observations, show your computation for obtaining the posterior probabilities of all variables.

  10. (40 pts) The students in a machine learning course had the following semester averages:
    85,84,70,82,87,94,88,76,65,79,93,68,70,59,78,99,95,58,85,82,83,61
    
    Only 3 different letter grades, {A,B,C}, will be assigned. Use the k-means algorithm (with k=3) to assign grades. Explore how changing the initial values for the means of the three clusters (using a standard deviation of 1) affects both the speed of convergence and the resulting clusters. As just one possibility, you could suppose that the instructor begins the semester expecting the mean grade for an A to be 90, for a B to be 80, and for a C to be 70. Try many variations, then report on and discuss your findings.
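    One possible starting point for these experiments is sketched below. Note the assumption: this sketch uses hard nearest-mean assignment (Lloyd's algorithm) rather than a soft, EM-style update with a fixed standard deviation, so it may not match the variant intended in class. The function name kmeans_1d and the max_iters cap are illustrative choices.

```python
def kmeans_1d(values, initial_means, max_iters=100):
    """Hard-assignment k-means on scalars.

    Returns (means, clusters, iterations_used).
    """
    means = list(initial_means)
    clusters = [[] for _ in means]
    for iteration in range(1, max_iters + 1):
        # Assignment step: put each value in the cluster with the nearest mean
        # (ties go to the cluster listed first).
        clusters = [[] for _ in means]
        for v in values:
            nearest = min(range(len(means)), key=lambda j: abs(v - means[j]))
            clusters[nearest].append(v)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        new_means = [
            sum(c) / len(c) if c else means[j]
            for j, c in enumerate(clusters)
        ]
        if new_means == means:  # no change, so the assignment is stable
            return means, clusters, iteration
        means = new_means
    return means, clusters, max_iters


grades = [85, 84, 70, 82, 87, 94, 88, 76, 65, 79, 93,
          68, 70, 59, 78, 99, 95, 58, 85, 82, 83, 61]

means, clusters, iterations = kmeans_1d(grades, initial_means=[90, 80, 70])
print(iterations, means)
for letter, cluster in zip("ABC", clusters):
    print(letter, sorted(cluster, reverse=True))
```

    Calling kmeans_1d again with different initial_means is a simple way to compare how many iterations each starting point needs and whether the final clusters differ.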

  11. (30 pts) Read one of the following papers and write a paper critique. Please write the summary of the paper so that someone in this class who has not read the paper would understand what it was about at a high level and would understand one part at a deeper level.

  12. CHOOSE YOUR OWN ADVENTURE. You can propose any additional homework options (or variations of those given above) to Dr. Goldman. If approved, a point value will be assigned.