CS 527A, Spring 2002, Homework 5


You are expected to complete 100 points' worth of homework problems. This homework is due on Wednesday, April 10; the standard late policy applies.

Alternatively, you may choose one of two 150-point projects, each of which counts for both Homework 5 and Homework 6. If you select either of these projects, then by April 10 you must demonstrate to me that you have made significant progress towards completing your project, but the homework will not be due until Wednesday, April 17. A signed cover sheet for Homework 5 must be submitted with your homework.


  1. (20 pts) Here we look at the problem of boosting the confidence parameter (delta) in the PAC model. Suppose that you are given a PAC-like learning algorithm A such that, with probability at least 1/10, algorithm A returns a hypothesis that has error at most epsilon with respect to the distribution D over X (from which both the training data and the test data are drawn). Give a PAC algorithm A' that uses A as a black box (i.e., you cannot modify A itself) and that, with probability at least 1 - delta, returns a hypothesis that has error at most epsilon with respect to distribution D. Be sure to give both the time complexity and the sample complexity of A' as a function of the time complexity T_A and the sample complexity S_A of A. Also prove that A' meets all requirements for being a PAC algorithm, assuming that T_A and S_A are polynomial in the number of bits needed to encode an example, the number of bits needed to encode the target concept, and 1/epsilon.

  2. (20 points) We have seen in class that an algorithm that is capable of finding a hypothesis consistent with a given set of labeled examples can be turned into a PAC learning algorithm. Argue that the converse is true: a PAC learning algorithm can be used (with high probability) to find a hypothesis consistent with a given set of labeled examples.
    Hint: Show how to define an appropriate probability distribution on the given set of examples, and how to set epsilon, so that the PAC learning algorithm returns a hypothesis consistent with the given set of examples with probability at least 1-delta.

  3. (20 points) Consider the space of instances X corresponding to all points in d-dimensional space. Compute the VC dimensions for the following hypothesis spaces and clearly explain why your answer is correct.

  4. (20 points) In this problem we consider the concept class of monomials.

  5. (50 points) In this problem you'll compare the bounds of a PAC algorithm with what is found experimentally for the problem of learning rectangles in the plane. You will need to write a procedure that takes as input a set of positive and negative examples and returns a rectangle consistent with this data. You will also need to write a procedure that randomly generates a point in the plane and labels it according to a target rectangle. Generate a variety of target rectangles at random, and try some distributions besides the uniform distribution. For three different choices of the distribution over the example space X, plot the average generalization error (over about 100 different target concepts) as a function of the number of training examples, m. You can use 1000 random examples (drawn from the same distribution used to generate the training data) to measure the generalization error. Along with each plotted point, show a 95% confidence interval. On the same graph, plot the theoretical relationship between epsilon and m for delta = 0.95. How closely do they match? Give an explanation for your findings.
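
    The two procedures described above can be sketched as follows. This is only a minimal starting point under stated assumptions, not a required design: it assumes axis-aligned rectangles, the unit square as the domain, the uniform distribution, and the "tightest fit" learner (one of several ways to return a consistent rectangle). All function names are made up for illustration. The function epsilon_bound uses the standard textbook bound m >= (4/epsilon) ln(4/delta) for the tightest-fit rectangle; if a different bound was derived in class, use that one instead.

```python
import math
import random

def make_labeler(rect):
    """Return a function labeling a point 1 if it lies in rect = (x1, x2, y1, y2), else 0."""
    x1, x2, y1, y2 = rect
    return lambda p: int(x1 <= p[0] <= x2 and y1 <= p[1] <= y2)

def tightest_fit(examples):
    """Smallest axis-aligned rectangle containing every positive example.

    Returns None (the always-negative hypothesis) if there are no positives.
    """
    pos = [p for p, label in examples if label == 1]
    if not pos:
        return None
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(hypothesis, p):
    """Label p with the hypothesis rectangle (None means always negative)."""
    return 0 if hypothesis is None else make_labeler(hypothesis)(p)

def draw_uniform(rng):
    """Assumed distribution: uniform over the unit square."""
    return (rng.random(), rng.random())

def estimate_error(hypothesis, label, draw, rng, n_test=1000):
    """Fraction of n_test fresh examples on which the hypothesis disagrees with the target."""
    errors = 0
    for _ in range(n_test):
        p = draw(rng)
        errors += predict(hypothesis, p) != label(p)
    return errors / n_test

def epsilon_bound(m, delta):
    """Textbook bound for tightest fit: epsilon = (4/m) ln(4/delta)."""
    return (4.0 / m) * math.log(4.0 / delta)

rng = random.Random(0)                 # fixed seed so runs are repeatable
target = (0.2, 0.7, 0.3, 0.9)          # hypothetical target rectangle
label = make_labeler(target)
train = [(p, label(p)) for p in (draw_uniform(rng) for _ in range(200))]
h = tightest_fit(train)
err = estimate_error(h, label, draw_uniform, rng)
```

    To try another distribution, replace draw_uniform with any function that takes the random generator and returns a point; the rest of the sketch does not change.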

  6. (40 points) A membership query is designed to model the ability to experiment. For instance space X and any x in X, MQ(x) returns the correct label for x. A monotone DNF formula is of the form t1 v t2 v ... v tk, where each term ti is a conjunction of any subset of the n Boolean variables x1, ..., xn and no negations can be used. For this problem you are to give an algorithm to learn any monotone DNF formula with k terms over n Boolean attributes that uses a polynomial number of MQs and makes at most k mistakes in the mistake-bound model of learning. As part of your analysis you should derive a bound on the number of MQs made (as a function of k and n).

    Hint: It will be helpful to think of the domain X as a lattice (or Hasse diagram) where 11...11 is at the top and 00...00 is at the bottom, and the children of each node are the examples that can be reached by changing a 1 to a 0 in any bit position. For example when n=3, you have the following lattice:

    Think about what properties hold about how these examples can be labeled as + or - when labeled according to a monotone DNF formula.
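
    The definitions in this problem can be made concrete with a short sketch. It only models the setup (a membership oracle for a monotone DNF formula and the lattice from the hint); it is not a learning algorithm. The names make_mq, children, and lattice_levels are made up for illustration, and variables are indexed from 0, so x1 is index 0.

```python
from itertools import combinations

def make_mq(terms):
    """Build MQ(x) for a monotone DNF formula.

    terms is a list of sets of 0-based variable indices; each set is one
    conjunction with no negations. x is a tuple of 0/1 values.
    """
    def mq(x):
        return any(all(x[i] == 1 for i in term) for term in terms)
    return mq

def children(x):
    """Examples reached from x by changing exactly one 1 to a 0."""
    return [x[:i] + (0,) + x[i + 1:] for i in range(len(x)) if x[i] == 1]

def lattice_levels(n):
    """Group all n-bit examples by number of ones, top (all ones) first."""
    levels = []
    for ones in range(n, -1, -1):
        level = []
        for idx in combinations(range(n), ones):
            level.append(tuple(1 if i in idx else 0 for i in range(n)))
        levels.append(level)
    return levels

# Hypothetical target: x1 v (x2 ^ x3)
mq = make_mq([{0}, {1, 2}])

for level in lattice_levels(3):
    print(" ".join("".join(map(str, x)) for x in level))
# prints:
# 111
# 110 101 011
# 100 010 001
# 000
```

    Each printed row is one level of the lattice for n=3; children(x) gives the nodes directly below x.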

  7. (30 points) In this problem we consider a simple case of learning with queries where the feedback can be erroneous. The learner and adversary agree on a number n, and then the adversary thinks of a number between 1 and n, inclusive. The learner must find out which number the adversary has selected by asking questions of the form "Is your number less than t?" for various t. A binary-search approach lets you find the number after asking at most log_2 n questions. To make this an interesting problem, suppose that the adversary is allowed to respond incorrectly to at most one question. How many questions must the learner now ask? (A bound of 3 log_2 n is easy: the learner can just ask each question three times and take the majority vote of the adversary's responses. It is also fairly easy to extend this idea to get a bound of 2 log_2 n + 1.)

    For this problem you need to design a learning algorithm that uses a number of queries of the form log_2 n + f(n), where f(n) grows asymptotically more slowly than the logarithm function (i.e., f(n) = o(log n)). Be sure to clearly describe your algorithm and prove that it satisfies the requirements of the problem.
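
    As a reference point, the easy baseline mentioned above (ask each question three times and take the majority vote) can be sketched as follows. It is the baseline only, not the algorithm this problem asks for. The Adversary class, its lie_at parameter, and find_number are made-up names used to simulate the game.

```python
import math

class Adversary:
    """Answers "Is your number less than t?" and may lie on one chosen query."""

    def __init__(self, secret, lie_at=None):
        self.secret = secret
        self.lie_at = lie_at   # 0-based index of the query answered falsely, or None
        self.queries = 0       # number of questions asked so far

    def less_than(self, t):
        truth = self.secret < t
        answer = (not truth) if self.queries == self.lie_at else truth
        self.queries += 1
        return answer

def find_number(n, adversary):
    """Binary search over 1..n, asking every question three times."""
    lo, hi = 1, n
    while lo < hi:
        t = (lo + hi + 1) // 2
        # With at most one lie overall, at least two of three answers are true.
        votes = sum(adversary.less_than(t) for _ in range(3))
        if votes >= 2:
            hi = t - 1         # majority says the number is less than t
        else:
            lo = t             # majority says the number is at least t
    return lo

adv = Adversary(secret=37, lie_at=4)
guess = find_number(100, adv)  # 37, using at most 3 * ceil(log2(100)) = 21 questions
```

    Counting adv.queries after each run is a simple way to compare this baseline with your own algorithm.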

  8. (20 pts) In this problem we consider r-of-k threshold functions over n Boolean attributes. For a chosen set of k (k <= n) variables and a given number r (1 <= r <= k), an r-of-k threshold function is true if and only if at least r of the k relevant variables are true. Assuming that both r and k are unknown to the learner, show that the class of r-of-k threshold functions can be learned in the mistake-bound model using the halving algorithm. In your analysis you should give the mistake bound obtained by your algorithm.
    Recall the Binomial Theorem: sum_{k=0}^{n} C(n,k) = 2^n.
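
    The definition of an r-of-k threshold function and the identity recalled above can both be checked with a few lines of code. This sketch only evaluates a given function and verifies the identity numerically; it does not implement the halving algorithm. The names r_of_k and binomial_row_sum are made up, and variables are indexed from 0.

```python
from math import comb

def r_of_k(relevant, r):
    """Return the threshold function that is true when at least r of the
    variables whose 0-based indices are listed in relevant are 1."""
    def f(x):
        return sum(x[i] for i in relevant) >= r
    return f

def binomial_row_sum(n):
    """Sum of C(n, k) for k = 0..n, which the identity says equals 2^n."""
    return sum(comb(n, k) for k in range(n + 1))

# Hypothetical example: a 2-of-3 function over n = 5 attributes.
f = r_of_k([0, 2, 3], 2)
print(f((1, 0, 1, 0, 0)))            # True: two relevant variables are 1
print(f((1, 1, 0, 0, 1)))            # False: only one relevant variable is 1
print(binomial_row_sum(5) == 2**5)   # True
```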

  9. (30 points) Consider the hypothesis class H of "regular, depth-2 decision trees" over n Boolean variables. A "regular, depth-2 decision tree" is a depth-2 decision tree (a tree with four leaves, all distance 2 from the root) in which the left child and right child of the root are required to contain the same variable.
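
    One possible way to represent such a tree is sketched below. It assumes that each internal node sends an example to one side or the other according to the value (0 or 1) of its variable, and that each of the four leaves holds a 0/1 label; the class name, field names, and leaf ordering are made up for illustration.

```python
from typing import NamedTuple, Tuple

class RegularDepth2Tree(NamedTuple):
    root: int                            # 0-based index of the variable at the root
    child: int                           # variable shared by both children of the root
    leaves: Tuple[int, int, int, int]    # labels ordered by (x[root], x[child]) = 00, 01, 10, 11

    def predict(self, x):
        """Follow x from the root to a leaf and return that leaf's label."""
        return self.leaves[2 * x[self.root] + x[self.child]]

# Hypothetical example: the tree computing x1 XOR x3 over n = 3 variables.
tree = RegularDepth2Tree(root=0, child=2, leaves=(0, 1, 1, 0))
print(tree.predict((1, 0, 0)))   # 1
print(tree.predict((1, 0, 1)))   # 0
```

    Because both children test the same variable, a tree is fully described by two variable indices and four leaf labels.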

  10. (30 points) Read one of the following papers and write a paper critique following these guidelines. You will be required to have a conference with Dr. Goldman to discuss the paper, and part of your grade will be based on this conference.

  11. CHOOSE YOUR OWN ADVENTURE. You can propose any additional homework options (or variations of those given above) to Dr. Goldman. If an option is approved, a point value will be assigned to it.

  12. (10 points) Simulate the DFA algorithm covered in class to learn the following DFA. Show the classification tree and hypothesized DFA at each step of the algorithm.