CS 527A, Spring 2002, Homework 5
You are expected to complete 100 points worth of homework problems.
This homework is due on Wednesday, April 10, and the standard
late policy applies.
As another option, you can choose one of two 150-point
projects, each of which counts for both Homework 5 and Homework 6.
If you select either of these projects then by April 10 you
must demonstrate to me that you have made significant progress towards
completing your project, but the homework will not be due until
Wednesday, April 17.
A signed cover sheet for Homework 5 must
be submitted with your homework.
- (20 pts) Here we look at the problem of boosting the confidence
parameter (delta) in the PAC model. Suppose that you are given a
PAC-like learning algorithm A such that with probability at least 1/10,
algorithm A returns a hypothesis that has error at most epsilon with
respect to the distribution D over X (from which both the training
data and test data are drawn). Give a PAC algorithm A' that uses A as
a black box (i.e. you cannot modify A itself)
that will with probability at least 1 - delta return a
hypothesis that has error at most epsilon with respect to distribution
D. Be sure to give both the time complexity and sample complexity of
A' as a function of the time complexity, T_A, and sample complexity,
S_A, of A. Also prove that A' meets all requirements for being a
PAC algorithm assuming that T_A and S_A are polynomial in
the number of bits to encode an example, the number of bits to
encode the target concept, and 1/epsilon.
- (20 points)
We have seen in class that an algorithm that is capable
of finding a hypothesis consistent with a given set of labeled
examples can be turned into a PAC learning algorithm. Argue that the
converse is true: a PAC learning algorithm can be used (with high
probability) to find a hypothesis consistent with a given set of
labeled examples.
Hint: Show how to define an appropriate
probability distribution on the given set of examples, and how to set
epsilon, so that the PAC learning algorithm returns a
hypothesis consistent with the given set of examples with
probability at least 1-delta.
- (20 points) Consider the space of instances X corresponding to
all points in d-dimensional space. Compute the VC dimensions for the following
hypothesis spaces and clearly explain why your answer is correct.
- H_d, the set of all axis-aligned boxes in d-dimensional space.
Points on the boundary
or inside of the target box are positive and the rest are negative.
(To get started you may want to consider d=2, in which case the
concept class is the class of axis-aligned rectangles in the plane.)
- H_p, the set of polygons (with any number of sides) in the x,y plane
(so you only need to consider d=2).
- (20 points) In this problem we consider the concept class of monomials.
- A monomial is monotone if it contains no negated
literals. Prove that for the concept class MM_n
of monotone monomials defined over n Boolean attributes,
VCD(MM_n) = n.
- For M_n, the class of (general) monomials defined over
n Boolean variables, show that:
n <= VCD(M_n) <= n log2 3.
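For very small n, the claim about monotone monomials can be sanity-checked by exhaustive search. The following is only a sketch, not a proof: the function names are illustrative, a monotone monomial is represented as the tuple of variable indices it contains (the empty tuple being the always-true monomial), and the search is exponential in n.

```python
from itertools import combinations, product

def monotone_monomials(n):
    """Yield each monotone monomial as the tuple of variable indices it contains."""
    for size in range(n + 1):
        yield from combinations(range(n), size)

def evaluate(monomial, x):
    """A monotone monomial is true iff every variable it contains is set to 1."""
    return all(x[i] == 1 for i in monomial)

def is_shattered(points, hypotheses):
    """True iff the hypotheses realize all 2^|points| labelings of the points."""
    labelings = {tuple(evaluate(h, x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(n):
    """Size of the largest shattered subset of {0,1}^n, found by brute force."""
    instances = list(product((0, 1), repeat=n))
    hypotheses = list(monotone_monomials(n))
    best = 0
    for size in range(1, len(instances) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(instances, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so no larger set can be
    return best
```

Calling vc_dimension(n) for n = 1, 2, 3 lets you compare the computed value against the claimed value n before attempting the general proof.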
- (50 points) In this problem you'll compare the bounds of a PAC
algorithm with what is found experimentally for the problem of learning
rectangles in the plane. You will need to write a procedure that takes
as input a set of positive and negative examples and returns a rectangle
consistent with this data. Also, you will need to write a procedure
that randomly generates a point in the plane and labels it based on
a target rectangle. Generate a variety of target
rectangles at random corresponding to different rectangles in the
plane and try some different distributions besides just the uniform
distribution.
For three different choices for the distribution over the examples
space X, plot the average generalization error (over about 100
different target concepts) as a function of the number of training
examples, m. You can use a random 1000 examples (drawn from the same
distribution used to generate the training data) to measure the
generalization error. Along with each plotted point, also show a 95%
confidence interval. On the same graph plot the theoretical
relationship between epsilon and m for delta = 0.95. How closely do
they match? Give an explanation for your findings.
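A minimal Python sketch of the two procedures this problem asks for, under stated assumptions: rectangles are axis-aligned and stored as (x_lo, x_hi, y_lo, y_hi), examples are drawn uniformly from the unit square, and the learner returns the tightest-fit rectangle around the positive examples. All function names are illustrative, and the uniform distribution is only one of the distributions you are asked to try.

```python
import random

def label(point, rect):
    """True iff the point is inside or on the boundary of rect = (x_lo, x_hi, y_lo, y_hi)."""
    x, y = point
    x_lo, x_hi, y_lo, y_hi = rect
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi

def draw_example(target, rng):
    """Draw a point uniformly from the unit square (an assumption) and label it."""
    point = (rng.random(), rng.random())
    return point, label(point, target)

def tightest_fit(examples):
    """Smallest rectangle containing every positive example, or None if there are none."""
    positives = [p for p, y in examples if y]
    if not positives:
        return None
    xs, ys = zip(*positives)
    return (min(xs), max(xs), min(ys), max(ys))

def predict(hyp, point):
    """The None hypothesis labels every point negative."""
    return hyp is not None and label(point, hyp)

def estimate_error(hyp, target, rng, n_test=1000):
    """Fraction of n_test fresh examples on which hyp disagrees with the target."""
    mistakes = 0
    for _ in range(n_test):
        point, y = draw_example(target, rng)
        mistakes += predict(hyp, point) != y
    return mistakes / n_test
```

Because the training labels come from a rectangle, the tightest-fit hypothesis lies inside the target and is therefore consistent with every training example. To produce the plots, repeat the train-and-estimate loop over many random targets for each value of m and swap draw_example for the other distributions.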
- (40 points) A membership query is designed to model the ability
to experiment. For instance space X and any x in X, MQ(x) returns the
correct label for x. A monotone DNF formula is of the form
t_1 v t_2 v ... v t_k where each term
t_i is a conjunction of any subset of the n Boolean
variables x_1, ..., x_n where no negations can be
used. For this problem you are to give an algorithm to learn any
monotone DNF formula with k terms over n Boolean attributes that uses
a polynomial number of MQs and makes at most k mistakes in the mistake
bound model of learning. As part of your analysis you should
derive a bound for the number of MQs made (as a function of k and n).
Hint: It will be helpful to think of the domain X as a
lattice (or Hasse diagram) where 11...11 is at the top and 00...00
is at the bottom and the children of each node are the examples
that can be reached by changing a 1 to a 0 in any bit position.
For example when n=3, you have the following lattice:
Think about what properties hold about how these examples can be
labeled as + or - when labeled according to a monotone DNF formula.
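To experiment with the lattice and with membership queries, a small Python sketch may help. It assumes examples are tuples of 0/1 bits and that a monotone DNF is given as a list of terms, each term being a set of 0-based variable indices. The names lattice_levels, children, and make_mq are illustrative, not part of the assignment.

```python
from itertools import product

def lattice_levels(n):
    """Group {0,1}^n by number of zeros: level 0 is 11...1, level n is 00...0."""
    levels = [[] for _ in range(n + 1)]
    for x in product((1, 0), repeat=n):
        levels[n - sum(x)].append(x)
    return levels

def children(x):
    """Examples reachable from x by changing a single 1 to a 0."""
    return [x[:i] + (0,) + x[i + 1:] for i, bit in enumerate(x) if bit == 1]

def make_mq(terms):
    """Membership oracle for the monotone DNF whose terms are sets of variable indices."""
    def mq(x):
        # The formula is true iff some term has all of its variables set to 1.
        return any(all(x[i] == 1 for i in term) for term in terms)
    return mq
```

For n=3, lattice_levels(3) lists the levels from 111 down to 000. With a hypothetical target such as make_mq([{0}, {1, 2}]), you can label every node and inspect how the labels of a node relate to the labels of its children.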
- (30 points)
In this problem we consider a simple case of learning with queries
where the feedback can be erroneous. The learner and adversary agree
on a number n, and then the adversary thinks of a number between 1
and n, inclusive. The learner must find out which number the
adversary has selected by asking questions of the form, "Is your
number less than t?" for various t. A binary-search approach
allows you to ask at most log2 n questions before finding the number.
To make this an interesting problem, suppose that the adversary is
allowed to incorrectly respond to at most one question. How
many questions must the learner now ask? (A bound of 3 log2 n is
easy: the learner can just ask each question three times and take the
majority vote of the adversary's responses. Also it is fairly easy to extend
this idea to get a bound of 2 log2 n + 1.)
For this problem you need to design a learning
algorithm that uses a number of queries of the form
log2 n + f(n)
where f(n) grows asymptotically more slowly than the logarithm
function (i.e. f(n) = o(log n)). Be sure to clearly describe
your algorithm and prove that it satisfies the requirements of
the problem.
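As a baseline to improve on, the easy 3 log2 n strategy mentioned above (ask each question three times and take the majority) can be sketched in Python. The LyingAdversary class and its lie_at parameter are hypothetical test scaffolding, not part of the problem statement. This sketch is not a solution to the problem, since it uses about 3 log2 n queries rather than log2 n + o(log n).

```python
class LyingAdversary:
    """Answers "Is your number less than t?" and lies on at most one query."""

    def __init__(self, secret, lie_at=None):
        self.secret = secret
        self.lie_at = lie_at  # 0-based index of the one query answered falsely, or None
        self.queries = 0

    def less_than(self, t):
        truth = self.secret < t
        answer = (not truth) if self.queries == self.lie_at else truth
        self.queries += 1
        return answer

def find_number(n, adversary):
    """Binary search over 1..n, asking every question three times."""
    lo, hi = 1, n
    while lo < hi:
        t = (lo + hi + 1) // 2
        # With at most one lie overall, at least two of the three answers are truthful.
        yes_votes = sum(adversary.less_than(t) for _ in range(3))
        if yes_votes >= 2:
            hi = t - 1
        else:
            lo = t
    return lo
```

Running find_number against every secret and every possible position of the lie is a quick way to test a candidate strategy. The same adversary class can be reused to count the queries made by a more economical algorithm.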
- (20 pts) In this problem we consider r-of-k threshold functions
over n Boolean attributes. For a chosen
set of k (k <= n) variables and a given number r (1 <= r <= k),
an r-of-k threshold function is true if and only if
at least r of the k relevant variables are true.
Assuming that both r and k are unknown to the learner, show that
the class of r-of-k threshold functions can be learned in the
mistake-bound model using the halving algorithm. In your analysis you should
give the mistake bound obtained by your algorithm.
Recall the Binomial Theorem: sum_{k=0}^{n} C(n,k) = 2^n.
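To make the definition concrete, here is a small Python sketch of an r-of-k threshold function together with a numerical check of the binomial identity recalled above. It assumes examples are tuples of 0/1 bits and variables are named by 0-based index. The names r_of_k and binomial_row_sum are illustrative.

```python
import math

def r_of_k(relevant, r):
    """Return the r-of-k threshold function over the given relevant variable indices."""
    relevant = tuple(relevant)
    if not 1 <= r <= len(relevant):
        raise ValueError("need 1 <= r <= k")

    def f(x):
        # True iff at least r of the k relevant variables are 1.
        return sum(x[i] for i in relevant) >= r

    return f

def binomial_row_sum(n):
    """Sum of C(n, k) for k = 0..n, which the Binomial Theorem says equals 2^n."""
    return sum(math.comb(n, k) for k in range(n + 1))
```

For example, r_of_k([0, 2, 3], 2) is true on (1, 0, 1, 0) and false on (1, 1, 0, 0). Setting r = 1 gives a disjunction of the relevant variables, and r = k gives a conjunction.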
- (30 points) Consider the hypothesis class H of "regular, depth-2
decision trees" over n Boolean variables. A "regular, depth-2 decision
tree" is a depth-2 decision tree (a tree with four leaves, all distance 2
from the root) in which the left child and right child of the root are
required to contain the same variable.
- As a function of n, how many syntactically distinct trees are there in H?
- Give an upper bound on the number of examples needed in the PAC model
to learn H with error epsilon and confidence delta.
- Consider the following Weighted-Majority algorithm for the class H.
You begin with all hypotheses in H assigned an initial weight equal to 1.
Every time you see a new example, you predict based on a weighted majority
vote over all hypotheses in H. Then instead of eliminating inconsistent trees,
you cut down their weight by a factor of 2. How many mistakes will the
procedure make in the worst case as a function of n and the number of mistakes
made by the best tree in H?
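The Weighted-Majority procedure described in the last part can be sketched generically in Python. This is a sketch under assumptions: hypotheses are arbitrary Boolean-valued functions rather than the trees of H specifically, examples arrive as (x, label) pairs, and ties in the vote are broken toward True, which is an arbitrary choice.

```python
def weighted_majority(hypotheses, examples):
    """Run the weight-halving scheme; return (number of mistakes, final weights)."""
    weights = [1.0] * len(hypotheses)
    mistakes = 0
    for x, y in examples:
        votes = [h(x) for h in hypotheses]
        weight_true = sum(w for w, v in zip(weights, votes) if v)
        weight_false = sum(w for w, v in zip(weights, votes) if not v)
        prediction = weight_true >= weight_false  # tie broken toward True
        if prediction != y:
            mistakes += 1
        # Halve the weight of every hypothesis that mislabeled this example.
        weights = [w / 2 if v != y else w for w, v in zip(weights, votes)]
    return mistakes, weights
```

Running it on a handful of toy hypotheses and tracking the total weight after each mistake is a useful way to build intuition for the worst-case analysis the problem asks for.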
- (30 points) Read one of the following papers and write a paper
critique following these guidelines. You will be required to have a
conference with Dr. Goldman to discuss the paper and part of your
grade will be based on this conference.
- CHOOSE YOUR OWN ADVENTURE. You can propose any additional homework options
(or variations of those given above) to Dr. Goldman. If approved a point value
will be given.
- (10 points) Simulate the DFA algorithm covered in class to learn
the following DFA. Show the classification tree and hypothesized
DFA at each step of the algorithm.