This homework is due on Wednesday, February 6th. The standard late policy applies.
A signed cover sheet for Homework 2 must be submitted with your homework.
(a2 == warm) AND (a4 == weak) AND (a5 == warm)
The inductive bias of FindS is to output a most specific hypothesis in the version space.
In class we briefly argued that given the hypothesis space of conjunctions there is always a single element of S (the most specific boundary of the version space) and FindS simply computes this element. In your own words, write a proof that S always holds one element and that this is the hypothesis of FindS.
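As a reminder of what FindS computes over conjunctions, here is a minimal Python sketch. It is not the course's reference code: the function name `find_s`, the use of `"?"` for "any value is acceptable", and `None` for the initial most specific hypothesis are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Example = Tuple[Sequence[str], bool]  # (attribute values, True if positive)


def find_s(examples: Iterable[Example]) -> Optional[List[str]]:
    """Return the most specific conjunction consistent with the positive examples.

    None means "no positive example seen yet" (the hypothesis that rejects
    everything). "?" in a position means that attribute is unconstrained.
    Assumes "?" never occurs as a real attribute value.
    """
    hypothesis: Optional[List[str]] = None
    for attrs, positive in examples:
        if not positive:
            continue  # FindS ignores negative examples
        if hypothesis is None:
            # First positive example: copy it exactly.
            hypothesis = list(attrs)
        else:
            # Minimal generalization: keep a constraint only where it still agrees.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis
```

Each positive example forces exactly one minimal generalization per attribute, which is the observation the proof in this problem should make precise.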
Suppose we change the hypothesis space to be a disjunction of constraints of the form
ai == specific_value
over the attributes a1,...,an. That is, it is like above except the "AND" now becomes an "OR". Give an example to demonstrate that the cardinality of S can become exponential in the number of attributes. You do not need very many attributes (or very many updates) to achieve this.
(X1 AND X2) OR (X3 AND X4).
In other words, use as your training data all 16 entries of the truth table, with the examples for which the formula is true labeled as positive (+) and the examples for which it is false labeled as negative (-). Be sure to give the decision tree created by the learning algorithm, rather than one you could create by hand that would properly represent the function. To make the grading easier, break ties in information gain by choosing the attribute with the smallest index.
You can do this problem by hand, or by using a program. Just be sure to show your work. If you decide to do this by hand you might want to write a small program that computes the information gain for each attribute given a set of examples.
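A small helper of the kind suggested above might look like the following Python sketch. The function names (`entropy`, `information_gain`, `best_attribute`, `truth_table`) and the tolerance used to detect ties are assumptions for illustration, and attributes are indexed from 0 here, so index 0 corresponds to X1.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(examples, attr):
    """Gain of splitting `examples` (list of (attrs, label)) on index `attr`."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {attrs[attr] for attrs, _ in examples}:
        subset = [label for attrs, label in examples if attrs[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(examples, candidates, tol=1e-12):
    """Highest-gain attribute; ties (within `tol`) go to the smallest index."""
    gains = {a: information_gain(examples, a) for a in candidates}
    top = max(gains.values())
    return min(a for a, g in gains.items() if top - g <= tol)


def truth_table():
    """All 16 labeled examples for (X1 AND X2) OR (X3 AND X4)."""
    return [
        ((x1, x2, x3, x4), bool((x1 and x2) or (x3 and x4)))
        for x1, x2, x3, x4 in product((0, 1), repeat=4)
    ]
```

Ties are compared with a tolerance because gains that are mathematically equal can differ in the last floating-point digit. Calling `best_attribute` on each subset as the tree grows gives the split at that node, but you still need to show the intermediate gains as your work.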
If you are just doing the first two parts (and using CEC) you should not need to compile the code. The executable is already included in the zip file. However, if you are doing the third part or using your own computer, compile the code with make or gmake. On CEC you can run gmake by typing
/pkg/gnu/bin/make
Once you have done this you should find an executable called dt. To run it type
dt [-s
If you do not provide a random seed then the system clock will be used. Each of train %, prune %, and test % is a real number between 0 and 1, and their sum must be at most 1.0. They specify, respectively, the fraction of the data that will be used for training, pruning (i.e. the validation data) and testing. To be sure it is working correctly, try doing
dt 1.0 0.0 0.0 play-tennis.ssv
When you do this you should get the decision tree shown in Figure 3.1 of the textbook.
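To make the meaning of the three fractions concrete, the sketch below shows one way a seeded train/prune/test split could be done in Python. It illustrates the idea only and is not a description of how dt partitions the data internally. The name `split_examples` and the truncating rounding are assumptions.

```python
import random


def split_examples(examples, train, prune, test, seed=None):
    """Shuffle `examples` and cut them into train, prune, and test lists.

    Each fraction must lie in [0, 1] and the three must sum to at most 1.0.
    With seed=None, Python seeds the generator itself, so runs differ.
    """
    if min(train, prune, test) < 0 or train + prune + test > 1.0 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1.0")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_prune, n_test = int(train * n), int(prune * n), int(test * n)
    train_set = shuffled[:n_train]
    prune_set = shuffled[n_train:n_train + n_prune]
    test_set = shuffled[n_train + n_prune:n_train + n_prune + n_test]
    return train_set, prune_set, test_set
```

The three lists never overlap, and any examples left over when the fractions sum to less than 1.0 are simply unused.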
There are three parts to this problem. If you do only the first part it is worth 30 points. If you do the first two parts, then it is worth 50 points, and if you do all three parts it is worth 100 points.
Try running the decision tree learner on a randomly chosen subset of half of the examples for training, and using the other half for testing. What are the training and test accuracies? What conclusion(s) can you draw from this?
Determine whether each of the following is true or false, assuming that the target concept is the one described by the decision tree in Figure 3.1, that all examples you add must be consistent with the target concept, and that no post-pruning is used. Explain how you reached your answers and convince us of them.
Here you will use the voting data. The first attribute of each example describes the political party of the representative, and the remaining attributes indicate their yes/no/absent vote for each bill considered by Congress. You will use this data to learn a decision tree that predicts the political party of a representative based on his/her votes.
Use the voting data to build a decision tree to predict the political party. Use 25% of the members of Congress for training and the rest for testing. (So no pruning.) Try this several times and study the impact of different random splits. Report the tree sizes and accuracies of these trees over 6 distinct runs.
Now measure the impact of training set size on the accuracy and size of the learned tree (no pruning, and 30% of the data for testing). Consider training set sizes in the range of 0-40% (include at least the values .02, .1, .2, .3 and .4 for training fractions). Because of the high variance due to random splits you should repeat each experiment with at least 10 different random seeds. You must write a short report (similar to HW 1) describing and discussing your findings. Include in your report two plots showing how accuracy varies with training set size, and how the tree size varies with training set size. Also report the maximum, minimum and average accuracies and tree sizes for each training set size.
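For the maximum, minimum and average figures, a short aggregation helper can save some bookkeeping. The Python sketch below assumes you have already collected one measurement per random seed for each training fraction. The name `summarize` and the dictionary layout are hypothetical choices, not part of the provided code.

```python
from statistics import mean


def summarize(runs):
    """Summarize repeated measurements.

    `runs` maps a training fraction (e.g. 0.1) to a non-empty list of
    measurements, one per random seed (accuracies or tree sizes).
    Returns the min, max, and mean for each fraction, in increasing order.
    """
    return {
        fraction: {"min": min(values), "max": max(values), "mean": mean(values)}
        for fraction, values in sorted(runs.items())
    }
```

Running it once on the accuracies and once on the tree sizes gives the two tables the report asks for.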
Next measure the impact of pruning on accuracy and size of the learned trees, under the same conditions as above. Use the same training set sizes, but this time use 30% of the data for pruning and 30% for testing. What is the impact, if any, of post-pruning the tree?
The code as provided uses information gain to select attributes while growing the tree. Modify it to instead select attributes at random and study the effect of this change by repeating all of the experiments you have done so far. To make this change, look at the function MaxGainAttribute() in entropy.c. Replace this by a method that randomly selects an attribute. From your experiments, what can you conclude about the impact of learning with randomly selected attributes?
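The selection rule itself is tiny. The Python sketch below shows the idea only. The actual change belongs in the C source, and since the signature of MaxGainAttribute() is not shown here, the name `random_attribute` and its parameters are hypothetical.

```python
import random


def random_attribute(candidates, rng=random):
    """Pick one of the still-unused attribute indices uniformly at random.

    `rng` can be a seeded random.Random instance so that a run can be
    reproduced with the same seed.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("no attributes left to split on")
    return rng.choice(candidates)
```

Drawing from a seeded generator keeps each experiment reproducible, so differences between runs can be attributed to the seed rather than to an uncontrolled source of randomness.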
Finally, explore one more non-trivial topic of your own choosing. For example, develop a different method for selecting attributes that you think will work well and report on it. Or you could implement a rule post-pruning method, add it to the source code, and report how its performance compares to that of reduced error pruning. Be creative.