CS 527A Frequently Asked Questions (FAQs)




Homework 3 Questions

I'm doing Problem 1 and am not sure what is asked for?

Let me go over what is required and give slightly more guidance than given. If you have already completed this problem using different choices, that is fine.

  • I would recommend that you pick a few hundred training examples. You can just have each coordinate be an integer in the range [-1000,1000].

  • You should generate a separate test data set of 1000 (or more) examples so that your reported squared errors are very accurate.

  • A training iteration is one time through the training examples.

  • Plot the decision surface at various times on the same plot with the training data. Ideally have the positive and negative examples in the training data color coded. Then after 5, 10, 50, 100 training iterations (or different choices that are more appropriate for your results) show the decision surface. All of these should be on the same plot.

  • To generate a plot showing the squared error you should use your test data to compute the squared error after each (or perhaps every 5 to 10) iterations.

  • In the parts when you are adding noise to the training data, do not add noise to the test data. If you generate validation data you should also add noise to the labels there.
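The setup described above can be sketched as follows. This is only an illustration under stated assumptions: the target concept (x1 + 2*x2 > 0), the +1/-1 labels, the linear unit trained with the delta rule, the learning rate, and all function names are hypothetical choices, not part of the assignment. The sketch generates integer coordinates in [-1000,1000], flips labels only in the training data, and records the average squared error on the clean test data every 5 iterations. Plotting is left out.

```python
import random

def make_examples(n, rng, noise=0.0):
    """Return n examples ((x1, x2), label) with integer coordinates in [-1000, 1000].

    The target concept (x1 + 2*x2 > 0) is a hypothetical choice for illustration.
    With probability `noise` a label is flipped.
    """
    examples = []
    for _ in range(n):
        x = (rng.randint(-1000, 1000), rng.randint(-1000, 1000))
        label = 1 if x[0] + 2 * x[1] > 0 else -1
        if rng.random() < noise:
            label = -label
        examples.append((x, label))
    return examples

def output(w, x):
    # Linear unit on inputs scaled to [-1, 1]; w[0] is the bias weight.
    return w[0] + w[1] * x[0] / 1000 + w[2] * x[1] / 1000

def squared_error(w, examples):
    # Average squared error over the given examples.
    return sum((label - output(w, x)) ** 2 for x, label in examples) / len(examples)

def train_one_iteration(w, examples, eta=0.01):
    # One training iteration = one pass through all training examples.
    for x, label in examples:
        err = label - output(w, x)
        w[0] += eta * err
        w[1] += eta * err * x[0] / 1000
        w[2] += eta * err * x[1] / 1000

rng = random.Random(0)
train = make_examples(300, rng, noise=0.1)   # noisy training labels
test = make_examples(1000, rng, noise=0.0)   # test labels stay clean
w = [0.0, 0.0, 0.0]
errors = []
for iteration in range(1, 101):
    train_one_iteration(w, train)
    if iteration % 5 == 0:
        errors.append((iteration, squared_error(w, test)))
```

The list `errors` holds (iteration, test error) pairs that can be fed to whatever plotting tool you prefer.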


Homework 2 Questions

For the last part of Problem 6, Part III do I have to implement the new method or just talk about one in a report?

Yes, you need to implement the method and report on what was found experimentally. Be sure to show the results, interpret them and explain why your change had the effect it did.

Can I do more than 100 points?

If you do more than 100 points' worth of problems because those are the problems you are interested in, the score will be rescaled so that the maximum is 100 (i.e. the score recorded will be 100 * the number of points you received divided by the number of points possible). To help provide flexibility you are allowed to submit either part 1A or part 1B or both.

Finally, if you do extra problems that you do not want included in your grade but would like feedback on your answers you can also do that. However, such problems should be clearly marked as not to be included as part of the recorded grade.

Could you remind me what the most specific hypothesis means?

The most specific hypothesis is a consistent hypothesis whose set of positively classified examples is a subset of the set classified as positive by every other consistent hypothesis. In general, there may not be a single most specific hypothesis. The version space can be viewed as a partial order, so a unique most specific hypothesis need not exist.

For 1A do I need to prove that S is consistent with just the positive training examples or with all of the training examples?

For 1A you need to prove that S is consistent with ALL of the training examples.

For 1B should n be viewed as a variable?

For 1B, you should view the number of attributes, n, as a variable, but all examples have n attributes. What you should do is show that the cardinality of G can be exponential in n, by giving a set of training examples that causes this.

Remember that the version space holds all hypotheses consistent with ALL the training data, which includes both the positive and the negative examples.

In regards to Problem 3, could you give an example to help clarify what hypotheses are in H?

Recall that X is the example space (i.e. the set of possible examples, not the set of training examples). As a concrete example, suppose there were three boolean features. Then |X| = 8. The initial unbiased hypothesis space would contain 2^8 hypotheses, one corresponding to each way to assign a + or - to each of the 8 possible examples. As training examples are received the VS will be reduced in the standard way. Each hypothesis in the VS (which initially has 2^8 hypotheses in this example) has a single classification for each example. If you try to use the VS of an unbiased learner to determine a prediction for a new example x, you would count the number of hypotheses in the current VS in which x is + and the number in which x is -, and predict with the majority. In this problem you prove that all the way up until the last example exactly half of the hypotheses in the VS predict + and half predict -.
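To make the three-feature example concrete, here is a small sketch that enumerates the unbiased hypothesis space by brute force and counts the votes for an unseen instance. The training examples and the function names are made up for illustration. This is only feasible because |X| is tiny, and it does not replace the proof the problem asks for.

```python
from itertools import product

# All 8 instances over three boolean features.
X = list(product([0, 1], repeat=3))

# An unbiased hypothesis assigns + (True) or - (False) to every instance,
# so there are 2**8 hypotheses. Each is stored as a dict: instance -> label.
H = [dict(zip(X, labels)) for labels in product([True, False], repeat=len(X))]

def version_space(hypotheses, training):
    """Keep only the hypotheses consistent with every (instance, label) pair."""
    return [h for h in hypotheses if all(h[x] == y for x, y in training)]

def vote(vs, x):
    """Return (number predicting +, number predicting -) for instance x."""
    pos = sum(1 for h in vs if h[x])
    return pos, len(vs) - pos

# Hypothetical training data: three labeled instances.
training = [((0, 0, 0), True), ((0, 1, 1), False), ((1, 0, 1), True)]
vs = version_space(H, training)
print(len(vs))               # 32, i.e. 2**(8-3) hypotheses remain
print(vote(vs, (1, 1, 1)))   # (16, 16): an even split on an unseen instance
```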

For the second set of questions for Part I of Problem 6, what is meant by saying that the target concept is that described by the decision tree in Figure 3.1?

You are to answer the second set of questions under the assumption that all possible examples have correct labels given by that in the decision tree of Figure 3.1. Note that any decision tree that classifies some example different than the tree in Figure 3.1 is an incorrect tree. Remember that you can add examples to play-tennis.ssv and experiment with what happens if that will help you.

I'm doing Part I of Problem 6 and it appears there are two parts within it. Please clarify what assumptions we are to make when answering the last three questions?

There is a missing paragraph break in the html document. Part I of Problem 6 should have been formatted as:

Try running the decision tree learner on a randomly chosen subset of half of the examples for training, and using half for testing. What are the training and test accuracies? What conclusion(s) can you draw from this?

Determine if each of the following are true or false assuming that the target concept is that described by the decision tree in Figure 3.1, that all examples you add must be consistent with the target concept, and without using any post pruning. Explain how you reached your answers and convince us of them.

  • Is it possible to get ID3 to further elaborate the tree below the right leaf in Figure 3.1 (and make no other changes), by adding a single new (correct) training example?
  • Is it possible to get ID3 to learn an incorrect tree (i.e. a tree that is not equivalent to the target concept of Figure 3.1) by adding a new correct training example?
  • Is it possible to get ID3 to include the attribute Temperature in the learned tree, even though the true target concept is independent of Temperature?

So you are really responding to two sets of questions. In the first you are training on a random half of the data. For the second set of questions you are to begin by training the basic decision tree algorithm using the data in Table 3.2 in the text (which is the same data as in play-tennis.ssv). For the first two questions you are limited to adding one new example, which is labeled according to the decision tree shown in Figure 3.1 of the text. For the last part you can add any number of examples to the 14 in Table 3.2, but they must all have a label as given by the decision tree in Figure 3.1.

What is Problem 2.10 in the text?

Here is Problem 2.10 (modified slightly to use the more standard notation as in class). The book shows the conjunction by just giving the desired value for the relevant attributes and a "?" for the irrelevant attributes.

Implement the Find-S algorithm. First verify that it successfully produces the trace in Section 2.4 for the EnjoySport example. Now use this program to study the number of random training examples needed to exactly learn the target concept. Implement a training example generator that generates random instances, then classifies them according to the target concept:

   (Sky == Sunny) AND (Temp == warm)
Consider training your FIND-S program on randomly generated examples and measuring the number of examples required before the program's hypothesis is identical to the target concept. Can you predict the average number of examples required? Run the experiment at least 20 times and report the mean number of examples required. How do you expect this number to vary with the number of "?"s (the irrelevant attributes which are not in the target formula) in the target concept? How would it vary with the number of attributes available?

Note: To respond to the last two questions think about both analytically determining the answer (if you can) and run experiments and report on your findings.
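As a starting point, the experiment could be organized along the following lines. This is a minimal sketch, not a reference solution: the attribute values are those of the EnjoySport example, the function names are invented here, and `None` is used as a stand-in for the initial most specific hypothesis.

```python
import random

# Attribute values from the EnjoySport example.
ATTRIBUTES = [
    ("Sky", ["Sunny", "Cloudy", "Rainy"]),
    ("AirTemp", ["Warm", "Cold"]),
    ("Humidity", ["Normal", "High"]),
    ("Wind", ["Strong", "Weak"]),
    ("Water", ["Warm", "Cool"]),
    ("Forecast", ["Same", "Change"]),
]
# Target concept: (Sky == Sunny) AND (Temp == Warm); "?" marks irrelevant attributes.
TARGET = ["Sunny", "Warm", "?", "?", "?", "?"]

def matches(hypothesis, instance):
    """True if the conjunctive hypothesis classifies the instance as positive."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))

def find_s_update(hypothesis, instance):
    """Minimally generalize the hypothesis to cover a positive instance."""
    if hypothesis is None:            # None stands for the most specific hypothesis
        return list(instance)
    return [h if h == v else "?" for h, v in zip(hypothesis, instance)]

def examples_until_learned(rng):
    """Count random examples drawn until Find-S's hypothesis equals TARGET."""
    hypothesis, count = None, 0
    while hypothesis != TARGET:
        instance = [rng.choice(values) for _, values in ATTRIBUTES]
        count += 1
        if matches(TARGET, instance):  # Find-S ignores negative examples
            hypothesis = find_s_update(hypothesis, instance)
    return count

rng = random.Random(0)
runs = [examples_until_learned(rng) for _ in range(20)]
print(sum(runs) / len(runs))          # mean number of examples over 20 runs
```

Changing `TARGET` and `ATTRIBUTES` lets you rerun the same experiment with a different number of "?"s or a different number of attributes.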

I'm doing number 6 and am getting errors when I type make. What should I do?

The executable is included in the zip file, so you may be able to just skip that step. The other solution is to use GNU make, which is what the makefile is designed for. To do this type
/pkg/gnu/bin/make
You can ignore the warnings you get. Then you should be all set to run the program.

I'm having trouble using unzip to extract the components of dt-code.zip. What can I do?

This seems to be a problem only in Windows, so if you can unzip it in Unix then you should not have any problems. However, if you can only work in Windows, here are links to the individual components. For Parts I and II all you need is dt (the executable), play-tennis.ssv, and vote.ssv.


Homework 1 Questions

Where do I submit my homework (if I didn't submit it in class)?
Submit the homework in Dr. S. Goldman's mail box in Bryan 509 (the CS office). If you can't find it just ask the secretary you see as you enter the CS office.

I seem to do a lot better when I just don't use the boards as viewed from black's perspective (i.e. the boards just before it was black's turn). What should I do?

That is a common experience. As one option, stop using the boards as viewed from black's perspective. Try variations in which you do this from the start and in which you wait until about 50 games to make the change. Look at the data and report on what happened (as one of your options for the last part). Then try to explain what was happening when you continued to use the boards from black's perspective for training. If you would like a second option, try to think of anything you could change from what was proposed that would make it work better when using these boards. If you do this successfully (i.e. change the way in which you use the boards just before black moves so that they actually help performance beyond the initial games), then you can use that as one of your options. Explain the changes you made and why you believe they worked.

I've read the FAQs below and done these changes but the weights still seem to fluctuate a lot. What might be causing this?

Try using a smaller learning rate (say 0.01); this will make the weights more stable. Ideally (to reduce training time) you can start with a learning rate of 0.1 and then after 50 games or so start dropping it. You can do this through a gradual change or just switch to 0.01. Of course, making these changes and reporting on the results can be used as one of your options for the last part.
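One way to express such a schedule is sketched below. The function name and the `decay_games` parameter are hypothetical, and the linear decay is just one possible form of a "gradual change".

```python
def learning_rate(games_played, start=0.1, end=0.01, switch_after=50, decay_games=0):
    """Learning rate to use for the game about to be played.

    With decay_games == 0 the rate switches abruptly from start to end after
    switch_after games; otherwise it drops linearly over decay_games games.
    """
    if games_played < switch_after:
        return start
    if decay_games <= 0 or games_played >= switch_after + decay_games:
        return end
    fraction = (games_played - switch_after) / decay_games
    return start + fraction * (end - start)
```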

My learner algorithm seems to be working pretty well and is winning about 80% of the games. However, the weights still seem to be fluctuating a lot. Is this supposed to happen?

Training using the learner's view and also the opponent's view does help speed up training, but it will cause fluctuations in the weights, and hence performance won't be as good as it would be if you just used the board states as viewed by the learner. Also, having the learner use one set of weights for both going first and going second will cause fluctuations in the weights, since a slightly different strategy might be needed for these two situations.

Here is something you can do that would be appropriate as one of the options to try for the last part. Once the learner is playing pretty well, make the following two changes. First, stop using the sequence (b'_0, b'_1, ..., b'_m'), which are the boards just before the opponent moves, for training. Instead use only the sequence of board states just before the learner moves. Second, keep two sets of weights (both initialized with the current weight values): one that is used for playing and is updated when the learner goes first, and a second that is used for playing and is updated when the learner goes second. This should help improve performance in terms of the percentage of games won and should also help stabilize the weights. Study how the weights change; from this, can you say anything about how the strategy changes when the learner or the opponent goes first? If you have enough cycles, you could try starting from the beginning with the learner always going first and only the board states as viewed by the learner used for training. How do the final weights and the performance of this learner compare to one that was trained as described in the lab and then switched as described above? Also, how does the total training time (i.e. number of games played) compare?

I read the question posted below about the 1-ply search to select a move but it doesn't address what to do if the opponent has no legal move (but the game is not over) or if after the opponent moves the learner has no legal move (but the game is not over). Can you please address this situation?

The quick answer is to think of the middle level of the tree as the set of move(s) that the opponent makes before it is again the learner's turn. In most cases the opponent will make a single move. However, if the learner makes a move after which the opponent has no legal move, then this same board is the next board the learner sees. In the tree you can view the opponent's only legal move as a no-op move that does nothing. Since the node at the middle level has just one child, the Vhat value for that board will directly propagate up to the middle level. Alternatively, you can think of just directly applying Vhat to a board on which the opponent cannot move and using this value for the middle level. In the case in which the opponent's move leaves the learner with no move, you can expand further until either a final board is reached (in which case you use the true value of -100, 0, or 100) or a board on which the learner can move is reached. Then compute the Vhat value for the board on which the learner can move. When propagating these values up the tree you always pick the minimum value of the children when it was the opponent's move, and pick the maximum value only at the last step, when the learner is picking the move. Finally, if a final board is reached after the learner moves or the opponent moves (i.e. neither player can move), then use the true value of -100 (learner loses), 0 (draw), or 100 (learner wins) for that node.
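The rules above can be sketched recursively without building the tree. The `game` object and its methods (`successors`, `is_final`, `true_value`, `v_hat`) are a hypothetical interface invented for this sketch, and `TableGame` is a toy stand-in used only to exercise the pass cases. Your own board representation will differ.

```python
LEARNER, OPPONENT = "learner", "opponent"

def node_value(board, to_move, game):
    """Value of a board reached during the search, with `to_move` next to play."""
    if game.is_final(board):
        return game.true_value(board)               # -100, 0, or 100
    if to_move == LEARNER:
        if game.successors(board, LEARNER):
            return game.v_hat(board)                # learner can move: evaluate here
        return node_value(board, OPPONENT, game)    # learner has no move: expand further
    children = game.successors(board, OPPONENT)
    if not children:
        return node_value(board, LEARNER, game)     # opponent's no-op move
    return min(node_value(child, LEARNER, game) for child in children)

def choose_move(board, game):
    """Pick the learner's successor board with the largest backed-up value."""
    return max(game.successors(board, LEARNER),
               key=lambda child: node_value(child, OPPONENT, game))

class TableGame:
    """Toy game given by explicit tables, used only to exercise the search."""
    def __init__(self, learner_moves, opponent_moves, estimates, outcomes):
        self.learner_moves = learner_moves      # board -> boards the learner can reach
        self.opponent_moves = opponent_moves    # board -> boards the opponent can reach
        self.estimates = estimates              # board -> V_hat value
        self.outcomes = outcomes                # final board -> -100, 0, or 100

    def successors(self, board, player):
        table = self.learner_moves if player == LEARNER else self.opponent_moves
        return table.get(board, [])

    def is_final(self, board):
        return board in self.outcomes

    def true_value(self, board):
        return self.outcomes[board]

    def v_hat(self, board):
        return self.estimates[board]

game = TableGame(
    learner_moves={"root": ["a", "b", "c"], "a1": ["x"], "a2": ["x"], "b": ["b1"]},
    opponent_moves={"a": ["a1", "a2"], "c": ["c1"], "c1": ["c2"]},
    estimates={"a1": 10, "a2": 30, "b": 20},
    outcomes={"c2": -100},
)
print(choose_move("root", game))   # b
```

In the toy game, move "a" is worth min(10, 30) = 10, move "b" leaves the opponent without a move and is worth Vhat = 20, and move "c" leads through a learner pass to a lost final board worth -100, so "b" is chosen.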

Do I have to explicitly build the 1-ply search tree?

No. You can implement this either iteratively or recursively and figure out which move to make without explicitly building the tree. See Dr. Goldman or one of the TAs if you don't see how to do this.

I decided to try playing the greedy opponent against the initial random player for a sequence of games just to see how the random player does. The random player seems to be winning over 40% of the games. Is this right?

This kind of winning percentage can be obtained if the random player is using a random number generator which is not really that good (like Java's random number generator). The reason for this is that if the random player tends to cluster all of its moves in a corner then it will tend to do pretty well. To see if this is happening with your game print out the state of the board about 2/3 of the way through the game and see if the random player's pieces are clustered together. If so, then the random number generator is to blame. Remember that for the assignment you only needed to use the random learner once and after that there should be very few ties to break and so you really don't need to fix the random number generator. However, if you want to try and fix it select a random number generator (directly from those in Java's library) that already returns an integer in the given range. Those should have been designed to have reasonable random looking behavior. A true random player will win about 30% of the games against greedy.

Could you please show a schematic of the board and go over the features we are to use?

C E E E E E E C
E X * * * * X E     C = corners
E * o o o o * E     E = non-corner edge
E * o o o o * E     X = diagonally one from corner
E * o o o o * E     * = adjacent to an edge space
E * o o o o * E     
E X * * * * X E     
C E E E E E E C
The spaces marked by an "o" would only be considered when counting the number of black and whites. For all of the other features you count the number of white discs in the designated type of space for one feature and count the number of black discs in the designated type of space for another feature. The number of empty spaces of the designated type do not get counted for any of the features.
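The schematic can be turned into feature counts roughly as follows. This is a sketch under assumptions: the board is taken to be 8 rows of 'B', 'W', or '.' characters, the function names are invented, and the ordering of the 10 features is arbitrary. Each count is divided by its maximum possible value so that it lies between 0 and 1. Ten features plus a constant term would be consistent with the 11 weights mentioned elsewhere in this FAQ, but that reading is also an assumption.

```python
def square_type(row, col):
    """Classify a square of the 8x8 board using the letters in the schematic."""
    on_edge_row, on_edge_col = row in (0, 7), col in (0, 7)
    if on_edge_row and on_edge_col:
        return "C"
    if on_edge_row or on_edge_col:
        return "E"
    near_row, near_col = row in (1, 6), col in (1, 6)
    if near_row and near_col:
        return "X"
    if near_row or near_col:
        return "*"
    return "o"

# Maximum count for each kind of square, used to scale features into [0, 1].
MAX_COUNT = {"all": 64, "C": 4, "E": 24, "X": 4, "*": 16}

def features(board):
    """Return 10 normalized features from a board of 'B', 'W', or '.' characters."""
    counts = {(kind, color): 0 for kind in MAX_COUNT for color in "BW"}
    for r in range(8):
        for c in range(8):
            color = board[r][c]
            if color not in ("B", "W"):
                continue                    # empty squares are never counted
            counts[("all", color)] += 1     # "o" squares only count here
            kind = square_type(r, c)
            if kind != "o":
                counts[(kind, color)] += 1
    return [counts[(kind, color)] / MAX_COUNT[kind]
            for kind in MAX_COUNT for color in "BW"]
```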

The learner's weights are getting very large, what might be causing this?

I recommend that you check the following portions of your code.

  • Do you normalize all of your features so that they are in the range of 0 to 1? This is done by dividing by the maximum possible value for the feature as discussed in class.

  • Check that you correctly implement the LMS algorithm used to update the weights. Here's a good way to do this. Just before the inner loop in which each of the 11 weights is updated, print the values of V_train and V_hat. Then after the weight update compute the new value of V_hat. The new value of V_hat should be about 10% closer to V_train than it was before the update. If this is not the case look carefully at your code and figure out why.

  • Finally, if the weights are correctly updated then this must mean that the V_train values are themselves getting very large; they should roughly be in the -100 to +100 range. So if they are getting very large, check your procedure that computes the values of V_train. Remember that for the last board you let V_train be +100 (for a win), 0 (for a tie) and -100 (for a loss). For all other boards you let V_train(b_i) = V_hat(b_{i+1}).
If you are still having problems come to our office hours or send email to Dr. Goldman.
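The check described in the second bullet can be sketched like this, assuming V_hat is a linear function of the features with a constant term stored in weights[0]. The function names are illustrative. For that form of V_hat, one LMS step multiplies the error (V_train - V_hat) by 1 - eta * (1 + sum of squared features), so with eta = 0.1 and small normalized features the error shrinks by a little more than 10%.

```python
def v_hat(weights, features):
    """Linear evaluation: weights[0] is the constant term."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, eta=0.1):
    """Apply one LMS step in place and return (error before, error after)."""
    error_before = v_train - v_hat(weights, features)
    weights[0] += eta * error_before              # constant input of 1
    for i, x in enumerate(features, start=1):
        weights[i] += eta * error_before * x
    error_after = v_train - v_hat(weights, features)
    return error_before, error_after
```

If the features are not normalized, eta * (1 + sum of squared features) can exceed 2, in which case every update overshoots V_train by more than the original error. That is one way the weights can grow without bound.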

What should be done in playing the game when during the 1-ply search an end board is reached?

If the learner's move takes you to a final board then you should associate the value of 100 (if the learner wins), 0 (if the learner draws) and -100 (if the learner loses) with that node in the tree. In other words, if the learner sees that a move takes it to a winning board this move (or one that also has value 100) is taken. Similarly, if one of the learner's moves takes it to a losing board then this move is not taken unless all legal moves lead to a losing board.

If the opponent's move takes you to a final board you can either go ahead and use the true board value or just use V_hat. The first option is certainly preferred in terms of playing the best game, but either is fine for the purpose of this lab.

Are the V_train values for all boards computed prior to updating any weights?

Yes, you compute V_train(b_i) = V_hat(b_{i+1}) for all boards prior to updating the weights. (Of course, this rule is applied to all boards but the last one. For the last one you use 100 if the learner won, 0 if it was a tie, and -100 if the learner lost.)
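In code, computing all the training values up front might look like the sketch below. The function name is invented, and the list `v_hat_values` is assumed to hold V_hat of each board in the game, in order, evaluated with the weights as they stood before any update.

```python
def training_values(v_hat_values, outcome):
    """Compute V_train for every board of one game before any weight update.

    v_hat_values[i] is V_hat(b_i) under the current weights; outcome is
    100, 0, or -100 from the learner's point of view.
    """
    v_train = list(v_hat_values[1:])   # V_train(b_i) = V_hat(b_{i+1})
    return v_train + [outcome]         # the last board gets the true outcome

print(training_values([5, 12, -3, 40], 100))   # [12, -3, 40, 100]
```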

What is the late policy?

Here is the late policy.

Where can I submit my homework?

Either in class or in Professor Goldman's mailbox in Bryan 509 at least 15 minutes before class on the day it is due.

