CS 527A, Spring 2002, Homework 1


Due on Wednesday, January 23. The last day for late assignments is Monday, January 28. Late assignments can be submitted until 4pm that day, but you must attend class if you submit an assignment any time after 12:45pm on Monday.

A signed cover sheet for Homework 1 must be submitted with your homework.

For every homework in CS 527A you will be given the option of replacing it with a Choose Your Own Adventure, in which you design your own homework on the same topic that the homework covers. As an example, for Homework 1 you could pick a game other than Othello. More significant variations are also allowed. You must get approval from Dr. Goldman for any Choose Your Own Adventure homework. This can be done by submitting a proposal via email or by making an appointment.

In this homework you will implement the game learner described in class to learn to play Othello. The rules for Othello can be found at http://www.pressmangames.com/instructions/instruct_othello.html. The learner will be the white player. I recommend that you complete the components in the following order. Note that the first two steps, which relate to Othello, do not depend on the lecture material, so you can get started right away.
  1. (15 pts) Build the components needed that are specific to Othello. This includes:

    Spend some time thinking about how to structure your code. For example, to determine whether a move is legal and to update the board given a legal move, it would be useful to have a method that takes a board, a move, and a direction (which can be specified as a change in x of -1, 0, or 1 and a change in y of -1, 0, or 1) and returns how many discs would be flipped in that direction for the given move.
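A minimal Python sketch of the helper suggested above, under stated assumptions: the board is an 8x8 list of lists indexed as board[y][x], discs are the characters "B", "W", and "." for empty, and the names flips_in_direction, is_legal, and apply_move are illustrative only. It is one possible structure, not a required one.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"  # assumed cell encoding
SIZE = 8
# The eight (dx, dy) directions: each component is -1, 0, or 1, excluding (0, 0).
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def initial_board():
    """Four discs in the centre; the board is indexed board[y][x] (an assumption)."""
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board


def flips_in_direction(board, player, x, y, dx, dy):
    """Number of opponent discs flipped along (dx, dy) if player moves at (x, y)."""
    opponent = BLACK if player == WHITE else WHITE
    cx, cy = x + dx, y + dy
    count = 0
    # Walk over a contiguous run of opponent discs.
    while 0 <= cx < SIZE and 0 <= cy < SIZE and board[cy][cx] == opponent:
        cx, cy = cx + dx, cy + dy
        count += 1
    # The run only counts if it ends on one of the player's own discs.
    if count and 0 <= cx < SIZE and 0 <= cy < SIZE and board[cy][cx] == player:
        return count
    return 0


def is_legal(board, player, x, y):
    """A move is legal if the square is empty and at least one disc is flipped."""
    if board[y][x] != EMPTY:
        return False
    return any(flips_in_direction(board, player, x, y, dx, dy) for dx, dy in DIRECTIONS)


def apply_move(board, player, x, y):
    """Return a new board with the move played; the input board is not modified."""
    new_board = [row[:] for row in board]
    new_board[y][x] = player
    for dx, dy in DIRECTIONS:
        n = flips_in_direction(board, player, x, y, dx, dy)
        for k in range(1, n + 1):
            new_board[y + k * dy][x + k * dx] = player
    return new_board
```

Both the legality check and the board update reuse the same per-direction count, which is the point of the suggestion: the flipping rule is written once.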

  2. (10 pts) Implement a greedy opponent to train against, one that always picks a legal move that maximizes the number of discs flipped.
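The greedy opponent can be sketched as follows. This is a self-contained illustration that assumes the same hypothetical board encoding as before (board[y][x] holding "B", "W", or "."). The assignment does not say how to break ties, so this sketch simply keeps the first best move found in scan order; choosing randomly among the best moves would be an equally valid reading.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"  # assumed cell encoding
SIZE = 8
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def flips_in_direction(board, player, x, y, dx, dy):
    """Number of opponent discs flipped along (dx, dy) if player moves at (x, y)."""
    opponent = BLACK if player == WHITE else WHITE
    cx, cy = x + dx, y + dy
    count = 0
    while 0 <= cx < SIZE and 0 <= cy < SIZE and board[cy][cx] == opponent:
        cx, cy = cx + dx, cy + dy
        count += 1
    if count and 0 <= cx < SIZE and 0 <= cy < SIZE and board[cy][cx] == player:
        return count
    return 0


def total_flips(board, player, x, y):
    """Discs flipped over all eight directions; 0 means the move is illegal."""
    if board[y][x] != EMPTY:
        return 0
    return sum(flips_in_direction(board, player, x, y, dx, dy) for dx, dy in DIRECTIONS)


def greedy_move(board, player):
    """Return the (x, y) move flipping the most discs, or None if player must pass."""
    best_move, best_flips = None, 0
    for y in range(SIZE):
        for x in range(SIZE):
            n = total_flips(board, player, x, y)
            if n > best_flips:  # strict comparison keeps the first best move on ties
                best_move, best_flips = (x, y), n
    return best_move
```

Returning None when no square flips anything gives the caller a simple way to detect that the player has to pass.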

  3. (30 pts) Implement the learning algorithm given in class using the following 10 features, which count five kinds of spaces once for each color: the number of spaces occupied by a white disc, the number of spaces occupied by a black disc, the number of corners occupied by a white disc, the number of corners occupied by a black disc, the number of non-corner edge spaces occupied by a white disc, the number of non-corner edge spaces occupied by a black disc, the number of spaces diagonally one away from the corner occupied by a white disc, the number of spaces diagonally one away from the corner occupied by a black disc, the number of spaces adjacent to an edge space that are occupied by a white disc, and the number of spaces adjacent to an edge space that are occupied by a black disc.
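One way to compute these features is sketched below. Several choices here are assumptions rather than part of the assignment: the board encoding (board[y][x] holding "B", "W", or "."), the reading of "adjacent to an edge space" as the ring of squares one step in from the border (which also contains the four squares diagonal to the corners), and dividing each count by the size of its region so that every value falls between 0 and 1.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"  # assumed cell encoding
SIZE = 8
LAST = SIZE - 1


def ring(x, y):
    """Distance from the nearest border: 0 on the edge, 1 one step in, and so on."""
    return min(x, y, LAST - x, LAST - y)


def is_any(x, y):
    return True


def is_corner(x, y):
    return x in (0, LAST) and y in (0, LAST)


def is_edge(x, y):
    return ring(x, y) == 0 and not is_corner(x, y)


def is_diagonal_to_corner(x, y):
    return x in (1, LAST - 1) and y in (1, LAST - 1)


def is_next_to_edge(x, y):
    # Assumed reading: every square one step in from the border.
    return ring(x, y) == 1


# Region order follows the order in which the features are listed in the text.
REGIONS = [is_any, is_corner, is_edge, is_diagonal_to_corner, is_next_to_edge]


def features(board):
    """Return 10 values in [0, 1]: for each region, the white share then the black share."""
    values = []
    for in_region in REGIONS:
        squares = [(x, y) for y in range(SIZE) for x in range(SIZE) if in_region(x, y)]
        for colour in (WHITE, BLACK):
            count = sum(1 for x, y in squares if board[y][x] == colour)
            values.append(count / len(squares))
    return values
```

Under these assumptions the region sizes are 64, 4, 24, 4, and 20 squares, so a single white disc in a corner contributes 1/64 to the first feature and 1/4 to the third.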

    In implementing your algorithm, use a learning rate of 0.1 and start all weights at 0. Also be sure to normalize all of the feature values to lie in the range 0 to 1. Use 100 as the board value for a winning final board and -100 as the board value for a losing final board. Finally, for each game you should use the game from the view of both the black player and the white player as training data. This really helps decrease training time when the strategy does not depend very much on which player goes first. Have the white player (i.e., the learner) go first in every other game. Train your learner until its performance is no longer improving.
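The class algorithm is not reproduced in this handout, so the following is only a sketch under a stated assumption: that the learner is a linear board-evaluation function trained with an LMS-style update, where each board's training value is the current estimate for the next board the same player saw and the final board gets +100 or -100. The function names and the extra bias weight weights[0] are illustrative choices, not requirements. If the algorithm from lecture differs, follow the lecture.

```python
LEARNING_RATE = 0.1   # value given in the assignment
WIN_VALUE = 100.0     # board value for a winning final board
LOSS_VALUE = -100.0   # board value for a losing final board


def initial_weights(num_features):
    """All weights start at 0; index 0 is an assumed bias term."""
    return [0.0] * (num_features + 1)


def evaluate(weights, feats):
    """Linear evaluation: bias plus the weighted sum of the feature values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feats))


def lms_update(weights, feats, target, rate=LEARNING_RATE):
    """Move each weight in proportion to the error and its feature value."""
    error = target - evaluate(weights, feats)
    weights[0] += rate * error
    for i, x in enumerate(feats, start=1):
        weights[i] += rate * error * x
    return error


def train_on_game(weights, feature_seq, final_value, rate=LEARNING_RATE):
    """Update the weights in place from one player's view of one finished game.

    feature_seq holds the feature vectors of the boards that player saw, in order.
    Each non-final board is trained toward the current estimate of its successor;
    the final board is trained toward final_value.
    """
    for t, feats in enumerate(feature_seq):
        if t + 1 < len(feature_seq):
            target = evaluate(weights, feature_seq[t + 1])
        else:
            target = final_value
        lms_update(weights, feats, target, rate)
```

To use both views of a game as the text asks, one possible approach is to call train_on_game twice: once with the features as the white player sees them and white's final value, and once with the colors swapped in the feature computation and the final value taken from black's result.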

  4. (15 pts) You should write a short report that has the following information. Be sure that all plots are clearly labeled.

  5. (30 pts) Explore at least two significant variations on what you have done. Think about variations that either enable the learner to reach a better performance level (in terms of the percentage of games won) or speed up training (perhaps at the cost of performance). For example, you could study how varying the learning rate affects performance, you could see how changing the features affects performance, you could study modifications to the value given to the final game board, you could add a component that picks a different starting board state for training, you could incorporate look-ahead in choosing the move, or you could use different opponents to train against. While I've given a lot of examples, there are many more variations not listed here that you could consider. Be creative: think about something you are curious about and try it. If you have any uncertainties about something you are considering, just ask Dr. Goldman.

    Along with including additional plots, you should provide a 1-2 page write-up that clearly describes the changes you made and discusses the effect these changes had on the learning quality and training time. Be sure to test each of the changes independently (and also together if you would like) to understand how each affects the performance. It is not necessary that your changes improve performance, as long as you explain why you thought they would help and then explain what happened.