CSE 131 Module 3: Arrays

Important!

Before beginning any work, do a Team...Pull in your repository.

Extensions

Extension 1: Pascal's Triangle (3 points):

Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
        column
        0  1  2  3  4
row
0       1
1       1  1
2       1  2  1
3       1  3  3  1
4       1  4  6  4  1
        .
        .
        .
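Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c:
  • If c is 0 or c equals r, the entry is 1.
  • Otherwise, the entry is the sum of the entries at row r-1, column c-1 and at row r-1, column c.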

Procedure

  1. Open the PascalsTriangle class, found in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the value N which is the number of rows you should compute of the triangle.
  3. Instantiate the two-dimensional array needed to hold the results.
    Java lets you do this as a ragged array, where each row can be of a different length.

    You are welcome to instantiate the array that way, but it is equally acceptable, and the Java syntax is much simpler, to give every row the same number of columns. An example of this appears in SelfAvoidingWalk, found in the book.ch1 package of the coursesupport source folder in your repositories.

  4. Compute the triangle as a two-dimensional array and print the results left-justified as shown above.
  5. For extra fun, try to print the triangle centered as shown on the Wikipedia page.
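The rule for each entry translates almost directly into nested loops. The following is only a minimal sketch, not the course's reference solution: the class name PascalSketch and the methods build and format are hypothetical, and obtaining N from the user is left out. It uses the simpler square array, so the cells to the right of the diagonal are simply never used.

```java
// Hypothetical sketch; these names are not part of the provided course code.
class PascalSketch {

    // Builds the first n rows in a square n-by-n array. Cells to the right of
    // the diagonal are never written and keep Java's default value of 0.
    static int[][] build(int n) {
        int[][] t = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c <= r; c++) {
                if (c == 0 || c == r) {
                    t[r][c] = 1; // left and right edges of the triangle
                } else {
                    t[r][c] = t[r - 1][c - 1] + t[r - 1][c]; // two entries from the row above
                }
            }
        }
        return t;
    }

    // Formats the triangle left-justified, one row per line,
    // with two spaces between entries.
    static String format(int[][] t) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < t.length; r++) {
            for (int c = 0; c <= r; c++) {
                sb.append(t[r][c]);
                if (c < r) {
                    sb.append("  ");
                }
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(build(5)));
    }
}
```

With multi-digit entries the columns will no longer line up; padding each entry to a fixed width is one way to handle that.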
When you are done with this extension, you must be cleared by the TA to receive credit.

This demo box is for extension 3.1
Last name WUSTL Key Propagate?
(NOT your numeric ID) Do not propagate
lower case only
e.g. Smith j.smith
1    

Acknowledgements and assertion of integrity

You must select one of the options below
The work submitted here was performed in accordance with this course's policy on collaboration.
On your honor, you have neither given nor received any unauthorized aid on this assignment.

However, the following TAs, students, or professors were supportive in completing this assignment.
Their help was also in accordance with course policies.

Thanks to (leave blank if appropriate):

In spite of seeking help as allowable by this course's policy on collaboration, you were unable to complete this assignment. No credit will be received for this assignment.

You would like to be contacted by an instructor to facilitate staying on track in this course.

Comments about this:

You have NOT abided by this course's policy on collaboration. No credit will be received for this assignment, but by checking this box, no academic integrity violation will be filed for this assignment.

You would like to be contacted by an instructor to facilitate staying on track in this course.

Comments about this:


TAs double check!
  • This demo box is for extension 3.1
  • The student has committed and pushed the work, and verified that it appears at bitbucket.
TA: Password:

End of extension 1


Extension 2: Birthday Problem (5 points):

Procedure

To make things simple, let's assume that all months have 31 days.
  1. Open the Birthday class, found in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the number of people N that will enter the room.
  3. For each person, randomly generate a month and day on which that person was born.
  4. Using a two-dimensional array that is indexed by month and by day, keep track of the number of people born on that month and day.
    For example, perhaps you randomly decide the person was born in month 4 and day 28.
  5. After processing all of the N people, iterate over your array to compute:
When you are done with this extension, you must be cleared by the TA to receive credit.
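As an illustration of steps 3 and 4 of the Birthday procedure, here is a minimal sketch of the random generation and the month-by-day tally. The names BirthdaySketch, tally, and sharedDates are hypothetical, and months and days are stored zero-based (so month 4, day 28 lands in counts[3][27]), which is an assumption rather than a requirement. The statistics the assignment asks for are not reproduced here, so sharedDates is only an example of iterating over the array.

```java
import java.util.Random;

// Hypothetical sketch; these names are not part of the provided course code.
class BirthdaySketch {
    static final int MONTHS = 12;
    static final int DAYS = 31; // simplification from the assignment: every month has 31 days

    // Gives each of n people a random birthday and counts people per date.
    static int[][] tally(int n, Random rng) {
        int[][] counts = new int[MONTHS][DAYS];
        for (int i = 0; i < n; i++) {
            int month = rng.nextInt(MONTHS); // 0..11
            int day = rng.nextInt(DAYS);     // 0..30
            counts[month][day]++;
        }
        return counts;
    }

    // Example traversal (an illustrative statistic, not necessarily one the
    // assignment requires): the number of dates shared by two or more people.
    static int sharedDates(int[][] counts) {
        int shared = 0;
        for (int m = 0; m < counts.length; m++) {
            for (int d = 0; d < counts[m].length; d++) {
                if (counts[m][d] >= 2) {
                    shared++;
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        int[][] counts = tally(40, new Random());
        System.out.println("Dates shared by 2 or more people: " + sharedDates(counts));
    }
}
```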


End of extension 2


Extension 3: GC Content (2 points):

Computer science plays a large role in modern biology. BioJava is a framework for working with biological data: it provides tools for representing sequences and performing common biological computations. In this extension you will read in a DNA sequence and compute its GC-content.


Procedure

  1. Complete the method percentGC in GCContent.
    Note: BioJava already has a function to get the GC-count (number of G and C nucleotides) of a sequence.

    Do not use this function.

    Instead, devise a way to iterate through the array of characters (representing nucleotides) yourself.

  2. Run the unit test GCContentTest and make sure it passes.
    Most students' code fails when given a string with no DNA in it whatsoever (an empty string, ""). Its GC content is 0%.
  3. Run GCContent as a Java Application and be prepared to answer questions by the TA at the demo.
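The iteration described in step 1, including the empty-sequence case from step 2, might look like the sketch below. The actual signature of percentGC in GCContent is not shown here, so this version is hypothetical: it assumes the sequence arrives as a char array of uppercase bases and that the result is a percentage between 0 and 100.

```java
// Hypothetical sketch; the real percentGC signature in GCContent may differ.
class GCSketch {

    // Returns the percentage (0 to 100) of bases that are G or C.
    static double percentGC(char[] dna) {
        if (dna.length == 0) {
            return 0.0; // an empty sequence has 0% GC; this also avoids dividing by zero
        }
        int gc = 0;
        for (char base : dna) {
            if (base == 'G' || base == 'C') {
                gc++;
            }
        }
        return 100.0 * gc / dna.length;
    }

    public static void main(String[] args) {
        System.out.println(percentGC("ATGC".toCharArray()));
    }
}
```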
When you are done with this extension, you must be cleared by the TA to receive credit.


End of extension 3


Extension 4: Reading Frame of a Sequence (10 points):

Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence: one starts with the first nucleotide in the sequence, one starts with the second, and one starts with the third, as shown below:

Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation

How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.

Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.

Procedure

  1. The code given to you prompts the user for a genome DNA file, and then reads that file into a char array using methods provided by BioJava.

    If you are uncertain about char arrays, this would be a good time to review the relevant material in your text and in the lecture notes.

  2. Your work takes place in the method bestReadingFrame. Open that file now and take a look at what is provided.

    There are comments in that file to direct your work, which consists of the steps described below.

    You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
  3. First, define 3 char arrays, one for each of the possible stop codons: ochre, amber, and opal.
  4. Next, define a char array named methionine for the start codon.
    The most common start codon is Methionine, which is encoded by the mRNA sequence AUG (ATG in the corresponding DNA sequence). For the purposes of this extension, Methionine is the only start codon you will consider.

    More information about start and stop codons can be found here.

  5. The rest of the code you write will attempt to read the DNA in each of the possible 3 reading frames. Find the best reading frame and return the index at which that best reading frame occurs. Thus, your method will return a value as follows:
    Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets.

    Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.
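The hint above can be sketched as follows. This is an illustration under several assumptions, not the provided bestReadingFrame: the class ReadingFrameSketch and the helpers codonAt and stopAt are hypothetical, the DNA is assumed to use uppercase A, C, G, and T (so the start codon is ATG and the stop codons are TAA, TAG, and TGA), a coding sequence that reaches the end of the data without a stop codon still counts, ties go to the earliest start, and 0 is returned when no start codon is found.

```java
// Hypothetical sketch; follow the comments in the provided file for the real method.
class ReadingFrameSketch {
    static final char[] METHIONINE = {'A', 'T', 'G'}; // start codon
    static final char[] OCHRE = {'T', 'A', 'A'};      // stop codon
    static final char[] AMBER = {'T', 'A', 'G'};      // stop codon
    static final char[] OPAL = {'T', 'G', 'A'};       // stop codon

    // True if the three bases beginning at index i spell the given codon.
    static boolean codonAt(char[] dna, int i, char[] codon) {
        return i + 3 <= dna.length
                && dna[i] == codon[0]
                && dna[i + 1] == codon[1]
                && dna[i + 2] == codon[2];
    }

    // True if a stop codon begins at index i.
    static boolean stopAt(char[] dna, int i) {
        return codonAt(dna, i, OCHRE) || codonAt(dna, i, AMBER) || codonAt(dna, i, OPAL);
    }

    // Returns 0, 1, or 2: the frame of the longest coding sequence found.
    static int bestReadingFrame(char[] dna) {
        int bestLength = 0;
        int bestStart = 0;
        for (int start = 0; start + 3 <= dna.length; start++) {
            if (!codonAt(dna, start, METHIONINE)) {
                continue; // a coding sequence must begin with the start codon
            }
            int i = start;
            // Step through whole triplets until a stop codon or the end of the data.
            while (i + 3 <= dna.length && !stopAt(dna, i)) {
                i += 3;
            }
            int length = i - start; // number of translated bases, stop codon excluded
            if (length > bestLength) {
                bestLength = length;
                bestStart = start;
            }
        }
        return bestStart % 3;
    }

    public static void main(String[] args) {
        System.out.println(bestReadingFrame("CATGAAATAG".toCharArray()));
    }
}
```

In the example in main, the only start codon begins at index 1 and is followed by AAA and then the stop codon TAG, so the frame is 1 % 3 = 1.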

When you are done with this extension, you must be cleared by the TA to receive credit.


End of extension 4


Extension 5: Data Sorting and Analysis (5 points):

A big part of Computer Science revolves around finding ways to sort data to be organized in a useful manner. There are many different algorithms that computer scientists use to sort information. You can learn about a few popular ones below:
For the purposes of this assignment, we will implement our sorting using a naive process of repeatedly selecting the smallest value from a data set of type int[] and swapping it to the front of the unsorted portion of the collection. A naive process is the solution that seems most obvious but might not be the most efficient. The algorithm in this assignment is called Selection Sort.

Procedure

  1. Open the Sorting class, found in the arrays package of the extensions folder.
  2. Prompt the user for the size of the collection.
    If the user enters a negative number, continue to prompt them with a useful message until they enter a positive number.
  3. After collecting the size of the array, prompt the user for integers one at a time, using an array to store the input data.
  4. After processing all of the numbers entered, the data will be sorted using the following naive algorithm:
  5. After sorting the data, we will take advantage of its useful organization to compute the following statistics:
    Mean
    The simple average of the data.
    Median
    The middle value in the ordered dataset (Note: How does this computation vary for even and odd sized datasets?)
    Min
    The smallest value in the ordered dataset.
    Max
    The largest value in the ordered dataset.
    Range
    The difference between the maximum and minimum values of the data set.
  6. Finally, arrange for your output to display the sorted dataset and accompanying statistics in the following manner:
    1 2 3 4 5
    Mean: 3.0
    Median: 3.0
    Min: 1
    Max: 5
    Range: 4
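A minimal sketch of the sorting and statistics steps is shown here. The class SortingSketch and its method names are hypothetical, the input is hard-coded rather than prompted for, and the array is assumed to be non-empty. Once the data is sorted, the minimum and maximum are simply the first and last elements.

```java
// Hypothetical sketch; prompting for the size and the values is left out.
class SortingSketch {

    // Selection sort: repeatedly find the smallest remaining value
    // and swap it to the front of the unsorted portion.
    static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[smallest]) {
                    smallest = j;
                }
            }
            int tmp = data[i];
            data[i] = data[smallest];
            data[smallest] = tmp;
        }
    }

    static double mean(int[] data) {
        long sum = 0;
        for (int value : data) {
            sum += value;
        }
        return (double) sum / data.length;
    }

    // Expects sorted data. An odd-sized dataset has one middle value;
    // an even-sized dataset averages the two middle values.
    static double median(int[] sorted) {
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return ((double) sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        int[] data = {4, 1, 5, 3, 2};
        selectionSort(data);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            line.append(data[i]);
            if (i < data.length - 1) {
                line.append(' ');
            }
        }
        int min = data[0];
        int max = data[data.length - 1];
        System.out.println(line);
        System.out.println("Mean: " + mean(data));
        System.out.println("Median: " + median(data));
        System.out.println("Min: " + min);
        System.out.println("Max: " + max);
        System.out.println("Range: " + (max - min));
    }
}
```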
When you are done with this extension, you must be cleared by the TA to receive credit.


End of extension 5


Extension 6: Blackjack! (7 points):

Introduction

In this assignment, we create the game blackjack, the most widely played casino banking game in the world. In blackjack, each player plays against the dealer, not the other players. The goal of blackjack is to beat the dealer by attaining a score (the sum of the values of your cards) higher than the dealer's score without exceeding 21. Many casinos use an 8-deck shoe (eight 52-card decks shuffled together), a practice intended to make counting cards harder.

Procedure

  1. In the blackjack package of the extensions folder, create a new class BlackJack and make sure it has the public static void main(String[] args) {…} method in it.
  2. Prompt the user for how many autonomous players, p, to have at the table, and how many games to play.
  3. Create a new array to keep track of everyone's scores, including yourself and the dealer.
  4. Construct a loop that will simulate the number of games to be played. Hint: We will have to reset each player's score between games.
  5. Next, create a mechanism that simulates the initial deal of two cards to each player. We must simulate drawing a random card. Because the suit of the card is not important for this game, we can think of the deck as only 13 possible cards with equal likelihood of being drawn. Use randomness to draw a card and then calculate its value.
    Remember that cards 2-9 will count as their number, 10-King all count as 10, and Aces count as 11.
  6. Verify that the array stores each player's score correctly. A player's score must be the sum of both cards drawn.
  7. Now we've reached the decision moment in blackjack. Each player decides whether to hit or stand. Start with the human player.
    Prompt the human player for their move. In order to make this decision, the human player should be able to see their score and the dealer's face-up card. In addition, after each hit, the human player should see their updated score.
  8. Once the human player is finished (either by stand, bust, or blackjack), simulate the autonomous players and then the dealer. The dealer follows the casino rule to hit until his score equals 17 or higher and then stand. Let's assume that the autonomous players follow the same rule. You can copy and paste the same game mechanics you used earlier to draw cards.
  9. After the dealer finishes his turn, print out the results of that game. Reminder: if a player busts, then they lose. If the dealer busts, all the players that did not bust win. If neither busts, players with a higher score than the dealer win and players with a lower score lose. If a player's score equals the dealer's (non-bust), then it is a tie, called a push.
  10. After all the games have ended, print out the percentage of the games that the human won.
  11. Here is sample output from one solution. In this implementation, player 0 is the dealer, player 1 is the human player, and players 2 and 3 are autonomous players. Notice that the human player got three blackjacks out of four games! This is very rare.
    You chose to play 4 games
    There are 2 autonomous players playing.
    
    Game 1
    The dealer's face-up card has the value of 10
    The players' scores are: 
    21 18 12 
    The Dealer's face-up card has the value of 10. And your current count is 21
    You chose to stand!
    
    Player 0 got 20
    Player 1 got Blackjack! (21)
     Player 1 beats the dealer!
    Player 2 got 18
    Player 3 got 20
     Player 3 pushed with 20
    
    Game 2
    The dealer's face-up card has the value of 8
    The players' scores are: 
    15 17 16 
    The Dealer's face-up card has the value of 8. And your current count is 15
    You hit!
    
    Player 0 got 19
    Player 1 Busts! 26
    Player 2 got 17
    Player 3 Busts! 23
    
    Game 3
    The dealer's face-up card has the value of 9
    The players' scores are: 
    15 14 15 
    The Dealer's face-up card has the value of 9. And your current count is 15
    You hit!
    The Dealer's face-up card has the value of 9. And your current count is 21
    Would you like to hit?
    You chose to stand!
    
    Player 0 got 17
    Player 1 got Blackjack! (21)
     Player 1 beats the dealer!
    Player 2 Busts! 24
    Player 3 Busts! 22
    
    Game 4
    The dealer's face-up card has the value of 11
    The players' scores are: 
    14 19 7 
    The Dealer's face-up card has the value of 11. And your current count is 14
    You hit!
    The Dealer's face-up card has the value of 11. And your current count is 21
    Would you like to hit?
    You chose to stand!
    
    Player 0 got Blackjack! (21)
    Player 1 got Blackjack! (21)
     Player 1 pushed with 21
    Player 2 got 19
    Player 3 got 17
    
    The fraction of human wins was 0.5
    
  12. Optional tasks:
When you are done with this extension, you must be cleared by the TA to receive credit.
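Three pieces of the Blackjack procedure lend themselves to small helper methods: drawing a card (step 5), the hit-until-17 rule (step 8), and deciding the result (step 9). The sketch below is illustrative only. The class BlackjackSketch and its method names are hypothetical, ranks are assumed to be numbered 1 (Ace) through 13 (King), and the prompts, the score array, and the game loop are left out.

```java
import java.util.Random;

// Hypothetical sketch; these names are not part of the provided course code.
class BlackjackSketch {

    // Maps a rank (1 = Ace, 2..10, 11 = Jack, 12 = Queen, 13 = King) to its value.
    static int cardValue(int rank) {
        if (rank == 1) {
            return 11; // Aces count as 11 in this assignment
        }
        if (rank >= 10) {
            return 10; // 10, Jack, Queen, and King all count as 10
        }
        return rank;   // 2 through 9 count as their number
    }

    // Draws one of the 13 ranks with equal likelihood and returns its value.
    static int drawCard(Random rng) {
        return cardValue(rng.nextInt(13) + 1);
    }

    // Casino rule for the dealer and the autonomous players:
    // keep hitting until the score is 17 or higher, then stand.
    static int playToSeventeen(int score, Random rng) {
        while (score < 17) {
            score += drawCard(rng);
        }
        return score;
    }

    // Result for one player, given final scores.
    static String outcome(int player, int dealer) {
        if (player > 21) {
            return "lose"; // a player who busts always loses
        }
        if (dealer > 21 || player > dealer) {
            return "win";
        }
        if (player < dealer) {
            return "lose";
        }
        return "push";
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int dealer = playToSeventeen(drawCard(rng) + drawCard(rng), rng);
        int player = playToSeventeen(drawCard(rng) + drawCard(rng), rng);
        System.out.println("Player " + player + ", dealer " + dealer + ": " + outcome(player, dealer));
    }
}
```

Putting the card logic in one method avoids copying and pasting it for the human, the autonomous players, and the dealer.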


End of extension 6


Extension 7: Minesweeper (8 points):

Workspace Update

  • Update your Eclipse workspace:

    Note and warning: It is very easy to find a solution to this problem on the Internet. This course does not have as one of its goals making you an expert on finding solutions on the Internet. The goal is to teach you the basic concepts of computer science and programming.

    You are not allowed to copy any code for your solution to this problem! The solution takes some 30 lines of code, so we are not asking you to write an air-traffic controller.

    However, even these 30 lines may be difficult for you to write at first. So, get help from a TA or the professor, come to office hours and TA hours, but do this work on your own so that you learn the material.

    Be assured that you will not do well on the exam if you do not understand this material.


    MineSweeper

    When you are done with this extension, you must be cleared by the TA to receive credit.


    End of extension 7


    Extension 8: Linear Regression (4 points):

    Authors
    • Brandon Mendez

    Overview

    Machine learning is an area of study in computer science that falls within the larger area of study known as artificial intelligence. Here, we are trying to teach a computer to reason about a problem based on data we provide.

    One such simple task to consider is learning a geometric line, as characterized by its slope and y-intercept. Where two properties are related linearly, the line that characterizes their relationship can be used to predict the relationship between values that have not yet been seen. How many points must be provided to learn the implied line? We need only see two points that differ in their first coordinate to learn a line.

    In the real world, two properties are sometimes approximately but not exactly linearly related. In this assignment we consider the square footage of a house (one parameter) and that house's cost (the other parameter). If we assume that these two properties are linearly related, then we should be able to predict the price of a house from its square footage.

    The simple example of machine learning that we study here is linear regression. This technique allows us to take many data points with two known variables, learn the line that best fits their relationship, and then predict the value of one variable when given the other. In this assignment you will learn a line given the actual selling prices and square footages of thousands of houses in the Broward County, Florida area. When your line is computed, you can then predict the price of a house of any square footage.

    Procedure

    1. In the extensions folder, find the regression package. You will write your code in the LinearRegression class.
      In this extension, you'll be working with slope and intercept, class variables that can be accessed from any method in the class. While we included these to make testing your code easier, they serve also as an introduction to instance variables, which you learn later in this course. The slope and intercept variables are public, meaning they can be accessed outside of the class declaring them, and static, a word which here means that they are available to all static methods in the class. Moreover, they will retain their values between calls to such methods. Feel free to ask a TA for more details if anything remains unclear.
    2. Finish the code for the learn() method.
      • This will require you to implement the simple linear regression formula included below.
      • More information on how the formula should be used can be found in this Khan Academy video.
      • You'll find the method StdIn.readDoubles() useful here to take in the data we provide for you in datafiles/housing/pricesarea.csv.
      • Note that you can only read from StdIn once in your entire program. If you want to access the values you read in more than one place, you will have to find a way to save and pass the values to wherever you need them.

      • In this assignment, x would be the square footage of a house, and y would be its selling price.
      • The notation x̄ (x with a bar over it) means the average of all of the x values.
      • The notation xy with a bar over the whole product means the average of the products formed by multiplying each point's x and y values.
    3. Once you finish implementing the formula and getting a reasonable regression line equation, make sure you pass the testSlopeIntercept JUnit test before moving on.
    4. Finish the code for the predictPrice() method. This method should allow a user to pass in a square footage, and return an estimated price for a house of that size. The code for this method is very short; think of how the variables you just solved for relate to the price and area of a home. Make sure you pass the testPredictions JUnit test.
    5. When you finish, you can try using the RegressionGrapher class to see a graph of the data and your line. This is provided for you.
    6. Demo your results to a TA to receive credit.
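For orientation, here is a sketch of simple linear regression written with the averages described in step 2. The formula from the assignment is not reproduced in this text, so the standard least-squares expressions are assumed: slope = (mean of xy - mean of x times mean of y) / (mean of x squared - square of the mean of x), and intercept = mean of y - slope times mean of x. The class RegressionSketch is hypothetical, and unlike the provided learn() method, this version takes the data as two arrays instead of reading it from StdIn.

```java
// Hypothetical sketch; the provided LinearRegression class reads its data from StdIn.
class RegressionSketch {
    public static double slope;
    public static double intercept;

    // Learns the least-squares line through the points (x[i], y[i]).
    // Assumes x and y have the same length and the x values are not all equal.
    static void learn(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double meanXY = sumXY / n; // average of the products x*y
        double meanXX = sumXX / n; // average of the squares x*x
        slope = (meanXY - meanX * meanY) / (meanXX - meanX * meanX);
        intercept = meanY - slope * meanX;
    }

    // Predicts y (price) for a given x (square footage) using the learned line.
    static double predictPrice(double squareFeet) {
        return slope * squareFeet + intercept;
    }

    public static void main(String[] args) {
        learn(new double[] {1, 2, 3}, new double[] {3, 5, 7});
        System.out.println("slope " + slope + ", intercept " + intercept);
        System.out.println("prediction at 4: " + predictPrice(4));
    }
}
```

The three points in main lie exactly on the line y = 2x + 1, which makes the result easy to check by hand.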
    When you are done with this extension, you must be cleared by the TA to receive credit.


    End of extension 8


    Extension 9: k Nearest Neighbors (4 points):

    Authors
    • Brandon Mendez

    Overview

    This assignment incorporates principles of machine learning, which is an area of study in computer science that falls within the larger area of study known as artificial intelligence. Here, we are trying to teach a computer to reason about a problem based on data we provide.

    One such simple task to consider is learning the value of a function at some position in two-dimensional (say, x and y) space. The value of this function may not have an obvious relationship with its x or y coordinates. Moreover, we can't use linear regression to represent the function by a line, because the function's values depend on two inputs. We could try to generalize linear regression to use a plane to represent this function. Unfortunately, the problem we consider here is not amenable to accurate representation using a plane, but the data does possess one feature that suggests using a particular approach to approximate the function's values.

    The data we consider here is literally comprised of neighborhoods, such that points within a physical neighborhood behave similarly. For the sake of this problem, we do not know the physical boundaries of any neighborhood. However, we can impose neighborhoods of our own choosing over the data and use those implied neighborhoods to approximate the value of a point within that neighborhood.

    In particular, we can approximate a position's value using this sample data by taking the k nearest points (neighbors) to the position of interest and averaging the neighbors' values, where k is an integer greater than 0.

    For any chosen value of k and any position of interest, that position's k nearest neighbors occur within some radius from the position. The value of k thus determines the circular neighborhood imposed at the position of interest, with the assumption that points within that circular area behave similarly. Where known data points are dense, the radius will be smaller; where known data points are sparse, the radius is adaptively larger, so that in any computation, k points participate in determining the value at a position of interest.

    But what exactly should k be? How many points must be provided to learn the implied value? As an example, consider computing some aspect of the weather (temperature, wind speed, etc.) at a point of interest based on averaging the aspect's value at the k neighbors nearest to the point of interest. If k is too large, then we will be computing an average value that holds over a large area, but probably does not hold at the specific point of interest. If k is too small, we may not have enough data to compute the point's value accurately.

    Thus, the choice of k is itself of interest in solving this problem.

    In this assignment we consider the location of a house (its position in space) and that house's cost (its value). If we assume that these two properties are related, then we should be able to predict the sales price of a house in any location by analyzing some actual examples of sales prices for houses and the location (latitude and longitude) of those houses.

    The simple example of machine learning that we study here is called k Nearest Neighbors. This technique allows us to take many data points whose positions and values are known and then predict the value at a position we have not seen. In this assignment you will approximate the values of hypothetical homes given the actual selling prices and locations of thousands of houses in the Broward County, Florida area. Once your algorithm has read the given data, you can predict the price of a house at any location.

    Procedure

    1. In the extensions folder, find the neighbors package. You will write your code in the kNearestNeighbors class.
    2. Finish the code for the predictPrice() method.
      • This method takes in the x and y positions of a hypothetical home, along with an array containing the price, x, and y locations of actual homes. It also takes in a value for k so that we can test your code in different situations. Your method needs to be able to:
        • Calculate the distance of the house in question to each of the actual homes.
        • Use those distances to find the k nearest homes to the given house.
        • Average the values of those k nearest homes to predict the price of the house we are looking for.
    3. Once you finish implementing the algorithm, you can run NeighborhoodGraph to see it in action. This draws a real map of Broward County, with the housing data points on top of it. You can see that as houses get more expensive they show up as darker red, and as they get more affordable they show as a lighter yellow. If you click on the map, your prediction will show up in the Eclipse console. Make sure each area on the map corresponds to the appropriate price range relative to the other areas. If you are getting inconsistent results, make sure you are calculating distance properly, and storing the prices of the closest houses properly as well.
    4. Make sure that your code passes the testPrediction unit test in the TestNeighbors class.
    5. Demo your results to a TA to receive credit.
    When you are done with this extension, you must be cleared by the TA to receive credit.
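
    The three steps listed for predictPrice() could be sketched as follows. This is one possible approach under stated assumptions, not the reference solution: the class name KnnSketch, the parameter order, and the layout of the homes array (each row holding {price, x, y}) are hypothetical, so check the actual method signature in your repository. The sketch finds the k nearest homes by scanning repeatedly for the closest home not yet chosen, which needs only arrays and loops.

```java
// Hypothetical sketch for illustration; the starter code in the neighbors package may differ.
class KnnSketch {

    // Predicts the price at (x, y) as the average price of the k nearest homes.
    // ASSUMPTION: each row of homes is {price, x, y}, and 1 <= k <= homes.length.
    static double predictPrice(double x, double y, double[][] homes, int k) {
        int n = homes.length;

        // Step 1: distance from the query position to every actual home
        double[] dist = new double[n];
        for (int i = 0; i < n; i++) {
            double dx = homes[i][1] - x;
            double dy = homes[i][2] - y;
            dist[i] = Math.sqrt(dx * dx + dy * dy);
        }

        // Steps 2 and 3: pick the nearest unused home k times, summing prices
        boolean[] used = new boolean[n];
        double sum = 0.0;
        for (int picked = 0; picked < k; picked++) {
            int nearest = -1;
            for (int i = 0; i < n; i++) {
                if (!used[i] && (nearest == -1 || dist[i] < dist[nearest])) {
                    nearest = i;
                }
            }
            used[nearest] = true;
            sum += homes[nearest][0];
        }
        return sum / k;
    }

    public static void main(String[] args) {
        // Made-up data: three homes near the origin and one far away
        double[][] homes = {
            {100.0, 0.0, 0.0},
            {200.0, 1.0, 0.0},
            {300.0, 0.0, 1.0},
            {1000.0, 50.0, 50.0}
        };
        System.out.println(predictPrice(0.1, 0.1, homes, 3));
    }
}
```

    Scanning k times costs roughly k times n comparisons. Sorting the homes by distance would also work, but it requires keeping each price paired with its distance.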

    This demo box is for extension 3.9

    Acknowledgements and assertion of integrity

    You must select one of the options below
    The work submitted here was performed in accordance with this course's policy on collaboration.
    On your honor, you have neither given nor received any unauthorized aid on this assignment.

    However, the following TAs, students, or professors were supportive in completing this assignment.
    Their help was also in accordance with course policies.

    Thanks to (leave blank if appropriate):

    In spite of seeking help as allowable by this course's policy on collaboration, you were unable to complete this assignment. No credit will be received for this assignment.

    You would like to be contacted by an instructor to facilitate staying on track in this course.

    Comments about this:

    You have NOT abided by this course's policy on collaboration. No credit will be received for this assignment, but by checking this box, no academic integrity violation will be filed for this assignment.

    You would like to be contacted by an instructor to facilitate staying on track in this course.

    Comments about this:


    TAs double check!
    • This demo box is for extension 3.9
    • The student has committed and pushed the work, and verified that it appears at bitbucket.
    TA: Password:

    End of extension 9


    Extension 10: Apartment Hunting (4 points):

    Authors
    • Dotun Taiwo
    Adaptation of the famous Secretary Problem

    The Problem

    Imagine you want to choose the best apartment out of n apartments, each with a rating from 0 to 1. You visit the apartments in a random order. When you survey an apartment, you can evaluate it against the ones you have seen so far, but you know nothing about the quality of the apartments you have not yet seen. If the decision to sign a lease could be made at the end, the problem could be solved by simply tracking the maximum rating (and which apartment achieved it) and selecting the overall maximum. The difficulty is that the decision must be made immediately because of the high demand for apartments in the market. Once you reject an apartment, you cannot go back to it, because it is likely that someone else has taken it. The question is how to find a strategy that maximizes the probability of choosing the best apartment.

    One such strategy is a stopping rule. For example, consider the first k of n apartments, retaining the highest rating but rejecting them all. Then continue from apartment k+1 to n and select the first apartment whose rating is equal to or greater than the highest rating among the first k, choosing the last apartment if it comes to that.
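
    A single application of this rule might look like the following Java sketch. The class name StoppingRule and the method choose are hypothetical names introduced for illustration. The method returns the index of the apartment the rule selects, given the ratings in visiting order and the number k of apartments that are only observed.

```java
// Hypothetical sketch for illustration; not part of the course repository.
class StoppingRule {

    // Returns the index of the apartment chosen when the first k are rejected.
    // ASSUMPTION: ratings is non-empty and 0 <= k < ratings.length.
    static int choose(double[] ratings, int k) {
        // Observation phase: remember the best rating among the first k
        double bestOfFirstK = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            bestOfFirstK = Math.max(bestOfFirstK, ratings[i]);
        }

        // Selection phase: take the first apartment at least as good as that
        for (int i = k; i < ratings.length - 1; i++) {
            if (ratings[i] >= bestOfFirstK) {
                return i;
            }
        }

        // Nothing qualified earlier, so the last apartment is taken
        return ratings.length - 1;
    }

    public static void main(String[] args) {
        double[] ratings = {0.3, 0.5, 0.2, 0.4, 0.9, 0.1};
        System.out.println("chosen index: " + choose(ratings, 2));
    }
}
```

    Note that with k = 0 there is nothing to compare against, so the sketch accepts the very first apartment.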

    In this extension, your job is to experiment and find the value of k that chooses the best apartment the most times.

    Procedure

    1. Create a class in the apartment package of the extensions folder
    2. Prompt the user for N apartment choices
    3. Create an array of N random doubles each ranging from 0 to 1 (utilize Math.random).
    4. Implement the stopping rule:
      • Iterate over the first k values of the array, keeping track of the max. Then record the first value in the k+1 to N portion that is equal to or greater than that max, or the last value if there is none.
      • Record the maximum of the entire array.
        • The best apartment is picked when the value recorded using the stopping rule equals the overall max.
    5. To experiment effectively, you should run multiple simulations for each value of k to see how many times the best apartment was picked.
    6. Calculate the optimal point: the k for which the best apartment was picked the most times.
    7. After the experimentation, print:
      • The calculated optimal point for the given number of apartments
      • The percentage of times the best apartment was picked
        Under the optimal stopping rule, this value should be approximately 37% (about 1/e) when N is large. You may have to adjust the number of simulations to produce a more accurate output.
    8. Demo your results to a TA
    9. When you are done with this extension, you must be cleared by the TA to receive credit.
      • Do a Team…Pull to update your repository. You must do this or the commit/push below may fail.
      • Commit and push all your work to your repository.
        Make certain this has worked by logging into bitbucket. There you will see the commit(s) in your news feed if it was successful. You can also check the Source page to locate and ensure your code was received.

        It is your responsibility to make certain the code has been pushed. Some of your work receives credit through testing of your pushed code. You will receive no credit for such work if you failed to push. We generally reserve the right to revoke credit for any of your work that has not been pushed on-time.

      • Fill in the form below with the relevant information
      • Have a TA check your work
      • The TA should check your work and then fill in the TA's name
      • Click OK while the TA watches
      • If you request propagation, it does not happen immediately, but should be posted in the next day or so
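
      The experiment in steps 3 through 7 could be organized along the lines of the sketch below. It is a sketch under stated assumptions rather than the expected solution: the class name ApartmentExperiment and the methods successRate and bestK are hypothetical, and it draws ratings from a seeded java.util.Random instead of Math.random so that runs are repeatable. Swap in Math.random to follow the procedure literally.

```java
import java.util.Random;

// Hypothetical sketch for illustration; not part of the course repository.
class ApartmentExperiment {

    // Fraction of trials in which skipping the first k of n apartments
    // ends with the best apartment being picked. ASSUMPTION: 0 <= k < n.
    static double successRate(int n, int k, int trials, Random rng) {
        int wins = 0;
        for (int t = 0; t < trials; t++) {
            double[] ratings = new double[n];
            double overallMax = 0.0;
            for (int i = 0; i < n; i++) {
                ratings[i] = rng.nextDouble();   // rating in [0, 1)
                overallMax = Math.max(overallMax, ratings[i]);
            }
            // Best rating among the first k, which are all rejected
            double bestOfFirstK = 0.0;
            for (int i = 0; i < k; i++) {
                bestOfFirstK = Math.max(bestOfFirstK, ratings[i]);
            }
            // First later rating that matches or beats it, else the last one
            double picked = ratings[n - 1];
            for (int i = k; i < n; i++) {
                if (ratings[i] >= bestOfFirstK) {
                    picked = ratings[i];
                    break;
                }
            }
            if (picked == overallMax) {
                wins++;
            }
        }
        return (double) wins / trials;
    }

    // Tries every k from 0 to n - 1 and returns the one with the highest rate.
    static int bestK(int n, int trials, Random rng) {
        int best = 0;
        double bestRate = -1.0;
        for (int k = 0; k < n; k++) {
            double rate = successRate(n, k, trials, rng);
            if (rate > bestRate) {
                bestRate = rate;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 50;            // number of apartments (would come from the user)
        int trials = 20000;    // simulations per value of k
        Random rng = new Random(131);
        int k = bestK(n, trials, rng);
        double rate = successRate(n, k, trials, rng);
        System.out.println("Optimal point for " + n + " apartments: k = " + k);
        System.out.println("Best apartment picked " + (100.0 * rate) + "% of the time");
    }
}
```

      Because neighboring values of k have very similar success rates, the reported optimal point can shift by a few positions between runs unless the number of simulations is large.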

      This demo box is for extension 3.10

      Acknowledgements and assertion of integrity

      You must select one of the options below
      The work submitted here was performed in accordance with this course's policy on collaboration.
      On your honor, you have neither given nor received any unauthorized aid on this assignment.

      However, the following TAs, students, or professors were supportive in completing this assignment.
      Their help was also in accordance with course policies.

      Thanks to (leave blank if appropriate):

      In spite of seeking help as allowable by this course's policy on collaboration, you were unable to complete this assignment. No credit will be received for this assignment.

      You would like to be contacted by an instructor to facilitate staying on track in this course.

      Comments about this:

      You have NOT abided by this course's policy on collaboration. No credit will be received for this assignment, but by checking this box, no academic integrity violation will be filed for this assignment.

      You would like to be contacted by an instructor to facilitate staying on track in this course.

      Comments about this:


      TAs double check!
      • This demo box is for extension 3.10
      • The student has committed and pushed the work, and verified that it appears at bitbucket.
      TA: Password:

      End of extension 10