CSE 131 Module 3: Arrays

Extensions

Extension 1: Pascal's Triangle (3 points):

Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
        column
        0  1  2  3  4
row
0       1
1       1  1
2       1  2  1
3       1  3  3  1
4       1  4  6  4  1
        .
        .
        .
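Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c: if c is 0 or c equals r, the entry is 1; otherwise the entry is the sum of the entries at row r-1, column c-1 and at row r-1, column c.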

Procedure

  1. Open the PascalsTriangle class, found in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the value N which is the number of rows you should compute of the triangle.
  3. Instantiate the two-dimensional array needed to hold the results.
    Java lets you do this as a ragged array, where each row can be of a different length.

    You are welcome to instantiate the array that way, but it is simpler and equally acceptable to instantiate the array with the same number of columns in each row. Java syntax for that is much simpler. An example of this appears in SelfAvoidingWalk, found in the book.ch1 package of the book source folder in your repositories.

  4. Compute the triangle as a two-dimensional array and print the results left-justified as shown above.
  5. For extra fun, try to print the triangle centered as shown on the Wikipedia page.
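Steps 3 and 4 can be sketched as follows. This is only an illustration, assuming a ragged array and the standard Pascal's Triangle rule (the edges are 1, every other entry is the sum of the two entries above it). The class name PascalSketch and the helper names build and format are hypothetical and are not part of the provided PascalsTriangle class.

```java
class PascalSketch {
    // Builds the first n rows as a ragged array: row r holds r + 1 entries.
    static long[][] build(int n) {
        long[][] t = new long[n][];
        for (int r = 0; r < n; r++) {
            t[r] = new long[r + 1];
            t[r][0] = 1; // first column is always 1
            t[r][r] = 1; // the diagonal is always 1
            for (int c = 1; c < r; c++) {
                // interior entry: sum of the two entries in the row above
                t[r][c] = t[r - 1][c - 1] + t[r - 1][c];
            }
        }
        return t;
    }

    // Formats the rows left-justified, one row per line.
    static String format(long[][] t) {
        StringBuilder sb = new StringBuilder();
        for (long[] row : t) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append("  ");
                }
                sb.append(row[c]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(build(5)));
    }
}
```

With a rectangular array (new long[n][n]) the same loops work unchanged; the unused entries simply stay 0 and are skipped when printing.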
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 1


Extension 2: Birthday Problem (5 points):

Some number of people, say N, wander into a room. We are interested in computing the fraction of those people who are:

Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.

Procedure

To make things simple, let's assume that all months have 31 days.
  1. Open the Birthday class, found in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the number of people N that will enter the room.
  3. For each person, randomly generate a month and day on which that person was born.
  4. Using a two-dimensional array that is indexed by month and by day, keep track of the number of people born on that month and day.
    For example, perhaps you randomly decide the person was born in month 4 and day 28.
  5. After processing all of the N people, iterate over your array to compute:
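Steps 3 and 4 of the procedure can be sketched as follows. This is a minimal illustration, assuming 12 months of 31 days each and zero-based indices (so month 4 and day 28 in the example above would be stored at [3][27]). The class name BirthdaySketch and the helpers tally and total are hypothetical; total is only a sanity check, not one of the statistics the extension asks for.

```java
import java.util.Random;

class BirthdaySketch {
    static final int MONTHS = 12;
    static final int DAYS = 31; // simplifying assumption from the procedure

    // Generates a random birthday for each of n people and counts
    // how many people were born on each (month, day) pair.
    static int[][] tally(int n, Random rng) {
        int[][] counts = new int[MONTHS][DAYS];
        for (int p = 0; p < n; p++) {
            int month = rng.nextInt(MONTHS); // 0..11
            int day = rng.nextInt(DAYS);     // 0..30
            counts[month][day]++;
        }
        return counts;
    }

    // Sums every cell; the result should equal the number of people.
    static int total(int[][] counts) {
        int sum = 0;
        for (int[] month : counts) {
            for (int c : month) {
                sum += c;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] counts = tally(100, new Random());
        System.out.println("People tallied: " + total(counts));
    }
}
```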
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 2


Extension 3: GC Content (2 points):

Computer science plays a large role in modern biology. BioJava is a framework for working with biological data. It provides a framework for using sequences and performing common biology functions. In this extension you will be reading in a DNA sequence and computing its GC-Content.

Notes


Procedure

  1. Complete the method percentGC in GCContent.
    Note: BioJava already has a function to get the GC-count (number of G and C nucleotides) of a sequence.

    Do not use this function.

    Instead, devise a way to iterate through the array of characters (representing nucleotides) yourself.

  2. Run the unit test GCContentTest and make sure it passes.
    Most students' code fails when given a string with no DNA in it whatsoever (an empty string, ""). Its GC content is 0%.
  3. Run GCContent as a Java Application and be prepared to answer questions by the TA at the demo.
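The manual iteration described in step 1 might look like the sketch below. It assumes that the sequence arrives as a char array and that the result is a percentage between 0 and 100; the actual signature of percentGC in GCContent may differ, and the class name GCSketch is hypothetical. No BioJava calls are used.

```java
class GCSketch {
    // Returns the percentage (0 to 100) of bases that are G or C.
    // An empty sequence is defined to have 0% GC content, which also
    // avoids dividing by zero.
    static double percentGC(char[] dna) {
        if (dna.length == 0) {
            return 0.0;
        }
        int gc = 0;
        for (char base : dna) {
            char b = Character.toUpperCase(base); // accept lowercase input too
            if (b == 'G' || b == 'C') {
                gc++;
            }
        }
        return 100.0 * gc / dna.length;
    }

    public static void main(String[] args) {
        System.out.println(percentGC("ATGC".toCharArray()));
    }
}
```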
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 3


Extension 4: Reading Frame of a Sequence (10 points):

Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence: one starts with the first nucleotide in the sequence, one starts with the second, and one starts with the third, as shown below:

Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation

How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.

Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.

Notes

Procedure

  1. The code given to you prompts the user for a genome DNA file, and then reads that file into a char array using methods provided by BioJava.

    If you are uncertain about char arrays, this would be a good time to review the relevant material in your text and in the lecture notes.

  2. Your work takes place in the method bestReadingFrame. Open that file now and take a look at what is provided.

    There are comments in that file to direct your work, which consists of the steps described below.

    You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
  3. First, define 3 char arrays, one for each of the possible stop codons: ochre, amber, and opal.
  4. Next, define a char array named methionine for the start codon.
    The most common start codon codes for the amino acid methionine. It is written AUG in RNA, which corresponds to ATG in a DNA sequence. For the purposes of this extension, the methionine codon is the only start codon you will consider.

    More information about start and stop codons can be found here.

  5. The rest of the code you write will attempt to read the DNA in each of the possible 3 reading frames. Find the best reading frame and return the index at which that best reading frame occurs. Thus, your method will return a value as follows:
    Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets.

    Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.
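The hint above can be sketched as follows. This is one possible reading of the instructions, not the reference solution. It assumes the sequence uses uppercase DNA letters, so the start codon is ATG and the stop codons are TAA (ochre), TAG (amber), and TGA (opal). It also assumes that a candidate counts only if a stop codon is actually found, and that 0 is returned when no coding sequence exists. The class name and the matches helper are hypothetical.

```java
class ReadingFrameSketch {
    // True if the three bases starting at index i equal the given codon.
    static boolean matches(char[] dna, int i, char[] codon) {
        return i + 3 <= dna.length
                && dna[i] == codon[0]
                && dna[i + 1] == codon[1]
                && dna[i + 2] == codon[2];
    }

    static int bestReadingFrame(char[] dna) {
        char[] ochre = {'T', 'A', 'A'};
        char[] amber = {'T', 'A', 'G'};
        char[] opal = {'T', 'G', 'A'};
        char[] methionine = {'A', 'T', 'G'};

        int bestStart = 0;  // index where the longest coding sequence begins
        int bestLength = 0; // its length in bases, excluding the stop codon
        for (int start = 0; start + 3 <= dna.length; start++) {
            if (!matches(dna, start, methionine)) {
                continue;
            }
            // Scan triplets after the start codon until the next stop codon.
            for (int i = start + 3; i + 3 <= dna.length; i += 3) {
                if (matches(dna, i, ochre) || matches(dna, i, amber) || matches(dna, i, opal)) {
                    int length = i - start;
                    if (length > bestLength) {
                        bestLength = length;
                        bestStart = start;
                    }
                    break;
                }
            }
        }
        return bestStart % 3; // reading frame 0, 1, or 2
    }

    public static void main(String[] args) {
        System.out.println(bestReadingFrame("CATGAAATAA".toCharArray()));
    }
}
```

The nested scan is quadratic in the worst case, which keeps the sketch short; a faster version could track each of the three frames in a single pass.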

When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 4


Extension 5: Data Sorting and Analysis (0 points):

A big part of Computer Science revolves around finding ways to sort data so that it is organized in a useful manner. There are many different algorithms that computer scientists use to sort information. You can learn about a few popular ones below:
For the purposes of this assignment, we will implement our sorting using a naive process of repeatedly selecting the smallest value from a data set of type int[] and swapping it to the front of the unsorted portion of the collection. In computer science, this is called a selection sort.
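A minimal sketch of a selection sort on an int[] is shown here for orientation. The class and method names are hypothetical and are not taken from the provided Sorting class.

```java
import java.util.Arrays;

class SelectionSortSketch {
    // Sorts data in place, in ascending order.
    static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            // Find the index of the smallest value in data[i..end].
            int min = i;
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[min]) {
                    min = j;
                }
            }
            // Swap it to the front of the unsorted portion.
            int tmp = data[i];
            data[i] = data[min];
            data[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {4, 2, 5, 1, 3};
        selectionSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```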

Procedure

  1. Open the Sorting class, found in the arrays package of the extensions folder.
  2. Prompt the user for the size of the collection.
    If the user enters a negative number, continue to prompt them with a useful message until they enter a positive number.
  3. Continue to prompt the user to input numbers one at a time, using an array to store the input data.
  4. After processing all of the numbers entered, the data will be sorted using the following naive algorithm:
  5. After sorting the data, we will take advantage of its useful organization to compute the following statistics:
    Mean: the simple average of the data.
    Median: the middle value in the ordered dataset. (Note: how does this computation vary for even-sized and odd-sized datasets?)
    Min: the smallest value in the ordered dataset.
    Max: the largest value in the ordered dataset.
    Range: the difference between the maximum and minimum values of the dataset.
  6. Finally, arrange for your output to display the sorted dataset and accompanying statistics in the following manner:
    1 2 3 4 5
    Mean: 3.0
    Median: 3.0
    Min: 1
    Max: 5
    Range: 4
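Once the array is sorted, the statistics in step 5 reduce to a few index lookups. The sketch below assumes a non-empty array that is already in ascending order; the class name StatsSketch and its helper names are hypothetical.

```java
class StatsSketch {
    static double mean(int[] sorted) {
        long sum = 0;
        for (int v : sorted) {
            sum += v;
        }
        return (double) sum / sorted.length;
    }

    // Odd length: the single middle value.
    // Even length: the average of the two middle values.
    static double median(int[] sorted) {
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];
        }
        return (sorted[n / 2 - 1] + (double) sorted[n / 2]) / 2.0;
    }

    // Builds the report in the layout shown in step 6.
    static String report(int[] sorted) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(sorted[i]);
        }
        int min = sorted[0];                 // smallest value comes first
        int max = sorted[sorted.length - 1]; // largest value comes last
        sb.append('\n');
        sb.append("Mean: ").append(mean(sorted)).append('\n');
        sb.append("Median: ").append(median(sorted)).append('\n');
        sb.append("Min: ").append(min).append('\n');
        sb.append("Max: ").append(max).append('\n');
        sb.append("Range: ").append(max - min).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(new int[] {1, 2, 3, 4, 5}));
    }
}
```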
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 5


Extension 6: Blackjack! (0 points):

In this assignment, we will create the game blackjack. To simplify the code, assume that each player is dealt one card at a time and aces are always worth 11.

The following tasks will test your knowledge of

Task 1

  1. Prompt the user for the number of players
  2. Create a new array with size of the number of players
  3. Create a loop that iterates through each player

Task 2

Now we have our array, whose size is the number of players, and our loop. We will store the score, or count, of each player in the array. The loop follows the game mechanics of Blackjack that are explained in the wiki. Now we must think of how to randomly draw a card.

Remember that cards 2-9 will count as their number
      ex. 2 diamonds = 2
          7 hearts = 7
10-King all count as 10
      ex. queen spades = 10
Aces count as 11
We have 3 different scenarios, each with a different probability of occurring. How do we perform a random operation in Java? (See a TA if you need help.)
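One way to draw a random card is sketched below, assuming each of the 13 ranks is equally likely and the suit is ignored. Under that assumption a number card (2 to 9) comes up 8 times in 13, a ten-valued card (10, jack, queen, king) 4 times in 13, and an ace 1 time in 13. The class name and the rank encoding (1 for ace, 11 to 13 for jack, queen, king) are hypothetical choices for this illustration.

```java
import java.util.Random;

class CardDrawSketch {
    // Maps a rank (1 = ace, 2..10 = number cards, 11..13 = jack, queen, king)
    // to its Blackjack value under the simplified rules of this extension.
    static int cardValue(int rank) {
        if (rank == 1) {
            return 11; // aces are always worth 11
        }
        if (rank >= 10) {
            return 10; // 10, jack, queen, and king all count as 10
        }
        return rank;   // 2 through 9 count as their number
    }

    // Draws one card uniformly at random and returns its value.
    static int drawCard(Random rng) {
        int rank = rng.nextInt(13) + 1; // 1..13
        return cardValue(rank);
    }

    public static void main(String[] args) {
        System.out.println(drawCard(new Random()));
    }
}
```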

  1. Create the game mechanics inside the loop
  2. Store each player’s data in the array from task 1
  3. Verify that the array stores each player’s count correctly
  4. Verify that the outcomes have realistic values
    Ex. It is very unlikely that every player would draw an ace (have a count of 11).
  5. We recommend checking with a TA before moving on.

Task 3

In Blackjack, each player is allowed to hit, that is, to draw an extra card. Create this mechanic in the game mechanic loop. Hint: we recommend using a while loop for this task.

Task 4

If a player’s count exceeds 21, that hand is known as a bust.

  1. Which players bust?
  2. Which player wins the round? (player closest to 21 without going over)
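The two questions in Task 4 can be answered with one pass over the array of counts. The sketch below assumes that ties go to the earlier player and that -1 signals a round in which every player busts; both choices, along with the class and method names, are hypothetical.

```java
class BlackjackRoundSketch {
    // A hand is a bust when its count exceeds 21.
    static boolean isBust(int count) {
        return count > 21;
    }

    // Returns the index of the player closest to 21 without going over,
    // or -1 if every player busts.
    static int winner(int[] counts) {
        int best = -1;
        for (int p = 0; p < counts.length; p++) {
            if (!isBust(counts[p]) && (best == -1 || counts[p] > counts[best])) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] counts = {18, 23, 20};
        for (int p = 0; p < counts.length; p++) {
            if (isBust(counts[p])) {
                System.out.println("Player " + p + " busts");
            }
        }
        System.out.println("Winner: player " + winner(counts));
    }
}
```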
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 6


Extension 100: Pascal's Triangle (C++) (5 points):

Issues: This extension is not available for any credit.
Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
        column
        0  1  2  3  4
row
0       1
1       1  1
2       1  2  1
3       1  3  3  1
4       1  4  6  4  1
        .
        .
        .
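Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c: if c is 0 or c equals r, the entry is 1; otherwise the entry is the sum of the entries at row r-1, column c-1 and at row r-1, column c.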

Procedure

  1. Create the pascals source file, in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the value N which is the number of rows you should compute of the triangle.
  3. Instantiate the two-dimensional array needed to hold the results.
    Unlike Java, where each row can be of a different length, a built-in two-dimensional array in C++ cannot be ragged. You will have to find a way to create filler values that are ignored when the triangle is printed.

    It is simpler to instantiate the array with the same number of columns in each row.

  4. Compute the triangle as a two-dimensional array and print the results left-justified as shown above.
  5. For extra fun, try to print the triangle centered as shown on the Wikipedia page.
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 100


Extension 200: Birthday Problem (C++) (5 points):

Issues: This extension is not available for any credit.
Some number of people, say N, wander into a room. We are interested in computing the fraction of those people who are:
Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.

Procedure

To make things simple, let's assume that all months have 31 days.
  1. Create the Birthday source file, in the arrays package of the extensions folder.
  2. Insert code to obtain from the user the number of people N that will enter the room.
  3. For each person, randomly generate a month and day on which that person was born.
  4. Using a two-dimensional array that is indexed by month and by day, keep track of the number of people born on that month and day.
    For example, perhaps you randomly decide the person was born in month 4 and day 28.
  5. After processing all of the N people, iterate over your array to compute:
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 200


Extension 300: GC Content (C++) (4 points):

Issues: This extension is not available for any credit.
Computer science plays a large role in modern biology. In this extension you will be reading in a DNA sequence and computing its GC-Content.

Notes


Procedure

  1. Complete the method in gcContent.cpp.
  2. Write a test to prove to us that your methods work.
  3. Run these tests.
  4. Be prepared to answer questions by the TA at the demo.
When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 300


Extension 400: Reading Frame of a Sequence (C++) (8 points):

Issues: This extension is not available for any credit.
Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence: one starts with the first nucleotide in the sequence, one starts with the second, and one starts with the third, as shown below:

Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation

How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.

Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.

Notes

Procedure

  1. The code given to you prompts the user for a genome DNA file, and then reads that file into a std::string.

    If you are uncertain about strings in C++, this would be a good time to review them; they are very similar to strings in Java. Open that file now and take a look at what is provided.

    There are comments in that file to direct your work, which consists of the steps described below.

    You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
  2. First, define 3 char arrays, one for each of the possible stop codons: ochre, amber, and opal.
  3. Next, define a char array named methionine for the start codon.
    The most common start codon codes for the amino acid methionine. It is written AUG in RNA, which corresponds to ATG in a DNA sequence. For the purposes of this extension, the methionine codon is the only start codon you will consider.

    More information about start and stop codons can be found here.

  4. The rest of the code you write will attempt to read the DNA in each of the possible 3 reading frames. Find the best reading frame and return the index at which that best reading frame occurs. Thus, your method will return a value as follows:
    Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets.

    Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.

When you are done with this extension, you must be cleared by the TA to receive credit.

End of extension 400