        column
        0  1  2  3  4
row 0   1
    1   1  1
    2   1  2  1
    3   1  3  3  1
    4   1  4  6  4  1
    .
    .
    .
Viewed as a two-dimensional array, the computation is specified
as follows, for each entry at row r and column c:
Java lets you do this as a ragged array, where each row can be of a different length. You are welcome to instantiate the array that way, but it is equally acceptable to instantiate the array with the same number of columns in each row, and the Java syntax for that is much simpler. An example of this appears in SelfAvoidingWalk, found in the book.ch1 package of the book source folder in your repositories.
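As a rough sketch of the rectangular-array approach, the code below assumes the usual Pascal's Triangle rule: the entry at row r and column c is 1 when c is 0 or c equals r, and otherwise it is the sum of the entries at (r-1, c-1) and (r-1, c). The class and method names (PascalTriangle, build) are made up for illustration and are not part of the provided source.

```java
class PascalTriangle {
    // Builds the first n rows in an n-by-n rectangular array.
    // Entries with c > r are never assigned, so they keep Java's default value 0.
    static int[][] build(int n) {
        int[][] t = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c <= r; c++) {
                if (c == 0 || c == r) {
                    t[r][c] = 1;                              // edges of the triangle
                } else {
                    t[r][c] = t[r - 1][c - 1] + t[r - 1][c];  // sum of the two entries above
                }
            }
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] t = build(5);
        for (int r = 0; r < t.length; r++) {
            StringBuilder line = new StringBuilder();
            // Print only columns 0..r so the unused entries are skipped.
            for (int c = 0; c <= r; c++) {
                if (c > 0) line.append(' ');
                line.append(t[r][c]);
            }
            System.out.println(line);
        }
    }
}
```

Because the inner loops stop at column r, the unused upper-right entries never need special handling.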
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.1
A big part of Computer Science revolves around finding ways to organize and analyze statistical data. This often includes writing algorithms to sort the information in a useful manner. There are many different algorithms that computer scientists use to sort information. You can learn about a few popular ones below:
Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.
To make things simple, let's assume that all months have 31 days.
For example, perhaps you randomly decide the person was born in month 4 and day 28.
Be careful! A 1 entry means just one person was born on that particular day. You are looking for the fraction of people who share a birthday with at least one other person.
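One possible shape for this simulation is sketched below. It assumes a 12-by-31 array of counts indexed from 0 (so the text's "month 4 and day 28" would be stored at [3][27]), and the names BirthdaySketch, simulate, and fractionSharing are hypothetical. People on a day with a count of 1 are not counted; people on a day with a count of 2 or more all are.

```java
import java.util.Random;

class BirthdaySketch {
    // Assigns each of n people a random month and day and counts births per day.
    // Every month is given 31 days, as the text assumes.
    static int[][] simulate(int n, Random rng) {
        int[][] counts = new int[12][31];
        for (int i = 0; i < n; i++) {
            int month = rng.nextInt(12); // 0..11
            int day = rng.nextInt(31);   // 0..30
            counts[month][day]++;
        }
        return counts;
    }

    // Fraction of the n people who share a birthday with at least one other person.
    static double fractionSharing(int[][] counts, int n) {
        if (n == 0) return 0.0;
        int sharing = 0;
        for (int[] month : counts) {
            for (int c : month) {
                if (c >= 2) sharing += c; // everyone born on a shared day counts
            }
        }
        return (double) sharing / n;
    }
}
```

A call such as `fractionSharing(simulate(50, new Random()), 50)` gives one trial; averaging many trials gives a steadier estimate.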
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.2
Locate the following files in the biojavagc package:
- GCContent.java is the file you complete. When you run it, analysis is performed on four genomes and the results are printed as output.
- GCContentTest.java is a unit test for your work. You will receive credit for this extension only if this test passes.
Consider a bacterium and a phage that might be hosted by that bacterium. It turns out that the DNA of the host and of the phage often need to be similar in terms of their GC content for the bacterium to play host to the phage:
Most bacteriophage and other bacteria are lower in GC content than Salmonella and its relatives, so invading DNA is an obvious target for H-NS. "It's like a primitive immune system," says Fang. "Reduce their expression, and the foreign genes can be tolerated." In other words, if the GC content in the bacterium and phage is dissimilar, the bacterium may be immune to infection by the phage.
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064161/
Note: BioJava already has a function to get the GC-count (number of G and C nucleotides) of a sequence. Do not use this function.
Instead, devise a way to iterate through the array of characters (representing nucleotides) yourself.
Most students' code fails when given a string with no DNA in it whatsoever (an empty string, or ""). Its GC content is 0%.
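The iteration and the empty-string case might look roughly like the sketch below. The method name gcPercent and its signature are assumptions made for illustration and need not match what GCContent.java or its unit test expects; treating lowercase letters the same as uppercase is also an assumption.

```java
class GCSketch {
    // Returns the percentage (0 to 100) of nucleotides that are G or C.
    // An empty array is defined to have 0% GC content, which avoids dividing by zero.
    static double gcPercent(char[] nucleotides) {
        if (nucleotides.length == 0) {
            return 0.0;
        }
        int gc = 0;
        for (char n : nucleotides) {
            char u = Character.toUpperCase(n); // assumption: accept lowercase input too
            if (u == 'G' || u == 'C') {
                gc++;
            }
        }
        return 100.0 * gc / nucleotides.length;
    }
}
```

Checking the length first is what keeps the empty string from producing NaN through a 0/0 division.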
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.3
How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation
- The sequences shown above are RNA. The corresponding DNA sequences would have a T everywhere you see a U.
- Reading frame 1 begins at the first nucleotide. Interpreting AGT using the genetic code, Serine would be generated as the first amino acid of the resulting protein, if this is the proper reading frame.
- However, if the reading frame should start one base later, then the first DNA triple is GTC, which encodes Valine.
- Finally, if the reading frame should start two bases later, then the first DNA triple is TCT, which encodes Serine. While this protein begins with the same amino acid as in reading frame 1, the next amino acid is different, as shown in the above figure.
- There is no other reading frame of interest. If we considered a reading frame that begins at the fourth nucleotide, that is the same as if the frame began at the first nucleotide.
- Important: In the picture above, the reading frames are denoted 1, 2, or 3. Keeping with computer science custom, you must compute these as 0, 1, or 2. The unit test will not work if you fail to follow this convention!
Note that it is the first stop codon that ends the translation. While the translated region could contain other start codons, which translate as the amino acid methionine, any stop codon encountered after the start codon ends the translation process.
For example, we expect the start codon to occur with probability 1 in 64.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.
Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.
Locate the following files in the biofindframe package:
- FindTheFrame.java is the file you complete.
- FindTheFrameTest.java is a unit test for your work. You will receive credit for this extension if this test (usually) passes.
If you are uncertain about char arrays, this would be a good time to review the relevant material in your text and in the lecture notes.
There are comments in that file to direct your work, which consists of the steps described below.
You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
The most common start codon is ATG in DNA (AUG in RNA), which encodes Methionine. For the purposes of this extension, ATG is the only start codon you will consider. More information about start and stop codons can be found here.
Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets. Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.
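The hint can be sketched as follows. This is only one reading of the instructions, under several assumptions: the DNA is an uppercase char array, the start codon is ATG, the stop codons are the standard TAA, TAG, and TGA, a candidate region counts only if a stop codon is actually found, and 0 is returned when no coding sequence exists. The names FrameSketch and bestFrame are hypothetical and are not taken from FindTheFrame.java.

```java
class FrameSketch {
    // True if the triplet starting at index i is the start codon ATG.
    static boolean isStart(char[] dna, int i) {
        return dna[i] == 'A' && dna[i + 1] == 'T' && dna[i + 2] == 'G';
    }

    // True if the triplet starting at index i is TAA, TAG, or TGA.
    static boolean isStop(char[] dna, int i) {
        if (dna[i] != 'T') return false;
        return (dna[i + 1] == 'A' && (dna[i + 2] == 'A' || dna[i + 2] == 'G'))
            || (dna[i + 1] == 'G' && dna[i + 2] == 'A');
    }

    // Returns the reading frame (0, 1, or 2) of the longest coding sequence.
    static int bestFrame(char[] dna) {
        int bestLength = 0;
        int bestStart = 0;
        for (int start = 0; start + 2 < dna.length; start++) {
            if (!isStart(dna, start)) continue;
            // Scan triplets after the start codon until the first stop codon.
            for (int i = start + 3; i + 2 < dna.length; i += 3) {
                if (isStop(dna, i)) {
                    int length = i - start; // bases translated before the stop
                    if (length > bestLength) {
                        bestLength = length;
                        bestStart = start;
                    }
                    break; // the first stop codon ends translation
                }
            }
        }
        return bestStart % 3; // index modulo 3 gives frame 0, 1, or 2
    }
}
```

Taking the start index modulo 3 is what maps a position in the sequence onto the 0, 1, or 2 convention described above.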
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.4
For the purposes of this assignment, we will implement our sorting using a naive process of repeatedly selecting the smallest value from a data set of type int[], and swapping it to the front of the collection. In computer science, this is called a Selection Sort.
If the user enters a negative number, continue to prompt them with a useful message until they enter a positive number.
The unsorted portion will always be the sub-array with indexes sortCount to size - 1 inclusive.
1 2 3 4 5
Mean: 3.0
Median: 3.0
Min: 1
Max: 5
Range: 4
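A sketch of how a selection sort and the statistics shown above could fit together is given below. It assumes a non-empty int[]; the input values in main are made up, and the class and method names are illustrative rather than taken from the assignment. The variable sortCount marks the start of the unsorted portion, which runs through the last index of the array.

```java
class SelectionSortSketch {
    // Sorts data in place by repeatedly swapping the smallest unsorted value
    // to the front of the unsorted portion (indexes sortCount..data.length - 1).
    static void selectionSort(int[] data) {
        for (int sortCount = 0; sortCount < data.length - 1; sortCount++) {
            int smallest = sortCount;
            for (int i = sortCount + 1; i < data.length; i++) {
                if (data[i] < data[smallest]) smallest = i;
            }
            int tmp = data[sortCount];
            data[sortCount] = data[smallest];
            data[smallest] = tmp;
        }
    }

    static double mean(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return (double) sum / data.length;
    }

    // Expects sorted input: the middle value, or the average of the two middle values.
    static double median(int[] sorted) {
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) return sorted[mid];
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        int[] data = {4, 1, 5, 3, 2}; // sample input, chosen for illustration
        selectionSort(data);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) line.append(' ');
            line.append(data[i]);
        }
        System.out.println(line);
        System.out.println("Mean: " + mean(data));
        System.out.println("Median: " + median(data));
        // After sorting, the minimum is first and the maximum is last.
        System.out.println("Min: " + data[0]);
        System.out.println("Max: " + data[data.length - 1]);
        System.out.println("Range: " + (data[data.length - 1] - data[0]));
    }
}
```

Sorting first makes the median, minimum, maximum, and range simple index lookups.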
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.5
The following tasks will test your knowledge of
Task 1
Task 2
Now we have our array of size players and our loop. We will store the score or count of each player in the array. The loop follows the game mechanics of Blackjack that are explained in the wiki. Now we must think of how to randomly draw a card.
We have 3 different scenarios that all have a different probability of occurring. How do we perform a random operation in Java? (See a TA if you need help) Remember that:
- Cards 2-9 count as their number
  ex. 2 diamonds = 2
  7 hearts = 7
- 10-King all count as 10
  ex. queen spades = 10
- Aces count as 11
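One way to draw a random card is sketched below using java.util.Random. The rank encoding (1 for Ace, 2 to 10 for the numbered cards, 11 to 13 for Jack, Queen, and King) and the names CardSketch, cardValue, and drawCard are assumptions made for illustration. Choosing one of 13 ranks uniformly yields the three scenarios with different probabilities: 8 in 13 for cards 2-9, 4 in 13 for 10 through King, and 1 in 13 for an Ace.

```java
import java.util.Random;

class CardSketch {
    // Maps a rank to its Blackjack value under the rules listed above.
    // Hypothetical encoding: 1 = Ace, 2..10 = numbered cards, 11..13 = Jack, Queen, King.
    static int cardValue(int rank) {
        if (rank == 1) return 11;   // Aces count as 11
        if (rank >= 10) return 10;  // 10 through King all count as 10
        return rank;                // cards 2-9 count as their number
    }

    // Draws a rank uniformly from 1..13 and returns its value.
    static int drawCard(Random rng) {
        int rank = rng.nextInt(13) + 1;
        return cardValue(rank);
    }
}
```

Suits are ignored here because they do not affect a card's value.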
Task 3
In Blackjack, each player is allowed to hit, that is, to draw an extra card. Create this mechanic in the game mechanic loop. Hint: A while loop is recommended for this task.
Task 4
If a player’s count exceeds 21, that hand is known as a bust.
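The hit mechanic and the bust check might be combined roughly as follows. The text does not say when a player stops hitting, so the standAt threshold is purely an assumed policy, and the names HitSketch, playHand, and isBust are hypothetical. Cards are valued as described earlier: Aces 11, 10 through King 10, others their number.

```java
import java.util.Random;

class HitSketch {
    // Keeps hitting (drawing a card) while the count is below standAt.
    // standAt is an assumed stopping rule, not something the assignment specifies.
    static int playHand(int count, int standAt, Random rng) {
        while (count < standAt) {
            int rank = rng.nextInt(13) + 1;                   // 1 = Ace .. 13 = King
            int value = (rank == 1) ? 11 : Math.min(rank, 10);
            count += value;
        }
        return count;
    }

    // A hand is a bust when its count exceeds 21.
    static boolean isBust(int count) {
        return count > 21;
    }
}
```

A while loop suits this task because the number of hits is not known in advance.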
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.6
Issues: This extension is not available for any credit. Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
        column
        0  1  2  3  4
row 0   1
    1   1  1
    2   1  2  1
    3   1  3  3  1
    4   1  4  6  4  1
    .
    .
    .
Viewed as a two-dimensional array, the computation is specified
as follows, for each entry at row r and column c:
Unlike Java, C++ will not let you do this as a ragged array, where each row can be of a different length. It is simpler to instantiate the array with the same number of columns in each row, so you will have to find a way to create filler values that are ignored when the triangle is printed.
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.100
Issues: This extension is not available for any credit. Some number of people, say N, wander into a room. We are interested in computing the fraction of those people who are:
Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.
To make things simple, let's assume that all months have 31 days.
For example, perhaps you randomly decide the person was born in month 4 and day 28.
Be careful! A 1 entry means just one person was born on that particular day. You are looking for the fraction of people who share a birthday with at least one other person.
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.200
Issues: This extension is not available for any credit. Computer science plays a large role in modern biology. In this extension you will be reading in a DNA sequence and computing its GC-Content.
Locate the following files in the fastaReader directory:
- fastaReader.h This header, combined with the source file, has been provided for you in order to allow you to use the .fasta files.
- fastaReader.cpp This file has only one function, readFasta, which takes a filename and a string; the file is read into the string and returned to you for use.
- cgContent.cpp This is where your extension code and testing goes.
Consider a bacterium and a phage that might be hosted by that bacterium. It turns out that the DNA of the host and of the phage often need to be similar in terms of their GC content for the bacterium to play host to the phage:
Most bacteriophage and other bacteria are lower in GC content than Salmonella and its relatives, so invading DNA is an obvious target for H-NS. "It's like a primitive immune system," says Fang. "Reduce their expression, and the foreign genes can be tolerated." In other words, if the GC content in the bacterium and phage is dissimilar, the bacterium may be immune to infection by the phage.
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064161/
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.300
Issues: This extension is not available for any credit. Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence. One starts with the first nucleotide in the sequence, one starts with the second, and one starts with the third, as shown below:
How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation
- The sequences shown above are RNA. The corresponding DNA sequences would have a T everywhere you see a U.
- Reading frame 1 begins at the first nucleotide. Interpreting AGT using the genetic code, Serine would be generated as the first amino acid of the resulting protein, if this is the proper reading frame.
- However, if the reading frame should start one base later, then the first DNA triple is GTC, which encodes Valine.
- Finally, if the reading frame should start two bases later, then the first DNA triple is TCT, which encodes Serine. While this protein begins with the same amino acid as in reading frame 1, the next amino acid is different, as shown in the above figure.
- There is no other reading frame of interest. If we considered a reading frame that begins at the fourth nucleotide, that is the same as if the frame began at the first nucleotide.
- Important: In the picture above, the reading frames are denoted 1, 2, or 3. Keeping with computer science custom, you must compute these as 0, 1, or 2. The unit test will not work if you fail to follow this convention!
Note that it is the first stop codon that ends the translation. While the translated region could contain other start codons, which translate as the amino acid methionine, any stop codon encountered after the start codon ends the translation process.
For example, we expect the start codon to occur with probability 1 in 64.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.
Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.
Locate the following files in the biofindframe folder:
- FindTheFrame.cpp is the file you complete.
- fastaReader.cpp and fastaReader.h are files for reading in the provided fasta genome files.
If you are uncertain about strings in C++, this would be a good time to review them; they are very similar to strings in Java. Open that file now and take a look at what is provided.
There are comments in that file to direct your work, which consists of the steps described below.
You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
The most common start codon is ATG in DNA (AUG in RNA), which encodes Methionine. For the purposes of this extension, ATG is the only start codon you will consider. More information about start and stop codons can be found here.
Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets. Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for extension 3.400