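         column
         0  1  2  3  4
row  0   1
     1   1  1
     2   1  2  1
     3   1  3  3  1
     4   1  4  6  4  1
     . . .
Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c:
- If c is 0 or c equals r, the entry is 1.
- If c is strictly between 0 and r, the entry is the sum of the two entries above it: the one at row r-1, column c-1 and the one at row r-1, column c.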
Java lets you represent this as a ragged array, in which each row can have a different length. You are welcome to instantiate the array that way, but it is equally acceptable, and the Java syntax is much simpler, to instantiate the array with the same number of columns in every row. An example of this appears in SelfAvoidingWalk in your repositories.
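To illustrate the two allocation styles, here is a minimal sketch assuming the standard Pascal's-triangle rule (entries at either end of a row are 1, and each interior entry is the sum of the two entries above it). The class and method names (PascalSketch, pascal, pascalRagged) are hypothetical and not taken from the lab's starter code.

```java
class PascalSketch {

    // Rectangular n-by-n array: every row has n columns; cells with c > r are left at 0.
    static int[][] pascal(int n) {
        int[][] t = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c <= r; c++) {
                if (c == 0 || c == r) {
                    t[r][c] = 1;                              // edges of the triangle
                } else {
                    t[r][c] = t[r - 1][c - 1] + t[r - 1][c];  // sum of the two entries above
                }
            }
        }
        return t;
    }

    // Ragged array: row r is allocated with exactly r + 1 entries.
    static int[][] pascalRagged(int n) {
        int[][] t = new int[n][];
        for (int r = 0; r < n; r++) {
            t[r] = new int[r + 1];
            for (int c = 0; c <= r; c++) {
                t[r][c] = (c == 0 || c == r) ? 1 : t[r - 1][c - 1] + t[r - 1][c];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] t = pascal(5);
        for (int r = 0; r < t.length; r++) {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c <= r; c++) {
                line.append(t[r][c]).append(' ');
            }
            System.out.println(line.toString().trim());  // prints "1", "1 1", ..., "1 4 6 4 1"
        }
    }
}
```

Both versions compute the same values; the rectangular one just wastes the cells above the diagonal. Note that int entries overflow starting at row 34, so long would be needed for larger triangles.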
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.
To make things simple, let's assume that all months have 31 days.
For example, perhaps you randomly decide the person was born in month 4 and day 28.
Be careful! A 1 entry means just one person was born on that particular day. You are looking for the fraction of people who share a birthday with at least one other person.
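One possible shape for the simulation is sketched below, assuming 12 months of 31 days each (the simplification above) and 0-based indices, so "month 4 and day 28" would land in counts[3][27] if months and days are numbered from 1. The names (BirthdaySketch, tally, sharedFraction, expectedSharedFraction) are hypothetical. To help with your hypothesis, it also computes a rough expected value: if all 372 days are equally likely, a given person shares a birthday with none of the other n - 1 people with probability (371/372)^(n-1), so by linearity of expectation the expected fraction of people who share a birthday is 1 - (371/372)^(n-1).

```java
import java.util.Random;

class BirthdaySketch {
    static final int MONTHS = 12;
    static final int DAYS = 31;                       // assumption from the text: every month has 31 days
    static final int DAYS_PER_YEAR = MONTHS * DAYS;   // 372 possible birthdays

    // Tally how many of n randomly generated people were born on each (month, day).
    static int[][] tally(int n, Random rng) {
        int[][] counts = new int[MONTHS][DAYS];
        for (int i = 0; i < n; i++) {
            int month = rng.nextInt(MONTHS);  // 0..11
            int day = rng.nextInt(DAYS);      // 0..30
            counts[month][day]++;
        }
        return counts;
    }

    // Fraction of people who share their birthday with at least one other person.
    static double sharedFraction(int[][] counts) {
        int total = 0;
        int shared = 0;
        for (int[] month : counts) {
            for (int c : month) {
                total += c;
                if (c >= 2) {
                    shared += c;  // everyone born that day has a twin; a count of 1 adds nothing
                }
            }
        }
        return total == 0 ? 0.0 : (double) shared / total;
    }

    // Expected fraction under the equally-likely-days assumption: 1 - (371/372)^(n-1).
    static double expectedSharedFraction(int n) {
        if (n <= 1) {
            return 0.0;
        }
        return 1.0 - Math.pow((DAYS_PER_YEAR - 1.0) / DAYS_PER_YEAR, n - 1);
    }

    public static void main(String[] args) {
        Random rng = new Random(131);  // arbitrary seed so runs are repeatable
        int n = 50;                    // hypothetical group size
        System.out.println("simulated: " + sharedFraction(tally(n, rng)));
        System.out.println("expected:  " + expectedSharedFraction(n));
    }
}
```

Keeping the tally and the fraction in separate methods makes the counting logic easy to check by hand with a small, hand-filled counts array.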
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
Locate the following files in the biojavagc package:
- GCContent.java is the file you complete. When you run it, analysis is performed on four genomes and the results are printed as output.
- GCContentTest.java is a unit test for your work. You will receive credit for this extension only if this test passes.
Consider a bacterium and a phage that might be hosted by that bacterium. It turns out that the host's DNA and the phage's DNA often need to have similar GC content for the bacterium to play host to the phage:
Most bacteriophage and other bacteria are lower in GC content than Salmonella and its relatives, so invading DNA is an obvious target for H-NS. "It's like a primitive immune system," says Fang. "Reduce their expression, and the foreign genes can be tolerated."
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064161/
In other words, if the GC content in the bacterium and phage is dissimilar, the bacterium may be immune to infection by the phage.
Note: BioJava already has a function to get the GC count (the number of G and C nucleotides) of a sequence. Do not use this function.
Instead, devise a way to iterate through the array of characters (representing nucleotides) yourself.
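A minimal sketch of that iteration is shown below. It assumes the nucleotides arrive as a char[] and treats lowercase letters the same as uppercase; the names GcSketch, gcCount, and gcContent are hypothetical and do not correspond to methods in GCContent.java. It divides by the full array length, so any ambiguity codes such as N in a real genome would count toward the denominator.

```java
class GcSketch {

    // Count G and C nucleotides by walking the char array directly (no BioJava helper).
    static int gcCount(char[] dna) {
        int count = 0;
        for (char base : dna) {
            char b = Character.toUpperCase(base);
            if (b == 'G' || b == 'C') {
                count++;
            }
        }
        return count;
    }

    // GC content as a fraction of the sequence length; 0.0 for an empty sequence.
    static double gcContent(char[] dna) {
        return dna.length == 0 ? 0.0 : (double) gcCount(dna) / dna.length;
    }

    public static void main(String[] args) {
        char[] host = "GGCCAATT".toCharArray();   // made-up sequences for illustration
        char[] phage = "ATATATGC".toCharArray();
        System.out.println(gcContent(host));   // prints 0.5
        System.out.println(gcContent(phage));  // prints 0.25
    }
}
```

Returning a fraction rather than a raw count makes genomes of very different lengths directly comparable, which is what a host/phage comparison needs.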
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation
- The sequences shown above are RNA. The corresponding DNA sequences would have a T everywhere you see a U.
- Reading frame 1 begins at the first nucleotide. Interpreting AGT using the genetic code yields Serine as the first amino acid of the resulting protein, if this is the proper reading frame.
- However, if the reading frame should start one base later, then the first DNA triplet is GTC, which encodes Valine.
- Finally, if the reading frame should start two bases later, then the first DNA triplet is TCT, which encodes Serine. While this protein begins with the same amino acid as in reading frame 1, the next amino acid is different, as shown in the above figure.
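- There is no other reading frame of interest. A reading frame that begins at the fourth nucleotide reads exactly the same triplets as the frame that begins at the first nucleotide, just without the first one. In general, a frame that begins at offset i is the same as the frame that begins at offset i mod 3.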
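The three frames can be made concrete with a small sketch that splits a DNA string into whole codons starting at a given offset (0, 1, or 2). The input "AGTCT" is just the first five bases implied by the example above (AGT, GTC, TCT), written as DNA; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ReadingFrameSketch {

    // Split dna into whole codons starting at offset; a trailing partial codon is dropped.
    static List<String> codons(String dna, int offset) {
        List<String> result = new ArrayList<>();
        for (int i = offset; i + 3 <= dna.length(); i += 3) {
            result.add(dna.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        String dna = "AGTCT";
        for (int frame = 0; frame < 3; frame++) {
            System.out.println(frame + ": " + codons(dna, frame));
        }
        // prints "0: [AGT]", "1: [GTC]", "2: [TCT]"
    }
}
```

Calling codons with offset 3 returns the offset-0 list minus its first codon, which is why only three frames need to be considered.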
For example, we expect the start codon to occur with probability 1 in 64.
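The arithmetic behind this kind of estimate can be worked out under a simplifying assumption that the text does not state: each base is independent and equally likely to be A, C, G, or T. There are 4 x 4 x 4 = 64 codons; ATG is the start codon, and TAA, TAG, and TGA are the three stop codons of the standard genetic code. A run of k consecutive codons with no stop codon then has probability

$$
P(\text{start}) = \frac{1}{64}, \qquad
P(\text{stop}) = \frac{3}{64}, \qquad
P(\text{no stop in } k \text{ codons}) = \left(\frac{61}{64}\right)^{k}
$$

For k = 100 this is less than 1%, which is one way to see why a long stretch free of stop codons is unlikely to occur by chance in the wrong frame.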
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids.
Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.
Locate the following files in the biofindframe package:
- FindTheFrame.java is the file you complete.
- FindTheFrameTest.java is a unit test for your work. As of this writing, this test is not yet available.
If you are uncertain about char arrays, this would be a good time to review the relevant material in your text and in the lecture notes.
There are comments in FindTheFrame.java to direct your work, which consists of the steps described below.
You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
The most common start codon is AUG (ATG in DNA), which encodes Methionine. For the purposes of this extension, ATG is the only start codon you will consider. More information about start and stop codons can be found here.
Hint: Scan the DNA one base at a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon found by scanning forward in triplets. Based on the index at which the longest coding sequence begins, compute the corresponding reading frame (0, 1, or 2), which is that index modulo 3. That is the value you want to return.
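The hint can be sketched as follows. This is a hypothetical illustration, not the structure FindTheFrame.java asks for: the names are invented, the DNA is assumed to be uppercase, the stop codons are taken to be the standard TAA, TAG, and TGA, a start codon with no in-frame stop codon after it is simply ignored, ties keep the earliest sequence, and -1 is an invented convention for "no coding sequence found".

```java
class FrameFinderSketch {

    // True if dna[i..i+2] spells the given three-letter codon.
    static boolean codonAt(char[] dna, int i, String codon) {
        return i + 2 < dna.length
                && dna[i] == codon.charAt(0)
                && dna[i + 1] == codon.charAt(1)
                && dna[i + 2] == codon.charAt(2);
    }

    // Standard-code DNA stop codons.
    static boolean isStop(char[] dna, int i) {
        return codonAt(dna, i, "TAA") || codonAt(dna, i, "TAG") || codonAt(dna, i, "TGA");
    }

    // Reading frame (0, 1, or 2) of the longest ATG...stop sequence, or -1 if there is none.
    static int bestFrame(char[] dna) {
        int bestLength = -1;
        int bestStart = -1;
        for (int i = 0; i + 2 < dna.length; i++) {        // scan one base at a time
            if (!codonAt(dna, i, "ATG")) {
                continue;
            }
            for (int j = i + 3; j + 2 < dna.length; j += 3) {  // walk forward in triplets
                if (isStop(dna, j)) {
                    int length = j - i;                    // bases before the stop codon
                    if (length > bestLength) {
                        bestLength = length;
                        bestStart = i;
                    }
                    break;
                }
            }
        }
        return bestStart < 0 ? -1 : bestStart % 3;
    }

    public static void main(String[] args) {
        char[] dna = "CATGAAATAGGATGCCCGGGTTTTAA".toCharArray();
        System.out.println(bestFrame(dna));  // prints 2 (longest run starts at index 11)
    }
}
```

In the example, ATG at index 1 reaches a stop after 6 bases, while ATG at index 11 reaches one after 12, so the longer sequence determines the frame: 11 mod 3 = 2.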
When you are done with this extension, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches