Warning: This studio is too long to be finished in the studio time you have today. Please get as far as you can, but going sufficiently slowly that all group members understand what is happening.Authors:Then, on your own, after studio is over, please continue working on the studio. This material is covered on Exam I, so be sure you get to the end of studio work on your own, if necessary, and feel free to work collaboratively on all elements of this studio.
We will propagate code as you direct us in the green box, and the propagation will be done the day after studio.
In this assignment, you study various ways to formulate a bigger array to replace one that has no room left. The point of this work is to understand that the choices you make matter greatly in terms of performance.
What you should learn through this assignment:
- You can use our course infrastructure to measure ticks and time.
Also, because computers are so fast, the time of a relatively brief computation may register as 0, even though the number of ticks is positive.
- Ticks are a reliable way to measure the number of operations in a computation, but you must insert instrumentation (Java code) to count ticks into our code.
- Time is a more realistic measurement, but it varies between computers.
- Allocation of a n integers (or boolean, objects, etc.) does not take constant time. It takes time proportional to n.
In studio this semester, you will be supplying answers to questions posed here. Such questions appear in boxes like this one.You are in a group, but only one of you needs to open and complete the text file. When you demo, you can specify whose repository contains the write up, and we will propagate code and write ups to the other group members.
Ideally, the write up is on your shared monitor if you have one.
So, one of you, please find and open the studio0.txt file in the studiowriteups folder.
The most widely available timer that we can use measures time in milliseconds, or one thousandth of a second. If your computer's clock runs at 3 GHz (billions of cycles per second), then in one millisecond, your computer can execute 3 million instructions. Thus, as much as we would like to measure the time of relatively short programs, unless that short program does at least 3 million things, it won't even take a measurable amount of time.
Let's investigate this.
public void run() { for (int i=0; i < n; ++i) { ticker.tick(); this.value = this.value + i; } }
The particular values of n that are used in the run() method above were generated in the main method by:GenSizes sizes = GenSizes.arithmetic(10000, 100000, 10000);which generates an arithmetic sequence for n, starting at 10,000, up to but not including 100,000, in steps of 10,000. The result is then used by:ExecuteAlgorithm.timeAlgorithm( "linear", "timing.examples.Linear", new IntArrayGenerator(), sizes );which runs the Linear class on the specified sizes, with values for n of 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, and 90000.Throughout this course, and in the text as well, you will see n as the variable we use to describe the size of an input. It's generally a better practice to name a variable such that it corresponds more closely to how it is used. So, while size would have been a better variable name, we use n so that you become accustomed to thinking of the size of an input as n.
For example, we will consider in this course:
- Sorting n numbers
- Adding to the end of a linked list that has n items in it
- Finding the shortest path from one intersection to another in a map that has n street segments
- Operations on a set containing n elements
To see what's there, you can right (control) click on the outputs folder and select Refresh.
- Highlight all data including labels
- Go to the Insert tab on the Excel toolbar
- In the Charts header, click on Scatter Plot and select Scatter with Straight Lines and Markers
- The legend may flip the axes by default, in which the legend names will appear as Series1-5
- To resolve this go to the Chart Tools tab and select Switch Row/Column
The plot depicts the running time of Linear as a function of the input paramter n. Not surprisingly, the plot shows a linear relationship between the input parameter and the number of times ticker.tick() was called.
this.value = this.value + i;n times is indeed a linear function of n.
which implies that the cost of executing the sum and assignment to this.value takes 1 tick (or, constant time)
You should see values at or near 0, because the cost of running this program is so low, it cannot be measured in milliseconds.
Modify the line we saw earlier so that much larger values are generated for n:GenSizes sizes = GenSizes.arithmetic(1000000, 10000000, 1000000);so that the sizes that are used are sufficiently large to generate positive times.If you are unsure of what each parameter means, mouse over the arithmetic method name in eclipse and the JavaDoc will be shown to you.
You may notice in the console output that the Linear program is run several times with the same input value. Each time we run for a given size (value of n), the ticker count should be the same. However, because of other things running on your computer, the time may vary.The code we provide for you runs your run() method several times, and then takes the fastest time found among those runs as the best indicator of the time taken by your program.
Let's try this same thing with a different toy algorithm:
Question A1: What do you see in both plots? Are there any differences between the two? What could account for those differences?Discuss this with your TA.
Question A2: Why do the times provided for Quadratic produce such a nice plot, while the original values of Linear did not?
public void run() { for (int i=0; i < n; ++i) { for (int j=0; j < n; ++j) { ticker.tick(); this.value = this.value + i; } } }
public void run() { ticker.tick(); // i = 0 for (int i=0; i < n; ++i) { ticker.tick(); // i < n ticker.tick(); // j = 0 for (int j=0; j < n; ++j) { ticker.tick(); // j < n // // original one below // ticker.tick(); // add two values this.value = this.value + i; ticker.tick(); // ++j } ticker.tick(); // ++i } }
Question A3: From the runs you have tried so far, how does the placement of ticker.tick() calls affect the plots you see? In particular, do the changes affect the shapes of the curves, the values plotted, or both?
Question A4: In terms of n, how would you characterize, in the most simple terms, the time and ticks curves that have been generated so far?Question A5: What would happen if you deleted all ticker.tick() calls in the innermost loop, while leaving other calls to ticker.tick() that you just placed in run()?
Discuss in your team and with your TA:Throughout this course, you will be asked to place ticker.tick() calls in your code. Placing them everywhere seems like the right thing to do, so that every operation is counted, as shown in the box above.
You could put them everywhere, and the results will be right, but isn't that tedious? Did your added ticker.tick() statements affect the shape of the resulting curves that show ticks or time?
How do you decide where to put them most parsimoniously?
We will soon speak of this as analyzing an algorithm's asymptotic complexity:
- From a practical point of view, it's the part of your algorithm whose time dominates any curve you could generate that reflects the time or ticks taken by your algorithm.
- We will say that constant differences in time really don't matter: they could attributed, for example, to the difference in speed between one computer and another.
- We will care about the shape of the curves that correspond to ticks and time, and we will reason about those curves in terms of their asympototic behavior.
For now, you have to look at the code and reason about where it spends most of its time, as a function of n.
For the run() method we have been considering, most of the time must be spent in the innermost loop, where we have:
this.value = this.value + i;That is why placing just one ticker.tick() there creates operation counts that adequately model the time spent by the entire run() method.
When we look at a line of Java code above, for example:
this.value = this.value + i;we want to reason about how much time it takes to execute that line of code. Our model so far, in terms of ticker.tick() is that it takes one tick, or operation, for such a line of code to execute.
But is this true for all statements? Surely not. For example, the single line of code:
Arrays.sort(array);would sort an array of values, which surely takes more than one operation. In fact, for an array of n values, it would take at least n operations to show the results of the sorted array.
Let's investigate this for allocating an array of integers:
Question B1: What do you see? How do the curves reflect the code inside AddsTwoNumbers?Do you think the value of n matters here in terms of the time it takes to perform the operation? Why or why not?
Question B2: What do the data and plot tell you about the time it takes to allocate an array of n integers?Is it reasonable to say that the line of code
this.array = new int[this.n]takes a constant amount of time, independent of the value of this.n?
Question B3: Do the ticks agree in shape with the time we measured in running the Allocates code?
ticker.tick(n);The above call causes the ticker to register n ticks instead of just 1.
Question B4: Are the plots more similar to each other than before? What does this tell you about how much time it takes to allocate an array of n integers?
So we see (hopefully) that a line of code doesn't always take a constant amount of time. While it was fair to judgethis.value = this.value + i;as taking one tick, it is not fair to say thatthis.array = new int[this.n]takes one tick. Instead we must insist it takes n ticks.Why does the time depend on n? Java must allocate a chunk of memory for the n integers. While that might take constant time, Java must also initialize that chunk of storage to 0. That cannot take constant time: that cost must depend on n.
Throughout this course, you are asked to reason about the time of a computation. Eventually, you must reason about how much time a given operation or statement takes, in terms of the computation's input size n.
You will be producing timing curves, and we will be looking at your code to ensure you have accounted properly for how time is spent.
How you achieve a particular result—how you implement an algorithm—can have a profound impact on the time necessary to obtain that result.
Question B5: Which group do you expect to finish first?Can you formalize, in terms of n the amount of work (ticks) that each group must do to write n in the form required for that group?
Both groups achieve the same result, namely the recording of an integer n.
- Under what circumstances is the decimal notation more efficient than tally marks?
- Under what circumstances is the tally mark notation more efficient than decimal notation? If you are not sure, take a look this.
Let's apply what we have learned to growing an array-based list.
You will see that this is an abstract class, because the method int getNewSize() is missing. This is intentional, because we want to experiment with various ways of replacing the full array, and we do that by specifying the array's new size.We provide two extensions of Rarrays for you:
Each of those classes completes the Rarrays abstract class by providing the missing method int getNewSize().
- Doubling
- causes the new array to be twice the old array's size. This may seem wasteful: a full 1024 cell array would grow to contain 2048 cells just to accommodate, at present, one more cell.
- AddOne
- causes the new array to be one item greater than the previous array's size. This is the least we can do to make room for one new element in an already full array.
Take a look.
public void reset(Ticker ticker) { this.ticker = ticker; this.array = new int[2]; ticker.tick(2); }
For example, if for some reason two elements are insufficent for the array, the following would allow the instance variable to reference an 8-element array:However, reassignment of the array loses the reference to the previous array. Any data in that previous array would be lost.this.array = new int[8];
To preserve information as you provision for a larger array, you must
Your first task is to complete this method until it passes the TestRarrays unit tests.
The TestRarrays contains 3 test cases:
- testInit should work as given to you.
- testGrowPreservesData ensures that you retain information from the old array as you make the newer, larger one.
- testGrowSufficientTicks ensures that you account properly for the ticks taken for allocating and filling the newer, larger array.
Work on this until you get the green bar, indicating all tests are passing.
OK we really don't grow the array, but we continually replace it with a larger one. Let's study how the growth rate of the arrays affects the overall cost of Rarrays.
Question D1: How would you describe the curve you see?As a team, think about possible polynomial functions that could generate such a curve.
- Question D2 Why does your program generate the data you see?
- While the data has a certain disconnected quality to it, let's analyze the plotted data as follows:
- Using any straight edge you can find nearby, see how tightly the straight edge can bound all the points from below.
- Then see if that straight edge can also tightly bound all the points from above.
- Question D3: Describe what you were able to do with the straight edge.
- If you are successful, you see that the points generated by Doubling, while somewhat strange-looking,
- make sense, because the jumps are due to the time spent doubling
- are boundable below by one linear function
- are boundable above by one linear function
You may not be able to complete this part in studio. Take a look at it on your own if necessary.
Now let's look mathematically at how much time is spent growing the list by our two methods: add one, and doubling.
How much time does this take? We copy 1, then 2, then 3, and so on, until we have copied n-1 items to make room for the n^{th} item. If we count the time for allocation, we can extend the sum to n. If T(n) represents the total time taken to achieve an array of n elements by growing one at at time, then
Question E1: What is the closed form solution for T(n)?
Search the web for summation formulas if you need help. You are expected to know such sums in this course.Does it agree with what you have seen empirically?
Question E2: What is the closed-form solution for T(n)?Recalling that n=2^{k}, can you express the result in terms of n?
Based on your analysis and what you have seen, what effect does the growing strategy have on the time necessary to accommodate the array-based list that grew to n items?
Returning to your code, you will see an OurGrowth1 and OurGrowth2 class. Use those to experiment with these ideas and see what results you get.
When you done with this studio, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so
This demo box is for studio 0