CSE 247 Module 0: Introduction

Studio


Racing Arrays

Authors:
Abstract: Arrays are a fundamental and useful data type. Once an array is allocated, it cannot change size. Nonetheless, we can build lists and other data structures using arrays if we are willing to replace an old, full array with a bigger one.

In this assignment, you study various ways to formulate a bigger array to replace one that has no room left. The point of this work is to understand that the choices you make matter greatly in terms of performance.

What you should learn through this assignment:

Throughout this assignment you will be supplying answers to questions posed here. Such questions appear in boxes like this one.

You are in a group, but only one of you needs to open and complete the text file. When you demo, you can specify whose repository contains the write up, and we will propagate code and write ups to the other group members.

Ideally, the write up is on your shared monitor if you have one.

So, one of you, please find and open the studio0.txt file in the studiowriteups folder.

A. Time and Ticks

In this course, we are interested in the time taken to execute an algorithm. We will analyze algorithms mathematically and empirically, but how do we measure time? We will be using two measurements:
Time
It seems most natural to measure time using time. However, on modern computers, most computations are so fast that their time is difficult to measure with any accuracy. But a prof can dream, so we will try to measure time.
Ticks
We can count operations in your program, using a counter that you advance as appropriate in your code. This assignment investigates how you can do that. You will be using this technique throughout this course.
Your computer is busy doing many things. When you run a program on it, that program competes with other programs for your computer's attention.

The most widely available timer that we can use measures time in milliseconds, or one thousandth of a second. If your computer's clock runs at 3 GHz (billions of cycles per second), then in one millisecond, your computer can execute 3 million instructions. Thus, as much as we would like to measure the time of relatively short programs, unless that short program does at least 3 million things, it won't even take a measurable amount of time.

Let's investigate this.

You may notice in the console output that the Linear program is run several times with the same input value. Each time we run for a given size (value of n), the ticker count should be the same. However, because of other things running on your computer, the time may vary.

The code we provide for you runs your run() method several times, and then takes the fastest time found among those runs as the best indicator of the time taken by your program.

Let's try this same thing with a different toy algorithm:

Question A1: What do you see in both plots? Are there any differences between the two? What could account for those differences?

Discuss this with your TA.

Question A2: Why do the times provided for Quadratic produce such a nice plot, while the original values of Linear did not?

  • Look at the run() method of Quadratic, and notice that ticker.tick() is called in just one spot:
    	public void run() {
    		for (int i=0; i < n; ++i) {
    			for (int j=0; j < n; ++j) {
    				ticker.tick();
    				this.value = this.value + i;
    			}
    		}	
    	}
    
  • Each ticker.tick() call models the cost of one operation. In the extreme, you might have arrived at the code shown below, with a ticker.tick() call for each operation performed in your code, as shown in the comments:
    	public void run() {
    		ticker.tick();                      // i = 0
    		for (int i=0; i < n; ++i) {
    			ticker.tick();                  // i < n
    			ticker.tick();                      // j = 0
    			for (int j=0; j < n; ++j) {
    				ticker.tick();                  // j < n
                                    //
                                    // original one below
                                    //
    				ticker.tick();                     // add two values
    				this.value = this.value + i;
    				ticker.tick();                  // ++j
    			}
    			ticker.tick();                  // ++i
    		}	
    	}
    
  • Experiment now by different placements of ticker.tick() throughout the run() method, and run Quadratic to see how the ticks and time change.
    Question A3: From the runs you have tried so far, how does the placement of ticker.tick() calls affect the plots you see? In particular, do the changes affect the shapes of the curves, the values plotted, or both?

    B. Some statements take constant time, some don't

    When we look at a line of Java code above, for example:

       this.value = this.value + i;
    
    we want to reason about how much time it takes to execute that line of code. Our model so far, in terms of ticker.tick() is that it takes one tick, or operation, for such a line of code to execute.

    But is this true for all statements? Surely not. For example, the single line of code:

       Arrays.sort(array);
    
    would sort an array of values, which surely takes more than one operation. In fact, for an array of n values, it would take at least n operations to show the results of the sorted array.

    Let's investigate this for allocating an array of integers:

    Question B1: What do you see? How do the curves reflect the code inside AddsTwoNumbers?

    Do you think the value of n matters here in terms of the time it takes to perform the operation? Why or why not?

  • Now find, open, and run the Allocates class, which generates ticks and time data for allocating an array of n integers, with n varying from 0 to 10000.
  • Refresh the outputs folder and then open the allocates-time.csv file. Plot the runtime against the value of n.
    Question B2: What do the data and plot tell you about the time it takes to allocate an array of n integers?

    Is it reasonable to say that the line of code

         this.array = new int[this.n]
    
    takes a constant amount of time, independent of the value of this.n?
  • Now open and plot the data in allocates-ticks.csv.
    Question B3: Do the ticks agree in shape with the time we measured in running the Allocates code?
  • There is another call you can make on a ticker which is
       ticker.tick(n);
    
    The above call causes the ticker to register n ticks instead of just 1.
  • Modify the call to ticker in Allocates so that it registers this.n ticks for the new int[this.n] allocation.
  • Rerun Allocates and refresh the outputs folder.
  • Generate plots for allocates-ticks.csv and allocates-time.csv.
    Question B4: Are the plots more similar to each other than before? What does this tell you about how much time it takes to allocate an array of n integers?
    So we see (hopefully) that a line of code doesn't always take a constant amount of time. While it was fair to judge
       this.value = this.value + i;
    
    as taking one tick, it is not fair to say that
         this.array = new int[this.n]
    
    takes one tick. Instead we must insist it takes n ticks.

    Why does the time depend on n? Java must allocate a chunk of memory for the n integers. While that might take constant time, Java must also initialize that chunk of storage to 0. That cannot take constant time: that cost must depend on n.

    Throughout this course, you are asked to reason about the time of a computation. Eventually, you must reason about how much time a given operation or statement takes, in terms of the computation's input size n.

    You will be producing timing curves, and we will be looking at your code to ensure you have accounted properly for how time is spent.

  • Perhaps the most important message of this course is the following:
    How you achieve a particular result—how you implement an algorithm—can have a profound impact on the time necessary to obtain that result.
  • To continue with this idea, perform the following exercise in your group:
    Question B5: Which group do you expect to finish first?

    Can you formalize, in terms of n the amount of work (ticks) that each group must do to write n in the form required for that group?

    Both groups achieve the same result, namely the recording of an integer n.


    C. Growing a List

    Let's apply what we have learned to growing an array-based list.


    D. How much does it cost to grow the array?

    OK we really don't grow the array, but we continually replace it with a larger one. Let's study how the growth rate of the arrays affects the overall cost of Rarrays.

    You may not be able to complete this part in studio. Take a look at it on your own if necessary.

    E. Mathematical analysis

    Now let's look mathematically at how much time is spent growing the list by our two methods: add one, and doubling.

    AddOne
    Here, we reach a total of n cells in the array, growing by one cell each time, and copying the old array into the slightly larger new array.

    How much time does this take? We copy 1, then 2, then 3, and so on, until we have copied n-1 items to make room for the nth item. If we count the time for allocation, we can extend the sum to n. If T(n) represents the total time taken to achieve an array of n elements by growing one at at time, then

    T(n) = 1 + 2 + … n = i = 1 n i
    Question E1: What is the closed form solution for T(n)?
    Search the web for summation formulas if you need help. You are expected to know such sums in this course.
    Does it agree with what you have seen empirically?
    Doubling
    Here, we reach a total of n cells. Let's assume n=2k for some k, so that we have performed work:
    T(n) = 1 + 2 + 4 + … 2k = i = 0 k 2 i
    Question E2: What is the closed-form solution for T(n)?

    Recalling that n=2k, can you express the result in terms of n?

    Based on your analysis and what you have seen, what effect does the growing strategy have on the time necessary to accommodate the array-based list that grew to n items?

    Further exploration

    Based on what you have seen emprically and shown mathematically, what do you think would happen if you tried the following strategies for increasing the size of the current array as it fills:

    Returning to your code, you will see an OurGrowth1 and OurGrowth2 class. Use those to experiment with these ideas and see what results you get.


    Submitting your work (read carefully)



    Last modified 14:54:18 CDT 31 October 2016
    When you done with this studio, you must be cleared by the TA to receive credit.

    This demo box is for studio 0
    Last name WUSTL Key Propagate?
    (NOT your numeric ID) Do not propagate
    e.g. Smith j.smith
    1 Copy from 1 to all others
    2 Copy from 2 to all others
    3 Copy from 3 to all others
    4 Copy from 4 to all others

    TA: Password: