CSE 132 (Spring 2015)
Lab 3a: Introduction to HaWUp
This lab is new, see below.
Warning: You need Java 8 installed for this lab to work. The Urbauer computers don't seem to have Java 8 so please do this with somebody who has Java 8.

This is a new lab, which has the following implications:

Thanks for your patience!

Introduction

There are many cloud and other distributed or cluster-based services available today that allow a set of tasks to be run concurrently on many processor cores, possibly hosted by multiple, distinct computers. Our Enginering School has such a cluster, and many of us could not do our research expeditiously or reliably without it.

A more global and famous example of such a facility is Hadoop. From its beginnings, this service aimed to provide service that could scale from a single server to thousands of nodes. The service anticipates failure, in which case it can restart a computation on another node.

The most common programming paradigm for services such as Hadoop is the MapReduce paradigm:

As an example, consider summing a million numbers:
Map
The million numbers are distributed among n nodes, so that each node receives approximately 1,000,000/n numbers to be summed.
Reduce
When a node has finished its work, it has a partial sum that must be added to all of the other nodes' sums to obtain the final answer. The reduce step therefore consists of obtaining and summing the nodes' partial sums. This can be done naively or cleverly.
In this portion of the course, you will implement a form of such an infrastructure. The work will be limited to a single machine, but tasks can be mapped across multiple threads. The pedagogical goals of this lab are related to the following questions:

Design

Instead of providing a design for you, we shall develop the design together during lecture. There are a few principles we want to observe:

User Stories

One way to elicit information necessary to create a software project is to ask for a user story. User stories are one tenet of the agile software development process. In such settings, the user resides with the development group and can tell such stories on demand. Following is an example of a user story for this project:
I have workstations that have many cores, each capable of operating independently. Some of my cores are hyperthreaded, allowing even more concurrent execution.

Abstractly, I think of the workstations has having a certain nubmer of nodes, each of which can execute some task independently. I would like to make use of these nodes in running some MapReduce computations. I would like to submit a job, have the job decomposed into independent tasks, and then have those tasks distributed across the nodes of my machine. When the tasks are done, their partial results will be turned into a result for the job by some code I will supply.

This user will have more to say about what he or she wants in the upcoming installments of this lab. For now, we must implement only what has been described in the story. From that story, some unit tests will be written (my job, perhaps yours as well) and you will develop the code to meet those tests. Tempted though you may be, you will not at this point introduce any superfluous features, from the point of view of the user story. This parsimonious development style is also a tenet of the agile software development process.

The Objects and Interfaces

The details of these objects and interfaces will be decided in class, but it seems we need at least the following types for our design, based on the user story (see underlined words above).
HaWUp
is the main class for this project.
Node
is a resource that can execute a sequence of tasks, one after the other, when its run() method is invoked. A Node must execute independently, and so it must be .start()ed in its own Thread.

You have some work to do inside Node but the instantiation and .start()ing of Nodes is done in HaWUp's constructor.

Task<T>
is the smallest unit of execution that is defined for our system. A Task has a run() method that choreographs its execution. The actual work is performed by the abstract taskWork() method.

Each Task has a waitForResult() method that must block until the Task has finished. There are comments in the source file to direct your activity.

PartialResult
is the type of answer computed by each Task. A PartialResult can return its value and can also combine its result with another PartialResult.
Job<T>
is the work to be performed on HaWUp. This object is already programed for you. Each Job has an array of Task<T>s and an array of Nodes. The run() method of a Job distributes the tasks on the nodes, which causes them to begin execution. Its waitForAllTasks waits for tasks to complete, combines their results, and returns the result when everything is done.

Your work for this lab

As described in class, using Java's built-in wait() statement can make code look messy because of the checked exception you must expect.

As in class, you have Wrappers.wait(o) and Wrappers.notify(o) which handle the exception for you, leaving the rest of your code looking nicer. There is also Wrappers.sleep(ms), and all of these are found in the hawup.utils package.

  1. Update your repository to obtain the code for this lab, which will appear in the labs source folder in a package called hawup.

    The unit tests and other tests are found in the hawup.testing package.

    Different computers can operate at different speeds. To show the benefit of hawup, it is helpful to have an example that takes a known amount of time on one Node.

    You will next experiment to determine the appropriate repeat factor so that a computation takes between 45 and 60 seconds.

  2. Open and run the TestSumTakesTime unit test.
    It should fail, unless you are using a computer from the 1950's. But if you were doing that, your computer would be the size of the Urbauer building, and it probably does not have a web browser, so you wouldn't be reading this. Enough said.
    Your task here is to modify the repeatFactor value until the test takes between 45 and 60 seconds. The unit test will check for this.
    Once you have determined the proper value, set the repeatFactor static variable to that value in the SumLoToHi class.
  3. Open and run the TestTimedRunnable. It will fail, and you need to fix it as described in the comments of TimedRunnable's getTime() method, which you will find in the hawup.utils package.
    Using what you learned in class:
    • You must Wrappers.wait(this) in the getTime() method until this.end != null.
    • You must also call Wrappers.notifyAll(this) just after any point in the code where this.end could become non-null.
  4. A similar problem exists in HaWUp, so open and run HaWUpTester. It should fail until you make the appropriate fixes in HaWUp as directed by the comments there.
Once you have finished your work here, and your unit tests are passing, demo your work to a TA.

Demo

When you done with this studio, you must be cleared by the TA to receive credit.


Last name WUSTL Key Propagate?
(or your numeric ID) Do not propagate
e.g. Smith j.smith
1 Copy from 1 to all others
2 Copy from 2 to all others

TA: Make sure they commit their code before you sign them out!

TA: Password:




Last modified 14:57:07 CST 16 February 2015 by Ron K. Cytron