CSE 247 Module 8: Hashcodes

Studio


Hashing
Authors: Yonatan David and Ron Cytron

Lecture slides can be found here.
Abstract: An Object's hashCode() function is an organizational method that creates unique values to represent that object. As we will see soon in this class, hashCode() is very useful in compartmentalizing objects.
What you should learn through this assignment:


Object equality and hashcode values

Let's look a little deeper at the first rule of hashCode. Begin by updating your repositories and opening the studio8 package in your studios source folder.

We will begin by exploring the object Color, which has three integer values for its red, green and blue intensities. After running the ColorAndPoint class, which outputs a file studio8p1.csv in your outputs folder (you will have to refresh, as usual) with the hashCodes of 1000 random Color and Point objects.

We would like to obtain a histogram of the Color hash values to see how they are distributed. Here are some instructions for trying to obtain a histogram, but if your version of Excel is uncooperative, then just generate a scatter plot of the values and see if they are distributed uniformly.

Let's try to get histograms working.

If the above frustrates you, just highlight the column of Color Values and Insert an XY Scatter plot.

You might obtain something like the following:
Color Object Graph

Notice the wide distribution of hashCode values for Color. You hopefully see something similar in the plot you have produced.

In the studio8.txt file of the studiowriteups folder, answer the following questions, and all questions that are this kind of a box:

What distribution do you see plotted (in the histogram or an XY Scatter plot) for Color hashcodes?

If the distribution were not uniform, what would the plot look like?

What about correctness? As you saw in lecture, if two objects equal each other, they should have the same hashcode. That allows the objects to be tracked correctly in a HashSet implementation.

How many objects (Colors) are added to the set?

How many objects are contained in the set after all objects are added?

Why are the above two numbers different?


By now you hopefully understand that a proper hashCode will give two objects that .equal() each other the same hash value.

Designing your own hash function for an object

You will next design your own hashCode for three different objects.

Open the Pancake class in eclipse.

You should notice the following about the Pancake class:

  1. A Pancake has an int representing its radius.
  2. A Pancake has a boolean representing if it was made with wheat flour.
  3. The .equals() method is given.

Discuss with your partner what aspects of the equals() method could help in your design of a Pancake's hashCode(). How can you use the two characteristics of a Pancake to design the most unique HashCode?
As a team, try out some ideas for Pancake's hashCode() implementation. Record your best idea so far in the studio8.txt file.

How would you analyze the following implementation?

	public int hashCode(){		
		int hash = radius;		
		if (wheat){				
			hash = hash + 5;	
		}						
		return hash;			
	}							
	
Record answers to the following in the studio8.txt file:

Your group should now settle on an implementation and write that into the Pancake class.

and paste it into the write up file too please

To see how well hashCode() performs, run the HashCodeRunner class, which outputs studio8p2.csv that records various objects' hashCode() values.

As you have done before, analyze the values in the spreadsheet that are in column B for the Pancake objects.

Do your hashCode() values for Pancake appear to be uniformly distributed?


Now continue with the Syrup class:

  1. A Syrup has an String representing its Brand.
  2. A Syrup has a double representing its thickness in density.
  3. The .equals() method is given.

Brainstorm a good hash function for Syrup, and deploy it in the Syrup class.
and paste it into the write up file please
Rerun HashCodeRunner, refresh outputs and plot the data you see in column C, which are the Syrup hashCode() values.
How uniformly distributed are your Syrup hashCode() values?
Finally, take a look at column D. You do not need to rerun anything because this column's values depend on your hashCode() implementations for Pancake and Syrup.
How uniformly distributed are these values?


If you have time, please use it to:

Further Exploration

You can imagine that if hashCode() is known, that an adversary could create objects that all have the same hashCode(), thus ruining the beautiful asymptotic behavior of a hash table where the simple uniform hashing assumption holds.

In class we talked about hash functions that are hard to reverse, such as SHA256.

Investigate using SHA256 as a hash function for the objects you have worked on today.


Submitting your work (read carefully)



Last modified 14:54:21 CDT 31 October 2016
When you done with this studio, you must be cleared by the TA to receive credit.

This demo box is for studio 8
Last name WUSTL Key Propagate?
(NOT your numeric ID) Do not propagate
e.g. Smith j.smith
1 Copy from 1 to all others
2 Copy from 2 to all others
3 Copy from 3 to all others
4 Copy from 4 to all others

TA: Password: