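Lecture slides can be found here.
Abstract: An object's hashCode() method computes an integer value that represents that object. Equal objects must share the same value, and distinct objects should receive different values as often as possible, although uniqueness cannot be guaranteed. As we will see soon in this class, hashCode() is very useful in compartmentalizing objects.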
What you should learn through this assignment:
- If object1.equals(object2), then object1 and object2 MUST have the same hashCode.
- Except as prescribed by the above point, objects' hashCodes should be as different as possible.
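The two goals above can be illustrated with a minimal sketch. The Badge class below is hypothetical and not part of the studio code; it simply builds its hashCode() from the same fields that its equals() compares, using the standard java.util.Objects.hash helper.

```java
import java.util.Objects;

class ContractDemo {
    // Hypothetical class for illustration only; it is not part of the studio code.
    static final class Badge {
        private final String owner;
        private final int level;

        Badge(String owner, int level) {
            this.owner = owner;
            this.level = level;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Badge)) return false;
            Badge that = (Badge) other;
            return level == that.level && owner.equals(that.owner);
        }

        // Built from exactly the fields that equals() compares, so two
        // equal badges always produce the same hash code.
        @Override
        public int hashCode() {
            return Objects.hash(owner, level);
        }
    }

    public static void main(String[] args) {
        Badge a = new Badge("ada", 3);
        Badge b = new Badge("ada", 3);
        Badge c = new Badge("ada", 4);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true, as the first rule requires
        System.out.println(a.hashCode() == c.hashCode()); // false here; desirable, but not mandatory
    }
}
```

Note that the second goal is only a quality target: two unequal objects are allowed to collide, they just should not do so often.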
We will begin by exploring the object Color, which has three integer values for its red, green and blue intensities. Run the ColorAndPoint class, which outputs a file studio8p1.csv in your outputs folder (you will have to refresh, as usual) with the hashCodes of 1000 random Color and Point objects.
- We are using Java's standard Color object, which already has its hashCode() and equals() methods defined.
- Those are defined both correctly and with the goal of getting a uniform distribution of hash values for random Color objects.
- The Point object is in the studio8 package, so we can control how its hash values and equality are defined. We will do so.
- But let's first study the Color object.
We would like to obtain a histogram of the Color hash values to see how they are distributed. Here are some instructions for trying to obtain a histogram, but if your version of Excel is uncooperative, then just generate a scatter plot of the values and see if they are distributed uniformly.
Let's try to get histograms working. You might obtain something like the following:
- Excel doesn't just come with histograms already working. That would be too easy and convenient.
- So you have to install the Analysis ToolPak. On the Urbauer computers, the following steps might work. On your computer, give this a try, search for help in Excel, but if it gets too gnarly, just produce a scatter plot of the first column's values.
- From the File tab, open Options.
- Click Add-Ins.
- Choose the Analysis ToolPak and click Go.
- Choose the Analysis ToolPak and follow through whatever comes your way.
- When back to the main Excel page, click on the Data tab.
- If you are successful, you may see a Data Analysis tab now. That's good.
- When you click on that you get many choices in a scroll menu, and you should pick Histogram.
- On the Histogram dialog that pops up, enter A2:A1001 for the Input Range. Click Chart Output, and then OK.
- The histogram should show in a new sheet of your Excel workspace.
If the above frustrates you, just highlight the column of Color Values and Insert an XY Scatter plot.
Notice the wide distribution of hashCode values for Color. You hopefully see something similar in the plot you have produced.
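If Excel is uncooperative, a rough histogram can also be computed in code. The following is a minimal sketch, not part of the studio: the class name HashHistogram and its histogram method are made up for illustration, and the data are stand-in pseudo-random ints rather than the actual values in studio8p1.csv.

```java
import java.util.Random;

class HashHistogram {
    // Counts how many values fall into each of `buckets` equal-width bins
    // spanning the smallest and largest observed value.
    static int[] histogram(int[] values, int buckets) {
        int[] counts = new int[buckets];
        if (values.length == 0) return counts;
        long min = values[0], max = values[0];
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Bin width = ceil((max - min + 1) / buckets), computed in long to avoid overflow.
        long width = (max - min + buckets) / buckets;
        for (int v : values) {
            counts[(int) ((v - min) / width)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in data: 1000 pseudo-random ints. In the studio you would use
        // the hash values from studio8p1.csv instead.
        Random random = new Random(8);
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) values[i] = random.nextInt();

        int[] counts = histogram(values, 10);
        for (int i = 0; i < counts.length; i++) {
            StringBuilder bar = new StringBuilder();
            for (int j = 0; j < counts[i] / 5; j++) bar.append('#');
            System.out.println("bin " + i + ": " + bar + " (" + counts[i] + ")");
        }
    }
}
```

For uniformly distributed values the bars should come out at roughly the same length; a few very long bars next to empty bins would suggest clustering.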
In the studio8.txt file of the studiowriteups folder, answer the following questions, and all questions that are in this kind of a box:
What distribution do you see plotted (in the histogram or an XY Scatter plot) for Color hashCodes?
If the distribution were not uniform, what would the plot look like?
What about correctness? As you saw in lecture, if two objects equal each other, they must have the same hashCode. That allows the objects to be tracked correctly in a HashSet implementation.
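As a minimal sketch of what tracking objects correctly means, the example below adds several objects to a java.util.HashSet. The Shade class is a hypothetical stand-in with three intensities; it is not Java's Color and makes no claim about how Color is implemented.

```java
import java.util.HashSet;
import java.util.Set;

class SetTrackingDemo {
    // Hypothetical stand-in for a color with red, green and blue intensities.
    static final class Shade {
        final int red, green, blue;

        Shade(int red, int green, int blue) {
            this.red = red;
            this.green = green;
            this.blue = blue;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Shade)) return false;
            Shade that = (Shade) other;
            return red == that.red && green == that.green && blue == that.blue;
        }

        // Depends only on the fields equals() compares.
        @Override
        public int hashCode() {
            return 31 * (31 * red + green) + blue;
        }
    }

    // Adds every shade to a HashSet and reports how many the set ends up holding.
    static int distinctCount(Shade[] shades) {
        Set<Shade> set = new HashSet<>();
        for (Shade s : shades) {
            set.add(s); // add() leaves the set unchanged if an equal element is present
        }
        return set.size();
    }

    public static void main(String[] args) {
        Shade[] shades = {
            new Shade(1, 2, 3), new Shade(1, 2, 3),
            new Shade(4, 5, 6), new Shade(4, 5, 6),
            new Shade(7, 8, 9)
        };
        System.out.println("added: " + shades.length);              // added: 5
        System.out.println("in set: " + distinctCount(shades));     // in set: 3
    }
}
```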
How many objects (Colors) are added to the set?
How many objects are contained in the set after all objects are added?
Why are the above two numbers different?
Are the Point objects' hash values distributed uniformly?
How many Point objects are added to the set?
How many Point objects are in the set after all Point objects are added?
What is wrong with this result?
Based on the code you see in Point, how is equality determined between two Point objects?
Looking at the output in the console window, how does the implementation of hashCode() for Point explain the bad behavior in the set of Points?
How well does your hash function fare in terms of uniformly distributed hash values for the random point objects?
By now you hopefully understand that a proper hashCode() gives two objects that are equal according to .equals() the same hash value.
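To see what can go wrong when this rule is broken, here is a small sketch. Both classes are hypothetical and are not the studio's Point: LoosePoint mixes a field that equals() ignores into its hash, while TidyPoint hashes only what equals() compares.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenHashDemo {
    // Hypothetical class: equals() compares x and y only, but hashCode()
    // also mixes in `serial`, a field that equals() ignores.
    static class LoosePoint {
        final int x, y, serial;

        LoosePoint(int x, int y, int serial) {
            this.x = x;
            this.y = y;
            this.serial = serial;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof LoosePoint)) return false;
            LoosePoint that = (LoosePoint) other;
            return x == that.x && y == that.y;
        }

        @Override
        public int hashCode() {
            return 31 * (31 * x + y) + serial; // inconsistent with equals()
        }
    }

    // Same equality rule, but hashCode() depends only on x and y.
    static class TidyPoint extends LoosePoint {
        TidyPoint(int x, int y, int serial) {
            super(x, y, serial);
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // Adds two points to a HashSet and reports the resulting size.
    static int sizeAfterAdding(LoosePoint first, LoosePoint second) {
        Set<LoosePoint> set = new HashSet<>();
        set.add(first);
        set.add(second);
        return set.size();
    }

    public static void main(String[] args) {
        // Each pair is equal according to equals(), so a correct set holds one element.
        System.out.println(sizeAfterAdding(new LoosePoint(1, 2, 1), new LoosePoint(1, 2, 2))); // 2
        System.out.println(sizeAfterAdding(new TidyPoint(1, 2, 1), new TidyPoint(1, 2, 2)));   // 1
    }
}
```

The set never calls equals() on the two LoosePoint objects, because their differing hash values already tell it they cannot be the same element.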
Open the Pancake class in Eclipse.
You should notice the following about the Pancake class:
- A Pancake has an int representing its radius.
- A Pancake has a boolean representing if it was made with wheat flour.
- The .equals() method is given.
Discuss with your partner what aspects of the equals() method could help in your design of a Pancake's hashCode(). How can you use the two characteristics of a Pancake to design a hashCode() whose values are as distinct as possible?
As a team, try out some ideas for Pancake's hashCode() implementation. Record your best idea so far in the studio8.txt file.
How would you analyze the following implementation?
public int hashCode() {
    int hash = radius;
    if (wheat) {
        hash = hash + 5;
    }
    return hash;
}
Record answers to the following in the studio8.txt file:
- If two Pancake objects equal each other, will they get the same hashCode() results?
- How well does the implementation spread hashCode() values across Pancake objects that do not equal each other?
- Just by inspection of the hashCode() method above, state two Pancakes that are different yet have the same hashCode() value.
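Besides inspection, one way to analyze an implementation like the one shown above is to count how many distinct hash values it produces over a known set of distinct objects. The sketch below is only an illustration: shownHash copies the logic of the hashCode() above into a hypothetical static helper so that it runs without the studio's Pancake class, and the radius range is an arbitrary assumption.

```java
import java.util.HashSet;
import java.util.Set;

class PancakeHashCheck {
    // Mirrors the hashCode() shown above, written as a static helper so the
    // sketch does not depend on the studio's Pancake class.
    static int shownHash(int radius, boolean wheat) {
        int hash = radius;
        if (wheat) {
            hash = hash + 5;
        }
        return hash;
    }

    // Counts distinct hash values over all pancakes with radius 0..maxRadius
    // and both flour choices.
    static int distinctHashes(int maxRadius) {
        Set<Integer> seen = new HashSet<>();
        for (int radius = 0; radius <= maxRadius; radius++) {
            seen.add(shownHash(radius, false));
            seen.add(shownHash(radius, true));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int maxRadius = 19; // arbitrary range chosen for the experiment
        int pancakes = 2 * (maxRadius + 1);
        System.out.println("distinct pancakes: " + pancakes);
        System.out.println("distinct hash values: " + distinctHashes(maxRadius));
    }
}
```

If the second number is smaller than the first, some unequal pancakes share a hash value. The same harness can be pointed at your own candidate implementation by replacing the body of shownHash.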
Your group should now settle on an implementation and write that into the Pancake class. Please paste it into the write-up file too.
To see how well hashCode() performs, run the HashCodeRunner class, which outputs studio8p2.csv that records various objects' hashCode() values.
As you have done before, analyze the values in the spreadsheet that are in column B for the Pancake objects.
Do your hashCode() values for Pancake appear to be uniformly distributed?
Brainstorm a good hash function for Syrup, and deploy it in the Syrup class.
- A Syrup has a String representing its brand.
- A Syrup has a double representing its thickness (density).
- The .equals() method is given.
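When brainstorming, it may help to see how a double behaves as hash input. The sketch below is not a Syrup implementation; its class and method names are made up. It compares truncating a double to an int with the standard library's Double.hashCode(double), and shows that Strings with equal contents already share a hashCode().

```java
class DoubleHashSketch {
    // Truncation discards the fractional part, so nearby values collide.
    static int truncatingHash(double thickness) {
        return (int) thickness;
    }

    // Double.hashCode(double) derives the hash from the value's full bit pattern.
    static int bitsHash(double thickness) {
        return Double.hashCode(thickness);
    }

    public static void main(String[] args) {
        System.out.println(truncatingHash(0.2) == truncatingHash(0.7)); // true: both truncate to 0
        System.out.println(bitsHash(0.2) == bitsHash(0.7));             // false
        // Strings with equal contents produce equal hash codes.
        System.out.println(new String("Maple").hashCode() == "Maple".hashCode()); // true
    }
}
```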
Paste your Syrup hashCode() implementation into the write-up file, please.
Rerun HashCodeRunner, refresh outputs, and plot the data you see in column C, which are the Syrup hashCode() values.
How uniformly distributed are your Syrup hashCode() values?
Finally, take a look at column D. You do not need to rerun anything because this column's values depend on your hashCode() implementations for Pancake and Syrup.
How uniformly distributed are these values?
In class we talked about hash functions that are hard to reverse, such as SHA256.
Investigate using SHA256 as a hash function for the objects you have worked on today.
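As a starting point for that investigation, here is a minimal sketch using the standard java.security.MessageDigest API. The text encoding of a pancake (radius, a colon, then the wheat flag) and the choice to keep only the first four digest bytes are assumptions made for illustration, not a prescribed design.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha256HashSketch {
    // Returns the 32-byte SHA-256 digest of the text's UTF-8 bytes.
    static byte[] sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // hashCode() must return an int, so keep only the first four digest bytes.
    static int toHashCode(byte[] digest) {
        return ByteBuffer.wrap(digest).getInt();
    }

    // Assumed encoding: the fields that define equality, joined as text.
    static int pancakeHash(int radius, boolean wheat) {
        return toHashCode(sha256(radius + ":" + wheat));
    }

    public static void main(String[] args) {
        System.out.println(pancakeHash(3, true));
        System.out.println(pancakeHash(3, false));
        System.out.println(pancakeHash(4, true));
    }
}
```

Equal field values always yield the same text and therefore the same hash, so the equals/hashCode rule still holds. Whether the extra computation is worth it for a hash table is part of what the investigation can weigh.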
When you are done with this studio, you must be cleared by the TA to receive credit.
- Commit all your work to your repository
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so