Authors: David Jurgens and Ron K. Cytron
Thanks to David Warner and the ACM Student Chapter for the fortunes used in this lab.
| Lab | Assigned | Design Due (In class) 10 AM | Implement (In Lab) Wednesday | Demo (In Lab) Wednesday | Lab Due (In lab) Wednesday |
| | 21 Jan | | 22 Jan | 29 Jan | 29 Jan |
Goals:
By the end of this lab, you should...
- Understand how to organize code into a package.
- Understand more about the Relation and Set ADTs from CS101.
- Understand how to use Java collection objects imported from java.util.*
- Understand the basics of a hash table and the relevance of Java's hashCode() method.
- Understand basic String processing.
- Know how to write console output using System.out instead of Transcript.
- Appreciate the value of the Iterator pattern and understand its usage in Java.
Before starting:
- Read over the entire lab. We really mean it this time.
- Run the sample solution.
- Review the Relation and Set ADTs from CS101.
- Browse the documentation to gain an understanding of the objects' interrelation as well as their methods.
Overview:
In CS101 you studied the Relation and Set ADTs. A Relation implements a function
or a map, and a Set is a duplicate-free collection of objects. In this lab
you will use both of those ADTs, but in the form of implementations already
available in java.util.
Specifically, you will implement a KWIC (KeyWord In Context) object and
place it and its supporting objects in a package called kwic.
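As a rough illustration of how java.util can stand in for those two ADTs, the sketch below uses a HashMap as the relation and a HashSet as the set. The class name RelationSetDemo and its methods are hypothetical and are not part of the lab's documentation; the keys and values are made up for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class for illustration only; not one of the lab's classes.
class RelationSetDemo {

    // Record that `key` is related to `value`.
    static void add(Map<String, Set<Integer>> relation, String key, int value) {
        // computeIfAbsent creates the value set the first time a key is seen.
        relation.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    // Build a tiny relation from a word to the set of phrase numbers it occurs in.
    static Map<String, Set<Integer>> buildRelation() {
        Map<String, Set<Integer>> relation = new HashMap<>();
        add(relation, "well", 1);
        add(relation, "well", 3);
        add(relation, "well", 3); // duplicate: the Set keeps only one copy
        add(relation, "pasta", 1);
        return relation;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> relation = buildRelation();
        System.out.println("well   -> " + relation.get("well"));
        System.out.println("pasta  -> " + relation.get("pasta"));
        // A key that was never added has no entry, so get() returns null.
        System.out.println("castle -> " + relation.get("castle"));
    }
}
```

Note that Map.get returns null for a missing key, so code that looks up a word nobody has seen needs to handle that case.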
Suppose you wanted to search a document for all phrases that
include a given word, say "swordfish". Two approaches
could be used:
- Once "swordfish" is supplied, a computer could search the document and
return each phrase that contains that word. Every time a word is supplied,
the entire document is scanned to find matching phrases.
- The document could be preprocessed offline, in anticipation
of the need for search. When "swordfish" is supplied, the result is
already computed and simply returned.
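The two approaches can be contrasted in a short sketch. Everything here is an assumption made for illustration: the class SearchStrategies, its method names, and the simple tokenizer (lowercase, split on any run of non-letters) are hypothetical and are not the lab's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical class for illustration; it is not part of the kwic package.
class SearchStrategies {

    // Assumed tokenizer: lowercase the phrase and treat any run of
    // non-letters as a separator.
    static List<String> words(String phrase) {
        List<String> result = new ArrayList<>();
        for (String w : phrase.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Approach 1: scan every phrase each time a word is supplied.
    static Set<String> scan(List<String> phrases, String word) {
        String key = word.toLowerCase(Locale.ROOT);
        Set<String> matches = new LinkedHashSet<>();
        for (String phrase : phrases) {
            if (words(phrase).contains(key)) {
                matches.add(phrase);
            }
        }
        return matches;
    }

    // Approach 2: preprocess once into an index from word to phrases.
    static Map<String, Set<String>> buildIndex(List<String> phrases) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String phrase : phrases) {
            for (String w : words(phrase)) {
                index.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(phrase);
            }
        }
        return index;
    }

    static List<String> samplePhrases() {
        List<String> phrases = new ArrayList<>();
        phrases.add("Swordfish goes well with pasta; the pasta should not be overcooked.");
        phrases.add("The password for entry to the castle is: \"swordfish\".");
        phrases.add("All's well that ends well.");
        return phrases;
    }

    public static void main(String[] args) {
        List<String> phrases = samplePhrases();
        // Approach 1 re-reads the whole document for this one query.
        System.out.println(scan(phrases, "swordfish"));
        // Approach 2 pays the scanning cost once; each query is then a map lookup.
        Map<String, Set<String>> index = buildIndex(phrases);
        System.out.println(index.get("swordfish"));
    }
}
```

Both calls return the same phrases. The difference is where the work happens: scan repeats it for every query, while buildIndex does it once and must be redone or updated whenever the document changes.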
Which approach is best? It depends on:
- How stable is the document? Do phrases come and go or is
the document relatively constant?
- How often is the same word supplied for a search? Would "swordfish" be
supplied many times or just once?
In essence, the choice depends on the frequency of insertions and deletions
to the document as compared with the frequency of lookups for its words.
We shall assume that offline preprocessing pays off, and that it would be
expensive to search the document each time a word is supplied. As an
analogy, consider a search using Google.
Imagine how slow it would be for Google to search the entire WWW each time
you ask it to find a word. (Note: It takes about a month for Google to
crawl the web currently!)
As an example for this lab, consider the following phrases:
- Swordfish goes well with pasta; the pasta should not be overcooked.
- The password for entry to the castle is: "swordfish".
- All's well that ends well.
The following table shows how phrases should be returned for words
that might be supplied for KWIC:
| Word | Set of Phrases |
| swordfish | Swordfish goes well with pasta; the pasta should not be overcooked. |
| | The password for entry to the castle is: "swordfish". |
| Well | Swordfish goes well with pasta; the pasta should not be overcooked. |
| | All's well that ends well. |
Notice that case and punctuation do not matter in matches, but that the
returned phrases are exactly as they were entered.
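One way to get this behavior is to compare words in a canonical form while storing each phrase untouched. The sketch below assumes a particular rule (drop every character that is not a letter or digit, then lowercase); the class CanonicalForm is hypothetical and is not the WordCanonical.java file supplied with the lab, whose actual rules may differ.

```java
// Hypothetical stand-in for illustration; the lab's own canonicalization
// rules may differ from the ones assumed here.
class CanonicalForm {

    // Assumed rule: keep only letters and digits, and lowercase them.
    static String canonical(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Different surface forms of the same word map to the same key.
        System.out.println(canonical("Swordfish"));
        System.out.println(canonical("\"swordfish\"."));
        System.out.println(canonical("Well"));
        System.out.println(canonical("pasta;"));
    }
}
```

Under this assumed rule, "All's" becomes "alls" rather than "all"; whether an apostrophe should split a word is a design decision the sketch does not settle.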
Suggested implementation:
- Make a directory for your Lab 2a stuff.
- Save the Demo.java file there.
- Make a kwic directory into which your classes will go.
- Save the WordCanonical.java file there.
- Type in stubs for the other classes found in the documentation.
- Complete and test the DefaultWordFilter class. You can just return the input string for now.
- Complete and test the Word class.
- Complete and test the Phrase class.
- Complete and test the KWIC class.
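When testing a class whose objects will be stored in a HashSet or used as HashMap keys, it helps to remember that the hash table uses hashCode() to pick a bucket and equals() to confirm a match, so the two methods must agree. The sketch below shows that idea with a hypothetical SketchWord class; it is not the lab's Word class, and its fields and lowercase-only matching rule are assumptions.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Hypothetical class for illustration; the lab's Word class is specified
// in its own documentation and may look quite different.
class SketchWord {
    private final String original;  // text exactly as entered
    private final String matchForm; // assumed matching rule: lowercase only

    SketchWord(String original) {
        this.original = original;
        this.matchForm = original.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchWord)) {
            return false;
        }
        return matchForm.equals(((SketchWord) other).matchForm);
    }

    // Equal objects must return equal hash codes, so hash the same field
    // that equals() compares.
    @Override
    public int hashCode() {
        return matchForm.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }

    // Count how many distinct words a HashSet keeps from the given tokens.
    static int distinctCount(String... tokens) {
        Set<SketchWord> words = new HashSet<>();
        for (String t : tokens) {
            words.add(new SketchWord(t));
        }
        return words.size();
    }

    public static void main(String[] args) {
        Set<SketchWord> words = new HashSet<>();
        words.add(new SketchWord("Well"));
        words.add(new SketchWord("well")); // equal to "Well", so not added again
        words.add(new SketchWord("pasta"));

        // The Iterator visits each element without exposing how the set stores them.
        Iterator<SketchWord> it = words.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("distinct words: " + words.size());
    }
}
```

If hashCode() were left as the default inherited from Object, "Well" and "well" would usually land in different buckets and the set would keep both, even though equals() says they match.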
What To Turn In:
For every CS102 lab you turn in, you should fill in a cover sheet and
staple it on the front of your lab.
Attach a paper printout of the following:
- All classes written for this lab.
- Output from the class test Demo2a.java.
This must be turned in by the end of your lab section on the due date.
Check that you have header information (name, email, date, and lab section)
at the top of each file. You must have demonstrated your lab to have the
printout graded. If you need help printing, ask a TA or refer to the help
homepage, which has detailed instructions for how to print from the labs.
Last modified 14:57:02 CST 29 January 2003 by Ron K. Cytron