CS102 Lab 4:
File I/O, Parsing, and URL connections
Assigned: Tuesday, March 16
Demonstration (5 points) in lab section: Monday, March 29
Hard copy of code (15 points) due to the CS102 mailboxes: Tuesday, March 30
Goals:
By the end of this lab, you should...
- have experience writing an application that takes its
input from text files.
- understand how to process an input text stream by parsing it into
tokens and processing the tokens.
- be familiar with some of the I/O support provided by the
package java.io.
- have experience creating an Applet that reads data from input streams
for URL connections.
- know how to issue simple commands to a web browser from within an
applet.
Motivation and Overview:
As described in lecture, Java provides excellent support for applications
that want to save objects
to persistent storage and load them back later. However, when
data needs to be extracted from human-generated text, the problem is
harder to automate, since the program reading the data did not
generate it. Therefore, rather than knowing ahead of time exactly
what to expect, it is necessary to read through the data in order
to determine what is there and to pick out what is useful.
In lecture, we saw that the java.io.StreamTokenizer provides useful
support for text parsing by
breaking up an input stream into its logical units
called tokens. However, having a stream of tokens is not the end of
the story, since the application must still decide what to do with them.
In this lab, you will gain some experience with file I/O, text parsing,
Applets, URL connections, and web browser control
by writing an Applet that reads an .html file and generates a
user-selectable list of hypertext links available in the file.
When the user selects from the list, the document corresponding to the
chosen link will be shown in
a separate frame within the browser and the same process will be repeated
on the selected link.
Before Starting:
Before beginning this lab, you should familiarize yourself with certain
parts of some Java packages.
- Become familiar with the API provided by the java.io package.
Pay particular attention to the classes seen in lecture, such as
File, InputStream, FileInputStream, Reader, FileReader, and StreamTokenizer.
- Read over the methods provided by the String class.
It provides a lot of methods you may find handy for completing this assignment.
- Look at the classes URL and URLConnection in the java.net package.
- Review the Applet class and related classes
in the java.applet package, especially the getAppletContext method in
Applet and the showDocument method in AppletContext.
- Look at the List class in the java.awt package and the ItemListener
interface in the java.awt.event package. While you're there, also
review the TextField class,
particularly the addActionListener method.
- Before doing the second extra credit option,
look at the class Stack in the java.util package.
Assumptions:
We will make the following assumptions about
the names and contents of the files you will be parsing.
- All of the actual files you will parse will end with the ".html" suffix.
However, the link names typed into your user interface
or found within the files themselves may not
show the suffix explicitly. If a link ends with a "/", you should
append the string "index.html" before processing.
If a link does not end with a "/" and also does not end with ".html", you
should append the string "/index.html" before processing.
- If you are currently looking at a page whose full pathname is
"http://www.foo.bar/abc/nonsense.html", then
the path "http://www.foo.bar/abc/"
is considered to be the current directory URL.
- If a link does not begin with "http://" then it is a relative link,
meaning that you should prefix it with the current directory URL before
processing.
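The three assumptions above can be sketched as a pair of string helpers. This is only an illustration, not required code: the class name LinkRules and the method names currentDirectory and normalize are made up for this example, and currentDirectory assumes it is given a page URL that has already been normalized (so it ends in ".html").

```java
class LinkRules {
    // Everything up to and including the last "/" of the page's URL.
    // Assumes pageUrl has a path component (e.g. it already ends in ".html").
    static String currentDirectory(String pageUrl) {
        return pageUrl.substring(0, pageUrl.lastIndexOf('/') + 1);
    }

    // Applies the relative-link rule first, then the two suffix rules.
    static String normalize(String link, String currentDir) {
        String full = link.startsWith("http://") ? link : currentDir + link;
        if (full.endsWith("/")) {
            return full + "index.html";
        }
        if (!full.endsWith(".html")) {
            return full + "/index.html";
        }
        return full;
    }

    public static void main(String[] args) {
        String dir = currentDirectory("http://www.foo.bar/abc/nonsense.html");
        System.out.println(dir);
        System.out.println(normalize("myFavoriteThings/foo.html", dir));
        System.out.println(normalize("http://students.cec.wustl.edu/~xyz99", dir));
    }
}
```

The String methods used here (startsWith, endsWith, lastIndexOf, substring) are all part of the standard String class.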
Part I: Finding links in HTML files
Create a Cafe project of type "application."
As usual, create a file called Lab4.java and
a Startup.java file. The method main in
Lab4 should create a thread out of an instance of Startup and
start the thread running.
Begin by
creating a class called Parser with the following methods.
- The method
getTokenizer takes a filename as its parameter
and returns a StreamTokenizer that is positioned at the beginning of
the given file.
- The method
tokenMatch takes a StreamTokenizer and
a String targetWord as parameters. It should return true if the
next token in the stream matches the given targetWord and false
otherwise. Note that there are two ways a match can occur: either
the next token has token type TT_WORD and the token's sval
is the same as the targetWord, OR targetWord contains only one
character and that character matches the current token type.
- The method
consumeThroughPatternMatch takes a
StreamTokenizer and an array of strings as its two parameters.
It should read tokens until seeing a sequence of tokens that match
the sequence of strings in the provided array (in order). For example,
suppose the provided array is {"<","a","href","="} and the method is
called with the StreamTokenizer at the start of the sample file shown
below. The method would then leave the StreamTokenizer positioned
just before the text "http://students.cec.wustl.edu/~xyz99".
In general, it is possible to consume the entire stream before finding a
match for the given pattern, in which case the method should
return with the StreamTokenizer positioned at the end of the file.
<HTML>
<TITLE>My Award-Winning Home Page</TITLE>
<H3>My Home Page</H3>
This page is under construction.
<a hbuf+"http://www.goofy.com">
<P>
See my <a href="http://students.cec.wustl.edu/~xyz99">friend's
home page</a>.
Also, you can look at <a href="myFavoriteThings/foo.html">my favorite things.</a>
</HTML>
- The method
getNextQuotedString should take a
StreamTokenizer as its parameter and keep reading tokens until reaching
one whose token type is '"'. It should then return the sval
for that token. Again, EOF may be reached before a matching token is found.
- The method
getTitle should take a
StreamTokenizer as its parameter and should extract the title from the
input stream. For example, for the file given above, the method would
return the String "My Award-Winning Home Page" as its result.
- The method
getLinks should take a
StreamTokenizer as its parameter and should return a Vector that
contains all the links in the remainder of the input stream.
For example, for the file given above, the method would
return a vector containing two String objects:
http://students.cec.wustl.edu/~xyz99 and
myFavoriteThings/foo.html
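To make the method descriptions above more concrete, here is one possible sketch of the token-matching methods. It is not the required design. The class name ParserSketch, the extra getTokenizer(Reader) overload, the boolean return value of consumeThroughPatternMatch, and the null result at EOF are all assumptions made for this example. Two details are worth noticing: by default StreamTokenizer treats '/' as a comment character, so the sketch makes it an ordinary character, and the pattern matcher uses a simple restart strategy that is adequate when the first pattern string does not recur later in the pattern.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Vector;

class ParserSketch {
    static StreamTokenizer getTokenizer(String fileName) throws IOException {
        return getTokenizer(new BufferedReader(new FileReader(fileName)));
    }

    // Assumed overload: lets the same code run on strings or URL streams.
    static StreamTokenizer getTokenizer(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.ordinaryChar('/'); // otherwise "</a>" hides the rest of the line
        return st;
    }

    // Reads the next token and compares it to targetWord.
    static boolean tokenMatch(StreamTokenizer st, String targetWord) throws IOException {
        int type = st.nextToken();
        if (type == StreamTokenizer.TT_WORD) {
            return targetWord.equals(st.sval);
        }
        return targetWord.length() == 1 && targetWord.charAt(0) == type;
    }

    // Returns true if the pattern was found, false if EOF came first.
    static boolean consumeThroughPatternMatch(StreamTokenizer st, String[] pattern)
            throws IOException {
        int matched = 0;
        while (matched < pattern.length) {
            if (tokenMatch(st, pattern[matched])) {
                matched++;
            } else if (st.ttype == StreamTokenizer.TT_EOF) {
                return false;
            } else if (matched > 0) {
                st.pushBack(); // retry this token against pattern[0]
                matched = 0;
            }
        }
        return true;
    }

    // Returns the next quoted string, or null if EOF is reached first.
    static String getNextQuotedString(StreamTokenizer st) throws IOException {
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == '"') {
                return st.sval;
            }
        }
        return null;
    }

    static Vector<String> getLinks(StreamTokenizer st) throws IOException {
        String[] anchor = {"<", "a", "href", "="};
        Vector<String> links = new Vector<String>();
        while (consumeThroughPatternMatch(st, anchor)) {
            String link = getNextQuotedString(st);
            if (link != null) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "See <a href=\"http://x.org/a\">one</a> and <a href=\"b/c.html\">two</a>";
        System.out.println(getLinks(getTokenizer(new StringReader(html))));
    }
}
```

Note that the comparison in tokenMatch is case-sensitive, so a pattern containing "a" will not match a tag written as <A HREF=...>. How to handle that case is left to you.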
Thoroughly test your methods from within the run()
method in Startup. You can create your own test files, but
you should also find some HTML files on the web, save them locally,
and test your methods on them as well.
Part II. Add a simple user interface
To create a user interface for your application,
create a Panel that contains two components:
- A TextField into which a user can type a file name.
- A List in which you'll display all the links in the file.
Register an ActionListener with the TextField, so that when the user
hits the enter (return) key on the keyboard, the file with the given
name is parsed and the list of links is displayed.
In your run() method in Startup,
put the Panel into a Frame and display it on the screen.
Test thoroughly.
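A minimal sketch of such a panel is shown below. The names LinkPanel and LinkFinder are hypothetical: LinkFinder stands in for whatever code you wrote in Part I to turn a file name into a Vector of links, so that the panel itself does not depend on any particular parser design.

```java
import java.awt.BorderLayout;
import java.awt.Frame;
import java.awt.List;
import java.awt.Panel;
import java.awt.TextField;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.util.Vector;

// Hypothetical callback: maps a file name to the links found in that file.
interface LinkFinder {
    Vector<String> find(String fileName);
}

class LinkPanel extends Panel {
    final TextField input = new TextField(40);
    final List linkList = new List(10);
    final LinkFinder finder;

    LinkPanel(LinkFinder finder) {
        this.finder = finder;
        setLayout(new BorderLayout());
        add(input, BorderLayout.NORTH);
        add(linkList, BorderLayout.CENTER);
        // Fired when the user presses enter in the TextField.
        input.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                showLinks(LinkPanel.this.finder.find(input.getText().trim()));
            }
        });
    }

    // Replaces the current contents of the List with the given links.
    void showLinks(Vector<String> links) {
        linkList.removeAll();
        for (String link : links) {
            linkList.add(link);
        }
    }

    public static void main(String[] args) {
        // Placeholder finder that returns no links; plug in your parser here.
        LinkPanel panel = new LinkPanel(new LinkFinder() {
            public Vector<String> find(String fileName) {
                return new Vector<String>();
            }
        });
        Frame frame = new Frame("Lab 4");
        frame.add(panel);
        frame.addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
        frame.pack();
        frame.setVisible(true);
    }
}
```

Keeping showLinks as a separate method means the same code can later be reused when a list selection, rather than typed text, triggers a new parse.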
Part III. A Navigation Applet
In this part of the lab, you'll use the code you wrote in Parts I and II
to create an Applet that works as follows:
- The user types a URL in the TextField and presses the enter key.
- The Applet opens an input stream for the given URL, extracts all
the links from the file, and displays them in the List. In addition,
the browser is instructed to show the document at the given URL in
another frame (which we'll call the target frame).
- The user either types another URL and hits enter OR the user
selects one of the items from the list.
The typed or selected item is processed as the next input file.
If it was a selected item,
the full URL should be displayed in the TextField
(even if the link in the list is a relative link).
First, modify your code to read its input from a URLConnection instead of from
a local file.
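One way this change might look is sketched below, assuming your tokenizer is built from a Reader. The class name UrlTokenizerSketch is made up for this example. Only the source of the characters changes; the token-processing code can stay the same.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.net.URL;
import java.net.URLConnection;

class UrlTokenizerSketch {
    // Opens a connection to the URL and wraps its input stream in a tokenizer.
    static StreamTokenizer getTokenizer(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StreamTokenizer st = new StreamTokenizer(reader);
        st.ordinaryChar('/'); // '/' is a comment character by default
        return st;
    }
}
```

Since openConnection and getInputStream can throw an IOException (for example, when the URL does not exist), your user interface should be prepared to handle that case.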
Then, create a new Cafe project of type "applet". As its main
HTML file, create a file named something like
Lab4frames.html that creates
two frames in the browser. For the contents of
the first frame, you should create a file
Sitemap.html in
which to display your applet. The other frame will be used by the
applet when the user requests a URL.
Create an Applet class whose init() method creates an instance of
your Startup class, and starts it running in a thread.
Modify your code so that when a URL is selected (either typed or
chosen from the list) by a user, your program will instruct the
browser to display that URL as a document in the target frame.
You will need to register an ItemListener with the List object to
be informed of any ItemEvent that occurs when the user selects
from the list.
Before testing,
be sure to set the permissions appropriately. (The .class files need
to be world readable. The directory itself
needs to be both world readable and world executable.)
Test thoroughly. Remember that your applet will only have permission
to open URL connections to the web server from which it was loaded,
so you won't be able to test your applet on web sites outside of CEC.
Extra-Credit Features:
- In your user interface, get the title from
each HTML file in the list, and display that information in the list
instead of the URL. For each file that doesn't contain title information,
just show the URL instead. (This feature
is worth 1 extra credit point if completed
as specified.)
- Add forward and back buttons to the GUI that allow the user to go back
to previously selected URLs and also to go forward.
Don't parse the file again when you go back (or forward).
Instead, save the Vector of URLs on a stack. Use two Stack objects,
one for the forward direction and one for the backward direction.
Be sure that the stacks are initialized at appropriate times and that
the forward and back buttons are enabled only when appropriate.
For example, if the user types in a new URL or makes a selection,
then the forward button should no longer be enabled (and the forward
stack should be made empty). (This feature is worth 2 extra credit points if
completed as specified.)
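The two-stack bookkeeping for the second option might be organized as in the sketch below. The class name HistorySketch and its method names are hypothetical, and the sketch tracks only the saved Vector of links for each page. You would also need to remember each page's URL and connect canGoBack and canGoForward to the enabled state of your buttons.

```java
import java.util.Stack;
import java.util.Vector;

class HistorySketch {
    private final Stack<Vector<String>> back = new Stack<Vector<String>>();
    private final Stack<Vector<String>> forward = new Stack<Vector<String>>();
    private Vector<String> current; // links of the page being shown, if any

    // Called when the user types a new URL or selects a link.
    void visit(Vector<String> links) {
        if (current != null) {
            back.push(current);
        }
        current = links;
        forward.clear(); // a new visit invalidates the forward history
    }

    boolean canGoBack() {
        return !back.isEmpty();
    }

    boolean canGoForward() {
        return !forward.isEmpty();
    }

    // Call only when canGoBack() is true.
    Vector<String> goBack() {
        forward.push(current);
        current = back.pop();
        return current;
    }

    // Call only when canGoForward() is true.
    Vector<String> goForward() {
        back.push(current);
        current = forward.pop();
        return current;
    }
}
```

Because the saved Vector objects are returned as-is, going back or forward never requires parsing a file again.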
Demonstration:
In your lab section on March 29, you will demonstrate your complete
working lab.
Have a completed CS102
cover sheet ready for the TA to record your demo grade and demo comments (what worked and didn't work).
Hard Copy:
After your demo, clean up your code, add documentation,
and make it beautiful. If there was a problem during the demo,
you should mark on your code where you think the problem is.
If you have time, you can try fixing it, and you should describe how you
fixed it and whether or not you were able to get it to work.
You may replace your demo grade by doing another demo only if
you use a late coupon or a rewrite coupon.
By 5:00pm on March 30, turn in your cover sheet (with the demo grade recorded on it) and a printed copy of all your code to the CS102 mailbox.
Kenneth J. Goldman (kjg@cs.wustl.edu)