| Lab |
Assigned |
Lab Due (Start of class) |
| | 16 | Jan |
6 | Feb |
My output for the problems can be found here
Abstract:
In this assignment you will implement a simple finite-state transducer, which uses the
actions of a finite-state automaton to perform a simple language translation.
You will also
gain experience with CUP (Constructor of
Useful Parsers)
notation.
Specifically,
you will design and implement a deterministic finite-state automaton
that recognizes fully-declared right-linear CUP grammar specifications.
The output (transduction) from your program consists of
node and edge declarations for
a finite-state automaton.
As
with each machine problem assigned, you will find
a TestFiles subdirectory.
That directory contains
a collection of files you can use to test your solution.
While your solution is expected to work on these inputs, your code will be
evaluated on other, secret inputs. You are encouraged to test your program
to ensure that it operates reasonably and will pass muster with the our secret
test files.
Behold the general structure of a CUP specification:
| Section | Example |
| Symbol Declarations |
terminal stringtype lparen, rparen, digit;
non terminal inttype Number;
non terminal stringtype Parens;
|
Start Declaration |
start with Number; |
Productions |
Number ::= Number digit
| digit
| Parens
;
Parens ::= lparen rparen;
|
The phrases terminal, non terminal,
and start with
indicate the declaration of terminal and nonterminal symbols, respectively.
The word immediately following each of those is not a symbol to be declared,
but is instead the semantic type of the symbol.
So, in the example above, stringtype and inttype are
object types for the symbols declared on those lines.
The distributon of white space is arbitrary, but the above examples
were constructed to show good formatting style.
Upon encountering a terminal symbol, your code should record the symbol
as terminal in the symbol table by issuing:
symboltable.enterTerminal(symbol)
Similarly, nonterminals are recorded by issuing:
symboltable.enterNonTerminal(symbol)
I'm assuming you will want to use
Eclipse for development this
semester. I'm relatively new to it, but the following instructions worked
for me. Any advice is appreciated. If you'd rather do this on Linux
without eclipse please let me know and I'll post instructions
on how to do it that way.
Eclipse instructions:
- The files for this assignment are in
this zip file.
Click on the link and save the archive file somewhere handy.
- After launching eclipse, create a New Java Project and call
it something meaningful (like CSE431 hw1 or Best Lab Ever).
- With that project selected in the Package Explorer, click
File → Import …
- You want to import an archive file, so look for this in the types
of files you can import. For me, it is under the General folder.
- Let Eclipse import all the files in the zip you downloaded in step 1.
- Eclipse should automatically build the project by compiling
all of the source java files. You will see some warnings
issued from the compilation, but you should ignore them (silly
Eclipse! Those warnings are from automatically generated code).
- To run the application, you will have to provide a command-line
argument to specify the file to be processed by the application.
- Click Run → Run …
- Configure the run as a Java Application
- Click on the Arguments tab
- Fill inthe Program arguments with a file you
want to process, such as TestFiles/good1.cup
- Click Apply and Run
For this assignment, you will modify only the file
Fsa.java, and that
is the only file you will turn in.
In your submitted file, be sure to document your approach, including how
you decided to treat keywords such as non, with, etc.
When you are done, mail only your
Fsa.java file (and no other files) to legrand@cs.wustl.edu.
You can send it as a single, solitary attachment using pine (or
just about any mail reader), or you can use the Linux
incantation:
mail legrand@cs.wustl.edu < Fsa.java
which runs the mail program and sends it Fsa.java as standard input.
The file as supplied contains a skeletal
finite-state machine (simulator), driven by the tables GOTO[][] and
ACTION[][], as described in class.
Your assignment is to expand and fill in these tables appropriately,
and to add code in the switch statement to perform the
required translation.
While processing the rules section, you will check each production for
right-linear form. That is, there should be only nonterminals on the
left side of a rule, and the right side of a rule should be
a terminal followed by at most one nonterminal.
To determine the type of a symbol previously entered into the
symbol table, invoke
symboltable.isTerminal(symbol)
which will return true if symbol is terminal, and
false otherwise. The method may throw the
error SymbolNotFoundError if the symbol was not recorded.
-
Upon encountering any error, you may abort, but produce the white-space
delimited
string
ABORT
in your output.
-
You may assume that
- All symbols used in the CUP specification are
declared, so that SymbolNotFoundError will not be thrown.
- There are no empty productions
- Each production group is properly terminated by a semicolon.
-
For your translation, you are to produce a labeled set of transition
edges for the grammar's corresponding finite state machine:
-
For each production you encounter, you must output a line of the
following form:
Edge NonTerm1 NonTerm2 label
where you supply the strings for NonTerm1,
NonTerm2,
and label.
If the transition is to a final state, then NonTerm2
should be
the string $FINAL$.
-
You must also produce a line to indicate the start state:
Start NonTerm
-
For extra fun, try handling
undeclared symbols by inferring their type from the first rule containing
their use.
- For some fun, feed the output of your program into
my MakeTable program.
It will convert your edge
descriptive output into tables (quasi)suitable for inclusion in Fsa.java.
My program runs as a self-contained jar file:
java -jar maketable.jar
You can give it a filename or redirect standard input.
To assist you,
the Fsa constructor is passed an Enumeration that
is the token stream.
There is no end-of-input token: the token stream is exhausted when the
Enumeration has no more elements.
Each element of the Enumeration is a Token
x such that
x.type() is one of the following:
| Token.Blank |
A string comprised of blanks, tabs, and newlines.
Although Blank tries to return the longest such string it can,
there are situations when the scanner will return multiple consecutive
Blanks. Your finite-state machine must accommodate this.
|
| Token.Define |
The character sequence ::= |
| Token.Semi | The character ; |
| Token.Or | The character | |
| Token.Comma | The character , |
| Token.Str |
A string of characters has been recognized that is not otherwise
covered by other tokens. The actual
string is accessible as x.strValue() for token x.
|
| Token.Terminal | The string terminal |
| Token.Non | The string non |
| Token.Start | The string start |
| Token.With | The string with |
| Token.Other | Anything else (always an error) |
The tokens above are conveniently in the range of 0...10; see the
file Token.java for the correspondence.
| Sample input | Sample output |
non terminal obj S;
terminal obj plus, minus, zero, one;
non terminal obj Rest, D;
terminal obj bogus, not, really, used;
start with S;
S
::= zero Rest
| one Rest
;
Rest
::= plus D
| minus D
;
D
::= zero
| one
| really
;
|
Start S
Edge S Rest zero
Edge S Rest one
Edge Rest D plus
Edge Rest D minus
Edge D $FINAL$ zero
Edge D $FINAL$ one
Edge D $FINAL$ really
|
Last modified 16:23:16 CST 05 February 2008
by Ron K. Cytron