Lab 1: Finite-State Transducers
Lab Assigned Lab Due
(Start of class)
15 Jan 29 Jan


Abstract: In this assignment you will implement a simple finite-state transducer, which uses the actions of a finite-state automaton to perform a simple language translation. You will also gain experience with CUP (Constructor of Useful Parsers) notation. Specifically, you will design and implement a deterministic finite-state automaton that recognizes fully-declared right-linear CUP grammar specifications. The output (transduction) from your program consists of node and edge declarations for a finite-state automaton.

As with each machine problem assigned, you will find a TestFiles subdirectory. That directory contains a collection of files you can use to test your solution. While your solution is expected to work on these inputs, your code will be evaluated on other, secret inputs. You are encouraged to test your program to ensure that it operates reasonably and will pass muster with the our secret test files.

Behold the general structure of a CUP specification:

Section Example
Symbol Declarations
    terminal stringtype lparen, rparen, digit;
non terminal inttype    Number; 
non terminal stringtype Parens; 
Start Declaration start with Number;
Productions
Number ::= Number digit
         | digit
         | Parens
         ;

Parens ::= lparen rparen;
The phrases terminal, non terminal, and start with indicate the declaration of terminal and nonterminal symbols, respectively. The word immediately following each of those is not a symbol to be declared, but is instead the semantic type of the symbol. So, in the example above, stringtype and inttype are object types for the symbols declared on those lines. The distributon of white space is arbitrary, but the above examples were constructed to show good formatting style.

Upon encountering a terminal symbol, your code should record the symbol as terminal in the symbol table by issuing:

  symboltable.enterTerminal(symbol)
Similarly, nonterminals are recorded by issuing:
  symboltable.enterNonTerminal(symbol)
The files for this assignment are found in the ~cs431/hw1 subdirectory, or can be gotten by this zip file. To begin:
  1. Make sure you have the java package added to your environment. This is best done by
    pkgaddperm java
    You then have to log out and back in again to have it take effect.
  2. Create a new directory for your work: mkdir hw1
  3. Change into that directory: cd hw1
  4. Set things up by typing
    make -f ~cs431/hw1/Makefile Setup
The above instructions should work from either CEC or from the CS Unix systems. Once this is done, and while you are developing your solution, you can use the following commands:
make Brings your work up-to-date by compiling modified files.
java rlgram file Runs your work on file
cd ~cs431/hw1/Solved
java rlgram file
Runs my solution for you

For this assignment, you will modify only the file Fsa.java, and that is the only file you will turn in. When you are done, mail only your Fsa.java (and no other files) to cs431@cec.wustl.edu. You can send it as a single, solitary attachment using pine, or you can use the UNIX incantation:

mail cs431@cec.wustl.edu < FSA.java
which runs the mail program and sends it FSA.java as standard input.

The file as supplied contains a skeletal finite-state machine (simulator), driven by the tables GOTO[][] and ACTION[][], as described in class. Your assignment is to expand and fill in these tables appropriately, and to add code in the switch statement to perform the required translation.

While processing the rules section, you will check each production for right-linear form. That is, there should be only nonterminals on the left side of a rule, and the right side of a rule should be a terminal followed by at most one nonterminal. To determine the type of a symbol previously entered into the symbol table, invoke

   symboltable.isTerminal(symbol)
which will return true if symbol is terminal, and false otherwise. The method may throw the error SymbolNotFoundError if the symbol was not recorded.

To assist you, the Fsa constructor is passed an Enumeration that is the token stream. There is no end-of-input token: the token stream is exhausted when the Enumeration has no more elements. Each element of the Enumeration is a Token x such that x.type() is one of the following:

Token.Blank A string comprised of blanks, tabs, and newlines. Although Blank tries to return the longest such string it can, there are situations when the scanner will return multiple consecutive Blanks. Your finite-state machine must accommodate this.
Token.Define The character sequence ::=
Token.Semi The character ;
Token.Or The character |
Token.Comma The character ,
Token.Str A string of characters has been recognized that is not otherwise covered by other tokens. The actual string is accessible as x.strValue() for token x.
Token.Terminal The string terminal
Token.Non The string non
Token.Start The string start
Token.With The string with
Token.Other Anything else
The tokens above are conveniently in the range of 0...10; see the file Token.java for the correspondence.
Sample input Sample output
non terminal obj S;
    terminal obj plus, minus, zero, one;
non terminal obj Rest, D;
    terminal obj bogus, not, really, used;
start with S;
S
	::= zero Rest
	|   one Rest
	;

Rest
	::= plus    D
	|   minus   D
	;

D
	::= zero
	|   one
	|   really
	;
                     
Start S
Edge S Rest zero
Edge S Rest one
Edge Rest D plus
Edge Rest D minus
Edge D $FINAL$ zero
Edge D $FINAL$ one
Edge D $FINAL$ really
                      


Last modified 14:00:37 CST 20 January 2002 by Ron K. Cytron