Lab 1: Finite-State Transducers

Lab Assigned Lab Due
(Start of class)
16 Jan 6 Feb

My output for the problems can be found here


Abstract: In this assignment you will implement a simple finite-state transducer, which uses the actions of a finite-state automaton to perform a simple language translation. You will also gain experience with CUP (Constructor of Useful Parsers) notation. Specifically, you will design and implement a deterministic finite-state automaton that recognizes fully-declared right-linear CUP grammar specifications. The output (transduction) from your program consists of node and edge declarations for a finite-state automaton.

As with each machine problem assigned, you will find a TestFiles subdirectory. That directory contains a collection of files you can use to test your solution. While your solution is expected to work on these inputs, your code will be evaluated on other, secret inputs. You are encouraged to test your program to ensure that it operates reasonably and will pass muster with the our secret test files.

Behold the general structure of a CUP specification:

Section Example
Symbol Declarations
    terminal stringtype lparen, rparen, digit;
non terminal inttype    Number; 
non terminal stringtype Parens; 
Start Declaration start with Number;
Productions
Number ::= Number digit
         | digit
         | Parens
         ;

Parens ::= lparen rparen;
The phrases terminal, non terminal, and start with indicate the declaration of terminal and nonterminal symbols, respectively. The word immediately following each of those is not a symbol to be declared, but is instead the semantic type of the symbol. So, in the example above, stringtype and inttype are object types for the symbols declared on those lines. The distributon of white space is arbitrary, but the above examples were constructed to show good formatting style.

Upon encountering a terminal symbol, your code should record the symbol as terminal in the symbol table by issuing:

  symboltable.enterTerminal(symbol)
Similarly, nonterminals are recorded by issuing:
  symboltable.enterNonTerminal(symbol)

I'm assuming you will want to use Eclipse for development this semester. I'm relatively new to it, but the following instructions worked for me. Any advice is appreciated. If you'd rather do this on Linux without eclipse please let me know and I'll post instructions on how to do it that way.

Eclipse instructions:

  1. The files for this assignment are in this zip file. Click on the link and save the archive file somewhere handy.
  2. After launching eclipse, create a New Java Project and call it something meaningful (like CSE431 hw1 or Best Lab Ever).
  3. With that project selected in the Package Explorer, click File → Import …
  4. You want to import an archive file, so look for this in the types of files you can import. For me, it is under the General folder.
  5. Let Eclipse import all the files in the zip you downloaded in step 1.
  6. Eclipse should automatically build the project by compiling all of the source java files. You will see some warnings issued from the compilation, but you should ignore them (silly Eclipse! Those warnings are from automatically generated code).
  7. To run the application, you will have to provide a command-line argument to specify the file to be processed by the application.
    1. Click Run → Run …
    2. Configure the run as a Java Application
    3. Click on the Arguments tab
    4. Fill inthe Program arguments with a file you want to process, such as TestFiles/good1.cup
    5. Click Apply and Run

For this assignment, you will modify only the file Fsa.java, and that is the only file you will turn in.

In your submitted file, be sure to document your approach, including how you decided to treat keywords such as non, with, etc.

When you are done, mail only your Fsa.java file (and no other files) to legrand@cs.wustl.edu. You can send it as a single, solitary attachment using pine (or just about any mail reader), or you can use the Linux incantation:

mail legrand@cs.wustl.edu < Fsa.java
which runs the mail program and sends it Fsa.java as standard input.

The file as supplied contains a skeletal finite-state machine (simulator), driven by the tables GOTO[][] and ACTION[][], as described in class. Your assignment is to expand and fill in these tables appropriately, and to add code in the switch statement to perform the required translation.

While processing the rules section, you will check each production for right-linear form. That is, there should be only nonterminals on the left side of a rule, and the right side of a rule should be a terminal followed by at most one nonterminal. To determine the type of a symbol previously entered into the symbol table, invoke

   symboltable.isTerminal(symbol)
which will return true if symbol is terminal, and false otherwise. The method may throw the error SymbolNotFoundError if the symbol was not recorded.

To assist you, the Fsa constructor is passed an Enumeration that is the token stream. There is no end-of-input token: the token stream is exhausted when the Enumeration has no more elements. Each element of the Enumeration is a Token x such that x.type() is one of the following:

Token.Blank A string comprised of blanks, tabs, and newlines. Although Blank tries to return the longest such string it can, there are situations when the scanner will return multiple consecutive Blanks. Your finite-state machine must accommodate this.
Token.Define The character sequence ::=
Token.Semi The character ;
Token.Or The character |
Token.Comma The character ,
Token.Str A string of characters has been recognized that is not otherwise covered by other tokens. The actual string is accessible as x.strValue() for token x.
Token.Terminal The string terminal
Token.Non The string non
Token.Start The string start
Token.With The string with
Token.Other Anything else (always an error)
The tokens above are conveniently in the range of 0...10; see the file Token.java for the correspondence.
Sample input Sample output
non terminal obj S;
    terminal obj plus, minus, zero, one;
non terminal obj Rest, D;
    terminal obj bogus, not, really, used;
start with S;
S
	::= zero Rest
	|   one Rest
	;

Rest
	::= plus    D
	|   minus   D
	;

D
	::= zero
	|   one
	|   really
	;
                     
Start S
Edge S Rest zero
Edge S Rest one
Edge Rest D plus
Edge Rest D minus
Edge D $FINAL$ zero
Edge D $FINAL$ one
Edge D $FINAL$ really
                      


Last modified 16:23:16 CST 05 February 2008 by Ron K. Cytron