Behold the general structure of a CUP specification:
Section | Example |
Symbol Declarations |
terminal stringtype lparen, rparen, digit; non terminal inttype Number; non terminal stringtype Parens; |
Start Declaration | start with Number; | Productions |
Number ::= Number digit | digit | Parens ; Parens ::= lparen rparen; |
The phrases terminal, non terminal, and start with indicate the declaration of terminal and nonterminal symbols, respectively. The word immediately following each of those is not a symbol to be declared, but is instead the semantic type of the symbol. So, in the example above, stringtype and inttype are object types for the symbols declared on those lines. The distributon of white space is arbitrary, but the above examples were constructed to show good formatting style.
Upon encountering a terminal symbol, your code should record the symbol as terminal in the symbol table by issuing:
symboltable.enterTerminal(symbol)Similarly, nonterminals are recorded by issuing:
symboltable.enterNonTerminal(symbol)
For this assignment, you will modify only the file Fsa.java, and that is the only file you will turn in.
In your submitted file, be sure to document your approach, including how you decided to treat keywords such as non, with, etc.
The file as supplied contains a skeletal finite-state machine (simulator), driven by the tables GOTO[][] and ACTION[][]. Your assignment is to expand and fill in these tables appropriately, and to add code in the switch statement to perform the required translation.
While processing the rules section, you will check each production for right-linear form. That is, there should be only nonterminals on the left side of a rule, and the right side of a rule should be a terminal followed by at most one nonterminal. To determine the type of a symbol previously entered into the symbol table, invoke
symboltable.isTerminal(symbol)which will return true if symbol is terminal, and false otherwise. The method may throw the error SymbolNotFoundError if the symbol was not recorded.
To assist you, the Fsa constructor is passed an Enumeration that is the token stream. There is no end-of-input token: the token stream is exhausted when the Enumeration has no more elements. Each element of the Enumeration is a Token x such that x.type() is one of the following:
Token.Blank | A string comprised of blanks, tabs, and newlines. Although Blank tries to return the longest such string it can, there are situations when the scanner will return multiple consecutive Blanks. Your finite-state machine must accommodate this. |
Token.Define | The character sequence ::= |
Token.Semi | The character ; |
Token.Or | The character | |
Token.Comma | The character , |
Token.Str | A string of characters has been recognized that is not otherwise covered by other tokens. The actual string is accessible as x.strValue() for token x. |
Token.Terminal | The string terminal |
Token.Non | The string non |
Token.Start | The string start |
Token.With | The string with |
Token.Other | Anything else (always an error) |
Sample input | Sample output |
non terminal obj S; terminal obj plus, minus, zero, one; non terminal obj Rest, D; terminal obj bogus, not, really, used; start with S; S ::= zero Rest | one Rest ; Rest ::= plus D | minus D ; D ::= zero | one | really ; | Start S Edge S Rest zero Edge S Rest one Edge Rest D plus Edge Rest D minus Edge D $FINAL$ zero Edge D $FINAL$ one Edge D $FINAL$ really |
Expected output for the lab's inputs can be found in the Solns folder of the download.