This is the homepage of jay, a LALR(1) parser generator: Berkeley yacc © retargeted to C# and Java.
Belarusian translation courtesy of Vicky Rotarova.
Czech translation courtesy of Barbora Lebedová.
German translation courtesy of Philip Egger.
Hungarian translation courtesy of Szabolcs Csintalan.
Indonesian translation courtesy of Jordan Silaen.
Irish translation courtesy of Ava Flynn at Travel-Ticker.com.
Japanese translation courtesy of Jianhua Ma.
Latvian translation courtesy of Nadia Karbowska.
Portuguese translation courtesy of Artur Weber.
Romanian translation courtesy of Translate Team.
Russian translation courtesy of Nikolay Pershikov.
Slovakian translation courtesy of Blahoslav Konopka.
Slovenian translation courtesy of NextRanks.
Tatar translation courtesy of Timur Ganeev.
jay reads a grammar specification from a file and generates an LALR(1) parser for it. A parser consists of a set of parsing tables and a driver routine from a skeleton which is read from standard input. Suitable skeletons exist for Java and C#. Tables and driver are written to standard output.
jay [-ctv] [-b file-prefix] grammar skeleton|<skeleton
java -jar jay.jar [-ctv] [-b file-prefix] grammar skeleton|<skeleton
The following options are available:
-b file-prefix | changes the prefix prepended to the secondary output file names to the string denoted by file-prefix. The default prefix is the character y. |
-c | arranges for C preprocessor #line directives to be incorporated in the output. This is only useful for C#. |
-t | arranges for debugging information to be incorporated in the output. The actual information is controlled by the skeleton files; as distributed it depends on additional runtime packages. For C# this is part of the source download; for Java see jay.yydebug. |
-v | causes a human-readable description of the generated parser to be written to the file file-prefix.output. |
If one of the environment variables TMPDIR, TMP, or TEMP is set, the string from the environment variable will be used as the name of the directory where the temporary files are created.
The input format and the LALR(1) algorithm have not been changed from yacc. One should consult the extensive literature on yacc for details on writing and debugging grammars, error recovery, strategies for actions, etc.
The only differences are the value stack, the embedding of the generated parser in a class, and the interface to the scanner. All of these can be changed by modifying the skeleton files. The remainder of this section is based on the skeleton files distributed with jay.
The %union directive has been removed. jay uses Object (or System.Object in C#) for the value stack. Consequently, the name in the tag notation <name> refers to a class or an interface.
This has implications for the casts that jay generates: neither C# nor Java permits assignment to a cast expression. Therefore, the notation $$ refers to an Object without a cast because $$ is usually assigned to. If $$ is used for other purposes, it will usually have to employ an explicit type $<name>$, which is turned into a cast to name.
Similarly, the notation $n is rarely assigned to. Therefore, jay will generate a cast unless the notation $<>n is used to prevent casting.
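The casting rules above can be illustrated with a hand-written sketch of what an action body amounts to. This is not output copied from jay: the names `yyVal`, `yyVals`, and `yyTop` are assumptions chosen for illustration, and the code actually generated depends on the skeleton in use.

```java
// Hypothetical stand-in for the value-stack handling in a generated parser.
// yyVal, yyVals and yyTop are illustrative names, not taken from jay itself.
class ValueStackSketch {
    // Models the reduction of a rule such as  expr : expr '+' expr
    // where the tag Integer applies to $1 and $3.
    static Object reduceAdd(Object[] yyVals, int yyTop) {
        Object yyVal;  // plays the role of $$: a plain Object, so it can be assigned to

        // $1 and $3 are read, not assigned, so each reference carries a cast.
        yyVal = ((Integer) yyVals[yyTop - 2]) + ((Integer) yyVals[yyTop]);

        // ((Integer) yyVal) = ...;  // would not compile: a cast is not a variable

        // Reading $$ for another purpose corresponds to the explicit
        // form $<Integer>$, i.e. a cast on the right-hand side.
        int sum = (Integer) yyVal;
        return sum;
    }
}
```

The commented-out line shows why $$ has to stay uncast when it is the target of an assignment.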
jay does not emit casts to Object. These casts are usually unnecessary, and this strategy avoids numerous warning messages, but it could cause a surprise in an overloading situation.
jay has no notion of inheritance. This can lead to unwarranted warning messages complaining about questionable assignments. It was felt that these messages are generally useful even if some of them are erroneous.
The tables and skeleton files of jay do not use parametrized types. jay.yydebug is coded without parametrized types; however, the sources contain code with generics in lines which at this point are commented out.
The notation <tag> may contain nested angle brackets and within them the characters [ ] blank ? , in addition to the usual alphanumerics and . $ _. However, references to the value stack $n are cast using the applicable tag and a cast to a parametrized type will draw an unchecked warning in Java.
The parser class could be annotated with @SuppressWarnings("unchecked"); however, while this may be a way of life for Java 5 it is probably unwise.
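As a minimal sketch of the warning in question, the following plain Java code casts an Object to a parametrized type, which javac reports as unchecked (visible with -Xlint:unchecked). The class and method names are hypothetical. One possible way to keep the suppression narrow, assuming the action uses $<>n to receive the uncast Object, is to annotate a single local declaration instead of the whole parser class.

```java
import java.util.List;

// Hypothetical helper code; none of these names come from jay.
class UncheckedCastSketch {
    // The cast to List<String> cannot be verified at run time, so the
    // compiler flags it as unchecked unless the warning is suppressed.
    static int countNames(Object stackValue) {
        @SuppressWarnings("unchecked")  // limited to this one declaration
        List<String> names = (List<String>) stackValue;
        return names.size();
    }

    // Warning-free alternative: cast to the wildcard type and check
    // the elements explicitly.
    static int countNamesChecked(Object stackValue) {
        List<?> items = (List<?>) stackValue;
        for (Object item : items) {
            if (!(item instanceof String)) {
                throw new ClassCastException("not a String: " + item);
            }
        }
        return items.size();
    }
}
```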
The binary or source download includes two skeleton files for Java and one for C#. A skeleton file controls the format of the generated tables and it includes the actual parser algorithm that interprets the tables. The algorithms are the same in all distributed files but skeleton.tables initializes the various tables by reading a resource file at execution time; this avoids a limit which the Java system imposes on the size of the code segment for a class.
To create the resource file, generate the parser using skeleton.tables. From the parser source extract exactly the lines starting with //yy and remove exactly that prefix. The resulting file should be located in the same directory as the class file of the parser and should use the class name of the parser and the suffix .tables.
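The extraction step described above could be automated along the following lines. This is a sketch, not a tool shipped with jay; the class and method names are hypothetical, and only the //yy prefix comes from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building the .tables resource file.
class TableExtractSketch {
    static final String PREFIX = "//yy";

    // Keeps exactly the lines that start with //yy and strips exactly that prefix.
    static List<String> extractTables(List<String> parserSource) {
        List<String> tables = new ArrayList<>();
        for (String line : parserSource) {
            if (line.startsWith(PREFIX)) {
                tables.add(line.substring(PREFIX.length()));
            }
        }
        return tables;
    }
}
```

The input could be read with java.nio.file.Files.readAllLines and the result written with Files.write to a file named after the parser class, for example Parser.tables for a hypothetical class Parser, next to its class file.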
It should not be necessary to change the skeleton files, but just in case, they are extensively commented. The files are line-oriented. A character in the first column determines what happens to a line: # marks a comment and the line is ignored. . marks a line which is copied without the leading period.
t marks a line that is relevant for tracing. Normally it is copied with a leading //t; if the option -t is set the line is copied without the leading t.
Finally, a line with a leading blank contains a command which results in the output of some table information and which can use the rest of the line as a parameter.
actions | emit code from the actions as body of a switch. |
epilog | emit the text following the second %%. |
local | emit the text within %{ %} following the first %%. |
prolog | emit the text within %{ %} prior to the first %%. |
tokens prefix | emit each token value as an initialized identifier with the remainder of the line as a prefix. |
version comment | emit a // comment with the remainder of the line. |
yyCheck prefix, yyDefRed prefix, yyDgoto prefix, yyGindex prefix, yyLen prefix, yyLhs prefix, yyRindex prefix, yySindex prefix, yyTable prefix | emit the body of the relevant table with the remainder of the line as a prefix for each output line. |
yyFinal prefix | emit the value as an initializer with the remainder of the line as a prefix. |
yyNames prefix | emit the table as a list of words with the remainder of the line as a prefix for each output line. |
yyNames-strings | emit the table as a list of string initializers. |
yyRule prefix | emit the table as a list of lines with the remainder of the line as a prefix for each output line. |
yyRule-strings | emit the table as a list of string initializers. |
Each table is prefixed by a comment with dimension information.
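The first-column rules for skeleton lines can be summarized in a short sketch. This is one reading of the description, not code taken from jay: the class and method names are hypothetical, empty lines are assumed to produce no output, and a command line is only reported as a placeholder instead of emitting real table data.

```java
// Hypothetical model of how one skeleton line is handled.
class SkeletonLineSketch {
    // Returns the text copied to the output, or null if nothing is copied.
    static String process(String line, boolean tracing) {
        if (line.isEmpty()) {
            return null;                    // assumption: empty lines are skipped
        }
        switch (line.charAt(0)) {
        case '#':
            return null;                    // comment: ignored
        case '.':
            return line.substring(1);       // copied without the leading period
        case 't':
            // With -t the leading t is dropped; otherwise the line
            // is kept as a comment that starts with //t.
            return tracing ? line.substring(1) : "//" + line;
        case ' ':
            // Command such as " yyLhs prefix"; a real generator would
            // emit table data here, the sketch returns a placeholder.
            return "[command: " + line.trim() + "]";
        default:
            throw new IllegalArgumentException("unknown line type: " + line);
        }
    }
}
```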
The design of a skeleton file has to consider two problems: how to embed the parser in a class and how to interface to the scanner.
The distributed skeleton files expect the user to supply a prolog within %{ %} containing a class header and to supply an epilog following the second %% which closes this class. jay does not know the class name of the parser.
The interface to the scanner yyInput is generated as a member of each parser class; this may or may not be a good choice. There are three methods: advance has no arguments and must return a boolean value indicating that the scanner has successfully extracted another input symbol; token has no arguments and must return the current input symbol as an integer value which the parser expects; value has no arguments and can return an object value to be placed on the state/value stack for the input symbol. Tracing expects token and value to be constant functions between each call to advance.
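A scanner that satisfies the three-method contract might look like the sketch below. The interface is re-declared here under a hypothetical name so that the example is self-contained; the yyInput interface actually generated by the skeleton may differ in details such as declared exceptions. The token constant NUMBER and its value are likewise assumptions.

```java
// Hypothetical stand-in for the generated yyInput interface.
interface ScannerInput {
    boolean advance();  // true if another input symbol was extracted
    int token();        // current input symbol as the parser expects it
    Object value();     // value to be placed on the value stack
}

// Tiny hand-written scanner for unsigned integers and single characters.
class DigitScanner implements ScannerInput {
    static final int NUMBER = 257;  // hypothetical explicit token value

    private final String text;
    private int pos = 0;
    private int token;
    private Object value;

    DigitScanner(String text) {
        this.text = text;
    }

    public boolean advance() {
        while (pos < text.length() && text.charAt(pos) == ' ') {
            pos++;                              // skip blanks
        }
        if (pos >= text.length()) {
            return false;                       // end of input
        }
        char c = text.charAt(pos);
        if (c >= '0' && c <= '9') {
            int start = pos;
            while (pos < text.length()
                    && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
                pos++;
            }
            token = NUMBER;
            value = Integer.valueOf(text.substring(start, pos));
        } else {
            pos++;
            token = c;                          // a single character represents itself
            value = null;
        }
        return true;
    }

    // token and value do not change until the next call to advance.
    public int token() { return token; }
    public Object value() { return value; }
}
```

Keeping the state in fields and updating it only in advance is what makes token and value constant between calls, as tracing expects.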
Explicit token values are generated as constants in the parser class. Single characters represent themselves; however, for those jay believes in the ASCII rather than the Unicode character set. It might have been better to define the constants in the scanner interface, but it is expected that the scanner is implemented as an inner class of the parser. pj supports this view even if the scanner is explicitly constructed using JLex.