jay (Language Processing)

Package jay

This is the homepage of jay, a LALR(1) parser generator: Berkeley yacc © retargeted to C# and Java.

Package jay Description

Usage
Input Format
Generics
Skeleton Files
Class Management
Downloads

Belarusian translation courtesy of Vicky Rotarova.

Czech translation courtesy of Barbora Lebedová.

German translation courtesy of Philip Egger.

Hungarian translation courtesy of Szabolcs Csintalan.

Indonesian translation courtesy of Jordan Silaen.

Irish translation courtesy of Ava Flynn at Travel-Ticker.com.

Japanese translation courtesy of Jianhua Ma.

Latvian translation courtesy of Nadia Karbowska.

Portuguese translation courtesy of Artur Weber.

Romanian translation courtesy of Translate Team.

Russian translation courtesy of Nikolay Pershikov.

Slovakian translation courtesy of Blahoslav Konopka.

Slovenian translation courtesy of NextRanks.

Tatar translation courtesy of Timur Ganeev.

Usage

jay reads a grammar specification from a file and generates an LALR(1) parser for it. A parser consists of a set of parsing tables and a driver routine taken from a skeleton, which is either named on the command line or read from standard input. Suitable skeletons exist for Java and C#. Tables and driver are written to standard output.

  jay [-ctv] [-b file-prefix] grammar skeleton|<skeleton
  java -jar jay.jar [-ctv] [-b file-prefix] grammar skeleton|<skeleton

The following options are available:

`-b file-prefix`	changes the prefix prepended to the secondary output file names to the string denoted by `file-prefix`. The default prefix is the character `y`.
`-c`	arranges for C preprocessor `#line` directives to be incorporated in the output. This is only useful for C#.
`-t`	arranges for debugging information to be incorporated in the output. The actual information is controlled by the skeleton files; as distributed it depends on additional runtime packages. For C# this is part of the source download; for Java see `jay.yydebug`.
`-v`	causes a human-readable description of the generated parser to be written to the file `file-prefix.output`.

If one of the environment variables TMPDIR, TMP, or TEMP is set, the string from the environment variable will be used as the name of the directory where the temporary files are created.

Input Format

The input format and the LALR(1) algorithm have not been changed from yacc. One should consult the extensive literature on yacc for details on writing and debugging grammars, error recovery, strategies for actions, etc.

The only differences are the value stack, the embedding of the generated parser in a class, and the interface to the scanner. All of these can be changed by modifying the skeleton files. The remainder of this section is based on the skeleton files distributed with jay.

The %union directive has been removed. jay uses Object (or System.Object in C#) for the value stack. Consequently, the name in the tag notation <name> refers to a class or an interface.

This has implications for the casts that jay generates: neither C# nor Java permits assignment to a cast expression. Therefore, the notation $$ refers to an Object without a cast, because $$ is usually assigned to. If $$ is used for other purposes, it will usually need an explicit type $<name>$, which is turned into a cast to name.

Similarly, the notation $n is rarely assigned to. Therefore, jay will generate a cast unless the notation $<>n is used to prevent casting.
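
The following Java sketch illustrates the two rules above by imitating what an action might look like after substitution. The names yyVal, yyVals and yyTop, the tag <Integer>, and the exact shape of the expressions are assumptions made for this illustration; the skeleton files and the generated parser define the real ones.

```java
public class CastSketch {
    // Hypothetical stand-ins for the value stack and the result slot.
    static Object[] yyVals = new Object[8];
    static int yyTop;
    static Object yyVal;

    // Models an action such as:  expr : expr '+' expr { $$ = $1 + $3; }
    // where expr is assumed to carry the tag <Integer>.
    static void reduceAdd() {
        // $1 and $3 are only read, so each becomes a cast of a stack slot.
        // $$ is assigned to, so it stays an uncast Object;
        // writing "(Integer) yyVal = ..." would not compile in Java.
        yyVal = ((Integer) yyVals[-2 + yyTop]) + ((Integer) yyVals[0 + yyTop]);
    }

    // Models reading the result later in the same action:
    // $<Integer>$ would supply the cast that plain $$ lacks.
    static int readResult() {
        return (Integer) yyVal;
    }

    public static void main(String[] args) {
        yyVals[0] = 2;      // $1
        yyVals[1] = "+";    // $2
        yyVals[2] = 40;     // $3
        yyTop = 2;
        reduceAdd();
        System.out.println(readResult());
    }
}
```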

jay does not emit casts to Object. These casts are usually unnecessary, and omitting them avoids numerous warning messages, but it could cause a surprise in an overloading situation.

jay has no notion of inheritance. This can lead to unwarranted warning messages complaining about questionable assignments. It was felt that these messages are generally useful even if some of them are erroneous.

Generics

The tables and skeleton files of jay do not use parametrized types. jay.yydebug is coded without parametrized types; however, the sources contain code with generics in lines which at this point are commented out.

The notation <tag> may contain nested angle brackets and within them the characters [ ] blank ? , in addition to the usual alphanumerics and . $ _. However, references to the value stack $n are cast using the applicable tag and a cast to a parametrized type will draw an unchecked warning in Java.

The parser class could be annotated with @SuppressWarnings("unchecked"); however, while this may be a way of life for Java 5 it is probably unwise.
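
As a small illustration of where the warning comes from, the sketch below casts an Object to a parametrized type, as a reference to $n with a tag such as <List<String>> would. The helper asStringList is hypothetical and not something jay generates; it only shows that the annotation can be confined to a single method instead of the whole parser class.

```java
import java.util.ArrayList;
import java.util.List;

public class UncheckedSketch {
    // The cast below is the kind that draws the unchecked warning.
    // The annotation is limited to this helper (an assumption of this sketch),
    // so the rest of the class is still checked normally.
    @SuppressWarnings("unchecked")
    static List<String> asStringList(Object slot) {
        return (List<String>) slot;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("expr");
        Object slot = names;                       // what a value stack of Object stores
        System.out.println(asStringList(slot).get(0));
    }
}
```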

Skeleton Files

The binary or source download includes two skeleton files for Java and one for C#. A skeleton file controls the format of the generated tables and it includes the actual parser algorithm that interprets the tables. The algorithms are the same in all distributed files but skeleton.tables initializes the various tables by reading a resource file at execution time; this avoids a limit which the Java system imposes on the size of the code segment for a class.

To create the resource file, generate the parser using skeleton.tables. From the parser source extract exactly the lines starting with //yy and remove exactly that prefix. The resulting file should be located in the same directory as the class file of the parser and should use the class name of the parser and the suffix .tables.
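
The extraction step can be done with any text tool. As a minimal sketch in Java, assuming the parser source is UTF-8 text and using hypothetical file names on the command line:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ExtractTables {
    private static final String PREFIX = "//yy";

    // Keeps exactly the lines that start with //yy and strips exactly that prefix.
    static List<String> extract(List<String> sourceLines) {
        List<String> out = new ArrayList<>();
        for (String line : sourceLines) {
            if (line.startsWith(PREFIX)) {
                out.add(line.substring(PREFIX.length()));
            }
        }
        return out;
    }

    // Hypothetical usage: java ExtractTables Parser.java Parser.tables
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ExtractTables parser-source tables-file");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // UTF-8 is an assumption of this sketch, not a requirement stated by jay.
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, extract(lines), StandardCharsets.UTF_8);
    }
}
```

The output file still has to be placed next to the parser's class file and named after the parser class with the suffix .tables.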

It should not be necessary to change the skeleton files, but just in case, they are extensively commented. The files are line-oriented. A character in the first column determines what happens to a line:

`#`	marks a comment; the line is ignored.
`.`	marks a line which is copied without the leading period.
`t`	marks a line that is relevant for tracing. Normally it is copied with a leading `//t`; if the option `-t` is set the line is copied without the leading `t`.
blank	marks a line containing a command which results in the output of some table information and which can use the rest of the line as a parameter:

`actions`	emit code from the actions as body of a `switch`.
`epilog`	emit the text following the second `%%`.
`local`	emit the text within `%{ %}` following the first `%%`.
`prolog`	emit the text within `%{ %}` prior to the first `%%`.
`tokens prefix`	emit each token value as an initialized identifier with the remainder of the line as a prefix.
`version comment`	emit a `//` comment with the remainder of the line.
`yyCheck prefix`, `yyDefRed prefix`, `yyDgoto prefix`, `yyGindex prefix`, `yyLen prefix`, `yyLhs prefix`, `yyRindex prefix`, `yySindex prefix`, `yyTable prefix`	emit the body of the relevant table with the remainder of the line as a prefix for each output line.
`yyFinal prefix`	emit the value as an initializer with the remainder of the line as a prefix.
`yyNames prefix`	emit the table as a list of words with the remainder of the line as a prefix for each output line.
`yyNames-strings`	emit the table as a list of string initializers.
`yyRule prefix`	emit the table as a list of lines with the remainder of the line as a prefix for each output line.
`yyRule-strings`	emit the table as a list of string initializers.

Each table is prefixed by a comment with dimension information.
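
To make the first-column rules concrete, here is a minimal Java sketch of how a single skeleton line could be classified. It is not jay's implementation; the method names, the sample lines, and the handling of empty lines are assumptions, and the commands themselves are only recognized, not executed.

```java
public class SkeletonLineSketch {
    // Returns the text copied to the output for one skeleton line, or null
    // when the line itself is not copied (comments and commands).
    static String copyText(String line, boolean traceOption) {
        if (line.isEmpty()) {
            return null;                    // assumption: the text does not cover empty lines
        }
        String rest = line.substring(1);
        switch (line.charAt(0)) {
            case '.': return rest;                               // copy without the period
            case 't': return traceOption ? rest : "//" + line;   // "//t..." unless -t is set
            default:  return null;                               // '#', blank, or unspecified
        }
    }

    // Returns the command on a line with a leading blank, or null otherwise.
    static String command(String line) {
        return line.startsWith(" ") ? line.substring(1).trim() : null;
    }

    public static void main(String[] args) {
        String[] skeleton = {               // invented sample lines
            "# a comment",
            ".  protected int yyMax;",
            "t  System.out.println(\"shift\");",
            " tokens public static final int"
        };
        for (String line : skeleton) {
            String cmd = command(line);
            if (cmd != null) {
                System.out.println("[command] " + cmd);
            } else {
                String text = copyText(line, false);
                if (text != null) System.out.println(text);
            }
        }
    }
}
```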

Class Management

The design of a skeleton file has to consider two problems: how to embed the parser in a class and how to interface to the scanner.

The distributed skeleton files expect the user to supply a prolog within %{ %} containing a class header and to supply an epilog following the second %% which closes this class. jay does not know the class name of the parser.

The interface to the scanner, `yyInput`, is generated as a member of each parser class; this may or may not be a good choice. There are three methods:

`advance`	has no arguments and must return a boolean value indicating that the scanner has successfully extracted another input symbol.
`token`	has no arguments and must return the current input symbol as an integer value which the parser expects.
`value`	has no arguments and can return an object value to be placed on the state/value stack for the input symbol.

Tracing expects `token` and `value` to be constant functions between each call to `advance`.
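
The three methods can be sketched in Java as follows. The interface is declared here only to keep the example self-contained (the distributed skeletons generate their own inside the parser class), and the toy scanner CharScanner and the token value NUMBER are hypothetical.

```java
public class ScannerSketch {
    // Mirrors the three methods described above.
    interface yyInput {
        boolean advance();
        int token();
        Object value();
    }

    static final int NUMBER = 257;   // hypothetical explicit token value

    // A toy scanner over a fixed string: each decimal digit becomes a NUMBER
    // token carrying an Integer value; any other non-blank character is
    // returned as its own character code without a value.
    static class CharScanner implements yyInput {
        private final String text;
        private int pos = 0;
        private int token;
        private Object value;

        CharScanner(String text) { this.text = text; }

        public boolean advance() {
            while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            if (pos >= text.length()) return false;   // no further input symbol
            char c = text.charAt(pos++);
            if (c >= '0' && c <= '9') {
                token = NUMBER;
                value = Integer.valueOf(c - '0');
            } else {
                token = c;
                value = null;
            }
            return true;
        }

        // Both results stay constant until the next call to advance().
        public int token() { return token; }
        public Object value() { return value; }
    }

    public static void main(String[] args) {
        yyInput in = new CharScanner("1 + 2");
        while (in.advance()) {
            System.out.println(in.token() + " " + in.value());
        }
    }
}
```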

Explicit token values are generated as constants in the parser class. Single characters represent themselves; however, for those jay believes in the ASCII rather than the Unicode character set. It might have been better to define the constants in the scanner interface, but it is expected that the scanner is implemented as an inner class of the parser. pj supports this view even if the scanner is explicitly constructed using JLex.

Downloads

Version: 1.1.1, June 2006.
Author: Axel T. Schreiner.