ANTLR

jGuru

ANTLR 2.7.1
Release Notes

October 1, 2000

The ANTLR 2.7.1 release is a bug fix release, brought to you by those hip cats at jGuru.com.   One of the bug fixes, however, allows UNICODE characters to be recognized for the first time. :)

Enhancements

ANTLR 2.7.1 has a few enhancements:

  • ANTLR now allows UNICODE characters because Terence made case-statement expressions more efficient ;)  See the unicode example in the distribution and the brief blurb in the documentation.
  • Massively improved C++ code generator (see below).
  • Added automatic column setting support.  See updated doc and new examples/java/columns directory.
  • Ter added throws to tree and regular parsers.
  • Added an antlr/extras directory, currently containing only antlr-emacs.el by Christoph.Wedler@sap-ag.de.  Thanks, Christoph!

C++ Code Generation

Pete Wells and Ric Klaren have pretty much gutted the C++ code generator to use templates and so on. Here are a few notes (with lib/cpp/Changelog having more goodies). Ric has totally worked his ass off to make the C++ what it is now! :)

Enhancements to C++ code generator for:

  • #line generation for easier debugging of action code. Turn on/off
    with option genHashLines (grammar option).
  • Cleaner generated code, by providing options to specify namespace
    prefixes. Grammar options namespaceAntlr and namespaceStd can
    be set to "antlr::" and "std::" or to blank if your compiler
    doesn't support namespaces.
  • Generate comments to explain what the bitsets represent.
  • Fix bug with -traceTreeParser code.
  • Avoid warnings about unused variable _saveIndex.
  • Remove final, illegal comma in token types enum.

Enhancements to C++ support library for:

  • Performance enhancements. Thanks to several people for
    suggestions/patches here. Improvements to memory management for
    building strings, and buffering of tokens.
  • Support for Metrowerks Codewarrior, and Sun CC 5.0.
  • Fix problem with multi-threaded lexers using static variable.
  • Slight tidy up (more planned).

Additionally, there have been enhancements made to the C++ side to mirror the Java side changes.

Ric Klaren (2.7.1a3 C++ changes) says:

  • action.g: allow ':' in ID rule so C++ namespace qualifiers work.
  • CppCodeGenerator: the '::' fix for namespaceXXX options, as requested by Michael Schmitt.
  • Several cleanups in the Exception classes (basically a hoisting of code) and one or two new constructors with more line/column parameters.
  • Default value for column in LexerInputState set to 1, as suggested by someone on the list (name I would have to look up).
  • A makefile for the C++ lib directory. Not yet the autoconf stuff posted by someone (whose name I would also have to look up); it would imply a bigger workover of the lib/cpp directory, which is harder to do with diffs.
  • Some changes I made after enabling the effective C++ warnings on g++ (minor drivel basically, in most places not really needed).
  • Several virtuals added to methods, also based on a suggestion by Ernest Pasour. It makes the error messages from the thrown exceptions a lot better.

Bug Fixes

In no particular order, here are the improvements/fixes made to 2.7.0 to arrive at 2.7.1 (via 2.7.1a1..a4):

  • columns started at 0 for line 1.  fixed.
  • Bob McWhirter added -o fix so that antlr looks for import vocab stuff in the -o directory if not found in $CWD (current working directory).
  • Added optimization so that large unicode ranges don't result in giant switch case expressions. For example, added charVocabulary='\u0003'..'\uffff' to java.g. Took antlr 24s to generate 51k lexer file vs 9sec without. New 2.7.1 did it with big vocab in 14 sec. Oh, and the interesting thing is that with the big vocab and new optimization, it's actually smaller than with vocab set to ASCII. :)
  • added a build script.
  • Robert Colquhoun rjc@trump.net.au gave me a patch to pull stuff out of Tool.java that was causing it to be required even at runtime.
  • Jerry James (james@eecs.ukans.edu) gave me a patch to make the labels for heterogeneous tree nodes match the specific AST type rather than plain AST.
  • ANTLR didn't like curlies in quotes (preproc.g was hosed).   It now parses:

class A extends Parser;

tokens {
// hi |}
/*
fds
*}*/
TOK_LBRACE="{";
TOK_RBRACE="}";
}

a : "{" B "}";

  • Fixed C++ code generator to allow ~(Z|G)
  • Parser.getInputState called setInputState.
  • ANTLR now allows comments between the header, options, or tokens keyword and the '{' that follows it. Examples:

options //fdkjfds
{
k = 1;
}

tokens //testing
{
A = "a";
}

  • Made fields of CommonToken protected (open to subclasses) and added col. Added column tracking support: tabs are counted as 1 unless you override tab(), which is called from consume() and bumps the column by one by default. Overhead is minimal: tab() is only called on tabs, but there is now an extra increment for every consume() and an extra int in CommonToken.

/**
 * Advance the current column number by an appropriate amount. If you do
 * not override this to specify how much to jump for a tab, then tabs are
 * counted as one char. This method is called from consume().
 */
public void tab() {
  // update inputState.column as function of
  // inputState.column and tab stops.
  // For example, if tab stops are columns 1
  // and 5 etc... and column is 3, then add 2
  // to column.
  inputState.column++;
}
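
The tab-stop arithmetic described in the comment above can be sketched as a small standalone helper. This is only an illustration, not part of the ANTLR runtime: the class TabStops, the method nextTabStop, and the tabSize parameter are hypothetical names, and the sketch assumes 1-based columns with evenly spaced tab stops (1, 5, 9, ... for a tab size of 4).

```java
// Hypothetical helper, not part of ANTLR: computes where a tab lands.
class TabStops {
    /**
     * Returns the column of the next tab stop strictly after the given
     * 1-based column, assuming stops at 1, 1 + tabSize, 1 + 2 * tabSize, ...
     */
    static int nextTabStop(int column, int tabSize) {
        // (column - 1) / tabSize is the zero-based index of the tab stop
        // at or before the current column; the next stop is one further.
        return ((column - 1) / tabSize + 1) * tabSize + 1;
    }

    public static void main(String[] args) {
        // Column 3 with stops at 1 and 5: the tab adds 2 and lands on 5.
        System.out.println(nextTabStop(3, 4));
    }
}
```

An overriding tab() in a lexer subclass could then assign the result of such a helper to the column instead of incrementing it by one; how the column is stored and updated depends on the runtime version you are using.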

  • added CharScanner.setColumn
  • warnings were going to stdout, make go to stderr.
  • added check for unterminated rules. Labels in column 1 result in a warning.
  • JavaCharFormatter.escapeChar wasn't always providing exactly 4 digits for \u chars.
  • Fixed that nasty follow cycle grammar analysis bug Tom Moog and others found.
  • C++: CharScanner.cpp toLower, changed arg from char to int.
  • added column support to C++ output
  • Sather fixes put in, brought up to snuff with Java/C++.
  • ANTLR continued on after discovering a duplicate grammar, which caused a later exception.
  • Bug fix: $setType( w ); didn't work because of the leading space.
  • For the java.tree.g grammar: the NEW operator didn't allow an optional (objBlock)?
  • HTML: Added lots of tweaks to html.g, Made blockquote handle nested content.  Fixed bug in COMMENT_DATA that wouldn't let '-' appear in comment. Made COMMENT scarf WS after comment
  • Added to runtime jars (bigger but too lazy to weed out unnecessary var refs that force inclusion):

antlr/DefineGrammarSymbols.class
antlr/ANTLRGrammarParseBehavior.class
antlr/MakeGrammar.class
antlr/ANTLRParser.class
antlr/ANTLRTokenTypes
antlr/LLkGrammarAnalyzer
antlr/GrammarAnalyzer

  • Added constructors.

public CommonASTWithHiddenTokens() {
  super();
}

public CommonASTWithHiddenTokens(Token tok) {
  super(tok);
}
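
To illustrate the JavaCharFormatter.escapeChar fix listed above: a Java Unicode escape must have exactly four hex digits, so small code points need zero padding. The following is a minimal standalone sketch of that padding, not the actual ANTLR code; the class UnicodeEscape and the method escape are hypothetical names.

```java
// Hypothetical helper, not the ANTLR JavaCharFormatter implementation.
class UnicodeEscape {
    /**
     * Formats a char as a Java Unicode escape: a backslash, the letter u,
     * and exactly four upper-case hex digits, zero-padded on the left.
     */
    static String escape(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        StringBuilder sb = new StringBuilder("\\u");
        // A char is at most 16 bits, so hex has between 1 and 4 digits.
        for (int i = hex.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        System.out.println(escape((char) 3));
        System.out.println(escape('A'));
    }
}
```

Without the padding loop, a character such as code point 3 would be written with a single digit, which a Java compiler rejects as an illegal Unicode escape.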

ANTLR Installation

ANTLR comes as a single zip or compressed tar file. Unzipping the file you receive will produce a directory called antlr-2.7.1 with subdirectories antlr, doc, examples, cpp, and examples.cpp. You need to place the antlr-2.7.1 directory in your CLASSPATH environment variable. For example, if you placed antlr-2.7.1 in directory /tools, you need to append

/tools/antlr-2.7.1

to your CLASSPATH, or

\tools\antlr-2.7.1

if you work on an NT or Win95 box.

References to antlr.* will map to /tools/antlr-2.7.1/antlr/*.class.

You must have at least JDK 1.1 installed properly on your machine.  The ASTFrame AST viewer uses Swing 1.1.

JAR FILE

Try using the runtime library antlr.jar file. Place it in your CLASSPATH instead of the antlr-2.7.1 directory. The jar includes all parse-time files needed (if it is missing a file, email parrt@jguru.com). You cannot run the antlr tool itself with the jar, but your parsers should run with just this jar file. It's pretty small, around 75k uncompressed.

RUNNING ANTLR

ANTLR is a command line tool (although many development environments let you run ANTLR on grammar files from within the environment). The main method within antlr.Tool is the ANTLR entry point.

java antlr.Tool file.g

The -diagnostic command-line option generates a text file for each output parser class that describes the lookahead sets. Note that there are a number of options that you can specify at the grammar class and rule level.

Options -trace, -traceParser, -traceTreeParser may be used to track the lexer, parser, and tree parser invocations.

Try the new -html option to generate HTML output of your grammar(s); this is only partially done.

If you have trouble running ANTLR, ensure that you have Java installed correctly and then ensure that you have the appropriate CLASSPATH set.
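
One quick way to check the CLASSPATH is to ask the JVM whether it can find antlr.Tool. The sketch below is only a diagnostic aid under stated assumptions: the class ClasspathCheck and the method isOnClasspath are hypothetical names, and the code is written for a modern JDK (it uses multi-catch) rather than JDK 1.1.

```java
// Hypothetical diagnostic, not shipped with ANTLR.
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the effective class path so a wrong entry is easy to spot.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println("antlr.Tool found: " + isOnClasspath("antlr.Tool"));
    }
}
```

If this reports that antlr.Tool is not found, the antlr-2.7.1 directory is most likely missing from the CLASSPATH or misspelled there.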

Version: $Id: //depot/code/org.antlr/release/antlr-2.7.1/doc/antlr271release.html#2 $