OpenToken Package Readme

Version 3.0b


The OpenToken package is a facility for performing token analysis and parsing within the Ada language. It is designed to provide all the functionality of a traditional lexical analyzer/parser generator, such as lex/yacc. But due to the magic of inheritance and runtime polymorphism it is implemented entirely in Ada as withed-in code. No precompilation step is required, and no messy tool-generated source code is created.

Additionally, the technique of using classes of recognizers promises to make most token specifications as simple as making an easy-to-read procedure call. The most error-prone part of generating analyzers, the token pattern matching, has been taken from the typical user's hands and placed into reusable classes. Over time I hope to see the addition of enough reusable recognizer classes that very few users will ever need to write a custom one. Parse tokens themselves also use this technique, so they ought to be just as reusable in principle, although there currently aren't a lot of predefined parse tokens included in OpenToken.

Ada's type safety features should also make misbehaving analyzers and parsers easier to debug. All this will hopefully add up to token analyzers and parsers that are much simpler and faster to create, easier to get working properly, and easier to understand.

History

Version 3.0b

This version contains another code reorganization to go with another new parsing facility. This time it is recursive descent parsing. The new method has the following advantages over table-driven parsers:

The disadvantages are:

Given the above balance, I do intend to make this the standard supported parsing facility for future versions of OpenToken. The "b" designation is there to indicate that some things might not be in quite their permanent form yet, and that there isn't yet the full set of reusable tokens to support it that I would like to see in a release. I'm hoping for feedback, both in the form of criticisms/suggestions and in the form of reusable tokens, to help finalize this facility.

A general list of the changes is below:

Version 2.0

This is the first version to include parsing capability. The existing packages underwent a major reorganization to accommodate the new functionality. As some of the restructuring that was done is incompatible with old code, the major revision has been bumped up to 2. A partial list of changes is below:

Version 1.3.6

This version fixes a rare bug in the Ada-style based numeric recognizers. The SLOC counter can now successfully count all the source files in Gnat's adainclude directory.

Version 1.3.5

This version adds a simple Ada SLOC counting program to the examples. A bug in the Real token recognizer that caused constraint errors has been fixed. Bugs that caused constraint errors in the Ada-style based integer and real recognizers on long non-based numbers have also been fixed.

Version 1.3

This version adds the default token capability to the Analyzer package. This gives the analyzer a more flexible (if somewhat inefficient) means of error handling. The default token can be used as an error token, or it can be made into a non-reportable token to ignore unknown elements entirely.

Identifier tokens were generalized a bit to allow user-defined character sets for the first and subsequent characters. This not only gives them the ability to handle syntaxes that don't exactly match Ada's, but also allows one to define identifiers for languages that aren't Latin-1 based. Also, the ability to turn off non-repeatable underscores was added.

Integer and Real tokens had an option added to support signed literals. This option is on by default (which causes a minor backward incompatibility). Syntaxes that have addition or subtraction operators will need to turn this option off.

A test to verify proper handling of default parameters was added to the Test directory. A makefile was also added to the same directory to facilitate automatic compiling and running of the tests. This makefile will not work in a non-Gnat/NT environment without some modification.

New recognizers were added for enclosed comments (e.g. C's /* */ comments) and single-character escape sequences. Also, a "null" recognizer was added for use as a default token.
 

Version 1.2.1

This version adds the CSV field token recognizer that was inadvertently left out of 1.2. This recognizer was designed to match fields in comma-separated value (CSV) files, which is a somewhat standard file format for databases and spreadsheets. Also, the extraneous CVS directories in the zip version of the distribution were removed.

Version 1.2

The long-awaited string recognizer has been added. It is capable of recognizing both C and Ada-style strings. In addition, there are a great many submissions by Christoph Grein in this release. He contributed mostly complete lexical analyzers for both Java and Ada, along with all the extra token recognizers he needed to accomplish this feat. He didn't need as many extra recognizers as I would have thought. But even so, slightly fewer than half of the recognizers in this release were contributed by Chris (with a broken arm, no less!).

Version 1.1

The main code change in this version is a default text feeder function that has been added to the analyzer. It reads its input from Ada.Text_IO.Current_Input, so you can change the file to whatever you want fairly easily. The capability to create and use your own feeder function still exists, but it should not be necessary in most cases. If you already have code that does this, it should still compile and work properly.

The other addition is the first version of the OpenToken user's guide. All it contains right now is a user manual walking through the steps needed to make a simple token analyzer. Feedback and/or ideas on this are welcome.

Version 1.0

This is the very first publicly released version. This package is based on work I did while working on the JPATS trainer for FlightSafety International. The germ of this idea came while I was trying to port a fairly ambitious but fatally buggy Ada 83 token recognition package written for a previous simulator. Once I was done, I was rather surprised at the flexibility of the final product. Seeing the possible benefit to the community, and to the company through user-submitted enhancement and debugging, I suggested that this code be released as Open Source. They were open-minded enough to agree. Bravo!
 

Future

As it stands, I am developing and maintaining this package as part of my master's thesis. Thus you can count on a certain amount of progress in the next few months.

You may notice that most of the stuff I had marked for last release has been delayed or thrown out. So of course plans do change. :-) But with that caveat...

Things on my plate for the next release:

Things you can help with:

Again, I hope you find this package useful for your needs.

T.E.D.  - dennison@telepath.com