Warning: This page contains preliminary results that are likely to change as development teams review and polish their parsers and common methodology. The current intent of this page is not to provide reliable results but to facilitate production of future high quality results. For example, performance of parsers is likely to improve significantly and new parsers may be added to the set. Any help in that direction is more than welcome!
This page compares Hapy performance with the performance of other parsers.
Our goal is to measure and document parser performance. Performance is a factor when selecting a parser and when planning how to use it. There are many other important factors (e.g., programming language, parser interface, licensing, and support) that are outside the scope of this page. The following performance metrics are measured:
Parsing performance: How much time and RAM it takes to parse a given input. This metric is important for users of an application that contains the parser. Parsing speed often affects overall product throughput, the feasibility of frequent runtime reconfigurations, or the size and complexity of supported configuration files.
Compilation performance: How much time and RAM it takes to compile the code that will parse input, and how large the stripped executable is. Compilation speed is important for developers: during the active development phase, developers change the grammar and parser configuration frequently and, hence, have to recompile frequently. Executable size is especially important for those working in embedded or small-footprint environments. These metrics are unrelated to parsing speed.
Scale: How do the primary metrics above change with input or grammar size?
All of the above metrics may depend on the test environment. For example, the operating system, compiler version, and parser configuration often have drastic effects on some measurements. When possible, we provide results for different environments.
The results in this section are based on the tests and methodology documented later in this document. The results are given separately for each test. You may want to pay more attention to the test(s) and environments that are relevant to your use cases.
In the tables, compilation memory usage is based on the output of the Unix time tool. Executable size is measured after stripping debugging symbols from the binary. Parsing speed and greed figures are averages over all corresponding tests (same parser, same environment, various input sizes). Parsing speed is the total input size divided by the total parsing time (higher speed is better). Parsing greed is the ratio of the memory required to parse the input to the input size (lower greed is better). Neither derived measurement depends much on input size, which makes them good invariants for summaries.
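As a concrete illustration of how the two derived figures are obtained, the short program below computes speed and greed from one hypothetical measurement; the input size, parsing time, and RAM values are made up for illustration and are not benchmark results.

    #include <iostream>

    int main() {
        // Hypothetical raw measurements for one parser/environment pair.
        const double inputKB    = 4096.0;       // total input size, KB
        const double parseSec   = 16.0;         // total parsing time, seconds
        const double parseRamKB = 72.0 * 1024;  // RAM used while parsing, KB

        // Derived figures as defined above.
        const double speed = inputKB / parseSec;   // KB/sec; higher is better
        const double greed = parseRamKB / inputKB; // RAM-to-input ratio; lower is better

        std::cout << "speed: " << speed << " KB/sec, greed: " << greed << "\n";
        return 0; // prints "speed: 256 KB/sec, greed: 18"
    }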
The graphs below compare Hapy and Spirit parser performance when generating a parsing tree for simple XML input of increasing size.
And here is a summary table with compile-time measurements and average parsing performance. Complete results are available elsewhere.
| Parser | Environment | Compile time (sec) | Compile RAM (MB) | Executable size (KB) | Parsing speed (KB/sec) | Parsing greed (RAM/input) |
|---|---|---|---|---|---|---|
| Hapy 0.0.3 | BSD1 | 2 | 10 | 70 | 268 | 18 |
| Spirit 1.6.1 | BSD1 | 32 | 121 | 348 | 50 | 100 |
This section describes the tests used to benchmark parsers. For each test, we define a grammar, a method for generating valid input, and an interpretation task. Parser correctness can be checked using invalid input. Parsers that accept (instead of rejecting with a syntax error) input that does not match the grammar are disqualified.
The Simple XML test requires that the parser creates a parsing tree for an XML input based on a drastically simplified XML grammar. The parser must accept any valid input and must reject any invalid input. This is not a "validating" parser test. Input validity is defined by the grammar.
    grammar       = node*;
    node          = pi | element | text;
    element       = openElement node* closeElement | closedElement;
    pi            = "<?" name (CHARACTER - "?>")* "?>";
    openElement   = "<" name attr* ">";
    closeElement  = "</" name ">";
    closedElement = "<" name attr* "/>";
    text          = (CHARACTER - '<')+;
    attr          = name '=' value;
    name          = ALPHA (ALPHA | DIGIT | '_' | ':')*;
    value         = '"' (CHARACTER - '"')* '"';

- Grammar terminals can be separated by any amount of whitespace.
- Grammar terminals are "text", "name", "value", and all literals.
The grammar for this test recognizes only three kinds of XML nodes: text, elements, and processing instructions. Element attributes and nested elements are recognized. Entities, comments, and CDATA sections are not recognized. Many (most?) XML documents or messages produced by machines are limited to the XML features recognized by this grammar. The complexity of the grammar is comparable to that of simple configuration languages and protocol messages.
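For illustration only (this sample is not part of the generated test input), the following document is valid under the simplified grammar above, while a comment such as <!-- note --> or a CDATA section would be rejected because '<' must be followed by a name or by '?':

    <?xml version="1.0"?>
    <list owner="xmlgen">
        <item id="1">plain text node</item>
        <empty reason="closed"/>
    </list>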
The input for this test is generated by the tests/xmlgen tool. The original generator was developed by the XMark project. We have slightly modified the original tool to fix a bug and to generate output of a given size (these changes were sent back to the xmlgen authors).
The Full XML test requires that the parser creates a parsing tree for an XML input based on a full XML grammar. The parser must accept any valid input and must reject any invalid input. This is not a "validating" parser test. Input validity is defined by the grammar.
The grammar for this test is based on the XML EBNF extracted from the W3C XML 1.0 specification. All XML constructs must be correctly recognized. Many (most?) XML documents produced by humans use a nearly full set of XML "features". The complexity of the grammar is comparable to scripting languages and protocol messages of moderate-to-high complexity.
We are looking for sources of input data for this test. Suggestions are welcome.
The IP Packets test requires that the parser/interpreter looks at each IP packet within a stream of IPv4 and IPv6 packets. The number of packets matching some simple criteria (e.g., invalid checksum) must be returned as a result of the test.
We are working on the details of this test. Suggestions are welcome.
This section outlines our testing methodology. The overall intent behind our rules is to produce performance results that are meaningful to an average user who is either choosing among several parsers or doing capacity planning for a given parser. That average user is expected to write their own parser for an unknown but similar grammar and use case. Consequently, benchmark-specials of any kind and guru-level optimizations or tricks are not allowed.
No benchmark-special compilation options are used to build executables. All test executables are built with default parameters for the parser package. When there is no clear default, compiler options optimizing for speed are preferred to options reducing executable size.
The resources needed to install the parser package itself are not measured. Compilation resources are measured only for building the parser executable from its source file(s). We assume that most users do not modify the library or package often and, hence, do not care how much time it takes to build.
Parsing speed is measured after the whole input is prefetched into RAM to avoid dependency on I/O code implementation. We are interested in pure parsing speed (i.e., how fast the parsing tree is generated when all input is instantly available).
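The following minimal sketch illustrates this measurement pattern; the parseAll() function is a hypothetical stand-in for whichever parser is under test, not a real API.

    #include <chrono>
    #include <iostream>
    #include <iterator>
    #include <string>

    // Hypothetical stand-in for the parser under test; replace with real parsing code.
    static bool parseAll(const std::string &content) { return !content.empty(); }

    int main() {
        // Prefetch the whole input into RAM so that I/O does not affect the timing.
        const std::string content{std::istreambuf_iterator<char>(std::cin),
                                  std::istreambuf_iterator<char>()};

        const auto start = std::chrono::steady_clock::now();
        const bool ok = parseAll(content); // pure parsing; the input is already in RAM
        const auto stop = std::chrono::steady_clock::now();

        const double sec = std::chrono::duration<double>(stop - start).count();
        std::cout << (ok ? "parsed " : "failed to parse ") << content.size()
                  << " bytes in " << sec << " sec\n";
        return ok ? 0 : 1;
    }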
Parsing memory usage is measured after the input is prefetched and before the parser is built (for runtime-generated parsers). It may make sense to report memory usage before prefetching the input, but that initial memory footprint does not depend on the size of the input and would probably be negligible compared to the amount of memory it takes to produce a complete parsing tree. The RAM used to store the original (prefetched) input is not measured.
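In other words, a peak-memory baseline is taken after the input is in RAM (and before the parser is built) and subtracted from the peak reached after parsing. The sketch below illustrates that idea with getrusage(); it assumes ru_maxrss is reported in kilobytes (as on Linux and most BSDs) and is not the actual measurement code used by our harness.

    #include <sys/resource.h>
    #include <iostream>

    // Peak resident set size of this process, in KB
    // (assumes ru_maxrss is reported in KB, as on Linux and most BSDs).
    static long peakRssKB() {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_maxrss;
    }

    int main() {
        // ... prefetch the input into RAM here ...
        const long baselineKB = peakRssKB(); // taken before the parser is built

        // ... build the parser and generate the parsing tree here ...

        const long parsingKB = peakRssKB() - baselineKB; // RAM attributed to parsing
        std::cout << "parsing memory: " << parsingKB << " KB\n";
        return 0;
    }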
The grammar and parsing tree properties must follow typical parser usage style and documentation advice oriented to a novice-to-midlevel user. No benchmark-special or guru-level optimizations and tricks are allowed. We want to report performance that an average user should expect if they implement a similar grammar from scratch.
Parsers that accept some invalid input or reject some valid input (for any reason other than resource exhaustion) are disqualified. Input validity is determined by the published EBNF grammar and not by a parser's implementation of that grammar. We are interested in the performance of correct parsers only.
For tests using the xmlgen XML generator, please use the tests/xmlgen tool that comes with Hapy. XML input must be generated with a scale factor of 1 (which is also the default) and varying "slice" sizes. For example:
    ./xmlgen -f 1 -s 1 | ./parser >> test.log
    ./xmlgen -f 1 -s 2 | ./parser >> test.log
    ./xmlgen -f 1 -s 4 | ./parser >> test.log
    ...
    ./xmlgen -f 1 -s 1024 | ./parser >> test.log
    ./xmlgen -f 1 -s 2048 | ./parser >> test.log
    ./xmlgen -f 1 -s 4096 | ./parser >> test.log
    ./xmlgen -f 1 -s 8192 | ./parser >> test.log
If you want to submit your own results, please look at the tests/SpeedTest.sh script for the test harness we use and modify it to suit your needs (in most cases, only OS portability modifications should be necessary).
If you want to write and test your own parser, please look at tests/SpeedTest.cc and tests/SpiritSpeedTest.cc for examples that can be used as templates. Those examples include code to produce the statistics used to populate the tables and graphs provided here.
When in doubt, please ask before investing a lot of effort into something that may end up incompatible with existing rules or results.
Ideally, all tests would be performed by an independent, unbiased third party, with all parser development teams providing input on methodology and rules. While we wait for such a third party to appear, we want to provide our users with meaningful performance results and comparisons, which implies obtaining and publishing results for parsers other than Hapy. We try to ensure basic fairness despite the obvious conflict of interest:
All tests are designed to mimic typical real-world scenarios and to avoid corner cases that would give unfair advantage to Hapy.
All tests use the same methodology, rules, and harness. No benchmark-special tuning or configuration is allowed.
The test methodology, rules, and results are publicly available for review, and feedback is solicited from other parser development teams.
Submissions and comments from other parser development teams or users are welcome.
If you have suggestions on how to improve quality and fairness of these tests, please let us know.
Besides general feedback, we do solicit test result submissions from users and other parser development teams. When running tests, please follow our testing methodology closely so that results are comparable. When submitting results, please disclose all necessary details.
Submitters are responsible for the accuracy and quality of the results they submit. Accepted submissions are acknowledged in, and become a part of, the Hapy documentation.