The Lire::DlfConverter interface requires two kinds of methods. First, it requires methods which give the framework information about your converter. Second, it requires methods which actually implement the conversion process. This section documents that interface.
The name() method should return the name of your DLF converter. This is the name that is passed to the lr_log2report command. It must be unique among all registered converters and should contain only alphanumeric characters, hyphens, periods and underscores.
We will name our converter common_syslog:
sub name { return "common_syslog"; }
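The naming rule above can be checked mechanically. The following is only a minimal sketch: is_valid_converter_name() is a hypothetical helper that is not part of Lire, and it assumes that "alphanumeric" means ASCII letters and digits. It does not check uniqueness among registered converters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lire: returns 1 when the name uses
# only ASCII letters, digits, hyphens, periods and underscores.
sub is_valid_converter_name {
    my ($name) = @_;
    return ( defined $name && $name =~ /\A[A-Za-z0-9._-]+\z/ ) ? 1 : 0;
}

print is_valid_converter_name("common_syslog") ? "ok\n" : "rejected\n";
print is_valid_converter_name("common syslog") ? "ok\n" : "rejected\n";
```

The first call prints "ok" and the second prints "rejected" because of the embedded space.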
The next two required methods give users more detailed information about your converter. The converter's title() and description() can be used to display information about your converter in the user interface or to generate documentation.
The title() method should simply return a string:
sub title { return "Common Log Format embedded in Syslog DLF Converter"; }
The description() method should return a DocBook fragment describing your converter and the log formats it supports. If you don't know DocBook, just restrict yourself to using para elements to make paragraphs:
sub description {
    return <<EOD;
<para>This DLF Converter extracts web server requests and error
information from a syslog file.
</para>

<para>The requests and errors should be logged under the
<literal>httpd</literal> program name. The errors are mapped to the
<type>syslog</type> schema, the requests are mapped to the
<type>www</type> schema.
</para>

<para>Syslog records from programs other than <literal>httpd</literal>
are ignored.
</para>
EOD
}
Two other metadata methods are used by the framework itself. The first one specifies the DLF schemas your DLF converter converts to:
sub schemas { return ( "www", "syslog" ); }
In our case, we are converting to the syslog and www schemas. As stated in our converter's description, we map the web server's error messages to the syslog schema and the request logs to the www schema. Alternatives would have been to map only the request information to the www schema, or to map all the non-request records to the syslog schema. The rationale behind the current choice (besides this being an example) is that it makes it convenient to process one log file and obtain a report containing both the requests and the errors from our web server. For that use case, it is best to ignore the records that are not related to the web server.
The other method affects how the conversion process will be handled. Lire offers two modes of conversion: the line-oriented one and the file-oriented one. (Both will be described in the next section.) If your log file is line-oriented (each line is one log record), as most log files are, you should use the line-oriented conversion mode:
sub handle_log_lines { return 1; }
The actual conversion process is handled through three methods: init_dlf_converter(), finish_conversion() and either process_log_file() or process_log_line(), depending on the conversion mode (as determined by the return value of handle_log_lines()).
The method init_dlf_converter() will be called once before the log file is processed. It should be used to initialize the state of your converter. Since our DLF converter doesn't need any initialization or configuration, the method is simply empty:
sub init_dlf_converter { my ( $self, $process ) = @_; return; }
The $process parameter which is passed to all the processing methods is an instance of Lire::DlfConverterProcess. This is the object which is driving the conversion process and it defines several methods which you will use in the actual conversion process.
The method finish_conversion() will be called once after the log file has been completely processed. This method is mostly of use to stateful converters, that is, DLF converters which generate DLF records from more than one line. Since this is not our case, we simply leave the method empty:
sub finish_conversion { my ( $self, $process ) = @_; return; }
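The order in which these methods are called can be illustrated with a small stand-alone script. This is a sketch under stated assumptions, not Lire's implementation: My::ToyConverter and run_conversion() are hypothetical names, the process object is replaced by undef, and the driver only reproduces the call sequence described in the text (init_dlf_converter() once, then the mode-specific processing method, then finish_conversion() once).

```perl
use strict;
use warnings;

# Stand-in converter (hypothetical): same method names as in the text,
# but every call is recorded so that the order becomes visible.
package My::ToyConverter;

sub new {
    my ($class) = @_;
    return bless { calls => [] }, $class;
}
sub handle_log_lines { return 1; }
sub init_dlf_converter {
    my ( $self, $process ) = @_;
    push @{ $self->{calls} }, "init";
    return;
}
sub process_log_line {
    my ( $self, $process, $line ) = @_;
    push @{ $self->{calls} }, "line:$line";
    return;
}
sub finish_conversion {
    my ( $self, $process ) = @_;
    push @{ $self->{calls} }, "finish";
    return;
}

package main;

# Hypothetical driver, NOT the framework's code: it only mimics the
# documented call order for both conversion modes.
sub run_conversion {
    my ( $converter, $process, $fh ) = @_;
    $converter->init_dlf_converter($process);
    if ( $converter->handle_log_lines() ) {
        while ( defined( my $line = <$fh> ) ) {
            chomp $line;
            $converter->process_log_line( $process, $line );
        }
    } else {
        $converter->process_log_file( $process, $fh );
    }
    $converter->finish_conversion($process);
    return;
}

my $log = "first\nsecond\n";
open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my $converter = My::ToyConverter->new();
run_conversion( $converter, undef, $fh );
close $fh;
print join( ", ", @{ $converter->{calls} } ), "\n";
```

Running the script prints "init, line:first, line:second, finish".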
Whether you are using the file-oriented or the line-oriented conversion mode, the principles are the same. You extract information from the log file and create DLF records from it. Your DLF converter communicates with the framework by calling methods on the Lire::DlfConverterProcess object which is passed as a parameter to your methods.
Here is the complete code of our conversion method:
use Lire::Apache qw/parse_common/;

sub process_log_line {
    my ( $self, $process, $line ) = @_;

    # Parse the syslog envelope; a parse failure is trapped by eval.
    my $sys_rec = eval { $self->{syslog_parser}->parse( $line ) };
    if ( $@ ) {
        $process->error( $@, $line );
        return;
    } elsif ( $sys_rec->{process} ne 'httpd' ) {
        $process->ignore_log_line( $line, "not an httpd record" );
        return;
    } else {
        my $common_dlf = {};
        eval { parse_common( $sys_rec->{content}, $common_dlf ) };
        if ( $@ ) {
            # Not a request line: treat it as a server error message.
            $sys_rec->{message} = $sys_rec->{content};
            $process->write_dlf( "syslog", $sys_rec );
        } else {
            $process->write_dlf( "www", $common_dlf );
        }
    }
}
The first thing that should be noted is that in the line-oriented conversion mode, the method process_log_line() will be called once for each line in the log file.
Secondly, the actual parsing of the line is done by the parse_common() function and by the parse() method of Lire::Syslog. Both simply use regular expressions to extract the appropriate information from the line and put it in a hash reference. What is important is that both already use the schema's field names as key names.
Finally, the $process object offers four different methods to report different kinds of information, three of which appear in the example:
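To make the "regular expression into a hash reference" idea concrete, here is a toy parser. It is a sketch only: toy_parse_syslog() is a hypothetical function and not Lire::Syslog's parser, it handles only the traditional "Mon DD HH:MM:SS host program[pid]: message" layout, and apart from process and content (the keys used by the example), the key names timestamp, hostname and pid are assumptions made for illustration. Like the parser used in the example, it dies on a line it cannot parse so that the caller can trap the error with eval.

```perl
use strict;
use warnings;

# Toy parser, NOT Lire::Syslog: extracts the fields of one syslog line
# into a hash reference, or dies when the line does not match.
sub toy_parse_syslog {
    my ($line) = @_;
    my ( $timestamp, $hostname, $program, $pid, $content ) = $line =~ m/
        \A (\w{3} \s+ \d{1,2} \s \d{2}:\d{2}:\d{2})   # month, day and time
        \s+ (\S+)                                       # host name
        \s+ ([^\s\[:]+)                                 # program name
        (?: \[ (\d+) \] )?                              # optional pid
        : \s* (.*) \z                                   # rest of the line
    /x
        or die "invalid syslog line\n";
    return {
        timestamp => $timestamp,
        hostname  => $hostname,
        process   => $program,
        pid       => $pid,
        content   => $content,
    };
}

my $rec = toy_parse_syslog('Oct 11 22:14:15 www1 httpd[312]: GET /index.html');
print "$rec->{process}: $rec->{content}\n";
```

For the sample line, the script prints "httpd: GET /index.html".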
The example uses the eval statement to trap errors during the syslog record parsing. If the line cannot be parsed as a valid syslog record, it is an error and it is reported through the error() method. The first parameter is the error message and the second one is the line to which the error is associated. This last parameter is optional.
When the syslog event doesn't come from the httpd process, we ignore the line. Ignored lines are reported to the framework by using the ignore_log_line() method. The first parameter is the line which is ignored. The second, optional parameter gives the reason why the line was ignored.
Finally, DLF records are created by using the write_dlf() method. Its first parameter is the schema to which the DLF record complies. This schema must be one that is listed by your converter's schemas() method. The second parameter is the DLF data contained in a hash reference. The DLF record is created by taking, for each field in the schema, the value stored under the same name in the hash. (In the syslog schema, the field which contains the actual log message is called message; this is why we assign the content value to the message key.) Missing fields, or fields whose value is undef, will contain the special LR_NA missing value marker. Keys in the hash that don't map to a field of the schema are simply ignored.
In our example, we distinguish between the server's error messages (mapped to the syslog schema) and the request information (mapped to the www schema) based on whether parse_common succeeded in parsing the line.
Another possibility, not shown in our example, is to ask that the line be saved for later processing. This is mostly of use to converters that maintain state between lines. In such cases, it is quite common for related lines to be missing from the end of the log file. When that happens, you save the lines and they will automatically be seen by the next run of your converter on the same DLF store. This option is only available in the line-oriented conversion mode.
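The field-mapping rules given above for write_dlf() can be sketched without Lire. Everything in this example is an assumption made for illustration: build_dlf_record() is a hypothetical helper and not the framework's code, the field list is invented and is not the real syslog schema, and the string 'LR_NA' merely stands in for the missing value marker.

```perl
use strict;
use warnings;

# Stand-in for the missing value marker; its real representation is not
# shown in the text, so this string is an assumption.
my $LR_NA = 'LR_NA';

# Hypothetical helper, not Lire's write_dlf(): builds one record by
# looking up each schema field in the hash. Keys that are not schema
# fields are never visited; missing or undef values become the marker.
sub build_dlf_record {
    my ( $fields, $dlf ) = @_;
    return [ map { defined $dlf->{$_} ? $dlf->{$_} : $LR_NA } @$fields ];
}

# Invented field list for illustration only.
my @toy_fields = qw( timestamp hostname process pid message );

my $record = build_dlf_record(
    \@toy_fields,
    {
        timestamp => 'Oct 11 22:14:15',
        process   => 'httpd',
        pid       => undef,               # undef: replaced by the marker
        message   => 'File not found',
        content   => 'File not found',    # not a schema field: ignored
    },
);
print join( "|", @$record ), "\n";
```

The script prints "Oct 11 22:14:15|LR_NA|httpd|LR_NA|File not found": the absent hostname and the undef pid both become the marker, and the content key is dropped.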
The same principles apply when you are using the file-oriented mode of conversion. This mode will usually be used for binary log formats or for formats which aren't line-oriented, like XML.
For demonstration purposes, the following code could be added to transform our line-oriented converter into a file-oriented one:
sub handle_log_lines { return 0; }

sub process_log_file {
    my ( $self, $process, $fh ) = @_;

    my $line;
    while ( defined( $line = <$fh> ) ) {
        chomp $line;
        $self->process_log_line( $process, $line );
    }
}
The difference between the above code and the line-oriented mode is that the framework won't be aware of the number of log lines processed, and your converter might have trouble processing log files which use a different line-ending convention than the host you are running on. The bottom line is that you should use the line-oriented conversion mode when your log format is line-oriented.
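The line-ending problem is easy to reproduce. The sketch below assumes a Unix-like host, where the input record separator is "\n", and reads a log that was written with CRLF line endings from an in-memory filehandle. It is an illustration only and says nothing about how the framework itself normalizes lines.

```perl
use strict;
use warnings;

# A log written with CRLF line endings, read where $/ is "\n".
my $log = "first line\r\nsecond line\r\n";

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my ( @chomped, @normalized );
while ( defined( my $line = <$fh> ) ) {
    ( my $copy = $line ) =~ s/\r?\n\z//;    # strips LF as well as CRLF
    push @normalized, $copy;
    chomp $line;                            # strips only the trailing "\n"
    push @chomped, $line;
}
close $fh;

printf "chomp left a carriage return: %s\n",
    ( $chomped[0] =~ /\r\z/ ? "yes" : "no" );
printf "normalized line: '%s'\n", $normalized[0];
```

With chomp alone, every record keeps a trailing carriage return, which can make an anchored regular expression fail. A file-oriented converter that reads lines itself has to handle this explicitly, for instance with a substitution like the one shown.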