Configuration - RRDs
[Note: there is a list of supplied rrds with descriptions.]
These files are the most complicated. Here's an example, again
taken from the if- rrd supplied with remstats.
collector unix-status
step 300
keys xyzzy
data in=interface_packets_in:* COUNTER:600:0:U
data ierr=interface_errors_in:* COUNTER:600:0:U
data out=interface_packets_out:* COUNTER:600:0:U
data oerr=interface_errors_out:* COUNTER:600:0:U
data coll=interface_collisions:* COUNTER:600:0:U
alert in < 100
alert out < 100
alert in nodata WARN
archives day-avg week-avg month-avg 3month-avg year-avg
times day yesterday week month 3month year
graph if-* desc='Interface data for ##RRD##'
    --title 'Interface ##RRD## ##GRAPHTIME##'
    --lower-limit 0
    --vertical-label 'packets'
    DEF:in=##DB##:in:AVERAGE
    DEF:out=##DB##:out:AVERAGE
    DEF:ierr=##DB##:ierr:AVERAGE
    DEF:oerr=##DB##:oerr:AVERAGE
    'LINE1:in###COLOR1##:Input Packets'
    'LINE1:out###COLOR2##:Output Packets'
    'LINE1:ierr###COLOR3##:Input Errors'
    'LINE1:oerr###COLOR4##:Output Errors'
This example shows most things that can be done, except multiple graphs on
the same rrd, which is as simple as adding another graph line and its
definition.
First, the rrd name is special in this case. Any rrd definition file
whose name ends in a '-' is assumed to be for a wildcard rrd, in this
case if-*. This avoids problems with file-systems which are overly
fussy about which characters can be in file-names.
This rrd definition will match any rrd beginning with 'if-' specified
in a host config-file. Wildcard rrds are necessary when a given host
may have more than one of whatever the rrd is referring to, in this
case network interfaces. The network interface name will replace the
'*' in the rrd line in the host config-file. It will also be available
in the ##WILDPART## magic cookie.
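The wildcard matching described above can be sketched in a few lines of
Python. This is only an illustration of the rule as stated here, not the
remstats implementation; the function name match_wildcard is made up for
the example.

```python
from typing import Optional


def match_wildcard(definition_name: str, rrd_name: str) -> Optional[str]:
    """Return the wildcard part of rrd_name, or None if it does not match.

    definition_name is the name of a wildcard definition file, e.g. "if-"
    (which stands for "if-*"). Hypothetical helper, for illustration only.
    """
    if not definition_name.endswith("-"):
        # Not a wildcard definition; this sketch only covers wildcards.
        return None
    if rrd_name.startswith(definition_name) and len(rrd_name) > len(definition_name):
        # Whatever follows the prefix is what ##WILDPART## would hold.
        return rrd_name[len(definition_name):]
    return None


print(match_wildcard("if-", "if-eth0"))
```

With a host config-file line naming the rrd if-eth0, the wildcard part
here would be eth0.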
The collector XXX line tells which collector supplies the data
for this RRD. collector unix-status means that this RRD gets its data
from the unix-status-collector. [This used to be called source.]
The step line sets the step value for the rrd. This is the expected
interval, in seconds, between data updates. [See the manpage for
rrdcreate.] N.B. Setting this is required, but changing it for some RRDs
won't change how often the collectors run. You'll have to change the
crontab entries which run run-remstats2 to specify the shortest interval.
However, in the run-stages config-files you can specify how often certain
lines should be run.
The keys line attaches an arbitrary attribute to this rrd. Various
programs can use it to select a subset of RRDs to operate on; e.g.
page-writer can be told to use it to select only some RRDs.
The data lines define the various DS elements for this RRD. [See the
manpage for rrdcreate.] The first part is the DS name, with an extension.
The collectors produce long names and may have instance-names added to the
variable name, in this case to tell which interface this data is for. So
the first part looks like dsname=variable:instance. The dsname is used
for the RRD DS name and the variable:instance part is used to tell updater
which collector information applies to this DS. Most of the rest of the
line is straight from rrdcreate's description of DS.
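To make the layout of a data line concrete, here is a minimal Python
sketch that splits one into its parts. It assumes the fields are
whitespace-separated and that the DS specification has the four
colon-separated parts shown in the example (type, heartbeat, min, max).
The names DataLine and parse_data_line are invented for this
illustration; this is not how updater itself parses the file.

```python
from typing import List, NamedTuple, Optional


class DataLine(NamedTuple):
    dsname: str              # name of the DS inside the RRD
    variable: str            # long variable name produced by the collector
    instance: Optional[str]  # instance part after ':', if any
    ds_type: str             # e.g. COUNTER or GAUGE
    heartbeat: int
    minimum: str             # kept as text, since 'U' means unknown
    maximum: str
    extra: List[str]         # anything after the DS specification


def parse_data_line(line: str) -> DataLine:
    fields = line.split()
    if len(fields) < 3 or fields[0] != "data":
        raise ValueError(f"not a data line: {line!r}")
    dsname, _, source = fields[1].partition("=")
    variable, _, instance = source.partition(":")
    ds_type, heartbeat, minimum, maximum = fields[2].split(":")
    return DataLine(dsname, variable, instance or None, ds_type,
                    int(heartbeat), minimum, maximum, fields[3:])


print(parse_data_line("data in=interface_packets_in:* COUNTER:600:0:U"))
```

For the first data line of the example this yields the DS name in, the
variable interface_packets_in, the instance *, and the DS specification
COUNTER with a heartbeat of 600, a minimum of 0 and an unknown maximum.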
It's also possible to invoke remstats functions or configuration-supplied
private functions on the incoming raw data. The data line would look like:
data xyzzy=&function(variablename) ...
It's your responsibility to make sure that function is available and that
it returns a valid number.
There is a pseudo-function built in to the updater called FAKE. Its
purpose is to allow testing of alerts, but it could also be used to allow
some foreign system to inject data, by putting its values there. It allows
you to replace collected values with values of your choice.
It will look for a file in the host's data directory called
FAKE-rrdname-varname, with rrdname and varname munged in the usual
fashion to make an acceptable file-name. If the file is there, then its
content will be returned (with leading and trailing whitespace removed)
as the value. If the file is not there, then the collected value will be
returned.
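The lookup that FAKE performs can be sketched as follows. This is a
hedged illustration in Python, not the updater's code. In particular, the
munge function below is an assumption: the actual munging rules are not
given here, so it simply replaces any character outside a conservative
set with an underscore. The names munge and fake_value are made up.

```python
import re
from pathlib import Path


def munge(name: str) -> str:
    # ASSUMPTION: stand-in for the "usual" munging, which is not
    # specified in this document. Replaces unusual characters with '_'.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name)


def fake_value(host_dir: str, rrdname: str, varname: str, collected: str) -> str:
    """Return the faked value if a FAKE file exists, else the collected one."""
    path = Path(host_dir) / f"FAKE-{munge(rrdname)}-{munge(varname)}"
    if path.is_file():
        # File content with leading and trailing whitespace removed.
        return path.read_text().strip()
    return collected
```

To test an alert under these assumptions, you would write the value you
want into the FAKE file for that rrd and variable, and delete the file
again to return to the collected values.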
The alert lines set the thresholds for alerts, in this case for the
variables 'in' and 'out'. They must specify, in order: the variable-name,
the relation (<, =, >, delta< and delta>) and a space-separated list of
thresholds. Since these examples only supply one number each, they will
only set OK or WARN statuses. If the variables 'in' or 'out' have values
less than (<) 100, they are considered to be OK. Otherwise they're
elevated to WARN status.
What will happen when they go into WARN status depends on the
alerts file. These alerts will apply to any host
which uses this rrd, unless the host overrides it.
The last alert specifies that missing data for the variable 'in' will be
considered to be status WARN, for purposes of generating alerts.
The full description of the alerts is kept in the docs for alert-monitor
as it is the program which implements them.
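As a rough illustration of the single-threshold case described above,
the following Python sketch maps a value to OK or WARN. It covers only
the <, = and > relations with one threshold, plus the nodata case; the
delta relations and multiple thresholds are left out because their
semantics are documented with alert-monitor, not here. The function name
status_for is invented, and returning None when data is missing and no
nodata alert is configured is an assumption of this sketch.

```python
import operator
from typing import Optional

# Only the simple relations; delta< and delta> are not modelled here.
RELATIONS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


def status_for(value: Optional[float], relation: str, threshold: float,
               nodata_status: Optional[str] = None) -> Optional[str]:
    """Return 'OK' if the relation holds, 'WARN' otherwise.

    A missing value (None) yields nodata_status, e.g. 'WARN' for
    'alert in nodata WARN'.
    """
    if value is None:
        return nodata_status
    if relation not in RELATIONS:
        raise ValueError(f"relation not handled in this sketch: {relation!r}")
    return "OK" if RELATIONS[relation](value, threshold) else "WARN"


print(status_for(50, "<", 100))
print(status_for(150, "<", 100))
```

With the example's alert in < 100, a value of 50 is OK and a value of
150 is WARN.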
The archives line tells how to keep the data for this rrd, using
the names defined in the archives file.
There can be multiple graph lines describing as many graphs from
the data in this rrd as you want. The graph-name must be wildcarded if the
rrd is. A graph line is followed by its definition, which must be
indented. The definition is straight from rrdgraph with the
magic cookie substitution. If you want a description, you can add:
desc='whatever you want'
or
desc="whatever you want"
to the graph line. This is used to set the alt text on the web-page.
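The magic cookie substitution can be pictured as a plain text
replacement of ##NAME## markers. The Python sketch below shows the idea
only; the cookie values used (if-eth0, day, ff0000) are hypothetical, and
leaving unknown cookies untouched is a choice made for this sketch rather
than documented remstats behaviour.

```python
import re


def expand_cookies(text: str, values: dict) -> str:
    """Replace each ##NAME## with values[NAME]; leave unknown cookies alone."""
    def replace(match: "re.Match") -> str:
        return values.get(match.group(1), match.group(0))
    return re.sub(r"##([A-Z0-9]+)##", replace, text)


# Hypothetical values, for illustration only.
cookies = {"RRD": "if-eth0", "GRAPHTIME": "day", "COLOR1": "ff0000"}
print(expand_cookies("--title 'Interface ##RRD## ##GRAPHTIME##'", cookies))
print(expand_cookies("LINE1:in###COLOR1##:Input Packets", cookies))
```

Note how the three '#' characters in the LINE1 entries work out: the
first is the literal '#' that rrdgraph expects in front of a colour, and
the remaining two open the ##COLOR1## cookie.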
Collector-specific Stuff
Any data line can have extra stuff added to the end of the line. What
this extra stuff means depends on the RRD.
Any data collected by the unix-status-collector will have a section-name
added to the end, to be passed on to the unix-status-server to tell it
which sections need to be run for this host. Some sections will also
specify extra information; e.g. the procname section will need the name
of the process to be counted. [See the unix-status-server for complete
specifications.]
An rrd collected by the log-collector will have extra stuff
on each data line after the DS information. The extra stuff will be
the function and pattern needed by log-collector to pass to the
log-server to get that variable's data.
An rrd collected by the port-collector may specify that this particular
service is critical, by simply including the word "critical" at the end of
the line. This will cause the status to be elevated to CRITICAL if it
ever reaches ERROR level.
An RRD collected by the snmp-collector needs to specify which OIDs to fetch.
They are specified by name in the RRD with a line like:
oid APCUpsAdvInputLineVoltage
which refers to a name defined earlier in the oids config-file.
An RRD collected by the snmp-collector may also specify an SNMP port to
use with a line like:
port 3401
For an RRD collected by the dbi-collector, the data directive will
require the column number in the select that will provide its data. This
will be appended to the data directive like:
data xxx=yyy GAUGE:600:0:U COL=12
There are also pseudo-columns called STATUS and RESPONSE to enable the
rrd definition to reference the connection-status and response-time.