Name

ganglia - distributed monitoring system


Version

ganglia 2.5.5

The latest version of this software and document will always be found at http://ganglia.sourceforge.net/


Synopsis

     ______                  ___
    / ____/___ _____  ____ _/ (_)___ _
   / / __/ __ `/ __ \/ __ `/ / / __ `/
  / /_/ / /_/ / / / / /_/ / / / /_/ /
  \____/\__,_/_/ /_/\__, /_/_/\__,_/
                   /____/ Distributed Monitoring System

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.

The ganglia system comprises two daemons, a PHP-based web frontend, and a few small utility programs.

Ganglia Monitoring Daemon (gmond)
Gmond is a multi-threaded daemon which runs on each cluster node you want to monitor. Installation is easy: you don't need a common NFS filesystem or a database backend, and there are no special accounts to install or configuration files to maintain. Gmond is its own redundant, distributed database.

Gmond has four main responsibilities: monitor changes in host state, multicast relevant changes, listen to the state of all other ganglia nodes via a multicast channel and answer requests for an XML description of the cluster state.

Each gmond transmits information in two different ways: it multicasts host state in External Data Representation (XDR) format and sends XML over a TCP connection on request.
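
For a quick look at the second channel, you can connect to gmond's XML port (8649 by default) with any TCP client. A minimal sketch, assuming the netcat (nc) utility is installed and a gmond is running locally:

  % nc localhost 8649 | grep '<HOST '

Gmond writes the full XML cluster state to the connection and then closes it, so this prints one line per host it currently knows about.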

Ganglia Meta Daemon (gmetad)
Federation in Ganglia is achieved using a tree of point-to-point connections amongst representative cluster nodes to aggregate the state of multiple clusters. At each node in the tree, a Ganglia Meta Daemon (gmetad) periodically polls a collection of child data sources, parses the collected XML, saves all numeric, volatile metrics to round-robin databases and exports the aggregated XML over a TCP socket to clients. Data sources may be either gmond daemons, representing specific clusters, or other gmetad daemons, representing sets of clusters. Data sources use source IP addresses for access control and can be specified using multiple IP addresses for failover. The latter capability is natural for aggregating data from clusters since each gmond daemon contains the entire state of its cluster.

Ganglia PHP Web Frontend
The Ganglia web frontend provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. Although the web frontend to ganglia started as a simple HTML view of the XML tree, it has evolved into a system that keeps a colorful history of all collected data.

The Ganglia web frontend caters to system administrators and users. For example, one can view the CPU utilization over the past hour, day, week, month, or year. The web frontend shows similar graphs for Memory usage, disk usage, network statistics, number of running processes, and all other Ganglia metrics.

The web frontend depends on the existence of the gmetad which provides it with data from several Ganglia sources. Specifically, the web frontend will open the local port 8651 (by default) and expects to receive a Ganglia XML tree. The web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site. This behavior leads to a very responsive site, but requires that the full XML tree be parsed on every page access. Therefore, the Ganglia web frontend should run on a fairly powerful, dedicated machine if it presents a large amount of data.
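
You can verify this connection by hand. A minimal check, assuming gmetad is running locally on its default XML port:

  % telnet localhost 8651

If gmetad is healthy, you will see the same XML tree that the web frontend parses on each page access.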

The Ganglia web frontend is written in the PHP scripting language, and uses graphs generated by gmetad to display history information. It has been tested on many flavours of Unix (primarily Linux) with the Apache webserver and the PHP 4.1 module.


Installation

The latest version of all ganglia software can always be downloaded from http://ganglia.sourceforge.net/downloads.php

Ganglia runs on Linux (i386, ia64, sparc, alpha, powerpc, m68k, mips, arm, hppa, s390), Solaris, FreeBSD, AIX, IRIX, Tru64, HPUX, MacOS X and Windows (cygwin beta) making it as portable as it is scalable.

Monitoring Core Installation

If you use the Linux RPMs provided on the ganglia web site, you can skip to the end of this section.

Ganglia uses GNU autoconf, so compiling and installing the monitoring core is basically

  % ./configure
  % make
  % make install

but there are some issues that you need to take a look at first.

Kernel multicast support
Currently ganglia will only run on machines with multicast support. The vast majority of machines have multicast support enabled by default; if gmond fails to send or receive data, missing multicast support is one of the first things to check. Later versions of ganglia will not have the multicast requirement.
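
On Linux you can usually confirm that an interface has multicast enabled by looking for the MULTICAST flag; a quick check (the interface name eth0 is illustrative):

  % ifconfig eth0 | grep MULTICAST

If the flag is missing, multicast must be enabled in the kernel and on the interface before gmond will work.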

Gmetad is not installed by default
Since gmetad relies on the Round-Robin Database Tool (see http://www.rrdtool.com/), it will not be compiled unless you explicitly request it with the --with-gmetad flag.
  % ./configure --with-gmetad

The configure script will fail if it cannot find the rrdtool library and header files. By default, it expects to find them at /usr/include/rrd.h and /usr/lib/librrd.a. If you installed them in different locations then you need to add the following configure flags

  % ./configure CFLAGS="-I/rrd/header/path" CPPFLAGS="-I/rrd/header/path" \
     LDFLAGS="-L/rrd/library/path" --with-gmetad

Of course, you need to substitute /rrd/header/path and /rrd/library/path with the real locations of the rrdtool header file and library, respectively.
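
For example, if rrdtool were installed under a (hypothetical) /opt/rrdtool prefix, the invocation would look like

  % ./configure CFLAGS="-I/opt/rrdtool/include" CPPFLAGS="-I/opt/rrdtool/include" \
     LDFLAGS="-L/opt/rrdtool/lib" --with-gmetad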

AIX should not be compiled with shared libraries
You must add the --disable-shared and --enable-static configure flags if you are running on AIX
  % ./configure --disable-shared --enable-static

GEXEC confusion
GEXEC is a scalable cluster remote execution system which provides fast, RSA authenticated remote execution of parallel and distributed jobs. It provides transparent forwarding of stdin, stdout, stderr, and signals to and from remote processes, provides local environment propagation, and is designed to be robust and to scale to systems over 1000 nodes. Internally, GEXEC operates by building an n-ary tree of TCP sockets and threads between gexec daemons and propagating control information up and down the tree. By using hierarchical control, GEXEC distributes both the work and resource usage associated with massive amounts of parallelism across multiple nodes, thereby eliminating problems associated with single node resource limits (e.g., limits on the number of file descriptors on front-end nodes). (from http://www.theether.org/gexec )

gexec is a great cluster execution tool but integrating it with ganglia is very clumsy to say the least. GEXEC can run standalone without access to a ganglia gmond. In standalone mode gexec will use the hosts listed in your GEXEC_SVRS variable to run on. For example, say I want to run hostname on three machines in my cluster: host1, host2 and host3. I use the following command line.

  % GEXEC_SVRS="host1 host2 host3" gexec -n 3 hostname

and gexec would build an n-ary tree (a binary tree by default) of TCP sockets to those machines and run the command hostname.

As an added feature, you can have gexec pull a host list from a locally running gmond and use that as the host list instead of GEXEC_SVRS. The list is load balanced and gexec will start the job on the n least-loaded machines.

For example..

  % gexec -n 5 hostname

will run the command hostname on the five least-loaded machines in a cluster.

To turn on the gexec feature in ganglia you must configure ganglia with the --enable-gexec flag

  % ./configure --enable-gexec

Enabling gexec means that by default any host running gmond will send a special multicast message announcing that gexec is installed on it and open for requests.

Now the question is, what if I don't want gexec to run on every host in my cluster? For example, you may not want to have gexec run jobs on your cluster frontend nodes.

You simply add the following line to your gmond configuration file (/etc/gmond.conf by default)

  no_gexec on

Simple huh? I know the configuration file option, no_gexec, seems crazy (and it is). Why have an option that says ``yes to no gexec''? The early versions of gmond didn't use a configuration file but instead commandline options. One of the commandline options was simply --no-gexec and the default was to announce gexec as on.

Once you have successfully run

  % ./configure
  % make
  % make install

you should find the following files installed in /usr (by default).

  /usr/bin/gstat
  /usr/bin/gmetric
  /usr/sbin/gmond
  /usr/sbin/gmetad

If you installed ganglia using RPMs, these files are put in place when you install the RPM, which is done simply by running

  % rpm -Uvh ganglia-monitor-core-2.5.5-1.i386.rpm

Once you have the necessary binaries installed, you can test your installation by running

   % ./gmond

This will start the ganglia monitoring daemon. You should then be able to run

   % telnet localhost 8649

And get an XML description of the state of your machine (and any other hosts running gmond at the time).
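
To keep a copy of that XML for closer inspection, you can redirect the dump to a file. A minimal sketch, assuming the netcat (nc) utility is available:

   % nc localhost 8649 > /tmp/gmond-state.xml
   % grep -c '<METRIC ' /tmp/gmond-state.xml

The second command counts the individual metric elements gmond is currently reporting.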

If you are installing from source on Linux, scripts are provided to start gmetad and gmond at system startup. They are easy to install from the source root.

   % cp ./gmond/gmond.init /etc/rc.d/init.d/gmond
   % chkconfig --add gmond
   % chkconfig --list gmond
     gmond              0:off   1:off   2:on    3:on    4:on    5:on    6:off
   % /etc/rc.d/init.d/gmond start
     Starting GANGLIA gmond:                                    [  OK  ]

Repeat this step with gmetad.
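
The gmetad commands are analogous. A sketch, assuming the init script ships in the gmetad/ directory of the source tree (check your source root for the exact path):

   % cp ./gmetad/gmetad.init /etc/rc.d/init.d/gmetad
   % chkconfig --add gmetad
   % /etc/rc.d/init.d/gmetad start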

PHP Web Frontend Installation

  1. Unzip the webfrontend distribution in your website tree. This is often under the directory /var/www/html; however, check the DocumentRoot variable in your Apache configuration files to be sure. All the PHP script files use relative URLs in their links, so you may place the gmetad-webfrontend/ directory anywhere convenient. I like to unzip *tar.gz files with one tar command:
      % cd /var/www/html
      % tar xvzf gmetad-webfrontend-2.5.0.tar.gz

  2. Ensure your webserver understands how to process PHP script files. Currently, the web frontend uses PHP language features that require PHP version 4 or greater. Processing PHP script files usually requires a webserver module, such as mod_php for the popular Apache webserver. In RedHat Linux, the RPM package that provides this module is called simply ``php''.

    For Apache, the mod_php module must be enabled. The following lines should appear somewhere in Apache's *conf files. This example applies to RedHat and Mandrake Linux. The actual filenames may vary on your system. If you installed the php module using an RPM package, this work will have been done automatically.

      <IfDefine HAVE_PHP4>
      LoadModule php4_module    extramodules/libphp4.so
      AddModule mod_php4.c
      </IfDefine>
      AddType  application/x-httpd-php         .php .php4 .php3 .phtml
      AddType  application/x-httpd-php-source  .phps

  3. The webfrontend requires the existence of the gmetad package on the webserver. Follow the installation instructions on the gmetad page. Specifically, the webfrontend requires rrdtool and the rrds/ directory from gmetad. If you are a power user, you may use NFS to simulate the local existence of the rrds (see the sketch after this list).

  4. Test your installation by visiting the URL:
      http://localhost/gmetad-webfrontend/

    with a web browser, where localhost is the address of your webserver.
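
If you choose the NFS route mentioned in step 3, here is a minimal sketch, assuming gmetad runs on a host named gmetad-host (a placeholder) and uses its default RRD directory:

  % mount -t nfs gmetad-host:/var/lib/ganglia/rrds /var/lib/ganglia/rrds

Note that rrdtool itself must still be installed on the webserver so the frontend can render graphs.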

Installation of the web frontend is simplified on Linux by using rpm.

  % rpm -Uvh gmetad-webfrontend-2.5.5-1.i386.rpm
  Preparing...                ########################################### [100%]
     1:gmetad-webfrontend     ########################################### [100%]


Configuration

Gmond Configuration

While the default options for gmond will work for most clusters, gmond is very flexible and can be customized with the configuration file /etc/gmond.conf.

/etc/gmond.conf is not required; its absence will simply cause gmond to start with a default configuration. Here is a sample gmond.conf configuration file with comments to help you configure gmond.

   # $Id: ganglia.pod,v 1.2 2003/11/06 04:51:11 massie Exp $
   # This is the configuration file for the Ganglia Monitor Daemon (gmond)
   # Documentation can be found at http://ganglia.sourceforge.net/docs/
   #
   # To change a value from its default simply uncomment the line
   # and alter the value
   #####################
   #
   # The name of the cluster this node is a part of
   # default: "unspecified"
   # name  "My Cluster"
   #
   # The owner of this cluster. Represents an administrative
   # domain. The pair name/owner should be unique for all clusters
   # in the world.
   # default: "unspecified"
   # owner "My Organization"
   #
   # The latitude and longitude GPS coordinates of this cluster on earth.
   # Specified to 1 mile accuracy with two decimal places per axis in Decimal
   # DMS format: "N61.18 W130.50".
   # default: "unspecified"
   # latlong "N32.87 W117.22"
   #
   # The URL for more information on the Cluster. Intended to give purpose,
   # owner, administration, and account details for this cluster.
   # default: "unspecified"
   # url "http://www.mycluster.edu/";
   #
   # The location of this host in the cluster. Given as a 3D coordinate:
   # "Rack,Rank,Plane" that corresponds to a Euclidean coordinate "x,y,z".
   # default: "unspecified"
   # location "0,0,0"
   #
   # The multicast channel for gmond to send/receive data on
   # default: 239.2.11.71
   # mcast_channel 239.2.11.71
   #
   # The multicast port for gmond to send/receive data on
   # default: 8649
   # mcast_port    8649
   #
   # The multicast interface for gmond to send/receive data on
   # default: the kernel decides based on routing configuration
   # mcast_if  eth1
   #
   # The multicast Time-To-Live (TTL) for outgoing messages
   # default: 1
   # mcast_ttl  1
   #
   # The number of threads listening to multicast traffic
   # default: 2
   # mcast_threads 2
   #
   # Which port should gmond listen for XML requests on
   # default: 8649
   # xml_port     8649
   #
   # The number of threads answering XML requests
   # default: 2
   # xml_threads   2
   #
   # Hosts ASIDE from "127.0.0.1"/localhost and those multicasting
   # on the same multicast channel which you will share your XML
   # data with.  Multiple hosts are allowed on multiple lines.
   # Can be specified with either hostnames or IP addresses.
   # default: none
   # trusted_hosts 1.1.1.1 1.1.1.2 1.1.1.3 \
   # 2.3.2.3 3.4.3.4 5.6.5.6
   #
   # The number of nodes in your cluster.  This value is used in the
   # creation of the cluster hash.
   # default: 1024
   # num_nodes  1024
   #
   # The number of custom metrics this gmond will be storing.  This
   # value is used in the creation of the host custom_metrics hash.
   # default: 16
   # num_custom_metrics 16
   #
   # Run gmond in "mute" mode.  Gmond will only listen to the multicast
   # channel but will not send any data on the channel.
   # default: off
   # mute on
   #
   # Run gmond in "deaf" mode.  Gmond will only send data on the multicast
   # channel but will not listen/store any data from the channel.
   # default: off
   # deaf on
   #
   # Run gmond in "debug" mode.  Gmond will not background.  Debug messages
   # are sent to stdout.  Value from 0-100.  The higher the number the more
   # detailed debugging information will be sent.
   # default: 0
   # debug_level 10
   #
   # If you don't want gmond to setuid, set this to "on"
   # default: off
   # no_setuid  on
   #
   # Which user should gmond run as?
   # default: nobody
   # setuid     nobody
   #
   # If you do not want this host to appear in the gexec host list, set
   # this value to "on"
   # default: off
   # no_gexec   on
   #
   # If you want any host which connects to the gmond XML to receive
   # data, then set this value to "on"
   # default: off
   # all_trusted on

If you want to customize the operation of gmond, simply edit this file and save it to /etc/gmond.conf. You can create multiple gmond configurations by writing the configuration file to a different file, say /etc/gmond_test.conf, and then using the --conf option of gmond to specify which configuration file to use.

  % ./gmond --conf=/etc/gmond_test.conf

would start gmond with the settings in /etc/gmond_test.conf

Gmetad Configuration

The behavior of the Ganglia Meta Daemon is completely controlled by a single configuration file, /etc/gmetad.conf by default. For gmetad to do anything useful you must specify at least one data_source in the configuration. The format of the data_source line is as follows

  data_source "Cluster A" 127.0.0.1  1.2.3.4:8655  1.2.3.5:8625
  data_source "Cluster B" 1.2.4.4:8655

In this example, there are two unique data sources: ``Cluster A'' and ``Cluster B''. The Cluster A data source has three redundant sources. If gmetad cannot pull the data from the first source, it will continue trying the other sources in order.

If you do not specify a port number, gmetad will assume the default ganglia port, which is 8649 (U*N*I*X on a phone key pad).

Here is a sample gmetad configuration file with comments

   # This is an example of a Ganglia Meta Daemon configuration file
   #                http://ganglia.sourceforge.net/
   #
   #-------------------------------------------------------------------------------
   # Setting the debug_level to 1 will keep the daemon in the foreground and
   # show only error messages. Setting this value higher than 1 will make 
   # gmetad output debugging information and stay in the foreground.
   # default: 0
   # debug_level 10
   #
   #-------------------------------------------------------------------------------
   # What to monitor. The most important section of this file. 
   #
   # The data_source tag specifies either a cluster or a grid to
   # monitor. If we detect the source is a cluster, we will maintain a complete
   # set of RRD databases for it, which can be used to create historical 
   # graphs of the metrics. If the source is a grid (it comes from another gmetad),
   # we will only maintain summary RRDs for it.
   #
   # Format: 
   # data_source "my cluster" [polling interval] address1:port address2:port ...
   # 
   # The keyword 'data_source' must immediately be followed by a unique
   # string which identifies the source, then an optional polling interval in 
   # seconds. The source will be polled at this interval on average. 
   # If the polling interval is omitted, 15sec is assumed.
   #
   # A list of machines which service the data source follows, in the 
   # format ip:port, or name:port. If a port is not specified then 8649
   # (the default gmond port) is assumed.
   # default: There is no default value
   #
   # data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
   # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
   # data_source "another source" 1.3.4.7:8655  1.3.4.8
   
   data_source "my cluster" localhost
   
   #
   #-------------------------------------------------------------------------------
   # Scalability mode. If on, we summarize over downstream grids, and respect
   # authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
   # in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
   # we are the "authority" on data source feeds. This approach does not scale to
   # large groups of clusters, but is provided for backwards compatibility.
   # default: on
   # scalable off
   #
   #-------------------------------------------------------------------------------
   # The name of this Grid. All the data sources above will be wrapped in a GRID
   # tag with this name.
   # default: Unspecified
   # gridname "MyGrid"
   #
   #-------------------------------------------------------------------------------
   # The authority URL for this grid. Used by other gmetads to locate graphs
   # for our data sources. Generally points to a ganglia/
   # website on this machine.
   # default: "http://hostname/ganglia/";,
   #   where hostname is the name of this machine, as defined by gethostname().
   # authority "http://mycluster.org/newprefix/";
   #
   #-------------------------------------------------------------------------------
   # List of machines this gmetad will share XML with. Localhost
   # is always trusted. 
   # default: There is no default value
   # trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
   #
   #-------------------------------------------------------------------------------
   # If you want any host which connects to the gmetad XML to receive
   # data, then set this value to "on"
   # default: off
   # all_trusted on
   #
   #-------------------------------------------------------------------------------
   # If you don't want gmetad to setuid then set this to off
   # default: on
   # setuid off
   #
   #-------------------------------------------------------------------------------
   # User gmetad will setuid to (defaults to "nobody")
   # default: "nobody"
   # setuid_username "nobody"
   #
   #-------------------------------------------------------------------------------
   # The port gmetad will answer requests for XML
   # default: 8651
   # xml_port 8651
   #
   #-------------------------------------------------------------------------------
   # The port gmetad will answer queries for XML. This facility allows
   # simple subtree and summation views of the XML tree.
   # default: 8652
   # interactive_port 8652
   #
   #-------------------------------------------------------------------------------
   # The number of threads answering XML requests
   # default: 4
   # server_threads 10
   #
   #-------------------------------------------------------------------------------
   # Where gmetad stores its round-robin databases
   # default: "/var/lib/ganglia/rrds"
   # rrd_rootdir "/some/other/place"

gmetad has a --conf option to allow you to specify alternate configuration files

  % ./gmetad --conf=/tmp/my_custom_config.conf

PHP Web Frontend Configuration

Most configuration parameters reside in the gmetad-webfrontend/conf.php file. Here you may alter the template, gmetad location, RRDtool location, and set the default time range and metrics for graphs.

The static portions of the Ganglia website are themeable. This means you can alter elements such as section labels, some links, and images to suit your individual tastes and environment. The template_name variable names a directory containing the current theme. Ganglia uses TemplatePower to implement themes. A user-defined skin must conform to the template interface as defined by the default theme. Essentially, the variable names and START/END blocks in a custom theme must remain the same as in the default, but all other HTML elements may be changed.

Other configuration variables in conf.php specify the location of gmetad's files, and where to find the rrdtool program. These locations need only be changed if you do not run gmetad on the webserver. Otherwise the default locations should work fine. The default_range variable specifies what range of time to show on the graphs by default, with possible values of hour, day, week, month, year. The default_metric parameter specifies which metric to show on the cluster view page by default.
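
As a concrete illustration, a conf.php excerpt might look like the following. The template_name, default_range and default_metric variables are the ones named above; the variable holding the rrdtool location is an assumption here, so check the comments in your own conf.php:

   # Hypothetical conf.php excerpt (rrdtool variable name assumed)
   $template_name  = "default";           # theme directory for the frontend
   $default_range  = "day";               # hour, day, week, month or year
   $default_metric = "load_one";          # metric shown on the cluster view
   $rrdtool        = "/usr/bin/rrdtool";  # path to the rrdtool binary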


Commandline Tools

There are two commandline tools that work with gmond to add custom metrics and query the current state of a cluster: gmetric and gstat, respectively.

Gmetric

The Ganglia Metric Tool (gmetric) allows you to easily monitor any arbitrary host metrics you like, expanding on the core metrics that gmond measures by default.

If you want help with the gmetric syntax, simply use the ``help'' commandline option

  % gmetric --help
   gmetric 2.5.5
   Purpose:
     The Ganglia Metric Client (gmetric) announces a metric
     value to all Ganglia Monitoring Daemons (gmonds) that are listening
     on the cluster multicast channel.
   Usage: gmetric [OPTIONS]...
      -h         --help                  Print help and exit
      -V         --version               Print version and exit
      -nSTRING   --name=STRING           Name of the metric
      -vSTRING   --value=STRING          Value of the metric
      -tSTRING   --type=STRING           Either string|int8|uint8|int16|uint16|int32|uint32|float|double
      -uSTRING   --units=STRING          Unit of measure for the value e.g. Kilobytes, Celsius
      -sSTRING   --slope=STRING          Either zero|positive|negative|both (default='both')
      -xINT      --tmax=INT              The maximum time in seconds between gmetric calls (default=60)
      -dINT      --dmax=INT              The lifetime in seconds of this metric (default=0)
      -cSTRING   --mcast_channel=STRING  Multicast channel to send/receive on (default='239.2.11.71')
      -pINT      --mcast_port=INT        Multicast port to send/receive on (default=8649)
      -iSTRING   --mcast_if=STRING       Network interface to multicast on e.g. 'eth1' (default='kernel decides') 
      -lINT      --mcast_ttl=INT         Multicast Time-To-Live (TTL) (default=1)

The gmetric tool formats a special multicast message and sends it to all gmonds that are listening.

All metrics in ganglia have a name, value, type and, optionally, units. For example, say I wanted to measure the temperature of my CPU (something gmond doesn't do by default); then I could multicast this metric with name=``temperature'', value=``63'', type=``int16'' and units=``Celsius''.

Assume I have a program called cputemp which prints the temperature of the CPU as text

  % cputemp
  63

I could easily send this data to all listening gmonds by running

  % gmetric --name temperature --value `cputemp` --type int16 --units Celsius

Check the exit value of gmetric to see if it successfully sent the data: 0 on success and -1 on failure.
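
In a shell script, that exit value can drive simple error handling. A minimal sketch:

  % gmetric --name temperature --value `cputemp` --type int16 --units Celsius \
      || echo "gmetric failed to send the metric"

(Since the failure value is -1, a shell will report the exit status as 255.)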

To constantly sample this temperature metric, you just need to add this command to your crontab.
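
For example, a crontab entry that samples the temperature every five minutes might look like this (the cputemp path is a placeholder for wherever your script lives):

  */5 * * * * /usr/bin/gmetric --name temperature --value `/usr/local/bin/cputemp` --type int16 --units Celsius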

Gstat

The Ganglia Cluster Status Tool (gstat) is a commandline utility that gives you a status report for your cluster.

To get help with the commandline options, simply pass gstat the --help option

  % gstat --help
  gstat 2.5.5
  Purpose:
    The Ganglia Status Client (gstat) connects with a
    Ganglia Monitoring Daemon (gmond) and outputs a load-balanced list
    of cluster hosts
  Usage: gstat [OPTIONS]...
     -h         --help             Print help and exit
     -V         --version          Print version and exit
     -a         --all              List all hosts.  Not just hosts running gexec (default=off)
     -d         --dead             Print only the hosts which are dead (default=off)
     -m         --mpifile          Print a load-balanced mpifile (default=off)
     -1         --single_line      Print host and information all on one line (default=off)
     -l         --list             Print ONLY the host list (default=off)
     -iSTRING   --gmond_ip=STRING  Specify the ip address of the gmond to query (default='127.0.0.1')
     -pINT      --gmond_port=INT   Specify the gmond port to query (default=8649)
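
For example, combining the documented flags, the following queries the local gmond and prints every host in the cluster (not just those running gexec), with each host's information on a single line:

  % gstat --all --single_line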


Troubleshooting (FAQ)

Solaris, IRIX, Tru64
Here is an email from Steve Wagner about the state of ganglia on Solaris, IRIX and Tru64. Steve is to be thanked for porting ganglia to Solaris and Tru64. He also helped with the IRIX port.
   State of the IRIX port:
   
   *  CPU percentage stuff hasn't improved despite my efforts.  I fear there
      may be a flaw in the way I'm summing counters for all the CPUs.
   *  Auto-detection of network interfaces apparently segfaults.
   *  Memory and load reporting appear to be running properly.
   *  CPU speed is not being reported properly on multi-proc machines.
   *  Total/running processes are not reported.
   *  gmetad untested.
   *  Monitoring core apparently stable in foreground, background being tested
   (had a segfault earlier).
   
   State of the Tru64 port:
   
   *  CPU percentage stuff here works perfectly.
   *  Memory and swap usage stats are suspected to be inaccurate.
   *  Total/running processes are not reported.
   *  gmetad untested.
   *  Monitoring core apparently stable in foreground and background.
   
   State of the Solaris port:
   *  CPU percentages are slightly off, but correct enough for trending
      purposes.
   *  Load, ncpus, CPU speed, breads/writes, lreads/writes, phreads/writes,
      and rcache/wcache are all accurate.
   *  Memory/swap statistics are suspiciously flat, but local stats bear
      this out (and they *are* being updated) so I haven't investigated
      further.
   *  Total processes are counted, but not running ones.
   *  gmetad appears stable
   
   Anyway, all three ports I've been messing with are usable and fairly
   stable.  Although there are areas for improvement I think we really can't
   keep hogging all this good stuff - what I'm looking at is ready for
   release.

Debian Users
Here is an email message from Preston Smith for Debian users
 Debian packages for Debian 3.0 (woody) are available at
  http://www.physics.purdue.edu/~psmith/ganglia
 (i386, sparc, and powerpc are there presently, more architectures will
  appear when I get them built.)
 Packages for "unstable" (sid) will be available in the main Debian
  archive soon.
 Also, a CVS note: I checked in the debian/ directory used to create
 debian packages.

Multihomed Machines
Here is an email that Matt Massie sent to a user having problems with multihomed machines
   i need to add a section in the documentation talking about this since it 
   seems to be a common question.
   
   when you use...
   
   mcast_if eth1
   
   .. in /etc/gmond.conf that tells gmond to send its data out the "eth1"
   network interface but that doesn't necessarily mean that the source
   address of the packets will match the "eth1" interface.  to make sure that
   data sent out eth1 has the correct source address run the following...
   
   % route add -host 239.2.11.71 dev eth1
   
   ... before starting gmond.  that should do the trick for you.
   
   -matt
   
   > I have seen some post related to some issues
   > with gmond + multicast running on a dual nic
   > frontend.
   > 
   > Currently I am experiencing a weird behavior
   > 
   > I have the following setup:
   > 
   >   -----------------------
   >   | web server + gmetad |
   >   -----------------------
   >              |
   >              |
   >              |
   >     ----------------------
   >     |   eth0 A.B.C.112   |
   >     |                    |
   >     |  Frontend + gmond  |
   >     |                    |
   >     | eth1 192.168.100.1 |
   >     ----------------------
   >              |
   >              |
   > 
   >        26 nodes each
   >           gmond
   > 
   > In the frontend /etc/gmond.conf I have the
   > following statement: mcast_if  eth1
   > 
   > The 26 nodes are correctly reported. 
   > 
   > However the Frontend is never reported.
   > 
   > I am running iptables on the Frontend, and I am seing
   > things like:
   > 
   > INPUT packet died: IN=eth1 OUT= MAC= SRC=A.B.C.112 DST=239.2.11.71 
   > LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=53740 DF PROTO=UDP SPT=41608 DPT=8649
   > LEN=16 
   > 
   > I would have expected the source to be 192.168.100.1 with mcast_if eth1
   > 
   > Any idea ?

Cisco Catalyst Switches
Perhaps information regarding gmond on networks built with Cisco Catalyst switches should be mentioned in the ganglia documentation. I think by default multicast traffic on the Catalyst will flood all devices unless configured properly. Here is a relevant snippet from a message forum, with a link to a Cisco document.

If what you are trying to do, is minimizing the impact on your network due to a multicast application, this link may describe what you want to do: http://www.cisco.com/warp/public/473/38.html

We set up our switches according to this after a consultant came in and installed an application multicasting several hundred packets per second. This made the network functional again.


Getting Support

  The tired and thirsty prospector threw himself down at the edge of the 
  watering hole and started to drink. But then he looked around and saw 
  skulls and bones everywhere. "Uh-oh," he thought. "This watering hole 
  is reserved for skeletons." --Jack Handey

There are three mailing lists available to you: ganglia-general, ganglia-developers and ganglia-announce. You can join these lists or read their archives by visiting https://sourceforge.net/mail/?group_id=43021

When you need help please follow these steps until your problem is resolved.

  1. completely read the documentation

  2. check the ganglia-general archive to see if other people have had the same problem

  3. post your support request to the ganglia-general mailing list

  4. check the ganglia-developers archive

  5. post your question to the ganglia-developers list

Please send all bugs, patches, and feature requests to the ganglia-developers list after you have checked the ganglia-developers archive to see if the question has already been asked and answered.


Copyright

  Copyright (C) 2002,2003 University of California, Berkeley
 
The ganglia source tree incorporates source code from other projects as well.
  Copyright (c) 2000 Dug Song <dugsong@monkey.org>
  Copyright (C) 1999,2000,2001,2002 Lukas Schroeder <lukas@azzit.de>,
   and others.
  Copyright (C) 1991, 1992, 1996, 1998, 1999 Free Software Foundation, Inc.
  Copyright (C) 2000  David Helder
  Copyright (C) 2000  Andrew Lanoix


Authors

  Matt Massie <massie@CS.Berkeley.EDU>

and the Ganglia Development Team...

 Bas van der Vlies          basv              Developer   basv at users.sourceforge.net
 Neil T. Spring             bluehal           Developer   bluehal at users.sourceforge.net
 Brooks Davis               brooks_en_davis   Developer   brooks_en_davis at users.sourceforge.net
 Eric Fraser                fraze             Developer   fraze at users.sourceforge.net
 greg bruno                 gregbruno         Developer   gregbruno at users.sourceforge.net
 Jeff Layton                laytonjb          Developer   laytonjb at users.sourceforge.net
 Doc Schneider              maddocbuddha      Developer   maddocbuddha at users.sourceforge.net
 Mason Katz                 masonkatz         Developer   masonkatz at users.sourceforge.net
 Mike Howard                mhoward           Developer   mhoward at users.sourceforge.net
 Oliver Mössinger           olivpass          Developer   olivpass at users.sourceforge.net
 Preston Smith              pmsmith           Developer   pmsmith at users.sourceforge.net
 Federico David Sacerdoti   sacerdoti         Developer   sacerdoti at users.sourceforge.net
 Tim Cera                   timcera           Developer   timcera at users.sourceforge.net
 Mathew Benson              wintermute11      Developer   wintermute11 at users.sourceforge.net


Contributors

There have been dozens of contributors who have provided patches and helpful bug reports. We need to list them here later.