MySQL Cluster uses the new NDBCluster storage engine to enable running several MySQL servers in a cluster. The NDBCluster storage engine is available from MySQL source release 4.1.2, and in binary releases from MySQL-Max 4.1.3. Supported operating systems are currently Linux, Mac OS X, and Solaris. Naturally, we are working to make NDBCluster run on all of the operating systems that MySQL itself runs on, including Windows.
This chapter represents work in progress. Other documents describing MySQL Cluster can be found at http://www.mysql.com/cluster/ and http://dev.mysql.com/doc/#cluster.
You may also wish to subscribe to the MySQL Cluster mailing list. See http://lists.mysql.com/.
MySQL Cluster is a new technology to enable clustering of in-memory databases in a shared-nothing system. The shared-nothing architecture allows the system to work with very inexpensive hardware, without any special requirements on hardware or software. It also has no single point of failure, because each component has its own memory and disk.
MySQL Cluster is an integration of the standard MySQL server with an in-memory clustered storage engine, called NDB. In our documentation, the term NDB refers to the storage-engine-specific part of the setup, whereas MySQL Cluster refers to the combination of MySQL and the new storage engine.
A MySQL Cluster consists of computers running a set of processes: MySQL servers, storage nodes for NDB Cluster, management servers, and possibly also specialized data access programs. All these programs work together to form MySQL Cluster. When data is stored in the NDBCluster storage engine, the tables are stored in the storage nodes for NDB Cluster. Those tables are also directly accessible from all other MySQL servers in the cluster. Thus, if one application updates the salary of an employee, all other MySQL servers that query this data can see the change immediately.
The data stored in the storage nodes for MySQL Cluster can be mirrored, and the cluster can handle failures of storage nodes with no other impact than that a number of transactions are aborted due to losing the transaction state. This should cause no problems because transactional applications should be written to handle transaction failure.
By bringing MySQL Cluster to the open source world, MySQL makes clustered data management with high availability, high performance and scalability available to all who need it.
NDB is an in-memory storage engine offering high-availability and data-persistence features. NDB can (although this requires extensive knowledge) be used as an independent database system, supporting the traditional relational data model with full ACID transactions.
The NDB storage engine can be configured with a range of fail-over and load-balancing options, but it is easiest to start with the storage engine at the cluster level. The NDB storage engine of MySQL Cluster contains a complete set of data, dependent only on other data within the cluster itself.
A MySQL Cluster may also replicate clustered data to other MySQL Clusters, but this is a complex configuration. Here, we will focus on how to set up a single MySQL Cluster consisting of an NDB storage engine and some MySQL servers.
The cluster part of MySQL Cluster is currently configured independently from the MySQL servers. In a MySQL Cluster, each part of the cluster is considered to be a node.
Note: In many contexts a node is a computer, but for MySQL Cluster it is a process. There can be any number of nodes on a single computer.
Each node has a type, and there can be multiple nodes in the MySQL Cluster of each type. In a minimal MySQL Cluster configuration, there will be at least three nodes:
The management (MGM) node. The role of this type of node is to manage the other nodes within the MySQL Cluster, such as providing configuration data, starting and stopping nodes, running backups, etc. As this node type manages the configuration of the other nodes, a node of this type must always be started first, before any other node. Once the cluster is running, the MGM node does not necessarily have to be running all the time.
The storage or database (DB) node. This is the type of node that manages and stores the database itself. There are as many DB nodes as you have replicas times the number of fragments. That is, with two fragments, each with two replicas, you need four DB nodes. Note that it is not necessary to have more than one replica, so a minimal MySQL Cluster may contain just one DB node.
The client (API) node. This is the client node that will access the cluster; in the case of MySQL Cluster, these are traditional MySQL servers with the new NDBCluster storage engine, which enables access to clustered tables. Basically, the MySQL daemon is a client of the NDB cluster. If you have applications that use the NDB API directly, then these are considered API nodes too.
We refer to these cluster processes as nodes in the cluster. Setting up the configuration of the cluster involves configuring each individual node in the cluster and setting up each individual communication link between the nodes in the cluster. MySQL Cluster is currently designed with the intention that storage nodes are homogeneous in terms of processor power, memory space, and communication bandwidth. Also, to enable a single point of configuration, it was decided to move the entire cluster configuration to one configuration file.
The management server manages the cluster configuration file and the cluster log. All nodes in the cluster contact the management server to retrieve their part of the configuration, so they need a way to determine where the management server resides. When interesting events occur in the storage nodes, they transfer the information of these events to the management server, which then writes the information to the cluster log.
In addition, there are any number of clients to the cluster. These are of two types. First, there are the normal MySQL clients that are no different for MySQL Cluster. MySQL Cluster can be accessed from all MySQL applications written in PHP, Perl, C, C++, Java, Ruby, and so forth. Second, there are management clients. These clients access the management server and provide commands to start and stop nodes gracefully, to start and stop message tracing (only in debug versions), to print configuration, to show node status of all nodes in the cluster, to show versions of all nodes in the cluster, to start and stop backups, and so forth.
A MySQL server that is part of MySQL Cluster differs in only one aspect from what we are used to: it has an additional storage engine (NDB or NDBCLUSTER), which is initially disabled. Except for this, the MySQL server is not much different from previous MySQL releases, apart from any other new 4.1 features, of course. By default, the MySQL server is configured with the NDB storage engine disabled (so as not to needlessly allocate resources); to enable it you need to modify `my.cnf'.
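For example, enabling it takes a single line in the [mysqld] section of `my.cnf':

[mysqld]
ndbcluster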
Also, as the MySQL daemon is an API client to the NDB storage engine, the minimal configuration data needed to access the MGM node from the MySQL server must be set. When this is done, all MGM nodes (one is sufficient to start) and DB nodes must be up and running before starting the MySQL server.
NDBCluster is available in binary distributions from MySQL-Max 4.1.3. If you choose to build from a source tarball or the BitKeeper MySQL 4.1 tree, make sure you add the --with-ndbcluster option when running `configure'. You could also simply use the BUILD/compile-pentium-max build script. This script also includes OpenSSL, so you either have to get OpenSSL or modify the build script to exclude it.
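For example, a minimal configure invocation (any other options you normally use can be added):

shell> ./configure --with-ndbcluster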
Apart from these things, you can just follow the standard instructions to build your own binaries, run the tests and perform the installation procedure. See section 2.3.3 Installing from the Development Source Tree.
You need to have all the MGM and DB nodes up and running first, and this will probably be the most time-consuming part of the configuration, if for no other reason than because we will assume that you are already familiar with MySQL to a certain extent. As for the MySQL configuration and the `my.cnf' file, this is very straightforward, and this section only covers the differences from configuring MySQL without clustering.
This section describes how to quickly configure and start the simplest possible MySQL Cluster setup. This will make you familiar with the basic concepts. Then, read the other sections and design your desired setup.
A couple of directories need to be created (as root):
shell> mkdir /var/lib/mysql-cluster
shell> mkdir /var/lib/mysql-cluster/node1
shell> mkdir /var/lib/mysql-cluster/node2
In the directory of node1, create a file `config.ini' with the following contents:
# file "config.ini" - showing minimal setup with 1 DB node # This file is placed in the start directory of ndb_mgmd, # the management server. [COMPUTER DEFAULT] [DB DEFAULT] NoOfReplicas: 1 [API DEFAULT] [MGM DEFAULT] ArbitrationRank: 2 [TCP DEFAULT] PortNumber: 28002 [COMPUTER] Id:1 HostName: localhost [MGM] Id:1 ExecuteOnComputer: 1 PortNumber: 2200 [DB] Id: 2 ExecuteOnComputer: 1 FileSystemPath: /var/lib/mysql-cluster/node2/ [API] Id: 3 ExecuteOnComputer: 1
In the same directory (`node1'), also create a file `Ndb.cfg' (note the capital N) containing the following information:
nodeid=1;host=localhost:2200
You can now start the management server. Do this as follows:
shell> cd /var/lib/mysql-cluster/node1
shell> ndb_mgmd -c config.ini -d
In the `node2' directory, also create a file `Ndb.cfg' but with this information:
nodeid=2;host=localhost:2200
To start the single DB node in this simple setup:
shell> cd /var/lib/mysql-cluster/node2
shell> ndbd -d
Please note that if you installed from the binary tarball, you will need to explicitly specify the path of the ndb_mgmd and ndbd servers. They would be in the `/usr/local/mysql/bin' directory.
Finally, go to the MySQL data directory (`/var/lib/mysql' or `/usr/local/mysql/data') and ensure that the `my.cnf' file contains the option necessary to enable the NDBCluster storage engine:

[mysqld]
ndbcluster
Create a file `Ndb.cfg' here (i.e., in the MySQL data directory) with this information:
nodeid=3;host=localhost:2200
You can now start the MySQL server as usual:
shell> mysqld_safe --user=mysql &
Wait a moment to make sure the MySQL server is running properly. If you see a notice ``mysql ended'', you need to check the server's `.err' file to find out what went wrong.
If all went well so far, we can now start using the cluster:
shell> mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.3-beta-max-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW STORAGE ENGINES;
+------------+---------+-------------------------------------------...
| Engine     | Support | Comment                                   ...
+------------+---------+-------------------------------------------...
...
| NDBCLUSTER | YES     | Clustered, fault tolerant memory based tables
| NDB        | YES     | Alias for NDBCLUSTER                      ...
...

mysql> USE test;
Database changed

mysql> CREATE TABLE ctest (i INT) ENGINE=NDBCLUSTER;
Query OK, 0 rows affected (0.59 sec)

mysql> SHOW CREATE TABLE ctest \G
*************************** 1. row ***************************
       Table: ctest
Create Table: CREATE TABLE `ctest` (
  `i` int(11) default NULL
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
As you will have noticed, there are still some aspects of the initial configuration that can be simplified and otherwise improved, and we are working on this. However, it may already pass the famous ``15 minute test'' for being able to install a working setup. Clustering is yours. Enjoy.
The configuration of MySQL Cluster is contained in a configuration file read by the management server and distributed to all processes involved in the cluster. This file contains a description of all involved nodes in the cluster, configuration parameters for the storage nodes and configuration parameters for all connections between the nodes in the cluster.
Currently the configuration file is in INI format, and is named `config.ini'. It is placed in the start directory of ndb_mgmd (the management server).
Preset defaults are available for most parameters, and defaults can be specified in the `config.ini' file itself. For example, if all DB nodes use the same size of data memory, which is not the same as the preset default size, then a DB DEFAULT entry can be created that specifies the default data memory size for all DB node entries.
The INI format consists of sections preceded by section headings (surrounded by square brackets), followed by settings and values. One deviation from the standard format is that the setting name and value are separated by a colon (`:') instead of an equal sign (`='); another is that the sections are not unique. Instead, unique entries (such as two nodes of the same type) are identified by a unique ID.
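For example, the following fragment uses colon-separated settings, and the repeated [DB] sections are distinguished by their Id values:

[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
[DB]
Id: 3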
A minimal configuration file needs to define the computers involved in the cluster, the nodes in the cluster, and which computers these nodes are placed on.
An example of a still fairly minimalistic configuration file for a cluster with one management server, two storage nodes and two MySQL servers is shown below:
# file "config.ini" - 2 DB nodes and 2 mysqld # This file is placed in the start directory of ndb_mgmd, # the management server. [COMPUTER DEFAULT] [DB DEFAULT] NoOfReplicas: 2 [API DEFAULT] [MGM DEFAULT] ArbitrationRank: 2 [TCP DEFAULT] PortNumber: 28002 [COMPUTER] Id: 1 Hostname: ndb_mgmd.mysql.com [COMPUTER] Id: 2 HostName: ndbd_2.mysql.com [COMPUTER] Id: 3 HostName: ndbd_3.mysql.com [COMPUTER] Id: 4 HostName: mysqld_4.mysql.com [COMPUTER] Id: 5 HostName: mysqld_5.mysql.com [MGM] Id: 1 ExecuteOnComputer: 1 PortNumber: 2200 [DB] Id: 2 ExecuteOnComputer: 2 FileSystemPath: /var/lib/mysql-cluster/node2/ [DB] Id: 3 ExecuteOnComputer: 3 FileSystemPath: /var/lib/mysql-cluster/node3/ [API] Id: 4 ExecuteOnComputer: 4 [API] Id: 5 ExecuteOnComputer: 5 #[TCP] #NodeId1: 2 #NodeId2: 3 #SendBufferMemory: 512K
There are six different sections in the config file. COMPUTER defines the computers in the cluster. DB defines the storage nodes in the cluster. API defines the MySQL server nodes in the cluster. MGM defines the management server node in the cluster. TCP defines TCP/IP connections between nodes in the cluster; TCP/IP is the default connection mechanism between two nodes. SHM defines shared-memory connections between nodes. This is only available in binaries that have been built with the --with-ndb-shm flag.
For each section one can define DEFAULTs. All parameters are currently case sensitive.
The COMPUTER section has no real significance other than serving as a way to avoid defining host names for each node in the system. All parameters mentioned here are mandatory.
[COMPUTER]Id
[COMPUTER]HostName
The MGM section is used to configure the behaviour of the management server in various aspects. The mandatory parameters are Id, ExecuteOnComputer and PortNumber. All other parameters can be left out and will in that case receive the default value.
[MGM]Id
[MGM]ExecuteOnComputer
[MGM]PortNumber
[MGM]LogDestination
CONSOLE means sending the log to stdout:

CONSOLE

SYSLOG means sending it to a syslog facility. It is necessary to specify the facility for this parameter. The possible facilities are auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, syslog, user, uucp, local0, local1, local2, local3, local4, local5, local6, local7. Note that not every facility is necessarily supported by every operating system. For example:

SYSLOG:facility=syslog

FILE means sending the cluster log to a regular file on the machine. It is necessary to specify the name of this file, the maximum size of the file before a new file is opened and the old one is renamed with the filename extended by .x, where x is the next number not yet used for the file. It is also necessary to specify the maximum number of rolled files. For example:

FILE:filename=cluster.log,maxsize=1000000,maxfiles=6

Multiple log destinations can be specified as in the following example:

CONSOLE;SYSLOG:facility=local0;FILE:filename=/var/log/mgmd

The default of this parameter is FILE:filename=cluster.log,maxsize=1000000,maxfiles=6.
[MGM]ArbitrationRank
[MGM]ArbitrationDelay
The DB section is used to configure the behaviour of the storage nodes. There are many parameters specified that control the buffer sizes, pool sizes, time-out parameters, and so forth. The only mandatory parameters are Id, ExecuteOnComputer, and NoOfReplicas. Most parameters should be set in the DB DEFAULT section. Only parameters explicitly stated as possible to have local values are allowed to be changed in the DB section. Id and ExecuteOnComputer need to be defined in the local DB section.

The first parameters to define are all mandatory; every one of them except NoOfReplicas should be defined per storage node.
For each parameter it is possible to use the suffixes k, M, and G, which stand for 1024, 1024*1024, and 1024*1024*1024 respectively. So 100k means 102400. Parameters and values are currently case sensitive.
The maximum, minimum, and default values of the parameters are currently being revised to improve the experience for first-time users. Some unnecessarily low maximum values will also be changed soon.
[DB]Id
[DB]ExecuteOnComputer
[DB]NoOfReplicas
As an example, suppose NoOfReplicas is set to 2 and there are four storage nodes with node ids 2, 3, 4, and 5. Then the first node group is formed by nodes 2 and 3, and the second node group by nodes 4 and 5. It is important to configure the cluster in such a manner that nodes in the same node group are not placed on the same computer; otherwise a single hardware failure could cause a cluster crash.
There is no default value and the maximum number is 4.
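For this example, the relevant `config.ini' entries could look as follows (computer ids refer to [COMPUTER] entries defined elsewhere in the file); nodes 2 and 3 form one node group and should run on different computers, as should nodes 4 and 5:

[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
ExecuteOnComputer: 2
[DB]
Id: 3
ExecuteOnComputer: 3
[DB]
Id: 4
ExecuteOnComputer: 4
[DB]
Id: 5
ExecuteOnComputer: 5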
[DB]FileSystemPath
This parameter specifies the directory where the files created by this node are placed; the directory must exist before starting the ndbd process. If you use the recommended directory hierarchy, you will first need to create the `/var/lib/mysql-cluster' directory, and then a subdirectory for each DB node running on this computer. We suggest you simply name these node1, node2, etc.
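For example, on a computer that is to run two DB nodes:

shell> mkdir /var/lib/mysql-cluster
shell> mkdir /var/lib/mysql-cluster/node1
shell> mkdir /var/lib/mysql-cluster/node2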
DataMemory and IndexMemory are the parameters that specify the size of the memory segments used to store the actual records and their indexes. It is important to understand how DataMemory and IndexMemory are used in order to set these parameters, which for most use cases will need to be updated to reflect the usage of the cluster.
[DB]DataMemory
DataMemory will be allocated in memory, so it is important that the machine contains enough memory to handle the DataMemory size. DataMemory is used to store two things. First, it stores the actual records. Each record is currently of fixed size, so VARCHAR fields are stored as fixed-size fields. There is normally an overhead of 16 bytes per record. Additionally, each record is stored in a 32 kByte page with a 128 byte page overhead. There will also be a small amount of waste per page, since each record is stored in only one page. The maximum record size for the fields is currently 8052 bytes.
DataMemory is also used to store ordered indexes. Ordered indexes use about 10 bytes per record. Each record in the table is always represented in the ordered index.
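As a rough worked sizing example (a sketch only; it ignores per-page overhead and waste): consider records with three INT NOT NULL fields and a single ordered index.

  3 INT fields:        12 bytes of data
  record overhead:     16 bytes
  one ordered index:  ~10 bytes
  total:              ~38 bytes per record

At that rate, 1,000,000 records need on the order of 38 MBytes of DataMemory, before page overhead.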
DataMemory consists of 32 kByte pages. These pages are allocated to partitions of the tables. Each table is normally partitioned with the same number of partitions as there are storage nodes in the cluster. Thus for each node there are as many partitions (fragments) as NoOfReplicas is set to. Once a page has been allocated to a partition, it is currently not possible to return it to the pool of free pages; the only method of restoring pages to the pool is to delete the table. Performing a node recovery will also compress the partition, since all records are inserted into an empty partition from another live node.
Another important aspect is that DataMemory also contains UNDO information for records. For each update of a record, a copy record is allocated in DataMemory, and each copy record also has an instance in the ordered indexes of the table. Unique hash indexes are updated only when the unique index fields are updated; in that case a new entry is inserted in the index table and the old entry is deleted at commit. Thus it is necessary to also allocate enough memory to handle the largest transactions performed in the cluster.
Performing large transactions has no advantage in MySQL Cluster other than the consistency of using transactions, which is the whole idea of transactions; it is not faster, and it consumes large amounts of memory.
The default DataMemory size is 80000 kBytes. The minimum size is 1 MByte and the maximum size is slightly more than 3 GBytes. In practice the maximum size has to be adapted so that the process does not start swapping when using the maximum amount of memory.
[DB]IndexMemory
IndexMemory is the parameter that controls the amount of storage used for hash indexes in MySQL Cluster. Hash indexes are always used for primary key indexes, unique indexes, and unique constraints. In fact, when defining a primary key and a unique index, two indexes will be created in MySQL Cluster. One of them is a hash index, which is used for all tuple accesses and for lock handling; it is also used to enforce unique constraints.
The size of the hash index is 25 bytes plus the size of the primary key.
For primary keys larger than 32 bytes another 8 bytes is added for some
internal references.
Thus for a table defined as

CREATE TABLE example (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY(a),
  UNIQUE(b)
) ENGINE=NDBCLUSTER;

we will have 12 bytes of overhead (having no nullable fields saves 4 bytes of overhead) plus 12 bytes of data per record. In addition, we will have two ordered indexes, on a and b, consuming about 10 bytes each per record. We will also have a primary key hash index in the base table with roughly 29 bytes per record. The unique constraint is implemented by a separate table with b as primary key and a as a field. This table consumes another 29 bytes of index memory per record, and also 12 bytes of overhead plus 8 bytes of data in the record part. Thus for 1 million records, we need 58 MBytes of index memory to handle the hash indexes for the primary key and the unique constraint. For the DataMemory part we need 64 MBytes of memory to handle the records of the base table and the unique index table, plus the two ordered index tables.
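Continuing this example, one might size the parameters with some headroom in a DB DEFAULT entry; the exact values here are illustrative, not recommendations:

[DB DEFAULT]
DataMemory: 80M
IndexMemory: 64M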
The conclusion is that hash indexes take up a fair amount of memory space, but in return they provide very fast access to the data. They are also used in MySQL Cluster to handle uniqueness constraints.

Currently the only partitioning algorithm is hashing, and the ordered indexes are local to each node and thus cannot be used to handle uniqueness constraints in the general case.
An important point for both IndexMemory and DataMemory is that the total database size is the sum of all DataMemory and IndexMemory in each node group. Each node group is used to store replicated information, so if there are four nodes with 2 replicas, there will be two node groups, and thus the total DataMemory available is 2*DataMemory in each of the nodes.
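For example, with four storage nodes, NoOfReplicas set to 2, and DataMemory set to 80 MBytes on every node:

  node groups:       4 nodes / 2 replicas = 2
  total DataMemory:  2 * 80 MBytes = 160 MBytes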
Another important point concerns changes of DataMemory and IndexMemory. First of all, it is highly recommended to have the same amount of DataMemory and IndexMemory in all nodes. Since data is distributed evenly over all nodes in the cluster, the space available is no better than the smallest node's size times the number of node groups.
DataMemory and IndexMemory can be changed, but it is dangerous to decrease them, since that can easily lead to a node, or even the whole cluster, not being able to restart because there is not enough memory space for the tables needed to restore into the starting node. Increasing them should be fine, but it is recommended that such upgrades are performed in the same manner as a software upgrade: first the configuration file is updated, then the management server is restarted, and then one storage node at a time is restarted by command.
Updates do not consume more IndexMemory, but inserts take effect immediately, whereas deleted entries are not removed until the transaction is committed.
The default IndexMemory size is 24000 kBytes. The minimum size is 1 MByte and the maximum size is around 1.5 GBytes.
The next two parameters are important because they affect the number of parallel transactions and the sizes of transactions that can be handled by the system. MaxNoOfConcurrentTransactions sets the number of parallel transactions possible in a node, and MaxNoOfConcurrentOperations sets the number of records that can be in the update phase or locked simultaneously.

Both of these parameters, and particularly MaxNoOfConcurrentOperations, are likely targets for users setting specific values rather than using the default. The default value is set for systems using small transactions, to ensure that not too much memory is used in the default case.
[DB]MaxNoOfConcurrentTransactions
[DB]MaxNoOfConcurrentOperations
The next set of parameters is used for temporary storage while executing part of a query in the cluster. All of these records will have been released by the time the query part is completed and the transaction is waiting for commit or rollback.
The defaults for these parameters will be fine for most users. Some high-end users might want to increase them to enable more parallelism in the system, and some low-end users might want to decrease them to save memory.
[DB]MaxNoOfConcurrentIndexOperations
For queries using a unique hash index, another set of operation records is temporarily used during the execution of the query; this parameter sets the size of that pool. The state needed to handle aborts and commits is handled by the normal operation records, whose pool size is set by MaxNoOfConcurrentOperations.

The default value of this parameter is 8192. Only in rare cases of extremely high parallelism using unique hash indexes should this parameter need to be increased. It can be decreased to save memory if the DBA is certain that such high parallelism does not occur in the cluster.
[DB]MaxNoOfFiredTriggers
The default value of MaxNoOfFiredTriggers is 1000, which should be sufficient for most systems. In some cases it can be decreased if the DBA feels certain that the parallelism in the cluster is not that high.
This record is used when an operation is performed that affects a unique hash index. Updating a field that is part of a unique hash index, or inserting/deleting a record in a table with unique hash indexes, will fire an insert or delete in the index table. This record is used to represent the index table operation while it is waiting for the original operation that fired it to complete. Thus it is short-lived, but its pool can still need a fair number of records for temporary situations with many parallel write operations on a base table containing a set of unique hash indexes.
[DB]TransactionBufferMemory
The default value of TransactionBufferMemory is 1000 kBytes.
[DB]MaxNoOfConcurrentScans
The default value of MaxNoOfConcurrentScans is 25. The maximum value is 500.
[DB]NoOfFragmentLogFiles
[DB]MaxNoOfSavedMessages
The next set of parameters defines the pool sizes for metadata objects. It is necessary to define the maximum number of attributes, tables, indexes, and trigger objects used by indexes, events, and replication between clusters.
[DB]MaxNoOfAttributes
[DB]MaxNoOfTables
[DB]MaxNoOfIndexes
[DB]MaxNoOfTriggers
There is a set of boolean parameters affecting the behaviour of storage nodes. A boolean parameter can be set to true by specifying Y or 1, and to false by specifying N or 0. An example follows the list below.
[DB]LockPagesInMainMemory
[DB]StopOnError
[DB]RestartOnErrorInsert
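For example (the values here are purely illustrative, not recommendations):

[DB DEFAULT]
LockPagesInMainMemory: N
StopOnError: Y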
There are quite a few parameters specifying time-outs and time intervals between various actions in the storage nodes. Most of the time-outs are specified in milliseconds with a few exceptions which will be mentioned below.
[DB]TimeBetweenWatchDogCheck
[DB]StartPartialTimeout
[DB]StartPartitionedTimeout
If the cluster is ready to start after waiting StartPartialTimeout but is still in a possibly partitioned state, the node also waits until this timeout has passed. The default timeout is 60000 milliseconds (60 seconds).
[DB]StartFailureTimeout
[DB]HeartbeatIntervalDbDb
[DB]HeartbeatIntervalDbApi
[DB]TimeBetweenLocalCheckpoints
[DB]TimeBetweenGlobalCheckpoints
[DB]TimeBetweenInactiveTransactionAbortCheck
[DB]TransactionInactiveTimeout
[DB]TransactionDeadlockDetectionTimeout
[DB]NoOfDiskPagesToDiskAfterRestartTUP
This parameter works together with NoOfDiskPagesToDiskAfterRestartACC. It limits the speed of writing data pages from DataMemory, and thus specifies how quickly local checkpoints will be executed. This parameter is important in connection with NoOfFragmentLogFiles, DataMemory, and IndexMemory.

The default value is 10, meaning 800 kBytes of data pages per second.
[DB]NoOfDiskPagesToDiskAfterRestartACC
This parameter works like NoOfDiskPagesToDiskAfterRestartTUP, but limits the speed of writing index pages from IndexMemory.

The default value of this parameter is 5, meaning 400 kBytes per second.
[DB]NoOfDiskPagesToDiskDuringRestartTUP
This parameter works like NoOfDiskPagesToDiskAfterRestartTUP and NoOfDiskPagesToDiskAfterRestartACC, but applies to the local checkpoint executed in a node as part of its restart; a local checkpoint is always performed as part of every node restart. During a node restart it is possible to write to disk at a higher speed, since fewer activities are performed in the node due to the restart phase. This parameter handles the DataMemory part.

The default value is 40, meaning 6.4 MBytes per second.
[DB]NoOfDiskPagesToDiskDuringRestartACC
This parameter handles the IndexMemory part of the local checkpoint during a node restart.

The default value is 20, meaning 3.2 MBytes per second.
[DB]ArbitrationTimeout
For management of the cluster, it is important to be able to control the number of log messages sent to stdout for various event types. The possible events will be listed in this manual soon. There are 16 possible levels, from level 0 to level 15. Setting event reporting for a category to level 15 means receiving all event reports of that category; setting it to 0 means getting no event reports in that category.
The reason most defaults are set to 0, thus causing no output to stdout, is that the same message is sent to the cluster log in the management server. Only the start-up message is generated to stdout by default.
A similar set of levels can be set in the management client to define which levels to record in the cluster log.
[DB]LogLevelStartup
[DB]LogLevelShutdown
[DB]LogLevelStatistic
[DB]LogLevelCheckpoint
[DB]LogLevelNodeRestart
[DB]LogLevelConnection
[DB]LogLevelError
[DB]LogLevelInfo
There is a set of parameters defining memory buffers set aside for on-line backup execution.
[DB]BackupDataBufferSize
This buffer is used to hold data recorded by scanning the tables in the node; once it has been filled to a certain level, the pages are written to disk, and that level is set by the parameter BackupWriteSize. While data is being sent to the disk, the backup can continue filling this buffer until it runs out of buffer space. When running out of buffer space, it simply stops the scan and waits until some disk writes return and thus free up memory buffers to use for further scanning.

The default is 2 MBytes.
[DB]BackupLogBufferSize
This parameter works like BackupDataBufferSize, except that when this buffer runs out of space, the backup fails due to lack of backup buffers. Thus the size of this buffer must be big enough to handle the load caused by write activities during the backup execution.
The default for this parameter should be big enough. Actually, it is more likely that a backup failure is caused by a disk that cannot write as quickly as it should. If the disk subsystem is not dimensioned for the write load caused by the applications, the cluster will have great difficulty performing the desired actions. It is important to dimension the nodes in such a manner that the processors become the bottleneck rather than the disks or the network connections.

The default is 2 MBytes.
[DB]BackupMemory
This parameter is simply the sum of BackupDataBufferSize and BackupLogBufferSize.

The default is 4 MBytes.
[DB]BackupWriteSize
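As an illustration of the syntax, here is a sketch that sets these backup buffers explicitly to the documented defaults:

[DB DEFAULT]
BackupDataBufferSize: 2M
BackupLogBufferSize: 2M
BackupMemory: 4M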
The API section defines the behaviour of the MySQL server nodes. Id, ArbitrationRank, and ExecuteOnComputer are mandatory.
[API]Id
[API]ExecuteOnComputer
[API]ArbitrationRank
[API]ArbitrationDelay
TCP/IP connections are the default transport mechanism for MySQL Cluster. It is actually not necessary to define any connection, since one connection will be set up between each pair of storage nodes, between each storage node and all MySQL server nodes, and between each storage node and the management server.
It is only necessary to define a connection if the default values of the connection need to be changed. In that case it is necessary to define at least NodeId1, NodeId2, and the parameters to change.
It is also possible to change the default values by setting the parameters in the TCP DEFAULT section.
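For example, the following section (it matches the commented-out lines in the sample `config.ini' above) increases the send buffer for the connection between nodes 2 and 3:

[TCP]
NodeId1: 2
NodeId2: 3
SendBufferMemory: 512K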
[TCP]NodeId1
[TCP]NodeId2
The connection between two nodes is identified by providing the node ids of both nodes in NodeId1 and NodeId2.
[TCP]SendBufferMemory
[TCP]SendSignalId
[TCP]Checksum
[TCP]PortNumber
[TCP]ReceiveBufferMemory
Shared memory segments are currently supported only in special builds of MySQL Cluster using the configure parameter --with-ndb-shm. Its implementation will most likely change. When defining shared memory as the connection method, it is necessary to define at least NodeId1, NodeId2, and ShmKey. All other parameters have default values that will work fine in most cases.
[SHM]NodeId1
[SHM]NodeId2
The connection between two nodes is identified by providing the node ids of both nodes in NodeId1 and NodeId2.
[SHM]ShmKey
[SHM]ShmSize
[SHM]SendSignalId
[SHM]Checksum
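A minimal sketch of such a section; the ShmKey value here is a placeholder you must choose yourself:

[SHM]
NodeId1: 2
NodeId2: 3
ShmKey: 85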
There are four processes that are important to know about when using MySQL Cluster. We will cover how to work with those processes, which options to use when starting them, and so forth:

mysqld, the MySQL Server Process
ndbd, the Storage Engine Node Process
ndb_mgmd, the Management Server Process
ndb_mgm, the Management Client Process
mysqld is the traditional MySQL server process. To be used with MySQL Cluster, it needs to be built with support for the NDBCluster storage engine. Even if the mysqld binary has been built in such a manner, the NDBCluster storage engine is disabled by default.
There are two ways to enable the NDBCluster storage engine: either use --ndbcluster as a start-up option when starting mysqld, or insert a line with ndbcluster in the [mysqld] section of your `my.cnf' file. Both ways are shown below.
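For example, using mysqld_safe as in the quick-start above:

shell> mysqld_safe --ndbcluster --user=mysql &

or, equivalently, in `my.cnf':

[mysqld]
ndbcluster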
An easy way to verify that your server runs with support for the NDBCluster storage engine is to issue the command SHOW TABLE TYPES from a mysql client. You should see YES for the row listing NDBCLUSTER. If you see NO, you are not running a mysqld that is compiled with NDBCluster support enabled. If you see DISABLED, then you need to enable it in the `my.cnf' configuration file.
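For instance (output abbreviated; the columns are the same as in the SHOW STORAGE ENGINES output shown earlier):

mysql> SHOW TABLE TYPES;
...
| NDBCLUSTER | YES     | Clustered, fault tolerant memory based tables
...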
The MySQL server needs to know how to get the configuration of the cluster. To access this configuration it needs to know three things: its own node id in the cluster, the hostname (or IP address) where the management server resides, and the port on which it can connect to the management server.
There are currently two possible ways to provide this information to the mysqld process. The first option is to include it in a file called `Ndb.cfg', which should reside in the data directory of the MySQL server. The second option is to set an environment variable called NDB_CONNECTSTRING. The string is the same in both cases:

host=ndb_mgmd.mysql.com:2200

Here, ndb_mgmd.mysql.com is the host where the management server resides, and it is listening on port 2200.
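As a sketch of the second option in a Bourne-style shell:

shell> NDB_CONNECTSTRING="host=ndb_mgmd.mysql.com:2200"
shell> export NDB_CONNECTSTRING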
With this set-up, the MySQL server will be a full citizen of MySQL Cluster, fully aware of all storage nodes in the cluster and their status. It will set up connections to all storage engine nodes and will be able to use any of them as transaction coordinator and to access their data for reading and updating.
ndbd, the Storage Engine Node Process
ndbd is the process that is used to handle all the data in tables using the NDBCluster storage engine. This is the process that contains all the logic of distributed transaction handling, node recovery, checkpointing to disk, on-line backup, and lots of other functionality. In a cluster there is a set of ndbd processes cooperating in handling the data. These processes can execute on the same computer or on different computers, in a completely configurable manner.

Each ndbd process should be started from a separate directory, because ndbd generates a set of log files in its starting directory. These log files are:
error.log is a file that contains information on all the crashes the ndbd process has encountered, together with a short error string and a reference to a trace file for each crash. An entry could look like this:

Date/Time: Saturday 31 January 2004 - 00:20:01
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtupFixAlloc.cpp
Object of reference: DBTUP (Line: 173)
ProgramName: NDB Kernel
ProcessID: 14909
TraceFile: NDB_TraceFile_1.trace
***EOM***
NDB_TraceFile_1.trace is a trace file describing exactly what happened before the error occurred. This information is useful to the MySQL Cluster team when analysing any bugs occurring in MySQL Cluster. The information in this file will be described in the section MySQL Cluster Troubleshooting. There can be a configurable number of these trace files in the directory before old files are overwritten. 1 in this context is simply the number of the trace file.
NextTraceFileNo.log is the file keeping track of the number of the next trace file.
node2.out is the file that contains any data printed by the ndbd process when executing as a daemon process. 2 in this context is the node id. This file exists only when starting ndbd as a daemon, since then stdout and stderr are redirected to it.
node2.pid is a file used to ensure that only one ndb node is started with this node id. This is the normal pid-file created when starting a daemon. 2 in this context is the node id. This file exists only when starting ndbd as a daemon process.
Signal.log is a file that is used only in debug versions of ndbd, where it is possible to trace all incoming, outgoing, and internal messages with their data in the ndbd process.
It is recommended not to use a directory mounted through NFS, since in some environments that can cause problems with the lock on the pid-file remaining even after the process has stopped.
When starting the ndbd process, it is also necessary to specify the node id the process is to use, the host of the management server, and the port on which it is listening. Again, there are two ways of specifying this information: either in a string in the file `Ndb.cfg', which should be stored in the starting directory of the ndbd process, or by setting the environment variable NDB_CONNECTSTRING before starting the process. For example:

nodeid=2;host=ndb_mgmd.mysql.com:2200
When ndbd starts, it will actually start two processes. The starting process is called the "angel" and its only job is to discover when the execution process has completed, and then, if configured to do so, to restart the ndbd process. Thus if one attempts to kill ndbd with the Unix kill command, it is necessary to kill both processes.
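A sketch of doing this by hand; the process ids below are placeholders, so look up the real ones first (the angel is the parent of the execution process):

shell> ps aux | grep ndbd
shell> kill 14908 14909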
The execution process uses one thread for all activities: reading, writing, and scanning data, and all other activities. This thread is designed with asynchronous programming so it can easily handle thousands of concurrent activities. In addition, there is a watch-dog thread supervising the execution thread to ensure that it does not stop in an eternal loop or some other problem. There is a pool of threads handling file I/O, where each thread can handle one open file. In addition, threads can be used for connection activities of the transporters in the ndbd process. Thus in a system that performs a large number of activities, including update activities, the ndbd process will consume up to about 2 CPUs if allowed to. In a large SMP box with many CPUs it is therefore recommended to use several ndbd processes, configured to be part of different node groups.
ndb_mgmd, the Management Server Process

The management server is the process that reads the configuration file of the cluster and distributes this information to all nodes in the cluster that request it. It also maintains the log of cluster activities. Management clients can connect to the management server and use commands to check the status of the cluster in various respects.
When starting ndb_mgmd, it is also necessary to state the same information as for the ndbd and mysqld processes, and again there are two options: using the file `Ndb.cfg' or the environment variable NDB_CONNECTSTRING. If used, the `Ndb.cfg' file is placed in the start directory of ndb_mgmd. For example:

nodeid=1;host=ndb_mgmd.mysql.com:2200
The following files are created or used by ndb_mgmd in its start directory:

config.ini is the configuration file for the cluster, as described in the section MySQL Cluster Configuration.
cluster.log is the file where events in the cluster are reported. Examples of events are checkpoints started and completed, node failures, nodes starting, levels of memory usage passed, and so forth. The reported events are described in section 17.5 Management of MySQL Cluster.
node1.out is the file used for stdout and stderr when executing the management server as a daemon process. 1 in this context is the node id.
node1.pid is the pid file used when executing the management server as a daemon process. 1 in this context is the node id.
cluster.log.1: when the cluster log becomes bigger than 1 million bytes, the file is renamed cluster.log.1, where 1 is the number of the cluster log file. So if files 1, 2, and 3 already exist, the next one will be numbered 4 instead.
ndb_mgm, the Management Client Process

The final important process to know about is the management client. This process is not needed to run the cluster. Its value lies in its ability to check the status of the cluster, start backups, and perform other management activities. It does so by providing access to a set of commands.
Actually, the management client uses a C API to access the management server, so for advanced users it is also possible to program dedicated management processes that can do things similar to what the management client does.
When starting the management client it is necessary to state the hostname and port of the management server as in the example below.
shell> ndb_mgm localhost 2200
The following sections describe the start-up options for each of the MySQL Cluster processes: mysqld, ndbd, ndb_mgmd, and ndb_mgm.

mysqld
--ndbcluster
If the binary includes support for the NDBCluster storage engine, the default disabling of NDB support can be overruled by using this option. Using the NDBCluster storage engine is necessary for using MySQL Cluster.
--skip-ndbcluster
Disables the NDBCluster storage engine. This is the default for binaries where it is included, so this option only applies if the server was configured to use the NDBCluster storage engine.
ndbd
-?, --usage
-c string, --connect-string string
For ndbd, it is also possible to set the connect string to the management server as a command option:

nodeid=2;host=ndb_mgmd.mysql.com:2200
-d, --daemon
Instructs ndbd to execute as a daemon process.
-i, --initial
Instructs ndbd to perform an initial start. An initial start will erase any files created for recovery by earlier ndbd instances. It will also re-create recovery log files, which on some operating systems can take a substantial amount of time.
-n, --no-start
Instructs ndbd not to start automatically. ndbd will connect to the management server, get the configuration, and initialise communication objects, but it will not start the execution engine until requested to do so by the management server. The management server issues such a request on command from the management client.
-s, --start
Instructs the ndbd process to start immediately. This is the default behavior, so the option is not really needed.
-v, --version
Prints the version number of the ndbd process. The version number is the MySQL Cluster version number. This version number is important because at start-up the MySQL Cluster processes verify that the versions of the processes in the cluster can co-exist. It is also important for on-line software upgrades of MySQL Cluster (see the section Software Upgrade of MySQL Cluster).
ndb_mgmd
-?, --usage
-c filename
-d
Instructs ndb_mgmd to start as a daemon process.
-l filename
-n
--version
ndb_mgm
-?, --usage
[host_name [port_num]]
--try-reconnect=number
Managing a MySQL Cluster involves a number of activities. The first activity is to configure and start-up MySQL Cluster. This is covered by the sections section 17.3 MySQL Cluster Configuration and section 17.4 Process Management in MySQL Cluster. This section covers how to manage a running MySQL Cluster.
There are essentially two ways of actively managing a running MySQL Cluster. The first is through commands entered into the management client, with which the status of the cluster can be checked, log levels changed, backups started and stopped, and nodes stopped and started. The second method is to study the output in the cluster log. The cluster log is written to cluster.log in the directory where the management server was started. The cluster log contains event reports generated by the ndbd processes in the cluster. It is also possible to send the cluster log entries to a Unix system log.
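For example, to follow the cluster log as it grows (assuming the management server was started in /var/lib/mysql-cluster/node1, as in the quick-start above):

shell> tail -f /var/lib/mysql-cluster/node1/cluster.log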