Slony-I 2.0.4 Documentation
When upgrading Slony-I, the installation on all nodes in a cluster must be upgraded at once, using the slonik command SLONIK UPDATE FUNCTIONS.
While this requires temporarily stopping replication, it does not necessarily require an outage for applications that submit updates.
The proper upgrade procedure is thus:
Stop the slon processes on all nodes (i.e. the old version of slon).
Install the new version of slon software on all nodes.
Execute a slonik script containing the command update functions (id = [whatever]); for each node in the cluster.
Note: Remember that your slonik upgrade script, like all other slonik scripts, must contain the proper preamble commands to function.
Start all slons.
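Taken together, the steps above amount to running a small slonik script against the cluster. A minimal sketch, assuming a hypothetical two-node cluster named mycluster (substitute your own preamble, conninfo values, and node ids):

```
# hypothetical preamble -- use your own cluster name and conninfo values
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

update functions (id = 1);
update functions (id = 2);
```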
The overall operation is relatively safe: If there is any mismatch between component versions, the slon will refuse to start up, which provides protection against corruption.
You need to ensure that the C library containing the SPI trigger functions has been copied into place in the PostgreSQL build. There are several possible approaches to this:
The easiest and safest is to have two separate PostgreSQL builds, one for each Slony-I version, where the postmaster is shut down and then restarted against the "new" build; that approach requires a brief database outage on each node.
While that approach has been found to be easier and safer, nothing prevents one from carefully copying Slony-I components for the new version into place to overwrite the old version as the "install" step. That might not work on Windows™ if it locks library files that are in use.
If you build Slony-I from sources on the same system on which it is to be deployed, overwriting the old version with the new is as easy as make install. There is no need to restart a database backend; you need only stop slon processes, run the UPDATE FUNCTIONS script, and start new slon processes.
Unfortunately, this approach requires having a build environment on the same host as the deployment. That may not be consistent with efforts to use common PostgreSQL and Slony-I binaries across a set of nodes.
With the two-builds approach, the old PostgreSQL build with old Slony-I components persists after switching to a new PostgreSQL build with new Slony-I components. To switch to the new Slony-I build, you must restart the PostgreSQL postmaster, thereby interrupting applications, so that it becomes aware of the location of the new components.
Usually, upgrades between Slony-I versions have required no special attention to the condition of the existing replica. That is, you need merely stop slons, put new binaries in place, run SLONIK UPDATE FUNCTIONS against each node, and restart slons. Schema changes have been internal to the cluster schema, and SLONIK UPDATE FUNCTIONS has been capable of making all of the needed alterations. With version 2, this changes if there are tables that used SLONIK TABLE ADD KEY. Version 2 does not support the "extra" column, and "fixing" the schema to have a proper primary key is not within the scope of what SLONIK UPDATE FUNCTIONS can perform.
When upgrading from versions 1.0.x, 1.1.x, or 1.2.x to version 2, it will be necessary to have already eliminated any such Slony-I-managed primary keys.
One may identify the tables affected via the following SQL query:

```
select n.nspname, c.relname
  from pg_class c, pg_namespace n
 where c.oid in (select attrelid
                   from pg_attribute
                  where attname like '_Slony-I_%rowID'
                    and not attisdropped)
   and reltype <> 0
   and n.oid = c.relnamespace
 order by n.nspname, c.relname;
```
The simplest approach that may be taken to rectify the "broken" state of such tables is as follows:
Drop the table from replication using the slonik command SLONIK SET DROP TABLE.
This does not drop out the Slony-I-generated column.
On each node, run an SQL script to alter the table, dropping the extra column.
```
alter table whatever drop column "_Slony-I_cluster-rowID";
```
This needs to be run individually against each node. Depending on your preferences, you might wish to use SLONIK EXECUTE SCRIPT to do this.
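If using SLONIK EXECUTE SCRIPT, the request might look roughly like this (the set id, event node, and SQL file path are illustrative; the file would contain the ALTER TABLE statement shown above):

```
execute script (
    set id = 1,
    filename = '/tmp/drop_rowid.sql',
    event node = 1
);
```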
If the table is a heavily updated one, it is worth observing that this alteration will require acquiring an exclusive lock on the table. It will not hold this lock for terribly long; dropping the column should be quite a rapid operation as all it does internally is to mark the column as being dropped; it does not require rewriting the entire contents of the table. Tuples that have values in that column will continue to have that value; new tuples will leave it NULL, and queries will ignore the column. Space for those columns will get reclaimed as tuples get updated.
Note that at this point in the process, this table is not being replicated. If a failure takes place, replication is not, at this point, providing protection on this table. This is unfortunate but unavoidable.
Make sure the table has a legitimate candidate for primary key, some set of NOT NULL, UNIQUE columns.
The possible variations are the reason that the developers have made no attempt to automate this.
If the table is a small one, it may be perfectly reasonable to do alterations (note that they must be applied to every node!) to add a new column, assign it via a new sequence, and then declare it to be a primary key.
If there are only a few tuples, this should take a fraction of a second, and, with luck, be unnoticeable to a running application.
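As a sketch, for a hypothetical small table named whatever, the per-node SQL might look like this (all names are illustrative, and the ALTER statements take an exclusive lock on the table while they run):

```
create sequence whatever_id_seq;
alter table whatever add column id integer;
update whatever set id = nextval('whatever_id_seq');
alter table whatever
      alter column id set not null,
      alter column id set default nextval('whatever_id_seq');
alter table whatever add primary key (id);
```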
Even if the table is fairly large, if it is not frequently accessed by the application, the locking of the table that takes place when you run ALTER TABLE may not cause much inconvenience.
If the table is a large one, and is vital to and heavily accessed by the application, then it may be necessary to take an application outage in order to accomplish the alterations, leaving you necessarily somewhat vulnerable until the process is complete.
If it is troublesome to take outages, then the upgrade to Slony-I version 2 may take some planning...
Create a new replication set (SLONIK CREATE SET) and re-add the table to that set (SLONIK SET ADD TABLE).
If there are multiple tables, they may be handled via a single replication set.
Subscribe the set (SLONIK SUBSCRIBE SET) on all the nodes desired.
Once subscriptions are complete, merge the set(s) in, if desired (SLONIK MERGE SET).
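The steps above can be sketched as slonik commands (set, node, and table ids are placeholders):

```
create set (id = 2, origin = 1, comment = 'tables with repaired primary keys');
set add table (set id = 2, origin = 1, id = 10,
               fully qualified name = 'public.whatever');
subscribe set (id = 2, provider = 1, receiver = 2, forward = yes);
# once the subscription is confirmed as complete:
merge set (id = 1, add id = 2, origin = 1);
```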
This approach should be fine for tables that are relatively small, or infrequently used. If, on the other hand, the table is large and heavily used, another approach may prove necessary, namely to create your own sequence, and "promote" the formerly Slony-I-generated column into a "real" column in your database schema. An outline of the steps is as follows:
Add a sequence that assigns values to the column.
Setup steps will include:
SQL CREATE SEQUENCE (to create the sequence)
SQL SELECT SETVAL() (to set the value of the sequence high enough to reflect values used in the table)
Slonik SLONIK CREATE SET (to create a set to assign the sequence to)
Slonik SLONIK SET ADD SEQUENCE (to assign the sequence to the set)
Slonik SLONIK SUBSCRIBE SET (to set up subscriptions to the new set)
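A sketch of these setup steps, for a hypothetical table whatever in a cluster named mycluster (all names and ids are placeholders); the SQL part runs via psql, the rest via slonik:

```
-- SQL: create the sequence and advance it past existing column values
create sequence whatever_rowid_seq;
select setval('whatever_rowid_seq',
       (select max("_Slony-I_mycluster_rowID") from whatever));
```

```
# slonik: put the sequence under replication
create set (id = 3, origin = 1, comment = 'rowID sequence');
set add sequence (set id = 3, origin = 1, id = 20,
                  fully qualified name = 'public.whatever_rowid_seq');
subscribe set (id = 3, provider = 1, receiver = 2, forward = yes);
```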
Attach the sequence to the column on the table.
This involves ALTER TABLE ALTER COLUMN, which must be submitted via the Slonik command SLONIK EXECUTE SCRIPT.
Rename the column _Slony-I_@CLUSTERNAME@_rowID so that Slony-I won't consider it to be under its control.
This involves ALTER TABLE RENAME COLUMN, which must be submitted via the Slonik command SLONIK EXECUTE SCRIPT.
Note that these two alterations might be accomplished via the same SLONIK EXECUTE SCRIPT request.
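The combined SQL file submitted via SLONIK EXECUTE SCRIPT might read roughly as follows (the cluster name mycluster, table, sequence, and new column names are illustrative):

```
alter table whatever
      alter column "_Slony-I_mycluster_rowID"
            set default nextval('whatever_rowid_seq');
alter table whatever
      rename column "_Slony-I_mycluster_rowID" to row_id;
```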
One of the major changes to Slony-I is that enabling/disabling of triggers and rules now takes place as plain SQL, supported by PostgreSQL 8.3+, rather than via "hacking" on the system catalog.
As a result, Slony-I users should be aware of the PostgreSQL syntax for ALTER TABLE, as that is how they can accomplish what was formerly accomplished via SLONIK STORE TRIGGER and SLONIK DROP TRIGGER.
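For instance, the PostgreSQL 8.3+ forms are as follows (table and trigger names are illustrative):

```
alter table whatever disable trigger my_trigger;
alter table whatever enable replica trigger my_trigger;
      -- fires only when session_replication_role = 'replica'
alter table whatever enable always trigger my_trigger;
      -- fires regardless of session_replication_role
```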
The version 2 branch is substantially different from earlier releases, dropping support for versions of PostgreSQL prior to 8.3, as in version 8.3, support for a "session replication role" was added, thereby eliminating the need for system catalog hacks as well as the not-entirely-well-supported xxid data type.
As a result of the replacement of the xxid type with a (native-to-8.3) PostgreSQL transaction XID type, the slonik command SLONIK UPDATE FUNCTIONS is quite inadequate to the process of upgrading earlier versions of Slony-I to version 2.
In version 2.0.2, we have added a new option to SLONIK SUBSCRIBE SET, OMIT COPY, which allows taking an alternative approach to upgrade which amounts to:
Uninstall old version of Slony-I
When Slony-I uninstalls itself, the alterations it made to the system catalogs are repaired.
Install Slony-I version 2
Resubscribe, with OMIT COPY
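The resubscription might then look like this (a sketch based on the 2.0.2 SLONIK SUBSCRIBE SET syntax; set and node ids are placeholders):

```
subscribe set (id = 1, provider = 1, receiver = 2,
               forward = yes, omit copy = true);
```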
Warning: There is a large "foot gun" here: during part of the process, Slony-I is not installed in any form, and if an application updates one or another of the databases, the resubscription, omitting copying data, will be left with data out of sync. The administrator must take care; Slony-I has no way to help ensure the integrity of the data during this process.
The following process is suggested to help make the upgrade process as safe as possible, given the above risks.
Use Section 21.10 to generate a slonik script to recreate the replication cluster.
Be sure to verify the SLONIK ADMIN CONNINFO statements, as the values are drawn from the PATH configuration, which may not necessarily be suitable for running slonik.
This step may be done before the application outage.
Determine what triggers have SLONIK STORE TRIGGER configuration on subscriber nodes.
As discussed in Section 11, the handling has fundamentally changed between Slony-I 1.2 and 2.0.
Generally speaking, what needs to happen is to query sl_table on each node, and, for any triggers found in sl_table, it is likely to be appropriate to set up a script indicating either ENABLE REPLICA TRIGGER or ENABLE ALWAYS TRIGGER for these triggers.
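As a starting point, one hedged way to enumerate the triggers on replicated tables is to join sl_table against pg_trigger (the cluster schema name "_mycluster" is a placeholder, and you will want to filter Slony-I's own triggers and foreign-key triggers out of the result):

```
select t.tab_nspname, t.tab_relname, tg.tgname
  from "_mycluster".sl_table t
       join pg_catalog.pg_trigger tg on tg.tgrelid = t.tab_reloid
 order by 1, 2, 3;
```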
This step may be done before the application outage.
Begin an application outage during which updates should no longer be applied to the database.
To ensure that applications cease to make changes, it would be appropriate to lock them out via modifications to pg_hba.conf.
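For instance, a temporary pg_hba.conf arrangement might reject application connections while still admitting the replication superuser (database, user, and address values are illustrative):

```
# temporarily lock out the application
host  mydb  appuser  0.0.0.0/0   reject
# still admit the slony superuser
host  mydb  slony    10.0.0.0/8  md5
```

Remember to reload the postmaster configuration after editing the file, and to restore the original entries once the upgrade is complete.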
Ensure replication is entirely caught up, via examination of the sl_status view, and any application data that may seem appropriate.
Shut down slon processes.
Uninstall the old version of Slony-I from the database.
This involves running a slonik script that runs SLONIK UNINSTALL NODE against each node in the cluster.
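Such a script might look like this (the cluster name and conninfo values are placeholders; uninstalling subscriber nodes before the origin seems prudent):

```
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

uninstall node (id = 2);
uninstall node (id = 1);
```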
Ensure new Slony-I binaries are in place.
A convenient way to handle this is to have old and new in different directories alongside two PostgreSQL builds, stop the postmaster, repoint to the new directory, and restart the postmaster.
Run the script that reconfigures replication as generated earlier.
This script should probably be split into two portions to be run separately:
Firstly, set up nodes, paths, sets, and such
At this point, start up slon processes
Then, run the portion which runs SLONIK SUBSCRIBE SET
Splitting the Section 21.10 script as described above is left as an exercise for the reader.
If there were triggers that needed to be activated on subscriber nodes, this is the time to activate them.
At this point, the cluster should be back up and running, ready to be reconfigured so that applications may access it again.