It sometimes happens that quite useful fixes and enhancements make it into a release but remain little-known. A few such fixes and enhancements made it into the 11.10.xC2 server; together, they make managing the CDR_QDATA_SBSPACE configuration parameter and DDRBLOCK mode much easier than in the past.
The IDS server writes to logical log files in a circular fashion, overwriting older log files when a new log file is needed and more than LOGFILES files (as specified in the
$INFORMIXDIR/etc/$ONCONFIG configuration file) have been written to. DDRBLOCK occurs when new transactions writing to the log come dangerously close to wrapping the log space around and overwriting old logs that Enterprise Replication has yet to process. In older servers, if the system ever entered DDRBLOCK mode, it could be very difficult to get the system out of DDRBLOCK mode without restarting oninit.
More recent releases of Enterprise Replication -- certainly, version 10 and later -- should rarely enter DDRBLOCK mode, unless the system is severely misconfigured. An example of a dangerously misconfigured system would be one with too few log files, especially if some of the log files are quite large while others are quite small. With such a configuration, even a small hiccup when Enterprise Replication processes log entries can cause DDRBLOCK mode, or even worse, log wrap. If log wrap occurs, that is, if new transactions overwrite entries that Enterprise Replication has yet to process, Enterprise Replication shuts down and data becomes unsynchronized among servers in the replication system.
One condition in which Enterprise Replication can still enter
DDRBLOCK mode even in an otherwise well-configured system is when a
destination site remains inaccessible for an extended period of time.
If this happens, the Reliable Queue Manager (RQM) saves transactions whose destination list includes that site to stable storage in the send queue. If that spool space fills, the oninit server will likely enter DDRBLOCK mode: Enterprise Replication can no longer stably store transactions in its send queue, and therefore can no longer advance the replay position, the oldest point in the logs that Enterprise Replication still needs to access.
As an example, I have configured a small two-server replication system. I configured the IDS instance at which I will be generating transactions with too few logs and too little send-queue stable storage, and used the 'cdr suspend server' command to suspend replication to the other server. Since transactions cannot flow to the destination server, they quickly start to accumulate in the send queue:
[pinch-cdrtempmurre] (pinch) 110 % onstat -g rqm sendq | egrep '^ Txns'
 Txns in queue:      18
 Txns in memory:     7
 Txns in spool only: 11
 Txns spooled:       11
and as I configured very little send queue spool space, the spool space immediately fills up, as shown in the message log:
10:44:47 CDR QUEUER: Send Queue space is FULL - waiting for space in CDR_QDATA_SBSPACE
In this case, Enterprise Replication will also raise an alarm of severity 4 and class 31.
Since Enterprise Replication cannot advance the replay position, the IDS instance also enters DDRBLOCK state, as shown by the "Blocked:DDR" line in the following output:
[pinch-cdrtempmurre] (pinch) 129 % onstat -g ddr | head -10
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:26:03 -- 78772 Kbytes
Blocked:DDR

DDR -- Running --
# Event   Snoopy  Snoopy    Replay  Replay    Current  Current
  Buffers ID      Position  ID      Position  ID       Position
  2064    4       1ee4454   3       74f018    12       2ad000
We can see that the replay log id is 3, whereas the current log id to which IDS is writing transactions is 12. The fact that log 12 is the current log is also displayed by the onstat -l command:
[pinch-cdrtempmurre] (pinch) 132 % onstat -l | grep C | grep -v CDR
451f2c30   2   U---C-L   12   1:31763   9000   685   7.61
I configured my example instance with only 10 logical log files, so if log 3 cannot be reused and we are already writing to log 12, Enterprise Replication is pinning 12 - 3 + 1 = 10 logical logs: all of them. Small wonder the server is in DDRBLOCK mode!
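The arithmetic above can be wrapped in a tiny helper. This is purely an illustrative sketch of my own (the function name is not part of any Informix tooling): given the Replay ID and Current ID reported by onstat -g ddr, it reports how many logical logs Enterprise Replication is still pinning.

```shell
# Hypothetical helper (not an Informix tool): number of logical logs
# still pinned, computed from the Replay ID and Current ID columns of
# "onstat -g ddr". Both endpoints are in use, hence the +1.
logs_pinned() {
  replay_id=$1
  current_id=$2
  echo $((current_id - replay_id + 1))
}

logs_pinned 3 12   # prints 10: every log in a 10-log configuration
```

If the result approaches your LOGFILES value, the instance is close to DDRBLOCK.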
The send queue stable storage area is configured via the CDR_QDATA_SBSPACE configuration parameter. 11.10.xC2 and later include an addition to onstat that makes it easy to monitor the sbspaces configured in CDR_QDATA_SBSPACE. The command is onstat -g rqm sbspaces:
onstat -g rqm sbspaces
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:29:41 -- 78772 Kbytes
Blocked:DDR

RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr            number  used  free  total  %full  pathname
0x46581c58           5       311   1     312    100    /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base  5       311   1     312    100
0x46e54528           6       295   17    312    95     /tmp/amsterdam_sbsp_2
amsterdam_sbsp_2     6       295   17    312    95
0x46e54cf8           7       310   2     312    99     /tmp/amsterdam_sbsp_3
amsterdam_sbsp_3     7       310   2     312    99
0x47bceca8           8       312   0     312    100    /tmp/amsterdam_sbsp_4
amsterdam_sbsp_4     8       312   0     312    100
In the past, the information returned by the onstat -g rqm sbspaces command was available, but you had to gather it yourself: look up the CDR_QDATA_SBSPACE values and then manually extract the entries for those spaces from the onstat -d output. Imagine doing this in a "real" system with dozens of dbspaces!
If CDR_QDATA_SBSPACE space starts to run low, you can either add more chunks to an sbspace already in the CDR_QDATA_SBSPACE list, or, starting with the 11.10xC2 release, you can add a new sbspace to the CDR_QDATA_SBSPACE list.
For example, say I have created (via onspaces) a new sbspace mynewcdrsbsp:
[pinch-cdrtempmurre] (configparam) 157 % onstat -d | grep mynewcdrsbsp
47bce508  12  0x68001  12  1  2048  N SB  informix  mynewcdrsbsp
47bce6a0  12  12  0  1000  702  702  POSB  /tmp/mynewcdrsbsp
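The onspaces invocation itself is not shown above. Here is a sketch of what it might look like; the chunk path, offset, and size are illustrative guesses on my part, not values from the original instance, and the helper echoes the command rather than running it.

```shell
# Hypothetical sketch: build the onspaces command that creates an
# sbspace suitable for ER send-queue spooling. The path, offset, and
# size are illustrative only. The command is echoed for review;
# remove the "echo" to actually create the space (as user informix).
create_cdr_sbspace() {
  name=$1; path=$2; size_kb=$3
  echo onspaces -c -S "$name" -p "$path" -o 0 -s "$size_kb"
}

create_cdr_sbspace mynewcdrsbsp /tmp/mynewcdrsbsp 2000
```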
I can then add that space to the list of CDR_QDATA_SBSPACE spaces via the cdr add config command.
[pinch-cdrtempmurre] (configparam) 158 % userid informix cdr add config "CDR_QDATA_SBSPACE mynewcdrsbsp"
WARNING: The value specified is updated in-memory only.
I can easily verify what sbspaces are configured via onstat. As you can
see, mynewcdrsbsp is there:
[pinch-cdrtempmurre] (configparam) 159 % onstat -g cdr config CDR_QDATA_SBSPACE
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:39:38 -- 86964 Kbytes
Blocked:DDR

CDR_QDATA_SBSPACE configuration setting:
    amsterdam_sbsp_base amsterdam_sbsp_2 amsterdam_sbsp_3 amsterdam_sbsp_4 mynewcdrsbsp
and Enterprise Replication is spooling transactions to the new sbspace. In fact, it's already 99% full.
[pinch-cdrtempmurre] (configparam) 162 % onstat -g rqm sbspaces
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:51:59 -- 86964 Kbytes
Blocked:DDR

RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr            number  used  free  total  %full  pathname
0x46581c58           5       311   1     312    100    /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base  5       311   1     312    100
0x46e54528           6       312   0     312    100    /tmp/amsterdam_sbsp_2
amsterdam_sbsp_2     6       312   0     312    100
0x46e54cf8           7       310   2     312    99     /tmp/amsterdam_sbsp_3
amsterdam_sbsp_3     7       310   2     312    99
0x47bceca8           8       312   0     312    100    /tmp/amsterdam_sbsp_4
amsterdam_sbsp_4     8       312   0     312    100
0x47bce6a0           12      696   6     702    99     /tmp/mynewcdrsbsp
mynewcdrsbsp         12      696   6     702    99
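Output like this is easy to watch mechanically. The filter below is my own sketch, not anything shipped with the server, and it assumes the column layout shown in the example output (address lines begin with 0x; %full is the sixth field and the chunk path the seventh):

```shell
# Sketch (assumes the "onstat -g rqm sbspaces" layout shown above):
# print the chunk path and fill level of any CDR_QDATA_SBSPACE chunk
# at or above a %full threshold (default 95).
flag_full_sbspaces() {
  threshold=${1:-95}
  awk -v t="$threshold" '$1 ~ /^0x/ && $6 + 0 >= t { print $7, $6 "% full" }'
}

# Usage: onstat -g rqm sbspaces | flag_full_sbspaces 95
```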
So what about DDRBLOCK mode? In practice, by far the likeliest cause for entering DDRBLOCK mode is that a destination server remains unavailable for an extended period of time. (In this example, I have simulated that condition by suspending the destination server.) If you expect the destination server to become available in a reasonable amount of time and you have enough disk space, you can add more space to the CDR_QDATA_SBSPACE parameter as in this example. Because Enterprise Replication raises an alarm of severity 4 and class 31 when it runs out of send queue spool space, you could even write an alarm handler to automate this task.
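A rough sketch of such a handler follows. Everything in it is an assumption on my part: the argument layout, the spare sbspace name, and the policy of keeping a pre-created spare space in reserve. Check the ALARMPROGRAM documentation for the exact event parameters before building anything like this.

```shell
# Hypothetical event-alarm handler sketch (argument layout and the
# spare sbspace name are assumptions): on the class 31 "send queue
# spool full" alarm, extend CDR_QDATA_SBSPACE with a spare sbspace
# created in advance via onspaces.
handle_er_alarm() {
  severity=$1
  class=$2
  if [ "$class" = "31" ]; then
    cdr add config "CDR_QDATA_SBSPACE spare_cdr_sbsp"
  fi
}

# An ALARMPROGRAM script would call this with the event parameters,
# e.g.: handle_er_alarm "$1" "$2"
```

Remember that cdr add config updates the value in memory only, so the change must also be made persistent in the ONCONFIG file.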
What if you expect a destination server to be unavailable for an extended period of time, longer than spooling the send queue to disk can reasonably cover? Then you will have little choice other than to remove the unavailable server from the replication system and to resynchronize its data once it becomes available again; but that is the topic of a future blog entry.