Informix Warehouse Accelerator: HA Environment Integration in 11.70.FC5

Blog Content/Topics List


Icon: IWA Logo TruckA good how-to description of all the steps to be performed is already provided as the topic Configuring Informix Warehouse Accelerator on secondary servers in a high-availability environment in the IBM Informix Warehouse Accelerator Administration Guide at the IBM Informix 11.70 Information Center.

Rather than repeating the content of the manual, I'll try here to give some background information.

  1. Why use a secondary server with the Informix Warehouse Accelerator?

    • For query acceleration:

      Within a high-availability (HA) environment of Informix servers, there is a primary server that is best suited for OLTP workload, where data is frequently changed, inserted or deleted. The primary server has direct access to the disks in order to perform all required data changes directly. A secondary server in such a HA environment can also perform changes to data (update, insert, delete), but in order to do so, it always needs to route the change requests to the primary server to get them executed and then replicated to all the secondary servers of the HA environment.

      Therefore, a secondary server is more suited to performing queries that change only very little or no data at all. This is exactly the case for typical data warehouse queries comprising an OLAP workload. It is therefore quite desirable to let a secondary server perform such OLAP workloads and still benefit from acceleration by the Informix Warehouse Accelerator. With Release 11.70.FC5 this is now possible.

      It could be argued, that also the primary server would not be strained by accelerated queries as they get executed by the accelerator rather than the Informix server. However, an OLAP workload, e.g. for a specific report, can contain queries dealing with temporary tables to be created and used on the fly. Such queries cannot be accelerated, but with according configuration can be executed on a secondary server without burdening the primary server. In such cricumstances, using the secondary server is still beneficial to the overall workload distribution in the HA environment.

    • For data mart administration:

      Furthermore, loading data into a data mart in the accelerator is a demanding task for the database server, as all the data must be read from disk and sent over to the accelerator. Nevertheless, no changes to the user data need to occur in the database server. Only tiny amounts of meta data need to be created when creating a data mart, and even less such meta data needs to be changed after a data mart was loaded successfully, to reflect the new state "Active" for the data mart. Therefore, also data mart administration, including the loading of a data mart with the data, is suited very well for a secondary server.

  2. Necessary configuration

    • Updates on the secondary server:

      In order to be able to perform the required changes of the small amounts of meta data, the secondary server still needs to be an "updateable secondary" server. Hence, the secondary server must be configured accordingly with the parameter UPDATEABLE_SECONDARY in its onconfig file. Strictly speaking this is not necessary for pure query acceleration only. Nevertheless, it is recommended that every secondary server connecting to the accelerator be an "updateable secondary" server, to avoid errors and confusion when trying to do some administrative action for the accelerator, e.g. after an HA-failover has happened.

    • Virtual Processor for acclereator administration:

      For the same reason, the secondary server must be configured with the VPCLASS parameter to run the required virtual process (the "dwavp") handling administrative commands, and have access to the SmartBLOBSpace required as a staging area of user input and output for such administrative commands. If configuration parameters were changed or added, then the secondary server needs to be restarted to make these effective. With that, the secondary server basically is ready.

    (For examples and detailed how-to on the configuration tasks please see the manual.)

  3. Connecting the secondary server to the accelerator

    The only thing missing now is an actual connection to the accelerator. The necessary connection information is stored as a group entry in the sqlhosts file of the respective Informix server instance, where the accelerator name is represented as the group name.

    Example for an accelerator named "dwa1":

      dwa1    group   -       -       c=1,a=524e61686a285e6368325d767...
      dwa1_1  dwsoctcp        127.0.0.1       21022   g=dwa1
      

    Here the long string following the option "a=" is the encoded authentication token, that the database server uses when sending any requests to the accelerator.

    If all database servers of the HA environment to be connected to the accelerator share the same sqlhosts file (e.g. via shared disk or file system), then it is sufficient to have once established a connection to the accelerator from one of the database servers. All other database servers in the HA environment will then see the automatically created entry in the sqlhosts file and be able to utilize this. Thus even when the primary server previously had already established a connection to the accelerator, this will be sufficient for the secondary servers as they will see the corresponding entry in the shared sqlhosts file.

    However, if the sqlhosts file is not shared, then the connection still should be established only once (if not already done) and the resulting sqlhosts entry of that database server should only be copied manually to the sqlhosts files of the other servers in the HA environment.

    Two cases are possible that need slightly different handling:

    • The primary server of the HA environment already has an established connection to the accelerator:

      The connection information for this connection should also be used by the secondary server(s) intended to connect to the accelerator. As the sqlhosts file is not shared, the respective group entry needs to be copied manually to all secondary servers to be connected to the accelerator.

    • None of the servers in the HA environment has yet a connection to the accelerator:

      A new connection from one of the chosen servers must be established to the accelerator as described in the manual. Once this is done, the created sqlhosts entry needs to be copied manually to the sqlhosts files of the other servers.

    What happens, if a new connection is done even though another server already has an established connection?

    If one database server of the HA environment already has an established connection to the accelerator, but still a new connection is established from a different database server (with a different sqlhosts file), then the existing connection of the first database server will be invalidated. This is because upon the new connection request, the accelerator server will generate a new authentication token for the same accelerator name. This will be stored correctly in the sqlhosts file of the database server requesting that new connection. But without sharing the sqlhosts file, the first database server will have no knowledge of the new authentication token. Its sqlhosts file will still have the old authentication token, that has just been invalidated. When attempting to use this old authentication token, the accelerator will reject any request. Therefore, when not sharing the same sqlhosts file among the database servers of a HA envrionment, some extra care may be needed regarding the handling of the accelerator's connection info and distributing it among the database servers involved. The same is true when only renewing just the authentication token, as in some envrionments there will be a requirement to do this regularly.

Once all chosen database servers of the HA environment have access to the accelerator, it is probably good practice to determine one of them as the "master" for all administrative actions regarding data marts in the accelerator, and utilize with other connected database servers only the query acceleration. This will prevent confusion that could easily ensue when data marts are manipulated concurrently from different database servers. E.g. when from one database server a data mart is being loaded while at the same time from a different database server a status change is attempted for the same data mart.


Blog Content/Topics List