1 Designing and deploying data marts

After the Informix Warehouse Accelerator is installed and configured, the next step is to set up a data mart. A data mart is a subset of a data warehouse or of your schema, oriented to a specific business line or team. An enterprise's data warehouse on the Informix database server can contain information from sales, inventory, customer service, market data, and the like. Your transactional schema is designed to maximize transactional throughput. You must select the subset that is required for analysis and specify the star schema relationships.

A data mart defines fact and dimension tables and their relationships. Dimension tables, such as product or customer information, typically contain fewer rows than fact tables. In some cases a dimension table can be large, for example, one that contains data on all California residents.

You can selectively accelerate subsets of the data marts and data by using Informix Warehouse Accelerator. For example, a sales manager might want to analyze sales and inventory data to understand trends and create suitable sales incentives. In this case, only the data marts with sales and inventory fact tables must be accelerated.

In the context of the accelerator, a data mart contains one or more snowflake schemas, each of which has one fact table and related dimension tables. In the snowflake schema that is shown in the figure below, the fact table DAILY_SALES is related to dimension tables that describe the business facts.

After the tables to create a data mart are identified, the DBA defines the relationships between the fact table and the dimension tables. The data mart validation step helps ensure that all the relationships between the tables are defined. The DBA must address any errors in this step before the data mart is deployed.

As part of data mart deployment, the administration tools (OAT or Smart Analytics Optimizer Studio) send the data mart definition to the accelerator, which sends back the definition in SQL. This definition is saved as a special accelerated query table (AQT) view within the Informix system catalogs, with a special flag and related information. The AQT view is later used to match queries and redirect the matching queries to the correct data mart and accelerator.

2 Loading the data

After the data mart is created (deployed), the next step is to load the data. In this step, a snapshot of data from the database server tables is sent to the accelerator. The accelerator distributes the data to each of its worker nodes. Each worker node analyzes the data for frequently occurring values and relationships between the columns, and then partitions the data vertically and horizontally.

After partitioning, the worker node compresses the data by using the deep columnar process that is described in the “Columnar storage” section of this white paper. After compression, the worker node keeps the compressed data in memory, with a copy on disk for persistence; no indexes, summary tables, or cubes are created. The data can be refreshed periodically from the Informix database server.

After the loading is complete, the data mart is ready for queries on IWA.
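The load can be started from SQL by using the built-in ifx_loadmart function. The sketch below assumes an accelerator named IWA1 and a data mart named sales_mart (both placeholder names); the third argument is the locking mode used while the snapshot is taken.

```sql
-- Load (or fully reload) the sales_mart data mart on the accelerator IWA1.
-- 'NONE' takes the snapshot without locking the source tables.
EXECUTE FUNCTION ifx_loadmart('IWA1', 'sales_mart', 'NONE');
```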

2.1 Data refresh

After the initial load, the data in the Informix database can change. You can refresh the data in IWA by using one of these methods:

  • Full data reload
  • Refresh the partitions that are modified since the last refresh
  • Trickle feed the data from Informix to IWA as the data is being inserted

Full data reload requires you to disable the data mart and then do a complete load of the data mart. The new data is transferred just like the first load, analyzed to create a new compression dictionary, and then compressed. Automatic partition refresh detects the fact and dimension table partitions that were modified since the last refresh. It then removes the data of those partitions from IWA and, when necessary, reloads the data from the respective partitions into IWA. Automatic partition refresh also handles partitions that are added, dropped, attached, or detached on fact and dimension tables. Automatic partition refresh is useful when your data mart is designed to handle time cyclic data management.

Partition refresh is much faster than a full data reload because it reuses the existing compression dictionary. Because of this reuse, a slight loss of compression efficiency is possible. IWA creates new versions of the modified rows and can therefore run queries concurrently with partition refresh and trickle feed.

Trickle feed helps you achieve near-real-time analysis of the data. Typically, customers collect transaction data, append the new data to the fact table (for example, a sales table), and update the dimension tables (customer, product pricing, and so on). After you set up trickle feed, Informix collects all data inserted into the fact table in a staging area. At the designated interval, Informix sends the data that was inserted into the fact table to IWA and does a partition-based refresh on all of the dimension tables. Dimension tables are typically small, so the refresh is usually fast. All of the data refresh options can be set up and run by using OAT or built-in stored procedures.
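As a sketch, the three refresh options map onto built-in routines roughly as follows. The accelerator name IWA1 and the mart name sales_mart are placeholders, and the exact signatures (in particular the trickle-feed interval, assumed here to be in minutes) should be checked against the IWA documentation for your version.

```sql
-- Full reload: retransfer, reanalyze, and recompress the entire data mart.
EXECUTE FUNCTION ifx_loadmart('IWA1', 'sales_mart', 'NONE');

-- Partition-based refresh: reload only the partitions modified
-- since the last refresh.
EXECUTE FUNCTION ifx_loadpartmart('IWA1', 'sales_mart');

-- Trickle feed: push rows inserted into the fact table every 10 minutes
-- and refresh the dimension tables partition by partition.
EXECUTE PROCEDURE ifx_setupTrickleFeed('IWA1', 'sales_mart', 10);

-- Stop the trickle feed when it is no longer needed.
EXECUTE PROCEDURE ifx_removeTrickleFeed('sales_mart');
```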

3 Query acceleration

The characteristics of a query determine if the query can be processed by the accelerator server. Some queries cannot be accelerated. A query can be sent to the accelerator server for processing only if the following conditions are met:

  • The query refers to the fact table.
  • The query refers to a subset of the tables in the data mart definition.
  • The table joins that are in the query match the table joins that are specified in the data mart definition.

3.1 Queries that benefit from acceleration

The database server uses information about the existing data marts to determine which types of queries should be sent to the accelerator server for processing. Only SELECT queries that refer to a fact table, or SELECT queries that join a fact table with one or more dimension tables are processed by the accelerator server. The following types of queries will benefit the most from being sent to the accelerator server:

  • Complex, ad hoc queries that look for trends or exceptions in order to make actionable business decisions.
  • Queries that access a large subset of the database, often by using sequential scans.
  • Queries that involve aggregation functions such as COUNT, SUM, AVG, MAX, MIN, and VARIANCE.
  • Queries that often create reports that group data by time, product, geography, customer set, or market.
  • Queries that involve star joins or snowflake joins of a large fact table with several dimension tables.
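For illustration, the following hypothetical query has all of these characteristics: a star join of a fact table with two dimension tables, aggregation functions, and grouping by time and geography. All table and column names are invented.

```sql
SELECT t.year,
       g.region,
       SUM(s.amount) AS total_sales,
       AVG(s.amount) AS avg_sale,
       COUNT(*)      AS num_transactions
  FROM daily_sales s                        -- fact table
  JOIN time_dim t ON s.time_id = t.time_id  -- dimension: time
  JOIN geo_dim  g ON s.geo_id  = g.geo_id   -- dimension: geography
 GROUP BY t.year, g.region
 ORDER BY total_sales DESC;
```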

3.2 Types of queries that are not accelerated

Some queries either will not benefit from being sent to the accelerator server or will not be considered for acceleration.

3.2.1 Queries that will not benefit from being accelerated

Queries that only refer to a single, small dimension table do not benefit from being sent to the accelerator server for processing as much as queries that also refer to a fact table. Queries that return a large result set should be processed by the database server to avoid the overhead of sending the large result set from the accelerator server over the connection to the database server. If a query returns millions of rows, the total response time of the query is influenced by the maximum transfer rate of the connection.

For example, the following query might return a very large result set:

SELECT * FROM fact_table ORDER BY sales_date;

Queries that search only a small number of rows of data should be processed by the database server to avoid the overhead of sending the query to the accelerator server.

3.2.2 Queries that are not considered for acceleration

There are some queries that will not be processed by the accelerator server. Queries that would change the data cannot be processed by the accelerator server and must be processed by the database server. The data in the accelerator server is a snapshot view of the data and is read only. There is no mechanism to change the data in the data marts and replicate those changes back to the source database server. Other queries that are not processed by the accelerator server include queries that contain INSERT, UPDATE, or DELETE statements.

3.3 Supported and unsupported joins

Specific join types, join predicates, and join combinations are supported by Informix Warehouse Accelerator.

3.3.1 Supported joins

Equality join predicates, INNER joins, and LEFT OUTER joins are the supported join types. The fact table referenced in the query must be on the left side of the LEFT OUTER join. Any filter on a dimension table that you use in a LEFT OUTER join must be a post-join filter. A post-join filter is in the WHERE clause after the join.
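For example, in the following sketch (hypothetical table names) the fact table is on the left side of the LEFT OUTER join, the join predicate is an equality, and the filter on the dimension table is a post-join filter in the WHERE clause:

```sql
SELECT s.sales_date, SUM(s.amount) AS total
  FROM daily_sales s                    -- fact table on the left side
  LEFT OUTER JOIN product_dim p
    ON s.product_id = p.product_id      -- equality join predicate
 WHERE p.category = 'ELECTRONICS'       -- post-join filter on the dimension table
 GROUP BY s.sales_date;
```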

3.3.2 Unsupported joins

The following joins are not supported:

  • RIGHT OUTER joins
  • FULL OUTER joins
  • Informix outer joins
  • Joins that do not use an equality predicate
  • Subqueries

4 Turning on query acceleration

Before queries can be routed to the accelerator server for processing, the use_dwa session environment variable must be set. The following SQL statement enables the Informix® query optimizer to consider using the accelerator server to process a query when it generates the query plans:

SET ENVIRONMENT USE_DWA 'accelerate on';

The SET ENVIRONMENT use_dwa statement takes one argument, which is a string value within single quotation marks. Keywords are not case-sensitive.

accelerate — Controls query acceleration. By default, queries are not accelerated.

  • accelerate on: Turns on acceleration. Queries that match one of the accelerated query tables (AQTs) are sent to the accelerator server for processing. Use this setting to accelerate queries.
  • accelerate off: Turns off acceleration. Queries are not sent to the accelerator server even if the queries meet the required criteria.

fallback — Controls what happens when Informix Warehouse Accelerator cannot accelerate the queries, for example, when the accelerator server is offline or the queries do not match one of the accelerated query tables (AQTs). The fallback option applies only if you set the accelerate on option.

  • fallback on: If Informix Warehouse Accelerator cannot accelerate the queries, the queries are sent to the Informix database server for processing. This is the default setting.
  • fallback off: If Informix Warehouse Accelerator cannot accelerate the queries, the queries fail and are not sent to the Informix database server for processing. Use this option when you do not want such queries to be processed by the Informix database server.

4.1 Examples

  • The following example turns on acceleration.
    set environment use_dwa 'accelerate on';
  • The following example removes all the probing data that was previously collected for the current database.
    set environment use_dwa 'probe cleanup';
  • The following example turns on acceleration and turns off fallback. The queries are not sent to the Informix database server for processing. Queries that cannot be accelerated by Informix Warehouse Accelerator will fail.
    set environment use_dwa 'accelerate on';
        set environment use_dwa 'fallback off';
  • The following example sets the debug option. By default, the debug information is appended to the online.log file.
    set environment use_dwa 'accelerate on';
        set environment use_dwa 'debug on';
  • The following example creates probing data for your queries, turns on debugging, and appends the debugging information to a file named /tmp/my_debug_file.
    set environment use_dwa 'probe on'; 
        set environment use_dwa 'debug on'; 
        set environment use_dwa 'debug file /tmp/my_debug_file';
  • The following example turns on debugging. The debug output of query 1 is written to the file /tmp/myDwaDebugFile. The debug output of query 2 is written to the default online.log file.
    set environment use_dwa 'debug on';
        set environment use_dwa 'debug file /tmp/myDwaDebugFile';
        select ... { query 1 }
        set environment use_dwa 'debug file';
        select ... { query 2 }
  • The following text shows example debug output. This output shows that a query has matched an AQT and is being sent to Informix Warehouse Accelerator for processing.
    15:00:35  SQDWA: select MIN(s_suppkey) from partsupp x0 left join supplier x1 on x0.ps_
        15:00:35  SQDWA: Identified 1 candidate AQTs for matching
        15:00:35  SQDWA: matched: aqt48a93f10-d192-404f-8911-e01d67aadde2
        15:00:35  SQDWA: matching successful (0 msec) aqt48a93f10-d192-404f-8911-e01d67aadde2
        15:00:35  SQDWA: offloading successful (22 msec)

5 Create an accelerator

You create an accelerator by connecting the Informix database server to the accelerator server (pairing). An accelerator is a logical entity that contains information for a connection from the database server to the accelerator server and for the data marts that are associated with that connection.

Set up the connection by using the ondwa getpin command to retrieve the IP address, port number, and pairing code from the accelerator server. For example:

# ondwa getpin
21022 4302

Now the accelerator can be created by using the SQL administration function ifx_setupDWA. The syntax is as follows:

> ifx_setupDWA('ACC_NAME', 'IP_ADDRESS', 'PORT', 'PIN_CODE');

where ACC_NAME is an arbitrary name for the logical accelerator that will be created, and the other three parameters should be replaced by the values from the getpin command. In our example:

$ dbaccess demodb -
Database selected.

> EXECUTE FUNCTION ifx_setupDWA('IWA1DEMODB', '', '21022', '4302');

(expression)   The operation was completed successfully.

1 row(s) retrieved.

Check the official documentation for other methods to pair an IWA instance to the database server.

6 Create a data mart from an SQL statement

IWA allows you to create a data mart definition from probe data that is collected while SQL statements are executed.

The following SQL script automates the data mart definition from the execution of an SQL statement.

database dbs_dwh; -- Switch to the database that contains the data

set environment use_dwa 'probe cleanup';           -- Clear any previously collected probe data
set environment use_dwa 'probe start';             -- Start probing
set optimization environment fact 'my_fact_table'; -- Declare which table is the fact table

set explain on avoid_execute;                      -- Enable explain to simulate the query without executing it

-- The select list below is illustrative; use the query you want to probe.
SELECT *
  FROM my_fact_table, my_dimension
 WHERE my_fact_table.joinid = my_dimension.joinid;

set explain off;                      -- Turn off explain
set environment use_dwa 'probe stop'; -- Stop probing

-- Convert the information collected by the probe (onstat -g probe) into a
-- data mart definition structure that is stored in special tables.
-- We do this in a separate database because all the iwa_* tables
-- that are created are only temporary.

drop database if exists probe_mart2;
create database probe_mart2 with buffered log;
database probe_mart2;

execute procedure ifx_probe2mart("dbs_dwh", "mart_name_2");

database dbs_dwh;

-- Drop the data mart only if it already exists
--execute function ifx_dropmart('IWAS3','mart_name_2');

-- Create a real data mart from the definition structure stored in the probe_mart2 database
execute function ifx_createmart('IWAS3', 'mart_name_2', 'probe_mart2');

-- Load the data into the accelerator
--execute function ifx_loadmart('IWAS3','mart_name_2','NONE');

-- Save a copy of the internal data mart definition as XML
execute function lotofile(ifx_getMartdef('IWAS3', 'mart_name_2'), 'mart_name_2.xml','client');

7 Allow NCHAR and NVARCHAR data types in data marts

If you define your data marts by using the probe method or an XML data mart definition file, you can use NCHAR or NVARCHAR data types in the columns to accelerate.

Allowing NLS data types does not mean that IWA uses NLS collation to sort data. IWA always uses single-byte ASCII binary sorting.

To activate this hidden capability, set the IWA_ALLOW_NCHAR environment variable before starting the database server and in the session that creates the data mart.
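As a minimal sketch, assuming the variable only needs to be present in the server and session environments (the value 1, the instance start command, and the script name are illustrative):

```shell
# Make the variable visible to the database server before it starts.
export IWA_ALLOW_NCHAR=1
oninit                           # start the Informix instance with the variable set

# The same variable must also be set in the session that creates the data mart.
export IWA_ALLOW_NCHAR=1
dbaccess demodb create_mart.sql  # create_mart.sql is a hypothetical script name
```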