A key factor in processing queries related to the data warehouse is parallel database query (PDQ), to read the pages in parallel. To do this, PDQ must be specifically activated and system resources must be configured and dedicated to its use. PDQ allows the database server to distribute the work for one aspect of a query among several processors. Table fragmentation enables you to store the parts of a table on different disks. PDQ delivers maximum performance benefits when the data that you query is in fragmented tables. PDQ can, depending on its configuration, be quite resource-intensive and must be used with caution.

1 PDQ configuration parameters

Primarily, six variables enable you to control PDQ:

  • PDQPRIORITY sets a reasonable or recommended priority value. MAX_PDQPRIORITY limits the PDQ resources that the database server can allocate to any one DSS query. MAX_PDQPRIORITY is a factor that is used to scale the value of PDQ priority set by users.
  • DS_TOTAL_MEMORY specifies the amount of memory available for PDQ queries.
  • DS_MAX_SCANS limits the number of Parallel Database Query (PDQ) scan threads that the database server can execute concurrently.
  • DS_MAX_QUERIES specifies the maximum number of queries that can run concurrently.
  • DS_NONPDQ_QUERY_MEM increases the amount of memory that is a available for a query that is not a Parallel Database Query (PDQ).

In the following list, you can see the formulas used in PDQ. A good understanding of these formulas can help to find the best setting for the PDQ parameters for your environment:

  • Memory quantum
    Memory is granted in units called a quantum. A quantum unit is the minimum increment of memory allocated to a query. The memory quantum is calculated with the following formula: $$ \texttt{memory quantum} = \frac{\texttt{DS_TOTAL_MEMORY}}{\texttt{DS_MAX_QUERIES}} $$
  • Minimum amount of decision-support memory
    When you assign a value to the configuration parameter DS_MAX_QUERIES, the database server sets the minimum amount of decision-support memory according to the following formula: min_ds_total_memory = DS_MAX_QUERIES * 128 kilobytes When you do not assign a value to DS_MAX_QUERIES, the database server uses the following formula instead, and is based on the value of VPCLASS CPU or NUMCPUVPS: $$ \texttt{min_ds_total_memory} = \texttt{NUMCPUVPS} * 2 * 128 \texttt{kilobytes} $$
  • Resources allocated
    When a query requests a percentage of PDQ resources, the database server allocates the MAX_PDQPRIORITY percentage of the amount requested, as the following formula shows: $$ \texttt{Resources allocated} = (\frac{PDQPRIORITY}{100}) * (\frac{\texttt{MAX_PDQPRIORITY}}{100}) $$
  • Memory for query
    The amount of memory that is granted to a single parallel database query depends on many system factors, but in general, the amount of memory granted to a single parallel database query is proportional to the following formula: $$ \texttt{memory_grant_basis} = (\frac{\texttt{DS_TOTAL_MEMORY}}{\texttt{DS_MAX_QUERIES}}) * (\frac{PDQPRIORITY}{100}) * (\frac{MAX_PDQPRIORITY}{100}) $$
  • Maximum number of scan threads per query
    You can limit the number of concurrent scans using the DS_MAX_SCANS. In fact, the resources that users can assign to a query are calculated by the following formula: $$ \texttt{scan_threads} = min (nfrags, (\texttt{DS_MAX_SCANS} * (\frac{pdqpriority}{100}) * (\frac{\texttt{MAX_PDQPRIORITY}}{100}) ) $$ Where:
    • nfrags is the number of fragments in the table with the largest number of fragments.
    • pdqpriority is the PDQ priority value set by either the PDQPRIORITY environment variable or the SET PDQPRIORITY statement.
  • Amount of shared memory for PDQ
    Use the following formula as a starting point for estimating the amount of shared memory to allocate to decision-support queries: $$ \texttt{DS_TOTAL_MEMORY} = \texttt{p_mem} - \texttt{os_mem} - \texttt{rsdnt_mem} - (128 kilobytes * users) - \texttt{other_mem} $$ Where:
    • p_mem is the total physical memory that is available on the host computer.
    • os_mem is represents the size of the operating system, including the buffer cache.
    • resdnt_mem represents the size of Informix resident shared memory.
    • users is the number of expected users (connections) specified in the NETTYPE configuration parameter.
    • other_mem is the size of memory used for other applications that are not IBM Informix.
    In general, as starting point for DSS environment, we set values as listed in
    Parameter Value
    PDQPRIORITY 100
    MAX_PDQPRIORITYY 100
    DS_TOTAL_MEMORYY 90% of SHMVIRTSIZE
    DS_MAX_SCAN Usually left as default value

You can monitor the PDQ behavior by using the onstat –g mgm command. PDQ queries use memory from the Virtual Shared Memory segments, not from the BUFFERS.

2 Database operations that use PDQ

2.1 Parallel update and delete operations

TO DO

This section is incomplete and will be concluded as soon as possible.

Informix performs some types of update and delete operations in parallel.

The database server takes the following two steps to process UPDATE and DELETE statements:

  1. Fetches the qualifying rows.
  2. Applies the action of updating or deleting.

The database server performs the first step of an UPDATE or DELETE statement in parallel, with the following exceptions:

  • The targeted table in a DELETE statement has a referential constraint that can cascade to a child table.
  • The UPDATE or DELETE statement contains an OR clause and the optimizer chooses an OR index to process the OR filter.
  • The UPDATE statement contains a subquery that the optimizer converts into a join.

2.2 Parallel insert operations

Informix performs some types of insert operations in parallel.

2.3 Parallel index builds

Index builds can take advantage of PDQ and can be parallelized. The database server performs both scans and sorts in parallel for index builds.

2.4 Parallel user-defined routines

If a query contains a user-defined routine (UDR) in an expression, the database server can execute a query in parallel when you turn on PDQ.

2.5 Hold cursors that use PDQ

When hold cursors that are created by declaring the WITH HOLD qualifier have no locks, PDQ is enabled.

2.6 SQL operations that do not use PDQ

The database server does not process some types of queries in parallel.

2.7 Update statistics operations affected by PDQ

An SQL UPDATE STATISTICS statement that is not processed in parallel, is affected by PDQ because it must allocate the memory used for sorting. Thus the behavior of the UPDATE STATISTICS statement is affected by the memory management associated with PDQ.

2.8 SPL routines and triggers and PDQ

Statements that involve SPL routines are not executed in parallel. However, statements within procedures are executed in parallel.

2.9 Correlated and uncorrelated subqueries

The database server does not use PDQ to process correlated subqueries. Only one thread at a time can execute a correlated subquery. While one thread executes a correlated subquery, other threads that request to execute the subquery are blocked until the first one completes.

2.10 OUTER index joins and PDQ

The database server reduces the PDQ priority of queries that contain OUTER index joins to LOW (if the priority is set to a higher value) for the duration of the query. If a subquery or a view contains OUTER index joins, the database server lowers the PDQ priority of only that subquery or view, not of the parent query or any other subquery.

2.11 Remote tables used with PDQ

Although the database server can process the data stored in a remote table in parallel, the data is communicated serially because the database server allocates a single thread to submit and receive the data from the remote table. The database server lowers the PDQ priority of queries that require access to a remote database to LOW.