In the following document, we give some guidelines on how to monitor an IBM Informix database instance.

It is not the goal of this document to be an Informix database administration tutorial, but to provide some basic knowledge.

1 Database info

General database info: by querying the sysdatabases table you can find general information about the databases.

select sysdatabases.name, sysdatabases.owner, sysdbspaces.name dbspace
            from sysdbspaces, sysdatabases
            where partdbsnum(sysdatabases.partnum) = sysdbspaces.dbsnum

DBspace of database: you can use the following statement to retrieve the main DBSpace where each database is located.

select DBINFO("DBSPACE",partnum) dbspace, name, owner
                from sysdatabases

2 Monitoring tools

Informix provides two major tools to monitor system and database performance:

  • The onstat utility.
  • Numerous system monitoring interface (SMI) tables in the sysmaster database, which is created automatically when IDS is first initialized.

Both the onstat utility and SMI tables monitor IDS performance by examining IDS shared memory activities, but there is a difference in the way of presenting those statistics. The onstat utility always presents statistics in a fixed way, whereas using SMI tables permits you to reorganize those statistics in a more meaningful, more readable format.

One thing we need to pay attention to is that the statistics collected either by onstat or in the SMI tables are cumulative from the time of system reboot or IDS initialization. Therefore, we need to be careful with those statistics and always take the IDS running time into account. For example, 100,000 bufwaits for a server that has been running for over a month is a lot different than 100,000 bufwaits in a single day. To get current statistics, run onstat -z to zero out the old values.
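Since the counters are cumulative, it helps to normalize them by uptime before drawing conclusions. A minimal sketch of the bufwaits comparison above (the figures and the per-hour metric are illustrative, not part of any Informix tool):

```shell
# Hypothetical example: the same cumulative bufwaits count means very
# different things depending on how long the server has been up.
bufwaits=40669

uptime_hours=360                              # roughly two weeks of uptime
rate_month=$(( bufwaits / uptime_hours ))     # bufwaits per hour

uptime_hours=24                               # a single day of uptime
rate_day=$(( bufwaits / uptime_hours ))       # bufwaits per hour

echo "per hour over a long run: $rate_month, over a single day: $rate_day"
```

The per-day figure is an order of magnitude worse even though the raw counter is identical, which is why uptime must always be considered.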

Informix also provides a graphical monitoring tool, onperf. Onperf collects IDS server performance statistics and plots them as metrics. It can also save those statistics to a text file for later analysis. Refer to the Performance Guide for Informix Dynamic Server for more details on the onperf utility.

IDS activities can be classified into three categories:

  • Instance activities
  • Database activities
  • Session activities

Using the tools discussed above, we can effectively monitor all of these IDS activities.

3 Monitoring configuration

3.1 Monitoring instance configuration

To show the instance configuration parameters:

$ onstat -g cfg
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line (Prim) -- Up 17:04:42 -- 7881924 Kbytes

Configuration Parameter List

name                      current value
ROOTNAME                  rootdbs
ROOTPATH                  /INFORMIXDEV/rootdbs
ROOTOFFSET                0
ROOTSIZE                  2048000
MIRROR                    0
MIRRORPATH                /home/informix/tmp/demo_on.root_mirror
MIRROROFFSET              0
DBSERVERNAME              dbsrv3
DBSERVERALIASES           ol_dbsrv3
SERVERNUM                 0
MSGPATH                   /home/informix/tmp/online.log
CONSOLE                   /home/informix/tmp/online.con
TAPEDEV                   /backup1/IFMX-ONTAPES
TAPESIZE                  80000000
TAPEBLK                   32
LTAPEDEV                  /backup1/IFMX-LBACKUP
LTAPESIZE                 80000000
LTAPEBLK                  32
PHYSFILE                  6195070
PHYSBUFF                  128
LOGFILES                  49
LOGSIZE                   100000
LOGBUFF                   64
DYNAMIC_LOGS              2
LTXHWM                    70
LTXEHWM                   80
RESIDENT                  1
LOCKS                     2000000
ONDBSPACEDOWN             2
CLEANERS                  16
SHMBASE                   0x44000000
CKPTINTVL                 300
TXTIMEOUT                 300

To display more information about each parameter, such as the default value, maximum and minimum values, and a description:

onstat -g cfg full MAX_PDQPRIORITY
IBM Informix Dynamic Server Version 14.10.FC1DE -- On-Line -- Up 01:09:15 -- 173208 Kbytes

Configuration Parameter Info

id   name                      type    maxlen   units   rsvd  tunable
218  MAX_PDQPRIORITY           INT4    12               *     *

     min/max : 0,100
     default : 100
     onconfig: 100
     current : 100

     MAX_PDQPRIORITY limits the PDQ resources that the database server can
     allocate to any one DSS query. MAX_PDQPRIORITY is a factor that is
     used to scale the value of PDQ priority set by users. For example,
     suppose that the database administrator sets MAX_PDQPRIORITY to
     80. If a user sets the PDQPRIORITY environment variable to 50 and
     then issues a query, the database server silently processes the
     query with a PDQ priority of 40.
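The scaling rule described above is simple integer arithmetic; a quick sketch using the values from the example in the text:

```shell
# Effective PDQ priority = user PDQPRIORITY scaled by MAX_PDQPRIORITY (a percentage).
max_pdqpriority=80   # set by the DBA in the ONCONFIG
pdqpriority=50       # set by the user via the PDQPRIORITY environment variable
effective=$(( pdqpriority * max_pdqpriority / 100 ))
echo "effective PDQ priority: $effective"
```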

To display only those parameters with current values that are not the same as the values in the onconfig file:

onstat -g cfg diff
id   name                      type    maxlen   units   rsvd  tunable
218  MAX_PDQPRIORITY           INT4    12               *     *

     onconfig: 100
     current : 0

To display only tunable parameters:

onstat -g cfg tunable
. . .
Tunable Configuration Parameters

id   name                      type    maxlen   units   rsvd  tunable
10   MSGPATH                   CHAR    257              *     *

     default : $INFORMIXDIR/online.log
     onconfig: $INFORMIXDIR/tmp/online.log
     current : /home/informix/tmp/online.log

id   name                      type    maxlen   units   rsvd  tunable
12   TAPEDEV                   CHAR    257              *     persistent-only

     default : /dev/null
     onconfig: /dev/tapedev
     current : /dev/tapedev
. . .

3.1.1 Change a configuration parameter

Some parameters can be modified online. To do that, use the onmode -wf command. For example, to change MAX_PDQPRIORITY to 100, use the following command.

onmode -wf MAX_PDQPRIORITY=100

Equivalently, use the modify config persistent argument with the admin() or task() function:

EXECUTE FUNCTION task("modify config persistent","MAX_PDQPRIORITY","100");

Use the reset config argument with the admin() or task() function to revert the value of a dynamically updatable configuration parameter to its value in the onconfig file:

EXECUTE FUNCTION task("reset config","DYNAMIC_LOGS");

Use the reset config all argument with the admin() or task() function to revert the values of all dynamically updatable configuration parameters to their values in the onconfig file:

EXECUTE FUNCTION task("reset config all");

3.1.2 Modify Groups of Configuration Parameters

Export a snapshot of the current configuration to a file for use as a configuration file that can be imported:

onmode -we /tmp/onconfig.exp

Equivalently, use the export config argument with the admin() or task() function:

EXECUTE FUNCTION task("export config", "/tmp/onconfig.exp");

Import parameters from a configuration file and modify those parameters in the file that are tunable:

onmode -wi /tmp/onconfig.exp

Equivalently, use the import config argument with the admin() or task() function:

EXECUTE FUNCTION task("import config", "/tmp/onconfig.exp");

3.1.3 Monitoring session environment

For a running session, you can inspect its runtime environment parameters as follows.

$ onstat -g env SESSIONID
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line (Prim) -- Up 17:10:14 -- 7881924 Kbytes

Environment for session 7561:

Variable            Value [values-list]
CLIENT_LOCALE       en_us.utf8
DBCENTURY           C
DBDATE              DMY4-
DBMONEY             ,
DBPATH              .
DBPRINT             lp -s
DBTEMP              /tmp
DB_LOCALE           en_us.utf8
IFX_UPDDESC         1
INFORMIXDIR         /home/informix
INFORMIXTERM        terminfo
LANG                en_US.UTF-8
LC_COLLATE          en_US.UTF-8
LC_CTYPE            en_US.UTF-8
LC_MONETARY         en_US.UTF-8
LC_NUMERIC          en_US.UTF-8
LC_TIME             en_US.UTF-8
LKNOTIFY            yes
LOCKDOWN            no
NODEFDAC            no
PATH                /usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/inf
SERVER_LOCALE       en_US.819
SHELL               /bin/bash
SINGLELEVEL         no
SKALL               0
SKINHIBIT           0
SKSHOW              0
SQLPID              2646863912
SUBQCACHESZ         10

4 Check logging status of the smart blob spaces

  1. Display the information for all the sbspaces in an instance by using this command:
    onstat -g smb s
  2. Look at the Defaults line in the output for the sbspace you are interested in:
    • If the Defaults line has LO_LOG, the sbspace has logging turned on.
    • If the Defaults line does not have LO_LOG, the sbspace has logging turned off.

This is a sample of the output from the onstat -g smb s command. In the sample output below, there are two (2) sbspaces:

  • The sbspace called ‘sbspace’ has LOGGING turned on.
  • The sbspace called ‘s9_sbspc’ has LOGGING turned off.

IBM Informix Dynamic Server Version 14.10.FC8   -- On-Line -- Up 00:00:28 -- 25472 Kbytes

Sbspace Summary:

sbnum 2    address d1fa180
    Space    : flags    nchk     owner    sbname
               -------- 4        informix sbspace
    Defaults : LO_LOG

    LO       : ud b/pg  flags    flags    avg s/kb max lcks
               4096     0        -------- -1       -1
    Ext/IO   : 1st sz/p nxt sz/p min sz/p mx io sz
               0        0        0        -1

    HdrCache : max      free
               512      0
sbnum 3    address d1fa2c8
    Space    : flags    nchk     owner    sbname
               -------- 1        informix s9_sbspc
    Defaults :

    LO       : ud b/pg  flags    flags    avg s/kb max lcks
               4096     0        -------- -1       -1
    Ext/IO   : 1st sz/p nxt sz/p min sz/p mx io sz
               0        0        0        -1

    HdrCache : max      free
               512      0
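The LO_LOG check lends itself to scripting; the sketch below classifies a single Defaults line taken from the output above (a full script would loop over every "Defaults" line per sbnum):

```shell
# One "Defaults" line from "onstat -g smb s"; LO_LOG present means
# logging is turned on for that sbspace.
defaults_line='    Defaults : LO_LOG'
case "$defaults_line" in
  *LO_LOG*) logging=on ;;
  *)        logging=off ;;
esac
echo "sbspace logging: $logging"
```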

5 Monitoring Instance activity

An IDS instance comprises the Informix shared memory, the Informix virtual processors, the Informix databases, and the physical devices allocated to Informix. The following are some of the most important instance activities we need to monitor.

5.1 Operating mode

The first and most important instance activity is, of course, the operating mode of IDS. Is IDS running fine, is it having problems, or is it down? The onstat -p command shows the current operating mode of IDS as follows:

$ onstat -p
IBM Informix Dynamic Server Version 12.10.FC7W1 -- On-Line (Prim) -- Up 15:24:48 -- 7443636 Kbytes

dskreads   pagreads   bufreads   %cached dskwrits   pagwrits   bufwrits   %cached
7981981    99445393   926195831  99.15   1109300    15536008   14594451   95.49  

isamtot    open       start      read       write      rewrite    delete     commit     rollbk
1301920791 8531560    7384228    203440144  3342031    201494     340591     245135     76

gp_read    gp_write   gp_rewrt   gp_del     gp_alloc   gp_free    gp_curs   
611267     2039       30099      4          2          0          2946      

ovlock     ovuserthread ovbuff     usercpu  syscpu   numckpts   flushes   
0          0            0          9207.19  673.70   188        188       

bufwaits   lokwaits   lockreqs   deadlks    dltouts    ckpwaits   compress   seqscans  
40669      1          75396970   0          0          19         488917     533180    

ixda-RA    idx-RA     da-RA      logrec-RA  RA-pgsused lchwaits  
74346425   1417536    1544886    5678       3283215    96748

The first line of the output displays the current IDS operating mode. In this case, the Informix engine is "On-Line". There are six operating modes, of which three are particularly important:

  • Off-Line indicates that IDS is not running.
  • Quiescent indicates that IDS is running in single-user mode, where only the DBA can do administrative and maintenance work.
  • On-Line indicates that IDS is running normally and all users can connect to the database server and perform any kind of database operation.

In most circumstances, IDS should always be in On-Line mode. If, for some reason, IDS is down (Off-Line mode), the above command will display the following message:

Shared memory not initialized for INFORMIXSERVER 'axional_shm'
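In a monitoring script, the operating mode can be derived from the banner line that every onstat command prints. A sketch, using the banner from the onstat -p example above as a stand-in for a live call (in practice you would use something like banner=$(onstat - | head -1)):

```shell
# Classify the instance state from the onstat banner. The sample banner is
# copied from the output shown earlier in this document.
banner='IBM Informix Dynamic Server Version 12.10.FC7W1 -- On-Line (Prim) -- Up 15:24:48 -- 7443636 Kbytes'
case "$banner" in
  *"Shared memory not initialized"*) mode=offline ;;
  *On-Line*)                         mode=online ;;
  *Quiescent*)                       mode=quiescent ;;
  *)                                 mode=unknown ;;
esac
echo "instance mode: $mode"
```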

5.1.1 Message log

The message log is also referred to as the online log. It contains a variety of information about critical instance activities, such as the time and duration of checkpoints, instance startup and shutdown, backup and recovery status, logical log backup status, and changes to major configuration parameters. The message log also records critical errors (referred to as assertion failures by Informix) such as disk I/O errors, mirroring errors, down chunks, data integrity errors, shared memory errors, and so on. When an assertion failure occurs, the message log usually points us to the corresponding assertion-failure file, which records more detailed information about instance activities at the time the database engine went down, and also gives some suggestions on how to fix the problem.

$ onstat -m
IBM Informix Dynamic Server Version 12.10.FC7W1 -- On-Line (Prim) -- Up 15:30:35 -- 7443636 Kbytes

Message Log File: /home/informix/tmp/online.log
18:15:16  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 335, Llog used 4466

18:20:16  Checkpoint Completed:  duration was 0 seconds.
18:20:16  Tue May  2 - loguniq 47427, logpos 0x363c018, timestamp: 0x1ec4f37f Interval: 586894

18:20:16  Maximum server connections 97 
18:20:16  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 271, Llog used 1154

18:25:16  Checkpoint Completed:  duration was 0 seconds.
18:25:16  Tue May  2 - loguniq 47427, logpos 0x3856018, timestamp: 0x1ec55513 Interval: 586895

5.2 Chunk (disk) status

Chunks are the physical storage devices. They should always be online. If any chunk is down, it indicates data corruption and requires immediate attention. The onstat -d command monitors current chunk status and the following is the output of this command:

IBM Informix Dynamic Server Version 12.10.FC7W1WE -- Read-Only (Sec) -- Up 1 days 10:06:27 -- 5643752 Kbytes

address          number   flags      fchunk   nchunks  pgsize   flags    owner    name
56e4c028         1        0x801      1        1        2048     NL BA    informix rootdbs
5860b2d0         2        0x8801     2        1        2048     NLSBA    informix s_sbstd
5860b510         3        0x8801     3        1        2048     NLSBA    informix s_sbsys
5860b750         4        0xa001     4        1        2048     N UBA    informix s_sbtmp
5860b990         5        0x801      5        1        2048     NL BA    informix d_plog
5860bbd0         6        0x801      6        1        2048     NL BA    informix d_llog
5860d028         7        0x2001     7        1        2048     N TBA    informix t_tmp1
5860d268         8        0x2001     8        1        2048     N TBA    informix t_tmp2
5860d4a8         9        0x2001     9        1        2048     N TBA    informix t_tmp3
5860d6e8         10       0x2001     10       1        2048     N TBA    informix t_tmp4
5860d928         11       0x801      11       1        2048     NL BA    informix d_wics
5860db68         12       0x8801     12       1        2048     NLSBA    informix s_wics
5860dda8         13       0x801      13       4        2048     NL BA    informix d_data
5860e028         14       0x801      19       2        2048     NL BA    informix i_data
5860e268         15       0x801      23       2        2048     NL BA    informix d_auto
5860e4a8         16       0x801      25       1        2048     NL BA    informix i_auto
5860e6e8         17       0x801      31       3        2048     NL BA    informix d_cons
5860e928         21       0x8801     36       2        2048     NLSBA    informix s_cons
 18 active, 2047 maximum

address          chunk/dbs     offset     size       free       bpages     flags pathname
56e4c268         1      1      0          1024000    446069                PI-B-D /INFORMIXDEV/rootdbs
58610028         2      2      2          2560000    894659     2387625    PISB-D /INFORMIXDEV/s_sbstd
                                 Metadata 172322     128230     172322  
58611028         3      3      2          2560000    2387625    2387625    PISB-D /INFORMIXDEV/s_sbsys
                                 Metadata 172322     128230     172322  
58612028         4      4      2          2560000    -1         -1         POSB-- /INFORMIXDEV/s_sbtmp
58613028         5      5      2          2560000    59947                 PI-B-D /INFORMIXDEV/d_plog
58614028         6      6      2          2560000    59947                 PI-B-D /INFORMIXDEV/d_llog
58615028         7      7      2          2560000    2559947               PO-B-- /INFORMIXDEV/t_tmp1
58616028         8      8      2          2560000    2559947               PO-B-- /INFORMIXDEV/t_tmp2
58617028         9      9      2          2560000    2559947               PO-B-- /INFORMIXDEV/t_tmp3
58618028         10     10     2          2560000    2559947               PO-B-- /INFORMIXDEV/t_tmp4
58619028         11     11     2          2560000    2085189               PI-B-D /INFORMIXDEV/d_wics_01
5861a028         12     12     2          2560000    1903856    2387625    PISB-D /INFORMIXDEV/s_wics_01
                                 Metadata 172322     93117      172322  
5861b028         13     13     2          5120000    16                    PI-B-D /INFORMIXDEV/d_data_01
5861c028         14     13     2          5120000    6                     PI-B-D /INFORMIXDEV/d_data_02
5861d028         15     13     2          5120000    3876719               PI-B-D /INFORMIXDEV/d_data_03
5861e028         16     13     2          5120000    5119997               PI-B-D /INFORMIXDEV/d_data_04
5861f028         19     14     2          5120000    1655639               PI-B-D /INFORMIXDEV/i_data_01
58620028         20     14     2          5120000    5119997               PI-B-D /INFORMIXDEV/i_data_02
58621028         23     15     0          5000000    0                     PI-B-D /INFORMIXDEV/d_auto_01
58622028         24     15     0          5000000    2874484               PI-B-D /INFORMIXDEV/d_auto_02
58623028         25     16     0          2500000    2499947               PI-B-D /INFORMIXDEV/i_auto_01
58624028         31     17     0          5120000    5108936               PI-B-D /INFORMIXDEV/d_cons_01
58625028         32     17     0          5120000    5119197               PI-B-D /INFORMIXDEV/d_cons_02
58626028         33     17     0          5120000    5119997               PI-B-D /INFORMIXDEV/d_cons_03
58627028         36     21     0          5120000    4775328    4775328    PISB-D /INFORMIXDEV/s_cons_01
                                 Metadata 344619     168267     344619  
58628028         39     21     0          5120000    4775374    4775374    PISB-D /INFORMIXDEV/s_cons_02
                                 Metadata 344623     344623     344623  
 26 active, 32766 maximum

The above output contains two sections. The first section lists all dbspaces, and the second lists all chunks. In the chunk section, we need to pay close attention to the flags field. The first character of this field indicates whether the chunk is a primary (P) or mirrored (M) chunk. The second character indicates the current status of the chunk: online (O) or down (D). Since O and D look very much alike, especially when you are in a hurry, you may want to pipe the results to grep PD, that is, onstat -d |grep PD, just to make sure that you don't miss any down chunks. If any primary chunks are down, you need to perform either a cold or warm recovery from backup tapes immediately to ensure data integrity.
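The grep PD check can be wrapped in a small script. The sample lines below are a simplified stand-in for the chunk section of onstat -d (real output has more columns and an optional bpages field, so field positions may differ; the down chunk here is invented to exercise the check):

```shell
# Simplified chunk lines: address chunk dbs offset size free flags pathname.
# Live usage: onstat -d | grep PD
chunks='56e4c268 1 1 0 1024000 446069 PI-B-D /INFORMIXDEV/rootdbs
58613028 5 5 2 2560000 59947 PD-B-D /INFORMIXDEV/d_plog'

# Flags starting with "PD" mark a primary chunk that is down.
down=$(printf '%s\n' "$chunks" | awk '$7 ~ /^PD/ {print $8}')
echo "down chunks: ${down:-none}"
```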

We can also query syschunks and sysdbspaces tables in the sysmaster database to get similar statistics.

5.3 Checkpoint

A checkpoint is the process of synchronizing pages on disk with pages in the shared-memory buffer pool. During checkpoints, IDS prevents user threads from entering critical sections and blocks all transaction activity. Thus, if the checkpoint duration is long, users may experience the system hanging. This is especially true in OLTP environments, where there are thousands of transactions and response time is most critical.

Use onstat -m to see the most recent checkpoint messages:

$ onstat -m
IBM Informix Dynamic Server Version 12.10.FC7W1WE -- On-Line (Prim) -- Up 09:56:45 -- 10800248 Kbytes

Message Log File: /home/informix/tmp/online.log
12:44:26  Checkpoint Completed:  duration was 0 seconds.
12:44:26  Wed May  3 - loguniq 47871, logpos 0xaf018, timestamp: 0x8fe6b9b6 Interval: 414059

12:44:26  Maximum server connections 17 
12:44:26  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 40, Llog used 48

12:49:26  Checkpoint Completed:  duration was 0 seconds.
12:49:26  Wed May  3 - loguniq 47871, logpos 0xbd018, timestamp: 0x8fe6cab2 Interval: 414060

Use the onstat -g ckp command to print checkpoint history and show configuration recommendations if a suboptimal configuration is detected.

onstat -g ckp
IBM Informix Dynamic Server Version 12.10.FC6WE -- On-Line (Prim) -- Up 10:20:02 -- 4524668 Kbytes


                                                                    Critical Sections                          Physical Log    Logical Log
           Clock                                  Total Flush Block #      Ckpt  Wait  Long  # Dirty   Dskflu  Total    Avg    Total    Avg
Interval   Time      Trigger    LSN               Time  Time  Time  Waits  Time  Time  Time  Buffers   /Sec    Pages    /Sec   Pages    /Sec
359400     10:10:20  CKPTINTVL  27029:0x3764018   0.1   0.1   0.0   0      0.0   0.0   0.0   7100      7100    2369     7      3699     12
359401     10:15:20  CKPTINTVL  27029:0x4f63018   0.1   0.1   0.0   0      0.0   0.0   0.0   8090      8090    3475     11     6143     20
359402     10:20:20  CKPTINTVL  27029:0x58de4ec   0.1   0.1   0.0   1      0.0   0.0   0.0   1907      1907    1548     5      2428     8
359403     10:25:21  CKPTINTVL  27029:0x6dcf018   0.1   0.1   0.0   0      0.0   0.0   0.0   5425      5425    4374     14     5361     17
359404     10:30:21  CKPTINTVL  27029:0x8801018   0.2   0.2   0.0   1      0.0   0.0   0.0   6740      6740    5714     19     6728     22
359405     10:35:21  CKPTINTVL  27029:0x9d72018   0.1   0.1   0.0   0      0.0   0.0   0.0   6592      6592    4179     13     5489     18
359406     10:40:21  CKPTINTVL  27029:0xa77225c   0.1   0.1   0.0   2      0.0   0.0   0.0   3325      3325    2937     9      2565     8
359407     10:45:22  CKPTINTVL  27029:0xb687018   0.1   0.1   0.0   0      0.0   0.0   0.0   3530      3530    3084     10     3861     12
359408     10:50:22  CKPTINTVL  27030:0x9a7018    0.1   0.1   0.0   0      0.0   0.0   0.0   5926      5926    3098     10     5744     19
359409     10:55:22  CKPTINTVL  27030:0x149e018   0.1   0.1   0.0   0      0.0   0.0   0.0   2459      2459    1908     6      2807     9
359410     11:00:22  CKPTINTVL  27030:0x31fa018   0.5   0.5   0.0   0      0.0   0.0   0.0   6846      6846    5192     17     7544     25
359411     11:05:22  CKPTINTVL  27030:0x40a8018   0.3   0.2   0.0   0      0.0   0.0   0.0   3983      3983    2227     7      3758     12
359412     11:10:22  CKPTINTVL  27030:0x5696018   0.2   0.1   0.0   1      0.0   0.0   0.0   6664      6664    4629     15     5633     18
359413     11:15:22  CKPTINTVL  27030:0x7221578   0.4   0.2   0.0   2      0.0   0.0   0.0   8914      8914    6296     20     7093     23
359414     11:20:22  CKPTINTVL  27030:0x8358018   0.1   0.1   0.0   0      0.0   0.0   0.0   4531      4531    2817     9      4409     14
359415     11:25:22  CKPTINTVL  27030:0x9f53018   0.1   0.1   0.0   0      0.0   0.0   0.0   7909      7909    5233     17     7163     23
359416     11:30:22  CKPTINTVL  27030:0xb419018   0.1   0.1   0.0   0      0.0   0.0   0.0   6210      6210    4661     15     5320     17
359417     11:35:22  CKPTINTVL  27031:0x44058c    0.8   0.4   0.0   0      0.0   0.0   0.0   5373      5373    3072     10     5017     16
359418     11:40:22  CKPTINTVL  27031:0x1105394   0.1   0.1   0.0   1      0.0   0.0   0.0   3463      3463    2683     8      3333     11
359419     11:45:23  CKPTINTVL  27031:0x2f55018   0.1   0.1   0.0   0      0.0   0.0   0.0   8292      8292    5349     17     7762     25

Max Plog       Max Llog       Max Dskflush   Avg Dskflush   Avg Dirty      Blocked
pages/sec      pages/sec      Time           pages/sec      pages/sec      Time
10240          1610           8              2924           2              0

If the checkpoint duration is consistently longer than 5 seconds, you may need to tune the server configuration.

The database server prevents a user thread from entering a critical section during a checkpoint. Sometimes a checkpoint can take several seconds or even a minute to complete, which can impact user activity and thus performance.

In a full checkpoint, the database server flushes all modified pages in the shared-memory buffer pool to disk. The cost of this activity depends on:

  • Disk access.
  • CPU work, to find the dirty pages and order them to perform a chunk write.
  • Informix processes, to manage the FLRU queues.
  • The number and length of the LRU queues.

The DBA can check how many user threads had to wait for a checkpoint to complete; this indicates how serious the problem is. Use onstat -p and check the value of ckpwaits. Reducing the time needed to perform a checkpoint reduces the number of user threads that have to wait for it to complete, and thus increases performance:

onstat -p |grep -A 1 ckpwaits
bufwaits   lokwaits   lockreqs   deadlks    dltouts    ckpwaits   compress   seqscans
516977     9          371771152  0          0          41         231015     313723
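Extracting ckpwaits for use in a monitoring script can be sketched as follows; the sample lines are copied from the output above, and in live use you would pipe onstat -p instead:

```shell
# Two lines as produced by: onstat -p | grep -A 1 ckpwaits
sample='bufwaits   lokwaits   lockreqs   deadlks    dltouts    ckpwaits   compress   seqscans
516977     9          371771152  0          0          41         231015     313723'

# ckpwaits is the 6th column of the value line.
ckpwaits=$(printf '%s\n' "$sample" | awk 'NR==2 {print $6}')
echo "ckpwaits: $ckpwaits"
```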

Usually the bottleneck is in one or more of the following areas, which are explained below:

  • Disk device(s)
  • Disk controller(s)
  • CPU(s)
  • AIO VP(s)
  • CPU VP(s)
  • Page cleaners
  • Number of LRU queues
  • Number of modified pages in the buffer cache

Disk device

If the disk drives you are trying to write data to are already 100% busy, spread the data more evenly across the available disks by partitioning or fragmentation.

The physical CPU(s)

On Unix, CPU usage can be monitored with sar, vmstat, or glance. If the CPUs are 100% busy all of the time, reduce contention or add more or faster CPUs.

The AIO and CPU VP(s)

Monitor onstat -g ioq for the queue lengths and maximum queue lengths of the AIO VPs/KAIO VPs and GFDs. If there is a large max length (>25) or a consistent length (>10) on any of those queues, those I/O requests are not being serviced fast enough. If you are using AIO, add more AIO VPs; if you are using KAIO, add more CPU VPs.
onstat -g ioq |grep "maxlen\|aio\|kio"
q name/id    len maxlen totalops  dskread dskwrite  dskcopy
  kio   0      0     65 63487443 63193106   294337        0
  kio   1      0     65 65595169 65320234   274935        0
  kio   2      0     65 41082306 40803204   279102        0
  aio   0      0      4      345       59        2        0
Check the GFD

In the onstat -g ioq output, the GFDs (global file descriptors) are the open file handles to the chunks; one GFD exists per chunk in the instance. Check the dskread and dskwrite columns and compare all the chunks to see whether one or more chunks show a very high number of reads or writes. If so, some table inside that chunk has a high level of activity, so try to spread the tables out better. Note that the second column ("id") is the chunk id, which can be matched against the onstat -d output to find the chunk name and position. Use oncheck -pe to list the contents of a chunk. If the AIO queues are keeping up and performance is still not acceptable, the requests may not be getting to the queues fast enough; check the page cleaners.
onstat -g ioq |grep "maxlen\|gfd"
q name/id    len maxlen totalops  dskread dskwrite  dskcopy
  gfd   3      0      0        0        0        0        0
  gfd   4      0      0        0        0        0        0
  gfd   5      0      0        0        0        0        0
  gfd   6      0     16      524      501       23        0
  gfd   7      0      0        0        0        0        0
  gfd   8      0      0        0        0        0        0
  gfd   9      0    128    31353    10684    20669        0
  gfd  10      0     16    25402     5316    20086        0
  gfd  11      0    128    26971     9997    16974        0
  gfd  12      0     16    22830     5579    17251        0
  gfd  13      0      0        0        0        0        0
  gfd  14      0      0        0        0        0        0
  gfd  15      0      0        0        0        0        0
  gfd  16      0      0        0        0        0        0
  gfd  17      0      0        0        0        0        0
  gfd  18      0      0        0        0        0        0
  gfd  19      0      0        0        0        0        0
  gfd  20      0      0        0        0        0        0
  gfd  21      0      0        0        0        0        0
  gfd  22      0      0        0        0        0        0
  gfd  23      0      0        0        0        0        0
  gfd  24      0      0        0        0        0        0
  gfd  25      0      0        0        0        0        0
  gfd  26      0      0        0        0        0        0
  gfd  27      0      0        0        0        0        0
  gfd  28      0      0        0        0        0        0
  gfd  29      0      0        0        0        0        0
  gfd  30      0      0        0        0        0        0
  gfd  31      0      0        0        0        0        0
  gfd  32      0      0        0        0        0        0
  gfd  33      0      0        0        0        0        0
  gfd  34      0      0        0        0        0        0
  gfd  35      0      0        0        0        0        0
  gfd  36      0      0        0        0        0        0
  gfd  37      0      0        0        0        0        0
  gfd  38      0      0        0        0        0        0
  gfd  39      0      0        0        0        0        0
  gfd  40      0      0        0        0        0        0
  gfd  41      0      0        0        0        0        0
  gfd  42      0      0        0        0        0        0
The page cleaners

The page cleaners are responsible for making the requests to the queues, so adding more page cleaners can get those requests out faster. As a rule of thumb, configure one cleaner per pair of LRU queues, or one cleaner per chunk. The status of the page cleaners can be checked with onstat -F. While a checkpoint is running, execute onstat -F -r 1 and check the "state" column; it should be flagged "C" (checkpoint). Then check the "data" column, which shows the chunk id the cleaner is working on. If a cleaner spends more time on some chunks, there may be congestion in those chunks, so spread the tables out better by doing a database reorganization.

The LRU queues

Make sure there is at least one LRU queue per CPU VP. The CPU VPs place a mutex on the LRU queue they are currently accessing to change a page in the buffer cache; if there are fewer queues than CPU VPs, there can be contention. Try increasing the number of LRU queues and check the performance again. With a large buffer cache, more LRU queues make each queue shorter.

Also check whether you are using KAIO or AIO; KAIO is usually faster.

Finally, check the checkpoint interval (CKPTINTVL) in the ONCONFIG file. If it is too high, a large number of pages will have to be flushed to disk at each checkpoint; try reducing CKPTINTVL.

5.4 Dbspace usage

It is very important for Informix database administrators to keep track of the free space in each dbspace. If one of the dbspaces runs out of space, IDS will suffer. All kinds of problems may occur: we cannot import any databases, we cannot create any tables or indexes, we cannot even insert into or update any tables and indexes. This is critical for production databases.

We need to monitor the growth of each dbspace so that we can address these problems proactively. The SQL script below reports the current space usage of each dbspace and calculates the percentage used.

Notice that syschunks.chksize is computed in default system pagesize, not dbspace pagesize.

$ dbaccess sysmaster -
SELECT name[1,18],
         count(*) nchunks,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(12) total,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)) -
         			  sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(12) used,
         round (100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 2) pct_used
    FROM sysdbspaces,syschunks
   WHERE sysdbspaces.dbsnum = syschunks.dbsnum
     AND sysdbspaces.is_sbspace = 0
   GROUP BY 1
UNION
SELECT name[1,18],
         count(*) nchunks,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(12) total,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)) -
         			  sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(12) used,
         round (100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 2) pct_used
    FROM sysdbspaces,syschunks
   WHERE sysdbspaces.dbsnum = syschunks.dbsnum
     AND sysdbspaces.is_sbspace = 1
   GROUP BY 1
ORDER BY pct_used DESC;
name   nchunks  total      used      pct_used

d_data    8     76.3 GB   14.6 GB     19,19
rootdbs   1     4.77 GB    226 MB      4,62

5.5 Dbspace I/O

dbspace I/O is measured by disk reads and writes. If some dbspaces have heavy disk reads and writes while others scarcely have any, the system may have disk I/O bottlenecks. Well-balanced dbspace I/O eases the system disk I/O load and hence improves overall system performance. The following script displays current I/O statistics for each dbspace:

dbaccess sysmaster -    
 SELECT name[1,18] dbspace,
        fname[1,22],
        sum(pagesread) dreads,
        sum(pageswritten) dwrites
   FROM syschkio c, syschunks k, sysdbspaces d
  WHERE d.dbsnum = k.dbsnum
    AND k.chknum = c.chunknum
  GROUP BY 1, 2
 ORDER BY 3 desc;
dbspace            fname                            dreads          dwrites 

rootdbs            /INFORMIXDEV/rootdbs              17037            52568
d_data             /INFORMIXDEV/d_data_01             4079             2847
d_data             /INFORMIXDEV/d_data_02              784             1282
i_data             /INFORMIXDEV/i_data_01              689             1647
d_wics             /INFORMIXDEV/d_wics_01               37               20
d_ssd_llog         /INFORMIXDEV/d_ssd_llo               16            87586
s_sbstd            /INFORMIXDEV/s_sbstd_0               14                8
s_sbsys            /INFORMIXDEV/s_sbsys                 13                8
s_sbtmp            /INFORMIXDEV/s_sbtmp                 13                8
d_ssd_plog         /INFORMIXDEV/d_ssd_plo               13            53999
s_wics             /INFORMIXDEV/s_wics_01               13                8
d_data             /INFORMIXDEV/d_data_04               12                4
d_data             /INFORMIXDEV/d_data_03               10               28
d_tsdata           /INFORMIXDEV/d_tsdata_                9                8
d_data             /INFORMIXDEV/d_data_08                3                0
d_data             /INFORMIXDEV/d_data_07                3                0
i_data             /INFORMIXDEV/i_data_04                3                0
d_data             /INFORMIXDEV/d_data_05                3                0
i_data             /INFORMIXDEV/i_data_02                3                0
d_data             /INFORMIXDEV/d_data_06                3                0
i_data             /INFORMIXDEV/i_data_03                3                0
t_tmp1             /INFORMIXDEV/t_tmp1                   2               15
t_tmp2             /INFORMIXDEV/t_tmp2                   2               18

The goal is to achieve balanced disk reads and writes across all dbspaces. This is unrealistic in most cases, but the above output at least gives you an idea of how dbspace I/O is distributed and can help you identify "hot" dbspaces - those dbspaces with the most disk reads and writes. If disk reads and writes are extremely high for some dbspaces and extremely low for others, you may need to adjust or even rearrange the physical disk layout for the Informix engine. We can get similar information using onstat -D, which displays disk reads and writes for each chunk, and onstat -g ioq, which displays disk I/O queue information.
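To turn the raw dreads/dwrites columns into something easier to compare, you can compute each dbspace's share of total I/O. This small sketch uses the rootdbs and d_data_01 figures from the sample output above:

```python
# Ranking "hot" dbspaces: each dbspace's share of total disk I/O.
# The (dreads, dwrites) pairs are taken from the sample output above.
def io_share(stats):
    total = sum(r + w for r, w in stats.values()) or 1
    return {name: round(100 * (r + w) / total, 1)
            for name, (r, w) in stats.items()}

shares = io_share({"rootdbs": (17037, 52568), "d_data_01": (4079, 2847)})
print(shares)  # rootdbs dominates the I/O between these two dbspaces
```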

5.6 Shared memory segments

Too many virtual shared memory segments, usually more than three, indicate that the initial virtual shared memory segment is too small and that the database engine constantly has to allocate additional virtual shared memory segments. This adversely affects IDS performance and will eventually bring your system to its knees.

The onstat -g seg command displays how many shared memory segments your Informix database engine currently has:

$ onstat -g seg
IBM Informix Dynamic Server Version 12.10.FC7W1WE -- On-Line (Prim) -- Up 10:03:57 -- 10800248 Kbytes

Segment Summary:
id         key        addr             size             ovhd     class blkused  blkfree 
2359296    52564801   44000000         72552448         1288648  R*    17713    0       
2392065    52564802   48531000         4194304000       49153848 V     134900   889100  
2424834    52564803   142531000        6792036352       1        B*    1658212  0       
2457603    52564804   2d7295000        561152           7848     M     136      1       
Total:     -          -                11059453952      -        -     1810961  889101  

   (* segment locked in memory)
Virtual segment low memory reserve (bytes):16777216 
Low memory reserve used 0 times and used maximum block size 0 bytes

If the output shows more than three virtual shared memory segments (class V), you need to increase the value of the SHMVIRTSIZE parameter in your configuration file. The idea is to let IDS allocate enough virtual shared memory at initialization so that it does not need to allocate more while users are logging onto the system and performing database operations. You may also want to use the UNIX ipcs command to see the sizes of the Informix shared memory segments.
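As an illustration, counting the class V segments in captured onstat -g seg output can be scripted. The parser below assumes the column layout shown above (numeric id first, class in the sixth column):

```python
# Counting class V (virtual) segments in captured `onstat -g seg` output;
# more than three suggests the initial virtual segment is too small.
# The parser assumes the column layout shown above.
def count_virtual_segments(onstat_seg_text):
    count = 0
    for line in onstat_seg_text.splitlines():
        fields = line.split()
        # segment rows start with a numeric id; class is the sixth column
        if len(fields) >= 6 and fields[0].isdigit():
            if fields[5].rstrip('*') == 'V':
                count += 1
    return count

sample = """id         key        addr      size        ovhd     class blkused blkfree
2359296    52564801   44000000  72552448    1288648  R*    17713   0
2392065    52564802   48531000  4194304000  49153848 V     134900  889100
2424834    52564803   142531000 6792036352  1        B*    1658212 0
"""
print(count_virtual_segments(sample))  # 1
```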

5.7 Overall operating system performance

Since the Informix database engine always runs on top of an operating system, most commonly Linux, to accurately monitor or evaluate IDS performance we need to take into account the behavior of the operating system as a whole, especially if the database engine resides on a non-dedicated database server.

If IDS takes too much RAM, for instance if your system has 16GB RAM and IDS takes 15GB or more for its shared memory, your operating system may experience heavy swapping and hang when users perform memory-intense operations. When there is not enough memory, the system has to swap data pages from memory to disk to make room for new data, and if the system is short of memory, CPU performance may also suffer.

Use the Linux top command to monitor overall operating system CPU and memory utilization:

$ top
top - 13:11:22 up 17 days, 12:29,  1 user,  load average: 0.08, 0.08, 0.12
Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16433156 total,   227860 free,   194448 used, 16010848 buff/cache
KiB Swap:  2097148 total,  2043696 free,    53452 used.  8544116 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                              
27061 informix  10 -10 10.466g 7.052g 7.047g S   0.0 45.0  31:36.20 oninit                                                               
27070 informix  10 -10 10.466g 5.866g 5.861g S   0.0 37.4  18:59.01 oninit                                                               
27071 informix  10 -10 10.466g 5.477g 5.473g S   0.0 35.0  10:35.18 oninit                                                               
27068 root      10 -10 10.470g  91676  88168 S   0.0  0.6   0:00.12 oninit                                                               
  379 root      20   0   98252  45988  45864 S   0.0  0.3   0:14.85 systemd-journal                                                      
  577 root      20   0  278776  25868  25512 S   0.0  0.2   0:04.01 rsyslogd                                                             
29942 informix  10 -10 10.477g  13324   7620 S   0.0  0.1   0:00.74 oninit                                                               
27064 root      10 -10 10.461g  10896   5980 S   0.0  0.1   0:03.55 oninit                                                               
27067 root      10 -10 10.452g   9800   6384 S   0.0  0.1   0:00.87 oninit                                                               
27072 root      10 -10 10.450g   7228   3820 S   0.0  0.0   0:12.81 oninit                                                               
27073 root      10 -10 10.450g   5648   2240 S   0.0  0.0   0:00.60 oninit                                                               
30251 root      10 -10 10.461g   5628    712 S   0.0  0.0   0:00.05 oninit                                                               
30264 root      20   0  141272   5304   4036 S   0.0  0.0   0:00.06 sshd                                                                 
 1263 root      20   0  553156   5296   2752 S   0.0  0.0   2:56.45 tuned                                                                
30171 postfix   20   0   91240   3916   2920 S   0.0  0.0   0:00.00 pickup                                                               
27065 root      10 -10 10.450g   3872    464 S   0.0  0.0   0:00.58 oninit                                                               
27066 root      10 -10 10.450g   3872    464 S   0.0  0.0   0:00.57 oninit                                                               
27069 root      10 -10 10.450g   3868    460 S   0.0  0.0   0:00.56 oninit                                                               
  578 root      20   0  323576   3648   2748 S   0.0  0.0   0:02.11 firewalld                                                            
    1 root      20   0   41496   3272   1968 S   0.0  0.0   0:23.31 systemd                                                              
  636 root      20   0  430360   3192   2404 S   0.0  0.0   0:02.45 NetworkManager                                                       
 1499 postfix   20   0   91416   3012   2904 S   0.0  0.0   0:01.38 qmgr                                                                 
  832 polkitd   20   0  528268   2636   2184 S   0.0  0.0   0:01.88 polkitd                                                              
30316 informix  20   0  157680   2232   1560 R   0.0  0.0   0:00.02 top

6 Monitoring database activity

6.1 Table extents

When tuning performance for an IBM Informix instance, one important issue to look for is the presence of tables and indexes containing a large number of extents.

Table space is allocated in blocks of disk space called extents. A table is created with an initial first extent size, and additional extents are added as needed with a size determined by the next extent size; the default extent size for Informix is 8 pages. Tables and indexes that are spread over a large number of extents can cause a significant performance impact, so such fragmentation is to be avoided. Fragmentation occurs when a table is initially created with too small an extent size, so that over time many extents must be added as the table and its associated indexes grow. If not configured properly at table creation, a large table may grow to comprise numerous small extents fragmented across the drive and interleaved with extents from other tables or indexes. The resulting disk layout can cause a performance hit on table scans, as the system must read through a series of noncontiguous data fragments.

To avoid this problem, you should try to properly size the initial and next extent sizes at table creation. However, if sized incorrectly at first, table extent sizes can later be modified using the ALTER TABLE statement (MODIFY EXTENT SIZE and MODIFY NEXT SIZE). The first, the next, or both extent sizes may be specified. The new first extent size is stored in the system catalog, but does not take effect unless the table is rebuilt.
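To see why extent sizing matters, a back-of-the-envelope calculation shows roughly how many extents a table accumulates for given first and next extent sizes. All sizes below are illustrative, and the figures overstate real counts because the engine also doubles the next-extent size periodically as extents are added:

```python
# Back-of-the-envelope extent math: roughly how many extents a table
# accumulates for given first/next extent sizes (all sizes in KB,
# all values illustrative). Real counts are lower because IDS also
# doubles the next-extent size periodically as extents are added.
import math

def extents_needed(table_kb, first_kb, next_kb):
    if table_kb <= first_kb:
        return 1
    return 1 + math.ceil((table_kb - first_kb) / next_kb)

# a 1 GB table with 16 KB extents (the 8-page default on 2 KB pages)
print(extents_needed(1_048_576, first_kb=16, next_kb=16))            # 65536
# the same table with a 512 MB first extent and 128 MB next extents
print(extents_needed(1_048_576, first_kb=524_288, next_kb=131_072))  # 5
```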


Unlike table extents, an index's extent sizes cannot be altered after the index is created. The index must be recreated with the new extent sizes in order to defragment it:

CREATE INDEX cust_idx1 ON customer . . . EXTENT SIZE 200 NEXT SIZE 20;

When reorganizing tables and indexes, you should take into account the current size and expected growth in order to properly allocate new extent sizes. Optimal extent sizing and defragmentation of interleaved tables and indexes can have a significant impact on the performance of your Informix server.

To determine tables with high number of extents use the following SQL:

database sysmaster;
select dbsname[1,20], tabname[1,20], count(*) num_of_extents, sum( pe_size ) total_size
from systabnames, sysptnext
where partnum = pe_partnum
group by 1, 2
having count(*) > 40
order by 3 desc, 4 desc;
dbsname              tabname                num_of_extents       total_size 

wic_conf             i_wic_user_soaplogs_               94           237493
wic_conf             wic_user_trxlogs                   90          1388500
d_icon               TBLSpace                           76            37600
d_wics               TBLSpace                           67            28000
wic_iges             _temptable                         66              804
wic_conf             i_wic_mobile_devices               62           159156

To defragment a database table, you can execute the following database task.

database sysadmin;    
EXECUTE FUNCTION task("defragment", "dbname:informix.tablename");

6.2 Shrink table space

When you delete rows from a table, its space is not released back to the chunk. To recover space from a table so that it fits its real needs, use the following command.

database sysadmin;    
EXECUTE FUNCTION admin("table shrink", "table_name", "database_name", "owner_name");

The same can also be applied to indexes.

database sysadmin;    
EXECUTE FUNCTION admin("index shrink", "index_name", "database_name", "owner_name");

Replace the table or index name, database, and owner with the object you want to shrink. The owner can usually be set to informix.

6.3 Index levels

The number of index levels may also adversely affect performance. The more index levels, the more probes IDS needs to reach the index leaf nodes. Furthermore, if a leaf node gets split or merged, it may take more time for the whole index to adjust to the change. For example, if an index has only two levels, only two levels need to be adjusted, but if it has four levels, then all four levels need to be adjusted accordingly, which of course takes much longer. This is especially true in an OLTP environment, where constant inserts, deletes, and updates cause index nodes to be continually split or merged.

The following script identifies how many levels each index has:

DATABASE mydb;                
  SELECT idxname[1,40], levels 
    FROM sysindexes 
ORDER BY 2 desc;
If any index has more than 4 levels, you may need to consider dropping and rebuilding it to consolidate its levels for better performance.
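As a rough mental model (not an IDS formula), a B-tree over n keys with an average fanout of f needs about ceil(log_f(n)) levels. The fanout of 200 keys per node used below is an illustrative assumption:

```python
# Rough mental model (not an IDS formula): a B-tree over n keys with an
# average fanout of f needs about ceil(log_f(n)) levels. The fanout of
# 200 keys per node is an illustrative assumption.
import math

def estimated_levels(nrows, fanout=200):
    return max(1, math.ceil(math.log(nrows, fanout)))

for n in (10_000, 1_000_000, 100_000_000):
    print(n, estimated_levels(n))  # levels grow slowly with table size
```

This also shows why more than 4 levels is suspicious: with a reasonable fanout, even a hundred-million-row table should fit in about four levels.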

6.4 Index uniqueness

A highly duplicate index can severely impact performance for updates and deletes. Suppose you have an index on column customer_type on table customer, and there are only five possible customer_type codes. If the table has one million rows, there could potentially be 200,000 rows with the same customer_type code. The B-tree would store the key value, followed by a list of pointers to each of the physical rows. The problem occurs when you have to delete or update any of the key values. IDS must search through all the duplicates until it finds the correct key to delete or update!

The following is a script to identify highly duplicate indexes:

DATABASE mydb;                
 SELECT tabname[1,18],
        idxname[1,18],
        nrows,
        nunique,
        round((nunique/nrows)*100,0) pct_uniq
    FROM systables t, sysindexes i
   WHERE t.tabid = i.tabid
     AND t.tabid > 99   
     AND nrows > 0      
     AND nunique > 0;

Ideally, every value in the nunique column should equal the value in the nrows column, meaning that every key in the index is unique. Based on the number of rows (nrows column above) and the number of unique keys (nunique column above), we can calculate the percentage of uniqueness of each index as: $$(nunique/nrows)*100$$

The higher the percentage, the more unique the index is. To avoid the performance bottleneck of a highly duplicate index, you can replace the original index with a composite index that combines the highly duplicate column and a more unique column. Using the example above, you can add the primary key column customer_id to the original index and make it a composite index (for example, "create index index_name on customer (customer_type, customer_id)").
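Applying the formula above to the customer_type example (5 distinct codes spread over one million rows) makes the problem obvious:

```python
# The uniqueness percentage from the formula above, applied to the
# customer_type example: 5 distinct codes over one million rows.
def pct_unique(nunique, nrows):
    return round((nunique / nrows) * 100, 4)

print(pct_unique(5, 1_000_000))          # 0.0005 -> highly duplicate
print(pct_unique(1_000_000, 1_000_000))  # 100.0  -> fully unique
```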

6.5 Sequential scans

Sequential access to a table is sometimes harmful to performance, since the database engine has to scan the whole table to pick up the rows that satisfy the query's conditions. If a table is small, say a couple of hundred rows, this is fine: after the database engine scans it the first time, the table will reside in memory, and on subsequent scans all of its data can be retrieved directly from memory. This is actually an efficient use of sequential scans. But if the tables are huge, say more than 100,000 rows, repeated sequential scans are deadly to performance.

The following script will identify tables with multiple sequential scans:

database sysmaster;    
  SELECT dbsname[1,18], tabname[1,18], sum(seqscans) seqscans 
    FROM sysptprof 
   WHERE seqscans > 0 
     AND dbsname not like "sys%" 
     AND tabname not like "sys%"
   GROUP BY 1, 2
   ORDER BY 3 DESC;
dbsname            tabname                    seqscans

wic_foodmart       wic_jrep_object                  83
wic_cpub           wic_xsql_script                  79
demo_sports        galmacen                         76
demo_sports        csyn_tablas                      61
If a big table appears in the list, you may need to think about adding indexes to it, or about using optimizer directives to force the query optimizer to use indexes rather than sequential scans when accessing the table.
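If you capture the seqscans figures programmatically, flagging candidates is simple. The threshold and sample counts below are illustrative (taken loosely from the output above):

```python
# Flagging index candidates: tables whose sequential-scan count exceeds
# a threshold, sorted worst-first. The threshold and sample counts are
# illustrative (taken loosely from the output above).
def heavy_scanners(scan_counts, threshold=50):
    return sorted((t for t, n in scan_counts.items() if n > threshold),
                  key=lambda t: -scan_counts[t])

counts = {"wic_jrep_object": 83, "wic_xsql_script": 79,
          "galmacen": 76, "csyn_tablas": 61, "small_lookup": 12}
print(heavy_scanners(counts))  # small_lookup is filtered out
```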

7 Monitoring session activity

Statistics about session activities are very useful in identifying potential performance problems and in troubleshooting. What session activity statistics can we collect using the monitoring tools discussed earlier in this article?

Reset the statistics with

onstat -z

before running the queries below to get session information.

7.1 High CPU usage

  1. You can run onstat -u to get session ids, then onstat -g ses [sessionid] to get that session's threads. Finally, run onstat -g ath to list the threads and look at the ones that have a status of running.
  2. You can see active threads with
    onstat -g act -r 2
    This shows the active threads every 2 seconds. If you see the same thread repeatedly, you can use the value of the 'rstcb' column from the previous output to find the session:
    onstat -u | grep value_of_rstcb
  3. You can then get the SQL causing the high CPU utilization with onstat -g ses session_id. You can use the following SQL to build a list of sessions sorted by the highest amount of CPU, including the SQL each session is currently running (note that the current SQL might not be the one that used so much CPU):
    SELECT s.sid, s.username, s.uid, s.hostname, t.tid, t.statedesc, t.statedetail, q.sqs_statement, t.cpu_time
    FROM syssessions s, systcblst t, sysrstcb r, syssqlstat q
    WHERE t.tid = r.tid AND s.sid = r.sid AND s.sid = q.sqs_sessionid
    ORDER BY t.cpu_time desc;

7.2 Session statistics

The syssessions table in the sysmaster database stores general information about each session, such as login name, login time, the host machine from which the session logged in, the operating system process ID, current state, and so on.

select sid, username[1,20], hostname[1,20], connected login_time,
       hex(state) s_state
  from syssessions
 order by login_time;
sid username             hostname             login_time  s_state

         16 informix                                   1493773277 0x40080201
         20 informix                                   1493773277 0x40080201
         42 informix                                   1493773284 0x00080001
         43 informix                                   1493773284 0x00080001
         46 informix                                   1493773284 0x00090001
         45 informix                                   1493773284 0x00080001
         52 informix             com2                  1493773319 0x00080001
         56 informix             web1                  1493773800 0x00080001
        243 informix             web1                  1493794774 0x00080001
        244 informix             web1                  1493794798 0x00080001

7.2.1 Session profile

The syssesprof table in the sysmaster database provides more detailed information about each session. With the following query, you can get a better idea of how each session is interacting with the database:

select syssessions.sid, username[1,20],
       (isreads+bufreads+bufwrites+pagreads+pagwrites) access,
       locksheld, seqscans, total_sorts, dsksorts
  from syssesprof, syssessions
 where syssesprof.sid = syssessions.sid
 order by 3 desc;
sid username                     access         dsksorts    locksheld   seqscans  total_sorts  dsksorts
       1678 informix                   11912524               19      0           0           1         1
         45 informix                    6672998                0      0           0           1         1
       1886 informix                    6452951                8      0           0           0         0
       1961 informix                    5721628                8      0           0           1         0
       1883 informix                    5680407               12      0           0           1         0

  • access field shows how often the session is hitting the database.
  • locksheld shows how many locks each session is using.
  • seqscans indicates how frequently each session is using sequential scans to access data; if this number is too high, say more than 100,000, you may want to question whether the session ever uses indexes to retrieve data, and examine its query execution plan more closely to see if it is optimal.
  • total_sorts and dsksorts indicate how efficiently each session uses memory to do sort operations.
You can calculate the percentage of sorts each session performs in memory with the following formula:

$$((total_sorts - dsksorts)/total_sorts)*100$$

The higher the percentage, the more efficient your sort operations are.
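The formula above as a small helper; the sample figures are illustrative:

```python
# In-memory sort percentage per the formula above: the share of sorts
# that fit in memory instead of spilling to disk. Figures illustrative.
def sort_efficiency(total_sorts, dsksorts):
    if total_sorts == 0:
        return 100.0  # no sorts performed, nothing spilled to disk
    return round((total_sorts - dsksorts) / total_sorts * 100, 2)

print(sort_efficiency(total_sorts=1000, dsksorts=19))  # 98.1
```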

7.2.2 Locks

You can also join the syssessions table with syslocks table to get more detailed information about locks, such as which table in which database is currently locked by which session to help you identify potential lock conflicts among users:

select owner, username, hostname, dbsname, tabname, type
  from syssessions s, syslocks l
 where s.sid = l.owner
   and tabname not like "sys%";
Owner   username    hostname        dbsname        tabname       type 
1422    alba        web2            demo_1          customer        S 
1567    marc        web1            demo_1          products        S 
2237    tomas       web1            demo_1          orders          X

If there are conflicts in lock usage, for instance if one user needs exclusive access to a table that has been locked by someone else, you can easily identify the owner of that lock and, based on the priority of users, issue the onmode -z sid command to kill the session and release the lock; sid is the number shown in the owner field of the above output.

7.2.3 Session I/O

This SQL statement retrieves session information and shows which sessions are generating the most I/O waits in the Informix engine:

SELECT  a.sid, a.pid,
        max(trim(a.username)||'@'||trim(decode(length(a.hostname),0,'localhost',a.hostname))::lvarchar) as user,
        max(a.progname) as progname,
        trunc(sum(c.cpu_time),2) as cpu_time,
        trunc(sum(b.iowaittime),2) as iowaittime,
        sum(b.upf_isread)   as isreads,    sum(b.upf_iswrite) as iswrites,
        sum(b.upf_isrwrite) as isrewrites, sum(b.upf_isdelete) as isdeletes,
        sum(b.upf_iscommit) as iscommits,  sum(b.upf_isrollback) as isrollbacks,
        sum(b.upf_seqscans) as seqscans,   sum(b.nreads) as pagereads,
        sum(b.nwrites)      as pagewrites, sum(b.upf_srtspmax) as max_sortdiskspace,
        case when bitval(max(b.flags), '0x4') > 0 then 'Lock'
             when bitval(max(b.flags), '0x2') > 0 then 'Latch'
             when bitval(max(b.flags), '0x10') > 0 then 'Checkpoint'
             when bitval(max(b.flags), '0x8') > 0 then 'Buffer'
             when bitval(max(b.flags), '0x1000') > 0 then 'LogBuffer'
             when bitval(max(b.flags), '0x40000') > 0 then 'Transaction'
             when max(c.wreason) > 0 then max(c.wait_reason)
             else 'None' 
         end as waitingfor
  FROM sysscblst a, sysrstcb b, systcblst c
 WHERE a.sid = b.sid
   AND b.tid = c.tid
 GROUP BY a.sid, a.pid
HAVING max(c.wait_reason) = "IO Wait"
 ORDER BY 6 desc;
user               informix@dbsrv1.local
sid                3360
pid                -1
cpu_time           30,67
iowaittime         0,80
isreads            729781
iswrites           64088
isrewrites         33
isdeletes          1
iscommits          570
isrollbacks        0
seqscans           3055
pagereads          878314
pagewrites         928
max_sortdiskspace  0

Sessions that appear persistently in the results or that show a long iowaittime should be analyzed with the command onstat -g sql [sid].

7.2.4 Session memory

This SQL statement retrieves session information and shows which sessions are consuming the most memory in the Informix engine:

SELECT s.sid, s.username[1,15], s.hostname[1,15],
       sum(p.po_usedamt + p.po_freeamt) total_memory,
       sum(p.po_usedamt) used_memory
  FROM syssessions s, syspoollst p
 WHERE s.sid = p.po_sid
   AND s.sid != DBINFO('sessionid')
   AND p.po_name matches s.sid || '*'
  GROUP BY 1, 2, 3
  ORDER BY 4 DESC;
sid username        hostname            total_memory      used_memory

34 informix                                  622592           483808
36 informix                                  577536           466664
35 informix                                  561152           465928
40 informix        pdemodb                   102400            97984
33 informix                                  102400            86816

7.3 The longest-running transaction

The following query returns the transaction that has been running the longest.

That is, it is the transaction that has spanned the most logical logs since it started; do not confuse it with the transaction that has consumed the most logical log space.

SELECT HEX(systxptab.address) address,
      systxptab.istar_coord tx_coordinator,
      syssessions.sid, syssessions.username, syssessions.hostname,
      dbinfo('UTC_TO_DATETIME',connected) connection_time,
      syslogfil_beg.number  log_begin,
      syslogfil_end.number  log_end,
      syslogfil_beg.uniqid  uniqid_begin,
      (select l.uniqid from syslogs l where l.is_current = 1) uniqid_current,
      ((select l.uniqid from syslogs l where l.is_current = 1) - syslogfil_beg.uniqid + 1) numlogs,
      CASE WHEN syslogfil_beg.filltime = 0 THEN null::datetime year to second
               ELSE dbinfo('UTC_TO_DATETIME', syslogfil_beg.filltime) END  filltime,
      CASE WHEN syslogfil_beg.filltime = 0 THEN null
               ELSE current - dbinfo('UTC_TO_DATETIME', syslogfil_beg.filltime) END  filltime_duration,
      CASE WHEN systxptab.flags = 541699 THEN 'BEGIN'
           WHEN systxptab.flags = 533507 THEN 'ROLLBACK'
           ELSE '' END  tx_state
  FROM systxptab
      ,syslogfil syslogfil_beg
      ,syslogfil syslogfil_end
      ,OUTER(sysrstcb, OUTER syssessions, OUTER syssqlstat)
 WHERE systxptab.logbeg = syslogfil_beg.uniqid AND
       systxptab.loguniq = syslogfil_end.uniqid AND
       systxptab.address = sysrstcb.txp AND
       sysrstcb.sid = syssessions.sid AND
       sysrstcb.sid = syssqlstat.sqs_sessionid AND
       syslogfil_beg.used > 0
 ORDER BY numlogs DESC;


7.4 Query statistics

Query statistics are critical in troubleshooting and query optimization. The onstat -g sql sid command captures queries and related statistics for current sessions, where sid is the session ID, which can be supplied either manually or dynamically from a shell script.

For example, if you want to know what query a session is executing, you may first use the command onstat -g ses to find out its session ID and then plug this ID into the above command.

8 Monitoring temporary tables in use

The following query shows the temporary tables that are still in use by an opened session:

SELECT sid, nptotal, npused, npdata, nrows, name dbspace
  FROM systabnames, sysptnhdr, sysdbspaces
 WHERE systabnames.partnum = sysptnhdr.partnum
   AND bitval(sysptnhdr.flags, 32) = 1
   AND (sysptnhdr.ncols != 0 OR (sysptnhdr.ncols = 0 AND sysptnhdr.nkeys = 0))
   AND systabnames.tabname != '_temptable'
   AND partdbsnum(systabnames.partnum) = sysdbspaces.dbsnum
  INTO TEMP aux;

SELECT sid, COUNT(*) num, SUM(nptotal) nptotal, SUM(npused) npused, SUM(npdata) npdata, SUM(nrows) nrows
  FROM aux
 GROUP BY 1;

9 Monitoring performance

The goal of monitoring databases is to assess how a server is performing. Effective monitoring involves taking periodic snapshots of current performance to isolate processes that are causing problems, and gathering data continuously over time to track performance trends.

Ongoing evaluation of the database performance helps you minimize response times and maximize throughput, yielding optimal performance. Efficient network traffic, disk I/O, and CPU usage are key to peak performance. You need to thoroughly analyze the application requirements, understand the logical and physical structure of the data, assess database usage, and negotiate tradeoffs between conflicting uses such as online transaction processing (OLTP) versus decision support.

9.1 Update statistics

Use the UPDATE STATISTICS statement to update system catalog information that the query optimizer uses for operations on objects in the local database.

Massive database updates

During heavy database changes, such as processes that insert or delete many rows, remember to run UPDATE STATISTICS regularly (for example, at intervals of 10% of the process) to ensure the database takes all those changes into account.

9.2 Checking lock level

During a transaction, the database locks the data being modified until the transaction finishes. This can produce two situations.

  • Timeout - A timeout occurs when an operation exceeds its maximum wait period for a locked resource.
  • Deadlock - A deadlock is a circular wait condition: a user is waiting on a resource that cannot be freed until he releases a resource he owns, and he cannot release a resource while he is waiting.

Many times you can eliminate frequent lock contention by using row level locks as opposed to page level locks (the default).

To determine lock level on any table of a database you can use:

SELECT tabname[1,20],
       DECODE(locklevel, "P", "Page", "R", "Row") lock_level
  FROM systables
 WHERE tabid > 99;

To modify the lock level of a table you can use:

     ALTER TABLE table_name LOCK MODE (ROW);

If you change a table to use row-level locking, you may also need to increase the value of the LOCKS parameter in your ONCONFIG file.

9.3 Disk contention

In any database environment, disk access is a key factor for performance.

9.3.1 Checking tables I/O

You can further identify what tables have the most disk reads and writes by querying the sysptprof table in the sysmaster database:

dbaccess sysmaster -
  SELECT dbsname[1,18],
         tabname[1,18],
         (isreads + pagreads) diskreads,
         (iswrites + pagwrites) diskwrites
    FROM sysptprof
ORDER BY 3 DESC, 4 DESC;
dbsname            tabname                   diskreads       diskwrites

demo_apps_all      apps_sfa_task_assi             2258             2683
demo_sii           systables                       238               14
sysmaster          sysviews                        151                0
sysmaster          sysprocedures                   145                0
sysadmin           sysprocedures                   144                0
sysadmin           mon_table_names_id              127              161
sysadmin           mon_table_names                 112              203
sysadmin           ph_run                          101             1537
sysadmin           mon_chunk                        99              477
sysadmin           systables                        80                0
sysadmin           mon_checkpoint                   70              387
sysadmin           ix_ph_run_03                     69             1098
demo_apps_all      i_apps_sfa_task_as               68              116

9.3.2 Logical log

Each time you create or alter a database object (table, index, sequence, etc.), a logical log record is written to the logical log, even if transaction logging is disabled. A logical log record is also written each time you run a backup, a checkpoint occurs, you add a chunk or dbspace, or a new extent is allocated.

When the current logical log fills up the engine starts using the next logical log and kicks off a backup for the recently filled logical log. The logical log backup is initiated through the script defined by ALARMPROGRAM in your ONCONFIG or the engine marks it backed up if LTAPEDEV is set to /dev/null.

You can see database server logical logs by using:

$ onstat -l

Size of the Logical Logs

If you do not have enough logical log space you will run into problems. The logical logs are used in a circular fashion: when a new logical log is needed, Informix starts overwriting the logical log that contains the oldest data, as long as that log is

  1. marked as backed up, and
  2. free of open transactions
If either condition is not met, the engine blocks until both are true. Having sufficient logical log space won't help with condition 1, but it will help with condition 2.

There is no penalty for having too much logical log space, but you need to size your individual logical logs so that you are not switching logs every few seconds, yet not so big that they never fill up and therefore are never backed up. Huge logical logs that accumulate three days' worth of transactions before being backed up expose you to more data loss: if your system dies, the disk holding the logical logs is destroyed, and you cannot perform a log salvage during a cold restore to recover the logs that were never backed up, you have lost three days of transactions.

So, determine the total logical log space needed to support peak activity, then divide it by the number of logs. For example, if we estimate 2 GB of total logical log space and split it into 16 logical logs, each log will be 128 MB.
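Individual logical logs can then be added with onparams (sizes are given in KB); the dbspace name llogdbs01 below is a hypothetical example:

```shell
# Add one 128 MB (131072 KB) logical log to dbspace llogdbs01.
# Repeat for each of the logs computed above.
onparams -a -d llogdbs01 -s 131072
```

The new log becomes usable after the next level-0 backup (or immediately on recent versions); check onstat -l to confirm it has been added.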

Logical log backups should be stored on a disk outside the primary storage, to allow a complete recovery in case of a primary failure.

Location of logical logs

If you are spreading out the I/O to multiple disks and you have disks to spare, try creating two logical log dbspaces on two different drives and alternate the creation of logical logs between these two dbspaces. For example, create logical log 1 in llogdbs01, logical log 2 in llogdbs02, logical log 3 in llogdbs01, etc. When a logical log switch occurs a backup reads all of the data from the old log while you're busy trying to write transactions to the new log. If these two logical logs are on the same disk then you can see some I/O contention during the backup.

9.3.3 Physical log

The Physical Log stores a before-image of any page that Informix is going to modify. A before-image of a page is how the page looked at the time of the last checkpoint. Only one before-image for each modified page is written to the Physical Log. If you were to modify the same page over and over and over again in between checkpoints, only the first modification would cause a before-image to be written to the Physical Log.

The before-images in the Physical Log are used by fast recovery and database restores to give an accurate snapshot of your data at the time of the last checkpoint.

Physical log size

On engine initialization the physical log is put into the rootdbs and that is where I like to keep it (provided I move the logical logs somewhere else.) You should, however, increase the size of the physical log. There is absolutely no performance penalty for having a big physical log, none. There are drawbacks to having an undersized physical log.

  • Checkpoints firing too frequently due to small physical log hitting 75% full mark often
  • Checkpoint blocking due to physical log filling during a checkpoint
  • Fast Recovery can take longer if physical log is not large enough
So make it bigger than the default of 50,000 KB. The manual recommends a physical log size of 110% of the size of your buffer pool. Pick a size based on the available disk and the workload of your environment; there really isn't a one-size-fits-all answer here. If you don't know where to start, start with 1 GB and go from there. Use onparams to change the physical log (to 1 GB in the sample below).

$ onparams -p -s 1048576


When the engine writes to the logical or physical log it doesn't write directly to disk, it puts as many of these writes as it can into one of the 3 logical log buffers or one of the 2 physical log buffers. These buffers are then written to stable storage on disk in a block which is faster than doing them one at a time.

So let's look at what can happen when you modify a row

  • Find the page to be modified, either through an index or a sequential scan
  • Read the data page containing the row to be modified from disk into memory
  • Write a before-image to the physical log buffer
  • Write a logical log record of the update to the logical log buffer
  • Modify the data page in memory
  • Flush the physical log buffer to disk
  • Flush the logical log buffer to disk
  • Flush the data page to disk

That's a lot of disk access to different parts of the disk trying to happen at about the same time. You can really improve performance by putting your physical log, logical logs and data on separate spindles.

9.4 Lock waits

In a multi-user Informix® Dynamic Server (IDS) environment where users have their isolation set higher than dirty read, and/or multiple users are performing update activity (i.e. insert, update or delete actions, rather than read-only), multiple users can all be attempting to place mutually exclusive locks on the same record.

Tracing who has which locks and why using onstat involves joining entries from onstat -k, -u and -g sql. As locks are often held for very short periods of time, the evidence can disappear before all the necessary commands can be run. The following SQL statement, run against the sysmaster database "tables", does all the joining and filtering for you.

select t.username waituser, t.sid waitsess, s.username hasuser,
       s.sid hassess, l.type locktype, l.dbsname dbname,
       l.tabname tablename, hex(l.rowidlk) rowid
  from sysmaster:syslocks l, sysmaster:syssessions s, sysmaster:syssessions t
 where s.sid = l.owner
   and l.waiter = t.sid;

9.5 Update statistics

You should run UPDATE STATISTICS regularly on your databases so that the optimizer has accurate information to work with, especially after heavy database changes.

9.6 Slow queries

If you think your IBM Informix server is processing queries more slowly than expected, and you think it's not caused by inefficient queries (the most common cause), you can use the following procedures to try to determine the causes.

Informix servers provide a number of commands, the onstat options, for reporting system status. You can generally start your analysis using seven of these commands. Examination of the data often leads to further analysis and resolution. The recommended commands are:

  • onstat -m

    Message log. The message log contains warning and error messages. The onstat -m only reports recent updates to the log file. You may want to look at earlier log entries.

  • onstat -u

    User thread status. Look for user threads waiting on a resource. You will see a wait indicator in the first position of the flags field. The address of the resource is in the wait field. The specific wait flag, the type of resource, and cross references follow:

    • B - wait on buffer - match the wait address to a buffer in onstat -b
    • C - wait on checkpoint - examine onstat -g ckp and onstat -g con
    • G - wait on write to log buffer - match the wait address to a log buffer in onstat -l
    • L - wait on lock - match the wait address to the address of a lock in onstat -k
    • S - wait on latch - match the wait address to a latch (mutex) in onstat -lmx and onstat -wmx
    • Y - wait on condition - listed in onstat -g con and do not typically reflect performance

    There are several other flags but they are rarely observed.

    The first field of the onstat -u output, address, maps to the rstcb field of the onstat -g ath output for thread identification. The sessid field is the session id (SID). You can learn more about resources allocated to a session and its activity with onstat -g ses <SID>.

    Collect onstat -u several times in rapid succession. If the waiters persist over a measurable clock time, then the chances are very good that the waits reflect a processing problem that affects performance. Some waiters are normal but they should disappear quickly. Keep in mind that lock waits are programmed.

    The last two fields of onstat -u, nreads and nwrites, can be useful indicators of thread activity.

  • onstat -p

    Server statistics. The single most important performance statistic in this output is the read cache rate (the first %cached field). Your goal for an OLTP system is to have a read cache rate above 95 percent during normal processing. Try for a rate above 99 percent. Increase the cache rate by adding buffers, which are configured using the BUFFERPOOL configuration parameter. Make sure that you have plenty of system memory (onstat -g osi) when increasing the Informix server memory allocation. Increasing the server memory footprint can indirectly slow the instance by increasing paging/swapping at the OS side. You can use a sysmaster query (see Related Information below) to help determine if you have configured too many buffers.

    Other statistics, like bufwaits (waiting for buffer), seqscans (sequential scans), and the read aheads should be considered. In the case of read aheads, the sum of ixda-RA, idx-RA, and da-RA should be close to the value of RA-pgsused as an indicator of effective read-ahead configuration.

    Many of the same statistics are viewed on a partnum level with onstat -g ppf.

    Consider collecting and retaining onstat -p outputs at regular intervals for comparison. Note that cumulative statistics are zeroed with onstat -z if you want to observe statistics over a limited time interval.

  • onstat -g rea

    Ready queue. Reports threads ready to execute at one moment in time. These threads are prepared for execution but lack cpu resource. If the number remains above the number of cpu virtual processors (onstat -g glo) through repetitions of onstat -g rea, then your system is likely limited by processing power. Look for inefficient queries and non-server processing on the machine.

    See onstat -g glo output for cpu use integrated over time and onstat -g osi for system resources.

  • onstat -g act

    Active threads. You will usually see poll threads and listen threads in this queue. Look for threads doing query-related work, like sqlexec and I/O threads. If none show up or they are rare, then work is not getting to the processors.

  • onstat -g ioq

    I/O request queues. The statistic to monitor is the maxlen column. This is the maximum length of a queue after the engine was brought online or onstat -z was executed. If the number is too large, then at some point the I/O requests were not serviced fast enough. Try executing onstat -z and checking to see how long it takes for large maxlen values to return.

    For Informix AIO, monitor the gfd (global file descriptor) queues. The maxlen should not be greater than 25. The system uses gfd 0, 1, and 2 (stdin, stdout, stderr), so the Informix chunks start with gfd 3 (chunk 1).

    If you are using KAIO, monitor the kio queues. The maxlen values should not exceed 32. There will be one kio queue for each cpu virtual processor.

    Recommendations for maxlen with DIRECT_IO enabled have not been determined, but are not expected to be larger than 32.

  • onstat -g seg

    Shared memory allocations. The shared memory configuration parameters should be tuned so that the server dynamically allocates at most two virtual segments during normal processing.
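The repeated onstat -u sampling recommended above is easy to script; this is a minimal sketch (the interval and repetition count are arbitrary choices to adapt to your environment):

```shell
# Take 5 snapshots of user thread status, 2 seconds apart,
# so that waiters persisting across samples stand out.
for i in 1 2 3 4 5; do
    date
    onstat -u
    sleep 2
done
```

Redirect the output to a file and compare the flags and wait columns between samples; a thread showing the same wait address in every snapshot deserves a closer look with onstat -g ses <SID>.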

10 High availability

It's strongly recommended that you set up Informix in HDR (High-availability Data Replication) mode.

HDR consists of a primary server and a single HDR secondary server that are tightly coupled; transactions on the primary are not committed until the log records containing the transactions are sent to the HDR secondary server. SD (shared disk) secondary servers do not maintain a copy of the physical database on their own disk space; rather, they share disks with the primary server. SD secondary servers can be quickly and easily promoted to primary if the primary goes offline. The third type of secondary server, the remote standalone (RS) secondary server, can also be used in a high-availability solution that includes IBM Informix Enterprise Replication.

A high-availability cluster consists of two types of database servers: the primary database server, which receives updates, and one or more secondary copies of the primary database server. A secondary server is a mirror image of the primary server and is in perpetual recovery mode, applying logical-log records from the primary server.

The secondary server does not participate in IBM Informix Enterprise Replication; it receives updates from the primary server. If the primary server in a high-availability cluster becomes unavailable, one of the secondary servers takes over the role of the primary server. Using Connection Manager, you can specify which secondary server should take over in the event of a failure of the primary server.

The procedure for starting HDR, using ServerA as the primary database server and ServerB as the secondary database server, is described in the following steps.

  1. (Both servers) Install user-defined types, user-defined routines, and DataBlade® modules on both database servers, and then register them on ServerA only.
  2. (ServerA) Create a level-0 backup using ontape -s -L 0.
  3. (ServerA) Use the onmode -d command to set the type of ServerA to primary and to indicate the name of the associated secondary database server (in this case ServerB).
     When you issue an onmode -d command, the database server attempts to establish a connection with the other database server in the HDR pair and to start HDR operation. The attempt succeeds only if the other database server in the pair is already set to the correct type. At this point, ServerB is not online and is not set to type secondary, so the HDR connection is not established.
  4. (ServerB) Perform a physical restore of ServerB from the level-0 backup that you created in step 2. Do not perform a logical restore. Use the ontape -p option; you cannot use ontape -r, because it performs both a physical and a logical restore.
  5. (ServerB) Use the onmode -d command to set the type of ServerB to secondary and indicate the associated primary database server.

ServerB tries to establish an HDR connection with the primary database server ( ServerA) and start operation. The connection must be successfully established.

Before HDR begins, the secondary database server performs a logical recovery using the logical-log records written to the primary database server since step 2. If all these logical-log records still are located on the primary database server disk, the primary database server sends these records directly to the secondary database server over the network and logical recovery occurs automatically.

If you have backed up and freed logical-log files on the primary database server, the records in these files are no longer on disk. The secondary database server prompts you to recover these files from tape. In this case, you must perform step 6.



If the secondary database server must read the backed-up logical-log files over the network, set the tape device parameters on the secondary database server to a device on the computer that is running the primary database server or to a device at the same location as the primary database server.

After you recover all the logical-log files on tape, the logical restore completes using the logical-log files on the primary database server disk.
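The onmode -d commands used in steps 3 and 5 look like the following; the server names ServerA and ServerB follow the example above:

```shell
# On ServerA: declare it primary, paired with secondary ServerB
onmode -d primary ServerB

# On ServerB, after the ontape -p physical restore:
onmode -d secondary ServerA
```

Once the second command runs, watch the online.log on both servers to confirm that the HDR connection is established and logical recovery starts.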

11 Backup

There are three kinds of backups: System Backups, Logical Log Backups and Critical File Backups. You should also copy (rather than move) backups to external storage.

11.1 System backup

A System Backup (Storage-space backup) is a direct copy of the Informix pages that make up your engine. This is done with the engine online and with no impact or blocking of users other than the additional I/O strain put on the system to read each page from the engine and write out each page to disk.

There are 3 levels of System Backup that can be taken:

  • A Level 0 backs up every page
  • A Level 1 backs up every page that has changed since the last Level 0 backup
  • A Level 2 backs up every page that has changed since the last Level 1 backup
If you use multiple backup levels, when it comes time to restore you would first restore the last Level 0, then apply the most recent Level 1 backup (if one exists), and finally apply the most recent Level 2 backup (if one exists) after the Level 1 backup is restored.

Using Level 1 and Level 2 backups can reduce the size of your backups taken and the I/O resources required to write backups to disk (less data to write, we still have to read all of the data), but we're here for the bare minimum so I'm just going to use Level 0 backups and Logical Log backups to keep our data safe.

Using a System Backup to restore data is called a Physical Restore.

The ontape command backup takes the following arguments:

ontape -s -L 0 -d

-s tells ontape we want a System Backup, -L 0 asks for a Level 0 (-L 1 and -L 2 for Level 1 or 2 backups) and -d says we're backing up to a directory and prevents ontape from prompting us for any input.

11.2 Logical-log backup

To keep a history of transactions and database server changes since the time of the last storage-space backup, the database server generates log records. The database server stores the log records in the logical log, a circular file that is composed of three or more logical-log files. The log is called logical because the log records represent logical operations of the database server, as opposed to physical operations.

Logical logs are used for transaction operations, which include:

  • Transaction Rollback
  • Recovery from Engine Failure (Data Restore and Fast Recovery)
  • Deferred Constraint Checking
  • Cascading Deletes
  • Distributed Transactions (Two Phase Commit)
  • Enterprise Replication
  • High Availability Data Replication

At any time, the combination of a storage-space backup plus logical-log backup contains a complete copy of your database server data.

You can see your logical logs by using:

onstat -l

11.3 Critical file backups

There are some files that you should back up each time you take a System Backup. These files will help you recreate your Informix installation in the event that the filesystem holding your Informix install and config files becomes unavailable:

  • $INFORMIXDIR/etc/oncfg_*
  • dbschema -c -ns, to give you the commands for recreating dbspaces, chunks and logical logs

Unfortunately there is no Informix utility for this; you should write a script to tar.gz these files and move them somewhere safe each time a System Backup is taken. Something like this should get you pointed in the right direction.

$ dbschema -c -ns > $INFORMIXDIR/tmp/
$ tar -czf /home/informix/backup/files/CONFIG.$(date "+%Y%m%d.%H%M").tar.gz \
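A complete version of such a script might look like the following sketch. The destination path and the extra files included (the ONCONFIG file and sqlhosts, both commonly considered critical) are assumptions to adapt to your environment:

```shell
#!/bin/sh
# Sketch: archive critical Informix files alongside each System Backup.
# DEST is an example path; adjust it to your own backup location.
STAMP=$(date "+%Y%m%d.%H%M")
DEST=/home/informix/backup/files

# Capture the commands needed to recreate dbspaces, chunks and logical logs
dbschema -c -ns > "$INFORMIXDIR/tmp/recreate_spaces.sql"

tar -czf "$DEST/CONFIG.$STAMP.tar.gz" \
    "$INFORMIXDIR/etc/oncfg_"* \
    "$INFORMIXDIR/etc/$ONCONFIG" \
    "$INFORMIXDIR/etc/sqlhosts" \
    "$INFORMIXDIR/tmp/recreate_spaces.sql"
```

Run it from the same scheduler entry that launches your ontape backup, so the configuration snapshot always matches the System Backup it accompanies.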

11.4 Backup to external storage

For a truly safe backup, we strongly recommend:

  • Have two database servers in HDR mode, sharing no disks or infrastructure.
  • Database server storage and backup storage must be on different disks or systems.
  • Backup storage should be sized to retain at least one week of backups, or more.
  • All backup images (level 0), once completed, should be copied to an external location.
  • You should have a way to restore from the external location in a test environment if needed.

Following this rule you will have:

  • A primary database
  • A secondary database
  • A backup on local storage of the primary database (isolated from the database data disks)
  • That backup, optionally copied to the secondary
  • That backup, copied to an external storage

As a remark, do not keep backup data on database disks. As a mandatory rule, backups must always be stored on disks not related to the database systems.

And to be completely sure, backups must be restored periodically (automatically) on a test system.

12 Failure procedures

12.1 Stop

In exceptional cases, the server can hang and no further operations can be executed. In this case, check whether the server is still running and try to stop it.

$ onmode -ky

12.2 Restart

After a server failure, restart in single user mode (-s) and verbose (-v) so you can see startup logs.

$ oninit -sv

You can perform checks now, and if everything is ok, you can go into online mode.

$ onmode -m

12.2.1 Data integrity check

IBM Informix Dynamic Server (IDS) includes the oncheck utility that is used to check for corruption within the database instance, and in some cases fix it.

oncheck -cr
    Checks the consistency of the reserved pages. The reserved pages track control and configuration information for the entire instance; for example, the currently used onconfig parameters are stored here.

oncheck -cc
    Checks the consistency of the system catalog tables, of which there is a set for every database within the instance. These tables store critical data about the database, such as the database schemas.
    This command usually takes a few minutes to execute; the precise amount of time is proportional to the number of databases within the instance. It has little or no performance impact when run against a production system.

oncheck -ce
    Checks the consistency of the extents within the database server. An extent is a contiguous area of disk that stores data, index data or blob data for a specific table. The oncheck utility ensures that all of the extents and the free space within the database server match their associated control information.
    This command usually takes a few minutes to execute; the exact amount of time is proportional to the amount of disk space allocated to the database instance. It has little or no performance impact when run on a production system.

oncheck -cDI database_name [:table_name - optional]
    Checks the consistency of the data and index pages within the specified database.
    This command can take a substantial amount of time to complete, as oncheck analyses every page within the database, and it will typically have a considerable impact on database performance. Consider bringing the server into quiescent mode before running this command, in order to ensure that the utility has exclusive access to the database.

If errors are reported whilst running these commands, please open a PMR (Problem Management Report) to contact your Informix support.

12.2.2 If restart fails

If the primary database restart fails, you should restore it from backup or promote the secondary HDR system to become the new primary database server.

Carefully follow the procedure explained below to prevent introducing more damage to the system.

Before promoting the secondary to primary, make sure your system will not end up split-brain: do not restart the old primary after the secondary has been promoted to primary.

Promoting secondary to main system

When your Primary fails you can quickly make the Secondary server a Standalone (i.e. no HDR) server.

Make the Secondary a Standalone server with the onmode -d standard command

$ onmode -d standard
$ onstat -m

When the old Primary is fixed and ready to be brought back online you have two options:

Option 1 is to reinitialize HDR just like we did when setting up HDR for the first time, except that now the secondary server will be the Primary Informix and the primary server will be the Secondary Informix. This option doesn't require any downtime.

Option 2 is to make the primary server the Primary Informix again (easier to do if the logs have not rolled over on the secondary server, now a Standard Informix). This requires some downtime and assumes that the disks on the primary server were not the reason it went down and all of the data is still intact.

Switch secondary server to Quiescent Mode

$ onmode -s

Change the HDR status of secondary server to Secondary again

$ onmode -d secondary [primary informixserver]

Start Informix on the Primary

$ oninit

If the logical logs have rolled over on the Secondary (while it was Standalone), you will need to do what we did before: move the logical log backups you need from the secondary to the primary server, rename them, and run ontape -l -d.

If everything works as advertised the Secondary will ship over the logs the Primary needs, they will be applied to the Primary and HDR will be restored.

12.2.3 Secondary failure

If the Secondary fails and the logical log that was current at the time of the failure has not been reused (they're circular, remember) on the Primary then you can simply restart the Secondary and it will automatically resync.

Network failure can cause HDR to drop out. Also, HDR is a bit sensitive over VPNs, and has been known to reset the odd firewall parameter, blocking its own traffic.

Assuming the logical logs on the primary haven't wrapped round, i.e. the last one rolled forward by the secondary has not yet been overwritten, there's a good chance HDR will resume, even if HDR is disconnected.

First, try to restart HDR by executing on the secondary: onmode -d secondary [primary informixserver]

If this doesn't work, shut down the secondary and restart it in quiescent mode, or switch the primary back to standard and then to primary again. You may have to try a combination of these.

If your Secondary has been down for a while and the logical logs have rolled over, there are two ways to recover: the easy way and the hard way.

The easy way is to reinitialize HDR by restoring the Primary to the Secondary again and running onmode -d secondary [primary informixserver] on the Secondary.

The hard way is to restart the Secondary and when you see this message in the online.log

15:03:21  DR: Start failure recovery from tape ...

You can perform a Logical Restore to the Secondary using the logical log backups from the Primary. If you're backing up to a directory, copy the necessary logical log backups from the Primary to the Secondary, rename each backup to include the Secondary server name and use ontape -l -d to perform a Logical Restore.

$ scp blogsvr01:/home/informix/backup/llog/* .
$ ontape -l -d

12.3 Recovery

12.3.1 Restore a table or portion from backup

The archecker utility is useful where portions of a database, a table, a portion of a table, or a set of tables need to be recovered. It is also useful in situations where tables need to be moved across server versions or platforms.

The archecker utility can also be used as a method of copying data. For example, you can move a table from the production system to another system.

The archecker utility is more efficient than other mechanisms for copying data. Because archecker extracts data as text, it can copy data between platforms or server versions.

The archecker utility uses a configuration file to set certain parameters. Set the AC_CONFIG environment variable to the full path name of the archecker configuration file. By default, the AC_CONFIG environment variable is set to $INFORMIXDIR/etc/ac_config.std.

The archecker utility uses a schema command file to specify the following:

  • Source tables
  • Destination tables
  • Table schemas
  • Databases

archecker can also be used to check that an ontape backup is working properly, by verifying restore capability without actually restoring anything.

$ archecker  -tvds

-t Read the tape directly

-v Verbose mode

-d Delete old temporary files used in previous archecker executions

-s Print status message to the screen

Copy the standard file

cp  $INFORMIXDIR/etc/ac_config.std $INFORMIXDIR/etc/ac_config.`date +\%Y%m%d`

And export the variable to the new value:

export AC_CONFIG=$INFORMIXDIR/etc/ac_config.`date +\%Y%m%d`

Multiple connections are required; the easiest way is to set the Informix server alias to one that connects via sockets:

export INFORMIXSERVER=ol_instance_name

Edit the new ac_config file and adjust the parameters AC_TAPEDEV, AC_TAPEBLOCK, AC_LTAPEDEV, AC_LTAPEBLOCK and AC_RESTORE_FILTER (set AC_RESTORE_FILTER to the decompressor matching the compressor with which the backup was made).

AC_MSGPATH      /tmp/ac_msg.log         # archecker message log
AC_STORAGE      /tmp                    # Directory used for temp storage
AC_VERBOSE      1                       # 1 verbose messages 0 terse messages

# Adjust these parameters
AC_TAPEDEV		 	/path/of/tape/backup
AC_LTAPEDEV		 	/path/of/tape/logbackup
AC_RESTORE_FILTER	/usr/bin/gzip -d

There is also another way to use the ac_config file: put the exact path (including the ontape file name) in the AC_TAPEDEV variable and remove the others (AC_TAPEBLOCK, AC_LTAPEDEV, AC_LTAPEBLOCK, AC_RESTORE_FILTER), as below.

AC_MSGPATH      /tmp/ac_msg.log         # archecker message log
AC_STORAGE      /tmp                    # Directory used for temp storage
AC_VERBOSE      1                       # 1 verbose messages 0 terse messages

# Adjust these parameters
AC_TAPEDEV		 	/path/of/tape/backup

Set the environment variable IFX_ONTAPE_FILE_PREFIX so that archecker can find the correct ontape files. For example, if you set IFX_ONTAPE_FILE_PREFIX to "My_Instance", then during an archive the files are named My_Instance_L0, My_Instance_L1, My_Instance_L2, and My_Instance_Log0000000001, My_Instance_Log0000000002, and so on. During a restore, ontape searches for files in the TAPEDEV directory with names like My_Instance_L0, and in the LTAPEDEV directory with names like My_Instance_Log0000000001.

export IFX_ONTAPE_FILE_PREFIX=My_Instance_0

As an example of the power of archecker to restore tables from backup, here is a sample schema command file that restores the table customer in database stores7 to the tables customer_california and customer_nj in database demo2.

database stores7;
-- SOURCE Table
create table customer (
  customer_num serial,
  phone char(18)
 ) in rootdbs ;

database demo2;
-- CA TARGET Table
create table customer_california(
  customer_num serial,
  phone char(18)
) in rootdbs ;

-- NJ TARGET Table
create table customer_nj (
  customer_num serial,
  phone char(18)
) in rootdbs ;

insert into demo2:customer_california
  select * from stores7:customer
    where state = "CA";
insert into demo2:customer_nj
  select * from stores7:customer
    where state = "NJ";

restore to current with no log;

The following command can be run to perform the restore:

archecker -t -X -f restore_table.scm -d -v -s

After the restore, clean up the temporary staging files by running the following command:

archecker -DX -s

12.3.2 Full restore

A full restore from backup is very easy if your backup storage configuration points to a directory. In this case, Informix automatically names the backup files for physical ontape backups as well as for logical log backups.

Make sure TAPEDEV and LTAPEDEV point to a filesystem directory in your $ONCONFIG configuration file:

$ onstat -c | grep TAPEDEV

In these directories you will find the server backup files:

$ ls /data/ifx/INFX-ONTAPES
infor1_0_20170526_133001_L0  infor1_0_20170529_100001_L1
infor1_0_20170526_170001_L1  infor1_0_20170529_120001_L2
infor1_0_20170526_190001_L2  infor1_0_20170529_133001_L0

Files are named hostname_SERVERNUM_BACKUPDATE_BACKUPTIME_BACKUPLEVEL. If you want to change the beginning of the file names (composed of hostname_SERVERNUM), set the IFX_ONTAPE_FILE_PREFIX environment variable to whatever you want before executing backup or restore procedures.

$ export IFX_ONTAPE_FILE_PREFIX=myserver_0
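
The naming convention above can be illustrated with a short Python sketch. The parse_backup_name helper and its regular expression are our own illustration of the hostname_SERVERNUM_BACKUPDATE_BACKUPTIME_BACKUPLEVEL pattern, not part of any Informix tool:

```python
import re
from datetime import datetime

# Hypothetical helper that splits a directory-backup file name of the
# form hostname_SERVERNUM_BACKUPDATE_BACKUPTIME_BACKUPLEVEL.
PATTERN = re.compile(
    r"^(?P<host>.+)_(?P<servernum>\d+)_(?P<date>\d{8})_(?P<time>\d{6})_L(?P<level>[012])$"
)

def parse_backup_name(name):
    m = PATTERN.match(name)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "servernum": int(m.group("servernum")),
        "taken": datetime.strptime(m.group("date") + m.group("time"), "%Y%m%d%H%M%S"),
        "level": int(m.group("level")),
    }

print(parse_backup_name("infor1_0_20170526_133001_L0"))
```

With the listing above, infor1_0_20170526_133001_L0 parses to host infor1, server number 0, taken on 2017-05-26 at 13:30:01, backup level 0.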

If everything is set properly, restoring the full system is as easy as executing the command:

$ ontape -r

This command searches for the last physical and logical backups available and restores them, bringing the system up to the latest available restore point.

12.3.3 Recover logical logs

A system that has been restored physically, or that is in a status waiting for a logical restore, can be logically restored and brought online again by executing the logical restore command:

$ ontape -l -d

This command searches for the required logical log backups in the LTAPEDEV directory and restores them automatically.

12.3.4 Cloning a primary server

Informix supports cloning a primary server to another server "on the fly".

To do this, you need to make some configuration changes on the source server:

$ onmode -wm CDR_AUTO_DISCOVER=1
$ onmode -wm REMOTE_SERVER_CFG=authfile.server_1

On the target server, log in as user informix and use the touch, chown, and chmod commands to create, change the owner, and change the permissions for the chunks. The chunk paths must match the paths of the chunks on the source server. You can use the --createchunkfile option (-k) to automatically create cooked chunks on the target server.

Run the ifxclone utility on the target server as user informix:

ifxclone -T -S ol_server_1 -I -P 123 -t ol_server_2 
             -i -p 456 -a -k

The ifxclone utility modifies the sqlhosts file on the source server and creates a copy of the file on the new target server:

#dbservername  nettype   hostname           servicename  options
 server_1      onsoctcp  123
 server_2      onsoctcp  456

The ifxclone utility also propagates the trusted-host file on the source server to the target server.

12.4 ISAM error: illegal key descriptor (too many parts or too long)

When executing an operation, the server may return an error like:

ISAM error: illegal key descriptor (too many parts or too long)

Run the oncheck command to determine the cause:

oncheck -cc database_name

The following example shows problems with the table wic_user_soaplogs_hist:

$ oncheck -cc wic_conf
Validating database wic_conf

    Validating systables for database wic_conf
ISAM error:  illegal key descriptor (too many parts or too long).

    TBLspace name                  wic_user_soaplogs_
    Owner                          informix
    TBLspace number                b000ee
    Tabid                          271
    Row size                       531
    Number of columns              14
    Number of indexes              7
    Number of rows                 0
    Date created                   21-07-2019
    TBLspace major version number  278
    TBLspace minor version number  963
    TBLspace type                  T
    Locking level                  R
    Number of pages used           0
    First extent size              16
    Next extent size               16
    Flags                          0
    Database name
ERROR:	Database wic_conf  sysfragments has 425 records,
	but systables and sysindices refer to 424.

    Validating syscolumns for database wic_conf

    Validating sysindices for database wic_conf
ISAM error:  illegal key descriptor (too many parts or too long).

The solution could be to dbexport/drop/dbimport the whole database, or to recreate the table.

13 Automatic alarms

To take an even more active approach to monitoring, you can modify the alarm program provided by Informix during IDS installation. The alarm program is actually a UNIX shell script that is automatically called by IDS when certain events occur. IDS classifies all instance events into five severity levels, one being the lowest and five the highest. You can set up the alarm program so that it e-mails the DBA or sends a message to his or her pager when instance assertion failures occur. For detailed information on how to modify the alarm program, and for sample programs, refer to the IBM Informix Administrator's Guide.

In addition, you should perform some necessary maintenance routines to ensure that your databases stay healthy. Some basic routines are:

  • Check that data replication is working
  • Check that backups are being performed
  • Verify that dbspaces have room to grow
  • Verify and repair data and index integrity
  • Update internal statistics for the query optimizer
  • Recycle unused shared memory segments

To configure alarm program notifications, check which shell script is used as the alarm program:

$ onstat -c | grep ALARMPROGRAM

Edit this script and configure its parameters to receive e-mails based on event status, and also to allow the script to perform automatic logical log backups:

BACKUP_CMD="$INFORMIXDIR/bin/ontape -a -d"

ALARMADMIN sets the minimum severity level of events that are taken into account for sending a notification by e-mail. The severity level mapping is:

  1. Not noteworthy
  2. Information
  3. Attention
  4. Emergency
  5. Fatal
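
The ALARMADMIN threshold behaviour can be sketched as a simple comparison. The alarm program itself is a shell script; the Python below is only an illustration, and the level names in the dictionary follow the standard IDS event severities:

```python
# Illustration of the ALARMADMIN threshold logic: an event triggers a
# notification only when its severity reaches the configured minimum.
SEVERITY_NAMES = {1: "not noteworthy", 2: "information", 3: "attention",
                  4: "emergency", 5: "fatal"}

def should_notify(event_severity, alarmadmin):
    """Return True when the event is severe enough to e-mail about."""
    return event_severity >= alarmadmin

for sev in sorted(SEVERITY_NAMES):
    print(sev, SEVERITY_NAMES[sev], should_notify(sev, 3))
```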


14 Open files

Applications or servers can sometimes fail with an error indicating that there are too many open files for the current process. Most of the time the problem is due to a limit configured too small for the current needs. Sometimes it might also be that the process is 'leaking' file descriptors: the process opens files but does not close them, leading to exhaustion of the available file descriptors.

When the "Too Many Open Files" error message is written to the logs, it indicates that all available file handles for the process, including sockets, have been used. In the majority of cases, this is the result of file handles being leaked by some part of the application.

You can try to identify the source of the problem:

  1. Check the current limits.
  2. Check the limits of a running process.
  3. Track a possible file descriptor leak.
  4. Track open files in real time.
  5. Determine whether the number of open files grows.

14.1 Check the current limits.

The ulimit -a command will print the list of current limitations for the current session.

ulimit -a
address space limit (kbytes) (-M) unlimited
core file size (blocks) (-c) 0
cpu time (seconds) (-t) unlimited
data size (kbytes) (-d) unlimited
file size (blocks) (-f) unlimited
locks (-x) unlimited
locked address space (kbytes) (-l) 32
message queue size (kbytes) (-q) 800
nice (-e) 0
nofile (-n) 65536
nproc (-u) 192837
pipe buffer size (bytes) (-p) 4096
max memory size (kbytes) (-m) unlimited
rtprio (-r) 0
socket buffer size (bytes) (-b) 4096
sigpend (-i) 192837
stack size (kbytes) (-s) 8192
swap size (kbytes) (-w) not supported
threads (-T) not supported
process size (kbytes) (-v) unlimited

14.2 Check the limits of a running process.

cat /proc/<pid>/limits

The file descriptor limit is shown as "Max open files".
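
From inside a process, the same limit can be read through the standard POSIX getrlimit interface; a minimal Python sketch (Unix-only):

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors:
# the same value `ulimit -n` reports and /proc/<pid>/limits shows
# as "Max open files".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft)
print("hard limit:", hard)
```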

14.3 Tracking a possible file descriptor leak.

By checking regularly, you will see the number grow continuously in case of a leak. Keep in mind that a growing number of file descriptors does not ALWAYS indicate a leak; it might simply be that the process needs to open a lot of files.

ls /proc/<pid>/fd
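
On Linux you can also count those entries programmatically. A small sketch; the temporary file is only there to simulate the process opening a descriptor:

```python
import os
import tempfile

def open_fd_count(pid="self"):
    # Each entry under /proc/<pid>/fd is one open file descriptor of
    # the process (Linux-specific; use lsof on other platforms).
    return len(os.listdir("/proc/%s/fd" % pid))

before = open_fd_count()
tmp = tempfile.NamedTemporaryFile()   # the process opens one more file
after = open_fd_count()
tmp.close()
print(before, after)
```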

14.4 Tracking open files in real time.

This is a bit more advanced than the previous solutions but will most likely provide the most interesting results. Tracking the usage of file descriptors in real time means that you have to monitor both the open() and close() system calls. To be more accurate, you can use the same method to also track system calls like dup() and others like socket() that create a new file descriptor.

  • You can use a debugger like dbx (AIX, Solaris) or gdb (Linux).
  • You can also use system tools like probevue (AIX), dtrace (Solaris) or systemtap (Linux).
  • You can use system traces if available.

The preferred choice would be the system tools, as the ones mentioned above actually execute within the kernel, avoiding the long delays caused by debuggers.

14.5 Determine if the number of open files grows.

To determine whether the number of open files is growing over a period of time, run lsof to report the open files for a PID on a periodic basis. For example:

lsof -p [PID] -r [interval in seconds, 1800 for 30 minutes] > lsof.out

Alternatively, you can list the contents of the file descriptors as a list of symbolic links in the following directory, where you replace PID with the process ID. This is especially useful if you do not have access to the lsof command:

ls -al /proc/PID/fd
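
The same periodic check can be scripted; a sketch for Linux, sampling our own PID here purely for the demonstration:

```python
import os
import time

def sample_fd_counts(pid, samples=3, interval=0.2):
    # Snapshot the open-descriptor count of a process several times,
    # like running `ls /proc/PID/fd | wc -l` on a periodic basis.
    counts = []
    for _ in range(samples):
        counts.append(len(os.listdir("/proc/%d/fd" % pid)))
        time.sleep(interval)
    return counts

counts = sample_fd_counts(os.getpid())
growing = all(b > a for a, b in zip(counts, counts[1:]))
print(counts, "leak suspected" if growing else "stable")
```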

14.6 Descriptors by threads

Each thread in Informix can open multiple descriptors. The command

$ onstat -g opn

IBM Informix Dynamic Server Version 12.10.FC12 -- On-Line -- Up 36 days 01:42:28 -- 4532860 Kbytes
rstcb 0x57279278 tid 28
isfd  op_mode    op_flags   partnum    ucnt ocnt lk ra   partp          ocnt ucnt
0     0x70000    0x403      0xd01ad6   1    0       0    0x5912f2b8     0    1
1     0x70000    0x403      0xd01d0c   1    0       0    0x66f6c830     0    1
2     0x70000    0x403      0xf007a0   1    0       0    0x58e22028     4    5   

displays the descriptors opened by each thread. Each rstcb line marks the start of a thread, and each following line is a file descriptor. The thread ID (tid) corresponds to the user thread.

It is possible to check whether any thread has more than N descriptors open, for example 30000, using:

onstat -g opn  | egrep "^30000"

Save the complete list in a file:

onstat -g opn > /tmp/onstat_opn.txt

and find the thread that has more than N descriptors open. From its rstcb value you can obtain the corresponding user thread.

rstcb 0x57279278

You can find the session by searching for the rstcb address:

onstat -u | grep 57279278
57279278         Y--P--- 47958    axitask  -        da903b50         0    3     5378279  49560

And the command

onstat -g sql 47958

displays the session information.
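
A saved capture of onstat -g opn (for example the /tmp/onstat_opn.txt file above) can also be tallied per thread with a short script. A sketch that hard-codes the sample output shown earlier; point it at a real capture instead:

```python
import re

SAMPLE = """\
rstcb 0x57279278 tid 28
isfd  op_mode    op_flags   partnum    ucnt ocnt lk ra   partp          ocnt ucnt
0     0x70000    0x403      0xd01ad6   1    0       0    0x5912f2b8     0    1
1     0x70000    0x403      0xd01d0c   1    0       0    0x66f6c830     0    1
2     0x70000    0x403      0xf007a0   1    0       0    0x58e22028     4    5
"""

def descriptors_per_thread(text):
    # Each "rstcb ... tid N" line starts a thread section; every
    # following line that starts with a number is one open descriptor.
    counts, current = {}, None
    for line in text.splitlines():
        m = re.match(r"rstcb\s+(\S+)\s+tid\s+(\d+)", line)
        if m:
            current = (m.group(1), int(m.group(2)))
            counts[current] = 0
        elif current is not None and re.match(r"\d+\s", line):
            counts[current] += 1
    return counts

print(descriptors_per_thread(SAMPLE))  # {('0x57279278', 28): 3}
```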

15 Reorganize clob/blob columns into a different smartblob space

If you need to move a clob/blob column to a different smartblobspace, you can consider the following queries to generate the corresponding ALTER TABLE statements. Note that executing these statements does not move the current data; it only takes effect for data inserted into the table in the future.

Blobs that can be nullable

select 'ALTER TABLE ' || tabname || ' MODIFY (' || colname || ' BLOB), PUT ' || colname || ' IN (s_sbspa) (LOG, NO KEEP ACCESS TIME)'
  from syscolumns, systables where syscolumns.tabid = systables.tabid
   and coltype = 41 and extended_id = 10 and systables.tabid > 99;

Blobs that are required

select 'ALTER TABLE ' || tabname || ' MODIFY (' || colname || ' BLOB NOT NULL), PUT ' || colname || ' IN (s_sbspa) (LOG, NO KEEP ACCESS TIME)'
  from syscolumns, systables where syscolumns.tabid = systables.tabid
   and coltype = 297 and extended_id = 10 and systables.tabid > 99;

Clobs that can be nullable

select 'ALTER TABLE ' || tabname || ' MODIFY (' || colname || ' CLOB), PUT ' || colname || ' IN (s_sbspa) (LOG, NO KEEP ACCESS TIME)'
  from syscolumns, systables where syscolumns.tabid = systables.tabid
   and coltype = 41 and extended_id = 11 and systables.tabid > 99;

Clobs that are required

select 'ALTER TABLE ' || tabname || ' MODIFY (' || colname || ' CLOB NOT NULL), PUT ' || colname || ' IN (s_sbspa) (LOG, NO KEEP ACCESS TIME)'
  from syscolumns, systables where syscolumns.tabid = systables.tabid
   and coltype = 297 and extended_id = 11 and systables.tabid > 99;

16 In which smartblobspace are blobs stored?

Table columns of type BLOB or CLOB store their data in smartblob spaces and save a pointer to that data in the column itself. There is no "backpointer" from the data stored in the smartblob spaces giving information about the source table and column.

So you can find out where a specific blob/clob is stored, but you cannot determine the source of the data from the smartblob chunks.

To find out which smartblob space stores the column data of a table that contains a blob column, execute the following query:

select cast(colname as lvarchar),
("0x" || substr(colname::lvarchar,23,2)
        || substr(colname::lvarchar,21,2)
        || substr(colname::lvarchar,19,2)
        || substr(colname::lvarchar,17,2))::int sbnum,
("0x" || substr(colname::lvarchar,31,2)
        || substr(colname::lvarchar,29,2)
        || substr(colname::lvarchar,27,2)
        || substr(colname::lvarchar,25,2))::int chknum,
("0x" || substr(colname::lvarchar,39,2)
        || substr(colname::lvarchar,37,2)
        || substr(colname::lvarchar,35,2)
        || substr(colname::lvarchar,33,2))::int loid,
("0x" || substr(colname::lvarchar,47,2)
        || substr(colname::lvarchar,45,2)
        || substr(colname::lvarchar,43,2)
        || substr(colname::lvarchar,41,2))::int offset
 from tabname

Replace tabname and colname with the appropriate names.
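
The byte-swapping the query performs (reading each 4-byte field of the hex dump in little-endian order) can be sketched in Python. The handle below is fabricated for illustration, and the field layout is taken from the query above, not from a published format specification:

```python
import struct

def decode_lo_handle(hexstr):
    # Skip the first 8 bytes (16 hex chars), matching substr(...,17,...)
    # in the query, then read four little-endian 32-bit integers:
    # sbspace number, chunk number, large-object id and offset.
    raw = bytes.fromhex(hexstr[16:48])
    sbnum, chknum, loid, offset = struct.unpack("<iiii", raw)
    return {"sbnum": sbnum, "chknum": chknum, "loid": loid, "offset": offset}

# Fabricated handle: 8 header bytes, then sbnum=2, chknum=3, loid=257, offset=0
h = "00" * 8 + "02000000" + "03000000" + "01010000" + "00000000"
print(decode_lo_handle(h))  # {'sbnum': 2, 'chknum': 3, 'loid': 257, 'offset': 0}
```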

16.1 Full scan to get storage smartblob spaces of all blob/clob columns

Below we provide a JavaScript script that shows the smartblob origin by fetching all blob/clob columns and scanning all rows of each table to find where they are stored.

var rs_cols =Ax.db.executeQuery(`select unique systables.tabname, syscolumns.colname
                                       from syscolumns, systables
                                      where syscolumns.tabid = systables.tabid AND systables.tabtype='T' AND coltype in (41, 297) and extended_id in (10,11)`);
for (var col_sb of rs_cols) {
    var rs_sbs = Ax.db.executeQuery(`SELECT ("0x" || substr(${col_sb['colname']}::lvarchar,23,2)
                                            || substr(${col_sb['colname']}::lvarchar,21,2)
                                            || substr(${col_sb['colname']}::lvarchar,19,2)
                                            || substr(${col_sb['colname']}::lvarchar,17,2))::INT sbnum,
                                            COUNT(*) cnt,
                                            MAX(rowid) rid
                                        FROM ${col_sb['tabname']} GROUP BY 1`);
    var m_sbstr = '';
    for (var r_sbs of rs_sbs) {
        if (r_sbs['sbnum'] != null) {
            var dbsname = Ax.db.executeGet(`select name from sysmaster:sysdbspaces where dbsnum = ?`, r_sbs['sbnum']);
            m_sbstr += dbsname + ' (' + r_sbs['cnt'] + ',' + r_sbs['rid'] + ') ';
        }
    }
    if (m_sbstr != '') {
        console.log(`Table: ${col_sb['tabname']} Column: ${col_sb['colname']} \tSB Spaces: ${m_sbstr}`);
    }
}

This script performs a full scan of all rows of each table containing blob/clob columns, so it can take a long time if the tables contain a huge number of rows.

16.2 Cleaning up stray smart large objects in sbspaces

During normal operation, no unreferenced (stray) smart large objects should exist. When you delete a smart large object, the space is released. If the database server fails or runs out of system memory while you are deleting a smart large object, the smart large object might remain as a stray object.

The following is an example of the onspaces -cl command:

onspaces -cl myspace