While there is apparently no official or standard definition for the term Very Large Database (VLDB), it is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows.

Handling such large tables requires proper setup of database storage and raises many complex issues. The following sections explore how to deal with large tables and their indexes using a TPCH database example.

The TPC Benchmark™ H (TPC-H) is a decision support benchmark. We will use the TPCH utilities to generate the data for our VLDB tests. The focus of this document is learning how to load and index large data volumes, not running the TPC-H benchmark itself.

1 TPCH sample

To run the following examples we will use the DBGEN utility. DBGEN is a database population program for use with the TPC-H benchmark.

We will use the TPCH database example, focusing on its largest table, lineitem (its size is about 70% of the total), generated at different scales.

So we can use the -T modifier to generate only the lineitem table and skip all other TPCH database tables.

/home/dbgen/dbgen -s 10 -f -T L
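For larger scale factors, dbgen can also split the generation into chunks that run in parallel (a sketch, assuming your dbgen build supports the -C and -S options):

# generate lineitem for scale 100 in 4 chunks (lineitem.tbl.1 ... lineitem.tbl.4)
for i in 1 2 3 4; do
    /home/dbgen/dbgen -s 100 -f -T L -C 4 -S $i &
done
wait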

The following table shows the tests that will be used for our sizing tutorial.

TPCH   lineitem.tbl  Rows   16K pages   16K pages  dbgen     Load time      Load time  Index time  Index time
Scale                       (table)     (index)    time      @100K rows/s   (1)        PDQ=0 (2)   PDQ=100 (3)
10     7 GB          60M    488,046     69,451     00:02:35  10 min         00:09:47   00:02:09    00:00:36
100    74 GB         600M   4,881,838   694,622    00:24:28  1.6 hours      01:39:15   00:43:40    00:07:25
200    160 GB        1.2B   9,763,204   --         00:51:10  3.3 hours      03:05:38   01:42:09    00:15:09
1000   --- GB        6B     --          --         ...       16 hours       ?          ? hours     ? hours
2000   --- GB        12B    --          --         ...       32 hours       ?          ? hours     ? hours

  1. Load time: no transactions, no HDR, 1 SSD (rootdbs + logs + data), 1 SSD for the index
  2. Time to create the primary key index with PDQ PRIORITY = 0
  3. Time to create the primary key index with PDQ PRIORITY = 100

As you can see, even at a theoretical load throughput of 100K rows per second, the load time is significant.
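For example, at scale 10 the theoretical minimum load time is: $$ t_{load} = \frac{60{,}000{,}000 \text{ rows}}{100{,}000 \text{ rows/s}} = 600 \text{ s} = 10 \text{ min} $$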

The tests were run on a Dell 730 virtualized machine with 12GB RAM, 16 CPUs (32 cores) Intel Xeon E312xx 2.7 GHz, and 1TB SSD disks. The Informix version was 12.10.FC9, configured with 8GB of RAM in total and 1GB for DS_TOTAL_MEMORY.

The SQL operations to test are:

  • Create data table and load it.
    create raw table lineitem (
        l_orderkey integer not null ,
        l_partkey integer not null ,
        l_suppkey integer not null ,
        l_linenumber integer not null ,
        l_quantity decimal(15,2) not null ,
        l_extendedprice decimal(15,2) not null ,
        l_discount decimal(15,2) not null ,
        l_tax decimal(15,2) not null ,
        l_returnflag char(1) not null ,
        l_linestatus char(1) not null ,
        l_shipdate date not null ,
        l_commitdate date not null ,
        l_receiptdate date not null ,
        l_shipinstruct char(25) not null ,
        l_shipmode char(10) not null ,
        l_comment varchar(44) not null
    )
    
    in d_data
    extent size 160000000 next size 8000000 lock mode row ;
    
    create external table lineitem_ext sameas lineitem using ( datafiles("disk:/tpch/lineitem.tbl"));
    insert into lineitem select * from lineitem_ext;
    alter table lineitem type (standard);
  • Create primary key index with different options.
    CREATE UNIQUE INDEX lineitem_pk ON lineitem (l_orderkey, l_linenumber) in i_data;
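The two index timings in the table above differ only in the session PDQ priority, which must be set before the build starts. A minimal sketch:

SET PDQPRIORITY 100;
CREATE UNIQUE INDEX lineitem_pk ON lineitem (l_orderkey, l_linenumber) in i_data;
SET PDQPRIORITY 0;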

2 Database storage

We assume you are familiar with Informix database administration, but let's review a few concepts that we will need.

The database server uses physical units of storage to allocate disk space. It stores data in logical units. Unlike the logical units of storage whose size fluctuates, each of the physical units has a fixed or assigned size that is determined by the disk architecture.

2.1 Physical units

The database server uses the following physical units to manage disk space:

Unit Description
Chunks A chunk is the largest unit of physical disk dedicated to database server data storage. Chunks provide administrators with a significantly large unit for allocating disk space. The maximum size of an individual chunk is 4 TB. The number of allowable chunks is 32,766.
Pages

A page is the physical unit of disk storage that the database server uses to read from and write to disk. On most UNIX platforms, the default page size is 2 KB.

A chunk contains a certain number of pages. A page is always entirely contained within a chunk; that is, a page cannot cross chunk boundaries.

Blobpages

A blobpage is the unit of disk-space allocation that the database server uses to store simple large objects ( TEXT or BYTE data) within a blobspace.

You specify blobpage size as a multiple of the database server page size. Because the database server allocates blobpages as contiguous spaces, it is more efficient to store simple large objects in blobpages that are as close to the size of the data as possible.

Conceptually, a blobpage is sized as a multiple (for example, three) of a data page.

Just as with pages in a chunk, a certain number of blobpages compose a chunk in a blobspace. A blobpage is always entirely contained in a chunk and cannot cross chunk boundaries.

Instead of storing simple-large-object data in a blobspace, you can choose to store it in a dbspace. However, for a simple large object larger than two pages, performance improves when you store it in a blobspace. Simple large objects stored in a dbspace can share a page, but simple large objects stored in a blobspace do not share pages.

Sbpages An sbpage is the type of page that the database server uses to store smart large objects within an sbspace. Unlike blobpages, sbpages are not configurable. An sbpage is the same size as the database server page, which is usually 2 KB on UNIX.
Extents

An extent consists of a collection of contiguous pages that store data for a given table.

When you create a table, the database server allocates a fixed amount of space to contain the data to be stored in that table. When this space fills, the database server must allocate space for additional storage. The physical unit of storage that the database server uses to allocate both the initial and subsequent storage space is called an extent.

Every permanent database table has two extent sizes associated with it. The initial-extent size is the number of KB allocated to the table when it is first created. The next-extent size is the number of KB allocated to the table when the initial extent (and any subsequent extents) becomes full. For permanent tables and user-defined temporary tables, the next-extent size begins to double after 16 extents have been added. For system-created temporary tables, the next-extent size begins to double after 4 extents have been added.

When you create a table, you can specify the size of the initial extent, and the size of the extents to be added as the table grows. You can also modify the size of an extent in a table in a dbspace, and you can modify the size of new subsequent extents. To specify the initial-extent size and next-extent size, use the CREATE TABLE and ALTER TABLE statements.
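For example (a sketch; the table name and sizes are illustrative, both values in KB):

-- 160 MB first extent, 16 MB for each additional extent
CREATE TABLE t1 (c1 INT) IN d_data EXTENT SIZE 160000 NEXT SIZE 16000;
-- change the size of extents added from now on
ALTER TABLE t1 MODIFY NEXT SIZE 32000;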

For example, the database server might allocate six contiguous pages for an extent. Note also that:

  • An extent is always entirely contained in a chunk; an extent cannot cross chunk boundaries.
  • If the database server cannot find the contiguous disk space that is specified for the next-extent size, it searches the next chunk in the dbspace for contiguous space.

2.2 Logical units

The database server stores data in the following logical units:

Unit Description
Dbspaces

A dbspace is a logical unit that can contain between 1 and 32,766 chunks. The database server uses the dbspace to store databases and tables. Place databases, tables, logical-log files, and the physical log in dbspaces.

When you create a standard or temporary dbspace, you can specify the page size for the dbspace. You cannot specify a page size for blobspaces, sbspaces, or external spaces. If you do not specify a page size, the page size of the root dbspace is used as the default.

When you create a standard dbspace, you can specify the first and next extent sizes for the tblspace in the dbspace. Specifying the extent sizes reduces the number of tblspace extents and reduces the frequency of situations when you must place the tblspace extents in non-primary chunks.

Temporary dbspaces

A temporary dbspace is a dbspace reserved exclusively for the storage of temporary tables.

A temporary dbspace is temporary only in the sense that the database server does not preserve any of the dbspace contents when the database server shuts down abnormally.

Whenever you set up the database server, all temporary dbspaces are set up. The database server clears any tables that might remain since the last time that the database server shut down. The database server does not perform logical or physical logging for temporary dbspaces. Because temporary dbspaces are not physically logged, fewer checkpoints and I/O operations occur, which improves performance.

Blobspaces

A blobspace is a logical storage unit composed of one or more chunks that store only TEXT and BYTE data.

A blobspace stores TEXT and BYTE data in the most efficient way possible. You can store TEXT and BYTE columns associated with distinct tables (see Tables) in the same blobspace.

The database server writes data stored in a blobspace directly to disk. This data does not pass through resident shared memory. If it did, the volume of data might occupy so many of the buffer-pool pages that other data and index pages would be forced out. For the same reason, the database server does not write TEXT or BYTE objects that are assigned to a blobspace to either the logical or physical log. The database server logs blobspace objects by writing them directly from disk to the logical-log backup tapes when you back up the logical logs. Blobspace objects never pass through the logical-log files.

Sbspaces

An sbspace is a logical storage unit composed of one or more chunks that store smart large objects.

Smart large objects consist of CLOB (character large object) and BLOB (binary large object) data types. User-defined data types can also use sbspaces.

Sbspaces have advantages over blobspaces.

  • They have read, write, and seek properties similar to a standard UNIX file.
  • They are recoverable, so you can log all write operations on data stored in sbspaces. You can commit or roll back changes if a failure occurs during a transaction.
  • They obey transaction isolation modes. You can lock smart large objects at different levels of granularity, and the lock durations obey the rules for transaction isolation levels.
  • Smart large objects within table rows are not required to be retrieved in one statement.

Temporary sbspaces

Use a temporary sbspace to store temporary smart large objects without metadata logging and user-data logging.

If you store temporary smart large objects in a standard sbspace, the metadata is logged. Temporary sbspaces are similar to temporary dbspaces.

Plogspace

A plogspace is a logical storage unit that is composed of one chunk that stores the physical log. When the physical log is in the plogspace, the database server increases the size of the physical log as needed to improve performance.

Extspaces

An extspace is a logical name associated with an arbitrary string that signifies the location of external data. The resource that the extspace references depends on a user-defined access method for accessing its contents.

For example, a database user might require access to binary files encoded in a proprietary format. First, a developer creates an access method, which is a set of routines that access the data. These routines are responsible for all interaction between the database server and the external file. A DBA then adds an extspace that has the file as its target to the database. After the DBA creates a table in the extspace, the users can access the data in the proprietary files through SQL statements. To locate those files, use the extspace information.

An extspace is not required to be a file name. For example, it can be a network location. The routines that access the data can use information found in the string associated with the extspace in any manner.

Databases

A database is a logical storage unit that contains tables and indexes. Each database also contains a system catalog that tracks information about many of the elements in the database, including tables, indexes, SPL routines, and integrity constraints.

A database is stored in the dbspace that is specified by the IN clause of the CREATE DATABASE statement. When you do not explicitly name a dbspace in the CREATE DATABASE statement, the database is stored in the root dbspace, unless automatic location is enabled. You can enable automatic location by setting the AUTOLOCATE configuration parameter or session environment variable to a positive integer. The database server chooses the dbspaces in which to create new databases and new tables that are created without specified storage locations. Tables are automatically fragmented by round robin in the dbspaces that are chosen by the server.

Tables

When you create a table, the database server allocates disk space for the table in a block of pages that is called an extent. You can specify the size of both the first and any subsequent extents.

You can place the table in a specific dbspace by naming the dbspace when the table is created (with the IN dbspace clause of the CREATE TABLE statement). When you do not specify the dbspace, the database server places the table in the dbspace where the database is located. You can fragment a table over more than one dbspace or within a dbspace by specifying a fragmentation distribution scheme.

Tblspaces

Database server administrators sometimes must track disk use by a particular table. A tblspace contains all the disk space allocated to a given table or table fragment (if the table is fragmented). A separate tblspace contains the disk space allocated for the associated index.

A tblspace does not correspond to any particular part of a chunk or even to any particular chunk. The indexes and data that make up a tblspace might be scattered throughout your chunks. The tblspace, however, represents a convenient accounting entity for space across chunks devoted to a particular table.

3 Database limits

Before starting, remember the Informix server limits for the different storage options.

3.1 Unfragmented tables and indexes

Every row in an unfragmented table has a number called a ROWID that uniquely identifies the row within the table and also indicates where the row is stored. The ROWID is represented by a 4-byte integer: the most significant 3 bytes indicate the page where the row is stored, and the least significant byte indicates the slot on that page.

If you represent the ROWID as a hexadecimal number, the leftmost 6 digits are the page number and the rightmost 2 digits are the slot number.

Example:

A row with a ROWID of 0x4F0033 is located in slot 0x33 (51) of page 0x4F00 (20224).

  • Because 3 bytes are used to store the page number, a maximum of 16,777,215 pages can be used in one table.
  • Because 1 byte is used to store the slot number, a single page can hold at most 255 slots/rows.
The maximum number of rows in a table or fragment is therefore 16,777,215 * 255 = 4,278,189,825.
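To see this mapping, you can decode the rowid pseudo-column directly in SQL (a sketch for unfragmented tables):

-- page number = rowid / 256 (top 3 bytes), slot = rowid mod 256 (bottom byte)
SELECT rowid,
       trunc(rowid / 256) AS pageno,
       mod(rowid, 256)    AS slot
  FROM lineitem
 WHERE l_orderkey = 1;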

As we can see, the limit of 16,777,215 pages per table partition or fragment means we can have up to 255 GB per table partition on a dbspace with 16K pages: $$ \frac{16{,}777{,}215 \times 16384}{1024^3} = \frac{274{,}877{,}890{,}560}{1{,}073{,}741{,}824} \approx 255~\text{GB} $$

It also imposes a limit on the maximum extent size: on a 16K page size dbspace, an extent can be at most 268,435,440 KB (274,877,890,560 bytes), because: $$ \frac{16{,}777{,}215 \times 16384}{1024} = 268{,}435{,}440~\text{KB} \approx 255~\text{GB} $$

3.2 Fragmented tables and indexes

If a table is fragmented, the ROWID of each row is only guaranteed to be unique within its fragment. To determine the limits of a fragmented table, multiply the limits of an unfragmented table by the number of fragments used by your table. (The number of rows per page remains the same, of course.)

On fragmented tables, you can use the WITH ROWIDS clause during table creation so that a unique rowid is added as a pseudo-column.

3.3 ISAM error 136 : no more extents

As we know, Informix can only allocate a maximum of 16,777,215 disk pages per table partition or fragment.

Also, it can only track approximately 200 extents (logical groupings of disk pages) per table partition or fragment.

If either of these limits is reached, ISAM error 136 will be thrown while adding rows to the table.
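To monitor how close a table is to these limits, you can query the sysmaster database (a sketch; it assumes the sysptnhdr columns nextns, nptotal and npused, as found in 12.10):

-- nextns = number of extents, nptotal/npused = pages allocated/used per partition
SELECT n.tabname, p.nextns, p.nptotal, p.npused
  FROM sysmaster:sysptnhdr p, sysmaster:systabnames n
 WHERE p.partnum = n.partnum
   AND n.tabname = 'lineitem';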

4 Storage setup

4.1 Physical log

Physical logging is the process of storing the pages that the database server is going to change before the changed pages are actually recorded on disk. Before the database server modifies certain pages in the shared-memory buffer pool, it stores before-images of the pages in the physical-log buffer in shared memory.

The size of the physical log depends on two factors: the rate at which transactions generate physical log activity, and whether you set the RTO_SERVER_RESTART configuration parameter.

The rate at which transactions generate physical log activity can affect checkpoint performance. During checkpoint processing, if the physical log starts getting too full as transactions continue to generate physical log data, the database server blocks transactions to allow the checkpoint to complete and to avoid a physical log overflow.

To avoid transaction blocking, the database server must have enough physical log space to contain all of the transaction activity that occurs during checkpoint processing. Checkpoints are triggered whenever the physical log becomes 75 percent full. When the physical log becomes 75 percent full, checkpoint processing must complete before the remaining 25 percent of the physical log is used. Transaction blocking occurs as soon as the system detects a potential for a physical log overflow, because every active transaction might generate physical log activity.

To read more about estimating the size of the physical log, see the Informix documentation.

4.1.1 Using a plogspace

When the database server initializes disk space, it places the disk pages that are allocated for the physical log in the root dbspace. You might improve performance by moving the physical log to another dbspace.

You can move the physical log to a dbspace or the plogspace. When the physical log is in the plogspace, the database server increases the size of the physical log as needed to improve performance. When the physical log is in a dbspace, you must manually increase the size of the physical log.

To move the physical log to the plogspace, create the plogspace by running the onspaces -c -P command or the SQL administration API admin() or task() function with the create plogspace argument. To change the location of the plogspace, create a new plogspace. The physical log is moved to the new plogspace and the old plogspace is dropped.

To create a plogspace you can do:

onspaces -c -P plog -p /INFORMIXDEV/d_plog -o 0 -s 16777216

The physical log will be moved from its current location (the rootdbs by default) to the new plogspace.

4.1.2 Check physical log size and usage

To check your physical log size use:

$ onstat -g ckp
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 09:46:52 -- 7766416 Kbytes

AUTO_CKPTS=Off   RTO_SERVER_RESTART=Off   

                                                                    Critical Sections                          Physical Log    Logical Log    
           Clock                                  Total Flush Block #      Ckpt  Wait  Long  # Dirty   Dskflu  Total    Avg    Total    Avg   
Interval   Time      Trigger    LSN               Time  Time  Time  Waits  Time  Time  Time  Buffers   /Sec    Pages    /Sec   Pages    /Sec  
692        01:19:04 *Admin      6:0xe1f1cc        0.1   0.0   0.0   0      0.0   0.0   0.0   7         7       55       0      1        0     
693        01:24:33  CKPTINTVL  6:0xe2e018        0.0   0.0   0.0   0      0.0   0.0   0.0   40        40      2        0      15       0     
694        05:04:35  CKPTINTVL  6:0xef31cc        0.3   0.0   0.0   0      0.0   0.0   0.0   15        15      70       0      197      0     
695        10:14:37  CKPTINTVL  6:0xef5334        0.1   0.0   0.0   0      0.0   0.0   0.0   294       294     58       0      2        0     
696        10:44:54  CKPTINTVL  6:0xef7018        15.5  15.2  0.0   0      0.0   0.0   0.0   111327    7333    1260     0      2        0     
697        10:50:37  CKPTINTVL  6:0xef90d8        43.0  42.7  0.0   1      0.0   0.3   0.3   40774     955     98       0      2        0     
698        10:53:53 *Admin      6:0xefb390        0.2   0.0   0.0   1      0.0   0.2   0.2   1         1       402      1      2        0     
699        10:53:53  Admin      6:0xefe080        0.0   0.0   0.0   1      0.0   0.0   0.0   0         0       0        0      4        4     
700        10:56:20 *Admin      6:0xf01018        0.2   0.0   0.0   0      0.0   0.0   0.0   1         1       2        0      3        0     
701        10:56:21 *Admin      6:0xf030d0        0.0   0.0   0.0   0      0.0   0.0   0.0   0         0       0        0      2        2     
702        11:01:34  CKPTINTVL  6:0xf06018        0.1   0.0   0.0   0      0.0   0.0   0.0   1         1       2        0      3        0     

Max Plog       Max Llog       Max Dskflush   Avg Dskflush   Avg Dirty      Blocked      
pages/sec      pages/sec      Time           pages/sec      pages/sec      Time         
10240          1280           43             1440           31             0

4.2 Creating regular dbspaces

Before loading a large table, you should create the appropriate dbspace or dbspaces according to its storage needs.

Let's review how to create a dbspace using the onspaces command.

For example, to create a 100GB dbspace with a 16K page size:

onspaces -c -d d_data -k 16 -p /INFORMIXDEV/d_data_01 -o 0 -s 104857600
The onspaces size must be expressed in KB, so 100 GB expressed in KB is: $$ \text{size}_{KB} = \frac{100 \times 1024^3}{1024} = 100 \times 1024^2 = 104{,}857{,}600 $$
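The same conversion on the shell:

# GB to KB for the onspaces -s parameter
$ echo $((100 * 1024 * 1024))
104857600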

To add a second chunk of another 100GB:

onspaces -a d_data  -p /INFORMIXDEV/d_data_02 -o 0 -s 104857600

Now use onstat -d to verify the dbspace and chunks. As we can see, d_data is dbspace number 11 and it has two chunks, numbers 15 and 16.

$ onstat -d
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 1 days 00:32:41 -- 8044384 Kbytes
...

Dbspaces
address          number   flags      fchunk   nchunks  pgsize   flags    owner    name
...
58905b68         11       0x4020001  15       2        16384    N  BA    informix d_data


Chunks
address   chunk/dbs   offset  size     free    bpages  flags pathname
...
5890f028    15     11     0   6553600  6553597         PO-B-D /INFORMIXDEV/d_data_01
590bbdf8    16     11     0   6553600  6553597         PO-B-D /INFORMIXDEV/d_data_02
...

The values in the "size" and "free" columns for dbspace chunks are displayed in units of the "pgsize" of the dbspace to which they belong.

4.3 Creating temporary dbspaces

To create indexes you will need temporary space. Make sure that the DBSPACETEMP onconfig parameter is set to a temporary dbspace (or better, to multiple temporary dbspaces). In our example we have 4 temporary dbspaces.

$ onstat -c | grep DBSPACETEMP
DBSPACETEMP t_tmp1,t_tmp2,t_tmp3,t_tmp4

4.3.1 Temp space estimation

To estimate the amount of temporary space needed for an index build, perform the following steps:

  1. Add the total widths of the indexed columns or returned values from user-defined functions. This value is referred to as colsize.
  2. Estimate the size of a typical item to sort with one of the following formulas, depending on whether the index is attached or not:
    • For a nonfragmented table and a fragmented table with an index created without an explicit fragmentation strategy, use the following formula: $$ \texttt{sizeof_sort_item} = keysize + 4 $$
    • For fragmented tables with the index explicitly fragmented, use the following formula: $$ \texttt{sizeof_sort_item} = keysize + 8 $$
  3. Estimate the number of bytes needed to sort with the following formula: $$ \texttt{temp_bytes} = 2 * (\texttt{rows} * \texttt{sizeof_sort_item}) $$

This formula uses the factor 2 because everything is stored twice when intermediate sort runs use temporary space. Intermediate sort runs occur when not enough memory exists to perform the entire sort in memory.
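As a quick sanity check of the scale-100 row in the table below, the same arithmetic on the shell:

# temp space estimate for the lineitem PK at scale 100
# keysize = 8 (two 4-byte INTEGER key parts), sizeof_sort_item = 8 + 8
rows=600000000
echo "bytes:     $((2 * rows * (8 + 8)))"           # 19200000000
echo "2K pages:  $((2 * rows * (8 + 8) / 2048))"    # 9375000
echo "16K pages: $((2 * rows * (8 + 8) / 16384))"   # 1171875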

The following table shows the estimated temporary space needed to create the lineitem primary key index at different TPCH scales. The keysize is 8, and we add 8 (instead of 4) even without fragmentation.

TPCH  Rows            Calculus (bytes)                                  2K pages     16K pages   GB
10    60,000,000      2 * (60,000,000 * (8+8)) = 1,920,000,000          937,500      117,187     1.8
100   600,000,000     2 * (600,000,000 * (8+8)) = 19,200,000,000        9,375,000    1,171,875   18
200   1,200,000,000   2 * (1,200,000,000 * (8+8)) = 38,400,000,000      18,750,000   2,343,750   36
1000  6,000,000,000   2 * (6,000,000,000 * (8+8)) = 192,000,000,000     93,750,000   11,718,750  179
2000  12,000,000,000  2 * (12,000,000,000 * (8+8)) = 384,000,000,000    187,500,000  23,437,500  356

Notice that we always use keysize + 8. From experience, it is safe to use this factor even with non-fragmented indexes.

Number of temp spaces

In the previous table, we can see that we could again reach the 16,777,215 pages limit if we were using only a single temp dbspace. Fortunately, you can have multiple temporary dbspaces, so it is easy to spread the pages needed across different dbspaces.

The choice between 2K and 16K pages for temporary dbspaces is less critical, but we suggest using 16K when working with large tables with large average row sizes.

4.3.2 Determine temp space available

It is important to verify that you have enough pages available in temp space to sort the index.

You can use onstat -d to look for temporary spaces and sum their pages, or you can use the following SQL commands (run against the sysmaster database) to get a more detailed view.

SELECT   sum(syschunks.chksize) pages, 
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) total,
         format_units(sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) free,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)) -
                                  sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) used,
         round (100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 2) pct_used
    FROM sysdbspaces,syschunks
   WHERE sysdbspaces.dbsnum = syschunks.dbsnum
     AND sysdbspaces.is_temp = 1
     AND sysdbspaces.is_sbspace = 0;

pages     total      free       used       pct_used
8192000   15.6 GB    13.5 GB    2.10 GB    13.42

To see all temporary dbspaces in detail:

SELECT              
         sysdbspaces.name[1,15] name,
         sum(syschunks.chksize) pages,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) total,
         format_units(sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) free,
         format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)) -
                                  sum(syschunks.nfree   * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) used,
         round (100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 1)::DECIMAL(5,1) pct_used
    FROM sysdbspaces,syschunks
   WHERE sysdbspaces.dbsnum = syschunks.dbsnum
     AND sysdbspaces.is_sbspace = 0
     AND sysdbspaces.is_temp = 1
GROUP BY 1
ORDER BY 1;

name                 pages      total      free       used       pct_used
t_tmp1               2048000    3.91 GB    3.38 GB    537 MB     13.4
t_tmp2               2048000    3.91 GB    3.38 GB    537 MB     13.4
t_tmp3               2048000    3.91 GB    3.38 GB    537 MB     13.4
t_tmp4               2048000    3.91 GB    3.38 GB    537 MB     13.4

If you do not have enough space for a parallel index build, the system will run slowly during index creation, and you will see the following warning in the Informix message log:

$ onstat -m
20:51:33  WARNING: Not enough temp space for parallel index build.
          Space required = 18750290 pages; space available = 8163188 pages.
          Partial index build started.

If there is not enough temporary space for the sort, you will get the error:

179: ISAM error: no free disk space for sort

4.3.3 Adding more temporary space

As we can see from the previous data, we do not have enough space to create any of the test indexes, so we need to resize our temp spaces or allocate more units and add them to the DBSPACETEMP configuration variable.

In our case, we will keep 4 temp spaces, but of 50GB each, to have 200GB of temp space, which is what is needed for TPCH at scale 1000. As the size parameter of the onspaces command is given in KB, we convert 50 GB to KB: $$ \frac{50 \times 1024^3}{1024} = \frac{53{,}687{,}091{,}200}{1024} = 52{,}428{,}800 $$

$ onspaces -c -d t_tmp1 -t -p /INFORMIXDEV/t_tmp1 -o 0 -s 52428800 
$ onspaces -c -d t_tmp2 -t -p /INFORMIXDEV/t_tmp2 -o 0 -s 52428800 
$ onspaces -c -d t_tmp3 -t -p /INFORMIXDEV/t_tmp3 -o 0 -s 52428800 
$ onspaces -c -d t_tmp4 -t -p /INFORMIXDEV/t_tmp4 -o 0 -s 52428800
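Once created, point DBSPACETEMP at the new spaces. A sketch, assuming your server allows updating this parameter dynamically with onmode -wf (otherwise edit the onconfig file and restart):

$ onmode -wf DBSPACETEMP=t_tmp1,t_tmp2,t_tmp3,t_tmp4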

4.4 Using temp space in memory

Queries performing sort operations can consume a lot of space in the Informix temporary dbspaces. But if the PSORT_DBTEMP environment variable is set, the engine writes its temporary sort data to the directory it specifies instead.

The database server uses the directory that PSORT_DBTEMP specifies, even if the environment variable PSORT_NPROCS is not set.
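For example, a RAM-backed filesystem can be created and assigned before starting the session (a sketch; the mount point and size are illustrative, and you must have enough free RAM):

# create an 8GB in-memory filesystem and direct sort files to it
mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk
export PSORT_DBTEMP=/mnt/ramdisk
export PSORT_NPROCS=4    # optional: parallel sort threads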

You can check whether a session has PSORT_DBTEMP (or any other environment variable) set by using:

$ onstat -g env 40
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:30:26 -- 7766416 Kbytes

Environment for session 40:

Variable            Value [values-list]
CLIENT_LOCALE       en_us.utf8
CLNT_PAM_CAPABLE    1
DBCENTURY           C
DBDATE              DMY4-
DBDELIMITER         |
DBMONEY             ,
DBPATH              .
...
PSORT_DBTEMP        /mnt/ramdisk
...

If we look at /mnt/ramdisk during index creation, we will see the sort temp files:

ls -l /mnt/ram*
total 5022332
-rw------- 1 root informix  857145344 Aug 13 21:32 srt0000107_001
-rw------- 1 root informix  779223040 Aug 13 21:32 srt0000108_002
-rw------- 1 root informix 1792212992 Aug 13 21:36 srt0000109_003
-rw------- 1 root informix 1714290688 Aug 13 21:36 srt0000110_004

5 Table sizing

You can get a simple estimate of a table's size by multiplying the number of expected rows by its row size. The row size can be determined by using:

SELECT rowsize FROM systables WHERE tabname = "lineitem";
146

To determine the number of data pages needed you can use: $$ \texttt{data_pages} = \frac{rows}{trunc(pagesize/(rowsize + 4))} $$ which gives a divisor of 2048/(146+4) = 13.6 for 2K pages and 16384/(146+4) = 109.2 for 16K pages.
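For example, for 600M rows on 16K pages this gives: $$ \texttt{data_pages} = \frac{600{,}000{,}000}{109.2} \approx 5{,}494{,}505 $$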

Using TPCH sample we get the following metrics:

TPCH   Rows  Row size * rows      Bytes            GB   2K pages     16K pages   Requires
Scale                                                   div=13.6     div=109.2   fragments
100    600M  146 * 600,000,000    87,600,000,000   81   44,117,647   5,494,505   NO
200    1.2B  146 * 1,200,000,000  175,200,000,000  163  88,235,294   10,989,010  NO
1000   6B    146 * 6,000,000,000  876,000,000,000  815  441,176,470  54,945,054  YES

So we can deduce:

  • We can place the 600M or even the 1.2B rows in a single dbspace, but only using a 16K page size. Using 2K pages we would exceed the maximum number of pages per table or fragment (16,777,215).
  • We cannot place the 6B rows in a single dbspace, for two reasons (see the fragmentation sketch after this list):
    1. 6B rows is greater than the 4,278,189,825 rows limit per fragment.
    2. Even at a 16K page size we exceed the maximum number of pages per tblspace or fragment (16,777,215).
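For the scale-1000 case, a round-robin fragmented table spreads both the rows and the page limits across several dbspaces. A minimal sketch (the dbspace names d_data1 through d_data4 are hypothetical, and the column list is abbreviated as in the example above):

create raw table lineitem (
  l_orderkey integer not null,
  ...
)
fragment by round robin in d_data1, d_data2, d_data3, d_data4
extent size 40000000 next size 8000000
lock mode row;

Each fragment has its own 16,777,215-page and 4,278,189,825-row limits, so four fragments comfortably hold the 6B rows.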

5.1 Sizing dbspaces

The following table shows the dbspace size needed (in KB) to hold the table according to its size, using 16K pages.

TPCH scale  16K pages needed  Roundup     Extra       Calculus                   KB           GB
100         5,494,505         5,500,000   6,000,000   6,000,000 * 16384 / 1024   96,000,000   91
200         10,989,010        11,000,000  12,000,000  12,000,000 * 16384 / 1024  192,000,000  183
1000        54,945,054        55,000,000  60,000,000  60,000,000 * 16384 / 1024  960,000,000  915

5.1.1 Creating a dbspace

For example, to create a dbspace with a 16K page size and a size of 192,000,000 KB (onspaces -s uses KB as its unit):

$ onspaces -c -d d_data -k 16 -p /INFORMIXDEV/d_data_01 -o 0 -s 192000000
Verifying physical disk space, please wait ...
Space successfully added.

** WARNING **  A level 0 archive of Root DBSpace will need to be done.

After it is added (we should take a level-0 backup before using it), we can check its state.

$ onstat -d
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 1 days 03:14:56 -- 8044384 Kbytes

Dbspaces
address          number   flags      fchunk   nchunks  pgsize   flags    owner    name
....
58905b68         11       0x20001    11       1        16384    N  BA    informix d_data

Chunks
address          chunk/dbs     offset     size       free       bpages     flags pathname
....
5890f028         11     11     0          12000000   11999947              PO-B-D /INFORMIXDEV/d_data_01

Notice that chunk size and free space are expressed in 16K pages, and we have 12M pages as expected, which is: $$ 12{,}000{,}000 \times 16384 = 196{,}608{,}000{,}000~\text{bytes} = 192{,}000{,}000~\text{KB} = 183~\text{GB} $$

5.2 Define table extent size

We will create the table in the appropriate dbspace, using a predefined initial extent size that tries to fit all the data in a single extent. As the extent size is given in KB, we can use the same conversion we used for the dbspace allocation.

TPCH scale  Pages       Roundup     First extent
100         5,494,505   5,500,000   5,500,000 * 16384 / 1024 = 88,000,000 KB (84 GB)
200         10,989,010  11,000,000  11,000,000 * 16384 / 1024 = 176,000,000 KB (167 GB)
1000        ...         ...         ...
EXTENT SIZE Options

|--+------------------------------+----------------------------->
   '-EXTENT SIZE--first_kilobytes-'   

>--+---------------------------+--------------------------------|
   '-NEXT SIZE--next_kilobytes-'
create raw table lineitem (
  l_orderkey integer not null,
  ...
) 
in d_data
extent size 160000000 next size 8000000
lock mode row;

5.2.1 Check allocation

Let's check allocation with a test table. Execute the following code (you do not need all the columns, as we are only checking extent allocation). This procedure is useful to verify that everything has been allocated as expected.

create raw table lineitem (
  l_orderkey integer not null
) 
in d_data
extent size 160000000 next size 8000000
lock mode row;

Now run oncheck -pt tpch:lineitem and let's analyze the output.

$ oncheck -pt tpch:lineitem
TBLspace Report for tpch:informix.lineitem

    Physical Address               11:2192
    Creation date                  08/11/2017 23:46:23
    TBLspace Flags                 8902       Row Locking
                                              TBLspace contains VARCHARS
                                              TBLspace use 4 bit bit-maps
    Maximum row size               146       
    Number of special columns      1         
    Number of keys                 0         
    Number of extents              1         
    Current serial value           1         
    Current SERIAL8 value          1         
    Current BIGSERIAL value        1         
    Current REFID value            1         
    Pagesize (k)                   16        
    First extent size              10000000  
    Next extent size               500000    
    Number of pages allocated      10000000  
    Number of pages used           9763204   
    Number of data pages           9760782   
    Number of rows                 1200018434
    Partition partnum              11534399  
    Partition lockid               11534399  

    Extents                       
         Logical Page     Physical Page        Size Physical Pages
                    0           11:4472    10000000   80000000

You can see:

  • Pagesize (k) is 16.
  • The first extent size is 10,000,000 pages (160,000,000 KB).
  • The next extent size is 500,000 pages (8,000,000 KB), but no next extent has been allocated yet.
  • The number of pages allocated is 10,000,000 (the first extent).
  • There are 1,200,018,434 rows.
  • There is only one extent allocated, with a size of 10,000,000 pages and 80,000,000 physical pages, that is: $$ \frac{10{,}000{,}000 \times 16384}{2048} = 80{,}000{,}000 $$ because physical pages are 2K on our system, so each 16K page takes 8 physical pages.

5.3 Loading table data

Now you can simply load the data from the external lineitem.tbl file.

Before loading, you need to set environment variables with the date and number formats used by the generated files:

export DBDATE=Y4MD-
export DBMONEY=.
create external table lineitem_ext sameas lineitem using (datafiles("disk:/data/lineitem.tbl"));
insert into lineitem select * from lineitem_ext;
alter table lineitem type (standard);

5.3.1 Verify occupation

After loading, you can check tblspace occupation by using oncheck -pt:

$ oncheck -pt tpch:lineitem
...

6 Index sizing

To create a large index we should:

  • Estimate its size and determine the appropriate dbspace or dbspaces (if fragmented).
  • Estimate the temp space needed for sorting and ensure you have temporary dbspaces with that much space available.
  • Ensure the DBSPACETEMP onconfig variable points to those temporary dbspaces.
  • Ensure PDQ PRIORITY is set to a high value before running the index creation.

6.1 Raw estimation of size

First, determine how big the index is using its "raw" size. This size is only an approximation, because the index will not fill all pages completely.

TPCH scale  Theoretical size                                Raw size
100         600,000,000 * (8+5) = 7,800,000,000 bytes       7.2 GB
200         1,200,000,000 * (8+5) = 15,600,000,000 bytes    14.5 GB
1000        6,000,000,000 * (8+5) = 78,000,000,000 bytes    72 GB
2000        12,000,000,000 * (8+5) = 156,000,000,000 bytes  145 GB

6.2 Pages estimation

You can estimate the size of a B-tree index in pages by performing a series of calculations based on the number of rows in the table.

  1. Determine the size, in bytes, of each index entry in the leaf page with the following formula, which incorporates the per-entry overhead: $$ leafentrysize = \sum colsizes + 5 $$
    For data types other than VARCHAR, the length of an index item is calculated by adding the length of the key value plus 5 bytes for the rowid information associated with the key value.
  2. Estimate the number of entries per index leaf page with the following formula: $$ leafpagents = \frac{pagesize}{leafentrysize} \times \frac{60}{100} $$

    The value leafpagents is multiplied by 60% because index leaf pages are usually just over half full.

  3. Estimate the number of leaf pages needed with the following formula: $$ leaves = \frac{rows}{leafpagents} $$ and, expanding leafpagents: $$ leaves = \frac{rows}{\frac{pagesize}{leafentrysize} \times \frac{60}{100}} $$
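For example, for the lineitem primary key (keysize 8, so leafentrysize = 8 + 5 = 13) at scale 100 on 16K pages: $$ leaves = \frac{600{,}000{,}000}{\frac{16384}{13} \times 0.6} \approx 793{,}461 $$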

Now determine how many pages are needed using 2K or 16K pages, and verify whether they fit in a single fragment, that is, at most 16,777,215 pages. We use the formula above.

TPCH scale  2K pages                                       16K pages                                      Index size 16K  Round up    Dbspace size
100         600,000,000/(2048/13 * 0.6) = 6,281,407        600,000,000/(16384/13 * 0.6) = 793,461        12 GB           1,000,000   15 GB
200         1,200,000,000/(2048/13 * 0.6) = 12,695,725     1,200,000,000/(16384/13 * 0.6) = 1,586,923    24 GB           2,000,000   30 GB
1000        6,000,000,000/(2048/13 * 0.6) = 63,478,628     6,000,000,000/(16384/13 * 0.6) = 7,934,570    122 GB          10,000,000  152 GB
2000        12,000,000,000/(2048/13 * 0.6) = 126,984,126   12,000,000,000/(16384/13 * 0.6) = 15,869,140  242 GB          16,000,000  244 GB
  • Using a 2K page size, we cannot create the index at TPCH scale 1000 or higher because it exceeds the 16,777,215 pages limit.
  • At TPCH scale 2000, we are near the 16,777,215 pages limit even at 16K, so we should start fragmenting the index.
  • In general, for large tables and indexes, we suggest always using 16K pages.

6.3 Sizing dbspaces

We can use the information obtained above to determine the size of the index dbspace. The following table shows the size in KB to be used in the onspaces command and the resulting space needed on the device.

TPCH scale  16K Round up  dbspace K size                           disk size
100         1,000,000     1,000,000 * 16384 / 1024 = 16,000,000    15.2 GB
200         2,000,000     2,000,000 * 16384 / 1024 = 32,000,000    30 GB
1000        10,000,000    10,000,000 * 16384 / 1024 = 160,000,000  152 GB
2000        16,000,000    16,000,000 * 16384 / 1024 = 256,000,000  244 GB

The onspaces command to create a 16,000,000 KB dbspace for the scale-100 index is:

$ onspaces -c -d i_data -k 16 -p /INFORMIXDEV/i_data_01  -o 0 -s 16000000

7 Tuning onconfig

Sample suggestions for onconfig variables on servers with 16GB and 32GB of RAM.

Variable             16GB       32GB
SHMTOTAL             16777216   33554432
SHMVIRTSIZE          8388608    16777216
DS_TOTAL_MEMORY      2560000    5120000
DS_NONPDQ_QUERY_MEM  640000     128000
DS_MAX_QUERIES       64         64
DS_MAX_SCANS         1048576    1048576
MAX_PDQPRIORITY      100        100
BUFFERPOOL           4 GB       8 GB
DBSPACETEMP          4 dbspaces or more, on different disks if possible

And scale up accordingly for larger memory configurations.
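Some of these settings can also be adjusted at run time without editing the onconfig file (a sketch using the standard onmode flags):

$ onmode -M 5120000   # DS_TOTAL_MEMORY, in KB
$ onmode -D 100       # MAX_PDQPRIORITY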

8 Table of results

The following table contains a general overview of the examples included in this document for the different TPCH scales.

TPCH  Rows      Data size  Index size  Temp space  Table pages              Index pages             Table      Index build  Index build
      millions  GB         GB          GB          estimated / real         estimated / real        load time  PDQ=0        PDQ=100
100   600       74         12          18          5,346,679 / 4,881,838    793,461 / 694,622       01:39:15   00:43:40     00:07:25
200   1200      160        24          36          10,693,359 / 9,763,204   1,586,923 / 1,571,916   03:05:38   01:42:09     00:15:09
1000  6000      ...        ...         179         ...                      ...                     ...        ...          ...
2000  12000     ...        ...         356         ...                      ...                     ...        ...          ...

9 Disk considerations

On very large database systems, good disk performance and architecture are critical. If fragmentation is used for a large table or index, it is desirable to place each fragment on a different disk.

As general rules to apply if possible:

  • Use a specific disk for the rootdbs.
  • Use a specific disk for the physical log.
  • Use a specific disk for the logical log.
  • Use a specific disk for each temporary space.
  • Use a specific disk for each large table, and a different one for its index or indexes.
  • If fragmentation is used, try to use one disk for each data fragment and a different disk for each index fragment.

9.1 Disk performance

Under Linux, the dd command can be used for simple sequential I/O performance measurements.

For more detailed I/O performance benchmarking, a tool like the Flexible I/O Tester (Fio) can be used.
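For example, a random-read test with fio might look like this (a sketch; the target file and sizes are illustrative):

$ fio --name=randread --filename=/disk1/fiotest --size=1G --rw=randread \
      --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based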

9.1.1 Measuring Write Performance

To measure write performance, the data to be written should be read from /dev/zero and ideally written to an empty RAID array, hard disk, or partition (such as using of=/dev/sda for the first hard disk or of=/dev/sda2 for the second partition of the first hard disk). If this is not possible, a normal file in the file system can be written (such as using of=/disk1/test). For safety reasons we use test files in the following examples. The write performance achieved this way will be a little lower (because metadata will also be written to the file system).

Throughput (Streaming I/O)

$ dd if=/dev/zero of=/disk1/test bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 4.49933 s, 239 MB/s

Latency

In this test, 512 bytes were written one thousand times:

$ dd if=/dev/zero of=/disk1/test bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.138732 s, 3.7 MB/s

10 Memory considerations

When using large amounts of memory (for example, a memory filesystem for PSORT_DBTEMP), you must ensure the operating system has (or is assigned, in virtualized environments) enough physical memory.

In the following assert failure, we can see how the database server receives a signal 9 from the OS on a misconfigured Linux machine using a large in-memory filesystem.

As the memory filesystem grows and the OS cannot allocate sufficient memory, it kills the process demanding memory; in this case, the Informix database server using PSORT_DBTEMP for a large sort operation.
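On Linux, the OOM killer leaves a trace in the kernel log, so you can confirm this scenario with:

$ dmesg | grep -i -E 'out of memory|killed process'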

22:18:14  Assert Failed: Unexpected virtual processor termination: pid = 2117, signal 9 received.
22:18:14  IBM Informix Dynamic Server Version 12.10.FC9
22:18:14   Who: Session(2, informix@, 0, (nil))
                Thread(8, sm_poll, 0, 1)
                File: mt.c Line: 14715
22:18:14  stack trace for pid 2116 written to /home/informix/tmp/af.3f0b405
22:18:14   See Also: /home/informix/tmp/af.3f0b405
22:18:15  The Master Daemon Died
22:18:16  PANIC: Attempting to bring system down