While there is apparently no official or standard definition for the term
Very Large Database (VLDB), it is sometimes used to describe databases
occupying magnetic storage in the terabyte range and containing billions of table
rows.
Handling such large tables requires a proper setup of database storage and raises many complex issues.
The following sections explore how to deal with large tables and their indexes using a TPCH database example.
1 TPCH sample
To run the following examples we will use the DBGEN utility.
DBGEN is a database population program for use with the TPC-H benchmark.
We will use the TPCH database example, focusing on its largest table, lineitem
(its size is about 70% of the total), generated at different scales.
We can use the -T modifier to generate only the lineitem table and skip
all the other TPCH database tables.
/home/dbgen/dbgen -s 10 -f -T L
The following table
shows the tests that will be used for our sizing tutorial.
TPCH Scale | lineitem.tbl | Rows | 16K pages table | 16K pages index | dbgen time | Load time at 100K rows/second | Load time (1) | Index time PDQ=0 | Index time PDQ=100 |
---|---|---|---|---|---|---|---|---|---|
10 | 7 GB | 60M | 488,046 | 69,451 | 00:02:35 | 10 min | 00:09:47 | 00:02:09 | 00:00:36 |
100 | 74 GB | 600M | 4,881,838 | 694,622 | 00:24:28 | 1.6 hours | 01:39:15 | 00:43:40 | 00:07:25 |
200 | 160 GB | 1.2B | 9,763,204 | -- | 00:51:10 | 3.3 hours | 03:05:38 | 01:42:09 | 00:15:09 |
1000 | --- GB | 6B | -- | -- | ... | 16 hours | ? | ? hours | ? hours |
2000 | --- GB | 12B | -- | -- | ... | 32 hours | ? | ? hours | ? hours |
- (1) Load time: no transactions, no HDR, 1 SSD (rootdbs + logs + data), 1 SSD for the index.
- Index time PDQ=0: create the primary key index with PDQ PRIORITY = 0.
- Index time PDQ=100: create the primary key index with PDQ PRIORITY = 100.
As you can see, even for a theoretical system load throughput of 100K rows per second, the load time is significant.
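A quick sanity check of the load-time column (a sketch; the 100K rows/second figure is the theoretical throughput assumed in the text, not a measured value):

```python
def load_hours(rows, rows_per_sec=100_000):
    """Estimated load time in hours at an assumed sustained throughput."""
    return rows / rows_per_sec / 3600

# Scale 100: 600M rows -> 6000 s, about 1.67 hours (the "1.6 hours" in the table)
print(round(load_hours(600_000_000), 2))
# Scale 2000: 12B rows -> about 33 hours
print(round(load_hours(12_000_000_000), 1))
```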
The test was done on a Dell 730 virtualized machine with 12 GB RAM, 16 CPU (32 cores) Intel Xeon E312xx 2.7 GHz, and 1 TB SSD disks.
Informix
version was 12.10.FC9 configured with 8GB RAM total and 1GB for DS_TOTAL_MEMORY.
The SQL operations to test are:
- Create the data table and load it.

create raw table lineitem (
    l_orderkey integer not null,
    l_partkey integer not null,
    l_suppkey integer not null,
    l_linenumber integer not null,
    l_quantity decimal(15,2) not null,
    l_extendedprice decimal(15,2) not null,
    l_discount decimal(15,2) not null,
    l_tax decimal(15,2) not null,
    l_returnflag char(1) not null,
    l_linestatus char(1) not null,
    l_shipdate date not null,
    l_commitdate date not null,
    l_receiptdate date not null,
    l_shipinstruct char(25) not null,
    l_shipmode char(10) not null,
    l_comment varchar(44) not null
) in d_data extent size 160000000 next size 8000000 lock mode row;

create external table lineitem_ext sameas lineitem
    using (datafiles("disk:/tpch/lineitem.tbl"));
insert into lineitem select * from lineitem_ext;
alter table lineitem type (standard);
- Create the primary key index with different options.

CREATE UNIQUE INDEX lineitem_pk ON lineitem (l_orderkey, l_linenumber) in i_data;
2 Database storage
We assume you are familiar with Informix database administration, but let's review some concepts we will need.
The database server uses physical units of storage to allocate disk space. It stores data in logical units. Unlike the logical units of storage, whose size fluctuates, each of the physical units has a fixed or assigned size that is determined by the disk architecture.
2.1 Physical units
The database server uses the following physical units to manage disk space: chunks, pages, blobpages, sbpages, and extents.
2.2 Logical units
The database
server stores data in the following logical units:
Unit | Description |
---|---|
Dbspaces | A dbspace is a logical storage unit composed of one or more chunks that can hold databases, tables, and indexes. When you create a standard or temporary dbspace, you assign at least one chunk to it and can choose its page size. |
Temporary dbspaces | A temporary dbspace is reserved exclusively for temporary tables and sort work; it is not physically logged and is not included in backups. Whenever you set up the server, list your temporary dbspaces in the DBSPACETEMP configuration parameter. |
Blobspaces | A blobspace is a logical storage unit composed of one or more chunks that store only TEXT and BYTE data. A blobspace stores TEXT and BYTE data in the most efficient way possible. You can store TEXT and BYTE columns associated with distinct tables in the same blobspace. |
Sbspaces | An sbspace is a logical storage unit composed of one or more chunks that store smart large objects. Smart large objects consist of CLOB (character large object) and BLOB (binary large object) data types. User-defined data types can also use sbspaces. |
Temporary sbspaces | Use a temporary sbspace to store temporary smart large objects without metadata logging and user-data logging. If you store temporary smart large objects in a standard sbspace, the metadata is logged. |
Plogspace | A plogspace is a logical storage unit composed of a single chunk that stores the physical log. When the physical log is in the plogspace, the database server increases the size of the physical log as needed to improve performance. |
Extspaces | An extspace is a logical name associated with an arbitrary string that signifies the location of external data. The resource that the extspace references depends on a user-defined access method for accessing its contents. An extspace is not required to be a file name; for example, it can be a network location. The routines that access the data can use information found in the string associated with the extspace in any manner. |
Databases | A database is a logical storage unit that contains tables and indexes, plus a system catalog that tracks information about its elements. A database resides in the dbspace specified when it is created. |
Tables | When you create a table, the database server allocates disk space for it in blocks of pages called extents. You can place the table in a specific dbspace by naming the dbspace when the table is created, or fragment it over several dbspaces. |
Tblspaces | Database server administrators sometimes must track disk use by a particular table. A tblspace contains all the disk space allocated to a given table or table fragment. |
3 Database limits
Before starting, remember the Informix server limits according to the different storage options.
3.1 Unfragmented tables and indexes
Every row in an unfragmented table has a number called a ROWID that uniquely identifies the row within the table and also indicates where in the table the row is stored.
The ROWID for a row is represented by a 4-byte integer: the most significant 3 bytes of that integer indicate the page where the row is stored, and the least significant byte indicates the slot on that page where the row is stored.
If you represent the ROWID as a hexadecimal number, the leftmost 6 digits are the page number and the rightmost 2 digits are the slot number.
Example:
A row with a ROWID of 0x4F0033 is located in slot 0x33 (51) of page 0x4F00 (20224).
- Since 3 bytes are used to store the page number, a maximum of 16,777,215 pages can be used in one table.
- Since 1 byte is used to store the slot number, a single page can have at most 255 slots/rows.
Therefore the maximum number of rows per table partition or fragment is 16,777,215 * 255 = 4,278,189,825.
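The encoding and the limits it implies are easy to verify with a couple of bit operations; a minimal illustrative sketch (Informix does this internally):

```python
def decode_rowid(rowid):
    """Split a 4-byte rowid: the top 3 bytes are the page, the low byte is the slot."""
    return rowid >> 8, rowid & 0xFF

# The example from the text: 0x4F0033 -> page 0x4F00 (20224), slot 0x33 (51)
print(decode_rowid(0x4F0033))  # (20224, 51)

# The limits follow from the field widths:
max_pages = 2**24 - 1          # 16,777,215 pages per partition
max_slots = 2**8 - 1           # 255 rows per page
print(max_pages * max_slots)   # 4,278,189,825 rows per partition
```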
As we see, the limit of 16,777,215 pages per table partition or fragment means we can have up to 255 GB per table partition (with 16K pages):
$$
16,777,215 * 16384 = 274,877,890,560 \texttt{ bytes} \approx 255 \texttt{ GB}
$$
It also imposes a limit on the maximum extent size. On a 16K page size dbspace you can have an extent of at most 268,435,440 KB because:
$$
\frac{16,777,215 * 16384}{1024} = 268,435,440 \texttt{ KB} = 255 \texttt{ GB}
$$
3.2 Fragmented tables and indexes
If a table is fragmented, then the ROWID for each row is only guaranteed to be unique within its fragment.
So to determine the limits for a fragmented table, multiply the limits for an unfragmented table by the number of fragments used by your table.
(The number of rows per page remains the same, of course.)
If you need rowids on a fragmented table, you can use the WITH ROWIDS clause during table creation so that a unique rowid is added as a pseudo-column.
3.3 ISAM error 136: no more extents
As we know, Informix can only allocate a maximum of 16,777,215 disk pages per table partition or fragment.
Also, it can only track approximately 200 extents (logical groupings of disk pages) per table partition or fragment.
If any of the limits above is reached, ISAM error 136 will be thrown while adding rows to a table.
4 Database storage
4.1 Physical log
Physical logging is the process of storing the pages that the database
server is
going to change before the changed pages are actually recorded on disk.
Before the database
server modifies certain pages in the shared-memory buffer pool,
it stores before-images of the pages in the physical-log buffer in shared memory.
The size of the physical log depends on two factors: the rate at which transactions generate physical log activity and whether you set the RTO_SERVER_RESTART configuration parameter.
The rate at which transactions generate physical log activity can affect checkpoint performance.
During checkpoint processing, if the physical log starts getting too full as transactions continue
to generate physical log data, the database
server blocks transactions to allow the checkpoint to complete and to avoid a physical log overflow.
To avoid transaction blocking, the database
server must have enough physical log space to
contain all of the transaction activity that occurs during checkpoint processing.
Checkpoints are triggered whenever the physical log becomes 75 percent full.
When the physical log becomes 75 percent full, checkpoint processing must complete
before the remaining 25 percent of the physical log is used.
Transaction blocking occurs as soon as the system detects a potential for a physical log overflow,
because every active transaction might generate physical log activity.
To read more about estimating the size of the physical log, see the Informix documentation.
4.1.1 Using a plogspace
When the database
server initializes disk space, it places the disk pages that are allocated for the physical log in the root dbspace
.
You might improve performance by moving the physical log to another dbspace
.
You can move the physical log to a dbspace
or the plogspace
. When the physical log is in the plogspace
,
the database
server increases the size of the physical log as needed to improve performance.
When the physical log is in a dbspace
, you must manually increase the size of the physical log.
To move the physical log to the plogspace
, create the plogspace
by running the onspaces -c -P
command or the SQL administration API admin() or task() function with the create plogspace
argument.
To change the location of the plogspace
, create a new plogspace
.
The physical log is moved to the new plogspace
and the old plogspace
is dropped.
To create a plogspace
you can do:
onspaces -c -P plog -p /INFORMIXDEV/d_plog -o 0 -s 16777216
The physical log will be moved from its current location (by default the rootdbs) to the new plogspace.
4.1.2 Check physical size and use
To check your physical log size use:
$ onstat -g ckp
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 09:46:52 -- 7766416 Kbytes
AUTO_CKPTS=Off RTO_SERVER_RESTART=Off
Critical Sections Physical Log Logical Log
Clock Total Flush Block # Ckpt Wait Long # Dirty Dskflu Total Avg Total Avg
Interval Time Trigger LSN Time Time Time Waits Time Time Time Buffers /Sec Pages /Sec Pages /Sec
692 01:19:04 *Admin 6:0xe1f1cc 0.1 0.0 0.0 0 0.0 0.0 0.0 7 7 55 0 1 0
693 01:24:33 CKPTINTVL 6:0xe2e018 0.0 0.0 0.0 0 0.0 0.0 0.0 40 40 2 0 15 0
694 05:04:35 CKPTINTVL 6:0xef31cc 0.3 0.0 0.0 0 0.0 0.0 0.0 15 15 70 0 197 0
695 10:14:37 CKPTINTVL 6:0xef5334 0.1 0.0 0.0 0 0.0 0.0 0.0 294 294 58 0 2 0
696 10:44:54 CKPTINTVL 6:0xef7018 15.5 15.2 0.0 0 0.0 0.0 0.0 111327 7333 1260 0 2 0
697 10:50:37 CKPTINTVL 6:0xef90d8 43.0 42.7 0.0 1 0.0 0.3 0.3 40774 955 98 0 2 0
698 10:53:53 *Admin 6:0xefb390 0.2 0.0 0.0 1 0.0 0.2 0.2 1 1 402 1 2 0
699 10:53:53 Admin 6:0xefe080 0.0 0.0 0.0 1 0.0 0.0 0.0 0 0 0 0 4 4
700 10:56:20 *Admin 6:0xf01018 0.2 0.0 0.0 0 0.0 0.0 0.0 1 1 2 0 3 0
701 10:56:21 *Admin 6:0xf030d0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0 0 0 2 2
702 11:01:34 CKPTINTVL 6:0xf06018 0.1 0.0 0.0 0 0.0 0.0 0.0 1 1 2 0 3 0
Max Plog Max Llog Max Dskflush Avg Dskflush Avg Dirty Blocked
pages/sec pages/sec Time pages/sec pages/sec Time
10240 1280 43 1440 31 0
4.2 Creating regular dbspaces
Before loading a large table you should create the appropriate dbspace or dbspaces according to its storage needs.
Let's review how to create a dbspace using the onspaces command.
For example, to create a 100 GB dbspace with 16K page size:
onspaces -c -d d_data -k 16 -p /INFORMIXDEV/d_data_01 -o 0 -s 104857600
To add a second chunk of 100 GB:
onspaces -a d_data -p /INFORMIXDEV/d_data_02 -o 0 -s 104857600
Now use onstat -d to verify the dbspace and its chunks. As we see,
d_data is dbspace number 11 and it has two chunks, numbers 15 and 16.
$ onstat -d
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 1 days 00:32:41 -- 8044384 Kbytes
...
Dbspaces
address  number  flags      fchunk  nchunks  pgsize  flags  owner     name
...
58905b68 11      0x4020001  15      2        16384   N BA   informix  d_data
Chunks
address  chunk/dbs  offset  size     free     bpages  flags   pathname
...
5890f028 15 11      0       6553600  6553597          PO-B-D  /INFORMIXDEV/d_data_01
590bbdf8 16 11      0       6553600  6553597          PO-B-D  /INFORMIXDEV/d_data_02
...
The values in the "size" and "free" columns for dbspace chunks are displayed in terms of the "pgsize" of the dbspace to which they belong.
4.3 Creating temporary dbspaces
To create indexes you will need temporary space. Make sure that your onconfig parameter DBSPACETEMP
is set to a temporary dbspace (or, better, to multiple temporary dbspaces). In our example we have
4 temporary dbspaces.
$ onstat -c | grep DBSPACETEMP
DBSPACETEMP t_tmp1,t_tmp2,t_tmp3,t_tmp4
4.3.1 Temp space estimation
To estimate the amount of temporary space needed for an index
build, perform the following steps:
- Add the total widths of the indexed columns or returned values from user-defined functions. This value is referred to as colsize.
- Estimate the size of a typical item to sort with one of the following formulas, depending on whether the index is attached or not:
  - For a nonfragmented table, or a fragmented table with an index created without an explicit fragmentation strategy, use: $$ \texttt{sizeof_sort_item} = keysize + 4 $$
  - For fragmented tables with the index explicitly fragmented, use: $$ \texttt{sizeof_sort_item} = keysize + 8 $$
- Estimate the number of bytes needed to sort with the following formula: $$ \texttt{temp_bytes} = 2 * (\texttt{rows} * \texttt{sizeof_sort_item}) $$
This formula uses the factor 2 because everything is stored twice when intermediate sort runs use temporary space. Intermediate sort runs occur when not enough memory exists to perform the entire sort in memory.
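The steps reduce to a one-line formula; a sketch using keysize 8 (our two-integer key) and the conservative +8 overhead this document uses throughout:

```python
def temp_sort_bytes(rows, keysize, overhead=8):
    """temp_bytes = 2 * rows * sizeof_sort_item, with sizeof_sort_item = keysize + overhead."""
    return 2 * rows * (keysize + overhead)

# TPCH scale 100: 600M rows, keysize 8
b = temp_sort_bytes(600_000_000, 8)
print(b)                   # 19,200,000,000 bytes
print(b // 2048)           # 9,375,000 2K pages
print(round(b / 1024**3))  # about 18 GB
```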
The following table shows the estimation of temporary space needed to create the lineitem primary key index
at different TPCH scales. The keysize is 8 and we add 8 (instead of 4) even without fragmentation.
TPCH | Rows | Bytes: 2 * (rows * (8+8)) | 2K pages | 16K pages | GB |
---|---|---|---|---|---|
10 | 60,000,000 | 1,920,000,000 | 937,500 | 117,187 | 1.8 |
100 | 600,000,000 | 19,200,000,000 | 9,375,000 | 1,171,875 | 18 |
200 | 1,200,000,000 | 38,400,000,000 | 18,750,000 | 2,343,750 | 36 |
1000 | 6,000,000,000 | 192,000,000,000 | 93,750,000 | 11,718,750 | 179 |
2000 | 12,000,000,000 | 384,000,000,000 | 187,500,000 | 23,437,500 | 356 |
Notice that we always use keysize + 8. From experience, it is safe to use this factor
even with nonfragmented indexes.
Number of temp spaces
In the previous table, we can see that we could again reach the 16,777,215 pages
limit if we were using only a single temp dbspace. Fortunately, you can have multiple
temporary dbspaces, so it is easy to divide the number of pages needed among different
dbspaces.
The choice between 2K and 16K pages matters little here, but we suggest using 16K if large tables with large average row sizes are used.
4.3.2 Determine temp space available
It is important to verify you have enough pages available to sort the index in temp space.
You can use onstat -d to look for temporary spaces and sum their pages.
Or you can use the following SQL commands to get a more detailed view.
SELECT sum(syschunks.chksize) pages,
       format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) total,
       format_units(sum(syschunks.nfree * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) free,
       format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals))
                  - sum(syschunks.nfree * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) used,
       round(100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 2) pct_used
  FROM sysdbspaces, syschunks
 WHERE sysdbspaces.dbsnum = syschunks.dbsnum
   AND sysdbspaces.is_temp = 1
   AND sysdbspaces.is_sbspace = 0
pages total free used pct_used
8192000 15.6 GB 13.5 GB 2.10 GB 13,42
To see all temporary dbspaces in detail:
SELECT sysdbspaces.name[1,15] name,
       sum(syschunks.chksize) pages,
       format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) total,
       format_units(sum(syschunks.nfree * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) free,
       format_units(sum(syschunks.chksize * (SELECT sh_pagesize FROM sysshmvals))
                  - sum(syschunks.nfree * (SELECT sh_pagesize FROM sysshmvals)))::CHAR(10) used,
       round(100 - ((sum(syschunks.nfree)) / (sum(syschunks.chksize)) * 100), 1)::DECIMAL(5,1) pct_used
  FROM sysdbspaces, syschunks
 WHERE sysdbspaces.dbsnum = syschunks.dbsnum
   AND sysdbspaces.is_sbspace = 0
   AND sysdbspaces.is_temp = 1
 GROUP BY 1
 ORDER BY 1
name pages total free used pct_used
t_tmp1 2048000 3.91 GB 3.38 GB 537 MB 13,4
t_tmp2 2048000 3.91 GB 3.38 GB 537 MB 13,4
t_tmp3 2048000 3.91 GB 3.38 GB 537 MB 13,4
t_tmp4 2048000 3.91 GB 3.38 GB 537 MB 13,4
If you do not have enough space for a parallel index build, the system will run slowly during the index creation.
You will see the following warning in the Informix message log:
$ onstat -m
20:51:33 WARNING: Not enough temp space for parallel index build.
Space required = 18750290 pages; space available = 8163188 pages.
Partial index build started.
If you do not have enough temporary space for the sort, you will get error:
179: ISAM error: no free disk space for sort
4.3.3 Adding more temporary space
As we can see from the previous data, we do not have enough space to create any of the
test indexes. So we need to resize our temp spaces, or allocate more units and add
them to the DBSPACETEMP configuration variable.
In our case, we will keep 4 temp spaces but of 50 GB each, to have the 200 GB of temp space that is needed for TPCH at scale 1000. As we need to give the size parameter for the onspaces command in KB, we convert 50 GB to KB: $$ \frac{50 * (1024*1024*1024)}{1024} = \frac{53687091200}{1024} = 52428800 $$
$ onspaces -c -d t_tmp1 -t -p /INFORMIXDEV/t_tmp1 -o 0 -s 52428800
$ onspaces -c -d t_tmp2 -t -p /INFORMIXDEV/t_tmp2 -o 0 -s 52428800
$ onspaces -c -d t_tmp3 -t -p /INFORMIXDEV/t_tmp3 -o 0 -s 52428800
$ onspaces -c -d t_tmp4 -t -p /INFORMIXDEV/t_tmp4 -o 0 -s 52428800
4.4 Using temp space in memory
Queries performing sort operations can consume a lot of temp space in the Informix
temporary dbspaces. But if the environment variable PSORT_DBTEMP is set, the engine
will send all the temporary sort data to the directory it points to. The database
server uses the directory that PSORT_DBTEMP specifies even if the environment
variable PSORT_NPROCS is not set.
You can check if a session has PSORT_DBTEMP
(or any other env variable) by using:
$ onstat -g env 40
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:30:26 -- 7766416 Kbytes
Environment for session 40:
Variable Value [values-list]
CLIENT_LOCALE en_us.utf8
CLNT_PAM_CAPABLE 1
DBCENTURY C
DBDATE DMY4-
DBDELIMITER |
DBMONEY ,
DBPATH .
...
PSORT_DBTEMP /mnt/ramdisk
...
If we look at /mnt/ramdisk during the index creation we will see the sort temp files:
ls -l /mnt/ram*
total 5022332
-rw------- 1 root informix 857145344 ago 13 21:32 srt0000107_001
-rw------- 1 root informix 779223040 ago 13 21:32 srt0000108_002
-rw------- 1 root informix 1792212992 ago 13 21:36 srt0000109_003
-rw------- 1 root informix 1714290688 ago 13 21:36 srt0000110_004
5 Table sizing
You can get a simple estimation of table size by multiplying the number of expected
rows by its row size. The row size can be determined by using:
SELECT rowsize FROM systables WHERE tabname = "lineitem";
146
To determine the data pages needed you can use: $$ \texttt{data_pages} = \frac{rows}{trunc(pagesize/(rowsize + 4))} $$ so we have a divisor of 2048/(146+4) = 13.6 for 2K pages and 16384/(146+4) = 109.2 for 16K pages.
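A sketch of that estimate, truncating the divisor to one decimal place exactly as the text does (13.6 and 109.2):

```python
import math

def data_pages(rows, rowsize, pagesize):
    """data_pages ~= rows / (pagesize / (rowsize + 4)), divisor truncated to one decimal."""
    divisor = math.floor(pagesize / (rowsize + 4) * 10) / 10
    return int(rows / divisor)

# lineitem: rowsize 146, TPCH scale 100 (600M rows)
print(data_pages(600_000_000, 146, 2048))    # 44,117,647 2K pages
print(data_pages(600_000_000, 146, 16384))   # 5,494,505 16K pages
```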
Using TPCH sample we get the following metrics:
TPCH Scale | Rows | Row size * rows | Bytes | GB | 2K pages (div=13.6) | 16K pages (div=109.2) | Requires Fragments |
---|---|---|---|---|---|---|---|
100 | 600M | 146 * 600,000,000 | 87,600,000,000 | 81 | 44,117,647 | 5,494,505 | NO |
200 | 1.2B | 146 * 1,200,000,000 | 175,200,000,000 | 163 | 88,235,294 | 10,989,010 | NO |
1000 | 6B | 146 * 6,000,000,000 | 876,000,000,000 | 815 | 441,176,470 | 54,945,054 | YES |
So we can deduce:
- We can place the 600M or even the 1.2B rows in a single dbspace, but only using a 16K page size. Using 2K pages we would exceed the maximum number of pages per tblspace or fragment (16,777,215).
- We cannot place the 6B rows in a single dbspace, for two reasons:
  - 6B rows is greater than the 4,278,189,825 rows limit per fragment.
  - Even at 16K page size we exceed the maximum number of pages per tblspace or fragment (16,777,215).
5.1 Sizing dbspaces
The following table shows the dbspace size needed (in KB) to hold the table according to its size,
using 16K page size values.
TPCH Scale | 16K pages needed | Round up | Extra | Calculus | KB | GB |
---|---|---|---|---|---|---|
100 | 5,494,505 | 5,500,000 | 6,000,000 | $$ \frac{6,000,000 * 16384}{1024} $$ | 96000000 Kb | 91 GB |
200 | 10,989,010 | 11,000,000 | 12,000,000 | $$ \frac{12,000,000 * 16384}{1024} $$ | 192000000 Kb | 183 GB |
1000 | 54,945,054 | 55,000,000 | 60,000,000 | $$ \frac{60,000,000 * 16384}{1024} $$ | 960000000 Kb | 915 GB |
5.1.1 Creating a dbspace
For example, to create a dbspace with page size 16K and a size of 192000000 KB (the onspaces -s option takes KB):
$ onspaces -c -d d_data -k 16 -p /INFORMIXDEV/d_data_01 -o 0 -s 192000000
Verifying physical disk space, please wait ...
Space successfully added.
** WARNING ** A level 0 archive of Root DBSpace will need to be done.
After it is added (we should make a backup before using it) we can check its state.
$ onstat -d
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 1 days 03:14:56 -- 8044384 Kbytes
Dbspaces
address number flags fchunk nchunks pgsize flags owner name
....
58905b68 11 0x20001 11 1 16384 N BA informix d_data
Chunks
address chunk/dbs offset size free bpages flags pathname
....
5890f028 11 11 0 12000000 11999947 PO-B-D /INFORMIXDEV/d_data_01
Notice that the chunk size and free columns are expressed in 16K pages and we have 12M pages
as expected, which is:
$$
12000000 * 16384 = 196608000000 \texttt{ bytes} = 192000000 \texttt{ KB} = 183 \texttt{ GB}
$$
5.2 Define table extent size
We will create the table in the appropriate dbspace, using a predefined
initial extent size that tries to fit all data in a single extent.
As the extent size is given in KB, we can use the same conversion
that we used for the dbspace allocation.
TPCH scale | Pages | Round up | Extents |
---|---|---|---|
100 | 5,494,505 | 5,500,000 | $$ \texttt{first_extent} = \frac{5,500,000 * 16384}{1024} = 88000000 \texttt{ KB} \texttt{ (84 GB)} $$ |
200 | 10,989,010 | 11,000,000 | $$ \texttt{first_extent} = \frac{11,000,000 * 16384}{1024} = 176000000 \texttt{ KB} \texttt{ (167 GB)} $$ |
1000 | ... | ... | ... |
EXTENT SIZE options (sizes are in KB):
    [ EXTENT SIZE first_kilobytes ] [ NEXT SIZE next_kilobytes ]
create raw table lineitem (
    l_orderkey integer not null,
    ...
) in d_data extent size 160000000 next size 8000000 lock mode row;
5.2.1 Check allocation
Let's check allocation with a test table. Execute the following code (you don't need all the columns,
as we are only checking extent allocation). This procedure is useful to verify that everything was
allocated as we expected.
create raw table lineitem (
    l_orderkey integer not null
) in d_data extent size 160000000 next size 8000000 lock mode row;
Now run oncheck -pt tpch:lineitem
and let's analyze the output.
$ oncheck -pt tpch:lineitem
TBLspace Report for tpch:informix.lineitem
Physical Address 11:2192
Creation date 08/11/2017 23:46:23
TBLspace Flags 8902 Row Locking
TBLspace contains VARCHARS
TBLspace use 4 bit bit-maps
Maximum row size 146
Number of special columns 1
Number of keys 0
Number of extents 1
Current serial value 1
Current SERIAL8 value 1
Current BIGSERIAL value 1
Current REFID value 1
Pagesize (k) 16
First extent size 10000000
Next extent size 500000
Number of pages allocated 10000000
Number of pages used 9763204
Number of data pages 9760782
Number of rows 1200018434
Partition partnum 11534399
Partition lockid 11534399
Extents
Logical Page Physical Page Size Physical Pages
0 11:4472 10000000 80000000
You can see:
- Pagesize (k) is 16K.
- First extent size is 10,000,000 pages (the 160,000,000 KB requested in the create statement).
- Next extent size is 500,000 pages (8,000,000 KB), but it is not allocated yet.
- Number of pages allocated is 10,000,000 (the first extent).
- There are 1,200,018,434 rows.
- There is only one extent allocated, with a size of 10,000,000 (16K) pages and 80,000,000 physical pages, that is:
$$
\frac{10000000 * 16384}{2048} = 80,000,000
$$
as physical pages are 2K in our system, so physical pages are 8 times our 16K page size.
5.3 Loading table data
Now you can simply load the data from the external lineitem.tbl file. Set the date and decimal formats to match the flat file:
DBDATE=Y4MD- DBMONEY=.
create external table lineitem_ext sameas lineitem
    using (datafiles("disk:/data/lineitem.tbl"));
insert into lineitem select * from lineitem_ext;
alter table lineitem type (standard);
5.3.1 Verify occupation
After loading, you can check tblspace
occupation by using oncheck -pt
:
$ oncheck -pt tpch:lineitem
...
6 Index sizing
To create a large index we should:
- Estimate its size and determine the appropriate dbspace or dbspaces (if fragmented).
- Estimate the temp space needed for sorting and ensure you have temporary dbspaces available for that space.
- Ensure the DBSPACETEMP onconfig variable points to those temporary dbspaces.
- Set PDQ PRIORITY to a high value before running the index creation.
6.1 Raw estimation of size
First, determine how big the index is using its "raw" size. This size is only an approximation,
because the index will not completely fill all pages.
TPCH scale | Theoretical size | Raw size |
---|---|---|
100 | $$ 600,000,000 * (8+5) = 7800000000 \texttt{ bytes} $$ | 7.2 GB |
200 | $$ 1,200,000,000 * (8+5) = 15600000000 \texttt{ bytes} $$ | 15 GB |
1000 | $$ 6,000,000,000 * (8+5) = 78000000000 \texttt{ bytes} $$ | 72 GB |
2000 | $$ 12,000,000,000 * (8+5) = 156000000000 \texttt{ bytes} $$ | 144 GB |
6.2 Pages estimation
You can estimate the size of an index in pages by performing a
series of calculations based on the number of rows in the table.
- Determine the size, in bytes, of each index entry in the leaf page with the following formula, which incorporates the entry overhead: $$ leafentrysize = \sum colsizes + 5 $$ For data types other than VARCHAR, the length of an index item is calculated by adding the length of the key value plus 5 bytes for the rowid information associated with the key value.
- Estimate the number of entries per index leaf page with the following formula: $$ leafpagents = \frac{pagesize}{leafentrysize} * \frac{60}{100} $$ The value leafpagents is multiplied by 60% because index leaf pages are usually just over half full.
- Estimate the number of leaf pages needed with the following formula: $$ leaves = \frac{rows}{leafpagents} $$ and finally $$ leaves = \frac{rows}{\frac{pagesize}{leafentrysize} * \frac{60}{100}} $$
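The three steps can be sketched as follows, assuming keysize 8 for our two-integer key (results are close to, but not identical with, the figures in the table, which rounds intermediate values):

```python
def index_leaf_pages(rows, keysize, pagesize, fill=0.60):
    """leaves = rows / (pagesize / leafentrysize * fill), with leafentrysize = keysize + 5."""
    leafentrysize = keysize + 5
    leafpagents = pagesize / leafentrysize * fill
    return int(rows / leafpagents)

MAX_PAGES = 16_777_215  # pages per tblspace partition or fragment

# Primary key on (l_orderkey, l_linenumber): keysize 8 -> leafentrysize 13
for rows in (600_000_000, 12_000_000_000):
    p2k = index_leaf_pages(rows, 8, 2048)
    p16k = index_leaf_pages(rows, 8, 16384)
    print(rows, p2k, p16k, "fits in one 16K fragment:", p16k <= MAX_PAGES)
```

Note that at scale 2000 even the 2K-page count exceeds MAX_PAGES, which is why only the 16K page size keeps the index in a single fragment.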
Now determine how many pages will be needed using 2K or 16K pages, and verify whether they fit in a single fragment, that is, 16,777,215 pages. We will use the previous formula from this section.
TPCH scale | 2K pages | 16K pages | Index size | 16K Round up | Dbspace size |
---|---|---|---|---|---|
100 | $$ \frac{600000000}{\frac{2048}{ (8 + 5)} * \frac{60}{100}} = 6,281,407 $$ | $$ \frac{600000000}{\frac{16384}{ (8 + 5)} * \frac{60}{100}} = 793,461 $$ | 12 GB | 1,000,000 | 15 GB |
200 | $$ \frac{1200000000}{\frac{2048}{ (8 + 5)} * \frac{60}{100}} = 12,695,725 $$ | $$ \frac{1200000000}{\frac{16384}{ (8 + 5)} * \frac{60}{100}} = 1,586,923 $$ | 24 GB | 2,000,000 | 30 GB |
1000 | $$ \frac{6000000000}{\frac{2048}{ (8 + 5)} * \frac{60}{100}} = 63,478,628 $$ | $$ \frac{6000000000}{\frac{16384}{ (8 + 5)} * \frac{60}{100}} = 7,934,570 $$ | 122 GB | 10,000,000 | 152 GB |
2000 | $$ \frac{12000000000}{\frac{2048}{ (8 + 5)} * \frac{60}{100}} = 126,984,126 $$ | $$ \frac{12000000000}{\frac{16384}{ (8 + 5)} * \frac{60}{100}} = 15,869,140 $$ | 242 GB | 16,000,000 | 244 GB |
6.3 Sizing dbspaces
We can use the information obtained above to determine the size of the index dbspace.
The following table shows the size in KB to be used in the onspaces command and
the resulting space needed on the device.
TPCH scale | 16K Round up | dbspace K size | disk size |
---|---|---|---|
100 | 1,000,000 | $$ \frac{1,000,000*16384}{1024} = 16000000 $$ | 15.2 GB |
200 | 2,000,000 | $$ \frac{2,000,000*16384}{1024} = 32000000 $$ | 30 GB |
1000 | 10,000,000 | $$ \frac{10,000,000*16384}{1024} = 160000000 $$ | 152 GB |
2000 | 16,000,000 | $$ \frac{16,000,000*16384}{1024} = 256000000 $$ | 244 GB |
The onspaces command to create a dbspace for an index of 16000000 KB (TPCH scale 100) is:
$ onspaces -c -d i_data -k 16 -p /INFORMIXDEV/i_data_01 -o 0 -s 16000000
7 Tuning onconfig
Sample suggestions for onconfig variables.
Variable | 16GB | 32GB |
---|---|---|
SHMTOTAL | 16777216 | 33554432 |
SHMVIRTSIZE | 8388608 | 16777216 |
DS_TOTAL_MEMORY | 2560000 | 5120000 |
DS_NONPDQ_QUERY_MEM | 640000 | 128000 |
DS_MAX_QUERIES | 64 | 64 |
DS_MAX_SCANS | 1048576 | 1048576 |
MAX_PDQPRIORITY | 100 | 100 |
BUFFERPOOL | 4 GB | 8 GB |
DBSPACETEMP | 4 dbspaces or more on different disks if possible ||
8 Table of results
The following table contains a general overview of the examples included in this document for the different
TPCH scales.
TPCH | Rows (millions) | Data size GB | Index size GB | Temp space GB | Table pages (estimated / real) | Index pages (estimated / real) | Table load time | Index build time PDQ=0 | Index build time PDQ=100 |
---|---|---|---|---|---|---|---|---|---|
100 | 600 | 74 | 12 | 18 | 5,346,679 / 4,881,838 | 793,461 / 694,622 | 01:39:15 | 00:43:40 | 00:07:25 |
200 | 1200 | 160 | 24 | 36 | 10,693,359 / 9,763,204 | 1,586,923 / 1,571,916 | 03:05:38 | 01:42:09 | 00:15:09 |
1000 | 6000 | ... | ... | 179 | ... | ... | ... | ... | ... |
2000 | 12000 | ... | ... | 356 | ... | ... | ... | ... | ... |
9 Disk considerations
On very large database systems, good disk performance and architecture are critical.
If fragmentation is used for a large table or index, it is desirable to place each
fragment on a different disk.
As general rules, apply if possible:
- Use a specific disk for the rootdbs.
- Use a specific disk for the physical log.
- Use a specific disk for the logical log.
- Use a specific disk for each temporary space.
- Use a specific disk for each large table and a different one for its index or indexes.
- If fragmentation is used, try to use a disk for each fragment of data and a different disk for each fragment of index.
9.1 Disk performance
Under Linux, the dd command can be used for simple sequential I/O performance measurements.
For more detailed I/O performance benchmarking, a tool like the Flexible I/O Tester (Fio) can be used.
9.1.1 Measuring Write Performance
For measuring write performance, the data to be written should be read from /dev/zero and ideally written to an empty RAID array, hard disk or partition (such as using of=/dev/sda for the first hard disk, or of=/dev/sda2 for the second partition on the first hard disk). If this is not possible, a normal file in the file system can be written (such as using of=/disk1/test). For safety reasons we are using test files in the following examples. The write performance achieved will be a little lower, because metadata will also be written to the file system.
Throughput (Streaming I/O)
$ dd if=/dev/zero of=/disk1/test bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 4.49933 s, 239 MB/s
Latency
In this test, 512 bytes were written one thousand times:
$ dd if=/dev/zero of=/disk1/test bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.138732 s, 3.7 MB/s
10 Memory considerations
When using large memory (for example using a memory filesysem for PSORT_TEMP
)
you must ensure to have or assign (in virtualized env) enought physical memory
to the operating system.
In the following assert failure, we can see how database
server receives a signal 9 from OS
with a missconfigured Linux using a large filesystem in memory.
As the memory filesystem grows and OS can't allocate sufficient memory,
it kills the process demanding memory.
In this case, the Informix
database
using PSORT_TEMP
for a large sort operation.
22:18:14 Assert Failed: Unexpected virtual processor termination: pid = 2117, signal 9 received.
22:18:14 IBM Informix Dynamic Server Version 12.10.FC9
22:18:14 Who: Session(2, informix@, 0, (nil)) Thread(8, sm_poll, 0, 1) File: mt.c Line: 14715
22:18:14 stack trace for pid 2116 written to /home/informix/tmp/af.3f0b405
22:18:14 See Also: /home/informix/tmp/af.3f0b405
22:18:15 The Master Daemon Died
22:18:16 PANIC: Attempting to bring system down