A very large database, or VLDB, is a database containing an extremely high number of tuples (database rows), or occupying an extremely large physical filesystem storage space. The most common definition of a VLDB is a database that occupies more than 1 terabyte or that contains several billion rows, although naturally this definition evolves over time.

In some cases, storing data on a single instance may generate a VLDB, raising issues around the handling of its backup, replication and disaster recovery.

One solution may be to split the database into multiple instances (a virtual appliance), with each instance handling the same database schema and data partitioned by key.

The VLDB datablade provides tools to implement:

  • BLOB offloading, with the ability to store new Blob types that put content in local clustered filesystems.
  • RBS (remote Blob storage), with the ability to store new Blob types that put content in Amazon S3.
  • Hash coordinator functions to implement horizontal data partitions using consistent hash algorithms.

1 Before You Begin

Before you begin, you need to:

  • Acquire the VLDB JAR and its server components.
  • Configure the Informix instance for Java support.

1.1 Getting deister VLDB Datablade

Contact Deister to get the VLDB library. It consists of two jar files:

  1. deister-informix-vldb.jar, the VLDB code that will be deployed on each database that requires VLDB support.
  2. deister-informix-vldb-libs.jar, the VLDB library components that must run inside Informix Krakatoa, once per server.

You can copy both files to $INFORMIXDIR/extend/vldb to keep them together and have a single reference.
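
For example (a sketch, assuming the downloaded jars are in the current directory):

mkdir -p $INFORMIXDIR/extend/vldb
cp deister-informix-vldb.jar deister-informix-vldb-libs.jar $INFORMIXDIR/extend/vldb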

1.2 Configure Informix for Java

Before starting, you will need to configure an Informix instance to handle Java UDRs. Edit the onconfig file and set up the Krakatoa (Java) parameters.

Parameter Value Comments
VPCLASS jvp,num=2 Defines the number of Java virtual processors. For example, '2' will allow up to 2 simultaneous threads.
JVPARGS -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx2048m;-XX:-UseParNewGC Increases maximum memory from 128 MB to 2 GB and uses aggressive garbage collection.
JVPLOGFILE $INFORMIXDIR/tmp/jvp.log Points to a log file for debugging.
JVPDEBUG 0 When set to 1, writes tracing messages to the JVPLOG file.
JVPPROPFILE $INFORMIXDIR/extend/krakatoa/.jvpprops Points to the JVP properties file. This file allows you to set JVM debug properties.
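
Putting it together, the Java-related section of the onconfig file might look like the following excerpt (values taken from the table above; adjust the memory settings to your environment):

VPCLASS      jvp,num=2
JVPARGS      -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx2048m;-XX:-UseParNewGC
JVPLOGFILE   $INFORMIXDIR/tmp/jvp.log
JVPDEBUG     0
JVPPROPFILE  $INFORMIXDIR/extend/krakatoa/.jvpprops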

1.2.1 Adding Java Libraries to a JVP

VLDB Datablade requires some external libraries to perform the Amazon S3 connection. You must add these libraries to the JVPCLASSPATH. Assuming you have placed deister-informix-vldb-libs.jar under the extend/vldb directory, edit the onconfig file and review the JVPCLASSPATH parameter. JVPCLASSPATH should look like:

JVPCLASSPATH  $INFORMIXDIR/extend/krakatoa/krakatoa.jar:$INFORMIXDIR/extend/krakatoa/jdbc.jar:$INFORMIXDIR/extend/vldb/deister-informix-vldb-libs.jar

1.2.2 Set the Correct JRE Path

Review the jre symbolic link in the $INFORMIXDIR/extend/krakatoa directory.

The correct link target is:

jre -> ../../jvm/jre

Execute the following commands to create the correct symbolic link:

cd $INFORMIXDIR/extend/krakatoa/
rm jre
ln -s ../../jvm/jre .

1.2.3 The .jvpprops file

A JVP properties file contains property settings that control various runtime behaviors of the Java™ virtual processor. The JVPPROPFILE configuration parameter specifies the path to the properties file. When you initialize the database server, the JVP initializes the environment based on the settings in the JVP property file. The .jvpprops.template file in the $INFORMIXDIR/extend/krakatoa directory documents the properties that you can set.

#JVP.trace.settings:JVP=2
#JVP.trace.verbose:1
#JVP.trace.timestampformat:HH:MM
#JVP.splitLog:10000
#JVP.monitor.port: 10000

1.2.4 The informix.policy file

You must set up the $INFORMIXDIR/extend/krakatoa/informix.policy file to allow VLDB to access some specific Java methods.

grant {

    // simple non restrictive
    permission java.security.AllPermission;

    // Krakatoa defaults
    permission java.util.PropertyPermission "user.language", "write";
    permission java.security.SecurityPermission "getPolicy";
    permission java.net.NetPermission "specifyStreamHandler";
    permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
    
    // VLDB
    permission java.lang.RuntimePermission "accessUserInformation";
    permission java.lang.RuntimePermission "getStackTrace";
    permission java.lang.RuntimePermission "modifyThreadGroup";
    // VLDB AWS
    permission java.lang.RuntimePermission "getenv.AWS_PROFILE";
    permission java.lang.RuntimePermission "getenv.AWS_REGION";
    permission java.lang.RuntimePermission "getenv.AWS_CONFIG_FILE";
    permission java.lang.RuntimePermission "getenv.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI";
    permission java.lang.RuntimePermission "getenv.AWS_CREDENTIAL_PROFILES_FILE";
    permission java.lang.RuntimePermission "getenv.AWS_EXECUTION_ENV";
    // VLDB Apache tika
    permission java.lang.RuntimePermission "getenv.TIKA_CONFIG";
    // VLDB VM
    permission java.lang.management.ManagementPermission "monitor";
    permission javax.management.MBeanServerPermission "findMBeanServer";
};

1.2.5 Restart Informix

Now stop and restart the Informix instance to make the changes effective.
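
For example, on a standalone instance (a sketch; use your site's normal administration procedure if different):

onmode -ky
oninit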

1.2.6 Monitor Session Using JVP

You can check that the JVPs are configured by looking for jvp virtual processors:

$ onstat -g sch
IBM Informix Dynamic Server Version 12.10.FC12AEE -- On-Line -- Up 00:02:53 -- 26866136 Kbytes

VP Scheduler Statistics:
 vp    pid       class       semops    busy waits  spins/wait  bsy lspins
 1     25635     cpu         45        49          9781        0         
 2     25636     adm         0         0           0           0         
 3     25637     lio         2         0           0           0         
 4     25638     pio         2         0           0           0         
 5     25639     aio         474       0           0           0         
 6     25640     msc         14        0           0           0         
 7     25641     fifo        2         0           0           0         
 8     25642     cpu         13772     13816       9986        0         
 9     25643     cpu         1522      1575        9829        0         
 10    25644     cpu         568       607         9631        0         
 11    25645     jvp         1         1           10001       0         
 12    25646     soc         2         2           10000       0         
 13    25647     soc         2         2           10000       0         
 14    25648     soc         2         2           10000       0         
 15    25649     soc         2         2           10000       0         
 16    25650     aio         81        0           0           0         
 17    25651     aio         65        0           0           0         
 18    25652     aio         23        0           0           0         
 19    25653     aio         36        0           0           0         
 20    25654     aio         34        0           0           0

During a large operation using VLDB routines, you can use the following commands to monitor a session.

  • Determine the session ID of the SQL command by using onstat -g sql
    $ onstat -g sql
    IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:05:36 -- 6521380 Kbytes
    
    
    Sess       SQL            Current            Iso Lock       SQL  ISAM F.E. 
    Id         Stmt type      Database           Lvl Mode       ERR  ERR  Vers  Explain    
    33                        sysadmin           DR  Wait 5     0    0    -     Off        
    32                        sysadmin           DR  Wait 5     0    0    -     Off        
    29                        sysadmin           DR  Wait 5     0    0    -     Off        
    28                        sysadmin           CR  Not Wait   0    0    -     Off        
    5          -              test_vsc           NL  Not Wait   0    0    9.24  Off
  • Check memory on the STM pool for the session (STM.5.* in our example)
    $ onstat -g mem | grep STM
    STM.5.101    V     5a164040         74887168         408              789788     1
  • Check fragment allocations for the pool shown above
    $ onstat -g afr STM.5.101
    IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:13:02 -- 6521380 Kbytes
    
    Allocations for pool name STM.5.101:
    addr             size       memid        fileid location
    5a181000         3328       overhead     301    mtshpool.c:606
    5a181d00         64         SAPI         2814   sapimem.c:1202
    5a181d40         64         SAPI         2814   sapimem.c:1202
    5a181d80         64         SAPI         2814   sapimem.c:1202
    5a181dc0         64         SAPI         2814   sapimem.c:1202
    5a181e00         64         SAPI         2814   sapimem.c:1202
    5a181e40         64         SAPI         2814   sapimem.c:1202
    5a181e80         64         SAPI         2814   sapimem.c:1202
    5a181ec0         64         SAPI         2814   sapimem.c:1202
    5a181f00         64         SAPI         2814   sapimem.c:1202
    5a181f40         192        SAPI         2814   sapimem.c:1202
    5a177000         64         SAPI         2814   sapimem.c:1378
    5a177040         64         SAPI         2814   sapimem.c:1378
    5a177080         64         SAPI         2814   sapimem.c:1378
    5a1770c0         64         SAPI         2814   sapimem.c:1378
    5a177100         64         SAPI         2814   sapimem.c:1378
    5a177140         64         SAPI         2814   sapimem.c:1378
    5a177180         64         SAPI         2814   sapimem.c:1378
    5a1771c0         64         SAPI         2814   sapimem.c:1378
    5a177200         64         SAPI         2814   sapimem.c:1378
    5a177240         64         SAPI         2814   sapimem.c:1378
    5a177280         64         SAPI         2814   sapimem.c:1378
    5a1772c0         176        SAPI         2814   sapimem.c:1202
    5a177370         72         SAPI         2814   sapimem.c:1202
    5a1773b8         64         SAPI         2814   sapimem.c:1378
    5a1773f8         416        SAPI         2814   sapimem.c:1202

2 Deploying VLDB

Once the database is configured and the JAR files are installed in the appropriate directories, you are ready to deploy the VLDB Datablade to your database.

On each database where you want to use Deister VLDB Datablade, deploy it by executing:

execute procedure install_jar("file:$INFORMIXDIR/extend/vldb/deister-informix-vldb.jar", "deister_vldb", 1);

Possible exception

If you get an exception like the following while executing this procedure:

Can't find $INFORMIXDIR/extend/krakatoa/jre/bin/j9vm/libjvm.so

check the correct path of the jre symbolic link (explained in a previous section).

You can check the log file defined by the JVPLOGFILE parameter.

Installing jar s3.informix.deister_vldb...
Deploying file: deploy_FSBlob.txt
Deploying file: deploy_S3Blob.txt
Deploying file: deploy_FSUtils.txt
Deploying file: deploy_Digest.txt
Deploying file: deploy_VMUtils.txt
Deploying file: deploy_S3Utils.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...

2.1 Updating VLDB

To update the VLDB software, you need to upload a new JAR file to the database and re-register all functions. Download deister-informix-vldb.jar from the Deister Nexus server using curl and save the file to $INFORMIXDIR/extend/vldb, overwriting the existing JAR file.

execute procedure unregisterJUDTfuncs("fsblob");
execute procedure unsetUDTextName("fsblob");

execute procedure unregisterJUDTfuncs("s3blob");
execute procedure unsetUDTextName("s3blob");

Older versions of the VLDB blade did not drop objects used by the module, which meant the module could not be unregistered. In this case, remove them manually:

-- Required for the unregister
drop function fsblob(blob);
drop function fsblob(lvarchar);
drop function fsblob(clob);
drop function fsblob2clob;
drop function s3blob2clob;
-- Required for the new registration
drop cast (fsblob as clob);
drop cast (s3blob as clob);

Remove the currently installed jar:

execute procedure remove_jar_undeploy("deister_vldb");

Upgrade the JAR files

THIS ACTION IS REQUIRED TO CONTINUE.

Once the current module is deregistered, you must update the JAR files in the corresponding path (extend/vldb):

cd $INFORMIXDIR/extend/vldb
mkdir OLD
mv *.jar OLD

cp $PATH/informix-vldb.jar deister-informix-vldb.jar
cp $PATH/informix-vldb-libs.jar deister-informix-vldb-libs.jar


Install the new jar, indicating the correct path (it can be the same as before if the new jars are located in the same path as the previous version):

DO NOT REGISTER THE NEW MODULE UNTIL YOU UPGRADE THE JAR FILES.

execute procedure install_jar("file:$INFORMIXDIR/extend/vldb/deister-informix-vldb.jar", "deister_vldb", 1);

Finally register the module on the database:

execute procedure registerJUDTfuncs("FSBlob");
execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");

2.1.1 Known problems

  • Executing the following select on an fsblob field

    SELECT * FROM fstable;

    raises an exception:

    9634: No cast from fsblob to lvarchar

    This is caused by the module registration not having been done. Register it by executing:

    execute procedure registerJUDTfuncs("FSBlob");
    execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");

3 Virtual Machine

VLDB includes a simple set of virtual machine commands to allow monitoring of VM status.

3.1 vmGC

Forces a full GC on the VM.

execute procedure vmgc();

3.2 vmThreads

Returns thread execution state and shows thread dumps in JVPLOGFILE.

execute function vmThreads();
1 RUNNABLE         main
2 WAITING          Reference Handler
3 WAITING          Finalizer
4 RUNNABLE         Signal Dispatcher
8 RUNNABLE         ReaderThread

3.3 vmMemory

Returns a description of the VM memory pools.

execute function  vmmemory();
(expression)  Pool               Type       Used Commit    Max % Used
(expression)  Java heap          HEAP         18    128   1024   1.77
(expression)  class storage      NON_HEAP      7      7      0   0.00
(expression)  JIT code cache     NON_HEAP      8      8      0   0.00
(expression)  JIT data cache     NON_HEAP      0      2      0   0.00
(expression)  miscellaneous non- NON_HEAP      0      1      0   0.00

3.4 vmTotalMemory

Returns VM's total memory in bytes.

execute function vmtotalmemory();
134217728

3.5 vmFreeMemory

Returns VM's free memory in bytes.

execute function vmfreememory();
111141208

3.6 vmDebug

Enables or disables debug (verbose) output to JVPLOGFILE. To enable the debug log, pass a value > 0.

execute procedure vmdebug(1);

When you execute operations over an fsblob field, such as:

execute procedure vmdebug(1);
select b::blob
  from fstable
 where key < 100

you can check the logs in the defined log file.

16:22:17.958 [jvp-worker-1] INFO  jvp - sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2) - Verbose mode set to true
16:22:17.960 [jvp-worker-1] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - Finish runDBWork
16:22:17.960 [jvp-worker-1] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - return thread Thread[jvp-worker-1,9,main],state = DESTROY T#1 back to pool
16:22:17.960 [jvp-worker-1] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-1,9,main],state = IN-POOL T#1 IN POOL WAITS FOR WORK
16:22:17.961 [main] INFO  i.j.dbapplet.impl.UDRDBAContextImpl - Acquiring a jvp-worker thread: open context for deister_vldb:deister.informix.vldb.store.fs.FSBlob of thread group java.lang.ThreadGroup[name=DBAPPLET,maxpri=10]
16:22:17.961 [main] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-2,9,main],state = IN-POOL T#2 IN POOL GETS WORK
16:22:17.961 [main] INFO  jvp - PerJavaUDRContext::constructor
16:22:17.961 [jvp-worker-2] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - start runDBWork
16:22:17.963 [jvp-worker-2] INFO  i.j.dbapplet.impl.PerJavaUDRContext - Initialising JAVA UDR Context
16:22:24.827 [jvp-worker-2] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - Finish runDBWork
16:22:24.827 [jvp-worker-2] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - return thread Thread[jvp-worker-2,9,main],state = DESTROY T#2 back to pool
16:22:24.827 [jvp-worker-2] INFO  i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-2,9,main],state = IN-POOL T#2 IN POOL WAITS FOR WORK

3.7 vmDump

Generates a full thread dump to JVPLOGFILE.

execute procedure vmdump();

4 Blob Offloading

BLOBs are binary objects that are created once, read many times, never modified, and sometimes deleted. BLOB types include photos, videos, documents, XML data, JSON data and many other structured or unstructured data types.

Storage for BLOBs must be reliable and cost-effective, even for large amounts of unstructured data. One point of contention tends to arise with the topic of BLOB data: is it better to integrate BLOBs within the database or store them in the file system? Each method has its advantages and disadvantages. If you choose the database route, storing BLOB data can dramatically increase the size of your databases.

The common alternative to this technique is storing files outside the database, keeping in the database a field containing a file path or URL to the object. This separate storage method has some advantages over integrating BLOB data within the database. It can be faster, because reading data from the file system involves a bit less overhead than reading data from a database. And without the BLOBs, your databases are smaller. However, a unique naming or storage scheme for the OS files is usually needed to clearly identify the potentially thousands or even billions of files.

The Deister VLDB Datablade was designed to solve this problem through its FSBlob and S3Blob opaque types.


4.1 FSBlob

The FSBlob opaque type lets you store a BLOB in a filesystem linked to the database instance. Any regular or replicated filesystem configured with the appropriate size and number of inodes can be used.

A hierarchical structure of directories is automatically created as required. The structure of this directory tree has three levels.

  • Root directories named with five digits, numbering up to 32000 entries.
  • Each root directory can have up to 10000 subdirectories named from 00001 to 10000.
  • Each subdirectory will store up to 10000 BLOB files. The naming convention for these files is a sequential number from 00001 to 10000 followed by ".b" representing a BLOB file.

This structure can support up to 32 billion files, but normally it should be limited to a maximum number of root entries. The maximum number of files is based on the root directory counter: $$ maxfiles = RootDirs ^ 3 $$

4.1.1 Install FSBlob Support

Define FSBlob Opaque Type

Create the SQL routine and cast definitions for the I/O functions of FSBlob by executing registerJUDTfuncs and define the mapping between a user-defined data type and a Java™ class by executing setUDTExtName.

execute procedure registerJUDTfuncs("FSBlob");
execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");

These commands register FSBlob functions in your database and declare a new type of column called fsblob. To check everything worked properly, this new type should be defined in your database as:

select * from sqlj.udtextnames;
sqlname       fsblob
externalname  cpe.informix.deister_vldb:deister.informix.vldb.store.fs.FSBlob

Declare FSBlob Storage

During datablade deployment, the table vldb_fsstore will be created. This table defines the filesystems where "fsblobs" will be stored.

Field Type Description
seqno serial Automatic sequence number to identify a filessytem. It's associated with each FSBlob to allow you to point to data on the filesysem.
bucket varchar(255) The "bucket" or filesystem path.
closed smallint Indicates this filesysem is closed so it will not be selected for data placement.
nfiles int8 (long) Indicates the number of files stored. This is just statistical information.
created datetime Timestamp of filesystem declaration.

To store an FSBlob, the datablade will look for the first non-closed entry in vldb_fsstore ordered by seqno, and will store files under this directory.

Provisioning Filesystems (Buckets)

Assuming you have a /data1 filesystem available for BLOB storage, simply do the following:

insert into vldb_fsstore (bucket) values ("/data1");
select * from vldb_fsstore;
seqno    1
bucket   /data1
closed   0
nfiles   0
created  2017-06-20 17:01:56

Permissions on Blob storage directories should be 660, owned by user informix and group informix.
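
You can provision more filesystems and close old ones at any time. For example (a sketch, assuming a second /data2 filesystem is available):

insert into vldb_fsstore (bucket) values ("/data2");
update vldb_fsstore set closed = 1 where seqno = 1;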

List Provisioning

The fsls command shows the available (not closed) provisioned filesystems in order of precedence.

execute function fsls();
/data1

4.1.2 FSBlob Type

Now you can create a table using the FSBlob column type. In this example, we will create a simple table to store a key and an FSBlob.

CREATE TABLE fstable(key integer not null, b fsblob);

The FSBlob type structure requires 30 bytes and has the following structure:

struct FSBlob {
   int4 bucketID,
   short path1,
   short path2,
   short leaf,
   byte[16] md5,
   int4 size
}

When data is deleted from a table (or the table is dropped), the database-linked files in the filesystem are also removed. Likewise, when an FSBlob field is set to null, its data is deleted from disk.

4.1.3 Insert an FSBlob

To insert an FSBlob into a table, you can use Blob constructors (from string or from Blob) or directly send data to FSBlob.

  • Inserting a Blob from an lvarchar text requires an FSBlob constructor (a string can't automatically be converted to FSBlob because the built-in deserializer uses the string as the Blob representation).
    -- using constructor
    insert into fstable values(1, fsblob("This is text from entry 1"));
  • To insert a Blob from a Blob type, you can choose whether to use the FSBlob constructor (you can also pass a Blob type directly).
    -- using constructor
    insert into fstable values(2, fsblob(FILETOBLOB("/filesrc/mydocument.pdf", "client")));
    -- direct load passing a blob
    insert into fstable values(2, FILETOBLOB("/filesrc/mydocument.pdf", "client"));

4.1.4 Select an FSBlob

When selecting an FSBlob, you will see its representation as: storageId:storage folder:file path:md5:size

select * FROM fstable;
key  1
b    1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25 

key  2
b    1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857

4.1.5 Delete FSBlobs

To delete a Blob:

DELETE FROM fstable WHERE key = 1;

This statement deletes rows from Informix tables, but also deletes every file in remote filesystems pointed to by the FSBlob columns contained in these rows.

$ ls -l /data1/0001/0001
total 192
-rw-rw-r-- 1 informix informix 191857 Aug 31 19:53 0002

4.1.6 Update an FSBlob

You can update an FSBlob like a regular Blob. When the Blob is set to null, the Blob content is removed from storage.
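
For example (a sketch using the fstable example; the source file path is hypothetical):

-- replace the content of an existing FSBlob
UPDATE fstable SET b = FILETOBLOB("/filesrc/newdocument.pdf", "client") WHERE key = 2;
-- setting the column to null removes the stored file from the filesystem
UPDATE fstable SET b = null WHERE key = 2;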

4.1.7 Casting to Blob

An FSBlob can be cast to a regular database Blob by using an explicit cast:

select key, b::blob, lotofile(b::blob, "/tmp/fsb", "client") from fstable;
key           1
(expression)  <SBlob Data>
(expression)  /tmp/fsb.00000000598ae7ef

key           2
(expression)  <SBlob Data>
(expression)  /tmp/fsb.00000000598ae7f0

4.1.8 FSBlob Metadata

You can obtain FSBlob metadata using functions provided by the datablade. The following code shows the bucket (filesystem), the filename (its path under the filesystem), the size and the stored MD5. You can also use MD5(FSBlob) to compute the MD5 hash from the data (it must match the stored value if the data has not been modified externally).

select b, FSBucket(b) bucket, FSName(b) name, FSSize(b) size, FSMd5(b) md5a, MD5(b) md5b FROM fstable;
b       1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25 
bucket  /data1
name    0001/0001/0001
size    25
md5a    76fb2ba51550cfbe5d19240fce7d43d3
md5b    76fb2ba51550cfbe5d19240fce7d43d3

b       1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857 
bucket  /data1
name    0001/0001/0002
size    191857
md5a    973d7111fbcf0492434e6e7d5aa15f48
md5b    973d7111fbcf0492434e6e7d5aa15f48

4.1.9 Unloading and Loading FSBlobs

You can unload a table with FSBlob columns using the unload command as usual.

unload to "fstable.unl" SELECT * FROM fstable;
1|1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25|
2|1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857|

Warning: The information unloaded for an FSBlob column is the remote file pointer, not the file content. Do not trust the unload command to create a backup of your remote files.

If you want to migrate tables with FSBlobs between different servers, you can choose two different methods:

Saving FSBlob remote files manually

  1. Unload table contents: unload to "fstable.unl" SELECT * FROM fstable;
  2. Save remote storage folder contents (/data1) or rename folder before dropping table.
  3. Drop table in original server: drops all database rows and linked files in remote filesystem.
  4. Load table contents in new database or table.
  5. Restore remote filesystem contents or rename remote filesystem to its original name.
unload to "fstable.unl" SELECT * FROM fstable;

$ mv /data1 /data1.bak

truncate fstable;
load from "fstable.unl" insert into fstable;

$ mv /data1.bak /data1
Warning: 'truncate fstable' or 'drop table fstable' also erases all files saved in remote filesystem storage. If users have not saved this data elsewhere (the plain UNLOAD saves only pointers, not file contents), they need to back up these files in the filesystem before deleting or dropping to avoid losing them.

Fully unload database rows and remote FSBlob filesystem content

  1. Unload table contents, selecting all table columns but replacing the FSBlob column with the LOTOFILE function. This reads the remote filesystem files and creates new file copies in an unload folder.
  2. Drop the table in the original server: this drops all database rows and linked files in the remote filesystem. This is not an issue, because all of these files have been saved by the unload with the LOTOFILE statement.
  3. Load table contents in new database or table.
unload to "fstable.unl" SELECT key, lotofile(b, "bin_file", "client") FROM fstable;
truncate fstable;
load from "fstable.unl" insert into fstable;

4.1.10 Copying FSBlob Metadata and Contents

Regular copies of FSBlob columns copy only metadata pointers to remote files. Do not rely on copied pointers, as deleting these rows can cause remote files to be erased and pointers in the original table to be orphaned.

select * from fstable into temp tmp_1;
delete from tmp_1;
select *, lotofile(b::blob, "/tmp/bfs", "client") from fstable;
(38000) - Unknown throwable: (java.io.FileNotFoundException: /data1/0001/0001/00

To copy contents of an FSBlob and generate new pointers to copied data, you must use the standard Informix LOCOPY function.

select key, locopy(b) b from fstable into temp tmp_1;
select * from fstable;
select * from tmp_1;
delete from tmp_1;
select *, lotofile(b::blob, "/tmp/fs", "client") from fstable;
key  1
b    1:/data1:0001/0001/0018:76fb2ba51550cfbe5d19240fce7d43d3:25

key  1
b    1:/data1:0001/0001/0020:76fb2ba51550cfbe5d19240fce7d43d3:25

key           1
b             1:/data1:0001/0001/0018:76fb2ba51550cfbe5d19240fce7d43d3:25
(expression)  /tmp/fs.00000000598ae818

LOCOPY generates a new 0001/0001/0020 file, copying the original 0001/0001/0018 file and changing the pointer in the new table (tmp_1).

When the command deletes rows from the temporary table, file 0001/0001/0020 is deleted, but the original 0001/0001/0018 remains and is still linked from the original fstable row.

4.1.11 Migrate from Blob to FSBlob

Migrating from the original BLOB or CLOB Informix column to the new FSBlob storage type is very easy. As an example, let's look at an original table being migrated to use FSBlobs:

CREATE TABLE ifx_blob(
    key integer,
    bl  blob
);
insert into ifx_blob values(1, FILETOBLOB("/etc/hosts", "client"));

To migrate the BLOB column to FSBlob, add a new FSBlob column (fs) to the table, transfer the contents from the original BLOB column to the new remote filesystem column, then drop the original column (bl) and rename the new FSBlob column (fs) to the original name (bl).

alter table ifx_blob add(fs fsblob);
UPDATE ifx_blob SET fs = bl::fsblob WHERE fs is null;
alter table ifx_blob drop(bl);
rename column ifx_blob.fs to bl;

4.1.12 Integrity

As Blob data is stored outside the database, there is no way to ensure it is not modified or removed outside database control. It's the responsibility of system administrators to keep storage strictly coordinated with databases. Even so, FSBlob provides some utilities to verify integrity.

Data Missing

If files in the fileset storage are removed or missing, the content returned to the database server is null.

select lotofile(b::blob, "/tmp/fs", "client") from fstable;
(expression)

Checking for missing files in the repository is faster with the FSISNULL function. This function verifies whether a file exists in the provisioned storage without reading the Blob.

The following SQL shows that a Blob for key 1 is missing:

select key, b from fstable where FSISNULL(b) AND b IS NOT NULL;
key  1
b    1:/data1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25

Data Consistency

We can inspect the storage to see the two Blobs created by the rows inserted into the Informix table containing an FSBlob column.

$ ls -l /data1/0001/0001
total 192
-rw-rw-r-- 1 informix informix     25 Aug 31 19:53 0001
-rw-rw-r-- 1 informix informix 191857 Aug 31 19:53 0002

Now modify the text file by editing it and applying a simple change:

This is text from entry 1
This is text from entry 2

Reselect the Blobs and their MD5.

select key, fsmd5(b), md5(b) from fstable;
key (expression)                     (expression)                     

          1 76fb2ba51550cfbe5d19240fce7d43d3 c940678bce753fb829b53aa53d8afc13
          2 973d7111fbcf0492434e6e7d5aa15f48 973d7111fbcf0492434e6e7d5aa15f48

You can see that the MD5 of key 1 has changed. Its stored value in the FSBlob metadata is 76fb2ba51550cfbe5d19240fce7d43d3 while the computed value is c940678bce753fb829b53aa53d8afc13.

Data Verification

From the previous example, we can deduce a way to determine whether a Blob has been unsealed (modified outside database scope): simply compare the MD5 stored in the FSBlob opaque type with the computed MD5 to find modified files.

select key from fstable where fsmd5(b) != md5(b);
1

Transaction Handling

FSBlob file references cannot be recovered during a rollback. This means that during an insert, delete or update operation, the file is affected regardless of any rollback event.

4.1.13 Test load

This section shows an example to test the performance of inserting a great number of files into the table.

Compile the following procedure, changing the path to an appropriate file name:

CREATE PROCEDURE test_insert_fsblob( num INT )

   DEFINE i INTEGER;
   LET i = 1;

   WHILE i < num
      insert into fstable values(0, fsblob(FILETOBLOB("/data_fs/vldb/LOAD/master.pdf", "client")));
      LET i = i + 1;
   END WHILE;

END PROCEDURE;

Then execute the procedure:

execute procedure test_insert_fsblob(50000);

You can unload the data of the table and check file consistency by executing the following shell script.

First save the script content, then modify the WHERE clause and set the appropriate file paths:

cat > check_unload.sh
#!/bin/sh

echo "unload to 'fstable.unl' SELECT lotofile(b::blob, '/data_fs/vldb/UNLOAD/bfs', 'client') from fstable where key < 2" | dbaccess demo_vldb

md5master=`md5sum /data_fs/vldb/LOAD/master.pdf | awk '{print $1}'`

for i in $( ls /data_fs/vldb/UNLOAD/bfs*); do
    echo "Check File: $i"
    md5file=`md5sum $i | awk '{print $1}'`

    if [ "$md5master" != "$md5file" ]; then
    	echo "ERROR ON FILE $i MD5 MISMATCH"
        exit -1
    fi
done

Execute the script from the command line:

chmod 754 check_unload.sh
./check_unload.sh

4.2 Log to JVPLOGFILE

Log messages are written to the JVPLOGFILE. The system automatically sends basic information to the message log file: the registration of UDT external names, the JARs added and the working directory. If you need more information, such as the executed queries, the paths of FSBlobs and the creation of BLOB instances, you can execute the procedure

execute procedure vmdebug(1);

in order to activate the extended log mode, as you can see in the text below:

Installing jar s3.informix.deister_vldb...
Deploying file: deploy_vldb.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...
ADDING TO JARMAP: s3.informix.deister_vldb + informix.jvp.dbapplet.impl.JVPClassLoader@d8ecd5b
T#13 2020/01/15 13:14:00.631 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:93) - Scanning counter position for /tmp/fs1
T#13 2020/01/15 13:14:00.634 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:107) - Counters [00001, 00001, 00000] 0 file(s)
Installing jar s3.informix.deister_vldb...
Deploying file: deploy_vldb.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...
ADDING TO JARMAP: s3.informix.deister_vldb + informix.jvp.dbapplet.impl.JVPClassLoader@a3d54ce
T#1 2020/01/15 13:24:47.485 JVP#11: deister.informix.vldb.vm.VMUtils.vmDebug(VMUtils.java:41) - Verbose mode set to true
T#3 2020/01/15 13:24:47.488 JVP#11: deister.informix.vldb.util.jdbc.JDBCDirect.executeQuery(JDBCDirect.java:120) - EXECUTE QUERY SELECT * FROM VLDB_FSStore WHERE closed = 0 ORDER BY seqno
T#3 2020/01/15 13:24:47.491 JVP#11: deister.informix.vldb.store.fs.FSBlob.<init>(FSBlob.java:74) - Create
T#3 2020/01/15 13:24:47.491 JVP#11: deister.informix.vldb.store.fs.FSStore.nextUUID(FSStore.java:85) - Initialized FSPath for /tmp/fs1
T#3 2020/01/15 13:24:47.492 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:93) - Scanning counter position for /tmp/fs1
T#3 2020/01/15 13:24:47.493 JVP#11: deister.informix.vldb.util.jdbc.JDBCDirect.executeQuery(JDBCDirect.java:120) - EXECUTE QUERY select count(*) from sysmaster:sysvpprof where class='jvp'
T#3 2020/01/15 13:24:47.495 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:107) - Counters [00001, 00001, 00000] 0 file(s)
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.writeSQL(FSBlob.java:327) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLOutStream@555c5809
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.<init>(FSBlob.java:61) - Created FSBlob
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.readSQL(FSBlob.java:174) - SQLInput com.informix.jdbc.IfmxSQLInStream@57756e2d typeName fsblob
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.readSQL(FSBlob.java:211) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLInStream@57756e2d
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.assign(FSBlob.java:381) - assign 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.writeSQL(FSBlob.java:327) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLOutStream@3570f42a
T#4 2020/01/15 13:24:47.502 JVP#11: deister.informix.vldb.store.fs.FSBlob.build(FSBlob.java:529) - Build getInputStream com.informix.jdbc.IfxBblob@7ceef038 ENV informix.jvp.dbapplet.impl.UDREnvImpl@c939ba96
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.writeSQL(FSBlob.java:327) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLOutStream@3570f42a
T#4 2020/01/15 13:24:47.502 JVP#11: deister.informix.vldb.store.fs.FSBlob.build(FSBlob.java:529) - Build getInputStream com.informix.jdbc.IfxBblob@7ceef038 ENV informix.jvp.dbapplet.impl.UDREnvImpl@c939ba96
T#6 2020/01/15 13:24:47.522 JVP#11: deister.informix.vldb.store.fs.FSBlob.get(FSBlob.java:349) - get 1:/tmp/fs1:00001/00001/00002.b:196571480fd64d2d598c82a4a2d2289a:192214 exists=true
....
....
....

4.3 S3Blob

S3Blob is an opaque type for RBS (Remote Blob Storage). By using S3Blob, a Blob is stored or retrieved directly from an Amazon S3 bucket.

S3 files are automatically named using 128-bit UUIDs (122 random bits + 4 version bits + 2 bits for the Leach-Salz variant).

4.3.1 Prepare S3 Account

Before using S3Blob, you must configure Informix with AWS settings. Under the Informix user's home directory, create a .aws directory.

AWS Config

The config file should contain your default S3 region. For example:

[default]
region = eu-central-1

AWS Credentials

The credentials file should contain your S3 credentials. For example:

[default]
aws_access_key_id = BIAIXGCB44N....
aws_secret_access_key = X3Qu7hk3HDtDFjsdhh342lovG......

AWS Client

Install the AWS command line interface and, after installation, test whether you can connect to S3 using your default configuration and credentials. For example, list your buckets by using:

$ aws s3 ls
2017-08-22 19:32:57 s3bucket1
2017-08-19 15:18:56 s3bucket2

4.3.2 Install S3Blob Support

Define S3Blob Opaque Type

Create the SQL routine and cast definitions for the I/O functions of S3Blob by executing registerJUDTfuncs and define the mapping between a user-defined data type and a Java™ class by executing setUDTExtName.

execute procedure registerJUDTfuncs("S3Blob");
execute procedure setUDTExtName("S3Blob", "deister_vldb:deister.informix.vldb.store.amazon.s3.S3Blob");

These commands register S3Blob functions in your database and declare a new column type called s3blob. To check that everything worked properly, this new type should be defined in your database as:

select * from sqlj.udtextnames;
sqlname       S3Blob
externalname  s3.informix.deister_vldb:deister.informix.vldb.store.amazon.s3.S3Blob

Declare S3Blob Storage

During datablade deployment, the table vldb_s3store is created. This table defines the S3 buckets where S3Blobs will be stored.

Field Type Description
seqno serial Automatic sequence number to identify a bucket. It is associated with each S3Blob to point to data in the bucket.
bucket varchar(64) The "bucket" on S3.
closed smallint Indicates this bucket is closed, so it will not be selected for data placement.
nfiles int8 (long) Indicates the number of files stored. This is just statistical information.
created datetime Timestamp of the bucket declaration.

To store an S3Blob, the datablade will look for the first non-closed entry in vldb_s3store ordered by seqno, and will store files in this bucket.

Provisioning S3 Buckets

Assuming you have an S3 bucket named s3bucket1 available for Blob storage, simply do the following:

insert into vldb_s3store (bucket) values ("s3bucket1");
select * from vldb_s3store;
seqno    1
bucket   s3bucket1
closed   0
nfiles   0
created  2017-06-20 17:01:56

You can provision more buckets when needed and change the closed state at any time. Upon each modification, the internal bucket cache can be reset by calling the s3reset procedure; in this way, changes take effect immediately.

To start using a new bucket, insert a new row in the vldb_s3store table and mark the old row as closed.

insert into vldb_s3store (bucket) values ("s3bucket2");
update vldb_s3store set closed = 1 where seqno = 1;


List Provisioning

The s3ls command shows the available (not closed) provisioned buckets in order of precedence.

execute function s3ls();
s3bucket1

4.3.3 S3Blob Type

Now you can create a table using the S3Blob column type. In this example, we will create a simple table to store a key and an S3Blob.

CREATE TABLE s3table(key integer not null, b s3blob);

The S3Blob type structure requires 40 bytes and has the following structure:

struct s3blob {
   int4 bucketID,
   byte[16] uuid,
   byte[16] md5,
   int4 size
}

When data is deleted from a database table (or table is dropped), a command is sent to Amazon S3 to remove associated data. Also, when an S3Blob field is set to null, linked data from S3 is deleted.
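
For example, inserting into the s3table created above (a sketch, assuming the s3blob constructor mirrors the fsblob constructors shown earlier):

insert into s3table values(1, s3blob(FILETOBLOB("/filesrc/mydocument.pdf", "client")));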

4.4 Using JDBC

FSBlob and S3Blob types can be used with JDBC, using prepared statements by means of the appropriate type constructor. The following table shows the mapping for different data types.

Source type Parameter PreparedStatement method
String FSBlob(?) setString
byte[] FSBlob(?) setBytes
Blob FSBlob(?) setBlob
InputStream FSBlob(?) setBinaryStream

Assuming the example table below:

create table fstable(c1 integer, c2 fsblob);

4.4.1 Inserting Data

private String DATA = "This is a test";

@Test
public void insertTypeString() 
	throws SQLException
{
	PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
	st.setInt(1, 1);
	st.setString(2, DATA);
	st.execute();
	st.close();
}

@Test
public void insertTypeBytes() 
	throws SQLException
{
	PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
	st.setInt(1, 1);
	st.setBytes(2, DATA.getBytes());
	st.execute();
	st.close();
}

@Test
public void insertTypeBlob() 
	throws SQLException
{
	PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
	st.setInt(1, 1);
	st.setBlob(2, new myBlob(DATA.getBytes())); // myBlob: a client-side java.sql.Blob implementation (not shown)
	st.execute();
	st.close();
}

@Test
public void insertTypeStream() 
	throws SQLException
{
	PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
	st.setInt(1, 1);
	st.setBinaryStream(2, new ByteArrayInputStream(DATA.getBytes()));
	st.execute();
	st.close();
}

4.4.2 Selecting Data

To select an FSBlob or S3Blob data type as a BLOB or CLOB, you must use the appropriate CAST in the SQL statement. This allows you to read the data as usual with the appropriate ResultSet methods.

Type Cast SQL Java function
FSBlob, S3Blob blob SELECT c2::blob FROM ... getBinaryStream()
FSBlob, S3Blob clob SELECT c2::clob FROM ... getString()
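
For example, to read the fstable column defined above as a regular Blob, cast it in the query (a minimal sketch):

SELECT c1, c2::blob FROM fstable;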

If you don't use a type cast, the resulting data will be a VLDB type (FSBlob or S3Blob). If you try to select this type without a type map, you will get:
java.sql.SQLException: A customized type map is required for this data type.

Setting Type Map

To read the underlying data types, you must set the type mapping before using the database connection. You provide the type map by setting the class to load for a given type:

HashMap<String, Class<?>> map = new HashMap<String, Class<?>>();
map.put("fsblob", FSBlob.class);
// set type map to connection
connection.setTypeMap(map);

Now, let's declare a class for the FSBlob type on the client side (notice that VLDB Blob types only work on the server side, as they refer to data that is only accessible from the server). This class will allow us to decode the binary information of the Blob struct stored in the opaque type.

  • The FSBlob type should implement the SQLData Java interface.
    import java.sql.SQLData;
    import java.sql.SQLException;
    import java.sql.SQLInput;
    import java.sql.SQLOutput;
    
    public class FSBlob implements SQLData {
        private String m_typeName;
    
        /**
         * MD5 (32 hex = 16 bytes)
         */
        protected final int OBJECT_HASH_LENGTH = 16;
    
        // Fields decoded from the opaque type
        private int    m_bucketSeqno;
        private String m_bucketName;  // the bucket path is not part of the struct and is unknown on the client side
        private FSUUID m_objectName;
        private MD5    m_objectHash;
        private int    m_objectSize;
    
        @Override
        public String toString()
        {
            return String.format("%s[%s]", getClass().getSimpleName(), m_typeName);
        }
    
        @Override
        public String getSQLTypeName() throws SQLException {
            return m_typeName;
        }
    
        @Override
        public void readSQL(SQLInput stream, String typeName) throws SQLException {
            m_typeName = typeName;
            m_bucketSeqno = stream.readInt();
            m_objectName = new FSUUID(m_bucketName, stream.readShort(), stream.readShort(), stream.readShort());
            // readBytes(int) relies on the Informix SQLInput implementation
            m_objectHash = new MD5(stream.readBytes(OBJECT_HASH_LENGTH));
            m_objectSize = stream.readInt();
        }
    
        @Override
        public void writeSQL(SQLOutput stream) throws SQLException {
            throw new SQLException("Not supported");
        }
    }
  • The FSUUID class encapsulates a file location in a filesystem.
    class FSUUID {
    	public static int		PATH_NUMBER_DIGITS = 5;
    	public static String	PATH_NUMBER_FORMAT = "%05d";
    	
    	private String	m_bucketName;
    	private short	m_counter1;
    	private short	m_counter2;
    	private short	m_counter3;
    
    	public FSUUID(String bucketName, short leaf1, short leaf2, short leaf3)
    	{
    		m_bucketName = bucketName;
    		m_counter1 = leaf1;
    		m_counter2 = leaf2;
    		m_counter3 = leaf3;
    	}
    	
    	@Override
    	public String toString()
    	{
    		return String.format("%05d/%05d/%05d.b", m_counter1, m_counter2, m_counter3);
    	}
    	
    	// ... other methods to access files on server side (do not work on client side)
    }
  • The MD5 class encapsulates the MD5 hash (16 bytes) and provides a hexadecimal string representation.
    class MD5 {
    	private byte[]	m_md5;
    	
    	public MD5(byte[] data)
    	{
    		m_md5 = data;
    	}
    
    	/**
    	 * 
    	 */
    	@Override
    	public String toString()
    	{
    		 return m_md5 != null ? convert(m_md5) : "";
    	}
    
    	/**
    	 * Convert a byte array into a printable format containing a String of
    	 * hexadecimal digit characters (two per byte).
    	 *
    	 * @param bytes
    	 *            Byte array representation
    	 */
    	public final static String convert(byte bytes[])
    	{
    		StringBuilder sb = new StringBuilder(bytes.length * 2);
    		for (int i = 0; i < bytes.length; i++) {
    			sb.append(convertDigit(bytes[i] >> 4));
    			sb.append(convertDigit(bytes[i] & 0x0f));
    		}
    		return (sb.toString());
    	}
    
    	/**
    	 * [Private] Convert the specified value (0 .. 15) to the corresponding
    	 * hexadecimal digit.
    	 *
    	 * @param value
    	 *            Value to be converted
    	 */
    	private final static char convertDigit(int value)
    	{
    		value &= 0x0f;
    		if (value >= 10)
    			return ((char) (value - 10 + 'a')); // lowercase
    		else
    			return ((char) (value + '0'));
    	}
    }

4.5 Blob Encryption

As Blobs are stored outside the database, VLDB provides a way to keep them encrypted. To initiate VLDB Blob encryption, you must start the Informix JVM with special properties:

Property Default value Description
vldb.crypt false If true, enables encryption using the default or provided password.
vldb.salt 0 An optional numeric salt between 0 and 32768 for the random number generator. Changing this value will affect the ability to decrypt previously stored values. The default value is 0.

To start default encryption, edit the JVPARGS property in $ONCONFIG and add vldb.crypt:

JVPARGS         -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx1024m;-Dvldb.crypt=true
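
If you also want to set a custom salt (see the property table above), add vldb.salt as well; for example (the salt value here is illustrative):

JVPARGS         -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx1024m;-Dvldb.crypt=true;-Dvldb.salt=12345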

When data is encrypted, a special header is added to the file. This allows you to enable or disable encryption at any time. Encrypted data will still be retrieved correctly, even when encryption is disabled, because its header will determine that it must be decrypted.

The password should never be changed once the encryption process has begun.

5 Hash Coordinator

5.1 Data Placement

VLDB includes algorithms that control data placement inside a distributed database. These algorithms are responsible for mapping between data items and physical nodes. Before looking into VLDB functions for data placement, let's introduce some hash techniques.

5.1.1 Rebalancing

Let us start with a simple protocol that aims to provide outage-free data migration between cluster nodes. This task arises in situations like cluster expansion (new nodes are added), failover (some node goes down), or rebalancing (data has become unevenly distributed across the nodes). Consider the situation depicted in section (A) of the figure below: there are three nodes, and each node contains a portion of the data (we assume a key-value data model without loss of generality) that is distributed across the nodes according to an arbitrary data placement policy:

If the database does not support data rebalancing internally, one will probably deploy several instances of the database on each node, as shown in section (B) of the figure above. This allows a manual cluster expansion by turning a separate instance off, copying it to a new node, and turning it on, as shown in section (C).

5.1.2 Sharding and Replication

The next question we have to address is how to map records to physical nodes. A straightforward approach is to have a table of key ranges where each range is assigned to a node or to use procedures like NodeID = hash(key) % TotalNodes. However, modulus-based hashing does not explicitly address cluster reconfiguration because addition or removal of nodes causes complete data reshuffling throughout the cluster. As a result, it is difficult to handle replication and failover.

There are different ways to enhance the basic approach from the replication and failover perspectives. The most famous technique is a consistent hashing. The following figure depicts the basic ideas of consistent hashing:

Consistent hashing is basically a mapping schema for a key-value store: it maps keys (hashed keys are typically used) to physical nodes. The space of hashed keys is an ordered space of binary strings of a fixed length, so each range of keys is assigned to some node, as depicted in figure (A) for 3 nodes, namely A, B, and C. To cope with replication, it is convenient to close the key space into a ring and traverse it clockwise until all replicas are mapped, as shown in figure (B). In other words, item Y should be placed on node B because its key corresponds to B's range; the first replica should be placed on C, the second replica on A, and so on.

The benefit of this schema is efficient addition and removal of a node, because it causes data rebalancing only in neighboring sectors. As shown in figure (C), addition of node D affects only item X but not Y. Similarly, removal (or failure) of node B affects Y and the replica of X, but not X itself. However, as pointed out in [8], the dark side of this benefit is vulnerability to overloads: all the burden of rebalancing is handled by the neighbors only, forcing them to replicate high volumes of data. This problem can be alleviated by mapping each node not to one range, but to a set of ranges, as shown in figure (D). This is a tradeoff: it avoids skewed loads during rebalancing, yet keeps the total rebalancing effort reasonably low in comparison with modulus-based mapping.

Maintaining a complete and coherent vision of the hashing ring may be problematic in very large deployments. Although this is not a typical problem for databases because of their relatively small clusters, it is interesting to study how data placement was combined with network routing in peer-to-peer networks. A good example is the Chord algorithm, which trades completeness of the ring vision at a single node for efficiency of query routing. The Chord algorithm is similar to consistent hashing in the sense that it uses the concept of a ring to map keys to nodes. However, a particular node maintains only a short list of peers with exponentially growing offsets on the logical ring (see the picture below). This allows one to locate a key in several network hops using a kind of binary search:

This figure depicts a cluster of 16 nodes and illustrates how node A looks up a key that is physically located on node D. Part (A) depicts the route and part (B) depicts partial visions of the ring for nodes A, B, and C.

5.2 Consistent Hashing

VLDB includes a fast and simple consistent hash implementation that can be used to generate horizontal data shards on Informix databases.

As seen before, consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. For example, with one million keys and 10 nodes, adding an eleventh node remaps only about 90,000 keys rather than nearly all of them. By contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped, because the mapping between the keys and the slots is defined by a modular operation.

5.2.1 Define Nodes

Before using the hash coordinator, you must define the nodes in the group. The table vldb_coord_nodes contains the nodes that will be used for consistent hashing.

For example, assuming you have 6 nodes, n1 to n6:

INSERT INTO vldb_coord_nodes (node) VALUES ("n1");
INSERT INTO vldb_coord_nodes (node) VALUES ("n2");
INSERT INTO vldb_coord_nodes (node) VALUES ("n3");
INSERT INTO vldb_coord_nodes (node) VALUES ("n4");
INSERT INTO vldb_coord_nodes (node) VALUES ("n5");
INSERT INTO vldb_coord_nodes (node) VALUES ("n6");

5.2.2 List Nodes

Once you have nodes, you can list them to see the hash ring distribution.

execute function vldb_coord_list();
(expression)  n2                   46094345497948089875079948975171633019 
(expression)  n3                   147163246512606534862005676325853375605 
(expression)  n5                   147893521604165371384293798848531632933 
(expression)  n6                   165390747131543269262950133155247867366 
(expression)  n1                   237880202315960234213502779142634839473 
(expression)  n4                   322100840680493280540581365842339131936

5.2.3 Setting Replication Factor

To provide a more uniform distribution of nodes, you can use a replication factor. For example, after setting the replication factor to 4 (each node will appear 4 times on the ring), we can list the ring again. The change is made persistent.

execute procedure vldb_coord_setReplication(4);
execute function vldb_coord_list();
(expression)  n6                   2985069186267451340532497363503766562 
(expression)  n5                   20994376421658530533596250750271820153 
(expression)  n2                   46094345497948089875079948975171633019 
(expression)  n4                   85600711671213689823494371072541489642 
(expression)  n3                   89907211917890554484697069576326533871 
(expression)  n1                   95017565841949300825370391209421804053 
(expression)  n2                   105133766731246238747349254909296008364 
(expression)  n6                   110247696212705834631285407643141124210 
(expression)  n3                   134388743549866618614843288510624197696 
(expression)  n1                   143097155655279886698440465011731480610 
(expression)  n3                   147163246512606534862005676325853375605 
(expression)  n5                   147893521604165371384293798848531632933 
(expression)  n6                   155541610523959574369729082648463457816 
(expression)  n3                   161127503121771633574137585322442050689 
(expression)  n6                   165390747131543269262950133155247867366 
(expression)  n2                   190803495377781704784933122945842263455 
(expression)  n5                   196511581923287593309938899129465690275 
(expression)  n4                   199640007412686395412756985075877484263 
(expression)  n1                   237880202315960234213502779142634839473 
(expression)  n5                   253686919698485015154416500741485145766 
(expression)  n2                   286760289674156152109863627639238632903 
(expression)  n4                   322008981568208832230847066875374440544 
(expression)  n4                   322100840680493280540581365842339131936 
(expression)  n1                   323937041115108658234608508042942466067

5.2.4 Setting Hash Algorithms

Object.hashCode implementations tend to be very fast, but they have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected using a secondary hash function (which all reasonable hash table implementations in Java use). For consistent hashing, however, poor bit dispersion leads to an uneven ring and therefore an uneven data distribution.

To improve hash smoothness for large datasets, VLDB provides a number of stock hash algorithms.

By default, MD5 is used to compute hash values. You may change it to SHA-1 or even SHA-256 to increase distribution smoothness.

Copy
execute procedure vldb_coord_setAlgorithm('MD5');
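
For instance, to switch the coordinator to SHA-256 and inspect the resulting ring (the literal 'SHA-256' is assumed here to be the spelling the procedure accepts; check your VLDB version):

Copy
execute procedure vldb_coord_setAlgorithm('SHA-256');
execute function vldb_coord_list();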

5.2.5 Getting a Node for a Key

Assuming you have set the correct number of nodes and the replication factor, you can start using the hash coordinator by passing an Integer or a String as the key argument. In the following example, we can see the node assigned to each of the tabid values 1 through 20 in systables.

Copy
select tabid, vldb_coord_getHash(tabid) from systables;
tabid (expression)
      
          1 n2
          2 n2
          3 n4
          4 n1
          5 n4
          6 n2
          7 n2
          8 n2
          9 n1
         10 n2
         11 n3
         12 n2
         13 n2
         14 n1
         15 n1
         16 n2
         17 n6
         18 n6
         19 n2
         20 n1
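
The key can also be a string. As an illustrative sketch (the customer code below is made up), you can look up the node that owns a given key directly:

Copy
execute function vldb_coord_getHash('CUST-00042');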

6 Digest Functions

VLDB includes a set of MD5 hash functions that may be used on BLOB, CLOB, FSBlob and S3Blob data types.

Function Return Description
MD5(blob) char(32) MD5 on type java.sql.Blob
MD5(clob) char(32) MD5 on type java.sql.Clob
MD5(FSBlob) char(32) MD5 on type FSBlob
MD5(S3Blob) char(32) MD5 on type S3Blob
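
For example, to compute a digest of each stored document (useful to detect duplicates or verify content integrity), you could run the following against the btstable sample table created in section 7.2:

Copy
SELECT file_name, MD5(file_data) FROM btstable;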

7 Text Search

Full text search engines and relational databases each have unique strengths as development tools, but they also have overlapping capabilities. Both provide for the storage and update of data, and both support search of that data. Full text systems are better at quickly searching high volumes of unstructured text for the presence of any word or combination of words. They provide rich text search capabilities and sophisticated relevancy ranking tools for ordering results based on how well they match a potentially fuzzy search request. Relational databases, on the other hand, excel at storing and manipulating structured data: records of fields of specific types (text, integer, currency, etc.). They can do so with little or no redundancy. They support flexible search of multiple record types for specific values of fields, as well as strong tools for quickly and securely updating individual records.

You can use VLDB text functions in conjunction with BTS (Basic Text Search Datablade) to create a text search engine on top of the Informix database using CLucene text search capabilities.

BTS is built on top of CLucene, a C++ port of the very popular Java Apache Lucene text search engine API. Popular search platforms such as Apache Solr, Lucidworks and Elasticsearch are likewise built on Apache Lucene.

The following functions allow PDF text extraction from Blob types.

Function Returns Description
getText(blob) clob Extract PDF text stored in java.sql.Blob
getText(FSBlob) clob Extract PDF text stored in FSBlob
getText(S3Blob) clob Extract PDF text stored in S3Blob
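
For example, to extract the text of a single PDF already stored in a Blob column (using the btstable sample table defined below):

Copy
SELECT file_name, getText(file_data) FROM btstable WHERE seqno = 1;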

7.1 Install BTS

To install BTS text search datablade on a database, simply run:

Copy
EXECUTE FUNCTION sysbldprepare ('bts.*', 'create');

The BTS datablade requires a database with logging enabled.
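
For example, a database created with the WITH LOG clause satisfies this requirement (mydb is a placeholder name):

Copy
CREATE DATABASE mydb WITH LOG;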

7.2 Create a Sample Table

Let us create a sample table to store a Blob and its text content, which will be used for text search using a BTS index. In this sample, file_data will hold a Blob to store PDF documents and file_text will hold the PDF text.

Copy
CREATE TABLE btstable(seqno serial not null, file_name varchar(60), file_data BLOB, file_text CLOB);
CREATE INDEX i_btstable ON btstable (file_text bts_clob_ops) USING bts;

7.3 Create a Trigger to Extract Text

You can create a trigger to automatically extract text from PDFs during an insert operation.

Copy
CREATE TRIGGER btstable_ins
  INSERT ON btstable
  REFERENCING new AS nxt
  FOR EACH ROW (
      EXECUTE FUNCTION getText(nxt.file_data) INTO file_text
  );

7.4 Insert PDF Files

Now you are ready to insert PDF files into the sample table. When a file is inserted, PDF text extraction is triggered and the result is placed in the file_text column. Because the column is covered by a BTS index, its content is automatically indexed and immediately available for text search.

Copy
-- Insert Edgar Allan Poe books from http://www.freeclassicebooks.com/edgar_alan_poe.htm
INSERT INTO btstable (file_name, file_data) values ("CloudPortal",                    FILETOBLOB('pdf/CloudPortal2013.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("Eureka",                         FILETOBLOB('pdf/Eureka.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("Le Corbeau",                     FILETOBLOB('pdf/Le Corbeau.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Cask of Amontillado",        FILETOBLOB('pdf/The Cask of Amontillado.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Fall of the House of Usher", FILETOBLOB('pdf/The Fall of the House of Usher.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Masque of the Red Death",    FILETOBLOB('pdf/The Masque of the Red Death.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Raven",                      FILETOBLOB('pdf/The Raven.pdf', 'client'));

7.5 Perform text search

Once data is loaded, you can perform a text search command.

7.5.1 Simple search

For example, to search for a book containing the word campanella:

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "campanella");
The Fall of the House of Usher                              

1 row(s) retrieved.

7.5.2 Boolean Operators Search

Boolean operators combine terms in logical combinations. You can use the operators AND, OR, and NOT, or their equivalent special characters in the bts_contains() search predicate.

To search for books containing either the word campanella or the word masquerade:

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "campanella OR masquerade");
The Masque of the Red Death                                 
The Fall of the House of Usher                              

2 row(s) retrieved.

To search for books containing both the words physical and mathematical:

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "physical AND mathematical");
Eureka

1 row(s) retrieved.
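
The NOT operator works the same way. For example, to find books that mention heaven but not raven (the result set depends, of course, on the documents you have indexed):

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "heaven NOT raven");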

7.5.3 Fuzzy Search

A fuzzy search matches terms that are similar to the search term; you request it by appending a tilde (~) to the term. For example, to perform a fuzzy search on mystery:

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "mystery~");
Eureka

1 row(s) retrieved.

7.5.4 Proximity Search

A proximity search allows you to specify the number of non-search words that can occur between search terms. To perform a proximity search, enclose the search terms within double quotation marks and append a tilde (~) followed by the number of non-search words allowed.

Below we perform a simple search to find the term “heaven”. We see it appears in 4 documents.

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "heaven");
The Raven                                                   
Le Corbeau                                                  
The Fall of the House of Usher                              
Eureka                                                      

4 row(s) retrieved.

Now we refine the query to search for the terms heaven and bends within three words of each other in a document.

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, '"heaven bends"~3');
The Raven                                                   
Le Corbeau                                                  

2 row(s) retrieved.

7.6 Range Searches

With a range search, you match terms that fall between the lower and upper bounds specified by the query. Range searches can be inclusive or exclusive of the upper and lower bounds. Use brackets ([ ]) in the search predicate to specify an inclusive search; braces ({ }) specify an exclusive search, as shown in the sketch after this example.

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "[arabesque TO arabic]");
The Masque of the Red Death                                 
The Fall of the House of Usher                                                

2 row(s) retrieved.
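
To make the bounds exclusive, use braces instead of brackets. The following sketch matches terms strictly between the two bounds, so the bounding terms arabesque and arabic themselves would not match:

Copy
SELECT file_name 
  FROM btstable
 WHERE bts_contains(file_text, "{arabesque TO arabic}");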

7.7 Scoring

There is an optional argument for bts_contains() called score. If score is used, the search engine uses this variable to record the document score it assigns to each row in the results. The score value is a REAL number between 0.0 and 100.0 (inclusive) that indicates the relevance of the results to the search criteria, compared to that of other indexed records. The higher the document score value, the more closely the results match the criteria.

Copy
SELECT file_name, score
  FROM btstable
 WHERE bts_contains(file_text, "heaven", score #real);
file_name                                                             score 

The Raven                                                    4,176476000000
Le Corbeau                                                   2,610297440000
The Fall of the House of Usher                               2,584062580000
Eureka                                                       1,305148720000

4 row(s) retrieved.

7.8 Boosting a Term

The score can be thought of as the relevance of the returned results to what you are searching for. By default, all terms have equal value when sorted in the search results. Boosting a term assigns more relevance to a word or phrase. The search results are the same, but the specified term appears higher in the results.

To boost a term, use the caret symbol (^) followed by a boost factor after the term that you want to appear more relevant. The default boost factor is 1. The factor must be a positive number, but it can be less than 1 (for example, 0.2).

Copy
SELECT file_name, score
  FROM btstable
 WHERE bts_contains(file_text, "mask OR bones", score #real);
file_name                                                             score 

The Cask of Amontillado                                      13,54730610000
The Masque of the Red Death                                  1,941250320000

2 row(s) retrieved.

The word bones appears 5 times in The Cask of Amontillado, whereas mask appears only 2 times in The Masque of the Red Death, so the score reflects how relevant each word is within each document.

To give the word mask twice its normal weight in the search:

Copy
SELECT file_name, score
  FROM btstable
 WHERE bts_contains(file_text, "mask^2 OR bones", score #real);
file_name                                                             score 

The Cask of Amontillado                                      10,96906570000
The Masque of the Red Death                                  2,613902570000

2 row(s) retrieved.

7.9 Escaping Special Characters

To escape special characters that are part of the Basic Text Search query syntax, use the backslash (\) as an escape character before the special character.

The following characters are Basic Text Search special characters:

Copy
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

For example, to search for the phrase (7+1), you could use the following search predicate:

Copy
bts_contains(column, "\(7\+1\)")

7.10 Stop Words

Stopwords are common words (for example the, and, of) that the text search engine excludes from the index and ignores in queries. For details on configuring stopwords for BTS indexes, see the IBM Knowledge Center:

https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.dbext.doc/ids_dbxt_041.htm
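
As a hedged sketch only: BTS accepts a stopword list as an index parameter at index creation time. The parameter spelling and value syntax below are assumptions; verify them against the page above for your Informix version.

Copy
-- Assumed syntax: a stopwords index parameter taking a word list.
CREATE INDEX i_btstable_sw ON btstable (file_text bts_clob_ops)
 USING bts(stopwords="(the,and,of,a)");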