A very large database, or VLDB, is a database containing an extremely high number of tuples (database rows), or occupying an extremely large physical filesystem storage space. The most common definition of a VLDB is a database that occupies more than 1 terabyte or contains several billion rows, although naturally this definition evolves over time.

In some cases, storing data on a single instance may generate a VLDB, raising issues around the handling of its backup, replication and disaster recovery. One solution may be to split the database into multiple instances (a virtual appliance), with each instance handling the same database schema with data partitioned by key.
The VLDB datablade provides tools to implement:

- BLOB offloading, with the ability to store new Blob types that put content in local clustered filesystems.
- RBS (Remote Blob Storage), with the ability to store new Blob types that put content in Amazon S3.
- Hash coordinator functions to implement horizontal data partitions using consistent hash algorithms.
1 Before You Begin
Before you begin, you need to:

- Acquire the VLDB JAR and its server components.
- Configure the Informix instance for Java support.
1.1 Getting deister VLDB Datablade
Contact Deister to get the VLDB library. It consists of two JAR files:

- deister-informix-vldb.jar, the VLDB code that will be deployed on each database requiring VLDB support.
- deister-informix-vldb-libs.jar, the VLDB library components that must run inside Informix Krakatoa, once per server.

You can copy both files to $INFORMIXDIR/extend/vldb to keep them together and have a single reference.
1.2 Configure Informix for Java
Before starting, you will need to configure an Informix instance to handle Java UDRs. Edit the onconfig file and set up the Krakatoa Informix parameters.
Parameter | Value | Comments |
---|---|---|
VPCLASS | jvp,num=2 | Defines the number of Java Virtual Processors. For example, '2' allows up to 2 simultaneous threads. |
JVPARGS | -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx2048m;-XX:-UseParNewGC | Increase maximum memory from 128 MB to 2 GB and use aggressive garbage collection. |
JVPLOGFILE | $INFORMIXDIR/tmp/jvp.log | Points to a log file for debugging. |
JVPDEBUG | 0 | When set to 1, writes tracing messages to the JVPLOG file. |
JVPPROPFILE | $INFORMIXDIR/extend/krakatoa/.jvpprops | Points to the JVP properties file. This file allows you to set JVM debug properties. |
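Put together, the relevant onconfig fragment would look like the following (values are illustrative; tune the VP count and heap sizes for your server):

VPCLASS jvp,num=2
JVPARGS -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx2048m;-XX:-UseParNewGC
JVPLOGFILE $INFORMIXDIR/tmp/jvp.log
JVPDEBUG 0
JVPPROPFILE $INFORMIXDIR/extend/krakatoa/.jvpprops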
1.2.1 Adding Java Libraries to a JVP
VLDB Datablade requires some external libraries to perform the Amazon S3 connection. You must add these libraries to the JVPCLASSPATH. Assuming you have placed deister-informix-vldb-libs.jar under the extend/vldb directory, edit the onconfig file and review the parameter JVPCLASSPATH. It should look like:
JVPCLASSPATH $INFORMIXDIR/extend/krakatoa/krakatoa.jar:$INFORMIXDIR/extend/krakatoa/jdbc.jar:$INFORMIXDIR/extend/vldb/deister-informix-vldb-libs.jar
1.2.2 Set the correct jre path
Review the symbolic link of the jre file in the $INFORMIXDIR/extend/krakatoa directory. The correct path is

jre -> ../../jvm/jre

Execute the following commands to create the correct symbolic link:

cd $INFORMIXDIR/extend/krakatoa/
rm jre
ln -s ../../jvm/jre .
1.2.3 The .jvpprops file
A JVP properties file contains property settings that control various runtime behaviors of the Java™ virtual processor. The JVPPROPFILE configuration parameter specifies the path to the properties file. When you initialize the database server, the JVP initializes the environment based on the settings in the JVP property file. The .jvpprops.template file in the $INFORMIXDIR/extend/krakatoa directory documents the properties that you can set.
#JVP.trace.settings:JVP=2
#JVP.trace.verbose:1
#JVP.trace.timestampformat:HH:MM
#JVP.splitLog:10000
#JVP.monitor.port: 10000
1.2.4 The informix.policy file
You must set up the $INFORMIXDIR/extend/krakatoa/informix.policy file to allow VLDB to access some specific Java methods.
grant {
  // simple non restrictive
  permission java.security.AllPermission;

  // Krakatoa defaults
  permission java.util.PropertyPermission "user.language", "write";
  permission java.security.SecurityPermission "getPolicy";
  permission java.net.NetPermission "specifyStreamHandler";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";

  // VLDB
  permission java.lang.RuntimePermission "accessUserInformation";
  permission java.lang.RuntimePermission "getStackTrace";
  permission java.lang.RuntimePermission "modifyThreadGroup";

  // VLDB AWS
  permission java.lang.RuntimePermission "getenv.AWS_PROFILE";
  permission java.lang.RuntimePermission "getenv.AWS_REGION";
  permission java.lang.RuntimePermission "getenv.AWS_CONFIG_FILE";
  permission java.lang.RuntimePermission "getenv.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI";
  permission java.lang.RuntimePermission "getenv.AWS_CREDENTIAL_PROFILES_FILE";
  permission java.lang.RuntimePermission "getenv.AWS_EXECUTION_ENV";

  // VLDB Apache tika
  permission java.lang.RuntimePermission "getenv.TIKA_CONFIG";

  // VLDB VM
  permission java.lang.management.ManagementPermission "monitor";
  permission javax.management.MBeanServerPermission "findMBeanServer";
};
1.2.5 Restart informix
Now stop and restart the server to make the changes effective.
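For a standalone instance, a typical restart sequence is the following (adapt to your HA or cluster setup):

onmode -ky    # take the server offline immediately
oninit        # bring the server back online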
1.2.6 Monitor Session Using JVP
You can check that the JVPs are configured by looking for jvp virtual processors:
$ onstat -g sch
IBM Informix Dynamic Server Version 12.10.FC12AEE -- On-Line -- Up 00:02:53 -- 26866136 Kbytes
VP Scheduler Statistics:
vp pid class semops busy waits spins/wait bsy lspins
1 25635 cpu 45 49 9781 0
2 25636 adm 0 0 0 0
3 25637 lio 2 0 0 0
4 25638 pio 2 0 0 0
5 25639 aio 474 0 0 0
6 25640 msc 14 0 0 0
7 25641 fifo 2 0 0 0
8 25642 cpu 13772 13816 9986 0
9 25643 cpu 1522 1575 9829 0
10 25644 cpu 568 607 9631 0
11 25645 jvp 1 1 10001 0
12 25646 soc 2 2 10000 0
13 25647 soc 2 2 10000 0
14 25648 soc 2 2 10000 0
15 25649 soc 2 2 10000 0
16 25650 aio 81 0 0 0
17 25651 aio 65 0 0 0
18 25652 aio 23 0 0 0
19 25653 aio 36 0 0 0
20 25654 aio 34 0 0 0
During a large operation using VLDB routines, you can use the following commands to monitor a session.

- Determine the session ID of the SQL command by using onstat -g sql:
$ onstat -g sql
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:05:36 -- 6521380 Kbytes

Sess Id  SQL Stmt type  Current Database  Iso Lvl  Lock Mode  SQL ERR  ISAM ERR  F.E. Vers  Explain
33                      sysadmin          DR       Wait 5     0        0         -          Off
32                      sysadmin          DR       Wait 5     0        0         -          Off
29                      sysadmin          DR       Wait 5     0        0         -          Off
28                      sysadmin          CR       Not Wait   0        0         -          Off
5        -              test_vsc          NL       Not Wait   0        0         9.24       Off
- Check memory on pool STM for the session (STM.5.* in our example):
$ onstat -g mem | grep STM
STM.5.101 V 5a164040 74887168 408 789788 1
- Check fragment allocations for the pool shown before:
$ onstat -g afr STM.5.101
IBM Informix Dynamic Server Version 12.10.FC9 -- On-Line -- Up 00:13:02 -- 6521380 Kbytes

Allocations for pool name STM.5.101:

addr     size  memid     fileid  location
5a181000 3328  overhead  301     mtshpool.c:606
5a181d00 64    SAPI      2814    sapimem.c:1202
5a181d40 64    SAPI      2814    sapimem.c:1202
5a181d80 64    SAPI      2814    sapimem.c:1202
5a181dc0 64    SAPI      2814    sapimem.c:1202
5a181e00 64    SAPI      2814    sapimem.c:1202
5a181e40 64    SAPI      2814    sapimem.c:1202
5a181e80 64    SAPI      2814    sapimem.c:1202
5a181ec0 64    SAPI      2814    sapimem.c:1202
5a181f00 64    SAPI      2814    sapimem.c:1202
5a181f40 192   SAPI      2814    sapimem.c:1202
5a177000 64    SAPI      2814    sapimem.c:1378
5a177040 64    SAPI      2814    sapimem.c:1378
5a177080 64    SAPI      2814    sapimem.c:1378
5a1770c0 64    SAPI      2814    sapimem.c:1378
5a177100 64    SAPI      2814    sapimem.c:1378
5a177140 64    SAPI      2814    sapimem.c:1378
5a177180 64    SAPI      2814    sapimem.c:1378
5a1771c0 64    SAPI      2814    sapimem.c:1378
5a177200 64    SAPI      2814    sapimem.c:1378
5a177240 64    SAPI      2814    sapimem.c:1378
5a177280 64    SAPI      2814    sapimem.c:1378
5a1772c0 176   SAPI      2814    sapimem.c:1202
5a177370 72    SAPI      2814    sapimem.c:1202
5a1773b8 64    SAPI      2814    sapimem.c:1378
5a1773f8 416   SAPI      2814    sapimem.c:1202
2 Deploying VLDB
Once the database is configured and the JAR files are installed in the appropriate directories, you are ready to deploy the VLDB Datablade to your database.

On each database where you want to use the Deister VLDB Datablade, deploy it by executing:
execute procedure install_jar("file:$INFORMIXDIR/extend/vldb/deister-informix-vldb.jar", "deister_vldb", 1);
Possible exception
If you get an exception executing this procedure like:
Can't find $INFORMIXDIR/extend/krakatoa/jre/bin/j9vm/libjvm.so
check the correct path of the jre symbolic link (explained in a previous section).
You can check the log file defined in the JVPLOGFILE variable.
Installing jar s3.informix.deister_vldb...
Deploying file: deploy_FSBlob.txt
Deploying file: deploy_S3Blob.txt
Deploying file: deploy_FSUtils.txt
Deploying file: deploy_Digest.txt
Deploying file: deploy_VMUtils.txt
Deploying file: deploy_S3Utils.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...
2.1 Updating VLDB
To update the VLDB software, you need to upload a new JAR file to the database and re-register all functions.

Download deister-informix-vldb.jar from the Deister Nexus server using curl and save the file to $INFORMIXDIR/extend/vldb, overwriting the existing JAR file.

First, unregister the existing UDT functions and external names:

execute procedure unregisterJUDTfuncs("fsblob");
execute procedure unsetUDTextName("fsblob");
execute procedure unregisterJUDTfuncs("s3blob");
execute procedure unsetUDTextName("s3blob");
Older versions of the VLDB blade did not drop objects used by the module, which prevented the module from being unregistered. In that case, remove them manually:

-- Required for the unregister
drop function fsblob(blob);
drop function fsblob(lvarchar);
drop function fsblob(clob);
drop function fsblob2clob;
drop function s3blob2clob;
-- Required for the new registration
drop cast (fsblob as clob);
drop cast (s3blob as clob);
Remove the currently installed JAR:
execute procedure remove_jar_undeploy("deister_vldb");
Upgrade the JAR files

THIS ACTION IS REQUIRED TO CONTINUE.

Once the current module is deregistered, you must update the JAR files in the corresponding path (extend/vldb):

cd $INFORMIXDIR/extend/vldb
mkdir OLD
mv *.jar OLD
cp $PATH/informix-vldb.jar deister-informix-vldb.jar
cp $PATH/informix-vldb-libs.jar deister-informix-vldb-libs.jar
DO NOT REGISTER THE NEW MODULE UNTIL YOU HAVE UPGRADED THE JAR FILES.

Install the new JAR, indicating the correct path (it can be the same if the new JARs are located in the same path as the previous version):
execute procedure install_jar("file:$INFORMIXDIR/extend/vldb/deister-informix-vldb.jar", "deister_vldb", 1);
Finally, register the module on the database:

execute procedure registerJUDTfuncs("FSBlob");
execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");
2.1.1 Known problems
- Performing the following select on an fsblob field:

SELECT * FROM fstable;

gets an exception:

9634: No cast from fsblob to lvarchar

This is caused by the module not having been registered. Register it:

execute procedure registerJUDTfuncs("FSBlob");
execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");
3 Virtual Machine
VLDB includes a simple set of virtual machine commands to allow monitoring of the VM status.
3.1 vmGC
Forces a full GC on the VM.
execute procedure vmgc();
3.2 vmThreads
Returns thread execution state and shows thread dumps in JVPLOGFILE.
execute function vmThreads();
1 RUNNABLE main
2 WAITING Reference Handler
3 WAITING Finalizer
4 RUNNABLE Signal Dispatcher
8 RUNNABLE ReaderThread
3.3 vmMemory
Returns a description of the VM memory pools.
execute function vmmemory();
(expression) Pool Type Used Commit Max % Used
(expression) Java heap HEAP 18 128 1024 1.77
(expression) class storage NON_HEAP 7 7 0 0.00
(expression) JIT code cache NON_HEAP 8 8 0 0.00
(expression) JIT data cache NON_HEAP 0 2 0 0.00
(expression) miscellaneous non- NON_HEAP 0 1 0 0.00
3.4 vmTotalMemory
Returns VM's total memory in bytes.
execute function vmtotalmemory();
134217728
3.5 vmFreeMemory
Returns VM's free memory in bytes.
execute function vmfreememory();
111141208
3.6 vmDebug
Enables or disables debug (verbose) output to JVPLOGFILE. To enable the debug log, set a value > 0.
execute procedure vmdebug(1);
When you execute operations over an fsblob field, such as:

execute procedure vmdebug(1);
select b::blob from fstable where key < 100;

you can check the logs in the defined file.
16:22:17.958 [jvp-worker-1] INFO jvp - sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2) - Verbose mode set to true
16:22:17.960 [jvp-worker-1] INFO i.jvp.dbapplet.impl.DBAThreadImpl - Finish runDBWork
16:22:17.960 [jvp-worker-1] INFO i.jvp.dbapplet.impl.DBAThreadImpl - return thread Thread[jvp-worker-1,9,main],state = DESTROY T#1 back to pool
16:22:17.960 [jvp-worker-1] INFO i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-1,9,main],state = IN-POOL T#1 IN POOL WAITS FOR WORK
16:22:17.961 [main] INFO i.j.dbapplet.impl.UDRDBAContextImpl - Acquiring a jvp-worker thread: open context for deister_vldb:deister.informix.vldb.store.fs.FSBlob of thread group java.lang.ThreadGroup[name=DBAPPLET,maxpri=10]
16:22:17.961 [main] INFO i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-2,9,main],state = IN-POOL T#2 IN POOL GETS WORK
16:22:17.961 [main] INFO jvp - PerJavaUDRContext::constructor
16:22:17.961 [jvp-worker-2] INFO i.jvp.dbapplet.impl.DBAThreadImpl - start runDBWork
16:22:17.963 [jvp-worker-2] INFO i.j.dbapplet.impl.PerJavaUDRContext - Initialising JAVA UDR Context
16:22:24.827 [jvp-worker-2] INFO i.jvp.dbapplet.impl.DBAThreadImpl - Finish runDBWork
16:22:24.827 [jvp-worker-2] INFO i.jvp.dbapplet.impl.DBAThreadImpl - return thread Thread[jvp-worker-2,9,main],state = DESTROY T#2 back to pool
16:22:24.827 [jvp-worker-2] INFO i.jvp.dbapplet.impl.DBAThreadImpl - Thread Thread[jvp-worker-2,9,main],state = IN-POOL T#2 IN POOL WAITS FOR WORK
3.7 vmDump
Generates a full thread dump to JVPLOGFILE.
execute procedure vmdump();
4 Blob Offloading
BLOBs are binary objects that are created once, read many times, never modified, and sometimes deleted. BLOB types include photos, videos, documents, XML data, JSON data and many other structured or unstructured data types. Storage for BLOBs must be reliable and cost-effective, and able to hold large amounts of unstructured data.

One point of contention tends to arise with the topic of BLOB data: is it better to integrate BLOBs within the database or store them in the file system? Each method has its advantages and disadvantages. If you choose the database route, storing BLOB data can dramatically increase the size of your databases.

The common alternative to this technique is storing files outside the database, keeping in the database a field containing a file path or URL to the object. This separate storage method has some advantages over integrating BLOB data within the database. It can be faster, because reading data from the file system involves a bit less overhead than reading data from a database. And without the BLOBs, your databases are smaller. However, a unique naming or storage scheme for the OS files is usually needed to clearly identify the potentially thousands or even billions of files.

The Deister VLDB Datablade was designed to solve this problem with the FSBlob and S3Blob opaque types.
4.1 FSBlob
The FSBlob opaque type lets you store a BLOB in a filesystem linked to the database instance. Any regular or replicated filesystem configured with the appropriate size and number of inodes can be used.

A hierarchical structure of directories will automatically be created as required. The structure of this directory tree has three levels:

- Root directories, with five-digit numbering, up to 32000 entries.
- Each root directory can have up to 10000 subdirectories named from 00001 to 10000.
- Each subdirectory will store up to 10000 BLOB files. The naming convention for these files is a sequential number from 00001 to 10000 followed by ".b", representing a BLOB file.

The maximum number of files is based on the root directory counter: $$ maxfiles = RootDirs ^ 3 $$
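For a concrete sense of scale (reading the formula with 10000 entries per level, the limit of the two lower levels): $$ 10000^3 = 10^{12} $$ so a single bucket can address on the order of a trillion BLOB files.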
4.1.1 Install FSBlob Support
Define FSBlob Opaque Type
Create the SQL routine and cast definitions for the I/O functions of FSBlob by executing registerJUDTfuncs, and define the mapping between a user-defined data type and a Java™ class by executing setUDTExtName.

execute procedure registerJUDTfuncs("FSBlob");
execute procedure setUDTExtName("fsblob", "deister_vldb:deister.informix.vldb.store.fs.FSBlob");

These commands register the FSBlob functions in your database and declare a new type of column called fsblob. To check everything worked properly, this new type should be defined in your database as:
select * from sqlj.udtextnames;
sqlname fsblob
externalname cpe.informix.deister_vldb:deister.informix.vldb.store.fs.FSBlob
Declare FSBlob Storage
During datablade deployment, the table vldb_fsstore is created. This table defines the filesystems where "fsblobs" will be stored.

Field | Type | Description |
---|---|---|
seqno | serial | Automatic sequence number to identify a filesystem. It's associated with each FSBlob to allow you to point to data on the filesystem. |
bucket | varchar(255) | The "bucket" or filesystem path. |
closed | smallint | Indicates this filesystem is closed, so it will not be selected for data placement. |
nfiles | int8 (long) | Indicates the number of files stored. This is just statistical information. |
created | datetime | Timestamp of the filesystem declaration. |

To store an FSBlob, the datablade will look for the first non-closed entry in vldb_fsstore ordered by seqno, and will store files in this directory.
Provisioning Filesystems (Buckets)
Assuming you have a /data1 filesystem available for BLOB storage, simply do the following:

insert into vldb_fsstore (bucket) values ("/data1");
select * from vldb_fsstore;
seqno 1
bucket /data1
closed 0
nfiles 0
created 2017-06-20 17:01:56
Permissions on Blob storage directories should be 660 and owned by user informix and group informix.
List Provisioning
The fsls command shows the available (not closed) provisioning filesystems in order of precedence.
execute function fsls();
/data1
4.1.2 FSBlob Type
Now you can create a table using the FSBlob column type. In this example, we will create a simple table to store a key and an FSBlob.
CREATE TABLE fstable(key integer not null, b fsblob);
The FSBlob type structure requires 30 bytes and has the following structure:

struct FSBlob {
    int4     bucketID,
    short    path1,
    short    path2,
    short    leaf,
    byte[16] md5,
    int4     size
}
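The 30-byte size follows from the field widths: 4 (bucketID) + 2 + 2 + 2 (the three path counters) + 16 (md5) + 4 (size) = 30 bytes.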
When data is deleted from a table (or the table is dropped), database-linked files in the filesystem will also be removed. Also, when an FSBlob field is set to null, its data is deleted from disk.
4.1.3 Insert an FSBlob
To insert an FSBlob into a table, you can use Blob constructors (from a string or from a Blob) or directly send data to the FSBlob.

- Inserting a Blob from a long varchar text requires the FSBlob constructor (a string can't automatically be converted to FSBlob because the built-in deserializer uses the string as the Blob representation).

-- using constructor
insert into fstable values(1, fsblob("This is text from entry 1"));

- To insert a Blob from a Blob type, you can choose whether to use the FSBlob constructor (you can also pass a Blob type directly).

-- using constructor
insert into fstable values(2, fsblob(FILETOBLOB("/filesrc/mydocument.pdf", "client")));
-- direct load passing a blob
insert into fstable values(2, FILETOBLOB("/filesrc/mydocument.pdf", "client"));
4.1.4 Select an FSBlob
When selecting an FSBlob, you will see its representation as:
storageId:storage folder:file path:md5:size
select * FROM fstable;
key 1
b 1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25
key 2
b 1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857
4.1.5 Delete FSBlob(s)
To delete a Blob:
DELETE FROM fstable WHERE key = 1;
This statement deletes rows from Informix tables, but also deletes every file in remote filesystems pointed to by the FSBlob columns contained in these rows.
$ ls -l /data1/0001/0001
total 192
-rw-rw-r-- 1 informix informix 191857 Aug 31 19:53 0002
4.1.6 Update an FSBlob
You can update an FSBlob as a regular Blob. When a Blob is set to null, the Blob content is removed from storage.
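For example, a minimal sketch against the sample fstable (the FILETOBLOB path is illustrative):

-- replace the content: the old remote file is released and a new one is written
UPDATE fstable SET b = FILETOBLOB("/filesrc/newdocument.pdf", "client") WHERE key = 2;
-- clear the value: the remote file behind the old FSBlob is deleted from storage
UPDATE fstable SET b = NULL WHERE key = 2;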
4.1.7 Casting to Blob
An FSBlob can be cast to a regular database Blob by using an explicit cast:
select key, b::blob, lotofile(b::blob, "/tmp/fsb", "client") from fstable;
key 1
(expression) <SBlob Data>
(expression) /tmp/fsb.00000000598ae7ef
key 2
(expression) <SBlob Data>
(expression) /tmp/fsb.00000000598ae7f0
4.1.8 FSBlob Metadata
You can obtain FSBlob metadata using functions provided by the datablade. The following code shows the bucket (filesystem), the filename (its path under the filesystem), the size and the stored MD5. You can also use MD5(FSBlob) to compute the MD5 hash from the data (it must match the stored value if the data has not been modified externally).
select b, FSBucket(b) bucket, FSName(b) name, FSSize(b) size, FSMd5(b) md5a, MD5(b) md5b FROM fstable;
b 1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25
bucket /data1
name 0001/0001/0001
size 25
md5a 76fb2ba51550cfbe5d19240fce7d43d3
md5b 76fb2ba51550cfbe5d19240fce7d43d3
b 1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857
bucket /data1
name 0001/0001/0002
size 191857
md5a 973d7111fbcf0492434e6e7d5aa15f48
md5b 973d7111fbcf0492434e6e7d5aa15f48
4.1.9 Unloading and Loading FSBlobs
You can unload a table with FSBlob columns using the unload command as usual.
unload to "fstable.unl" SELECT * FROM fstable;
1|1:/data1:0001/0001/0001:76fb2ba51550cfbe5d19240fce7d43d3:25|
2|1:/data1:0001/0001/0002:973d7111fbcf0492434e6e7d5aa15f48:191857|
Warning: the information unloaded for an FSBlob column is the remote file pointer; it does not include the file content. Do not trust the unload command to create a backup of your remote files.
If you want to migrate tables with FSBlobs between different servers, you can choose two different methods:
Saving FSBlob remote files manually
- Unload table contents: unload to "fstable.unl" SELECT * FROM fstable;
- Save remote storage folder contents (/data1) or rename folder before dropping table.
- Drop table in original server: drops all database rows and linked files in remote filesystem.
- Load table contents in new database or table.
- Restore remote filesystem contents or rename remote filesystem to its original name.
unload to "fstable.unl" SELECT * FROM fstable;
$ mv /data1 /data1.bak
truncate fstable;
load from "fstable.unl" insert into fstable;
$ mv /data1.bak /data1
Fully unload database rows and remote FSBlob filesystem content
- Unload table contents, selecting all table columns but replacing the FSBlob column with the LOToFile function. This takes the remote filesystem files and creates new file copies in an unload folder.
- Drop table in original server: drops all database rows and linked files in remote filesystem. This is not an issue, because all of these files have been saved by the unload with LOToFile statement.
- Load table contents in new database or table.
unload to "fstable.unl" SELECT key, lotofile(b, "bin_file", "client") FROM fstable;
truncate fstable;
load from "fstable.unl" insert into fstable;
4.1.10 Copying FSBlob Metadata and Contents
Regular copies of FSBlob columns copy only metadata pointers to remote files. Do not rely on copied pointers, as deleting these rows can cause remote files to be erased and pointers in the original table to be orphaned.

select * from fstable into temp tmp_1;
delete from tmp_1;
select *, lotofile(b::blob, "/tmp/bfs", "client") from fstable;
(38000) - Unknown throwable: (java.io.FileNotFoundException: /data1/0001/0001/00
To copy the contents of an FSBlob and generate new pointers to the copied data, you must use the standard Informix LOCOPY function.

select key, locopy(b) b from fstable into temp tmp_1;
select * from fstable;
select * from tmp_1;
delete from tmp_1;
select *, lotofile(b::blob, "/tmp/fs", "client") from fstable;
key 1
b 1:/data1:0001/0001/0018:76fb2ba51550cfbe5d19240fce7d43d3:25
key 1
b 1:/data1:0001/0001/0020:76fb2ba51550cfbe5d19240fce7d43d3:25
key 1
b 1:/data1:0001/0001/0018:76fb2ba51550cfbe5d19240fce7d43d3:25
(expression) /tmp/fs.00000000598ae818
LOCOPY generates a new 0001/0001/0020 file, copying the original 0001/0001/0018 file and changing the pointer in the new table (tmp_1). When the command deletes rows from the temporary table, file 0001/0001/0020 is deleted, but the original 0001/0001/0018 remains and is still linked from the original fstable row.
4.1.11 Migrate from Blob to FSBlob
Migrating from the original BLOB or CLOB Informix column to the new FSBlob storage type is very easy. As an example, let's look at an original table being migrated to use FSBlobs:

CREATE TABLE ifx_blob(
    key integer,
    bl  blob
);
insert into ifx_blob values(1, FILETOBLOB("/etc/hosts", "client"));
To migrate the BLOB column to FSBlob, you can add a new FSBlob column (fs) to the table, transfer contents from the original BLOB column to the new remote filesystem column, and then drop the original column (bl), allowing the new FSBlob column (fs) to be renamed to the original name (bl).

alter table ifx_blob add(fs fsblob);
UPDATE ifx_blob SET fs = bl::fsblob WHERE fs is null;
alter table ifx_blob drop(bl);
rename column ifx_blob.fs to bl;
4.1.12 Integrity
As Blob data is stored outside the database, there is no way to ensure it is not modified or removed outside database control. It is the responsibility of system administrators to keep storage strictly coordinated with databases. Even so, FSBlob provides some utilities to verify integrity.
Data Missing
If files in the fileset storage are removed or missing, the content returned to the database server is null.
select lotofile(b::blob, "/tmp/fs", "client") from fstable;
(expression)
Checking for missing files in the repository is faster with the FSISNULL function. This function verifies whether a file exists in the provisioned storage without reading the Blob. The following SQL shows that the Blob for key 1 is missing:
select key, b from fstable where FSISNULL(b) AND b IS NOT NULL;
key 1
b 1:/data1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25
Data Consistency
We can inspect the storage, where we see the two Blobs added from rows inserted into an Informix table containing an FSBlob column.
$ ls -l /data1/0001/0001
total 192
-rw-rw-r-- 1 informix informix 25 Aug 31 19:53 0001
-rw-rw-r-- 1 informix informix 191857 Aug 31 19:53 0002
Now modify the text file by editing it and applying a simple change:
This is text from entry 1
This is text from entry 2
Reselect the Blobs and their MD5.
select key, fsmd5(b), md5(b) from fstable;
key (expression) (expression)
1 76fb2ba51550cfbe5d19240fce7d43d3 c940678bce753fb829b53aa53d8afc13
2 973d7111fbcf0492434e6e7d5aa15f48 973d7111fbcf0492434e6e7d5aa15f48
You can see how the MD5 of key 1 has changed. Its stored value in the FSBlob metadata is 76fb2ba51550cfbe5d19240fce7d43d3, while the computed value is c940678bce753fb829b53aa53d8afc13.
Data Verification
From the previous example, we can deduce a way to determine whether a Blob has been unsealed (modified outside database scope): simply compare the MD5 stored in the FSBlob opaque type with the computed MD5 to find modified files.
select key from fstable where fsmd5(b) != md5(b);
1
Transaction Handling
FSBlob file references cannot be recovered during a rollback. This means that during an insert, delete or update operation, the file is affected independently of any rollback event.
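The following sketch illustrates the caveat (hypothetical key; the exact file-level effect depends on the operation):

begin work;
insert into fstable values(99, fsblob("transient entry"));
rollback work;
-- the row for key 99 is rolled back, but the file written for it
-- may remain in the bucket and must be cleaned up administratively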
4.1.13 Test load
This section shows an example to test the performance of inserting a large number of files into the table.

Compile the following procedure, changing the path to the appropriate file name:

CREATE PROCEDURE test_insert_fsblob(num INT)
    DEFINE i INTEGER;
    LET i = 1;
    WHILE i < num
        insert into fstable values(0, fsblob(FILETOBLOB("/data_fs/vldb/LOAD/master.pdf", "client")));
        LET i = i + 1;
    END WHILE;
END PROCEDURE;
Then execute the procedure:
execute procedure test_insert_fsblob(50000);
You can unload the data of the table and check file consistency by executing the following shell script. First save the content, then modify the WHERE clause and put the appropriate file path:

cat > check_unload.sh

#!/bin/sh
echo "unload to 'fstable.unl' SELECT lotofile(b::blob, '/data_fs/vldb/UNLOAD/bfs', 'client') from fstable where key < 2" | dbaccess demo_vldb
md5master=`md5sum /data_fs/vldb/LOAD/master.pdf | awk '{print $1}'`
for i in $( ls /data_fs/vldb/UNLOAD/bfs*); do
    echo "Check File: $i"
    md5file=`md5sum $i | awk '{print $1}'`
    if [ "$md5master" != "$md5file" ]; then
        echo "ERROR ON FILE $i MD5 MISMATCH"
        exit -1
    fi
done
Execute the script from the command line:

chmod 754 check_unload.sh
./check_unload.sh
4.2 Log to JVPLOGFILE
The log messages are written to the JVPLOGFILE. The system automatically sends basic information to the message log about the registration of the UDT external name, the JARs added and the working directory. If you need more information, such as the executed queries, the paths of FSBlobs and the creation of BLOB instances, you can execute the procedure
execute procedure vmdebug(1);
to activate the extended log mode, as you can see in the text below:
Installing jar s3.informix.deister_vldb...
Deploying file: deploy_vldb.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...
ADDING TO JARMAP: s3.informix.deister_vldb + informix.jvp.dbapplet.impl.JVPClassLoader@d8ecd5b
T#13 2020/01/15 13:14:00.631 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:93) - Scanning counter position for /tmp/fs1
T#13 2020/01/15 13:14:00.634 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:107) - Counters [00001, 00001, 00000] 0 file(s)
Installing jar s3.informix.deister_vldb...
Deploying file: deploy_vldb.txt
Registering UDT support functions for type FSBlob...
Setting UDT external name as deister_vldb:deister.informix.vldb.store.fs.FSBlob for type fsblob...
ADDING TO JARMAP: s3.informix.deister_vldb + informix.jvp.dbapplet.impl.JVPClassLoader@a3d54ce
T#1 2020/01/15 13:24:47.485 JVP#11: deister.informix.vldb.vm.VMUtils.vmDebug(VMUtils.java:41) - Verbose mode set to true
T#3 2020/01/15 13:24:47.488 JVP#11: deister.informix.vldb.util.jdbc.JDBCDirect.executeQuery(JDBCDirect.java:120) - EXECUTE QUERY SELECT * FROM VLDB_FSStore WHERE closed = 0 ORDER BY seqno
T#3 2020/01/15 13:24:47.491 JVP#11: deister.informix.vldb.store.fs.FSBlob.<init>(FSBlob.java:74) - Create
T#3 2020/01/15 13:24:47.491 JVP#11: deister.informix.vldb.store.fs.FSStore.nextUUID(FSStore.java:85) - Initialized FSPath for /tmp/fs1
T#3 2020/01/15 13:24:47.492 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:93) - Scanning counter position for /tmp/fs1
T#3 2020/01/15 13:24:47.493 JVP#11: deister.informix.vldb.util.jdbc.JDBCDirect.executeQuery(JDBCDirect.java:120) - EXECUTE QUERY select count(*) from sysmaster:sysvpprof where class='jvp'
T#3 2020/01/15 13:24:47.495 JVP#11: deister.informix.vldb.store.fs.FSPath.<init>(FSPath.java:107) - Counters [00001, 00001, 00000] 0 file(s)
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.writeSQL(FSBlob.java:327) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLOutStream@555c5809
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.<init>(FSBlob.java:61) - Created FSBlob
T#3 2020/01/15 13:24:47.498 JVP#11: deister.informix.vldb.store.fs.FSBlob.readSQL(FSBlob.java:174) - SQLInput com.informix.jdbc.IfmxSQLInStream@57756e2d typeName fsblob
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.readSQL(FSBlob.java:211) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLInStream@57756e2d
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.assign(FSBlob.java:381) - assign 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25
T#3 2020/01/15 13:24:47.499 JVP#11: deister.informix.vldb.store.fs.FSBlob.writeSQL(FSBlob.java:327) - 1:/tmp/fs1:00001/00001/00001.b:76fb2ba51550cfbe5d19240fce7d43d3:25 - using com.informix.jdbc.IfmxSQLOutStream@3570f42a
T#4 2020/01/15 13:24:47.502 JVP#11: deister.informix.vldb.store.fs.FSBlob.build(FSBlob.java:529) - Build getInputStream com.informix.jdbc.IfxBblob@7ceef038 ENV informix.jvp.dbapplet.impl.UDREnvImpl@c939ba96
T#6 2020/01/15 13:24:47.522 JVP#11: deister.informix.vldb.store.fs.FSBlob.get(FSBlob.java:349) - get 1:/tmp/fs1:00001/00001/00002.b:196571480fd64d2d598c82a4a2d2289a:192214 exists=true
... (raw file bytes elided) ...
4.3 S3Blob
S3Blob is an opaque type for RBS (Remote Blob Storage). By using S3Blob, a Blob is stored in or retrieved directly from an Amazon S3 bucket.

S3 files are autonamed using 128-bit UUIDs (122 random bits + 4 version bits + 2 bits for the Leach-Salz variant).
4.3.1 Prepare S3 Account
Before using the S3 Blob, you must configure Informix with AWS settings. Under the Informix home directory, create a .aws directory.
AWS Config
The config file should contain your default S3 region. For example:
[default]
region = eu-central-1
AWS Credentials
The credentials file should contain your S3 credentials. For example:
[default]
aws_access_key_id = BIAIXGCB44N....
aws_secret_access_key = X3Qu7hk3HDtDFjsdhh342lovG......
AWS Client
Install the AWS command-line interface and, after installation, test whether you can connect to S3 using your default configuration and credentials. For example, list your buckets by using:
$ aws s3 ls
2017-08-22 19:32:57 s3bucket1
2017-08-19 15:18:56 s3bucket2
4.3.2 Install S3Blob Support
Define S3Blob Opaque Type
Create the SQL routine and cast definitions for the I/O functions of S3Blob by executing registerJUDTfuncs, and define the mapping between a user-defined data type and a Java™ class by executing setUDTExtName.

execute procedure registerJUDTfuncs("S3Blob");
execute procedure setUDTExtName("S3Blob", "deister_vldb:deister.informix.vldb.store.amazon.s3.S3Blob");

These commands register the S3Blob functions in your database and declare a new type of column called S3Blob. To check everything worked properly, this new type should be defined in your database as:
select * from sqlj.udtextnames;
sqlname S3Blob
externalname s3.informix.deister_vldb:deister.informix.vldb.store.amazon.s3.S3Blob
Declare S3Blob Storage
During datablade deployment, the vldb_s3store table is created. This table defines the S3 buckets where S3Blobs will be stored.

Field | Type | Description |
---|---|---|
seqno | serial | Automatic sequence number to identify a bucket. It's associated with each S3Blob to point to data in the bucket. |
bucket | varchar(64) | The "bucket" on S3. |
closed | smallint | Indicates this bucket is closed, so it will not be selected for data placement. |
nfiles | int8 (long) | Indicates the number of files stored. This is just statistical information. |
created | datetime | Timestamp of the bucket declaration. |

To store an S3Blob, the datablade will look for the first non-closed entry in vldb_s3store ordered by seqno, and will store files in this bucket.
Provisioning S3 Buckets

Assuming you have an S3 bucket named s3bucket1 available for Blob storage, simply do the following:

insert into vldb_s3store (bucket) values ("s3bucket1");
select * from vldb_s3store;
seqno 1
bucket s3bucket1
closed 0
nfiles 0
created 2017-06-20 17:01:56
You can provision more buckets when needed and change the closed state at any time. Upon each modification, the internal bucket cache must be reset by calling the s3reset procedure; in this way, changes take effect immediately.
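For example, after closing or adding a bucket row (assuming s3reset takes no arguments):

execute procedure s3reset();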
To start using a new bucket, insert a new row in the vldb_s3store table and mark the old row as closed.

insert into vldb_s3store (bucket) values ("s3bucket2");
update vldb_s3store set closed = 1 where seqno = 1;
Permissions on Blob storage directories should be 660 and owned by user informix and group informix.
List Provisioning
The s3ls command shows the available (not closed) provisioning buckets in order of precedence.
execute function s3ls();
s3bucket1
4.3.3 S3Blob Type
Now you can create a table using the S3Blob column type. In this example, we will create a simple table to store a key and an S3Blob.
CREATE TABLE s3table(key integer not null, b s3blob);
The S3Blob type structure requires 40 bytes and has the following structure:

struct s3blob {
    int4     bucketID,
    byte[16] uuid,
    byte[16] md5,
    int4     size
}
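The 40-byte size follows from the field widths: 4 (bucketID) + 16 (uuid) + 16 (md5) + 4 (size) = 40 bytes.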
When data is deleted from a database table (or the table is dropped), a command is sent to Amazon S3 to remove the associated data. Also, when an S3Blob field is set to null, the linked data is deleted from S3.
4.4 Using JDBC
FSBlob and S3Blob types can be used with JDBC, using prepared statements by means of the appropriate type constructor. The following table shows the mapping for different data types.

Source type | Parameter | PreparedStatement |
---|---|---|
String | FSBlob(?) | setString |
byte[] | FSBlob(?) | setBytes |
Blob | FSBlob(?) | setBlob |
InputStream | FSBlob(?) | setBinaryStream |

Assuming the example table below:
create table fstable(c1 integer, c2 fsblob);
4.4.1 Inserting Data
private String DATA = "This is a test";

@Test
public void insertTypeString() throws SQLException {
    PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
    st.setInt(1, 1);
    st.setString(2, DATA);
    st.execute();
    st.close();
}

@Test
public void insertTypeBytes() throws SQLException {
    PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
    st.setInt(1, 1);
    st.setBytes(2, DATA.getBytes());
    st.execute();
    st.close();
}

@Test
public void insertTypeBlob() throws SQLException {
    PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
    st.setInt(1, 1);
    st.setBlob(2, new myBlob(DATA.getBytes()));
    st.execute();
    st.close();
}

@Test
public void insertTypeStream() throws SQLException {
    PreparedStatement st = m_conn.prepareStatement("INSERT INTO fstable VALUES (?, fsblob(?))");
    st.setInt(1, 1);
    st.setBinaryStream(2, new ByteArrayInputStream(DATA.getBytes()));
    st.execute();
    st.close();
}
4.4.2 Selecting Data
To select an FSBlob or S3Blob data type as a BLOB or CLOB, you must use the appropriate CAST in the SQL statement. This allows you to read the data as usual with the appropriate ResultSet methods.

Type | Cast | SQL | Java function |
---|---|---|---|
FSBlob, S3Blob | blob | SELECT c2::blob FROM ... | getBinaryStream() |
FSBlob, S3Blob | clob | SELECT c2::clob FROM ... | getString() |

If you don't use a type cast, the resulting data will be of the VLDB type (FSBlob or S3Blob). If you try to select this type, you will get:

java.sql.SQLException: A customized type map is required for this data type.
Setting Type Map
To read the underlying data types, you must set the type mapping before using the database connection. You must provide the type map by setting the class to load for a given type:

HashMap<String, Class<?>> map = new HashMap<String, Class<?>>();
map.put("fsblob", FSBlob.class);

// set type map to connection
connection.setTypeMap(map);
Now, let's declare a class for the FSBlob type on the client side (notice that VLDB Blob types only work on the server side, as they refer to data that is only accessible from the server). This class will allow us to decode the binary information of the Blob struct stored in the opaque type.

- The FSBlob class should implement the SQLData Java interface.

import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

public class FSBlob implements SQLData {

    private String m_typeName;

    /** MD5 (32 hex = 16 bytes) */
    protected final int OBJECT_HASH_LENGTH = 16;

    // Fields decoded from the opaque type
    private int m_bucketSeqno;
    private String m_bucketName; // resolved from vldb_fsstore via m_bucketSeqno
    private FSUUID m_objectName;
    private MD5 m_objectHash;
    private int m_objectSize;

    @Override
    public String toString() {
        return String.format("%s[%s]", getClass().getSimpleName(), m_typeName);
    }

    @Override
    public String getSQLTypeName() throws SQLException {
        return m_typeName;
    }

    @Override
    public void readSQL(SQLInput stream, String typeName) throws SQLException {
        m_typeName = typeName;
        m_bucketSeqno = stream.readInt();
        m_objectName = new FSUUID(m_bucketName, stream.readShort(), stream.readShort(), stream.readShort());
        m_objectHash = new MD5(stream.readBytes()); // the server writes the 16-byte MD5 as the next attribute
        m_objectSize = stream.readInt();
    }

    @Override
    public void writeSQL(SQLOutput stream) throws SQLException {
        throw new SQLException("Not supported");
    }
}
- The FSUUID class encapsulates a file location in a filesystem.

class FSUUID {

    public static int PATH_NUMBER_DIGITS = 5;
    public static String PATH_NUMBER_FORMAT = "%05d";

    private String m_bucketName;
    private short m_counter1;
    private short m_counter2;
    private short m_counter3;

    public FSUUID(String bucketName, short leaf1, short leaf2, short leaf3) {
        m_bucketName = bucketName;
        m_counter1 = leaf1;
        m_counter2 = leaf2;
        m_counter3 = leaf3;
    }

    @Override
    public String toString() {
        return String.format("%05d/%05d/%05d.b", m_counter1, m_counter2, m_counter3);
    }

    // ... other methods to access files on the server side (they do not work on the client side)
}
- The MD5 class encapsulates an MD5 hash (16 bytes) to allow a hexadecimal string representation.

class MD5 {

    private byte[] m_md5;

    public MD5(byte[] data) {
        m_md5 = data;
    }

    @Override
    public String toString() {
        return m_md5 != null ? convert(m_md5) : "";
    }

    /**
     * Convert a byte array into a printable format containing a String of
     * hexadecimal digit characters (two per byte).
     *
     * @param bytes Byte array representation
     */
    public final static String convert(byte bytes[]) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (int i = 0; i < bytes.length; i++) {
            sb.append(convertDigit(bytes[i] >> 4));
            sb.append(convertDigit(bytes[i] & 0x0f));
        }
        return (sb.toString());
    }

    /**
     * Convert the specified value (0 .. 15) to the corresponding
     * hexadecimal digit.
     *
     * @param value Value to be converted
     */
    private final static char convertDigit(int value) {
        value &= 0x0f;
        if (value >= 10)
            return ((char) (value - 10 + 'a')); // lowercase
        else
            return ((char) (value + '0'));
    }
}
4.5 Blob Encryption
As Blobs are stored outside the database, VLDB provides a way to keep them encrypted. To initiate VLDB Blob encryption, you must start the Informix JVM with special properties.

Property | Default value | Description |
---|---|---|
vldb.crypt | false | If true, enables encryption using the default or provided password. |
vldb.salt | 0 | An optional numeric salt between 0 and 32768 for the random number generator. Changing this value will affect the ability to decrypt previously stored values. The default value is 0. |
To enable default encryption, edit the JVPARGS property in $ONCONFIG and add vldb.crypt:
JVPARGS -Dcom.ibm.tools.attach.enable=no;-Xms128m;-Xmx1024m;-Dvldb.crypt=true
When data is encrypted, a special header is added to the file. This allows you to enable or disable encryption at any time. Encrypted data will still be retrieved correctly, even when encryption is disabled, because its header will determine that it must be decrypted.

The password should never be changed once the encryption process has begun.
5 Hash Coordinator
5.1 Data Placement
VLDB includes algorithms that control data placement inside a distributed database. These algorithms are responsible for mapping between data items and physical nodes. Before looking into the VLDB functions for data placement, let's introduce some hash techniques.
5.1.1 Rebalancing
Let us start with a simple protocol that aims to provide outage-free data migration between cluster nodes. This task arises in situations like cluster expansion (new nodes are added), failover (some node goes down), or rebalancing (data becomes unevenly distributed across the nodes). Consider the situation depicted in section (A) of the figure below: there are three nodes, and each node contains a portion of data (we assume a key-value data model without loss of generality) that is distributed across the nodes according to an arbitrary data placement policy:

If one does not have a database that supports data rebalancing internally, one will probably deploy several instances of the database to each node, as shown in section (B) of the figure above. This allows manual cluster expansion by turning a separate instance off, copying it to a new node, and turning it on, as shown in section (C).
5.1.2 Sharding and Replication
The next question we have to address is how to map records to physical nodes. A straightforward approach is to have a table of key ranges, where each range is assigned to a node, or to use a procedure like NodeID = hash(key) % TotalNodes. However, modulus-based hashing does not explicitly address cluster reconfiguration, because the addition or removal of nodes causes complete data reshuffling throughout the cluster. As a result, it is difficult to handle replication and failover.
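To see the reshuffling problem concretely, here is an illustrative query (plain SQL, using tabid as a stand-in key) that compares modulus placement for 6 and 7 nodes:

select tabid, mod(tabid, 6) node_of_6, mod(tabid, 7) node_of_7
from systables
where tabid <= 12;

Almost every key lands on a different node after the change, whereas a consistent hashing scheme would move only about 1/7 of the keys.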
There are different ways to enhance the basic approach from the replication and failover perspectives. The most famous technique is consistent hashing. The following figure depicts the basic ideas of consistent hashing:

Consistent hashing is basically a mapping scheme for a key-value store: it maps keys (hashed keys are typically used) to physical nodes. A space of hashed keys is an ordered space of binary strings of a fixed length, so it is quite obvious that each range of keys is assigned to some node, as depicted in figure (A) for 3 nodes, namely A, B, and C. To cope with replication, it is convenient to close the key space into a ring and traverse it clockwise until all replicas are mapped, as shown in figure (B). In other words, item Y should be placed on node B because its key corresponds to B's range, the first replica should be placed on C, the second replica on A, and so on.

The benefit of this scheme is efficient addition and removal of a node, because it causes data rebalancing only in neighboring sectors. As shown in figure (C), the addition of node D affects only item X but not Y. Similarly, removal (or failure) of node B affects Y and the replica of X, but not X itself. However, as was pointed out in [8], the dark side of this benefit is vulnerability to overloads: all the burden of rebalancing is handled by the neighbors only, forcing them to replicate high volumes of data. This problem can be alleviated by mapping each node not to one range but to a set of ranges, as shown in figure (D). This is a tradeoff: it avoids skew in loads during rebalancing, but keeps the total rebalancing effort reasonably low in comparison with modulus-based mapping.
Maintenance of a complete and coherent vision of the hashing ring may be problematic in very large deployments. Although this is not a typical problem for databases, because of their relatively small clusters, it is interesting to study how data placement was combined with network routing in peer-to-peer networks. A good example is the Chord algorithm, which trades completeness of the ring vision at a single node for efficiency of query routing. The Chord algorithm is similar to consistent hashing in the sense that it uses the concept of a ring to map keys to nodes. However, a particular node maintains only a short list of peers with exponentially growing offsets on the logical ring (see the picture below). This allows one to locate a key in several network hops using a kind of binary search:

This figure depicts a cluster of 16 nodes and illustrates how node A looks up a key that is physically located on node D. Part (A) depicts the route and part (B) depicts partial visions of the ring for nodes A, B, and C.
5.2 Consistent Hashing
VLDB includes a fast and simple consistent hash implementation that can be used to generate horizontal data shards on Informix databases.

As seen before, consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. By contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped, because the mapping between the keys and the slots is defined by a modular operation.
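As a worked example: with K = 1,000,000 keys and n = 10 slots, a resize remaps only on the order of K/n = 100,000 keys (one in ten), while a modulus-based scheme would remap nearly all 1,000,000.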
5.2.1 Define Nodes
Before using the hash coordinator, you must define the nodes in the group. The table vldb_coord_nodes contains the nodes that will be used for consistent hashing. For example, assuming you have 6 nodes, n1 to n6:

INSERT INTO vldb_coord_nodes (node) VALUES ("n1");
INSERT INTO vldb_coord_nodes (node) VALUES ("n2");
INSERT INTO vldb_coord_nodes (node) VALUES ("n3");
INSERT INTO vldb_coord_nodes (node) VALUES ("n4");
INSERT INTO vldb_coord_nodes (node) VALUES ("n5");
INSERT INTO vldb_coord_nodes (node) VALUES ("n6");
5.2.2 List Nodes
Once you have nodes, you can list them to see the hash ring distribution.
execute function vldb_coord_list();
(expression) n2 46094345497948089875079948975171633019
(expression) n3 147163246512606534862005676325853375605
(expression) n5 147893521604165371384293798848531632933
(expression) n6 165390747131543269262950133155247867366
(expression) n1 237880202315960234213502779142634839473
(expression) n4 322100840680493280540581365842339131936
5.2.3 Setting Replication Factor
To provide a uniform distribution of nodes, you can use a replication factor. For example, after setting the replication factor to 4 (each node will appear 4 times), we can list the ring again. The change is made persistent.

execute procedure vldb_coord_setReplication(4);
execute function vldb_coord_list();
(expression) n6 2985069186267451340532497363503766562
(expression) n5 20994376421658530533596250750271820153
(expression) n2 46094345497948089875079948975171633019
(expression) n4 85600711671213689823494371072541489642
(expression) n3 89907211917890554484697069576326533871
(expression) n1 95017565841949300825370391209421804053
(expression) n2 105133766731246238747349254909296008364
(expression) n6 110247696212705834631285407643141124210
(expression) n3 134388743549866618614843288510624197696
(expression) n1 143097155655279886698440465011731480610
(expression) n3 147163246512606534862005676325853375605
(expression) n5 147893521604165371384293798848531632933
(expression) n6 155541610523959574369729082648463457816
(expression) n3 161127503121771633574137585322442050689
(expression) n6 165390747131543269262950133155247867366
(expression) n2 190803495377781704784933122945842263455
(expression) n5 196511581923287593309938899129465690275
(expression) n4 199640007412686395412756985075877484263
(expression) n1 237880202315960234213502779142634839473
(expression) n5 253686919698485015154416500741485145766
(expression) n2 286760289674156152109863627639238632903
(expression) n4 322008981568208832230847066875374440544
(expression) n4 322100840680493280540581365842339131936
(expression) n1 323937041115108658234608508042942466067
5.2.4 Setting Hash Algorithms
Object.hashCode implementations tend to be very fast, but have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected using a secondary hash function (which all reasonable hash table implementations in Java use).
To improve hash smoothness for large datasets, VLDB provides a number of stock hash function algorithms. By default, MD5 is used to provide hash values. You may change it to SHA-1 or even SHA-256 to increase distribution smoothness.
execute procedure vldb_coord_setAlgorithm('MD5');
5.2.5 Getting a Node for a Key
Assuming you have set the correct number of nodes and replication factor, you can start using the hash with an Integer or a String as the key argument.

In the following example, we can see the hash node for numbers between 1 and 20:
select tabid, vldb_coord_getHash(tabid) from systables;
tabid (expression)
1 n2
2 n2
3 n4
4 n1
5 n4
6 n2
7 n2
8 n2
9 n1
10 n2
11 n3
12 n2
13 n2
14 n1
15 n1
16 n2
17 n6
18 n6
19 n2
20 n1
6 Digest Functions
VLDB includes a set of MD5 hash functions that may be used on BLOB, CLOB, FSBlob and S3Blob data types.

Function | Return | Description |
---|---|---|
MD5(blob) | char(32) | MD5 on type java.sql.Blob |
MD5(clob) | char(32) | MD5 on type java.sql.Clob |
MD5(FSBlob) | char(32) | MD5 on type FSBlob |
MD5(S3Blob) | char(32) | MD5 on type S3Blob |
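For example, reusing the fstable sample from the FSBlob sections, the digest of each stored object can be computed directly in SQL:

select key, MD5(b) FROM fstable;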
7 Text Search
Full text search engines and relational databases each have unique strengths as development tools, but also have overlapping capabilities. Both can provide for storage and update of data, and both support search of the data. Full text systems are better for quickly searching high volumes of unstructured text for the presence of any word or combination of words. They provide rich text search capabilities and sophisticated relevancy ranking tools for ordering results based on how well they match a potentially fuzzy search request. Relational databases, on the other hand, excel at storing and manipulating structured data: records of fields of specific types (text, integer, currency, etc.). They can do so with little or no redundancy. They support flexible search of multiple record types for specific values of fields, as well as strong tools for quickly and securely updating individual records.
You can use VLDB text functions in conjunction with BTS (the Basic Text Search datablade) to create a text search engine on top of the Informix database using CLucene text search capabilities. BTS is built on top of CLucene, a port of the very popular Java Apache Lucene text search engine API. Most popular search engines, like Apache Solr, Lucidworks or Elasticsearch, are built using Apache Lucene.
The following functions allow PDF text extraction from Blob types.

Function | Returns | Description |
---|---|---|
getText(blob) | clob | Extract PDF text stored in java.sql.Blob |
getText(FSBlob) | clob | Extract PDF text stored in FSBlob |
getText(S3Blob) | clob | Extract PDF text stored in S3Blob |
7.1 Install BTS
To install the BTS text search datablade on a database, simply run:
EXECUTE FUNCTION sysbldprepare ('bts.*', 'create');
The BTS datablade requires a database with logging enabled.
7.2 Create a Sample Table
Let us create a sample table to store a Blob and its text content, which will be used for text search using a BTS index. In this sample, file_data will hold a Blob to store PDF documents and file_text will hold the PDF text.

CREATE TABLE btstable(
    seqno     serial not null,
    file_name varchar(60),
    file_data BLOB,
    file_text CLOB
);

CREATE INDEX i_btstable ON btstable (file_text bts_clob_ops) USING bts;
7.3 Create a Trigger to Extract Text
You can create a trigger to automatically extract text from PDFs during an insert operation.
CREATE TRIGGER btstable_ins INSERT ON btstable
    REFERENCING new AS nxt
    FOR EACH ROW (
        EXECUTE FUNCTION getText(nxt.file_data) INTO file_text
    );
7.4 Insert PDF Files
Now you are ready to insert PDF files into the sample table. When a file is inserted, PDF text extraction is triggered and the text is placed in the file_text column. As the column has a BTS index, its content is automatically indexed and immediately available for text search.
-- Insert Edgar Allan Poe books from http://www.freeclassicebooks.com/edgar_alan_poe.htm
INSERT INTO btstable (file_name, file_data) values ("CloudPortal", FILETOBLOB('pdf/CloudPortal2013.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("Eureka", FILETOBLOB('pdf/Eureka.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("Le Corbeau", FILETOBLOB('pdf/Le Corbeau.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Cask of Amontillado", FILETOBLOB('pdf/The Cask of Amontillado.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Fall of the House of Usher", FILETOBLOB('pdf/The Fall of the House of Usher.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Masque of the Red Death", FILETOBLOB('pdf/The Masque of the Red Death.pdf', 'client'));
INSERT INTO btstable (file_name, file_data) values ("The Raven", FILETOBLOB('pdf/The Raven.pdf', 'client'));
7.5 Perform text search
Once data is loaded, you can perform a text search command.
7.5.1 Simple search
For example, to search for a book containing the word campanella:
SELECT file_name FROM btstable WHERE bts_contains(file_text, "campanella");
The Fall of the House of Usher
1 row(s) retrieved.
7.5.2 Boolean Operators Search
Boolean operators combine terms in logical combinations. You can use the operators AND, OR, and NOT, or their equivalent special characters in the bts_contains() search predicate.
To search books including the words campanella or masquerade:
SELECT file_name FROM btstable WHERE bts_contains(file_text, "campanella OR masquerade");
The Masque of the Red Death
The Fall of the House of Usher
2 row(s) retrieved.
To search books including the words physical and mathematical:
SELECT file_name FROM btstable WHERE bts_contains(file_text, "physical AND mathematical");
Eureka
1 row(s) retrieved.
7.5.3 Fuzzy Search
To perform a fuzzy search on mystery~:
SELECT file_name FROM btstable WHERE bts_contains(file_text, "mystery~");
Eureka
1 row(s) retrieved.
7.5.4 Proximity Search
A proximity search allows you to specify the number of non-search words that can occur between search terms. To perform a proximity search, enclose the search terms within double quotation marks and append a tilde (~) followed by the number of non-search words allowed.
Below we perform a simple search to find the term “heaven”. We see it appears in 4 documents.
SELECT file_name FROM btstable WHERE bts_contains(file_text, "heaven");
The Raven
Le Corbeau
The Fall of the House of Usher
Eureka
4 row(s) retrieved.
Now we refine the query to search for the terms heaven and bends within three words of each other in a document.
SELECT file_name FROM btstable WHERE bts_contains(file_text, '"heaven bends"~3');
The Raven
Le Corbeau
2 row(s) retrieved.
7.6 Range Searches
With a range search, you match terms that are within the lower and upper bounds specified by the query. Range searches can be inclusive or exclusive of the upper and lower bounds. Use brackets ([ ]) in the search predicate to specify an inclusive search.
SELECT file_name FROM btstable WHERE bts_contains(file_text, "[arabesque TO arabic]");
The Masque of the Red Death
The Fall of the House of Usher
2 row(s) retrieved.
7.7 Scoring
There is an optional argument for bts_contains() called score. If score is used, the search engine uses this variable to record the document score it assigns to each row in the results. The score value is a REAL number between 0.0 and 100.0 (inclusive) that indicates the relevance of the results to the search criteria, compared to that of other indexed records. The higher the document score value, the more closely the results match the criteria.
SELECT file_name, score FROM btstable WHERE bts_contains(file_text, "heaven", score #real);
file_name score
The Raven 4,176476000000
Le Corbeau 2,610297440000
The Fall of the House of Usher 2,584062580000
Eureka 1,305148720000
4 row(s) retrieved.
7.8 Boosting a Term
The score can be thought of as the relevance of the returned results to what you are searching for. By default, all terms have equal value when sorted in the search results. Boosting a term assigns more relevance to a word or phrase. The search results are the same, but the specified term appears higher in the results.
To boost a term, use the caret symbol (^) followed by a number for the boost factor after the term that you want to appear more relevant. By default the boost factor is 1. It must be a positive number, and it can be less than one.
SELECT file_name, score FROM btstable WHERE bts_contains(file_text, "mask or bones", score #real);
file_name score
The Cask of Amontillado 13,54730610000
The Masque of the Red Death 1,941250320000
2 row(s) retrieved.
The word bones appears 5 times in The Cask of Amontillado, whereas mask appears only 2 times in The Masque of the Red Death. So the score reflects the relevance of a word in the search.
To boost the word mask by a factor of two in the search:
SELECT file_name, score FROM btstable WHERE bts_contains(file_text, "mask^2 or bones", score #real);
file_name score
The Cask of Amontillado 10,96906570000
The Masque of the Red Death 2,613902570000
2 row(s) retrieved.
7.9 Escaping Special Characters
To escape special characters that are part of a Basic Text Search query syntax, use the backslash (\) as an escape character before the special character.
The following characters are Basic Text Search special characters:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
For example, to search for the phrase (7+1), you could use the following search predicate:
bts_contains(column, "\(7\+1\)")