This section explains how to configure the BTS (Basic Text Search) contextual search module on an Informix database or instance. BTS is a full-text index based on CLucene. It takes the input text and tokenizes it into words. Search predicates such as term, phrase, wildcard, proximity and fuzzy all operate on the indexed words, not on the text value as a whole.

CLucene is a high-performance, scalable, cross-platform, full-featured, open-source indexing and searching API. It is a port of the very popular Java Apache Lucene text search engine API. Because it is written in pure cross-platform C++ and uses the flexible CMake build system, CLucene can be used for virtually any purpose, on any machine.

Elasticsearch, a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents, is built on top of Apache Lucene. Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

1 Installation

You can read the full IBM BTS documentation at Extensibility - Basic Text Search (BTS)

1.1 Register the blade

From version 12.10 of IDS onwards, the BTS module is integrated automatically and does not require further registration. To set up BTS on a database, simply connect to the database where you need BTS and execute:

EXECUTE FUNCTION sysbldprepare ('bts.*', 'create')

For version 11.70 of IDS, you must register the BTS DataBlade module in each database.

$ blademgr
dbsrv1>show modules

bts.3.00

dbsrv1>register bts.3.00 demo_sports
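
Once the module is set up by either route, a quick sanity check is to call one of its routines. The sketch below assumes the bts_release() function that ships with the BTS module; it returns a version string, so an error here suggests that the module is not available in the current database.

```sql
-- Run while connected to the database where BTS was set up.
-- bts_release() reports the release of the BTS module.
EXECUTE FUNCTION bts_release();
```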

To install BTS, the database must have transaction logging enabled (unbuffered or buffered).
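
Whether a database is logged can be checked in the sysmaster database. This is a minimal sketch that assumes the demo_sports database from the blademgr example; substitute your own database name.

```sql
-- is_logging = 1: the database has transaction logging.
-- is_buff_log = 1: the logging is buffered.
SELECT name, is_logging, is_buff_log
  FROM sysmaster:sysdatabases
 WHERE name = 'demo_sports';
```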

1.2 Configure a sbspace

Create an sbspace to store the indexes created by the search engine. The lists of synonyms and the lists of stopwords are also stored in sbspaces.

For optimal performance and easier administration, it is advisable to use dedicated sbspaces and dbspaces for this purpose.

onspaces -c -S s_sbbts -p /Dbspace/s_sbbts_1 -o 0 -s 100000 -Df "LOGGING=ON"
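
To confirm that the sbspace exists, the sysmaster database can be queried. The sketch assumes the s_sbbts name used in the onspaces command above.

```sql
-- is_sbspace = 1 marks smart large object spaces.
SELECT dbsnum, name, nchunks
  FROM sysmaster:sysdbspaces
 WHERE name = 's_sbbts'
   AND is_sbspace = 1;
```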

1.3 Set processors

To execute BTS queries, dedicated BTS virtual processors must be configured. Each virtual processor executes one query at a time, so the number of virtual processors determines how many BTS queries can run concurrently.

Add the following parameter to the onconfig file:

VPCLASS bts,num=5

The "onmode -wf" command cannot be used for "VPCLASS" values. If restarting IDS is not possible, the virtual processors can be added dynamically with:

onmode -p 5 bts

You can check whether BTS virtual processors are configured by running onstat -g glo and looking for the bts class.

$ onstat -g glo | grep bts
bts         1         5210.06   2192.99   7403.05 
 13    8351      bts         5210.06   2192.99   7403.05   7403.05  100%

2 Using BTS Searches

To use contextual search on a table column, you must create a BTS index on that column.

When you create a bts index, you specify the operator class that corresponds to the data type of the indexed column. An operator class is a set of functions that the database uses to access the extended information generated by BTS.

Each of the column types that support BTS has its own operator class:

Column type     Operator class
BLOB            bts_blob_ops
CHAR            bts_char_ops
CLOB            bts_clob_ops
LVARCHAR        bts_lvarchar_ops
NCHAR           bts_nchar_ops
NVARCHAR        bts_nvarchar_ops
VARCHAR         bts_varchar_ops

To create a BTS index, use the following syntax:

CREATE INDEX i_bts_ctercero_1 ON ctercero (nombre  bts_char_ops) USING bts IN s_sbstd;
CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts IN s_sbstd;

2.1 Search by BTS

To search a column with the BTS engine, use the bts_contains function in the WHERE clause:

SELECT codigo, nomart[1,40] FROM garticul where bts_contains(nomart, 'irv')

codigo           nomart
300090           MONO QUECHUA MORZINE IRV
278424           MONO QUECHUA MORZINE MUJER IRV
277751           CHAQUETA QUECHUA MORZINE L CD I.R.V.
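
Besides single terms, the query string accepts Lucene-style syntax for phrase, wildcard, fuzzy and proximity searches. The following sketches reuse the garticul table; the search strings are illustrative only, and the rows returned depend on the data and on the analyzer of the index.

```sql
-- Phrase: the words must appear together, in this order.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, '"mono quechua"');

-- Wildcard: * matches any number of characters, ? matches exactly one.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'quech*');

-- Fuzzy: ~ after a term also matches similarly spelled words.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'morzin~');

-- Proximity: both words must occur within 5 words of each other.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, '"mono irv"~5');
```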

You can also use the virtual score column to sort the results by the relevance score that the BTS engine assigns to each match.

SELECT score, codigo, nomart[1,40] FROM garticul
   WHERE bts_contains(nomart, 'irv QUECHUA Mochilla MUJER',  score # REAL)
     AND score > 50
ORDER BY score
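
Because the query above sorts in ascending order, the weakest matches are listed first. If the best matches should come first, a descending sort can be used. This variant is only a sketch and changes nothing else in the query.

```sql
-- Same search, highest-scoring rows first.
SELECT score, codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'irv QUECHUA Mochilla MUJER', score # REAL)
   AND score > 50
 ORDER BY score DESC;
```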

The SQL Boolean operators AND, OR, and NOT cannot be used between bts_contains() search operations. For example, the expression bts_contains (column, 'word1') AND bts_contains (column, 'word2') is not supported. However, the expression bts_contains (column, 'word1 AND word2') is correct.
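
The following sketches show Boolean logic placed inside the query string, where it is handled by the search engine rather than by SQL. The search words are illustrative; note that the operators are written in uppercase, as in Lucene query syntax.

```sql
-- Rows whose nomart contains both words.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'quechua AND mujer');

-- Rows that contain "mono" but not "mujer".
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'mono NOT mujer');

-- Parentheses group sub-expressions.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, '(mono OR chaqueta) AND irv');
```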

3 Advanced configuration

3.1 Stopwords

To prevent certain words, called stopwords, from being indexed by the engine, you can associate a list of stopwords with each index.

CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts(stopwords="(el,la,un,una,lo,le,a,al)") IN s_sbstd;

or

CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts(stopwords="file:/docs/stopwords.txt") IN s_sbstd;
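
A third option, sketched here, is to keep the stopwords in a table column. This assumes the table:table_name.column_name form of the stopwords parameter; the table mis_stopwords and its column palabra are hypothetical names used only for illustration.

```sql
-- Hypothetical table holding one stopword per row.
CREATE TABLE mis_stopwords (palabra VARCHAR(30));
INSERT INTO mis_stopwords VALUES ('el');
INSERT INTO mis_stopwords VALUES ('la');
INSERT INTO mis_stopwords VALUES ('un');

-- If i_bts_garticul_1 already exists, drop it before running this.
-- Assumes stopwords can be read from a column with the "table:" prefix.
CREATE INDEX i_bts_garticul_1 ON garticul (nomart bts_varchar_ops)
    USING bts(stopwords="table:mis_stopwords.palabra") IN s_sbstd;
```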

3.2 Parsers

BTS is highly configurable and supports several types of parsers (called analyzers in the index parameters). The Snowball analyzer indexes words by their stem, so that different derived forms of a word match each other. The Soundex analyzer indexes words by how they sound.

CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts(analyzer="(soundex)", stopwords="(el,la,un,una,lo,le,a,al)") IN s_sbstd;

SELECT score, codigo, nomart[1,40] FROM garticul
where bts_contains(nomart, 'savana',  score # REAL)
order by score

         score codigo           nomart

85,71427920000 024993           SÁBANA QUECHUA (190 X 78 CM)
99,99999240000 211804           SABANA QUECHUA TIPO MOMIA POLAR

By contrast, the same SELECT executed against an index that uses the Snowball analyzer returns no results:

CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts(analyzer="(snowball)", stopwords="(el,la,un,una,lo,le,a,al)") IN s_sbstd;

SELECT score, codigo, nomart[1,40] FROM garticul
where bts_contains(nomart, 'savana',  score # REAL)
order by score

         score codigo           nomart
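
The Snowball analyzer stems words for one language at a time, and the sample data is Spanish. The sketch below assumes the snowball.language form of the analyzer parameter; no result is shown because it depends on the data.

```sql
-- Replace the existing index before recreating it with another analyzer.
DROP INDEX i_bts_garticul_1;

CREATE INDEX i_bts_garticul_1 ON garticul (nomart bts_varchar_ops)
    USING bts(analyzer="snowball.spanish") IN s_sbstd;

-- With stemming, a plural search word is expected to match rows
-- that were indexed with the singular form.
SELECT score, codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'sabanas', score # REAL)
 ORDER BY score;
```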

3.3 Synonyms

To create a list of synonyms, you must create a table that will contain the list of synonymous words:

CREATE TABLE mis_sinonimos(synonyms lvarchar);
INSERT INTO mis_sinonimos(synonyms) VALUES('elizabeth liz beth eliza leisal betty liza');
INSERT INTO mis_sinonimos(synonyms) VALUES('mark marc marcus marco');

CREATE INDEX i_bts_mis_sinonimos ON mis_sinonimos(synonyms bts_lvarchar_ops) USING bts(thesaurus="yes") IN s_sbstd;

To use the list of synonyms, reference the thesaurus index when defining the BTS index:

CREATE INDEX i_bts_garticul_1 ON garticul (nomart  bts_varchar_ops) USING bts(thesaurus_index="i_bts_mis_sinonimos") IN s_sbstd;

You can update the list of synonyms without rebuilding the BTS index simply by updating the table containing the definition of synonyms.
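
As a sketch of such an update, a new synonym group can be inserted into the mis_sinonimos table and used right away. The words in the group are hypothetical, and the rows returned depend on the data.

```sql
-- Hypothetical synonym group: all words on one row are treated as equivalent.
INSERT INTO mis_sinonimos(synonyms) VALUES('mochila macuto morral');

-- A search for one word of the group should now also match the others.
SELECT codigo, nomart[1,40] FROM garticul
 WHERE bts_contains(nomart, 'macuto');
```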

4 System catalog

To find out which columns have a BTS index, query the sysindices catalog table and check that the access method of the index (amid) is the BTS one, which has the identifier 100 here:

select tabid, idxname, cast(sysindices.indexkeys as lvarchar) from sysindices where amid = 100;

idxname           i_bts_mytable_1
owner             informix
tabid             30737
idxtype           D
clustered
levels            1
leaves            1,000000000000
nunique           1,000000000000
clust             100,0000000000
nrows             36409,00000000
indexkeys         4 [102]
amid              100
amparam
collation         en_US.57372
pagesize          2048
nhashcols         0
nbuckets
ustlowts          2014-06-05 11:29:35.00000
ustbuildduration    0:00:00.00000
nupdates          0,00
ndeletes          0,00
ninserts          0,00
fextsize          0
nextsize          0
indexattr         0
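
The access method identifier may differ between installations, so it can be safer to look it up by name instead of hard-coding 100. A minimal sketch that joins sysindices with the systables and sysams catalog tables:

```sql
-- List every BTS index in the current database with its table name.
SELECT t.tabname, i.idxname
  FROM sysindices i, systables t, sysams a
 WHERE i.tabid   = t.tabid
   AND i.amid    = a.am_id
   AND a.am_name = 'bts';
```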