Recently our team got a requirement to improve an existing application search functionality. The solution should not only overcome current performance issue, but also needs to extend the search capabilities based on predefined business criteria.
Hibernate in combination with Oracle database was the foundation of the existing search feature, which unfortunately was not fast enough when we talk about a dataset consisting of more than 5M+ records. On top of that the search was implemented only on one column which is not flexible enough for serving the business needs.
This tutorial will present a full text search solution across multiple database table columns using in-built Oracle database feature called Oracle Text. It uses standard SQL to index, search, and analyze stored text and documents.
Implementation
Let’s have the database table BUSINESS_ENTITY with the following columns:
- NAME
- PARENT_NAME
- COUNTRY_NAME
- INDUSTRY_GROUP
- ADDRESS
- CITY
We want to provide the user with the ability to find a company while searching on different criteria of the provided entity.
Let’s get our hands dirty.
First execute the following SQL command with SYSTEM user:
GRANT EXECUTE ON CTX_DDL TO <DB USERNAME>;
From my experience working in a bespoke software development company, in order to give the ability of a case-insensitive search we need to create a dedicated BASIC_LEXER. This lexer identifies tokens for creating text indexes for English and all other supported whitespace delimited languages.This will give EXECUTE privilege on CTX_DDL PL/SQL package to the database user of our application. The package creates and manages the preferences, section groups, and stoplists required for text indexes.
Execute the following SQL query:
begin ctx_ddl.create_preference('business_entity_lexer', 'BASIC_LEXER'); ctx_ddl.set_attribute('business_entity_lexer', 'mixed_case', 'NO'); end;
Execute the following SQL query:
Next, we need to create a MULTI_COLUMN_DATASTORE. It’s definition will be consisted of all the column names of our BUSINESS_ENTITY table. This will allow us to search across all the entity columns while maintaining low complexity when generating the search SQL query later.
begin ctx_ddl.create_preference('business_entity_datastore', 'MULTI_COLUMN_DATASTORE'); ctx_ddl.set_attribute('business_entity_datastore', 'columns', 'name, parent_name, country_name, industry_group, address, city'); end;
Finally we need to create the text index. Notice that we are creating index only on one of the table columns, but providing the created MULTI_COLUMN_DATASTORE as well as the BASIC_LEXER will ensure us case-insensitive search across all the BUSINESS_ENTITY table columns.
create index be_name_idx on business_entity (name) indextype is ctxsys.context parameters ('datastore business_entity_datastore lexer business_entity_lexer')
Finally the SQL query which will be eecuted after the user’s input search string has been passed to the back-end from the input form. The idea here is that the input string is split by empty string and percentage symbol is added at the beginning and the end of every word. This will ensure the substring match. The good thing is that we can execute this SQL with provided named parameter, rather than concatenated SQL with user input which is a bad practice leading to SQL injection vulnerability.
SELECT NAME, PARENT_NAME, COUNTRY_NAME, INDUSTRY_GROUP, ADDRESS, CITY FROM BUSINESS_ENTITY WHERE CONTAINS(NAME, ‘%google% OR %ireland%’) > 1
Notice the ability of using boolean operators in concatenation with the split user input string from the search form.
Maintenance
Naturally the question “How we are going to maintain the created text indexes when new records are added/updated in the database table?” comes in every developers mind. I would not get into details here, but my suggestion is to check the “Sync (On Commit) new in Oracle 10g”. Oracle’s proposal here is to take advantage of the sync (on commit) parameter when creating the text index. Another approach is the index synchronization to be executed on predefined time period with sync (Every …) parameter.
What’s the point
Oracle Text is a great feature and you should consider it if your organization needs powerful search capabilities under the following circumstances:
- The application already uses Oracle DB in it’s persistence layer
- Adding new technology is not on the agenda
- You want to avoid the additional DevOps overhead in terms of monitoring, backup and infrastructure of a separate search engine