Intelligent Filtering of Encompassing Return Sets – yet with a Familiar Feel

Not your average search

Combating the inherent information retrieval problem one search at a time. Finally, a NAV that returns actionable results!

Solving the Greatest Science Challenge

SavantX PRO loves data, but as one might suspect, this love brings a fairly intense issue: how does the user navigate large search return sets? The issue is even more pronounced than it first appears, because SavantX PRO not only operates on a very large corpus but also expands a typical user query into many ‘hidden’ sub-queries. For instance, in a search against more than 100 million nuclear power plant documents, the query “valve failure” becomes 50+ sub-queries through stemming, a data science technique that relates words sharing a common root so that each search term is expanded into all of its related forms: “valve” turns into “valve or valves”, and “failure” turns into “failure or fail or fails or failing or failed or failures or…”
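The expansion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SavantX PRO’s implementation: `toy_stem` is a hypothetical suffix-stripping stemmer (a production system would use a proper algorithm such as Porter stemming), and the vocabulary is a made-up stand-in for the corpus index. Each query term is replaced by every vocabulary word that shares its stem, and each combination of those variants becomes one sub-query.

```python
from itertools import product

# Hypothetical suffixes for a toy stemmer, longest first so that
# "failures" loses "ures" before the bare "s" rule is tried.
SUFFIXES = ("ures", "ure", "ing", "ed", "s")
MIN_STEM = 3  # never strip a word below three characters


def toy_stem(word: str) -> str:
    """Strip the first matching suffix (a crude stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def expand_term(term: str, vocabulary: list[str]) -> list[str]:
    """Every vocabulary word sharing the term's stem, sorted; the term itself if none."""
    term = term.lower()
    stem = toy_stem(term)
    matches = {w for w in vocabulary if toy_stem(w) == stem}
    return sorted(matches) if matches else [term]


def sub_queries(query: str, vocabulary: list[str]) -> list[tuple[str, ...]]:
    """One sub-query per combination of term variants."""
    variants = [expand_term(t, vocabulary) for t in query.split()]
    return list(product(*variants))


def boolean_query(query: str, vocabulary: list[str]) -> str:
    """The same expansion written as a single OR/AND expression."""
    groups = [expand_term(t, vocabulary) for t in query.split()]
    return " AND ".join("(" + " OR ".join(g) + ")" for g in groups)


if __name__ == "__main__":
    vocab = ["valve", "valves", "fail", "fails", "failing", "failed",
             "failure", "failures", "pump", "pumps"]
    print(boolean_query("valve failure", vocab))
    # (valve OR valves) AND (fail OR failed OR failing OR fails OR failure OR failures)
    print(len(sub_queries("valve failure", vocab)))  # 12
```

With only two forms of “valve” and six of “failure”, the toy example already yields 2 × 6 = 12 sub-queries; the count grows multiplicatively with every additional term or variant, which is how a short query can fan out into dozens.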

Through Advanced Analytics, SavantX PRO goes beyond traditional statistics and accomplishes something amazing: returning Different Words, Same Meaning without any training or semantic analysis.

So in our example, SavantX PRO knows that a “failure” could also appear as “weep”, “stuck”, “bent”, and so on.

Stemming challenges the system further still: it can double the size of a return set.

The good news about stemming is that the user has a very good and complete return set to work from. The bad news is that stemming exacerbates the already overwhelming problem of data overload, especially since SavantX PRO loves data and data sources: it is not uncommon for a return set to include thousands of returns. Each return is a ‘passage’ of text about 320 terms long, so reading 200 returns (roughly 64,000 terms) is equivalent to reading an average novel! Obviously, users don’t have time to do that.
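The reading-load arithmetic is easy to check. The sketch below assumes that one term is roughly one word and that a reader manages 250 words per minute; that speed is an assumption, not a SavantX figure, and `reading_load` is a hypothetical helper.

```python
PASSAGE_TERMS = 320     # approximate passage length stated in the text
WORDS_PER_MINUTE = 250  # assumed adult reading speed (not a SavantX figure)


def reading_load(n_returns: int,
                 terms_per_passage: int = PASSAGE_TERMS,
                 words_per_minute: int = WORDS_PER_MINUTE) -> tuple[int, float]:
    """Return (total terms, hours of reading) for a return set."""
    total_terms = n_returns * terms_per_passage
    hours = total_terms / words_per_minute / 60
    return total_terms, hours


if __name__ == "__main__":
    for n in (200, 2000):
        terms, hours = reading_load(n)
        print(f"{n:>5} returns: {terms:,} terms, about {hours:.1f} hours")
```

At that pace, 200 passages take a little over four hours to read, and a return set of 2,000 passages would take more than 40.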

This was the greatest science challenge SavantX PRO had to solve.

Since its first use on a Defense Intelligence Agency project some years ago, SavantX PRO has evolved to use four techniques in its FlatNAV view:

1. Paging

Paging divides large text documents into separate pages. For instance, in a nuclear power plant’s Record Management System database, it is not uncommon to find 12,000-page documents and many 1,000-page documents. These large documents contain (almost) every possible search term, so without paging they would appear in every return set. Paging is one form of text scoping; under the hood, SavantX PRO uses four levels of text scoping: document, page, passage, and context window. Scoping reduces the return set and increases relevancy, as described next. Paging also lets SavantX PRO tell the user the page number on which the passage of interest resides.
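A minimal sketch of page and passage scoping, assuming that pages are separated by form-feed characters and that passages are fixed runs of 320 terms (the passage length given above). `scope_document`, `pages_matching`, and `Passage` are hypothetical names; SavantX PRO’s actual page and passage boundaries are not described here. The point is the contrast with document-level matching: a document can contain every query term while no single page does.

```python
from dataclasses import dataclass

PAGE_BREAK = "\f"    # assumption: pages are separated by form-feed characters
PASSAGE_TERMS = 320  # approximate passage length stated in the text


@dataclass(frozen=True)
class Passage:
    page: int               # 1-based page number within the document
    index: int              # 0-based passage number within that page
    terms: tuple[str, ...]


def scope_document(text: str, passage_terms: int = PASSAGE_TERMS) -> list[Passage]:
    """Split a document into pages, then each page into fixed-size passages."""
    passages = []
    for page_no, page in enumerate(text.split(PAGE_BREAK), start=1):
        terms = page.lower().split()
        for start in range(0, len(terms), passage_terms):
            passages.append(Passage(page=page_no,
                                    index=start // passage_terms,
                                    terms=tuple(terms[start:start + passage_terms])))
    return passages


def pages_matching(text: str, query_terms: set[str],
                   passage_terms: int = PASSAGE_TERMS) -> list[int]:
    """Page numbers holding a passage that contains *all* query terms."""
    wanted = {t.lower() for t in query_terms}
    return sorted({p.page for p in scope_document(text, passage_terms)
                   if wanted <= set(p.terms)})


if __name__ == "__main__":
    doc = "valve inspection scheduled\fvalve failure noted\fpump shaft bent"
    print(pages_matching(doc, {"valve", "failure"}))  # [2]
    print(pages_matching(doc, {"valve", "bent"}))     # []
```

Here the document as a whole contains both “valve” and “bent”, but no single page does, so page scoping drops it from a “valve bent” search while still pointing the user to page 2 for “valve failure”.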

2. Context

Just because a large document contains the word ‘valve’ on the first page and the word ‘bent’ on the last page does not mean that the document should be returned for the search ‘valve failure’. SavantX PRO uses a co-locational text window to estimate context probability. This technique assumes that the probability of two terms having a contextual relationship falls off as the distance separating them increases.
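The co-locational idea can be illustrated with a toy score. The text says only that contextual probability falls off with term separation; the exponential decay, the 20-term window, and the `decay` constant below are assumptions chosen for illustration, and `context_score` and `min_separation` are hypothetical names rather than SavantX PRO functions.

```python
import math
from typing import Optional


def min_separation(terms: list[str], a: str, b: str) -> Optional[int]:
    """Smallest positional distance between any occurrence of a and of b."""
    pos_a = [i for i, t in enumerate(terms) if t == a]
    pos_b = [i for i, t in enumerate(terms) if t == b]
    if not pos_a or not pos_b:
        return None
    # Quadratic, but passages are only a few hundred terms long.
    return min(abs(i - j) for i in pos_a for j in pos_b)


def context_score(terms: list[str], a: str, b: str,
                  window: int = 20, decay: float = 5.0) -> float:
    """Toy context probability: 1.0 for adjacent terms, exponential fall-off
    with separation, and 0.0 outside the co-locational window."""
    if a == b:
        raise ValueError("context_score compares two distinct terms")
    d = min_separation(terms, a, b)
    if d is None or d > window:
        return 0.0
    return math.exp(-(d - 1) / decay)


if __name__ == "__main__":
    near = "the valve was found bent during inspection".split()
    far = ["valve"] + ["filler"] * 30 + ["bent"]
    print(round(context_score(near, "valve", "bent"), 3))  # 0.67
    print(context_score(far, "valve", "bent"))             # 0.0
```

Any monotonically decreasing function of distance (1/d, for example) would express the same assumption; the exponential form simply puts the fall-off rate in a single tunable constant.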

3. Filtering

SavantX PRO performs all its data science at search time, and one thing it does is build an ‘O/E’ term list that the user can use to sub-filter the return set. O/E is the Observed over Expected ratio: it is higher when an uncommon term is observed in the return set more often than expected. This list lets the user quickly understand the nature of the terms in the returned passages and sub-select on those terms, all without issuing another search. Another type of filtering is the Time Chart Spike filter: the user can look at the time chart to quickly identify ‘spikes’ in the corpus, the date(s) on which a particular issue occurred. Rounding out the filtering features are, of course, the standard abilities to select by record date and source database.
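These filters can be sketched over an in-memory return set. O/E is computed here in its textbook form: a term’s observed count in the return set divided by the count expected if the term appeared there at its corpus-wide rate; SavantX PRO’s exact formula is not given. The `Hit` record, the `min_observed` cutoff (which keeps one-off terms from dominating the list), the function names, and the sample data are all hypothetical. The time chart is simply a count of hits per record date; the spike itself is still picked out by the user, as described above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """One returned passage (hypothetical shape for illustration)."""
    record_date: date
    source: str
    terms: tuple[str, ...]


def oe_ratios(hits: list[Hit], corpus_counts: Counter,
              min_observed: int = 2) -> list[tuple[str, float]]:
    """Rank return-set terms by Observed/Expected ratio, highest first."""
    observed = Counter(t for h in hits for t in h.terms)
    n_return = sum(observed.values())
    n_corpus = sum(corpus_counts.values())
    ranked = []
    for term, obs in observed.items():
        if obs < min_observed or corpus_counts[term] == 0:
            continue  # skip one-off terms and terms missing from corpus stats
        expected = n_return * corpus_counts[term] / n_corpus
        ranked.append((term, obs / expected))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def time_chart(hits: list[Hit]) -> Counter:
    """Hits per record date: the data behind the time chart."""
    return Counter(h.record_date for h in hits)


def sub_filter(hits: list[Hit], term: Optional[str] = None,
               dates: Optional[set[date]] = None,
               sources: Optional[set[str]] = None) -> list[Hit]:
    """Keep hits that pass every filter supplied, without issuing a new search."""
    return [h for h in hits
            if (term is None or term in h.terms)
            and (dates is None or h.record_date in dates)
            and (sources is None or h.source in sources)]


if __name__ == "__main__":
    # Made-up sample data.
    hits = [Hit(date(2020, 1, 5), "CR", ("valve", "weep", "the")),
            Hit(date(2020, 1, 5), "WO", ("valve", "weep", "the", "stuck")),
            Hit(date(2020, 2, 1), "CR", ("pump", "the"))]
    corpus = Counter({"the": 1000, "pump": 925, "valve": 50, "stuck": 20, "weep": 5})
    print([t for t, _ in oe_ratios(hits, corpus)])  # ['weep', 'valve', 'the']
    print(time_chart(hits).most_common(1))
    print(len(sub_filter(hits, term="weep", sources={"WO"})))  # 1
```

In the sample, “weep” is rare in the corpus but appears in two of three hits, so it tops the O/E list, while the very common “the” falls below 1.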

4. Sorting

The SavantX PRO GUI can sort columns of information, which is handy for rapidly bubbling items of interest to the top of the heap.


HYPERNAV solves the problem of conveying relationship-rich results computed in 6 Dimensions back to our 3 Spatial Dimensions.
