Smart Seek

The Smart Seek feature helps capture data from unstructured or semi-structured documents. Smart Seek uses regular expressions to search within a zone and find text that meets the search criteria.

The multi-page option tells Umango to continue searching the other pages in the document when a value is not found on the zone's own page.
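The search order described above can be sketched as follows. This is a simplified illustration, not Umango's implementation: the function name `seek_across_pages`, the idea of passing each page as a plain string, and the fallback order (the remaining pages in document order) are all assumptions made for the example.

```python
import re
from typing import Optional


def seek_across_pages(pages: list[str], zone_page: int, pattern: str,
                      multi_page: bool = False) -> Optional[tuple[int, str]]:
    """Search the zone's page first and, if allowed, fall back to other pages.

    pages: text of each page (a hypothetical, simplified input)
    zone_page: 0-based index of the page the zone is drawn on
    Returns (page_index, matched_text), or None when nothing matches.
    """
    regex = re.compile(pattern)
    # The zone's own page is always searched first.
    order = [zone_page]
    if multi_page:
        # Assumed fallback order: every other page, in document order.
        order += [i for i in range(len(pages)) if i != zone_page]
    for index in order:
        match = regex.search(pages[index])
        if match:
            return index, match.group(0)
    return None


pages = ["Invoice", "Total due: 1,250.00", "Thank you"]
amount = r"\d{1,3}(?:,\d{3})*\.\d{2}"
print(seek_across_pages(pages, 0, amount))                   # None
print(seek_across_pages(pages, 0, amount, multi_page=True))  # (1, '1,250.00')
```

Without the multi-page option the amount on the second page is never found; with it, the search moves past the empty first page and returns the first match.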

To use Smart Seek, create a zone (it can be large, even covering an entire page), then enter the regular expression that will capture the required text: an extraction format, a highlight format, or both.

Extract

The Extract option is the most commonly used Smart Seek option and is concerned only with capturing text. Once it is selected, an extraction format is required.
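Conceptually, Extract returns the text in the zone that matches the extraction format. The minimal sketch below assumes the zone's OCR text is already available as a string and that the first match is the one captured. The function `extract` is hypothetical and only illustrates the idea.

```python
import re
from typing import Optional


def extract(zone_text: str, extraction_format: str) -> Optional[str]:
    """Return the first text in the zone that matches the extraction format."""
    match = re.search(extraction_format, zone_text)
    return match.group(0) if match else None


zone_text = "Invoice No: INV-20431\nDate: 03/14/2024\nTerms: 30 days"
print(extract(zone_text, r"INV-\d+"))            # INV-20431
print(extract(zone_text, r"\d{2}/\d{2}/\d{4}"))  # 03/14/2024
print(extract(zone_text, r"PO-\d+"))             # None
```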

Ask AI

The Ask AI option is a highly flexible way of capturing text from within a zone. Once it is selected, a capture request is required. The capture request describes the data you want the AI engine to find in the zone text, or how you want it to use the zone text to arrive at a result.

Highlight

The Highlight option is used to draw attention to a word or phrase expected on the document. Once it is selected, a highlight format is required. When used with an OCR zone type, the Extract the zone value from near the highlighted text option can also be enabled to capture text.
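The combination of a highlight with value extraction can be pictured as a two-step search: locate the highlighted phrase, then look for the value close to it. The sketch below treats "near" as "later on the same line". That rule, along with the function name `extract_near_highlight`, is an assumption for illustration only. Umango's actual proximity logic is not described here.

```python
import re
from typing import Optional


def extract_near_highlight(zone_text: str, highlight: str,
                           value_pattern: str) -> Optional[str]:
    """Find the highlight phrase, then capture the first value after it.

    "Near" is assumed to mean "later on the same line" for this sketch.
    """
    for line in zone_text.splitlines():
        hit = re.search(highlight, line, flags=re.IGNORECASE)
        if hit is None:
            continue
        # Only search the part of the line that follows the highlighted phrase.
        value = re.search(value_pattern, line[hit.end():])
        if value:
            return value.group(0)
    return None


text = "Subtotal 1,000.00\nTotal Due: 1,100.00\nPaid 0.00"
print(extract_near_highlight(text, r"total due", r"[\d,]+\.\d{2}"))  # 1,100.00
```

Anchoring the value to a highlighted label keeps the capture from picking up the similar-looking subtotal on the line above.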

Extraction or Highlight Format Structure

Whether extracting or highlighting with Smart Seek, you need to enter the format of the data to be extracted. This can be done with simple formatting rules or with regular expressions, which Umango's Regex Builder can help you create.

Simple Formatting

If you are not confident with regular expressions or do not want to take the time to learn them, and your data structures are basic, then simple formatting rules should be sufficient to capture your data. In all other cases, we strongly recommend using regex for your data formatting.

The key elements of these formats are: 

    • # = Number
    • A = Letter
    • X = Any character
    • ? = Optional character (i.e., a character may or may not appear in that position). This should only be added at the beginning OR the end of the format, not both, and never between other characters.
    • Other = Entering any other character tells the engine to expect that exact character.

For example, if you are capturing a date field in the form month/day/year, you may wish to set the format to ##/##/20##. A fixed-length currency value may be set as $###,###.##, while ???.## allows up to three optional characters before the decimal point.
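One way to understand the simple formatting rules is to translate them into regex, which also makes it easier to switch to regex later. The function `simple_format_to_regex` below is a hypothetical sketch, not how Umango processes formats internally. It assumes that `?` stands for any single optional character and maps `X` to Python's `.`, which matches any character except a newline.

```python
import re


def simple_format_to_regex(fmt: str) -> str:
    """Translate a simple format (#, A, X, ?, literals) into a regex pattern.

    Assumptions: '?' stands for any one optional character and 'X' maps to
    Python's '.', which matches any character except a newline.
    """
    lead = len(fmt) - len(fmt.lstrip("?"))
    trail = len(fmt) - len(fmt.rstrip("?"))
    core = fmt[lead:len(fmt) - trail]
    if "?" in core:
        raise ValueError("'?' is only allowed at the beginning or end")
    if lead and trail:
        raise ValueError("'?' may appear at the beginning or the end, not both")
    codes = {"#": "[0-9]", "A": "[A-Za-z]", "X": "."}
    # Known codes become character classes; anything else is matched literally.
    body = "".join(codes.get(ch, re.escape(ch)) for ch in core)
    return ".?" * lead + body + ".?" * trail


date = re.compile(simple_format_to_regex("##/##/20##"))
print(bool(date.fullmatch("03/14/2024")))  # True
print(bool(date.fullmatch("03/14/1999")))  # False

price = re.compile(simple_format_to_regex("???.##"))
print(bool(price.fullmatch("12.50")))    # True
print(bool(price.fullmatch("1234.50")))  # False
```

The translation also shows why regex is the more expressive choice: a class such as `[0-9]` or a repetition count such as `{3,7}` has no equivalent in the four simple codes.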

Regular Expressions

Umango offers regular expressions as a more powerful alternative to the simple formatting syntax. To use a regular expression, enclose it in parentheses and precede it with the term REGEX. For example, REGEX([0-9]{3,7}) is a better alternative to the format ###???.
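The REGEX(...) wrapper can be illustrated by stripping it off and testing the inner pattern directly. In the sketch below, the helper `unwrap_regex` is hypothetical, and treating the prefix as case-sensitive uppercase `REGEX(` is an assumption based on the example above. `re.fullmatch` is used to check whole candidate values; searching inside longer zone text would use `re.search` instead.

```python
import re
from typing import Optional

PREFIX = "REGEX("


def unwrap_regex(fmt: str) -> Optional[str]:
    """Return the pattern inside REGEX(...), or None if fmt is a simple format."""
    if fmt.startswith(PREFIX) and fmt.endswith(")"):
        # Drop only the leading "REGEX(" and the final closing parenthesis,
        # so any parentheses used inside the pattern itself are preserved.
        return fmt[len(PREFIX):-1]
    return None


pattern = unwrap_regex("REGEX([0-9]{3,7})")
print(pattern)  # [0-9]{3,7}
for value in ["123", "1234567", "12345678", "123ab"]:
    print(value, bool(re.fullmatch(pattern, value)))
# 123 True / 1234567 True / 12345678 False / 123ab False
print(unwrap_regex("##/##/20##"))  # None
```

Every position in `[0-9]{3,7}` must be a digit, so a value such as `123ab` is rejected and the allowed length is stated exactly.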

If you have limited knowledge of regex, Umango provides a regex builder and a testing function. The regex builder includes an AI option that creates a suitable regex from a natural-language description of your capture requirements.

The Regex Builder Button

In addition, there are many online regular expression builders and helpers available. A few include: