Formatting and Validation
Formatting rules describe the expected structure of data captured or keyed by a user. Validation rules ensure that values are correctly structured. Setting the formatting and validation options carefully improves data integrity and ensures that data structures and types meet the export requirements. Where applicable, these rules also assist the OCR engine and improve OCR read quality by telling the engine which characters to expect and where to expect them in the character string.
White Characters
White characters are the letters, numbers, and symbols expected to be seen within a zone. If decimal numbers are expected, such as a subtotal or price, then the white characters may be 0123456789,.$ as this will allow for results such as $1254.35 or 1000.00. Note that leaving the white characters empty allows any character.
There are four white character settings:
- Any Character: This option sets the OCR engine to expect any character. It is the least restrictive selection, but because it is so broad it may lead to confusion between similar characters, such as the number "1" and the letter "I", or the letter "O" and the number "0". This option should be used when the expected content of the zone is unknown.
- Letters: If only text characters are expected (alphabetical characters of both upper and lower case) then this selection should be used.
- Numbers: When only numeric digits are expected, with no alphabetical characters, then this selection should be used.
- Custom: This selection allows control over which characters are expected. The characters are entered manually in the space provided.
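The effect of the four settings can be pictured as a character whitelist check. The following is a minimal sketch only, not Umango's implementation: the setting names, the `disallowed` helper, and the choice of ASCII letters and digits for the presets are all assumptions made for illustration.

```python
import string

# Hypothetical presets mirroring the settings described above.
# None means "no restriction" (Any Character).
PRESETS = {
    "any": None,
    "letters": set(string.ascii_letters),  # assumption: ASCII upper and lower case
    "numbers": set(string.digits),         # assumption: the digits 0-9 only
}


def allowed_characters(setting, custom=""):
    """Return the set of allowed characters, or None when anything is allowed."""
    if setting == "custom":
        # An empty custom list is treated as "allow any character".
        return set(custom) if custom else None
    return PRESETS[setting]


def disallowed(value, setting, custom=""):
    """List the characters in value that fall outside the white characters."""
    allowed = allowed_characters(setting, custom)
    if allowed is None:
        return []
    return [ch for ch in value if ch not in allowed]


print(disallowed("$1254.35", "custom", "0123456789,.$"))  # []
print(disallowed("1O00.00", "numbers"))                   # ['O', '.']
```

The second call shows why a narrow setting helps: with Numbers selected, a letter "O" read in place of a zero stands out as an unexpected character.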
Format Structure
Format structure identifies the placement of expected characters within the result. Format structures can be configured using a simple format (as described below) or a regular expression rule (which requires a basic understanding of regex syntax).
Simple Formatting
Simple formatting rules should be sufficient if your data structures are basic and you are not confident with regular expressions or do not want to take the time to learn them. In any other instance we strongly recommend using regex for your data formatting.
The key elements of these formats are:
- # = Number
- A = Letter
- X = Any character
- ? = Optional character (i.e. a character may or may not appear in that place). This should only be added at the beginning OR the end of the format, not at both ends and not between other characters.
- Other = Entering any other character tells the engine to expect that exact character.
For example, if you are expecting a date field in the form of month/day/year then you may wish to set the format to ##/##/20##. Similarly, a fixed-length currency value may be set as $###,###.##, and a value with up to three optional characters before the decimal point may be set as ???.##.
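One way to reason about a simple format is to translate it into an equivalent regular expression. The sketch below is an illustrative interpretation of the key elements listed above, not the product's actual matching logic: the function names are hypothetical, and it assumes that # means a digit 0-9, A means an ASCII letter, and ? means "any single character, or nothing".

```python
import re

# Assumed meaning of each simple-format element (hypothetical mapping).
TOKENS = {
    "#": "[0-9]",     # Number
    "A": "[A-Za-z]",  # Letter
    "X": ".",         # Any character
    "?": ".?",        # Optional character
}


def simple_format_to_regex(fmt):
    """Translate a simple format into a regex; other characters match literally."""
    return "".join(TOKENS.get(ch, re.escape(ch)) for ch in fmt)


def matches(fmt, value):
    """True when the whole value fits the simple format."""
    return re.fullmatch(simple_format_to_regex(fmt), value) is not None


print(matches("##/##/20##", "12/31/2024"))    # True
print(matches("##/##/20##", "12/31/1999"))    # False
print(matches("$###,###.##", "$123,456.78"))  # True
print(matches("???.##", "1.25"))              # True
```

Under this reading, a literal "A", "X", "#" or "?" cannot be expressed in a simple format, which is one reason a regular expression may be the better choice for anything beyond basic structures.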
Regular Expressions
Umango offers regular expressions as a more powerful alternative to the simple formatting syntax. To use regular expressions, the format must be enclosed in parentheses and preceded by the term REGEX. For example, REGEX([0-9]{3,7}) is an alternative to, and a better option than, the format ###???.
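To see how a REGEX(...) format behaves, the pattern inside the wrapper can be tried against sample values. The sketch below uses Python's `re` module purely for illustration: the helper names are hypothetical, the regex flavor used by the product may differ from Python's, and treating the pattern as a whole-value match is an assumption.

```python
import re

# Matches the wrapper described above and captures everything inside it.
WRAPPER = re.compile(r"REGEX\((.*)\)", re.DOTALL)


def extract_pattern(fmt):
    """Return the pattern inside REGEX(...), or None if fmt is not wrapped."""
    m = WRAPPER.fullmatch(fmt.strip())
    return m.group(1) if m else None


def value_matches(fmt, value):
    """True when the whole value matches the wrapped pattern (assumed behavior)."""
    pattern = extract_pattern(fmt)
    if pattern is None:
        raise ValueError("format is not of the form REGEX(...)")
    return re.fullmatch(pattern, value) is not None


print(extract_pattern("REGEX([0-9]{3,7})"))         # [0-9]{3,7}
print(value_matches("REGEX([0-9]{3,7})", "12345"))  # True
print(value_matches("REGEX([0-9]{3,7})", "12"))     # False
print(value_matches("REGEX([0-9]{3,7})", "12A45"))  # False
```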
If you have limited knowledge of regex, Umango provides a regex builder and testing function. This is accessible in any field that supports regex.
The Regex Builder Button
In addition, many regular expression builders and helpers are available online.
Validation Lookup
A zone value can be validated against values stored within an external database. This is done using ODBC lookup queries. Use the builder to create a lookup.
Lookups can validate in the following ways:
- True if the zone value is found in the database field.
- True if the zone value is NOT found in the database field.
- Display a value that is related to the zone value (e.g. display the customer name where a customer ID is provided).
- Return a value that is related to the zone value (e.g. return the customer ID where a customer name is provided). The returned value is used as the zone index value to be exported.
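The four lookup behaviors above can be sketched as simple parameterized queries. This is an illustration under stated assumptions, not the queries the builder generates: an in-memory SQLite database stands in for the ODBC data source, and the `customers` table, its columns, and the function names are all hypothetical.

```python
import sqlite3

# Stand-in for an external database reached through ODBC (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("C001", "Acme Ltd"), ("C002", "Globex")],
)


def found(conn, customer_id):
    """True if the zone value is found in the database field."""
    row = conn.execute(
        "SELECT 1 FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row is not None


def not_found(conn, customer_id):
    """True if the zone value is NOT found in the database field."""
    return not found(conn, customer_id)


def display_name(conn, customer_id):
    """Related value shown to the user; the zone value itself is unchanged."""
    row = conn.execute(
        "SELECT name FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def return_id(conn, name):
    """Related value that replaces the zone value as the exported index value."""
    row = conn.execute(
        "SELECT customer_id FROM customers WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(found(conn, "C001"))         # True
print(not_found(conn, "C999"))     # True
print(display_name(conn, "C002"))  # Globex
print(return_id(conn, "Acme Ltd")) # C001
```

Passing the zone value as a query parameter, rather than concatenating it into the SQL text, keeps unexpected OCR output from altering the query.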
Enforce Formatting
When formatting is enforced, the processor must ensure that the index value meets the formatting criteria before the batch can be exported. This is particularly useful when exporting to a database that requires data to be supplied in a certain format.
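Conceptually, enforcement acts as a gate in front of the export step. The sketch below illustrates that idea only: the zone names, the patterns, and the `export_blockers` function are hypothetical, and the product's own checks may work differently.

```python
import re


def export_blockers(index_values, patterns):
    """Return the names of zones whose value does not satisfy its pattern."""
    return [
        name
        for name, value in index_values.items()
        if name in patterns and re.fullmatch(patterns[name], value) is None
    ]


# Hypothetical batch with one well-formed and one malformed index value.
batch = {"invoice_date": "12/31/2024", "total": "12.5"}
patterns = {
    "invoice_date": r"[0-9]{2}/[0-9]{2}/20[0-9]{2}",
    "total": r"[0-9]+\.[0-9]{2}",
}

blockers = export_blockers(batch, patterns)
print(blockers)  # ['total']
if blockers:
    print("Export blocked until these zones are corrected:", ", ".join(blockers))
```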