Document AI Fields
Umango can use artificial intelligence (AI) to analyze documents and automatically capture data based on a document type.
To use AI, each job is assigned a document type. When you upload the first sample document in a job, you are prompted to decide whether to enable AI and, if so, which engine and document type to use.
Enabling Document AI in a Job
You can enable AI when prompted, or at any later point in the life cycle of a job. Use the Enable AI and Disable AI buttons within the zones tab to turn AI on or off.
Enabling Document AI
Disabling Document AI
You may find that AI is not required or that other methods of data capture are more suited to your requirements. You may also find that your documents are not suited to capturing with AI.
Note: It is strongly recommended that AI be disabled unless you require it in your job. AI processing adds a significant time overhead to your document processing.
Read on to determine AI's suitability for your use case.
Selecting An AI Engine
Several factors should be considered when choosing an AI engine for processing documents. These include:
| Consideration | Cloud Processing | On-Premise Processing |
|---|---|---|
| Internet connection | Required | Not required |
| Security | Documents are temporarily and securely copied to the cloud during processing | Documents do not leave the Umango server during processing |
| Language support | Limited (see supported language table below) | All OCR languages* |
| Accuracy | Highly accurate | Less accurate |
| Document type | Supports common semi-structured document types and structured documents | Only supports documents that have a consistent structure |
| Flexibility of document structure (data can be anywhere in the document) | Highly flexible for semi-structured document types (e.g. invoice and receipt document types) | Only structured documents are supported |
| Line item extraction | Invoice line items are extracted | Not supported |
| Table extraction | Table rows and cells are detected and extracted. Column names are also detected. | Table rows and cells are detected and extracted |
| Handwriting recognition | Supported | Not supported |
| Speed of processing | Slower (also dependent on internet connection speed) | Faster (not dependent on an internet connection) |
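The comparison above can be read as a simple decision procedure. The following Python sketch is purely illustrative: `JobRequirements` and `suggest_engine` are hypothetical names that are not part of Umango, and the preference for on-premise processing when both engines would work is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class JobRequirements:
    """Hypothetical summary of what a job needs (not an Umango data structure)."""
    internet_available: bool
    documents_may_leave_server: bool
    consistent_layout: bool          # True when every document has the same structure
    needs_line_items: bool = False
    needs_handwriting: bool = False


def suggest_engine(req: JobRequirements) -> str:
    """Return 'cloud', 'on-premise' or 'none', following the comparison table."""
    cloud_allowed = req.internet_available and req.documents_may_leave_server
    # Line items, handwriting and semi-structured layouts are cloud-only features.
    cloud_only_feature = (
        req.needs_line_items
        or req.needs_handwriting
        or not req.consistent_layout
    )
    if cloud_only_feature:
        return "cloud" if cloud_allowed else "none"
    # Assumption for this sketch: structured documents default to on-premise,
    # which is faster and keeps documents on the Umango server.
    return "on-premise"


print(suggest_engine(JobRequirements(True, True, False, needs_line_items=True)))
```

A result of `none` here means that neither engine fits the stated requirements, in which case another capture method may be more suitable. Where accuracy matters more than speed, the cloud engine may still be the better choice for structured documents.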
Selecting A Document Type
For the purpose of AI capture, document types fall into 3 basic categories:
- Semi-structured documents: The data expected on the documents is reasonably consistent but its location may not be. Examples include invoices, business cards and receipts.
- Structured documents: The appearance of the document and the data it contains are consistent. Data is found in the same location on every document processed.
- Unstructured documents: The data may be located anywhere within the document, and the document does not fall into one of the available semi-structured document types. These documents are not suitable for AI data capture in Umango.
AI Data Field Categories
Within the semi-structured document category, the Umango cloud AI engine supports various common document types. During processing the AI engine will search for data common to each document type. These data fields are known as "standard fields" in Umango and in most instances are preferred. In addition, the AI engine may find data that it thinks is useful and these additional data values are called "structured document fields". These structured document fields may not appear on every document unless the documents being processed are of the same structure (layout).
For example, when processing invoices, Umango tries to find an invoice number, invoice date, part numbers, item quantities, an invoice total and so on. These are among the data fields expected on every invoice and are captured consistently if present anywhere on the document. These are "standard fields". In addition, peripheral data fields may be found. These "structured document fields" are useful when all the documents processed in a job share the same structure. For example, if all the invoices processed in a job come from the same supplier, the structured document fields would be consistently captured and usable.
All data captured using the structured document type option will be captured as "structured document fields".
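As a rough illustration of the two field categories, the Python sketch below separates a set of captured values into standard fields and structured document fields. The field names, the sample values and the `split_fields` helper are all hypothetical. They are not taken from Umango and only model the distinction described above.

```python
# Hypothetical names for fields expected on every invoice (assumption, not Umango's list).
STANDARD_INVOICE_FIELDS = {"InvoiceNumber", "InvoiceDate", "InvoiceTotal"}


def split_fields(captured: dict) -> tuple:
    """Split captured values into (standard fields, structured document fields)."""
    standard = {k: v for k, v in captured.items() if k in STANDARD_INVOICE_FIELDS}
    structured = {k: v for k, v in captured.items() if k not in STANDARD_INVOICE_FIELDS}
    return standard, structured


# Made-up capture result for a single invoice.
captured = {
    "InvoiceNumber": "INV-1001",
    "InvoiceDate": "2024-03-01",
    "InvoiceTotal": "118.00",
    "Delivery Dock": "Bay 4",  # peripheral value that only one supplier's layout contains
}

standard, structured = split_fields(captured)
print(standard)
print(structured)
```

In this sketch, "Delivery Dock" would only be dependable if every invoice in the job used the same layout, which is why structured document fields suit single-supplier jobs.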
Languages Supported
Some document types will be more accurate than others when certain languages are contained on the documents. For on-premise AI processing, all of Umango's OCR languages are supported. For cloud processing, however, language support depends on the document type in use.
Note About Taxes
In the Invoice document type, the TaxDetails field collection is only trained for collecting tax type information for India, Germany, Spain, Portugal and Canada. In other regions, tax information is provided as a total in the TotalTax field or in some instances on the line item level in the Item.Tax field.
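A workflow that consumes the captured values may therefore need a fallback order for tax. The sketch below assumes the captured invoice data has been exported into a Python dictionary. The `TaxDetails`, `TotalTax` and `Item.Tax` names come from the note above, but the dictionary layout (an `Items` list, `Type` and `Amount` keys) and the `resolve_tax` helper are hypothetical.

```python
from typing import Optional


def resolve_tax(fields: dict) -> Optional[float]:
    """Return a tax amount using TaxDetails, then TotalTax, then line-level Item.Tax."""
    # 1. TaxDetails is only populated for the regions the engine is trained on.
    details = fields.get("TaxDetails") or []
    if details:
        return sum(d["Amount"] for d in details)
    # 2. Elsewhere, tax is normally reported as a single total.
    if fields.get("TotalTax") is not None:
        return fields["TotalTax"]
    # 3. In some instances tax is only present on the individual line items.
    item_taxes = [i["Tax"] for i in fields.get("Items", []) if i.get("Tax") is not None]
    return sum(item_taxes) if item_taxes else None


# Made-up invoice with two tax types in TaxDetails.
invoice = {
    "TaxDetails": [
        {"Type": "CGST", "Amount": 9.0},
        {"Type": "SGST", "Amount": 9.0},
    ]
}
print(resolve_tax(invoice))
```

Returning `None` when no tax value is found lets the caller distinguish "no tax captured" from a captured tax of zero.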
The AI engines are tuned for the languages/regions below:
| Document Type | Cloud Processing | On-Premise Processing |
|---|---|---|
| Handwritten Text | Arabic, Chinese Simplified, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai | Not supported |
| Invoices/Purchase Orders | Albanian, Arabic, Bulgarian, Chinese (simplified), Chinese (traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese | Not supported |
| Receipts | Afrikaans, Akan, Albanian, Arabic, Azerbaijani, Bamanankan, Basque, Belarusian, Bhojpuri, Bosnian, Bulgarian, Catalan, Cebuano, Corsican, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Fijian, Filipino, Finnish, French, Galician, Ganda, German, Greek, Guarani, Haitian Creole, Hawaiian, Hebrew, Hindi, Hmong Daw, Hungarian, Icelandic, Igbo, Iloko, Indonesian, Irish, isiXhosa, isiZulu, Italian, Japanese, Javanese, Kazakh, Kazakh (Latin), Kinyarwanda, Kiswahili, Korean, Kurdish, Kurdish (Latin), Kyrgyz, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Maltese, Maori, Marathi, Maya, Yucatán, Mongolian, Nepali, Norwegian, Nyanja, Oromo, Pashto, Persian, Persian (Dari), Polish, Portuguese, Punjabi, Quechua, Romanian, Russian, Samoan, Sanskrit, Scottish Gaelic, Serbian (Cyrillic), Serbian (Latin), Sesotho, Sesotho sa Leboa, Shona, Slovak, Slovenian, Somali (Latin), Spanish, Sundanese, Swedish, Tahitian, Tajik, Tamil, Tatar, Tatar (Latin), Thai, Tongan, Turkish, Turkmen, Ukrainian, Upper Sorbian, Uyghur, Uyghur (Arabic), Uzbek, Uzbek (Latin), Vietnamese, Welsh, Western Frisian, Xitsonga | Not supported |
| Business Cards | English, Japanese | Not supported |
| ID Cards | Worldwide: Passport Book, Passport Card; United States: Driver License, Identification Card, Residency Permit (Green card), Social Security Card, Military ID; Europe: Driver License, Identification Card, Residency Permit; India: Driver License, PAN Card, Aadhaar Card; Canada: Driver License, Identification Card, Residency Permit; Australia: Driver License, Photo Card, Key-pass ID | Not supported |
| Contract/Agreement | English | Not supported |
| Structured Documents (Machine Print) | Abaza, Abkhazian, Achinese, Acoli, Adangme, Adyghe, Afar, Afrikaans, Akan, Albanian, Algonquin, Angika (Devanagari), Arabic, Asturian, Asu (Tanzania), Avaric, Awadhi-Hindi (Devanagari), Aymara, Azerbaijani (Latin), Bafia, Bagheli, Bambara, Bashkir, Basque, Belarusian (Cyrillic), Belarusian (Latin), Bemba (Zambia), Bena (Tanzania), Bhojpuri-Hindi (Devanagari), Bikol, Bini, Bislama, Bodo (Devanagari), Bosnian (Latin), Brajbha, Breton, Bulgarian, Bundeli, Buryat (Cyrillic), Catalan, Cebuano, Chamling, Chamorro, Chechen, Chhattisgarhi (Devanagari), Chiga, Chinese Simplified, Chinese Traditional, Choctaw, Chukot, Chuvash, Cornish, Corsican, Cree, Creek, Crimean Tatar (Latin), Croatian, Crow, Czech, Danish, Dargwa, Dari, Dhimal (Devanagari), Dogri (Devanagari), Duala, Dungan, Dutch, Efik, English, Erzya (Cyrillic), Estonian, Faroese, Fijian, Filipino, Finnish, Fon, French, Friulian, Ga, Gagauz (Latin), Galician, Ganda, Gayo, German, Gilbertese, Gondi (Devanagari), Greek, Greenlandic, Guarani, Gurung (Devanagari), Gusii, Haitian Creole, Halbi (Devanagari), Hani, Haryanvi, Hawaiian, Hebrew, Herero, Hiligaynon, Hindi, Hmong Daw (Latin), Ho (Devanagari), Hungarian, Iban, Icelandic, Igbo, Iloko, Inari Sami, Indonesian, Ingush, Interlingua, Inuktitut (Latin), Irish, Italian, Japanese, Jaunsari (Devanagari), Javanese, Jola-Fonyi, Kabardian, Kabuverdianu, Kachin (Latin), Kalenjin, Kalmyk, Kangri (Devanagari), Kanuri, Karachay-Balkar, Kara-Kalpak (Cyrillic), Kara-Kalpak (Latin), Kashubian, Kazakh (Cyrillic), Kazakh (Latin), Khakas, Khaling, Khasi, K'iche', Kikuyu, Kildin Sami, Kinyarwanda, Komi, Kongo, Korean, Korku, Koryak, Kosraean, Kpelle, Kuanyama, Kumyk (Cyrillic), Kurdish (Arabic), Kurdish (Latin), Kurukh (Devanagari), Kyrgyz (Cyrillic), Lak, Lakota, Latin, Latvian, Lezghian, Lingala, Lithuanian, Lower Sorbian, Lozi, Lule Sami, Luo (Kenya and Tanzania), Luxembourgish, Luyia, Macedonian, Machame, Madurese, Mahasu Pahari (Devanagari), Makhuwa-Meetto, Makonde, Malagasy, Malay (Latin), Maltese, Malto (Devanagari), Mandinka, Manx, Maori, Mapudungun, Marathi, Mari (Russia), Masai, Mende (Sierra Leone), Meru, Meta', Minangkabau, Mohawk, Mongolian (Cyrillic), Mongondow, Montenegrin (Cyrillic), Montenegrin (Latin), Morisyen, Mundang, Nahuatl, Navajo, Ndonga, Neapolitan, Nepali, Ngomba, Niuean, Nogay, North Ndebele, Northern Sami (Latin), Norwegian, Nyanja, Nyankole, Nzima, Occitan, Ojibwa, Oromo, Ossetic, Pampanga, Pangasinan, Papiamento, Pashto, Pedi, Persian, Polish, Portuguese, Punjabi (Arabic), Quechua, Ripuarian, Romanian, Romansh, Rundi, Russian, Rwa, Sadri (Devanagari), Sakha, Samburu, Samoan (Latin), Sango, Sangu (Gabon), Sanskrit (Devanagari), Santali (Devanagari), Scots, Scottish Gaelic, Sena, Serbian (Cyrillic), Serbian (Latin), Shambala, Shona, Siksika, Sirmauri (Devanagari), Skolt Sami, Slovak, Slovenian, Soga, Somali (Arabic), Somali (Latin), Songhai, South Ndebele, Southern Altai, Southern Sami, Southern Sotho, Spanish, Sundanese, Swahili (Latin), Swati, Swedish, Tabassaran, Tachelhit, Tahitian, Taita, Tajik (Cyrillic), Tamil, Tatar (Cyrillic), Tatar (Latin), Teso, Tetum, Thai, Thangmi, Tok Pisin, Tongan, Tsonga, Tswana, Turkish, Turkmen (Latin), Tuvan, Udmurt, Uighur (Cyrillic), Ukrainian, Upper Sorbian, Urdu, Uyghur (Arabic), Uzbek (Arabic), Uzbek (Cyrillic), Uzbek (Latin), Vietnamese, Volapük, Vunjo, Walser, Welsh, Western Frisian, Wolof, Xhosa, Yucatec Maya, Zapotec, Zarma, Zhuang, Zulu | All OCR languages* |
* Some languages are only supported when the extended language support option is enabled in the Umango license
Umango can still read documents containing languages not listed above, as long as they are based on the English character set, but accuracy may be diminished.
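When planning a job, it can help to check up front whether the cloud engine is tuned for a given language and document type. The Python sketch below encodes only three of the shorter rows from the language table. `CLOUD_LANGUAGES` and `is_tuned_for` are hypothetical names and not part of Umango.

```python
# Subset of the cloud-processing language table above (three rows only).
CLOUD_LANGUAGES = {
    "Business Cards": {"English", "Japanese"},
    "Contract/Agreement": {"English"},
    "Handwritten Text": {
        "Arabic", "Chinese Simplified", "English", "French", "German",
        "Italian", "Japanese", "Korean", "Portuguese", "Russian",
        "Spanish", "Thai",
    },
}


def is_tuned_for(document_type: str, language: str) -> bool:
    """Return True if the cloud engine is tuned for this language and document type."""
    return language in CLOUD_LANGUAGES.get(document_type, set())


print(is_tuned_for("Business Cards", "Japanese"))
print(is_tuned_for("Contract/Agreement", "German"))
```

A `False` result does not necessarily mean the document cannot be read. It only indicates that the engine is not tuned for that combination, so accuracy may be lower.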
For details on configuring Umango to validate AI field data within a zone, read the zone properties section.