AI Document Types

Umango can use artificial intelligence (AI) to analyze documents and automatically capture data based on a document type. Document types are assigned to jobs, which use the AI training to determine what data to capture and how to capture it.

Pre-trained Document Types cover documents that are common to many businesses and industries, and are flexible in the variations and data fields they capture. The pre-trained document types are shipped as standard in Umango and do not require any further training.

Custom Document Types can be trained based on your own document samples. To use custom document types you need to complete the training process before assigning them to your jobs.

Information on the available document types is listed below.

Document Type	Description	Languages Supported
Custom Trained	Train and name your own document type using your own sample documents. While creating custom document types requires completing an AI training process, the results are often significantly more accurate and reliable than using the pre-trained Structured Documents type or relying on traditional OCR zone-based capture methods. Neural Trained: Documents can be semi-structured. Training takes 20 minutes to 1 hour. Signature fields are not supported. Fields can overlap. Template Trained: Documents must have a consistent structure. Training takes 1-5 minutes. Signature fields are supported. Fields cannot overlap.	Neural & Template Handwritten Text: English, Chinese (simplified), French, German, Italian, Japanese, Korean, Portuguese, Spanish. Neural Trained Machine Text: Afrikaans, Albanian, Arabic, Bulgarian, Chinese Simplified, Chinese Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Marathi, Modern Greek, Nepali, Norwegian, Panjabi, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali (Arabic), Somali (Latin), Spanish, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese. Template Trained Machine Text: Virtually all languages.
Abaza, Abkhazian, Achinese, Acoli, Adangme, Adyghe, Afar, Afrikaans, Akan, Albanian, Algonquin, Angika (Devanagari), Arabic, Asturian, Asu (Tanzania), Avaric, Awadhi-Hindi (Devanagari), Aymara, Azerbaijani (Latin), Bafia, Bagheli, Bambara, Bashkir, Basque, Belarusian (Cyrillic), Belarusian (Latin), Bemba (Zambia), Bena (Tanzania), Bhojpuri-Hindi (Devanagari), Bikol, Bini, Bislama, Bodo (Devanagari), Bosnian (Latin), Brajbha, Breton, Bulgarian, Bundeli, Buryat (Cyrillic), Catalan, Cebuano, Chamling, Chamorro, Chechen, Chhattisgarhi (Devanagari), Chiga, Chinese Simplified, Chinese Traditional, Choctaw, Chukot, Chuvash, Cornish, Corsican, Cree, Creek, Crimean Tatar (Latin), Croatian, Crow, Czech, Danish, Dargwa, Dari, Dhimal (Devanagari), Dogri (Devanagari), Duala, Dungan, Dutch, Efik, English, Erzya (Cyrillic), Estonian, Faroese, Fijian, Filipino, Finnish, Fon, French, Friulian, Ga, Gagauz (Latin), Galician, Ganda, Gayo, German, Gilbertese, Gondi (Devanagari), Greek, Greenlandic, Guarani, Gurung (Devanagari), Gusii, Haitian Creole, Halbi (Devanagari), Hani, Haryanvi, Hawaiian, Hebrew, Herero, Hiligaynon, Hindi, Hmong Daw (Latin), Ho (Devanagari), Hungarian, Iban, Icelandic, Igbo, Iloko, Inari Sami, Indonesian, Ingush, Interlingua, Inuktitut (Latin), Irish, Italian, Japanese, Jaunsari (Devanagari), Javanese, Jola-Fonyi, Kabardian, Kabuverdianu, Kachin (Latin), Kalenjin, Kalmyk, Kangri (Devanagari), Kanuri, Karachay-Balkar, Kara-Kalpak (Cyrillic), Kara-Kalpak (Latin), Kashubian, Kazakh (Cyrillic), Kazakh (Latin), Khakas, Khaling, Khasi, K'iche', Kikuyu, Kildin Sami, Kinyarwanda, Komi, Kongo, Korean, Korku, Koryak, Kosraean, Kpelle, Kuanyama, Kumyk (Cyrillic), Kurdish (Arabic), Kurdish (Latin), Kurukh (Devanagari), Kyrgyz (Cyrillic), Lak, Lakota, Latin, Latvian, Lezghian, Lingala, Lithuanian, Lower Sorbian, Lozi, Lule Sami, Luo (Kenya and Tanzania), Luxembourgish, Luyia, Macedonian, Machame, Madurese, Mahasu Pahari (Devanagari), Makhuwa-Meetto, Makonde, Malagasy, Malay (Latin), Maltese, Malto (Devanagari), Mandinka, Manx, Maori, Mapudungun, Marathi, Mari (Russia), Masai, Mende (Sierra Leone), Meru, Meta', Minangkabau, Mohawk, Mongolian (Cyrillic), Mongondow, Montenegrin (Cyrillic), Montenegrin (Latin), Morisyen, Mundang, Nahuatl, Navajo, Ndonga, Neapolitan, Nepali, Ngomba, Niuean, Nogay, North Ndebele, Northern Sami (Latin), Norwegian, Nyanja, Nyankole, Nzima, Occitan, Ojibwa, Oromo, Ossetic, Pampanga, Pangasinan, Papiamento, Pashto, Pedi, Persian, Polish, Portuguese, Punjabi (Arabic), Quechua, Ripuarian, Romanian, Romansh, Rundi, Russian, Rwa, Sadri (Devanagari), Sakha, Samburu, Samoan (Latin), Sango, Sangu (Gabon), Sanskrit (Devanagari), Santali (Devanagari), Scots, Scottish Gaelic, Sena, Serbian (Cyrillic), Serbian (Latin), Shambala, Shona, Siksika, Sirmauri (Devanagari), Skolt Sami, Slovak, Slovenian, Soga, Somali (Arabic), Somali (Latin), Songhai, South Ndebele, Southern Altai, Southern Sami, Southern Sotho, Spanish, Sundanese, Swahili (Latin), Swati, Swedish, Tabassaran, Tachelhit, Tahitian, Taita, Tajik (Cyrillic), Tamil, Tatar (Cyrillic), Tatar (Latin), Teso, Tetum, Thai, Thangmi, Tok Pisin, Tongan, Tsonga, Tswana, Turkish, Turkmen (Latin), Tuvan, Udmurt, Uighur (Cyrillic), Ukrainian, Upper Sorbian, Urdu, Uyghur (Arabic), Uzbek (Arabic), Uzbek (Cyrillic), Uzbek (Latin), Vietnamese, Volapük, Vunjo, Walser, Welsh, Western Frisian, Wolof, Xhosa, Yucatec Maya, Zapotec, Zarma, Zhuang, Zulu.
Invoices/Purchase Orders	Extract invoice ID, customer details, vendor details, ship to, bill to, total tax, subtotal, line items and more. Detailed tax information is included for India, Germany, Spain, Portugal and Canada.	Albanian, Arabic, Bulgarian, Chinese (simplified), Chinese (traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese
Receipts	Extract time and date of the transaction, merchant information, amounts of taxes, totals and more.	Afrikaans, Akan, Albanian, Arabic, Azerbaijani, Bamanankan, Basque, Belarusian, Bhojpuri, Bosnian, Bulgarian, Catalan, Cebuano, Corsican, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Fijian, Filipino, Finnish, French, Galician, Ganda, German, Greek, Guarani, Haitian Creole, Hawaiian, Hebrew, Hindi, Hmong Daw, Hungarian, Icelandic, Igbo, Iloko, Indonesian, Irish, isiXhosa, isiZulu, Italian, Japanese, Javanese, Kazakh, Kazakh (Latin), Kinyarwanda, Kiswahili, Korean, Kurdish, Kurdish (Latin), Kyrgyz, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Maltese, Maori, Marathi, Maya, Yucatán, Mongolian, Nepali, Norwegian, Nyanja, Oromo, Pashto, Persian, Persian (Dari), Polish, Portuguese, Punjabi, Quechua, Romanian, Russian, Samoan, Sanskrit, Scottish Gaelic, Serbian (Cyrillic), Serbian (Latin), Sesotho, Sesotho sa Leboa, Shona, Slovak, Slovenian, Somali (Latin), Spanish, Sundanese, Swedish, Tahitian, Tajik, Tamil, Tatar, Tatar (Latin), Thai, Tongan, Turkish, Turkmen, Ukrainian, Upper Sorbian, Uyghur, Uyghur (Arabic), Uzbek, Uzbek (Latin), Vietnamese, Welsh, Western Frisian, Xitsonga
Business Cards	Extract person name, job title, address, email, company, and phone numbers from business cards.	English, Japanese
ID Cards	Extract name, expiration date, machine-readable zone, and more from passports, driver's licenses and ID cards.	Worldwide: Passport Book, Passport Card; United States: Driver License, Identification Card, Residency Permit (Green card), Social Security Card, Military ID; Europe: Driver License, Identification Card, Residency Permit; Southeast Asia: Driver License, Identification Card, Residency Permit; India: Driver License, PAN Card, Aadhaar Card; Canada: Driver License, Identification Card, Residency Permit; Australia: Driver License, Photo Card, Key-pass ID; New Zealand: Driver License, Identification Card, Residency Permit
Contract/Agreement	Extract the title and signatory parties information (including names, reference names, and addresses) from contracts.	English
Structured Documents	Extract key-value pairs and tables from any consistently structured forms or documents. Important! When using this document type, only process documents that have exactly the same structure as the job's sample; otherwise, results are likely to be poor or inaccurate.	All supported OCR languages

* Some languages are only supported when the extended language support option is enabled in the Umango license.

Umango may still be able to read documents in languages not listed above, as long as they are based on the English character set, but accuracy may be diminished.
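The trade-offs between the two custom training modes in the table above (Neural Trained versus Template Trained) can be summarized as a small decision helper. The following Python sketch is illustrative only: it is not part of Umango and does not call any Umango API. The names `FormRequirements` and `choose_training_mode` are hypothetical, and the preference for Template training when both modes would work is an assumption based solely on its shorter training time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormRequirements:
    """Hypothetical description of a document set (not an Umango type)."""
    consistent_layout: bool        # every document shares one fixed structure
    needs_signature_fields: bool   # a signature must be captured as a field
    has_overlapping_fields: bool   # capture regions overlap one another


def choose_training_mode(req: FormRequirements) -> str:
    """Return "template" or "neural" using the constraints in the table.

    Template Trained: consistent structure, signatures supported, no overlap.
    Neural Trained: semi-structured allowed, no signatures, overlap allowed.
    """
    if req.needs_signature_fields and req.has_overlapping_fields:
        # Signatures need Template training; overlap needs Neural training.
        raise ValueError("No single training mode supports both requirements")
    if req.needs_signature_fields:
        if not req.consistent_layout:
            raise ValueError(
                "Signature fields need Template training, "
                "which requires a consistent structure"
            )
        return "template"
    if req.has_overlapping_fields or not req.consistent_layout:
        return "neural"
    # Assumption: when both modes fit, prefer the faster-training one.
    return "template"


if __name__ == "__main__":
    delivery_dockets = FormRequirements(
        consistent_layout=True,
        needs_signature_fields=True,
        has_overlapping_fields=False,
    )
    print(choose_training_mode(delivery_dockets))
```

A helper like this is only a planning aid. The actual choice is made in Umango when the custom document type is trained.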