Understanding Document Types
Monday, July 13, 2009 at 3:25PM
Standardized documents with predictable formats and information make data recognition and extraction easier and more accurate. However, today's business documents come in increasingly varied structures. In fact, recent studies estimate that up to 80 percent of business data is semi-structured or unstructured, which means that until recently up to 80 percent of business information had to be handled manually.
Here are some characteristics of the different content types:
Structured Documents: these are document types where the data is always in the same area or region of the page, so they usually require only zonal OCR or best-in-class forms processing. Examples include new customer applications, spreadsheets, health claim forms, and benefit forms.
Semi-Structured Documents: these are document types where the data required from the page is the same, but its location can vary from one vendor or customer to another. This document type usually requires free-form technology to locate and extract the data for validation and/or export. Examples include invoices, purchase orders, shipping documents, and bills of lading.
Unstructured Documents: these are document types where the data or information is on the page but not always in the same area. This document type usually requires converting the text into an electronic format such as PDF; text recognition can then be used to identify what the document is about. Unstructured content may include emails, letters, and customer correspondence.
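To make the difference between zonal and free-form capture concrete, here is a minimal Python sketch. It assumes the OCR step has already produced a list of words with pixel positions. The Word class, the zonal_extract and label_extract functions, and all coordinates are hypothetical illustrations, not the API of any particular capture product.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One recognized word and the top-left corner of its box, in pixels (hypothetical)."""
    text: str
    x: int
    y: int


def zonal_extract(words: List[Word], zone: Tuple[int, int, int, int]) -> str:
    """Structured case: read whatever falls inside a fixed rectangle."""
    left, top, right, bottom = zone
    inside = [w for w in words if left <= w.x < right and top <= w.y < bottom]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)


def label_extract(words: List[Word], label: str, line_tolerance: int = 5) -> Optional[str]:
    """Semi-structured case: find a label anywhere on the page,
    then take the nearest word to its right on the same line."""
    for anchor in words:
        if anchor.text.rstrip(":").lower() == label.lower():
            same_line = [
                w for w in words
                if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x
            ]
            if same_line:
                return min(same_line, key=lambda w: w.x).text
    return None


# Two invoices from different vendors: same field, different position on the page.
vendor_a = [Word("Total:", 400, 700), Word("$120.00", 470, 700)]
vendor_b = [Word("Invoice", 50, 40), Word("Total", 60, 120), Word("$85.50", 130, 121)]

TOTAL_ZONE = (450, 690, 600, 720)  # assumed zone, tuned to vendor A's layout only

print(repr(zonal_extract(vendor_a, TOTAL_ZONE)))  # the zone matches vendor A
print(repr(zonal_extract(vendor_b, TOTAL_ZONE)))  # the zone misses vendor B's total
print(label_extract(vendor_a, "total"))
print(label_extract(vendor_b, "total"))
```

The fixed zone only works for the layout it was drawn for, which is why it suits structured forms. The label search finds the total on both invoices because it keys on the field name rather than the position. Unstructured content has neither a fixed position nor a reliable label, so it is not covered by this sketch.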
Until recently, the only way to capture information from unstructured or semi-structured documents was to have a person read and manually index the incoming information. Needless to say, this is an expensive and time-consuming way to process a document.
If you're already capturing structured forms and want to find out how you can integrate your non-structured content, click here or dial 1-888-726-7730 to speak with a PaperFree consultant for a free evaluation.
