Template Matching for Forms

Language: EN
Product-Line: FlexiCapture Engine
Version: 9.x, 10, 11
Type: Technology & Features
Category: Imaging, Recognition

This article describes the differences between “normal” full-text OCR and forms processing. It explains how fixed form templates are matched and how paper forms can be designed so that they are optimized for machine reading.

Text & Coordinates

When optical character recognition (OCR) is applied to a “normal” scanned document or PDF, the layout can differ considerably from page to page, even within one document. For the OCR technology to “read” the text, the right zones first have to be defined or located. Recognition areas can be “drawn”:

  • “manually” by an operator (rather unlikely)
  • from your program code
    or
  • automatically, by the intelligent algorithms that are part of ABBYY's Document/Layout Analysis

When forms are processed, the layout and structure of each form are known in advance.

  • This makes it possible to create a layout template that contains the coordinates of all areas that should be recognized.
    Since the information extracted from the fields will be exported into a database or a spreadsheet application (e.g. Excel™), it is important to define the correct data type for each field.
    Once the matching form template has been detected, all coordinates of the relevant areas as well as the data types for the export are (in theory) known.
  • In real-life forms processing, different forms have to be processed by one system. Pre-sorting would be too expensive, so the filled-in and scanned paper forms must be matched against the previously defined form templates.
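
The idea of a layout template can be sketched in a few lines of Python. This is only an illustration of the concept, not the FlexiCapture API: the names `Field`, `FormTemplate`, `CONVERTERS` and `export_row`, the pixel coordinates, and the date format `DD.MM.YYYY` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Callable, Dict, Optional, Tuple

# (left, top, right, bottom) in pixels on the page image (assumed convention)
Rect = Tuple[int, int, int, int]

# Hypothetical converters from raw recognized text to the export data type.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "text": str.strip,
    "number": lambda s: Decimal(s.replace(" ", "").replace(",", ".")),
    "date": lambda s: datetime.strptime(s.strip(), "%d.%m.%Y").date(),
}


@dataclass(frozen=True)
class Field:
    name: str
    area: Rect        # where the field is located on the form
    data_type: str    # key into CONVERTERS


@dataclass(frozen=True)
class FormTemplate:
    template_id: str
    fields: Tuple[Field, ...]


def export_row(template: FormTemplate,
               raw_text: Dict[str, str]) -> Dict[str, Optional[object]]:
    """Convert the recognized text of each field to its export data type."""
    row: Dict[str, Optional[object]] = {}
    for f in template.fields:
        raw = raw_text.get(f.name, "")
        # Empty or missing fields are exported as None instead of failing.
        row[f.name] = CONVERTERS[f.data_type](raw) if raw.strip() else None
    return row


order_form = FormTemplate(
    template_id="order-v1",
    fields=(
        Field("customer", (50, 120, 600, 160), "text"),
        Field("amount", (50, 200, 300, 240), "number"),
        Field("order_date", (350, 200, 600, 240), "date"),
    ),
)
```

Once the template is known, the recognized text of each area only has to be converted, for example `export_row(order_form, {"customer": " ACME ", "amount": "12,50"})`. A field without recognized text is exported as `None`.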

Machine Readable Forms

  • To enable fast and reliable template matching, machine-readable forms should contain special identifiers. They allow the processing technology to tell the different form types apart.
  • The graphical elements used for this “identification” are called “anchors”.
    An anchor is an element of a machine-readable form that is used for matching and to detect its orientation before recognition. Typically the following elements are used:
  • Additionally, unique barcodes can be used to pick the right template.
    Technical note: whenever a form is changed in the designer, a new barcode value must be used. Otherwise the changes can result in moved areas and only partly recognized data.
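
The selection logic described above can be sketched as follows. This is a simplified illustration, not the algorithm used by FlexiCapture: it assumes that anchor detection and barcode reading have already happened elsewhere, that anchor positions are given as page-relative coordinates between 0 and 1, and that only upright and upside-down scans occur. All names (`TemplateSignature`, `anchor_score`, `pick_template`) and the tolerance and threshold values are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # page-relative (x, y), both in the range 0..1


@dataclass(frozen=True)
class TemplateSignature:
    template_id: str
    barcode_value: Optional[str]   # should change with every form revision
    anchors: Tuple[Point, ...]     # expected anchor centres


def anchor_score(expected: Sequence[Point], found: Sequence[Point],
                 tolerance: float = 0.02) -> float:
    """Fraction of expected anchors that have a detected anchor nearby."""
    if not expected:
        return 0.0
    hits = sum(
        1 for ex, ey in expected
        if any(hypot(ex - fx, ey - fy) <= tolerance for fx, fy in found)
    )
    return hits / len(expected)


def rotate_180(points: Sequence[Point]) -> List[Point]:
    """Positions of the same points on a page turned upside down."""
    return [(1.0 - x, 1.0 - y) for x, y in points]


def pick_template(templates: Sequence[TemplateSignature],
                  found_anchors: Sequence[Point],
                  barcode: Optional[str] = None,
                  min_score: float = 0.75) -> Optional[Tuple[str, int]]:
    """Return (template_id, rotation in degrees) or None if nothing matches."""
    candidates = list(templates)
    if barcode is not None:
        by_barcode = [t for t in candidates if t.barcode_value == barcode]
        if by_barcode:
            # The barcode identifies the form; anchors only give orientation.
            candidates, min_score = by_barcode, 0.0
    scored = [
        (anchor_score(t.anchors, pts), t, rotation)
        for t in candidates
        for rotation, pts in ((0, found_anchors), (180, rotate_180(found_anchors)))
    ]
    best = max(scored, key=lambda c: c[0], default=None)
    if best is None or best[0] < min_score:
        return None
    return best[1].template_id, best[2]
```

The barcode is checked first because its value is unambiguous, while the anchors serve both as a fallback for identification and as the means of detecting orientation. This only works if every revision of a form carries its own barcode value.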

Screenshot: the ABBYY FlexiCapture Document Definition Editor

In the Document Definition Editor, an administrator of the system can define:

  • the anchors for template matching
  • the recognition areas, together with their data types and text recognition settings
  • the data export settings
  • the logical document structure of multi-page forms with
    • repeatable pages and
    • annex pages

All of this is important because it enables you to build powerful template matching ;-)
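
The logical document structure mentioned in the list above can be pictured with a small sketch. It is not how FlexiCapture assembles documents internally. It assumes that every scanned page has already been matched to a page type (or to `None` if no template page matched), that the first page type of the definition starts a new document, and that unmatched pages are attached to the current document as annex pages. The names `Section` and `assemble_documents` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Section:
    page_type: str
    repeatable: bool = False   # may occur several times in a row


def assemble_documents(definition: Sequence[Section],
                       page_types: Sequence[Optional[str]]
                       ) -> List[List[Tuple[int, str]]]:
    """Group a stream of matched pages into documents.

    Each document is a list of (page index, role) pairs, where the role is
    the matched page type or "annex" for an unmatched page.
    """
    order = {s.page_type: i for i, s in enumerate(definition)}
    documents: List[List[Tuple[int, str]]] = []
    pos = 0  # position in the definition of the last matched page
    for index, ptype in enumerate(page_types):
        if ptype == definition[0].page_type:
            documents.append([(index, ptype)])   # first page opens a document
            pos = 0
        elif not documents:
            raise ValueError(f"page {index} appears before any first page")
        elif ptype is None:
            documents[-1].append((index, "annex"))
        else:
            if ptype not in order:
                raise ValueError(f"page {index}: unknown page type {ptype!r}")
            new_pos = order[ptype]
            repeated = new_pos == pos and definition[pos].repeatable
            if new_pos > pos or repeated:
                documents[-1].append((index, ptype))
                pos = new_pos
            else:
                raise ValueError(f"page {index}: {ptype!r} is out of order")
    return documents


definition = (Section("cover"), Section("items", repeatable=True), Section("signature"))
```

With this definition, a scan batch such as cover, items, items, unmatched page, signature, cover, signature is split into two documents, and the unmatched page becomes an annex of the first one.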

Design of Machine-Readable Paper Forms

ABBYY FormDesigner is an application that is part of FlexiCapture Standalone 1) and is used for creating optimized machine-readable survey forms. It is often used when new processes and forms are being developed, to ensure good recognition rates from the earliest stage.

  • The application was developed to make it easy to use the optimized elements that machine form recognition requires:
    • Anchors for template matching
    • Barcode support
    • Text fields that show the person filling in the form that characters should be written separately and large enough
  • The generated forms can then be exported for further distribution as:
    • a printout on paper
    • a searchable PDF
    • an XML form description and
    • a Microsoft InfoPath document

  • The designed forms can also be imported into ABBYY FlexiCapture to speed up the setup process for new forms.

Template Matching in ABBYY Products

  • FlexiCapture Engine (and FlexiCapture) comes with built-in template matching for multi-page fixed forms 2)
  • In FineReader Engine and Cloud OCR SDK, developers can (and have to) define zones via code; there is no built-in matching technology.


1) Included in FlexiCapture Engine
2) FlexiCapture Standalone with the Document Definition Editor and FormDesigner are part of the Developer Package