Document Classification in FineReader Engine 12 - Code Sample (Windows)

Language:
EN
Product-Line:
FineReader Engine
Version:
12
Platform:
Windows
Type:
Knowledge Base & Support
KB-Type:
Code Samples Collection
Category:
Document Classification
Coding:
C#

This code sample demonstrates how ABBYY FineReader Engine can be used for document classification. It provides a ready-to-use solution for training your own classification models and using them to classify documents.

Description

Document classification in FineReader Engine 12 is a three-step process: creating the training data set, training the classification model, and classifying documents.

Creation of the training data set

In the first step, select a representative set of images for training, with several samples of each category. We recommend saving this training data set for later use, for example in case you decide to add another category or to improve the classification quality by adding more training images.
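One simple way to keep a training data set reusable is to store it on disk with one subfolder per category. The sketch below assumes that layout and only collects file paths; it does not call FineReader Engine. The class name `TrainingSetLayout`, the extension list, and the minimum sample count are illustrative assumptions, not part of the sample or the Engine API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TrainingSetLayout
{
    // Assumed image extensions; adjust to the formats you actually use.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp" };

    // Maps each category (subfolder name) to the image files it contains.
    public static Dictionary<string, List<string>> Load(string rootFolder)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (string categoryDir in Directory.GetDirectories(rootFolder))
        {
            string category = Path.GetFileName(categoryDir);
            result[category] = Directory.GetFiles(categoryDir)
                .Where(f => ImageExtensions.Contains(Path.GetExtension(f)))
                .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
                .ToList();
        }
        return result;
    }

    // Returns the categories that have fewer than minSamples images.
    public static List<string> FindUnderfilled(
        Dictionary<string, List<string>> trainingSet, int minSamples)
    {
        return trainingSet
            .Where(kv => kv.Value.Count < minSamples)
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the root folder of the training data set.");
            return;
        }

        Dictionary<string, List<string>> trainingSet = Load(args[0]);
        foreach (var kv in trainingSet)
            Console.WriteLine($"{kv.Key}: {kv.Value.Count} image(s)");

        // The threshold of 3 is an arbitrary example value.
        foreach (string category in FindUnderfilled(trainingSet, 3))
            Console.WriteLine($"Category '{category}' may need more samples.");
    }
}
```

A check like `FindUnderfilled` is a cheap way to notice categories with too few samples before you start training.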

The training data is then used to train a classification model. You can choose various training settings:

  • set of features which will be used for classification: image features, recognized text characteristics, or both
  • languages of the documents, if the recognized text is used
  • use of k-fold cross-validation and its parameters
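To illustrate what the number of folds means, here is a minimal sketch of a plain k-fold split: the samples are shuffled and dealt into k folds, and each fold serves once as the validation set while the remaining folds are used for training. This is a generic illustration and not the Engine's internal implementation; the class name `KFold`, the fixed seed, and the lack of per-category stratification are assumptions of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFold
{
    // Splits sample indices 0..sampleCount-1 into k folds of near-equal size.
    public static List<List<int>> Split(int sampleCount, int k, int seed)
    {
        if (k < 2 || k > sampleCount)
            throw new ArgumentOutOfRangeException(nameof(k));

        List<int> indices = Enumerable.Range(0, sampleCount).ToList();

        // Fisher-Yates shuffle with a fixed seed so the split is repeatable.
        var rng = new Random(seed);
        for (int i = indices.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }

        // Deal the shuffled indices round-robin into k folds.
        var folds = new List<List<int>>();
        for (int f = 0; f < k; f++)
            folds.Add(new List<int>());
        for (int i = 0; i < indices.Count; i++)
            folds[i % k].Add(indices[i]);
        return folds;
    }

    public static void Main()
    {
        List<List<int>> folds = Split(sampleCount: 10, k: 3, seed: 42);
        for (int i = 0; i < folds.Count; i++)
        {
            // Fold i is held out for validation; all other folds form the training set.
            List<int> validation = folds[i];
            List<int> training = folds
                .Where((fold, index) => index != i)
                .SelectMany(fold => fold)
                .ToList();
            Console.WriteLine(
                $"Round {i + 1}: train on {training.Count}, validate on {validation.Count}");
        }
    }
}
```

More folds leave more data for training in each round but require more training runs.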

After you have reviewed the classification results for the control document set, you can decide whether the quality is acceptable and save the successfully trained classification model to a file on disk. You can use this classification model to classify documents both with this sample and in any other application that uses the FineReader Engine API.

Training of the classification model

  1. Click Create new model. In the window that opens, choose whether to auto-correct image orientation for the training images, and select the language of the documents if you are going to use the text-based classifier. Click Next.
  2. Set up the training data set by adding the categories and images for each category. Click Save to store your training data set on disk; you will be able to load it later by clicking on Load in the same window.
  3. On the next screen, set up the training parameters:
    1. select image and/or text features to be used for classification
    2. choose whether to use k-fold cross-validation and set the number of folds; if you do not use cross-validation, the whole training data set will be used for training
    3. choose whether the training process aims for high recall, high precision, or a balance between the two
  4. Click Train. If validation was performed, the validation statistics and the confusion matrix will be displayed. Review the results and save the model if the quality is acceptable.
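When reviewing the validation statistics, it can help to recall how precision and recall are derived from a confusion matrix. For a category C, precision is the share of documents predicted as C that really belong to C, and recall is the share of documents that really belong to C that were predicted as C. The sketch below computes both from (actual, predicted) pairs. The class `ConfusionMatrix` and the category names are made up for illustration and are not FineReader Engine objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ConfusionMatrix
{
    private readonly List<string> categories;
    private readonly int[,] counts; // counts[actual, predicted]

    public ConfusionMatrix(IEnumerable<string> categories)
    {
        this.categories = categories.ToList();
        counts = new int[this.categories.Count, this.categories.Count];
    }

    // Records one validated document.
    public void Add(string actual, string predicted)
    {
        counts[IndexOf(actual), IndexOf(predicted)]++;
    }

    // Correct predictions of the category divided by all predictions of it.
    public double Precision(string category)
    {
        int c = IndexOf(category);
        int predictedTotal = 0;
        for (int actual = 0; actual < categories.Count; actual++)
            predictedTotal += counts[actual, c];
        return predictedTotal == 0 ? 0.0 : (double)counts[c, c] / predictedTotal;
    }

    // Correct predictions of the category divided by all documents that belong to it.
    public double Recall(string category)
    {
        int c = IndexOf(category);
        int actualTotal = 0;
        for (int predicted = 0; predicted < categories.Count; predicted++)
            actualTotal += counts[c, predicted];
        return actualTotal == 0 ? 0.0 : (double)counts[c, c] / actualTotal;
    }

    private int IndexOf(string category)
    {
        int index = categories.IndexOf(category);
        if (index < 0)
            throw new ArgumentException("Unknown category: " + category);
        return index;
    }

    public static void Main()
    {
        // Made-up validation results, for illustration only.
        var matrix = new ConfusionMatrix(new[] { "Invoice", "Contract" });
        matrix.Add("Invoice", "Invoice");
        matrix.Add("Invoice", "Invoice");
        matrix.Add("Invoice", "Invoice");
        matrix.Add("Invoice", "Contract"); // an invoice misclassified as a contract
        matrix.Add("Contract", "Contract");
        matrix.Add("Contract", "Contract");

        foreach (string category in new[] { "Invoice", "Contract" })
            Console.WriteLine(
                $"{category}: precision {matrix.Precision(category):F2}, recall {matrix.Recall(category):F2}");
    }
}
```

A model tuned for high precision makes fewer wrong assignments to a category at the cost of missing some documents; a model tuned for high recall does the opposite.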

Classification of a batch of documents (in production)

  1. On the start screen, specify the paths to your model file and to the folder with the images for classification.
  2. Set the Correct images orientation flag if needed.
  3. Click Run classification.
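If you want to reproduce the batch step in your own application, the surrounding loop can be sketched as follows. The classifier itself is passed in as a delegate so that the sketch stays independent of the Engine: in a real application that delegate would wrap the FineReader Engine calls described in the Developer's Help, which are not shown here. The class `BatchClassifier`, the placeholder rule in `Main`, and the file names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchClassifier
{
    // Classifies every image and groups the file paths by resulting category.
    // 'classify' stands in for the real model call (image path -> category name).
    public static Dictionary<string, List<string>> Run(
        IEnumerable<string> imagePaths, Func<string, string> classify)
    {
        var byCategory = new Dictionary<string, List<string>>();
        foreach (string path in imagePaths)
        {
            string category = classify(path);
            if (!byCategory.TryGetValue(category, out var files))
            {
                files = new List<string>();
                byCategory[category] = files;
            }
            files.Add(path);
        }
        return byCategory;
    }

    public static void Main()
    {
        // Placeholder rule based on the file name; NOT a FineReader Engine call.
        Func<string, string> placeholder = path =>
            Path.GetFileName(path).StartsWith("inv", StringComparison.OrdinalIgnoreCase)
                ? "Invoice"
                : "Other";

        // Hypothetical file names; in practice, enumerate the input folder instead.
        var images = new[] { "inv_001.tif", "inv_002.tif", "letter.tif" };

        foreach (var kv in Run(images, placeholder))
            Console.WriteLine($"{kv.Key}: {string.Join(", ", kv.Value)}");
    }
}
```

Keeping the model call behind a delegate also makes the loop easy to test without an Engine license or a trained model.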

For a description of the objects used and their methods and properties, please refer to the product's Developer's Help.

