How to Extract Structured Data from Tables and Forms in Images

How to Extract Structured Data from Tables and Forms in Images

Extracting data from scanned tables or filled forms is essential for business automation, reporting, and compliance. Aspose.OCR Table to Text for .NET streamlines this process, accurately detecting cell and field structure and exporting to editable formats.

Real-World Problem

Businesses often receive invoices, reports, or forms as images or scans. Manual entry of tabular data or form fields is slow, error-prone, and expensive at scale.

Solution Overview

With Table to Text for .NET, you can extract structured data—including rows, columns, and field values—directly from images. Results can be exported to Excel, JSON, or integrated with databases and automation platforms.


Prerequisites

  1. Visual Studio 2019 or later
  2. .NET 6.0 or later (or .NET Framework 4.6.2+)
  3. Aspose.OCR for .NET from NuGet
  4. Basic C# skills
PM> Install-Package Aspose.OCR

Step-by-Step Implementation

Step 1: Install and Configure Aspose.OCR

using Aspose.OCR;

Step 2: Prepare Table or Form Images

OcrInput input = new OcrInput(InputType.SingleImage);
input.Add("scanned_invoice.png");
input.Add("filled_form.jpg");

Step 3: Configure Recognition Settings for Tables/Forms

RecognitionSettings settings = new RecognitionSettings();
settings.DetectAreasMode = DetectAreasMode.TABLE; // or DetectAreasMode.FORM
settings.Language = Language.English;

Step 4: Extract Table or Form Data

AsposeOcr ocr = new AsposeOcr();
List<RecognitionResult> results = ocr.Recognize(input, settings);

Step 5: Export Structured Output

foreach (RecognitionResult result in results)
{
    result.Save("output_table.xlsx", SaveFormat.Xlsx); // Spreadsheet
    result.Save("output_table.json", SaveFormat.Json); // JSON
    result.Save("output_table.txt", SaveFormat.Text);  // Text
}

Step 6: Handle Errors and Validate Results

try
{
    AsposeOcr ocr = new AsposeOcr();
    List<RecognitionResult> results = ocr.Recognize(input, settings);
    // Further processing...
}
catch (Exception ex)
{
    Console.WriteLine($"Error: {ex.Message}");
}

Step 7: Optimize for Table/Form Variations

  • Test on samples with different borders, fonts, or field placements
  • Tune preprocessing settings for best detection

Step 8: Automate Batch Extraction

Process all relevant images in a folder:

foreach (string file in Directory.GetFiles("./forms", "*.jpg"))
{
    input.Add(file);
}

Step 9: Complete Example

using Aspose.OCR;
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            OcrInput input = new OcrInput(InputType.SingleImage);
            input.Add("scanned_invoice.png");
            input.Add("filled_form.jpg");

            RecognitionSettings settings = new RecognitionSettings();
            settings.DetectAreasMode = DetectAreasMode.TABLE;
            settings.Language = Language.English;

            AsposeOcr ocr = new AsposeOcr();
            List<RecognitionResult> results = ocr.Recognize(input, settings);

            foreach (RecognitionResult result in results)
            {
                result.Save("output_table.xlsx", SaveFormat.Xlsx);
                result.Save("output_table.json", SaveFormat.Json);
                result.Save("output_table.txt", SaveFormat.Text);
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}

Use Cases and Applications

Invoice and Report Automation

Extract financial data for accounting or analytics.

Survey and Registration Forms

Pull structured responses for CRM, ERP, or BI systems.

Compliance and Audit

Automate extraction and validation of data from submitted forms or tables.


Common Challenges and Solutions

Challenge 1: Irregular Table Borders or Layouts

Solution: Use preprocessing and sample tuning to improve detection.

Challenge 2: Mixed Content (Text and Tables)

Solution: Run with AUTO or separate by image type for best results.

Challenge 3: Complex Forms with Many Fields

Solution: Test and tweak recognition for high-density forms.


Performance Considerations

  • Table recognition is more CPU-intensive; monitor batch jobs
  • Validate outputs for critical workflows
  • Batch export for integration with other tools

Best Practices

  1. Validate structured data on samples before scaling
  2. Secure and archive both source images and extracted outputs
  3. Update Aspose.OCR regularly for accuracy improvements
  4. Tune settings for new document layouts

Advanced Scenarios

Scenario 1: Export to Database or BI Tools

// Use JSON or Excel export for integration with data pipelines

Scenario 2: Real-Time Extraction in Web Apps

// Integrate extraction logic into ASP.NET or workflow API

Conclusion

Aspose.OCR Table to Text for .NET empowers you to automate structured data extraction from images and forms, supporting everything from business automation to compliance and analytics.

For advanced table extraction features, visit the Aspose.OCR for .NET API Reference .

 English