How to Extract Tables and Tabular Data from Images with Aspose.OCR
Extracting tables from images, forms, or scanned reports is challenging—manual retyping is slow and error-prone. Aspose.OCR Table to Text for .NET automates the extraction and structuring of tabular data from images and photos.
Real-World Problem
Financial statements, survey forms, and scientific results are often trapped in scanned tables or images. Manually recreating this data wastes hours and risks introducing errors.
Solution Overview
Aspose.OCR for .NET can accurately detect, extract, and convert tables from images or scanned PDFs into machine-readable formats—perfect for Excel, reporting, or workflow automation.
Prerequisites
- Visual Studio 2019 or later
- .NET 6.0 or later (or .NET Framework 4.6.2+)
- Aspose.OCR for .NET from NuGet
- Basic C# knowledge
PM> Install-Package Aspose.OCR
Step-by-Step Implementation
Step 1: Install and Configure Aspose.OCR
using Aspose.OCR;
Step 2: Scan or Photograph Images Containing Tables
OcrInput input = new OcrInput(InputType.SingleImage);
input.Add("table1.jpg");
input.Add("report_page.png");
Step 3: Configure Table Recognition Settings
RecognitionSettings settings = new RecognitionSettings();
settings.Language = Language.English;
settings.DetectAreasMode = DetectAreasMode.TABLE; // Key for tables
Step 4: Run the Table Extraction Process
AsposeOcr ocr = new AsposeOcr();
List<RecognitionResult> results = ocr.Recognize(input, settings);
Step 5: Export Table Data
foreach (RecognitionResult result in results)
{
result.Save("table_data.xlsx", SaveFormat.Xlsx); // Excel output
result.Save("table_data.csv", SaveFormat.Csv); // CSV output
result.Save("table_data.txt", SaveFormat.Text); // Plain text output
}
Step 6: Add Error Handling and Validation
try
{
AsposeOcr ocr = new AsposeOcr();
List<RecognitionResult> results = ocr.Recognize(input, settings);
// Further processing
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
Step 7: Optimize for Complex, Rotated, or Multi-Page Tables
- Preprocess images to deskew or crop
- Use high-resolution scans or photos
- For multipage PDFs, add each page as a separate input
foreach (string file in Directory.GetFiles("./scans", "*.png"))
{
input.Add(file);
}
Step 8: Complete Example
using Aspose.OCR;
using System;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
try
{
OcrInput input = new OcrInput(InputType.SingleImage);
input.Add("table1.jpg");
input.Add("report_page.png");
RecognitionSettings settings = new RecognitionSettings();
settings.Language = Language.English;
settings.DetectAreasMode = DetectAreasMode.TABLE;
AsposeOcr ocr = new AsposeOcr();
List<RecognitionResult> results = ocr.Recognize(input, settings);
foreach (RecognitionResult result in results)
{
result.Save("table_data.xlsx", SaveFormat.Xlsx);
result.Save("table_data.csv", SaveFormat.Csv);
result.Save("table_data.txt", SaveFormat.Text);
}
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
}
}
Use Cases and Applications
Financial and Scientific Reporting
Extract tables from financial statements, lab results, or research papers for instant analysis in Excel.
Survey and Form Processing
Digitize tables from scanned forms, checklists, or census records.
Workflow Automation
Feed structured table data directly into your business applications, BI tools, or databases.
Common Challenges and Solutions
Challenge 1: Poor-Quality or Complex Tables
Solution: Use high-res images and test on sample sets. Preprocess to improve clarity.
Challenge 2: Rotated or Skewed Tables
Solution: Deskew images before processing; use DetectAreasMode.TABLE.
Challenge 3: Multi-Page Reports
Solution: Add each page as a separate input for batch processing.
Performance Considerations
- Batch process for speed
- Use high-quality scans/photos
- Dispose of OCR objects after large runs
Best Practices
- Validate output before integration
- Tune table recognition settings as needed
- Back up original and digitized data
- Test with real samples before deploying
Advanced Scenarios
Scenario 1: Multi-Language Table Extraction
settings.Language = Language.German;
Scenario 2: Export to JSON for Data Pipelines
foreach (RecognitionResult result in results)
{
result.Save("table_data.json", SaveFormat.Json);
}
Conclusion
Aspose.OCR Table to Text for .NET turns images and scans into actionable, structured table data—ready for analysis, reporting, and automation.
See more table recognition code samples in the Aspose.OCR for .NET API Reference .