Text Extractor Plugin for Aspose.PDF
The Aspose.PDF Text Extractor Plugin for .NET lets developers extract text from PDF files in one of three modes: formatted (Pure), as-is (Raw), or cleaned (Plain). This makes it a good fit for document conversion, data mining, accessibility improvements, and similar tasks.
Aspose.PDF Text Extractor Plugin Key Features
- Multiple Extraction Modes: Extract text as pure (formatted), raw (as-is), or plain (cleaned) for maximum flexibility.
- Batch PDF Processing: Add multiple PDFs to a single extraction run to streamline workflows.
- Simple .NET Integration: A straightforward API that you can add to any C# or .NET project for rapid deployment.
Getting Started with Aspose.PDF Text Extractor Plugin
- Install Aspose.PDF for .NET: Add the Aspose.PDF package via NuGet, or download the assemblies and reference them in your .NET solution.
- Configure Your License: Activate a license for unrestricted processing and support.
- Configure Extraction Options: Use the `TextExtractor` and `TextExtractorOptions` classes, and pass the extraction mode you need (Pure, Raw, or Plain) to the `TextExtractorOptions` constructor.
- Process and Retrieve Text: Run the extraction with `Process` and read the text from the returned result container's `ResultCollection`.
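The four steps above can be wrapped in one small helper. This is a minimal sketch that reuses only the plugin calls shown in this article's examples plus `Aspose.Pdf.License.SetLicense`; it assumes the plugin honors the standard Aspose.PDF license, and the class name `PdfTextHelper` and any license path you pass in are hypothetical.

```csharp
using System;
using Aspose.Pdf.Plugins;

// Hypothetical helper wrapping the "Getting Started" steps.
public static class PdfTextHelper
{
    // Step 2: activate a license. Assumption: the plugin uses the standard
    // Aspose.PDF license; licensePath is a placeholder you supply.
    public static void ActivateLicense(string licensePath)
    {
        var license = new Aspose.Pdf.License();
        license.SetLicense(licensePath);
    }

    // Steps 3 and 4: configure the options, run the extractor, return the text.
    public static string ExtractText(string pdfPath, TextExtractorOptions.TextFormattingMode mode)
    {
        if (string.IsNullOrEmpty(pdfPath))
            throw new ArgumentException("A PDF path is required.", nameof(pdfPath));

        var extractor = new TextExtractor();
        var options = new TextExtractorOptions(mode);
        options.AddInput(new FileDataSource(pdfPath));

        // With a single input, the first result holds that file's text.
        var resultContainer = extractor.Process(options);
        return resultContainer.ResultCollection[0].ToString();
    }
}
```

A typical call sequence would be `PdfTextHelper.ActivateLicense("Aspose.PDF.lic")` once at startup, followed by `PdfTextHelper.ExtractText(@"C:\Samples\sample.pdf", TextExtractorOptions.TextFormattingMode.Pure)` for each document. Passing the mode as a parameter keeps the choice of Pure, Raw, or Plain at the call site.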
Example: Extract Text from a PDF (C#)
```csharp
using System;
using Aspose.Pdf.Plugins;

// Create the extractor and choose Pure mode (text with its formatting).
var extractor = new TextExtractor();
var options = new TextExtractorOptions(TextExtractorOptions.TextFormattingMode.Pure);
options.AddInput(new FileDataSource(@"C:\Samples\sample.pdf"));
// Run the extraction; the first result holds the text of the input PDF.
var resultContainer = extractor.Process(options);
string extractedText = resultContainer.ResultCollection[0].ToString();
Console.WriteLine(extractedText);
```
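To turn the example above into a PDF-to-TXT conversion, the extracted string only has to be written to disk. A minimal sketch using only .NET's `System.IO`; the class name `TxtWriter` is hypothetical, and UTF-8 without a byte-order mark is an assumed encoding choice.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: persists extracted text next to the source PDF.
public static class TxtWriter
{
    // Writes the text to a .txt file with the PDF's base name and returns that path.
    // UTF-8 without a BOM is an assumption; use whatever your downstream tools expect.
    public static string SaveNextToPdf(string pdfPath, string extractedText)
    {
        string txtPath = Path.ChangeExtension(pdfPath, ".txt");
        File.WriteAllText(txtPath, extractedText, new UTF8Encoding(false));
        return txtPath;
    }
}
```

For example, `TxtWriter.SaveNextToPdf(@"C:\Samples\sample.pdf", extractedText)` writes `C:\Samples\sample.txt`.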
Example: Batch Extract Text from Multiple PDFs
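```csharp
using System;
using Aspose.Pdf.Plugins;

string[] pdfFiles = { "sample1.pdf", "sample2.pdf" };
var extractor = new TextExtractor();
// Raw mode extracts the text as-is, without formatting.
var options = new TextExtractorOptions(TextExtractorOptions.TextFormattingMode.Raw);

// Register every PDF as an input of the same extraction run.
foreach (var file in pdfFiles)
{
    options.AddInput(new FileDataSource(file));
}

// A single Process call handles all inputs; print each result.
var resultContainer = extractor.Process(options);
for (int i = 0; i < resultContainer.ResultCollection.Count; i++)
{
    string text = resultContainer.ResultCollection[i].ToString();
    Console.WriteLine(text);
}
```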
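When each output must be traced back to its source file, one option is to run one extraction per PDF instead of a single combined run. This sketch makes that trade-off deliberately: it gives up the single `Process` call but avoids assuming anything about the order of `ResultCollection`, and it records failures instead of aborting the batch. It uses only the plugin calls shown above; `BatchExtractor` and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf.Plugins;

// Hypothetical batch helper: one extraction per file, keyed by file path,
// so a single unreadable PDF does not stop the remaining files.
public static class BatchExtractor
{
    public static Dictionary<string, string> ExtractAll(
        IEnumerable<string> pdfFiles,
        TextExtractorOptions.TextFormattingMode mode,
        List<string> failures)
    {
        var results = new Dictionary<string, string>();

        foreach (var file in pdfFiles)
        {
            try
            {
                var extractor = new TextExtractor();
                var options = new TextExtractorOptions(mode);
                options.AddInput(new FileDataSource(file));

                var resultContainer = extractor.Process(options);
                results[file] = resultContainer.ResultCollection[0].ToString();
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the next file.
                failures.Add($"{file}: {ex.Message}");
            }
        }

        return results;
    }
}
```

Creating a fresh `TextExtractor` per file is a conservative choice; if your tests show that one instance can safely be reused, moving it out of the loop is a simple change.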
Use Cases & Extensions
- PDF to TXT Conversion: Automate conversion of PDFs to plain text for indexing, search, or archival.
- Data Mining: Extract text from tables, invoices, or forms for further processing or analytics.
- Accessibility: Prepare readable content for screen readers or alternate formats.
- Batch Processing: Use extraction modes for specific downstream workflows (e.g., OCR pre-processing, entity recognition).
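For indexing and search, extracted text usually needs light normalization before it is stored. A minimal sketch that is independent of Aspose.PDF and uses only .NET regular expressions and LINQ; the class `TextIndexing` is hypothetical, and the tokenization rule (runs of letters or digits, lower-cased) is an assumption, not something the plugin provides.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical post-processing for search indexing.
public static class TextIndexing
{
    // Collapses runs of whitespace, including line breaks, into single spaces.
    public static string NormalizeWhitespace(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // Counts lower-cased word tokens (runs of Unicode letters or digits).
    public static Dictionary<string, int> TermFrequencies(string text)
    {
        return Regex.Matches(text ?? string.Empty, @"[\p{L}\p{N}]+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .GroupBy(term => term)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For example, `TextIndexing.TermFrequencies("Total: 10, total 10")` yields two entries, `total` and `10`, each with a count of 2.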
For advanced scenarios, such as handling encrypted PDFs or customizing the text output, refer to the official Aspose.PDF API Reference.
Best Practices
- Always select the extraction mode that matches your output needs: Pure for formatted text, Raw for as-is text, or Plain for cleaned text.
- For large document sets, batch process to maximize throughput and minimize manual effort.
- Test extraction results with real-world PDFs to ensure data accuracy.
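One lightweight way to test extraction results is to check that phrases you know appear in a sample PDF also appear in the extracted text. A minimal, library-independent sketch; `ExtractionCheck` is a hypothetical name, and whitespace is collapsed on both sides because line breaks and spacing may differ between extraction modes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical spot check of extraction output against known content.
public static class ExtractionCheck
{
    // Returns the expected phrases that are missing from the extracted text.
    public static List<string> MissingPhrases(string extractedText, IEnumerable<string> expectedPhrases)
    {
        string haystack = Collapse(extractedText);
        return expectedPhrases.Where(p => !haystack.Contains(Collapse(p))).ToList();
    }

    // Collapses runs of whitespace into single spaces so layout differences are ignored.
    private static string Collapse(string s) =>
        Regex.Replace(s ?? string.Empty, @"\s+", " ").Trim();
}
```

An empty result means every expected phrase was found; anything returned points to content the chosen mode dropped or reordered.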
Related Resources:
- Aspose.PDF Documentation
- Extract Text from PDF