How to Convert Word Document to HTML Using C#

In this short tutorial, we will learn how to convert a Word document to HTML using C#. The conversion runs on any operating system where .NET is installed, including macOS, Linux, and Windows.

Benefits of Converting Word Documents to HTML

  1. Web Compatibility:
    • HTML files can be easily displayed on web browsers.
  2. Formatting Retention:
    • Most of the formatting from the Word document can be preserved in the HTML output.
  3. Cross-Platform Functionality:
    • HTML files can be accessed from various platforms without compatibility issues.

Step-by-Step Guide to Convert Word Document to HTML

Step 1: Install Aspose.Words Library

Install the Aspose.Words library from NuGet. The command below is for the Package Manager Console in Visual Studio; from the .NET CLI, the equivalent is dotnet add package Aspose.Words.

Install-Package Aspose.Words

Step 2: Load the DOCX File

Instantiate the Document class to load the Word document you want to convert.

Document doc = new Document("Input.docx");
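The Document class can also be constructed from a stream, which is handy when the file comes from an upload or a storage service rather than a local path. The following is a minimal sketch, not the only way to load a file; the WordLoader class and its Load method are hypothetical names introduced here for illustration.

```csharp
using System.IO;
using Aspose.Words;

public static class WordLoader
{
    // Hypothetical helper: checks that the file exists, then loads it through a stream.
    public static Document Load(string path)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException("Word document not found.", path);

        // The document is read into memory by the constructor,
        // so the stream can be disposed once the constructor returns.
        using (FileStream stream = File.OpenRead(path))
        {
            return new Document(stream);
        }
    }
}
```

With this helper, the loading step becomes Document doc = WordLoader.Load("Input.docx"); and a missing file produces a clear error before any parsing starts.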

Step 3: Specify HtmlSaveOptions Properties

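Use the HtmlSaveOptions class to control the output. The settings below write the CSS inline on each element, export the page margins, and set the resolution of exported images to 90 dpi.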

HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.CssStyleSheetType = CssStyleSheetType.Inline;
saveOptions.ExportPageMargins = true;
saveOptions.ImageResolution = 90;
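The settings above are only one possible combination. As an alternative sketch, the options below aim for a single self-contained HTML file by embedding the style sheet in the page and writing images as Base64 data. The HtmlOptionsFactory class and the CreateSelfContained method are hypothetical names; whether this combination suits your output is an assumption to verify against your own documents.

```csharp
using Aspose.Words.Saving;

public static class HtmlOptionsFactory
{
    // Hypothetical helper: options intended to produce one standalone .html file.
    public static HtmlSaveOptions CreateSelfContained()
    {
        return new HtmlSaveOptions
        {
            // Put the CSS in a <style> block inside the page instead of on each element.
            CssStyleSheetType = CssStyleSheetType.Embedded,
            // Write images into the HTML as Base64 data instead of separate files.
            ExportImagesAsBase64 = true,
            // Indent the markup so the output is easier to read and diff.
            PrettyFormat = true
        };
    }
}
```

Embedding images makes the file larger but removes the need to ship a folder of image files alongside the HTML.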

Step 4: Save the Output as HTML

Use the Document.Save method to save the file in HTML format.

doc.Save("HtmlSaveOptions.html", saveOptions);
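Document.Save also has an overload that writes to a stream, which is useful when the HTML should be returned from a web endpoint or stored in a database rather than written to disk. The sketch below is one way to get the HTML as a string; HtmlStringConverter and ToHtmlString are hypothetical names, and the code assumes the output uses the default UTF-8 encoding.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

public static class HtmlStringConverter
{
    // Hypothetical helper: converts a loaded document to an HTML string in memory.
    public static string ToHtmlString(Document doc)
    {
        var options = new HtmlSaveOptions
        {
            // With no output folder on disk, images are embedded as Base64 data.
            ExportImagesAsBase64 = true
        };

        using (var buffer = new MemoryStream())
        {
            doc.Save(buffer, options);

            // Rewind and decode; assumes the default UTF-8 output encoding.
            buffer.Position = 0;
            using (var reader = new StreamReader(buffer, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```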

Example Code to Convert DOCX to HTML

Here’s a complete example that puts all the steps together:

using Aspose.Words;
using Aspose.Words.Saving;

// Load the input Word DOCX file
Document doc = new Document("Input.docx");

// Configure the HTML output: inline CSS, page margins, 90 dpi images
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.CssStyleSheetType = CssStyleSheetType.Inline;
saveOptions.ExportPageMargins = true;
saveOptions.ImageResolution = 90;

// Save output HTML
doc.Save("HtmlSaveOptions.html", saveOptions);
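The same steps extend naturally to a whole folder of documents. The following is a minimal sketch of a batch conversion that reuses the settings from the example above; BatchConverter, HtmlPathFor, and ConvertFolder are hypothetical names, and the folder layout is an assumption.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class BatchConverter
{
    // Hypothetical helper: maps "in/Report.docx" to "<outputDir>/Report.html".
    public static string HtmlPathFor(string docxPath, string outputDir)
    {
        return Path.Combine(outputDir, Path.GetFileNameWithoutExtension(docxPath) + ".html");
    }

    // Converts every .docx file directly inside inputDir; returns the number converted.
    public static int ConvertFolder(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int converted = 0;

        foreach (string docxPath in Directory.EnumerateFiles(inputDir, "*.docx"))
        {
            Document doc = new Document(docxPath);

            // Same settings as the single-file example.
            HtmlSaveOptions saveOptions = new HtmlSaveOptions();
            saveOptions.CssStyleSheetType = CssStyleSheetType.Inline;
            saveOptions.ExportPageMargins = true;
            saveOptions.ImageResolution = 90;

            doc.Save(HtmlPathFor(docxPath, outputDir), saveOptions);
            converted++;
        }

        return converted;
    }
}
```

A call such as BatchConverter.ConvertFolder("docs", "html") would then write one HTML file per Word document into the html folder.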

Conclusion

In this tutorial, we have explored how to convert a Word document to HTML using C#. By adjusting the HtmlSaveOptions properties shown in the steps above, you can tune the conversion so that the HTML output preserves as much of the original document's formatting as possible. For further exploration, consider referring to materials on converting DOCX files to other formats such as TIFF.
