Get the text of a .docx file as one big string in C#

I want to read the contents of a .docx file as a string from C# code. I looked through some of the existing questions but couldn't work out which approach to use.

I'm trying to use ApplicationClass Application = new ApplicationClass(); but I get the error:


The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined

I want to get the full text of my docx file as a single string, not as separate words.

foreach (FileInfo f in docFiles)
{
    Application wo = new Application();
    object nullobj = Missing.Value;
    object file = f.FullName;
    Document doc = wo.Documents.Open(ref file, .... . . ref nullobj);
    doc. == ??
}

How can I get the whole text from a docx file?



Use the Word.Application interface instead of ApplicationClass.

Understanding Office Primary Interop Assembly Classes and Interfaces
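As a rough sketch of where that leads, the snippet below opens a document through the Word.Application interface and returns its text in one string via the document's Content range. It assumes Word is installed, a reference to Microsoft.Office.Interop.Word, and C# 4 or later (so the optional COM arguments can be omitted); the class and method names are made up for illustration.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper class, not part of the interop library.
public static class WordInteropReader
{
    public static string ReadText(string docxPath)
    {
        // Instantiate through the interface, not ApplicationClass.
        Word.Application app = new Word.Application();
        try
        {
            // Open read-only; the remaining optional COM arguments are omitted.
            Word.Document doc = app.Documents.Open(FileName: docxPath, ReadOnly: true);
            try
            {
                // Content is a Range covering the main document story;
                // Text returns it as a single string.
                return doc.Content.Text;
            }
            finally
            {
                doc.Close(SaveChanges: false);
            }
        }
        finally
        {
            // Always shut down the Word process that was started.
            app.Quit();
        }
    }
}
```

In the returned string, paragraphs are separated by carriage returns ('\r'), so replace them if you need Environment.NewLine.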

This is what I wanted, a way to extract the whole text from a docx file:

    using (ZipFile zip = ZipFile.Read(filename))
    {
        MemoryStream stream = new MemoryStream();
        zip.Extract(@"word/document.xml", stream);
        stream.Seek(0, SeekOrigin.Begin);
        XmlDocument xmldoc = new XmlDocument();
        // Load the extracted XML before reading its text
        xmldoc.Load(stream);
        string PlainTextContent = xmldoc.DocumentElement.InnerText;
    }

First, add references to the assemblies the code needs, such as System.IO.Compression.FileSystem (which contains the ZipFile class on .NET Framework).


Second, make sure these using directives are at the top of your class file:

using System.IO;
using System.IO.Compression;
using System.Xml;

Then you can use the code below:

public string DocxToString(string docxPath)
{
    // Temporary directory the archive is extracted into
    string extractDir = Path.GetDirectoryName(docxPath) + "\\" + Path.GetFileName(docxPath) + ".tmp";
    // Delete any old extraction directory
    if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
    // Extract all media and XML parts of the document into that directory
    ZipFile.ExtractToDirectory(docxPath, extractDir);

    XmlDocument xmldoc = new XmlDocument();
    // Load the extracted XML file that contains the document text
    xmldoc.Load(extractDir + "\\word\\document.xml");
    // Delete the extraction directory
    Directory.Delete(extractDir, true);
    // Return all of the document's text from the XML
    return xmldoc.DocumentElement.InnerText;
}


The .docx format, like the other Microsoft Office formats whose extension ends in "x", is simply a ZIP package that you can open, modify, and recompress.

So you can use an Office Open XML library to read it.
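Because the file is only a ZIP package, the main document part can also be read straight from the archive without extracting anything to disk. The following is a minimal sketch, assuming .NET Framework 4.5 or later with references to System.IO.Compression, System.IO.Compression.FileSystem and System.Xml.Linq; the class and method names are invented for illustration. Unlike InnerText, it keeps one line per paragraph.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of any library.
public static class DocxParagraphReader
{
    // WordprocessingML namespace used by the elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the document text with one line per paragraph.
    public static string ReadText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                // Each w:p element is a paragraph; its w:t descendants hold the text runs.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

This only reads the main body part, so headers, footers and footnotes (stored in other parts of the package) are not included, and text inside a text box nested in a paragraph may appear twice.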


Make sure you are targeting .NET Framework 4.5 or later, since the ZipFile class was introduced in that version.

using NUnit.Framework;

[TestFixture]
public class GetDocxInnerTextTestFixture
{
    private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";

    [Test]
    public void GetDocxInnerText()
    {
        string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);

        Assert.IsTrue(documentText.Length > 0);
    }
}

using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxInnerTextReader
{
    public static string GetDocxInnerText(string docxFilepath)
    {
        string folder = Path.GetDirectoryName(docxFilepath);
        string extractionFolder = folder + "\\extraction";

        if (Directory.Exists(extractionFolder))
            Directory.Delete(extractionFolder, true);

        ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);
        string xmlFilepath = extractionFolder + "\\word\\document.xml";

        var xmldoc = new XmlDocument();
        // Load the extracted document XML before reading its text
        xmldoc.Load(xmlFilepath);

        return xmldoc.DocumentElement.InnerText;
    }
}

