[Note: See the series index for a list of all parts in this series.]
To get support for the technologies we will use in this we need to add a few assembly references to our solution:
- WindowsBase.dll
- System.Xml
- System.Xml.Linq
- System.Core
Next make sure you have the following namespaces added to your using/imports:
- System.IO.Packaging: This provides the functionality to open the files.
- System.Xml
- System.Xml.Linq
- System.Linq
- System.IO
Right next there is a XML namespace (not to be confused with .NET code name spaces) we need to use for most of our queries: http://schemas.openxmlformats.org/spreadsheetml/2006/main and there is a second one we will use seldom http://schemas.openxmlformats.org/officeDocument/2006/relationships. So I dumped this into a nice static class as follows:
namespace XlsxWriter { using System.Xml.Linq; internal static class ExcelNamespaces { internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main"); internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships"); } }
Next we need to create an instance of the System.IO.Packaging.Package class (from WindowsBase.dll) and instantiate it by calling the static method Open.
Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite);
Note: It is at this point that the file is opened, this is important since Excel will LOCK an open file. This is an important issue to be aware of because when you open a file that is locked a lovely exception is thrown. To correct that you must make sure to call the close method on the package, for example:
xlsxPackage.Close();
When you open the XLSX file manually, the first file you’ll see is the [Content_Types].xml file which is a manifest of all the files in the ZIP archive. What is nice with using Packaging is that you can call the GetParts method to get a collection of Parts, which are actually just the files within the XLSX file.
The contents of the XLSX if renamed to a ZIP file and opened.
The various files listed in the [Content_Types].xml file.
What we will use during this is the ContentType parameter to filter the parts to the specific item we want to work with. The second image above to identify the value for the ContentType. For example the ContentType for a worksheet is: application/vnd.openxmlformats-officedocument.speadsheetml.worksheet+xml.
Once we have all the parts of the XLSX file we can navigate through it to get the bits we need to read the content, which involves two steps:
- Finding the shared strings part. This is another XML file which allows for strings of values to shared between worksheets. This is optional for writing, to use but does save space and speed up loading. For reading values it is required as Excel will use it.
- Finding the worksheet that we want to read from, this is a separate part from the shared strings.
Lets start with reading the shared strings part, this will be basis for reading any part later in series. What we need to do is get the first PackagePart with the type: application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml
PackagePart sharedStringsPart = (from part in allParts where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml") select part).Single();
Now we need to get the XML content out of the PackagePart, which is easy with the GetStream method, which we load into an XmlReader so that it can be loaded into a XElement. This is a bit convoluted but it is just one line to get it from one type to another and the benefits of using LINQ to XML are worth it:
XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));
Now we have the ability to work with the XElement and do some real work. In the next parts, we’ll look at what we can do with it and how to get from a single part to an actual sheet.