Skip to main content

[Note: See the series index for a list of all parts in this series.]

Clipboard08

In part III we looked the interesting part of Excel, Shared Strings, which is just a central store for unique values that the actual spreadsheet cells can map to. Now how do we take that data and combine it with the sheet to get the values?

What makes up a sheet?

First lets look at what a sheet looks like in the package:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2008/2/ac">
<dimension ref="A1:A4" /> 
<sheetViews>
<sheetView tabSelected="1" workbookViewId="0">
<selection activeCell="A5" sqref="A5" /> 
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25" /> 
<sheetData>
<row r="1" spans="1:1" x14ac:dyDescent="0.25">
<c r="A1" t="s">
<v>0</v> 
</c>
</row>
<row r="2" spans="1:1" x14ac:dyDescent="0.25">
<c r="A2" t="s">
<v>1</v> 
</c>
</row>
<row r="3" spans="1:1" x14ac:dyDescent="0.25">
<c r="A3" t="s">
<v>2</v> 
</c>
</row>
<row r="4" spans="1:1" x14ac:dyDescent="0.25">
<c r="A4" t="s">
<v>3</v> 
</c>
</row>
</sheetData>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" /> 
</worksheet>

Well there is a lot to understand in the XML, but for now we care about the <row> (which is the rows in our speadsheet) and within that the cells which the first one looks like:

<c r="A1" t="s">
<v>0</v> 
</c>

First that t=”s” attribute is very important, it tells us the value is stored in the shared strings. Then the index to the shared string is in the v node, in this example it is index 0. It is also important to note the r attribute for both rows and cells contains the position in the sheet.

As an aside what would this look like if we didn’t use shared strings?

<c r="A1">
<v>Some</v> 
</c>

The v node contains the actual value now and we no longer have the t attribute on the c node.

The foundation code for parsing the data

Now that we understand the structure and we have this Dictionary<int,string> which contains the shared strings we can combine them - but first we need a class to store the data in, then we need to get to the right worksheet part and a way to parse the column and row info, once we have that we can parse the data.

Before we read the data, we need a simple class to put the info into:

public class Cell
{
    public Cell(string column, int row, string data)
    {
        this.Column = column;
        this.Row = row;
        this.Data = data;
    }

    public override string ToString()
    {
       return string.Format("{0}:{1} - {2}", Row, Column, Data);
    }

    public string Column { get; set; }
    public int Row { get; set; }
    public string Data { get; set; }
}

How do we find the right worksheet? In the same way as we did get the shared strings in part II.

private static XElement GetWorksheet(int worksheetID, PackagePartCollection allParts)
{
   PackagePart worksheetPart = (from part in allParts
                                 where part.Uri.OriginalString.Equals(String.Format("/xl/worksheets/sheet{0}.xml", worksheetID))
                                 select part).Single();

    return XElement.Load(XmlReader.Create(worksheetPart.GetStream()));
}

How do we know the column and row? Well the c node has that in the r attribute. We’ll pull that data out as part of getting the data, we just need a small helper function which tells us were the column part ends and the row part begins. Thankfully that is easy since rows are always numbers and columns always letters. The function looks like this:

private static int IndexOfNumber(string value)
{
    for (int counter = 0; counter < value.Length; counter++)
    {
        if (char.IsNumber(value[counter]))
        {
            return counter;
        }
    }
    return 0;
}

Finally - we get the data!

We got the worksheet, then we got the cells using LINQ to XML and then we looped over them in a foreach loop. We then got the location from the r attribute - split it into columns and rows using our helper function and then grabbed the index, which we then go to the shared strings object and retrieve the value. The following code puts all those bits together and should go in your main method:

    List<Cell> parsedCells = new List<Cell>();

    XElement worksheetElement = GetWorksheet(1, allParts);

    IEnumerable<XElement> cells = from c in worksheetElement.Descendants(ExcelNamespaces.excelNamespace + "c")
                                  select c;

    foreach (XElement cell in cells)
    {
        string cellPosition = cell.Attribute("r").Value;
        int index = IndexOfNumber(cellPosition);
        string column = cellPosition.Substring(0, index);
        int row = Convert.ToInt32(cellPosition.Substring(index, cellPosition.Length - index));
        int valueIndex = Convert.ToInt32(cell.Descendants(ExcelNamespaces.excelNamespace +  "v").Single().Value);

        parsedCells.Add(new Cell(column, row, sharedStrings[valueIndex]));
    }

And finally we get a list back with  all the data in a sheet!