[Note: See the series index for a list of all parts in this series.]

Excel’s file format is an interesting one compared to the rest of the Office suite in that it can store data in two places, where most of the others store it in a single place. Excel supports this to get good performance while keeping the file small. To illustrate the scenario, let's pretend we have a single sheet with some info in it:

Clipboard02

Stored this way, the value of every cell has to be processed, and the total size is 32 characters of data. With a shared strings model, however, we get something that looks like this:

Clipboard03

The result is the same; however, each value is processed only once and the size is smaller: 24 characters in this example.
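To make the idea concrete, here is a minimal sketch of how such a table could be built from raw cell values. The `BuildSharedStrings` helper and the cell values are hypothetical and only for illustration; this is not how Excel itself does it, just one way to store each distinct string once and refer to it by index.

```csharp
using System;
using System.Collections.Generic;

public static class SharedStringsSketch
{
    // Hypothetical helper: stores each distinct string once in 'table' and
    // returns, for every cell, the zero-based index of its string.
    public static int[] BuildSharedStrings(IList<string> cellValues, List<string> table)
    {
        var lookup = new Dictionary<string, int>();
        var indexes = new int[cellValues.Count];

        for (int i = 0; i < cellValues.Count; i++)
        {
            string value = cellValues[i];
            int index;
            if (!lookup.TryGetValue(value, out index))
            {
                index = table.Count;      // next free slot in the table
                table.Add(value);
                lookup.Add(value, index);
            }
            indexes[i] = index;
        }

        return indexes;
    }

    public static void Main()
    {
        // Made-up cell contents with repeated values.
        var cells = new List<string> { "Some", "Data", "Some", "Data", "Here" };
        var table = new List<string>();
        int[] indexes = BuildSharedStrings(cells, table);

        Console.WriteLine(string.Join(",", table));    // Some,Data,Here
        Console.WriteLine(string.Join(",", indexes));  // 0,1,0,1,2
    }
}
```

The cells now hold only small integers, and each repeated string is stored and processed a single time.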

The Excel format is flexible in that it lets you do it either way. Note that the Excel client always uses the shared strings method, so you should support it when reading. This brings up an interesting scenario: say you fill a spreadsheet using direct input and then open it in Excel. What happens? Excel identifies the structure and remaps it automatically, and when the user closes the file (regardless of whether they have made a change) it prompts them to save.

The element we loaded at the end of part 2 is that shared strings file, which in the archive is \xl\sharedStrings.xml. If we look at it, it looks something like this:

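<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si>
    <t>Some</t>
  </si>
  <si>
    <t>Data</t>
  </si>
  <si>
    <t>Belongs</t>
  </si>
  <si>
    <t>Here</t>
  </si>
</sst>
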
Each <t> node is a value, and it corresponds to a value in the sheet, which we will parse later. The sheet stores a key that points to the item in the shared strings file. The key is a zero-based index, so in the above example the first <t> node (Some) is stored as 0, the second (Data) as 1, and so on. The code I wrote to parse it looks like this:
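// Requires: using System.Collections.Generic; using System.Xml.Linq;
private static void ParseSharedStrings(XElement sharedStringsElement, Dictionary<int, string> sharedStrings)
{
    // Collect every <t> node in document order. This assumes one <t> per entry;
    // rich text entries with several <t> nodes would shift the indexes.
    IEnumerable<XElement> sharedStringsElements = sharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t");

    // The position of each node is the zero-based key used by the sheet.
    int counter = 0;
    foreach (XElement sharedString in sharedStringsElements)
    {
        sharedStrings.Add(counter, sharedString.Value);
        counter++;
    }
}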

Using this, I parse the node and put the results into a Dictionary<int, string>, keyed by the zero-based index.
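As a usage sketch, the program below runs a parser of this kind against a small in-memory document. Two things in it are assumptions rather than code from this series. First, the `ExcelNamespaces` class is not shown in this part, so a stand-in is defined here using the standard SpreadsheetML namespace. Second, `ParseSharedStringItems` is a hypothetical variation that walks the <si> entries instead of the <t> nodes: an entry holding rich text can contain several <t> nodes inside <r> runs, and counting entries keeps the keys aligned with the indexes in the sheet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed stand-in for the ExcelNamespaces class used in this series.
public static class ExcelNamespaces
{
    public static readonly XNamespace excelNamespace =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
}

public static class SharedStringsUsage
{
    // Hypothetical variation on ParseSharedStrings: one dictionary entry per <si>.
    public static void ParseSharedStringItems(XElement sharedStringsElement, Dictionary<int, string> sharedStrings)
    {
        XNamespace ns = ExcelNamespaces.excelNamespace;
        int counter = 0;

        foreach (XElement item in sharedStringsElement.Elements(ns + "si"))
        {
            // Plain entries have a single <t> child; rich text entries keep
            // their <t> nodes inside <r> runs. Other children are ignored.
            IEnumerable<string> parts = item.Elements(ns + "t")
                .Concat(item.Elements(ns + "r").Elements(ns + "t"))
                .Select(t => t.Value);

            sharedStrings.Add(counter, string.Concat(parts));
            counter++;
        }
    }

    public static void Main()
    {
        // Made-up document: one plain entry and one rich text entry with two runs.
        XElement root = XElement.Parse(
            "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">" +
            "<si><t>Some</t></si>" +
            "<si><r><t>Da</t></r><r><t>ta</t></r></si>" +
            "</sst>");

        var sharedStrings = new Dictionary<int, string>();
        ParseSharedStringItems(root, sharedStrings);

        Console.WriteLine(sharedStrings[0]); // Some
        Console.WriteLine(sharedStrings[1]); // Data
    }
}
```

For files that contain only plain strings, both approaches produce the same dictionary.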