Note: This is part of a series; you can find the rest of the parts in the series index.
One of the design decisions for Pull is that it should just run, without the need for an install. This requirement means that everything it needs must be available all the time, which raised an interesting problem in my design: I needed a database to store all the information Pull uses (podcasts, episodes, etc.), but I couldn't require people to install SQL Server Express or PostgreSQL.
The solution was to use a file-based database called SQLite, which needs just a few DLLs to provide all the database functionality I would expect my application to use, but without the need for a database server. As this is a .NET application, I used one of the .NET wrappers, System.Data.SQLite. It can be used either through ADO.NET-style code or through Entity Framework.
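To give a sense of the ADO.NET-style route, here is a minimal sketch; the pull.db file name and the query are illustrative, not Pull's actual schema:

using System;
using System.Data.SQLite;

internal static class AdoStyleExample
{
    public static void ReadLogSources()
    {
        // "pull.db" is a hypothetical file name; System.Data.SQLite
        // creates the file on first open if it does not already exist.
        using (var connection = new SQLiteConnection("Data Source=pull.db"))
        {
            connection.Open();
            using (var command = new SQLiteCommand("SELECT [Source] FROM [Log]", connection))
            using (SQLiteDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader.GetString(0));
                }
            }
        }
    }
}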
I initially used EF for my development, but I ran into a mass of issues due to the assumptions EF makes about when to open and close connections. Those assumptions make sense when there is a database server, but when the database is a file being hit from multiple threads they come back to bite, and they caused me no end of pain.
What I ended up doing to solve it was to build my own ORM, which uses reflection to handle mapping to classes, à la EF, but works through a static instance and is very eager to close the connection to the file. These changes solved the major stability issues caused by EF while still providing a decent developer experience. A sketch of the pattern follows.
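This is a minimal sketch of that "open late, close early" idea, not Pull's actual code; the class and member names are illustrative:

using System.Data.SQLite;

internal static class Database
{
    private static readonly object SyncRoot = new object();
    private const string ConnectionString = "Data Source=pull.db";

    public static void Execute(SQLiteCommand command)
    {
        // Serialise access from multiple threads and hold the file
        // open only for the duration of a single command.
        lock (SyncRoot)
        {
            using (var connection = new SQLiteConnection(ConnectionString))
            {
                connection.Open();
                command.Connection = connection;
                command.ExecuteNonQuery();
            }
        }
    }
}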
Mapping
Mapping of my classes to the database is done using a simple attribute which carries a name and a primary-key indicator. For example, the Log class looks like the following. Note that I use the same attribute for columns and tables, as I assume Class = Table and Property = Column. In this example all the property names match the DB column names, but this is not a requirement.
[DataStore(Name = "Log")] internal class Log { [DataStore(Name = "PK", PrimaryKey = true)] public Guid PK { get; set; } [DataStore(Name = "Source")] public string Source { get; set; } [DataStore(Name = "Occured")] public DateTime Occured { get; set; } [DataStore(Name = "StackTrace")] public string StackTrace { get; set; } [DataStore(Name = "Message")] public string Message { get; set; } }
Using that information, I can take an object and use reflection to build up the SQL needed. An example of this is how I create the update command:
private static SQLiteCommand ConvertToUpdateCommand<T>(T item)
{
    SQLiteCommand command = new SQLiteCommand();

    // The table name comes from the [DataStore] attribute on the class.
    string updateCommandText = string.Format(
        CultureInfo.CurrentCulture,
        "UPDATE [{0}] SET ",
        ((DataStoreAttribute)typeof(T).GetCustomAttributes(typeof(DataStoreAttribute), false)[0]).Name);

    int parameterCounter = 0;
    object PKValue = null;
    string PKColumn = string.Empty;

    // Non-key properties become SET clauses; the primary key is kept
    // aside to drive the WHERE clause.
    GetAttributedProperties(typeof(T), (property, attribute) =>
    {
        if (!attribute.PrimaryKey)
        {
            updateCommandText += string.Format(CultureInfo.CurrentCulture, "[{0}]=@A{1}, ", attribute.Name, parameterCounter);
            command.Parameters.AddWithValue(string.Format(CultureInfo.CurrentCulture, "A{0}", parameterCounter), property.GetValue(item, null));
            parameterCounter++;
        }
        else
        {
            PKValue = property.GetValue(item, null);
            PKColumn = attribute.Name;
        }
    });

    // Trim the trailing ", " before appending the WHERE clause.
    updateCommandText = updateCommandText.Remove(updateCommandText.Length - 2);
    updateCommandText += string.Format(CultureInfo.CurrentCulture, " WHERE [{0}]=@PK", PKColumn);
    command.Parameters.AddWithValue("PK", PKValue);

    command.CommandText = updateCommandText;
    return command;
}
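The GetAttributedProperties helper referenced above is not shown in the post; a plausible sketch of it walks the public properties of a type and invokes the callback for each one decorated with [DataStore]:

using System;
using System.Reflection;

private static void GetAttributedProperties(
    Type type,
    Action<PropertyInfo, DataStoreAttribute> callback)
{
    foreach (PropertyInfo property in type.GetProperties())
    {
        // Only properties carrying the [DataStore] attribute take
        // part in the mapping; everything else is ignored.
        object[] attributes = property.GetCustomAttributes(typeof(DataStoreAttribute), false);
        if (attributes.Length > 0)
        {
            callback(property, (DataStoreAttribute)attributes[0]);
        }
    }
}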
Final Thoughts
I have been very happy with SQLite as a database, but the learning curve was tough for someone coming from a database-server background. Once I understood its limitations and the differences from DB servers, it has been a great experience.
I’ve also been very happy with my ORM, as it has made development very easy: I work with proper objects and all the DB functionality is hidden away. There are some performance issues in the ORM which I have not solved, and I am not in a major rush to solve them either, because the data sizes in Pull are fairly light to start with. Examples of those performance issues are:
- I have no way to batch commands in one transaction. So when inserting podcast episodes, for each episode I create a new command and a new transaction, open the DB, run the command, and close the DB and the transaction. What I should have is a way to create a bunch of commands and then run them all using one transaction and one open/close of the DB (see the sketch after this list).
- Updating writes every field in the object, regardless of whether it has changed. This is because I have no object tracking implemented to figure this out.
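For the first issue, the fix might look something like this; a hedged sketch of the batching described above, not code that exists in Pull:

using System.Collections.Generic;
using System.Data.SQLite;

// Run a set of prepared commands inside a single transaction, with
// one open/close of the database file instead of one per command.
public static void ExecuteBatch(IEnumerable<SQLiteCommand> commands)
{
    using (var connection = new SQLiteConnection("Data Source=pull.db"))
    {
        connection.Open();
        using (SQLiteTransaction transaction = connection.BeginTransaction())
        {
            foreach (SQLiteCommand command in commands)
            {
                command.Connection = connection;
                command.Transaction = transaction;
                command.ExecuteNonQuery();
            }
            // Committing once at the end is what makes the batch cheap:
            // SQLite fsyncs per transaction, not per command.
            transaction.Commit();
        }
    }
}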
Despite these issues, the system still runs really well, and overall I am happy with the choice of DB.