Modeling Inheritance Resolution via SQLite

Haritha Mohan · design, architecture, performance

Recently, I worked on an effort to bring IntelliSense support to the Xamarin.Mac/iOS SDKs and improve our documentation coverage.

The foundation of this effort was modeling our APIs in a SQLite database via its .NET wrapper, System.Data.SQLite.

All was going well until it came time to model inheritance-based relationships.

The resolution of such relationships had to be done dynamically; we wanted to flatten the otherwise nested, interconnected inheritance hierarchy into more direct entries in the database, so that generating IntelliSense from the database would be more straightforward and maintainable.

Brute Force

Approach 0 was simply trying to brute force the thing: iterating through the existing entries in the database and doing a mass INSERT. This was a shot in the dark, an effort to start somewhere and see where it took us.

For starters, it did not address the entirety of the problem. Inheritance is not a one-and-done situation: such relationships can be multi-tiered, and each level should be accurately accounted for.

Moreover, it was quite inefficient. There was no organization or structure to this approach, costing a lot of precious time and resources for, again, a partially complete solution.

Investigating!

After doing some research and experimenting, I utilized some neat SQLite features to not only thoroughly address the complexity of the issue but also make the approach as time- and resource-efficient as possible.

The features in question: closure tables, triggers, an in-memory database, and custom indexing.

In utilizing these tools, we went from a partially complete, ineffective solution that took ~1 hour to run to a complete solution that finishes in a few minutes.

Closure Table

The premise of the problem is that we wanted a clean way of modeling these hierarchical relationships in a database. It turns out there are quite a few ways to do this: adjacency lists, the nested set model, and closure tables [1].

I chose the closure table approach because it “is a design for representing trees in a relational database by storing all the paths between tree nodes” [2].

And considering this closure table wasn’t our end goal but a means of getting what we needed (after modeling the relationships successfully, we wanted to add the updated entries back into the original database), this seemed like the approach that would best set the stage for what was to follow.
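To illustrate the general closure table technique (this is a sketch with hypothetical table and type names, not the SDK’s actual schema; Python’s built-in sqlite3 module is used here in place of System.Data.SQLite): every ancestor/descendant path in the tree gets its own row, so any ancestry question becomes a single flat query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per path in the tree, including each node's zero-length
-- path to itself (Depth 0).
CREATE TABLE TypePaths (
    Ancestor   TEXT NOT NULL,
    Descendant TEXT NOT NULL,
    Depth      INTEGER NOT NULL,
    PRIMARY KEY (Ancestor, Descendant)
);
""")

# A tiny three-level hierarchy: NSObject -> NSResponder -> NSView.
paths = [
    ("NSObject",    "NSObject",    0),
    ("NSResponder", "NSResponder", 0),
    ("NSView",      "NSView",      0),
    ("NSObject",    "NSResponder", 1),
    ("NSResponder", "NSView",      1),
    ("NSObject",    "NSView",      2),  # the transitive path is stored explicitly
]
conn.executemany("INSERT INTO TypePaths VALUES (?, ?, ?)", paths)

# All ancestors of NSView come back with one query, no recursion needed.
ancestors = [r[0] for r in conn.execute(
    "SELECT Ancestor FROM TypePaths WHERE Descendant = 'NSView' "
    "AND Depth > 0 ORDER BY Depth")]
print(ancestors)  # ['NSResponder', 'NSObject']
```

The cost of storing the transitive paths explicitly is exactly what buys the flat, direct lookups.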

Heavy emphasis on adapted, though. My implementation involved establishing two intermediary tables: one for the “children” and one for the “parents” of these inheritance relationships.

The children table represents the leaves, the ends of the relationship tree; we know for sure they are the last items to be accounted for in that specific relationship line. The parents table is a bit more complicated: a parent in one part of the relationship could be considered a child of another branch in the same relationship when moved closer to the root.

So, the parents table has an extra column to account for this added complexity: weight. As a relationship is built in these tables, the weight of the corresponding parent increases. The same logic works in reverse as the relationship is resolved (the weight decrements). This is where our next feature comes into play: triggers.
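A minimal sketch of what the two intermediary tables might look like (table and column names are hypothetical, and Python’s sqlite3 stands in for System.Data.SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Leaves of the inheritance tree: types with no remaining descendants
-- waiting to be resolved.
CREATE TABLE Children (
    TypeName   TEXT PRIMARY KEY,
    ParentName TEXT NOT NULL
);

-- Types that other types still inherit from. Weight counts how many
-- unresolved children currently point at this parent; it falls back to
-- zero once every descendant has been processed.
CREATE TABLE Parents (
    TypeName TEXT PRIMARY KEY,
    Weight   INTEGER NOT NULL DEFAULT 0
);
""")

tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print(tables)  # {'Children', 'Parents'}
```

A parent whose weight drops to zero has no unresolved descendants left, which is what lets the program detect when a whole relationship line is done.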

Triggers

As relationships are added to or removed from the intermediary children and parents tables, the weight column should accurately reflect that. To model this dynamic behavior, triggers proved to be the perfect tool [3][4].

As the program progressed, the intermediary tables would self-manage, and we knew the entirety of the inheritance relationships had been addressed when the intermediary tables were empty.
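A sketch of how such triggers could be wired up (trigger, table, and type names are illustrative, and Python’s sqlite3 stands in for System.Data.SQLite): inserting a child bumps its parent’s weight, and resolving (deleting) a child decrements it, with no bookkeeping code in the application itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Children (TypeName TEXT PRIMARY KEY, ParentName TEXT NOT NULL);
CREATE TABLE Parents  (TypeName TEXT PRIMARY KEY, Weight INTEGER NOT NULL DEFAULT 0);

-- When a child is queued, bump its parent's weight ...
CREATE TRIGGER BumpWeight AFTER INSERT ON Children
BEGIN
    UPDATE Parents SET Weight = Weight + 1 WHERE TypeName = NEW.ParentName;
END;

-- ... and when a child is resolved (deleted), decrement it.
CREATE TRIGGER DropWeight AFTER DELETE ON Children
BEGIN
    UPDATE Parents SET Weight = Weight - 1 WHERE TypeName = OLD.ParentName;
END;
""")

conn.execute("INSERT INTO Parents (TypeName) VALUES ('NSObject')")
conn.execute("INSERT INTO Children VALUES ('NSView', 'NSObject')")
conn.execute("INSERT INTO Children VALUES ('NSString', 'NSObject')")
print(conn.execute("SELECT Weight FROM Parents").fetchone()[0])  # 2

conn.execute("DELETE FROM Children WHERE TypeName = 'NSView'")
weight = conn.execute("SELECT Weight FROM Parents").fetchone()[0]
print(weight)  # 1
```

Once the weight reaches zero and the remaining children are deleted, both tables empty out, which is the termination condition described above.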

Now the solution had complete coverage…but performance still had to be addressed.

Perf Optimizations: Custom Indexing and In Memory

This weighted approach is contingent on updating the tables as we progress, so even though batch processing is utilized as much as possible, there is still a significant increase in I/O operations, and doing it all on disk is quite the time sink.

Since I/O in RAM is a lot faster than on disk, an in-memory SQLite database is created for the program’s duration, so all intermediary I/O happens in memory and we do a single write to disk at the conclusion of the program instead of the many, many intermediary ones [5].
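The same pattern, sketched with Python’s sqlite3 module in place of System.Data.SQLite (table and file names are illustrative): do all the work against `:memory:`, then use SQLite’s backup API to persist the finished database in one bulk operation.

```python
import os
import sqlite3
import tempfile

# Do all the heavy intermediary work against an in-memory database ...
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE Api (Id INTEGER PRIMARY KEY, Name TEXT)")
mem.executemany("INSERT INTO Api (Name) VALUES (?)",
                [("NSObject",), ("NSView",), ("NSString",)])
mem.commit()

# ... then persist the finished database to disk in one shot.
path = os.path.join(tempfile.mkdtemp(), "api.db")
disk = sqlite3.connect(path)
mem.backup(disk)  # SQLite's backup API copies the whole database at once
disk.close()

# The on-disk copy is complete and queryable.
count = sqlite3.connect(path).execute(
    "SELECT COUNT(*) FROM Api").fetchone()[0]
print(count)  # 3
```
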

The takeaway result remains the same, as we still have the complete database saved to disk at the end, but the process is much more performant.

Another performance enhancer was implementing custom indexing to reduce search operations from O(n) to O(log n) [6][7].
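A quick demonstration of the effect (hypothetical table and index names, Python’s sqlite3 for illustration): SQLite’s EXPLAIN QUERY PLAN shows a full-table scan turning into a B-tree index search once a targeted index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Api (Id INTEGER PRIMARY KEY, TypeName TEXT, Member TEXT)")
conn.executemany("INSERT INTO Api (TypeName, Member) VALUES (?, ?)",
                 [(f"Type{i}", f"member{i}") for i in range(1000)])

# Without an index, lookups by TypeName scan every row: O(n).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Api WHERE TypeName = 'Type500'"
).fetchone()
print(plan[-1])  # detail column reports a SCAN of Api

# A targeted index turns the same lookup into a B-tree search: O(log n).
conn.execute("CREATE INDEX IdxApiTypeName ON Api (TypeName)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Api WHERE TypeName = 'Type500'"
).fetchone()
print(plan[-1])  # e.g. SEARCH Api USING INDEX IdxApiTypeName (TypeName=?)
```

The exact plan wording varies by SQLite version, but the SCAN-to-SEARCH shift is the O(n) to O(log n) improvement in action.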

Conclusion

This was a fun investigative experience for me, a chance to explore how far I can push the boundaries of a structure to best adapt it to my needs. If you are searching for some clever ways to optimize your database performance, I hope you found this post helpful!

References

[1] https://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

[2] https://karwin.blogspot.com/2010/03/rendering-trees-with-closure-tables.html

[3] https://www.sqlshack.com/learn-sql-sql-triggers/

[4] https://www.sqlite.org/lang_createtrigger.html

[5] https://blog.xojo.com/2019/02/11/tip-sqlite-in-ram-to-improve-speed/

[6] https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-indexes-indexes-c4e175f3c346

[7] https://www.sqlitetutorial.net/sqlite-index/

© Haritha Mohan.