Why starting off with a solid foundation is important

Haritha Mohan · debugging, design

Not to be dramatic, but making sure your application has a strong foundation from the start is vital to its success, maintainability, and scalability... kind of like how a better design and structural foundation would have saved the Tacoma Narrows Bridge (in my state of Washington, actually) from this:

[Photo: the 1940 Tacoma Narrows Bridge collapse]

Some context first...

Recently, I have been working on a command line tool to make retrieving documentation for our SDK possible in a clean and efficient way.

A key part of this tool involves fetching data from an external source and storing it in a local database (like a cache) to ensure that this expensive process of fetching only has to be done in a single go, upon initiation of the application.

The process is quite repetitive:

1. Identify the entity in question that we are seeking documentation data for.
2. Parse the data from the external data source.
3. Insert the parsed data into the appropriate columns of a table in the local database that we are using to mimic a cache.

So now it’s a matter of: how can we make this repetitive process as fast and efficient as possible? Initially, one of the key tactics implemented was enabling asynchronicity. Utilizing multiple threads definitely leveled up the application. Rather than having the process sit idle as each line of data was read, parsed, and inserted, much of this flow was done asynchronously.

After the first iteration of the solution…

When the entirety of the data set of interest was parsed and inserted appropriately into the database, it took around 6 minutes to complete (in case you are curious, you can use “time [command]” to check how long it takes for a process to run).

So a couple things:

Getting to a functioning solution is a good start, but achieving a performant, clean, and efficient solution is even better. Had this been an application dealing with just a mere few lines of data and insertions, the effects would not have been noticeable, but at this scale, investing effort in making the code performant pays off significantly.

So what changed to whittle away the app’s completion time?

...to just a handful of seconds (18, to be exact)?

IAsyncEnumerable

Implementing the class that handles loading data from the external source to follow the IAsyncEnumerable interface enabled us to store the results of the data being parsed in a clean enumerable collection to reference later when inserting the data into the database.
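To make that concrete, here is a minimal sketch of such a loader. The DocEntry record, the tab-separated format, and the file-based source are hypothetical stand-ins for the real external source:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical entry type standing in for the real documentation data.
public record DocEntry(string Name, string Summary);

public class DocLoader
{
    // Streams parsed entries one at a time instead of blocking until
    // the entire source has been read.
    public async IAsyncEnumerable<DocEntry> LoadEntriesAsync(string path)
    {
        using var reader = new StreamReader(path);
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            var parts = line.Split('\t');
            if (parts.Length >= 2)
                yield return new DocEntry(parts[0], parts[1]);
        }
    }
}
```

A consumer can then await foreach over LoadEntriesAsync and begin inserting rows while later lines are still being read and parsed.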

Re-engineering the flow

Rather than opening a new connection to the database for every new line of data that was parsed, with the original flow looking roughly like:

for each line: open a connection → insert the line → close the connection

Improved flow:

open one connection → insert every parsed line → close the connection

So now, there is only 1 connection being opened to the local database for the entirety of the insertion process, as opposed to opening a separate connection per line of data needing to be inserted (1 vs. ~190k in my case...).
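As a rough sketch of that shape (the docs table, its columns, and the file path are made up for illustration; DocLoader is the hypothetical loader from the previous sketch):

```csharp
using System.Data.SQLite;
using System.Threading.Tasks;

public static async Task InsertAllAsync(DocLoader loader, string sourcePath)
{
    // One connection for the entire batch, rather than one per row.
    using var connection = new SQLiteConnection("Data Source=cache.db");
    connection.Open();

    await foreach (var entry in loader.LoadEntriesAsync(sourcePath))
    {
        using var command = connection.CreateCommand();
        command.CommandText =
            "INSERT INTO docs (name, summary) VALUES (@name, @summary)";
        command.Parameters.AddWithValue("@name", entry.Name);
        command.Parameters.AddWithValue("@summary", entry.Summary);
        command.ExecuteNonQuery();
    }
}
```

Note that this still builds a new command per row; the next two sections tighten that up too.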

Utilizing transactions

The System.Data.SQLite NuGet package is used to handle the interactions with the local database. The beauty of these databases is that they are ACID-compliant, an attribute that should not be slept on.

Without employing a transaction, SQLite wraps each individual command in its own implicit transaction and commits it to disk on its own, which in this case is a lot of unnecessary overhead. Wrapping the entire batch of insertions in a single explicit transaction makes this mass-handling of data a lot more efficient (again... 190k insertions...).
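Continuing the same hypothetical sketch, the insertion loop moves inside a single transaction so the database commits to disk once at the end rather than once per insert:

```csharp
// Inside InsertAllAsync from the previous sketch, after connection.Open():
using var transaction = connection.BeginTransaction();

await foreach (var entry in loader.LoadEntriesAsync(sourcePath))
{
    using var command = connection.CreateCommand();
    command.Transaction = transaction;
    command.CommandText =
        "INSERT INTO docs (name, summary) VALUES (@name, @summary)";
    command.Parameters.AddWithValue("@name", entry.Name);
    command.Parameters.AddWithValue("@summary", entry.Summary);
    command.ExecuteNonQuery();
}

// A single commit flushes the entire batch to disk at once.
transaction.Commit();
```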

Using a prepared command statement (vs. creating a new custom command/insert)

At the end of the day, the insertion process is identical for all 190k inserts with the exception of the data parameter values changing, so no need to generate a new command from scratch each time when parameterized commands exist!
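Putting all three ideas together, here is a sketch of what the final shape can look like (the docs table, column names, and DocLoader are still the hypothetical stand-ins from earlier): one connection, one transaction, and one prepared, parameterized command whose parameter values are swapped for each row:

```csharp
using System.Data;
using System.Data.SQLite;
using System.Threading.Tasks;

public static async Task InsertAllAsync(DocLoader loader, string sourcePath)
{
    using var connection = new SQLiteConnection("Data Source=cache.db");
    connection.Open();
    using var transaction = connection.BeginTransaction();

    // Build and prepare the command once; only the parameter values
    // change across the ~190k inserts.
    using var command = connection.CreateCommand();
    command.Transaction = transaction;
    command.CommandText =
        "INSERT INTO docs (name, summary) VALUES (@name, @summary)";
    var name = command.Parameters.Add("@name", DbType.String);
    var summary = command.Parameters.Add("@summary", DbType.String);
    command.Prepare();

    await foreach (var entry in loader.LoadEntriesAsync(sourcePath))
    {
        name.Value = entry.Name;
        summary.Value = entry.Summary;
        command.ExecuteNonQuery();
    }

    transaction.Commit();
}
```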

It's a wrap

So the fix wasn’t just a simple “let’s change a method” or “let’s just make this async”, but looking deeper to see how the logic could be re-engineered to be more efficient. At the end of the day, the tool at the beginning did get the job done: the cache database was loaded with the necessary data extracted from the external source. But by optimizing the solution, users get a faster (and simultaneously much more pleasant) product experience, and the codebase has a more solid foundation to be improved upon in forthcoming iterations.

As I mentioned earlier, this step is merely the first of many. The more concise and well-architected the solution is now, the better it paves the way and sets the expectation for the future logic and capabilities that will be built on top of it. If the foundation is unstable now, you are essentially setting the rest of the solution up to follow suit. Good code now leads to better code later; bad code now leads to worse code later.

Going back to the original example of the Tacoma Narrows Bridge mentioned at the beginning, Insider provides some interesting examples of engineering decisions, made to reduce cost or take shortcuts on labor among other reasons, that ultimately led to the downfall (sorry, this is serious, no pun intended) of massive architectural projects. Something to keep in mind when making your own engineering decisions in the future…

I thought the way this optimization was implemented was interesting to follow. Shout-out to @Manuel for being the brains behind the fix.

© Haritha Mohan