Defending your database delivery pipeline against drift
- PART 1: How drift causes database deployments to go off-track
- PART 2: Why drift can be an unavoidable hazard in Database DevOps
- PART 3: Defending your database delivery pipeline against drift (this post)
- SUMMARY
If, after assessing your environment, you expect it will be affected by drift at some point, you might be thinking about next steps. While there are some useful techniques and tools in this space, these tend to address the symptoms of the problem rather than its underlying causes. In this post I'll look at a couple of ways of dealing with drift in an enterprise environment, including the trade-offs involved with each approach.
The Non-negotiable approach
One method of attempting to tackle drift is to simply adopt a hard-line position on all database changes, decreeing that under no circumstances should the pipeline be bypassed in favour of out-of-process deployments (be they automated or otherwise). The obvious advantage of this type of approach is its simplicity; by making compliance a clear-cut issue, the scope for misinterpretation is virtually nil. But while clearly communicating expectations is important, toeing such a strict line may not necessarily work in your favour.
Why issuing ultimatums can be an own goal
In the previous post, I outlined some of the ways in which a database pipeline can fall short of an enterprise's delivery needs. And while there is a lot of truth to the adage "if it hurts, do it more often", attempting to conquer every change use case before firmly grasping the constraints of your environment may turn out to be counterproductive.
What I've witnessed with teams that follow such a strict approach is that it tends to lead to creative workarounds being deployed to production in a covert way (e.g. schema changes performed via application code). While this may reflect deeper cultural issues in an organisation, it's important to consider the team's appetite for large scale change at the outset of any process improvement. The last thing you want to hear when embarking on such an ambitious project is the phrase "let's just go back to the old way", before you've even had a chance to show the value of your pipeline.
Fortunately there is another way to deal with drift that doesn't involve taking such a hard-line stance; instead of requiring a big bang process change to take place, it favours a more flexible approach to how the pipeline is introduced.
The Tolerance-within-limits approach
Rather than seeing drift as something to be avoided at all costs, an alternative is to build a degree of tolerance into the pipeline so it can accommodate changes applied via an alternate deployment method. At first this may seem like giving up and letting chaos reign, but as I hope to illustrate there is a difference between taking a calculated decision to relax the rules versus simply abandoning them altogether.
Understanding your constraints
Say, for example, you anticipate that your pipeline will be able to handle 9 out of 10 of the database change use cases. But every now and again that pesky 1 in 10 comes along that your pipeline doesn't support, for example:
- When a data change requires capabilities that your dev tooling/migration framework doesn't have (e.g. the ability to atomically apply changes across database boundaries)
- When the data change will need to be performed in batches over a number of days (see the sketch after this list, and more on this below)
- When a COTS software vendor needs to apply a change to your database with a custom tool
- When you're doing almost anything with SQL replication
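To make the batched scenario concrete, here is a minimal sketch of the kind of change involved. The table and column names (dbo.Orders, LegacyStatus, Status) are hypothetical; the point is that a change like this may need to run for hours or days, which is exactly what many pipelines struggle to accommodate:

```sql
-- Illustrative only: a long-running data change applied in small batches,
-- the kind of change that may not fit within a pipeline's deployment window.
DECLARE @BatchSize int = 5000;
DECLARE @RowsAffected int = 1;

WHILE @RowsAffected > 0
BEGIN
    UPDATE TOP (@BatchSize) dbo.Orders
    SET    Status = 'Archived'
    WHERE  LegacyStatus = 'A'
      AND  Status <> 'Archived';

    SET @RowsAffected = @@ROWCOUNT;

    -- Pause between batches to limit blocking on the live system
    WAITFOR DELAY '00:00:05';
END
```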
Until such time as the pipeline can handle every single one of your deployment use cases (assuming that goal is achievable or even desirable), adopting a more tolerant approach toward drift may be needed.
Pragmatically dealing with drift
Think of it as approaching the problem with an adaptive mindset: by taking a hard look at your pipeline's capabilities (or lack thereof), you'll be in a better position to serve the needs of each change use case that comes along. If you're just starting to think about automating database deployments, this approach also has the benefit of lowering the barrier to entry as you'll be able to start your pipeline off with a reduced scope.
However, creating a drift-tolerant pipeline is not simply about opening up the production environment to any type of ad hoc method of change. It requires assessing the risk of each candidate change use case and putting in place the appropriate checks and balances to prevent impact on the delivery pipeline.
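One such check might be a lightweight drift query run before each deployment. The sketch below simply flags user objects modified since the last recorded release; dbo.DeploymentLog is a hypothetical table that the pipeline writes to on each deployment, not a feature of any particular tool:

```sql
-- Illustrative drift check: list objects changed since the last pipeline deployment.
-- dbo.DeploymentLog is a hypothetical table populated by the pipeline on each release.
DECLARE @LastDeployment datetime2 =
    (SELECT MAX(DeployedAt) FROM dbo.DeploymentLog);

SELECT o.name,
       o.type_desc,
       o.modify_date
FROM   sys.objects AS o
WHERE  o.is_ms_shipped = 0
  AND  o.modify_date > @LastDeployment
ORDER BY o.modify_date DESC;
```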
Part 1: Defining the pipeline scope
Start with the default position that everything should go through the delivery pipeline. Then, look at each change outlier on its merits to decide whether it is an acceptable candidate for an alternate deployment method. Consider what kind of impact drifted objects would have on the ability to effectively develop, test and release new application functionality via the delivery pipeline. For example, including table schema changes and stored procedure/view/function changes in source control can be vitally important to the calling application (not to mention for traceability), so these should usually be kept within the pipeline scope.
Conversely, anything that isn't critical to the functionality of the calling application may be a candidate for an alternate deployment method. While generalising about suitable change types is difficult, there are a few specific cases that I've seen work quite well outside of the pipeline:
- Storage object changes: dataspaces like files and partitions may be critical to the performance of the live environment, but functionally they may not be important to the calling application at all, so they can be excluded from the pipeline scope
- Data transformations: if there is a need for data movement to occur over a period of days, then running it through the pipeline might hold up other changes from getting through to Production if the pipeline is not able to handle concurrent deployments, or the tool has a hard timeout (e.g. Azure DevOps will halt a release after 48 hours).
- Regular maintenance tasks: updating statistics and extending partition ranges are often best left to CRON jobs to perform (although the code for those jobs can always go via the pipeline)
- Security object changes: adding users/groups to roles is often better handled outside the pipeline, especially when production is located in a separate domain. Note: I'd still recommend controlling object-level permissions via the pipeline, to avoid accidental loss of access arising from DROP/CREATE-style changes. Following role-based access control will help keep clear lines between pipeline and non-pipeline changes and also help to prevent permission loss (a sketch follows this list).
When choosing which objects to accommodate via an alternate deployment method, there is a balance between risk and benefits to be struck. Don't be afraid to readjust the pipeline scope as developments tip the scales in one direction over another.
Part 2: Keeping the baseline in sync
The practice of occasionally refreshing databases within the dev, QA and staging environments with sanitised clones of the Production database is as old as time. Whether or not you consider this a good practice, keeping lifecycle environments in sync is generally accepted as being critical to reliable test outcomes (i.e. if it worked in dev, it should also work when released to prod). Given that drift has a tendency to arise even when the standard process is strictly followed, refreshing the lifecycle environments in this way is often a necessary evil.
And if it's necessary to refresh the lifecycle environments, then it follows that source control should be refreshed too, especially when you consider that developers need to be able to rely on the baseline as an accurate starting point for new changes.
Steps for migration frameworks
If you're using a migration framework for database development (e.g. Flyway, DbUp, Redgate SQL Change Automation etc.), the process for synchronizing the baseline will be along these lines:
- In source control, create a new branch at the tag or commit corresponding to the most recent production deployment
- Use your migration framework to deploy the schema from scratch to a test SQL instance
- Use a comparison tool, like Visual Studio Schema Compare or Redgate SQL Compare, to compare the production environment (from a snapshot file, preferably) to your test SQL instance
- Select the drifted objects that you wish to import back into source control, and save the generated change script as a new migration
- Modify the migration to ensure that it is idempotent, so that the script does not attempt to re-apply changes that already exist in production (see the sketch after these steps)
- Commit and merge the migration file to trunk
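As an illustration of the idempotency step, the imported change can be wrapped in an existence check so that it is a no-op in production (where the drifted object already exists) but still brings the other environments up to date. The object names here are hypothetical:

```sql
-- Hypothetical drift-import migration: guarded so it does nothing in production,
-- where the index was already created out-of-process, but creates it everywhere else.
IF NOT EXISTS (
    SELECT 1
    FROM   sys.indexes
    WHERE  name = 'IX_Orders_CustomerId'
      AND  object_id = OBJECT_ID('dbo.Orders')
)
BEGIN
    CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
END
```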
Steps for desired-state tools
If you're using a desired-state tool for database development (e.g. SQL Server Data Tools or SQL Source Control):
- In source control, create a new branch at the tag or commit corresponding to the most recent production deployment
- Use Visual Studio Schema Compare/Redgate SQL Compare to compare the production environment to the project folder containing your object file declarations
- Select the drifted objects and click the Update/Synchronize button
- Commit and merge the file(s) to trunk
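With a desired-state tool, the drifted change simply ends up in the object's declaration. For example, if an index had been added directly in production, the table's file in the project might end up looking something like this after the update (names are hypothetical, and the exact file layout depends on your tool):

```sql
-- dbo/Tables/Orders.sql (hypothetical object file in the project folder)
CREATE TABLE dbo.Orders
(
    OrderId    int         NOT NULL PRIMARY KEY,
    CustomerId int         NOT NULL,
    Status     varchar(20) NOT NULL
);
GO

-- The drifted index imported from production now lives in the declaration,
-- so the deployment tool will no longer try to drop it.
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
```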
Steps for tools that support an external baseline
Tool vendors are starting to realise the value of simplifying the way that drift is managed. One notable example of this is Redgate SQL Change Automation's integration with their SQL Clone product, which allows the baseline to be sourced from a full snapshot of the production database, as opposed to storing/maintaining the baseline as a set of migrations (Disclosure: I was previously employed by Redgate). Read more about how to set up this integration.
The possibility of using an external baseline is an exciting development that I'll explore more in a later post.
Conclusion
The inevitability of drift requires an open mind about the types of solutions that can help mitigate its effects. Protecting the integrity of the pipeline starts with getting to grips with the limits of its capabilities and then channelling efforts into either addressing those limitations or allowing certain types of changes to take an alternate path to production.
While it might feel unsatisfying to let the occasional change type run outside of the pipeline, remember that it is important not to let the perfect be the enemy of the good. A lot of value can be gained through automation even when it is limited in scope, and having the flexibility to choose the right tool for the job provides significant value in and of itself.
Image credit: Albert Pego