Why drift can be an unavoidable hazard in Database DevOps
Drift in database delivery blog series
- PART 1: How drift causes database deployments to go off-track
- PART 2: Why drift can be an unavoidable hazard in Database DevOps (this post)
- PART 3: Defending your database delivery pipeline against drift
In the previous post , I wrote about the effects of schema and data drift and how to detect it. In this post, I'll go into some of its underlying causes, including the various risk factors that make a deployment pipeline (i.e. migration framework and CI/CD tool) become prone to drift and ultimately lead to release failure.
Drift, which occurs when the state of the target database diverges from that of the source control baseline, can happen as a result of changes being applied directly to the database (target drift) or even as a result of changes to code under source control itself (source drift). Categorising the different types of drift in this way can be useful because, even though their effects may be similar, their origins may be vastly different.
When people simply refer to "drift" (at least in the SQL Server world), they're often talking about the type that arises when someone or something makes a change directly to a database. I call this target drift because it occurs in your deployment pipeline's target environment.
Target drift can occur when a change is made to schema or data within a live database, such as:
- When an administration tool used by the business updates or inserts reference data like feature toggles, tax rates or product metadata
- When a security team member grants access to objects to individual users or groups, instead of following role-based access control (RBAC)
- When a DBA adds a performance index or enables certain features in Production only (e.g. replication or change tracking)
- When a periodic maintenance job makes a change to a partition scheme, an index or a sequence object
- When a business support process results in a new table being created for a customer data hotfix
Source drift is distinct from target drift in that it originates in source control, rather than in the target database. In many ways, it can be harder to anticipate as often the changes are not made intentionally but rather are a side-effect of a shortcut or a workaround.
Source drift can occur:
- When someone writes a migration that results in a slightly different schema being generated in dev vs prod, e.g. omitting a name for primary key constraint, which in SQL Server results in a random-generated name being assigned at deploy time
- When migrations are deployed to Production out-of-sequence: if
005.sqlis deployed before
004.sqlis even merged to trunk, drift may arise if the scripts change the same object (e.g. as with
ALTER VIEWstatements with differing bodies)
- When someone makes a change to code that has already been deployed to prod. I've seen this happen when someone has trouble geting the database to deploy locally, e.g. they work around a missing engine feature or server-level dependency on their machine by deleting migrations until the deloyment succeeds, and then accidentally commit the changes)
Erosion of the single source of truth
The fact that drift can originate in either during development or in the Production environment itself demonstrates that everyone involved in the delivery and operation of a system can have an effect on the health of the database pipeline.
In how drift causes database deployments to go off-track , I covered what can happen to a delivery pipeline when drift starts to take hold. Drift can be especially insidious in that the point which drift occurs can be so widely separated from the failed deployment it inevitably causes: like a leaking pipe in a bathroom wall, the effect of drift may only be known once the rotten floorboards of your source control baseline finally give way after years of decay.
Without an understanding of the causes of drift, it's effects can be difficult to foresee and even harder to prevent.
A good place to start to understand where drift comes from is to consider the motives of someone wanting to bypass your pipeline in the first place. This desire may originate due to one or more of the following factors:
- Pipeline limitations: Perhaps a change needs to happen in multiple stages (e.g. separate SQL scripts to be run before and after app deployment), or maybe your migration framework is only capable of running SQL code, but the change requires bash or powershell to be applied. Or perhaps the change is part of a larger data migration involving multiple databases that will take place over days. What is the change author to do in this situation?
- Baseline limitations: When you originally created your source control baseline, did you include schema and data, or just schema? Did the data include both reference data and any supporting seed data? Think about what will happen if the change author needs to write a migration that performs an
INSERTto a table where a foreign key constraint is present. If you chose to only include schema in your baseline, consider how the change author will make this work in both a 'clean' environment as well as during the Production deployment.
- Platform limitations: Anyone who has ever tried to script changes to SQL replication objects will know the pain of trying to automate changes to schema where there are publications bound to it. In spite of the best efforts of the change author, it may be hard to justify spending inordinate amounts of time squishing the square-shaped task into the round-shaped hole of the pipeline.
- Rollout limitations: Is everyone who is involved in making database changes on-board with the pipeline process and tools, and (for commercial change automation tools) are there enough licences to go around for everyone?
If it turns out that not all types of changes can be deployed through your pipeline, then it's inevitable that out-of-process changes will start to happen and the baseline may quietly begin to erode.
Quick drift assessment
To assess how likely your pipeline is to be bypassed in favour of an out-of-process deployment, consider the following:
- How well have you understood all the various ways in which your database will need to have changes applied to it?
- Has everyone in your organisation agreed to use the pipeline for all such changes? If not, what is preventing them from doing so?
- Are there any external vectors of change? (e.g. a COTS software vendor or third-party tool with privileged access to your environment)
- Is everyone involved in making changes to the database well versed in the pipeline? Do they understand the concepts behind your chosen migration framework and know how to use Git/CI/CD effectively?
- Do the schedules of all change types align with your release cadence? i.e. if you usually release on Wednesdays, but a data change must go in on a Monday, will the pipeline (including the associated branching strategy) cater for an out-of-cycle deployment?
Sometimes peeking behind the curtain and seeing what's really going on can be overwhelming. But rather than be demoralised, seeing more clearly as to how drift is originating in your environment is an important first step in deciding how to deal with it.
In the next post , I'll explore a way of dealing with drift that might seem counter-intuitive at first. It involves a shift in mindset from thinking of drift as something to be avoided at all costs, to seeing it as something to be tolerated within well-defined limits.
Image credit: Yuri Arcurs