The concept of a DBA – database administrator, has practically gone out of style as a full time job. DBA work, if it is being done, is handled by someone or something else, perhaps in a more vanilla way that works well enough for most systems. However, if the DBA work is being ignored then technical debt is silently piling up.
There are still plenty of DBAs (about 120k) in the basements of large companies and government entities. It is just that growth in this area has really dropped off in spite of the fact it is a critical function of any data driven application. This is due to technological advancement (getting easier to run databases), and lack of standards (not anyone’s problem if the data is wrong from time to time). Still I think the Bureau of Labor Statistics is overly optimistic in its estimate that DBA positions will grow 11% between 2014-2024. Compare that to the 1.1M software developer positions with estimated growth of 17% between 2014-2024.
So what exactly was a DBA and what did they do?
In the old days (80’s, 90’s, 00’s) the database administrators (DBAs) were in charge of all things database related.
- They made sure the database server was running correctly.
- They controlled how the data itself was structured so it could be stored efficiently.
- They made sure the maintenance scripts, upgrades and backups were running correctly.
Those three tasks are still pretty darned important to a successful system! Given that DBAs are not part of most software teams anymore, are we doing it right?
1) So who is now making sure the server is running correctly?
It used to be a lot harder to make a databases run smoothly. RAID arrays had to be custom configured. There were many arcane commands just to get the database to run on the network. Default block sizes had to align, custom configuration for memory, IO, etc.
Database setup has for the most part been handed to IT / DevOps / The Cloud. AWS RDS, for one example, makes databases an on demand commodity service (Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server). In this regard, life couldn’t be easier for application developers to spin up a production ready database.
The default configuration is still something that needs to be checked over and tuned based on the amount of memory on the box but that can be done pretty easily in an afternoon by an application developer or other IT staff member.
2) So who is now making sure the data is ‘structured efficiently’?
In today’s world the words ‘structured efficiently’ amount to a loaded term. It could mean anything to anyone. But if you look around the room and don’t see anybody doing this, then it’s probably a good idea to log a task under technical debt and check into it.
It used to be that data storage was fairly expensive. Application developers were aware of the issue but not necessarily responsible for optimizing the cost of the system. DBAs had control here, and they earned their keep on that front alone.
With cheap storage, lots of RAM, and virtually unlimited bandwidth, efficiency at the level of the 1s and 0s is not as critical as it used to be for a general run of the mill application. To many business level decision makers, what makes something efficient is how soon it can launch. In today’s world the cost of data storage is essentially a rounding error in a company’s overall budget. So again, no real pressing need for a DBA. However, I think that without someone who knows their stuff at the wheel to make sure the data is sane, problems can crop up especially with maintainability.
There are a number of best practices to follow when it comes to storing data. It really varies by platform, for example with a relational database, indexes, normalization, and eliminating stale data is super important. In the NoSQL world though duplication of data is expected and plays into performance goals.
One thing I see all the time in naive database designs is treating what is really historical data as a source of authority. It leads to huge problems. For example, let’s say you have an invoice from last week that links to a customer. What if the customer changes their address tomorrow. Should the invoice change? No! The invoice is a historical record and still needs to reference the customer, plus the address they had at the time they placed the order. Noticing slowly changing dimensions is another way DBAs earned their keep. Unfortunately slowly changing dimensions isn’t taught in school or isn’t appreciated from the application development perspective.
Another thing that sometimes worries me is handing over the responsibility for the database schema to a web framework. Without a DBA application developers are expected to handle database design and schema changes. Frameworks like Rails and Django come with built in tools to do it for you. That approach works up to a point. But in fact, many frameworks will do really stupid things when it comes to JOINs or the ability to build reports that a true DBA would laugh at. In my opinion it is best to fully understand what is going on under the hood when a framework takes over the responsibility of your database structure.
Sometimes the DBA is helpful, but sometimes they are just doing stuff to make their job seem important. I recall one DBA required that all tables be prefixed with ‘t_’. All fields had to be prefixed with ‘f_’ and end with an underscore and a character that denoted the field’s data type (i for int, c for character, d for date time, etc). The DBA also wanted views prefixed with ‘v_’ and stored procedures prefixed with ‘sp_’.
For example, let’s say there was a user table with columns user id and username, the DBA wanted to see:
Compared this to how I’d typically do it, no prefixes or suffixes:
The column [f_user_id_i] is the kind of fussy standards DBAs were getting paid to enforce corporate wide in the late 90’s early 2000’s. I can appreciate standards, but only when they add value. Adding readily available meta data to the name of a thing is redundant. It also makes application code and SQL painful to read. So in this respect, I’m really happy I now get to use the shorter, more concise form [user].[user_id] in my code without a DBA lording over it.
3) Who makes sure maintenance scripts, upgrades and backups are running correctly?
Sadly, this one is often neglected unless there is a dedicated sysadmin / IT / devops team watching over it. Again if you look around the room and nobody is doing it, then you probably need to get it on the schedule fast. When it comes to an out of date database engine – what you have is technical debt of the most easily combustible kind.
It is becoming more and more popular for application developers to be responsible for running the systems they write. This is the approach I advocate for with my customers because I want to be responsible for what I create. I want to fix bugs first. I want to keep the system running perfect. Tasking application developers with db maintenance also allows the business to eliminate the DBA position. Sad, but again systems are getting easier to maintain. AWS RDS does maintenance patches for you in your sleep (for the most part).
Unfortunately a lot of business people look at a software system like a fridge – something you plug in and leave for 20 years. The reality it software systems are more like custom race cars, one of a kind, built for a specific purpose, and high maintenance. The data in your application is sort of like the oil in the car, it flows through the system, needs to be cared for, and should not be ignored or left leaking!
The DBA may be gone, or at least morphed into a part time system admin part time DBA, but someone needs to tasked with treating the data like gold.