> These days, the technology decision maker is the dude with Sublime Text open and a cloud control panel up in Chrome.
And when he is successful and gets clients, and a few thousand rows in his database, he realizes that he needs someone to keep that database alive. He needs someone to figure out how to make the cartesian product queries he's written into efficient queries.
At first, he hires a consultant for a few one-off gigs. However, then he's paying someone $200[1] an hour, typically with 8-16 hour engagements. After getting sick of that cost, and still lacking any kind of long term caring about his product, he comes to our team, and hires us to be his DBA, albeit remotely.
Business as a DBA is booming. Nobody thinks they need a DBA, but the reality is that you really can't afford to not have a DBA. We have customers coming on board with no backups, no high availability plans, no disaster recovery plans, queries that are performing cartesian products (and thus taking minutes against very small datasets), and no monitoring. (And yes, a good portion of users come to us while using the "solution" proposed by the OP (like AWS RDS), for many the same problems.)
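To make the cartesian product problem concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for whatever database a customer actually runs. The users and orders tables, their columns, and the row counts are all made up for illustration. The only point is how the amount of work explodes when the join condition is missing, even on a tiny dataset.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 201)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 200) + 1, 10.0) for i in range(1, 1001)])

# No join condition: every user is paired with every order.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM users, orders"
).fetchone()[0]

# With the join condition, each order matches exactly one user.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM users JOIN orders ON orders.user_id = users.id"
).fetchone()[0]

print(cartesian_rows, joined_rows)
```

With 200 users and 1,000 orders, the first query has to consider 200,000 row pairs while the second produces 1,000 rows. Scale both tables up by 10x and the first number grows by 100x, which is how a "small" dataset ends up taking minutes.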
We set them up with comprehensive backups, automated failover solutions, and 24x7 monitoring. Suddenly, their DB is no longer the primary source of downtime. They're no longer losing customer engagement because their frontend takes seconds to render. They're no longer in the position of losing their entire company because some junior developer accidentally dropped their users table in production.
In short, DBAs are a required part of your business, if you're using a database. You just haven't been burned badly enough by a poor database setup to realize it.
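As a rough illustration of why the backup matters in the dropped-table scenario, here is a small sketch using Python's built-in sqlite3 module (Connection.backup needs Python 3.7 or newer). SQLite is only a stand-in here, the users table and its rows are hypothetical, and a real setup would use the backup tooling of the actual database plus off-host storage. This is not how we run backups for customers, just the shape of the idea.

```python
import sqlite3

# "Production" database with a hypothetical users table.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
prod.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])
prod.commit()

# Take a snapshot of production into a separate database.
snapshot = sqlite3.connect(":memory:")
prod.backup(snapshot)

# The accident: someone drops the users table in production.
prod.execute("DROP TABLE users")
prod.commit()
tables_after_drop = prod.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'users'"
).fetchone()[0]

# Restore by copying the snapshot back over production.
snapshot.backup(prod)
restored_rows = prod.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(tables_after_drop, restored_rows)
```

Without the snapshot, the state after the DROP is all there is. With it, the worst case is losing whatever was written since the last backup.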
[1] Actual hourly rates for a planned engagement. Emergency rates are closer to $450 an hour. Why so much? You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job. There are single-digit thousands of us world wide, and we're in high demand.
So perhaps the role of the DBA isn’t necessarily dead, it’s just moved
to its new home at the datastore-as-a-service provider. The successful
DBA will understand that this new world means handling petabytes of
data and billions of operations on thousands of logical databases.
They will cope with less mature database technologies in increasingly
difficult workload environments. They will automate or die.
Long live the DBA.
The whole post was making the exact same point as you in the end, the author actually runs a datastore-as-a-service business.
> The successful DBA will understand that this new world means handling petabytes of data and billions of operations on thousands of logical databases.
is untrue. Most of our customers have DBs that are in the GB size. A few have TB size DBs, and none are on that scale.
datastore-as-a-service doesn't replace DBAs - we make good money being remote DBAs for people who are using datastore-as-a-service providers, because they still run into the same problems as everyone else.
I wish our DBAs were like that. I am a developer with Sublime Text. I have to make the queries fast. I have to design good indexes.
They can complain if a query is slow, but they never actually help to fix it. They only have to make and restore backups when disks die.
Sorry to hear it, either way.
> You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job.
I can't say how much truth is in this statement. This stuff is learned organically by doing it on real-world projects. It's scary at times to think that you just can't teach this stuff.
Yup. And with the supply being so low, it's hard to get a DBA (you'll probably have to steal one from another business), so people are flocking more and more to DBaaS, and DBaaS providers are more than happy to propagate the fiction that "you don't need a DBA, you have us!".
It honestly doesn't bother me much that they make these statements; it's marketing.
On the other hand, believing those statements harms our customers; they spend time and money to migrate to these providers and find out the hard way that they still need someone who can handle their DBs for them. That does bother me.
Databases are becoming pretty good at managing themselves and the marginal performance gains from tuning usually are easily offset by throwing bigger kit at the problem or throwing more cash at the plan you are on.
More memory? Increase the buffer pool size.
Faster HDD? Tweak the settings that determine how many disk operations are attempted every second.
Bigger CPU? Figure out the point of diminishing returns on the number of CPU cores for your DB, and start sharding onto multiple DBs to make sure you can use all of the cores.
SAN? But I thought you wanted performance. ;)
Plus, what gives you the best DB kit for the buck? I could probably tell you that (I am a DBA, and get paid to answer those questions), but do you know? Do you know where to find out?
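For the sharding point above, here is a minimal sketch of the routing side only: deciding which of several databases a given key lives on. The shard_for function, the "user:N" key format, and the shard count of 4 are all hypothetical, and plain hash-modulo routing is just one simple option (it makes adding shards later painful, since most keys move).

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() because hash() on
    strings is randomized per process, which would route the same key
    to different shards after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Distribute 1,000 hypothetical user keys across 4 shards.
SHARD_COUNT = 4
shards = {i: [] for i in range(SHARD_COUNT)}
for user_id in range(1000):
    shards[shard_for(f"user:{user_id}", SHARD_COUNT)].append(user_id)

sizes = [len(members) for members in shards.values()]
print(sizes)
```

Figuring out how many shards you actually need, and where the diminishing returns on cores start, is still the part you have to measure on your own workload.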
But Oracle needs the fine-tuning, and I can't see that changing anytime soon. With Oracle, some queries basically require that you use an IOT, others are better with partitioned storage, and so on.
So your premise is extremely dependent on what DB you use.
Is ETL considered a DBA task?
Do you think the overlapping speaks for the decrement of either profession specialization?
In other words, efficiency and performance both depend heavily on the machine your DB is running on.
As such, a good DBA needs to be able to do sysadmin tasks. The business won't care that it was the sysadmin's fault for not realizing that a battery had gone dead on the RAID controller, and a DBA shouldn't care either. Their purview is the database, and everything that it entails.
I agree with you on the above statement, though my conclusion is different from yours. I think in the future the line between DBAs, sysadmins, and developers will become even more blurred: developers will be trained/required to take over more and more work from DBAs and sysadmins (DevOps, anyone?); consequently, the demand for dedicated roles such as DBA and sysadmin will diminish. Hope I am wrong though.
Given that background I still think that the topic of databases is just too deep for a generalist. I know a lot about the MySQL database (and a little about PostgreSQL), enough to write failover software, automate deployments, write guardian crons which slap down problematic queries & pre-emptively, automate backups, do vip failover and haproxy configuration... and I still have to go to my boss for most of the hard questions.
His knowledge encapsulates 12 years of working with and around MySQL, and it's proven invaluable to our customers. Knowing when to force certain optimizations, how to make subqueries run O(1) vs O(n), how to rebuild a complete database from binary logs, how to configure MySQL to work with SSD caches... these problems don't come up often, but when they do, not having a DBA available to you means contracting out to one at exorbitant rates.
It's the difference between a few minutes of downtime when the proverbial dung hits the fan, versus a few hours or days.
Start with the book High Performance MySQL. [1]
Follow up with the whitepaper "Causes of Downtime". [2]
Then find a copy of the IMDB dataset, put that in a database, and write an app against it. Make that app perform well, then simulate load against the app (pretend it hit the top of Reddit and Hacker News simultaneously), and keep it performing well.
After that, it's a matter of practical practice.
[1] http://www.amazon.com/High-Performance-MySQL-Optimization-Re...
[2] http://www.percona.com/redir/files/white-papers/causes-of-do...
Thus your databases will be in-house, thus you will hire one or more DBAs and depend upon their expertise.
The regulations will change with the times, hopefully.
We stored little more than names/emails and non-identifiable/non-sensitive data at my last job. The 'security auditor' for one client wouldn't let them sign with us because our servers were hosted at Rackspace, so the servers were not under our control and a Rackspace employee could access our data, since they managed our servers.
And who wants to do that?
It seems cloud/SaaS is the new answer to throwing hardware at the problem. Something about the inefficiency grinds at me. I do get at certain scales, it makes sense.. But it's not always obvious where that line is.
Moving to BYOD will continue to reduce the numbers of administrators as well. Eventually you'll have two groups of IT staff, a very small group of high-level engineers who build and implement everything, and then a very large group of low-skill helpdesk type people who reset your accounts and fill in your login information on your device.
Amazon's Mayday support service shows this already. Soon, a form of this will be on every product you can think of. Office, Windows, every tablet and computer in your office.
There will always be a place in the world for engineers who understand how to get the most out of an internal combustion engine and ways to improve it, just like for database engineers. But we don't all need our car in the shop twice a month having parts tuned. Nor are the vast majority of databases hitting performance limits that would require full-time DBAs.
Consulting is a great business decision when you reach hundreds of GB in data. In-house when you're tossing around tera- or petabytes. By the way, I can wholeheartedly recommend http://www.pgexperts.com if you're having Postgres issues. After speaking with a few of their core engineers at DjangoCon 2013, I would consult with them anytime I had PG performance issues.
Um, it would only be surprising if they did, that would surely indicate that his company wasn't doing a good job.
But most of these folks didn't have DBAs or even ops engineers before they became customers.
Mostly they are teams of developers, they have developed the application and have (I think correctly) worried about the customer experience, code base, IP, performance, etc.
They do want the database driver API (in this case MongoDB) to just work beyond the interface. Backups, scaling, filesystems, etc they want to be part of the service, and just handled.
This is true for new business and it's true for businesses that are developing new apps or new projects or even projects that can't scale out on other infrastructure.
Full disclosure: I wrote the article in question ;-).
I somewhat agree with the conclusion which essentially says having your on premise DBA is on the decline.
Hiring DBAs has always been tricky for employers anyway. It's a position of responsibility (to protect the business' data) that is hard to know you've hired the right person for, difficult to replace, and difficult to allow to take any vacation/leave since there will seldom be more than one.
What I don't agree with in the conclusion is that by moving off premise it's all moving to datastore-as-a-service. Large amounts will also move to remote DBA as a service (consulting firms).
I have been an Informix DBA for years, and I could tell that we were the strongest guys in a team, because in order to do our job we had to understand (abstract out of the running system) the data flows, access patterns (and especially locking issues), the actual server's topology (disk controllers, channels, hard drives), data partitioning (where this or that table-space lives, what else is on this volume, this channel, this controller), what the access patterns are for each table, how indexes are utilized, etc.
We also have patched, compiled and installed all the required software (have you ever tried to compile Informix support into PHP4? you definitely should.)) and to teach coders how to use it, and then deal with access patterns of silly scripts, etc.
The claims that some crap like MongoDB (of all things!) service could replace skilled, productive, (but, yes, quite expensive) professionals is, of course, utter nonsense (what else we could expect from MongoDB?).
DBAs and sysadmins (real ones, not these clowns who use nothing but chef or puppet and don't know how ./configure && make works) are becoming extinct purely for economic reasons, and all these cloud services, ironically, require even more knowledge to deal with, because all that virtualization crap messes everything up even more (google for redis on EC2 for a change).
Sadly, idiots are taking over the world slowly but steadily,)
As long as we're insulting people ... it's going to be 2014 in three weeks. Who compiles anything for PHP4? Boasting about an ability to handle other people's messy scripts sounds like an anti-pattern to me, and your recommendation to your org should be to develop a plan to refactor anything dependent on a library that has been deprecated for 6 years and actively out of development for 5.
I also feel like the vast majority of folks claiming merit badges as DBAs in 2013 are the product of failed technology roadmaps and technical debt with no plans to pay it off. I'm sure there are still places where true, full-time DBAs are worth their paycheck, but in my experience these are few and far between with the current status of OSS database options and hardware performance.
People may assume whatever they wish.)
> Who compiles anything for PHP4?
It was long ago, but you probably wouldn't believe how many people are scared to touch anything, let alone perform even a necessary security update - "what if it stops working?!"
Unfortunately, frustration, which sometimes influenced wording of some of my comments, is based on a quite long time in the industry, and I never asked for it.
The claim isn't that the service replaces the skilled, productive and expensive folks entirely, those professionals are still needed. They just work for the service provider.
The business entity that creates the actual product can focus on just that, and not on complicated DBA tasks. It's a developer's world. ;-).
Also, it's not a MongoDB centric concept. Rackspace and Amazon have multiple data products now, with more coming, and they all fall under this concept in my mind.
The last thing you want is a full transaction log during peak hours.
My Sublime Text is open and I have a cloud control panel open in Chrome, am I the technology decision maker? Nobody told me that here at the company.
Wait... maybe nobody told me because... I'm the decision maker D=
DBAs are not going to go anywhere. Sure, you can scale the DB in the cloud quite a bit and have it perform well, but it's not free :) In the cloud, you would be literally paying for your bad design decisions in terms of hard dollars rather than performance issues.
Most start-ups or in-house apps are fine while salary(dba) >= cost of cloud. However, that magic condition starts returning false pretty quickly if you are doing any non-trivial data management.
The banking world doesn't give its data to others, and the organization of ATM data, market data, customer account data, etc. is only getting more complex, requiring excellent organization and management for terra data analysis to protect customers from fraud and worse.
Pick any data storage system you like - MongoDb, Redis, Riak, whatever.
Now, you get to place a bet. In the future (let's say 25 years) what will still exist: your choice of NoSQL, or the standard database with schema, SQL, access, and so on? If you choose wrong, you die.
Now you get to see the future and see if you die. Which are you betting on?
But in all seriousness, if your app uses a database, you are incompetent not to employ an expert to help with the database, whether advising on the query plan of those non-performant queries, or what is the best setup for the current stage of the business and app, they are very useful.
DBs performance is complicated, yes, but the vast majority of it is extremely simple. It's just none of this simple stuff has to be learnt until it's too late and the cost of fixing it has dramatically increased.
There's a certain level where you need a DBA and that bar has been getting higher for years.
What we really need is to demystify DB performance, which for the most part is fairly simple.
What you need to do is teach your developers how to read those query plans. Show them how to find the expensive queries. To give them tools to easily see what queries their ORM is spitting out. To show them how to use a SQL profiler.
Tell your developers how query plan caching actually works. How a clustered index works and what you should and shouldn't put it on. Explain how indexes work. Explain how DB pages actually work and then it's obvious why certain indexes are a bad idea. Explain how relational keys are very important for the DB engine and leaving them off is not an 'oops', it's a serious mistake with long term consequences.
And that's at most a few days work. So why do you need that DBA?
Aside from that you need someone who knows how to maintain a DB, but again that's not particularly complicated and once it's done you can forget about it apart from the occasional sanity check that it's all working properly.
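For anyone who wants to try the "read the query plan" advice above, here is a small sketch of what it looks like in practice. It uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a convenient stand-in; other databases have their own EXPLAIN syntax and output. The orders table, the idx_orders_user index, and the plan helper are hypothetical names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(i % 100, 1.0) for i in range(10_000)])

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT total FROM orders WHERE user_id = 42"

before = plan(query)   # no index on user_id yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
after = plan(query)    # now the planner can search the index instead

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a SCAN of the table and the second a SEARCH using idx_orders_user. Spotting that difference in an ORM-generated query is the skill being described.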
That's a pretty easy statement to defend. But, I'll respond by saying that most Companies running Oracle 11g with more than a couple terabytes of databases, require a competent DBA, particularly if Disaster Recovery/Transaction Rollback is important.
" the DBA often never has sufficient domain knowledge of the problem "
Of the half dozen or so truly high level DBAs I've worked with (and managed on occasion), I can say they had incredible domain knowledge of Oracle Database Server, and worked extraordinarily hard to have next to zero knowledge of the application running on it. Their focus was to keep the database running, defend it from engineers and users, and recover it when things went really awry.
"DBs performance is complicated, yes, but the vast majority of it is extremely simple."
Any time you see the phrase, "Extremely Simple" when discussing a domain in which the expert practitioners routinely make $250K/year or more without any form of market manipulation, you need to reconsider why, exactly, these technicians are being paid so much to do something, "Extremely Simple."
"What you need to do is teach your developers how to read those query plans. "
Completely agree here, but there are two perspectives on this topic. There is the "Engineers are ultimately responsible for the efficiency of their query plans, and should be educated/trained to take that responsibility" and then there is, "We can't train our engineers to be query plan experts, just keep them from shooting themselves in the foot, and let Query Optimizer handle the rest - it's up to the DBA to manage stats gathering to keep DBMS_STATS healthy"
I think we tend to see the second approach more frequently in the enterprise, where your engineers are likely making less money, and the company is keen to leverage their many 10s of millions of dollars of Oracle Technology.
Finally, when a company is paying 10s of millions of dollars a year in Oracle licenses, they consider it a worthwhile investment to have a few high-level DBAs to fully leverage that investment.
Without getting into the rest of your argument, I'd like to quickly address this.
No, it really isn't.
In your typical MySQL database, your performance for a simple "select * from x where y" is going to go through a lot of complicated machinery (most of which can be tuned for performance), a few points of which I will enumerate below.
1) Acquire a query cache lock & see if this query is there
2) Run the query through the optimizer
2a) Perform multiple shallow dives into a table to look at the cardinality of the filtered columns
2b) Identify the best indexes based on the shallow dives
2c) Create a query plan
3) Push the query plan down into InnoDB
4) Load the index into memory, if it's not already there
5) Load the potential rows into memory, if they are not already there
5a) If there's not enough memory, load a few into memory, and be ready to push those rows out of memory in favor of more rows when needed
5b) Load rows that are still in the insert tree but not yet part of the regular buffer pool or pages on disk
6) Loop through the candidate rows for matches to the filter
7) Return the data to MySQL
8) Acquire the query cache lock & update it
9) Return the data to the client
Any and all of these can (and often should) be tuned. There are 600+ page books and very old (and oft updated) blogs dedicated to this topic... it's not something you can teach a developer in a couple of days.
As an example, I attended an introductory course to being a MySQL DBA; it lasted 5 days of 8-5 teaching & running examples. And it only scratched the surface of what I do on a daily basis.
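To give a feel for why just one of those steps (5a, not enough memory for the rows) is worth tuning, here is a toy model of a page cache. Big caveats: this is a plain least-recently-used cache written for illustration, not InnoDB's actual buffer pool algorithm, and ToyBufferPool, scan, the page counts, and the capacities are all made up. It only shows how sharply disk reads change once the working set fits in memory.

```python
from collections import OrderedDict

class ToyBufferPool:
    """Keeps at most `capacity` pages in memory, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page contents, oldest first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Cache miss: pretend to read the page from disk.
        self.disk_reads += 1
        page = f"contents of page {page_id}"
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            # Push the least recently used page out to make room.
            self.pages.popitem(last=False)
        return page

def scan(pool, page_ids, passes):
    """Read every page in order, `passes` times, and return total disk reads."""
    for _ in range(passes):
        for page_id in page_ids:
            pool.get(page_id)
    return pool.disk_reads

# Same workload (8 pages, scanned 3 times), two different memory sizes.
reads_small_pool = scan(ToyBufferPool(capacity=4), range(8), passes=3)
reads_large_pool = scan(ToyBufferPool(capacity=8), range(8), passes=3)
print(reads_small_pool, reads_large_pool)
```

In this toy, the pool that is too small goes to disk on every single access (24 reads), while the pool that fits the data reads each page once (8 reads). Real engines are smarter than plain LRU, but knowing which knob controls this and how to size it is exactly the kind of thing that takes more than a couple of days to learn.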
"often never"?
I've met some DBAs who were quite versed in the business domain. And some that weren't. This is no different to the majority of the developers I run in to, so I'm not sure why you're drawing a line there, except that the article was about database stuff.
Given modern databases and modern hardware, I think the vast majority of applications never reach the point where query performance is an issue. For many small software companies hiring an expert in databases makes no more sense than hiring an expert in operating systems or networking - unless your needs are very specialized, these things work well enough out of the box.