A Blog by Jonathan Low

 

Aug 2, 2016

Why Technology Has Enhanced, Not Eliminated, the Database Administrator

Oh. Right. Management! What a concept. JL

Sean Gallagher reports in Ars Technica:

For those of us who have been in the information technology realm for too long, the title "database administrator" conjures up very specific images. We picture someone pulling hair out over issues with backups or snapshots not happening, schemas growing out of control, capacity plans blown up by new application demands, sluggish queries, and eternal performance tuning.
That old-school role of the DBA still exists in some places, particularly large enterprises where giant database clusters still rule the data center. But virtualization, cloud data storage, micro-services, the "DevOps" approach to building and running applications, and a number of other factors have significantly changed how organizations store and manage their data. Many of the traditional roles of the DBA seem to be moot in the shiny, happy world promised by the new generation of databases.
"NoSQL" databases don't require a pre-defined schema, and many have replication built in by default. Provisioning new servers can be reduced to clicking a few radio buttons and check boxes on a webpage. Development teams just point at a cloud data store such as Amazon Web Services' Simple Storage Service (S3) and roll. And even relational database vendors such as Oracle, Microsoft, and IBM are pushing customers toward data-as-a-service (DaaS) models that drastically simplify considerations about hardware and availability.
You might expect this to mean that DBAs' jobs are getting easier. If so, your expectations would be wrong.
"I think [DBAs'] roles have become much more complex," said Chris Lalonde, vice president and GM of Data at Rackspace. "While there is definitely more automation and tooling, the counter to that is that many of the newer technologies are less mature and require more care and feeding. I would say that many of the traditional tasks of DBAs still exist today or need to exist."
In fact, all these great new database technologies highlight the data professional, whether that person is called a DBA, data architect, data engineer, or, in some cases, data scientist. "Data is even more important today," said Kenny Gorman, a database veteran and co-founder of the real-time data service company Eventador. "Businesses used to rely on databases to be sound, run smoothly, and give good reporting. But now, data actually makes you more competitive, and there are more job titles working with data and more technologies that use it. And the database professional is at the core of that."

One step forward...

Non-relational platforms offered a promise to reduce the workload of DBAs, and in some ways they do. Ravi Mayuram, senior vice president of products and engineering at Couchbase, compared the shift in what DBAs have to do to how driving a car has changed over the years: once upon a time, "to drive one you had to essentially be an engineer, and when something went wrong you needed to pull to the side of the road and get under the hood." Now most things take care of themselves, he said, "but I can't do anything myself to fix it."
Databases such as MongoDB and Couchbase, while not relational, offer SQL-like query capabilities, and they have other aspects that make them approachable to experienced DBAs. But they also allow for "dynamic deployment decisions, which you couldn't do with relational systems," Mayuram claimed. "Adding new data structures used to require a schema change and downtime."
While a big relational database system requires an understanding of everything about the hardware and software stack, "the next generation of DBAs will be less involved in that," Mayuram explained. "There will be a requirement for a DBA—someone who is more database intensive, but not exclusively," to focus on tasks like capacity planning. The DBA of the future will need to know when to provision more servers and when to retire them.
That sort of dynamic scalability is what has driven the adoption of cloud-based data services based on specialized databases and “Data-as-a-Service” schemes (either built in-house or hosted with a vendor). In either case, provisioning services can take care of setting up hardware, network, and storage. In theory, the DBA focuses on figuring out when applications will need more database capacity. "This is a DevOps sort of role, dealing with dynamic provisioning—it's a slightly different profile," Mayuram said. "With more efficiency, they'll need a fraction less of DBA skills, but that requires them to be more capacity planners and understand the development side better."
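As a rough illustration of the capacity-planning question described above (when will an application need more database capacity?), here is a minimal Python sketch. It assumes the only input is a series of daily storage measurements and that growth is roughly linear; the function name days_until_full and the sample figures are hypothetical and do not come from any vendor's tooling.

```python
from typing import Optional, Sequence


def days_until_full(daily_usage_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate how many days remain before usage reaches capacity.

    Fits a least-squares straight line to the daily samples (an assumed,
    deliberately simple growth model). Returns None if usage is flat or
    shrinking, since the trend line then never reaches capacity.
    """
    n = len(daily_usage_gb)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage_gb) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage_gb))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    slope = slope_num / slope_den  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the trend line crosses capacity, measured from the last sample.
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))


if __name__ == "__main__":
    samples = [100, 110, 120, 130, 140]  # hypothetical daily usage in GB
    print(days_until_full(samples, capacity_gb=200))
```

A real forecast would have to account for seasonality, bursts, and replication overhead; the point is only that the "when to provision" decision can start from data the database already produces.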
For those unfamiliar with the term, DevOps is a practice now used widely in Web and service development. It describes application development teams working in collaboration with IT operations staff to continually improve performance, automation, and scalability of software and systems. The DevOps approach has been a major driver of the adoption of NoSQL databases and other non-traditional data storage and query technologies. DevOps has driven the development of Data-as-a-Service—largely because of the need to automate the scaling up and down of database capacity. But even in the purely relational world, the shift toward turning databases into a cloud service is reducing the need (and the ability) for DBAs to have fine-grain control over hardware configuration.
So far, Data as a Service has been embraced for "a fraction of what companies do," Mayuram said. "Most companies don't have mission critical information in the cloud." Early adopters, he noted, are taking a hybrid approach, with some creating internally hosted DaaS platforms based on cloud computing platforms within their own data centers. But other companies are largely keeping their critical relational systems as they are and using cloud approaches for new projects. "They still have DBAs taking care of existing apps and have DevOps teams handling database deployment in a micro services environment—services that don't need to be in a relational system," Mayuram explained.
With companies preserving their relational databases and increasingly needing to bridge the gap between the old and the new, things have become more complex instead of simpler. And even when organizations completely outsource the applications that have typically placed the biggest demands on DBAs, they're still left with the need for some sort of data professional to make sense of what they've gotten themselves into.
The database schema for MediaWiki, the platform used by Wikipedia. Some specialized databases don't explicitly require schemas, but the schema lives on in other ways--and understanding data structures remains important.

The news of death greatly exaggerated

In December 2013, Kenny Gorman wrote an article provocatively titled, "The Database Administrator is dead." But he concluded that article with the declaration, "Long live the DBA."
"The point of that piece was to say that the DBA is still important," Gorman told Ars recently. As a long-time Oracle database administrator and data architect at companies that included PayPal and eBay, Gorman found himself diving into MongoDB at Shutterfly and becoming a NoSQL believer. In the article, which he wrote while chief architect at the data-as-a-service provider ObjectRocket, he noted, "most of our customers don’t have DBAs on staff." But that didn't mean the job had gone away.
"As we move to the cloud," Gorman explained, "with data services, micro-services, the whole 'serverless' movement (services like Amazon Web Services' Lambda and Google Cloud Functions), the whole data landscape is continuing to evolve. And it has changed the role of the DBA—it's no longer the guy who manages the Oracle server in the data center for a particular company. Now there's database storage technologies that exist all over the cloud in various forms that they have to manage."
Even though many of these new database technologies automate much of what DBAs used to do, it doesn't mean that there's a reduction in DBAs' workload. "I believe automation has reduced the need for traditional Ops folks as they help scale the hardware and therefore the volume of queries," Lalonde said. "But there aren't many tools around finding and fixing slow queries or picking the best shard keys," he explained. "I believe automation allows for larger scale with fewer resources, but ultimately you still need an expert around."
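To make Lalonde's point about shard keys more concrete, the following Python sketch compares candidate keys by how evenly they would spread documents across shards. It is an illustration under stated assumptions, not how any particular database assigns shards: the hash-then-modulo placement, the function names shard_for and shard_skew, and the sample documents are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Mapping


def shard_for(value: object, num_shards: int) -> int:
    """Map a key value to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    SHA-256 digest is used to keep placement reproducible between runs.
    """
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_skew(docs: Iterable[Mapping[str, object]], key: str, num_shards: int) -> float:
    """Return the busiest shard's load divided by the ideal even load.

    1.0 means perfectly even; num_shards means every document landed
    on a single shard.
    """
    counts = Counter(shard_for(doc[key], num_shards) for doc in docs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no documents to evaluate")
    return max(counts.values()) / (total / num_shards)


if __name__ == "__main__":
    # Hypothetical data: unique user IDs, but everyone in one country.
    docs = [{"user_id": i, "country": "US"} for i in range(1000)]
    for candidate in ("user_id", "country"):
        print(candidate, round(shard_skew(docs, candidate, num_shards=4), 2))
```

A low-cardinality key such as the country field here sends every document to one shard, while a high-cardinality key spreads the load. Even distribution is only one criterion; query patterns matter too, which is part of why the choice still calls for an expert.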
Gorman believes the complexity of the new data environment is making DBAs' jobs even harder than before—not easier. That's in part because DBAs can't be as specialized as they once were. "I ran database servers for PayPal and eBay way back in the day," he explained, "and we had one or two data technologies, not fifty. If you knew Oracle, you could probably figure out Microsoft SQL Server—those technologies are complementary." That's not the case anymore, Gorman asserted. "These days, you have to understand the difference between Elasticsearch, Hadoop, [Apache] Kafka, and Oracle and how they are different and why is one better for a particular use case at hand than another."
Because of the pace of change in data storage and query technology, it's not even clear what a database is anymore. And many of the technologies being passed off to data professionals, regardless of their title, bear little resemblance to anything they've worked with before.
"Our careers evolved from managing systems and storage around databases," Gorman said. "Oracle was clearly a database. But these days, the very notion of what a database is has changed. Like, is Hadoop a database?" At ObjectRocket, Gorman built data services around MongoDB. "That's pretty clearly a database," he said, "but our new startup is based on [Apache] Kafka—is that a database?" (Kafka is a data broker that provides streams of data from query "subscriptions" for real-time applications.) "Well, it has properties of a database. It's meant to shuffle data, real-time data. So, it's this crazy evolution where we don't even know if a data product or data infrastructure is really a database anymore. The waters are so muddy. Now it's just really data systems—they all have their own nuances and components."
Some things haven’t changed, even with the new technologies. "Optimizing queries and moving data around have not gone away, and neither has the need to monitor and maintain these databases," said Lalonde. "And those 'schema-less' databases really do have schemas as it turns out—they're just more loosely defined."
As a result, Lalonde explained, the DBAs "have to have the same skills they've always had. Obviously modern DBAs have to be more flexible, understand a breadth of technologies, and work well in agile environments. What we look for in general is someone who really understands database fundamentals, because understanding those fundamentals translates across technologies well."
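Lalonde's remark that "schema-less" databases really do have schemas can be illustrated with a short Python sketch that recovers the implicit schema from a handful of documents. The documents are represented as plain dictionaries, and the function name infer_schema and the sample records are hypothetical; a real document store would be sampled through its own client library.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set


def infer_schema(docs: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarize which top-level fields appear, with which types, and whether
    every document carries them."""
    types: Dict[str, Set[str]] = defaultdict(set)
    seen: Dict[str, int] = defaultdict(int)
    total = 0
    for doc in docs:
        total += 1
        for field, value in doc.items():
            types[field].add(type(value).__name__)
            seen[field] += 1
    return {
        field: {"types": sorted(types[field]), "required": seen[field] == total}
        for field in sorted(types)
    }


if __name__ == "__main__":
    # Hypothetical documents with the drift a loosely defined schema allows.
    docs = [
        {"_id": 1, "name": "Ada", "age": 36},
        {"_id": 2, "name": "Lin", "age": "41"},  # age stored as a string
        {"_id": 3, "name": "Sam"},               # age missing entirely
    ]
    for field, info in infer_schema(docs).items():
        print(field, info)
```

The mixed types and the missing field are the kind of drift that a relational schema would have rejected up front and that someone still has to notice and manage in a document store.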

What's a DBA, anyway?

The shift in data technologies and how they're deployed hasn't just added more work onto the DBA's role—it has also redefined who fills it. With more of the operational tasks around databases shifting toward the operations side of "DevOps," the DBA role is much more closely linked to the application development process. And the skills usually associated with DBAs are now much more important across the development and operations teams.
"I think the roles of the developer, the DevOps guys, and data guy (maybe it's a data engineer, maybe a DBA, maybe a data scientist), those roles have to cope with a myriad of new technologies," said Gorman. "Each of those technologies has their own spectrum of maturity, features, and capabilities." And that means that each of those roles now requires at least some of the skills of a DBA.
Whoever ends up in the DBA role for these systems doesn't just need to have a general understanding of them; they need a much more nuanced understanding of what's going on inside their systems than they may have ever needed with relational databases. Just as the behavior of SQL queries can be tuned to some degree for each relational database, getting the best performance out of newer non-relational systems requires DBAs to have a deep understanding of their inner workings.
"How a database 'behaves' is highly dependent on a few choices that the developer of the database makes at the lowest levels," Lalonde said. "If you know what those choices are and how this particular database has made those choices, then you can get a good idea for how it will behave, generally speaking."
That's a level of familiarity that was once the domain of the most experienced database administrators and programmers. But just as data has become increasingly decentralized, it has spread the demands of the DBA role throughout the IT organization. Given how much time many people spend managing their personal stack of structured and unstructured data, we may all very well be DBAs at this point.
