DataOps is an interesting use case for the DevOps approach. After all, data holds so much promise. It might be more valuable than oil. It might make our jobs easier. It might help us make better decisions. But for it to deliver on any of its promises, data must be harnessed and managed as the asset that it is. And that’s where DataOps comes in.
Let’s take a closer look.
What is DevOps?
When it came on the scene in the early 2010s, DevOps was heralded as an innovative way to accelerate the software development process, bring siloed teams together, and automate best practices in continuous integration and deployment. Since then, it’s found its way into nearly every IT organization — over 80% of companies report implementing DevOps practices with some level of success.
Now that DevOps is a mainstream approach, many other teams have adopted DevOps-like methodologies to move more quickly and deliver higher-quality outcomes. As a result, a range of related disciplines has emerged, including DataOps, CloudOps, PlatformOps, AIOps, BizOps, and even MarketingOps.
What is DataOps?
Most people would simply define DataOps as DevOps for data. However, this definition doesn’t properly recognize the true purpose behind data operations methodologies: feeding the enterprise’s insatiable hunger to transform raw data into insight-driven decisions. Data analytics experts at the Eckerson Group have appropriately expanded their definition of DataOps as follows:
“DataOps is an engineering methodology and set of practices designed for rapid, reliable, and repeatable delivery of production-ready data and operations-ready analytics and data science models.” – Eckerson Group
Who are the DataOps Experts?
Lots of roles work with data:
- Developers make the applications that make the data usable, using DevOps methodologies.
- Data analysts, data scientists and engineers transform the data into information and insights.
- Business users act on those insights.
However, none of these roles actually focus on the data itself. Data operations methodologies bring together data architecture, data engineering, and data governance roles with IT Operations to efficiently manage and improve the flow of data – at scale. They don’t make the data; they make the data work better.
DataOps vs DevOps: What Really Sets Them Apart
Ultimately, DataOps applies DevOps-like methodologies to manage and automate its processes. But that’s really where the similarities end. DevOps manages software development; DataOps manages data pipeline development. DevOps delivers applications and services; DataOps delivers usable data. DevOps sees data as an input or output; DataOps sees data as a strategic asset. DevOps seeks to offer jobs-as-code, while DataOps aims for data-as-a-service.
What is Data-as-a-Service?
Data-as-a-service makes on-demand data available to decision makers throughout the enterprise.
This is so much more challenging than it sounds. Data operations teams must integrate multiple platforms and technologies — across a complex data toolchain and often throughout a hybrid IT environment — to bring data to end-users in a way that’s easily accessible by all who need it. This integration of the data pipeline must also be consistent, reliable, and scalable. There’s only one way to make it happen: automation.
Automation Keeps the Data Pipeline Flowing
DataOps is not practical without automation… [It’s] essential to DataOps, with eight points at which test automation is critical, two points of deployment automation, and two points of operational orchestration. – Eckerson Group
The Eckerson Group identifies multiple points of automation in the DataOps dev-test-prod cycles for both big data pipelines and data science and analytics. A few examples include:
- CI/CD for data science and analytics: model execution and orchestration, build and train models, unit and user tests, integration tests, pre-deployment tests, deployment, post-deployment tests
- CI/CD for data pipelines: pipeline execution and orchestration, build pipelines, unit and user tests, integration tests, pre-deployment tests, deployment, post-deployment tests
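To make the testing stages above more concrete, here is a minimal sketch of what an automated unit test and a pre-deployment data quality check for a single pipeline step might look like. Everything in it is hypothetical and chosen for illustration: the `Order` record, the `clean_orders` transform, and the `check_quality` rules are not taken from the Eckerson Group material or from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    amount_cents: int
    country: str


def clean_orders(rows):
    """Hypothetical transform step: drop malformed rows, normalize country codes."""
    cleaned = []
    for row in rows:
        order_id = row.get("order_id")
        amount = row.get("amount_cents")
        # Skip rows with a missing ID, a missing amount, or a negative amount.
        if order_id is None or amount is None or amount < 0:
            continue
        cleaned.append(
            Order(
                order_id=int(order_id),
                amount_cents=int(amount),
                country=str(row.get("country", "")).strip().upper(),
            )
        )
    return cleaned


def check_quality(orders):
    """Hypothetical pre-deployment check: return a list of problems found."""
    problems = []
    ids = [o.order_id for o in orders]
    if len(ids) != len(set(ids)):
        problems.append("duplicate order_id values")
    if any(len(o.country) != 2 for o in orders):
        problems.append("country is not a 2-letter code")
    return problems


def test_clean_orders_drops_bad_rows():
    """Unit test for the transform; a CI job could run this on every change."""
    raw = [
        {"order_id": 1, "amount_cents": 500, "country": " us "},
        {"order_id": None, "amount_cents": 100, "country": "DE"},
        {"order_id": 2, "amount_cents": -5, "country": "FR"},
    ]
    result = clean_orders(raw)
    assert [o.order_id for o in result] == [1]
    assert result[0].country == "US"
    assert check_quality(result) == []
```

In a real pipeline, tests like this would typically run automatically on every change, with the quality check acting as a gate before a new pipeline version is promoted from test to production.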
Many providers in the data toolchain offer some native automations and integrations within their platforms. However, these tend to be bare-bones and overly simplistic. For this reason, enterprises often turn to an evolving category of solutions known as service orchestration and automation platforms (SOAPs). SOAPs are equipped to coordinate, schedule, and manage the complexities inherent in data operations.
The Data-as-a-Service Solution: Service Orchestration and Automation Platforms
SOAPs help data teams efficiently orchestrate the automated processes required to operationalize end-to-end data pipelines. Think of SOAPs as meta-orchestrators, meaning that they don’t replace any part of the traditional data toolchain. Instead, they rise above it to:
- Centralize management and observability across the entire data pipeline.
- Visually create automated workflows between data sources, ETL and/or ELT platforms, data warehouses and lakes, visualization tools, and more.
- Automate data streams in real-time to keep up with the pace of business.
- Put the Ops in DataOps by helping data teams collaborate with developers and IT operations to achieve scale.
- Approach data pipelines with DataOps methodologies to achieve continuous integration and continuous deployment (CI/CD). Many SOAPs provide dev/test/prod lifecycle management functionality. This built-in feature lets end-users apply data operations practices directly within the platform, which makes it key to orchestrating the data pipeline in a way that keeps it fresh and working without error.
With this level of orchestration, you gain single-platform control over all the various tools used within each stage of the pipeline. You monitor the logs and manage governance and security in one place. And you receive alerts about potential issues before your business users even know about them.
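The core idea of orchestration, running each stage only after the stages it depends on and raising an alert when something breaks, can be sketched in a few lines. This is a toy illustration only, not how any SOAP product works internally: the `run_pipeline` function, the step names, and the `alert` callback are all assumptions made for the example. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies, alert):
    """Run steps in dependency order; alert on failure and skip downstream steps.

    steps: dict mapping step name -> zero-argument callable (hypothetical)
    dependencies: dict mapping step name -> set of steps it depends on
    alert: callable that receives a message string
    """
    status = {}
    # static_order() yields each step only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        # If any upstream step did not succeed, do not run this one.
        if any(status.get(dep) != "ok" for dep in upstream):
            status[name] = "skipped"
            continue
        try:
            steps[name]()
            status[name] = "ok"
        except Exception as exc:
            status[name] = "failed"
            alert(f"step {name!r} failed: {exc}")
    return status


# Hypothetical stand-ins for real tools in the data toolchain.
def extract():
    pass


def transform():
    raise ValueError("schema mismatch")


def load():
    pass


def refresh_dashboard():
    pass


if __name__ == "__main__":
    result = run_pipeline(
        steps={
            "extract": extract,
            "transform": transform,
            "load": load,
            "dashboard": refresh_dashboard,
        },
        dependencies={
            "transform": {"extract"},
            "load": {"transform"},
            "dashboard": {"load"},
        },
        alert=print,
    )
    print(result)
```

Here the failing transform step triggers an alert, and the load and dashboard steps are skipped rather than run against bad data. A production platform adds scheduling, retries, logging, access control, and integrations with each tool, which is where a dedicated orchestrator earns its keep.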
Summary and Next Steps
The last two decades have seen exponential growth in the volume, complexity, and importance of data. DataOps has emerged as a way to harness its potential, supported by service orchestration and automation platforms like the Stonebranch Universal Automation Center. To learn more about SOAPs, read Gartner’s 2021 Market Guide for Service Orchestration and Automation Platforms or check out the Stonebranch DataOps solution.
If you’d like to dive deeper into these practices, the Eckerson Group offers a wealth of knowledge. We recommend starting with DataOps: More Than DevOps for Data Pipelines and Architecting and Automating Data Pipelines.