Orchestrate Complex Data Pipelines with UAC
Good morning, good afternoon, and good evening to the global audience we have joining us today, and welcome to our third session of Stonebranch Online 2021, a global IT automation and orchestration forum to get you ready for whatever comes next. I'm Lauren Tanzini, marketing manager at Stonebranch, and I will be your moderator today. We will show you the practical application of how that can be done with Stonebranch Universal Automation Center. For those of you who attended the Eckerson Group session, you might recall that over seventy percent of poll respondents admitted to facing complexities of data management. Additionally, only one third of those are currently orchestrating their data pipelines today. That lands us at about two thirds of organizations not having any type of orchestration strategy in place. So in this session, we will help you fill this gap as we talk about how to use Stonebranch Universal Automation Center to orchestrate your big data pipeline. With that said, it's my pleasure to introduce this session's speakers: Scott Davis, Vice President of Global Marketing at Stonebranch, who is located here in the United States, and Moritz Russe, Director of Solutions Engineering in EMEA at Stonebranch, who is located in our Frankfurt, Germany office. Let's go ahead and get started. Scott, take it away.
Hey, thank you, Lauren, and welcome to the audience. We're excited to have you here. We've got some really cool stuff to talk through today. I think what you'll be most excited about is the demo that Moritz is going to walk us through. So I'm going to try to burn through some of the setup slides. We'll get to his demo. He'll spend a good bit of time there. And then we'll go into Q and A so that you can ask any questions you may have. First, just to orient everybody to the Universal Automation Center platform, this is a slide that many of our customers have seen, and we show this in just about every deck that we present.
But for those of you who are not familiar with it, these are some of the main pillars that the Universal Automation Center covers. Today, we're going to be focused on that bottom pillar, which is managing data pipelines. And the tool, the platform itself has a ton of capability here and has significantly improved in its ability over the last two years, mostly because our customers have been asking for it. Another orientation slide is this one. This is a good representation of what I would call a simple data pipeline. In this case, we're looking at an analytic pipeline. You have five stages. You have your data sources, your data integration stage, your data store or data storage stage, your analysis stage, really your computation stage. This is where a lot of the machine learning and the predictive analytics happens with the data scientists. And then finally, delivery stage. Keep this image in mind. We'll show it a few different times to explain what we're doing in the context of the tool and how our customers are using this solution today. Using this same model, though, let's expand on the solutions that are there. Sorry, my phone's ringing. If you're a practitioner or you use any sort of data pipeline or trying to build a data pipeline, this slide shouldn't look crazy, right? I mean, it looks crazy because it is crazy, but it shouldn't look out of the norm, meaning that there's all kinds of different software and solutions anywhere from the sources that you're pulling data from all the way through to the tools that you're using to finally deliver it, the different dashboards, text, custom apps that you're building. But in between there, these tools range from simple to complex, and there's a lot that's changed over the past, let's say, fifteen years. 
So I spent most of my career working for Business Objects, which was a notable business intelligence and analytics company that was acquired by SAP, and then working in that stream of products, the same business intelligence products, for about ten years. And one thing that I have noticed has changed a lot over this time is that fifteen years ago, most companies standardized on big behemoth vendors. You would have all of your, you know, it'd be like just using SAP tools. You have an ERP tool, you have the ETL, you know, SAP version of that, you have the Business Objects database, you have Crystal Reports or some other Business Objects BI tools you push those through, and all those things were interconnected. Today, it's a lot different. As you can see on this slide, you wind up having best of breed vendors coming into this space, and companies are not standardizing on one behemoth vendor anymore. They're not doing that. I mean, there's still some that are all SAP shops or all Oracle shops, so on and so forth. But just like in all forms of software, companies have moved to this best of breed standpoint. So very much what you see as a result of that is that not all of your analytics and your data pipeline are sitting on prem in an SAP world. You see it merging between SAP and Informatica and Snowflake and Tableau, probably much to the chagrin of SAP and Oracle and those types of companies out there. The other thing that changed a lot is that, you know, a large portion of the sources and a large portion of the tools on this list are now in the cloud. So being able to figure out how to orchestrate the data pipeline across on prem and cloud is what needs to happen today, and that's pretty difficult to pull off for a number of reasons, but we'll talk about that more in just a few minutes. So let's go to our first poll. I'm going to pass this off to Lauren. Lauren, why don't you walk us through this?
Alright, I'm going to start the poll now. You should see a pop up.
Alright, the first poll. Which methods do you use to connect tools across your data pipeline? A, point to point integrations, B, custom scripts, C, a mix of point to point integrations and custom scripts or D, we do not, we do it manually. So what you can do is on the right side you can see the poll results coming in after you've completed it. Let's give about five more seconds and then we'll close the poll. So right now we see A, point to point integration is thirteen percent, B, custom scripts is eighteen percent, c, a mix of point to point integrations and custom scripts is sixty five percent, and d, we do not, we do it manually, is only six percent. How does that sound, Scott? So let's go to this slide. It sounds about right. It's interesting. I was speaking with a Gartner analyst a few weeks ago and my first draft at this slide that you're looking at now really just listed point to point integrations and custom scripts or a mix of both. And I was trying to explain to him that that's how people work, and he said, you know, there's another category. I said, what? He said, they don't. They do it manually. And so I'm glad to see that on this webcast there's some people that are, just a few people that are doing it manually, but that's the legitimate thing. It's still flat files and manual transfers of data in a lot of these cases. But point to point integrations and custom scripts, it's about even with most people using both. That makes a lot of sense to me because the reality is you don't have point to point integrations between all tools. They're not natively integrating with each other because the world is moving so fast. What makes custom scripts easy to do in the cloud environment are APIs, and so it's a logical thing to build those custom scripts. And so why are people integrating? What do they gain from this orchestration? 
At the end of the day, they're trying to get to some sort of centralized view of all of the integrations and all of the data movements and the logs and, ultimately, observability of data. They need to be able to root cause issues as they're happening. This is something that, when you have a proper orchestration solution, you can get to very quickly. It's the single view that allows you to go and drill in and say, hey, this part broke so I can fix it. They want to get proactive in support. And when I say that, what I really mean is, you know, think of a world where maybe you're living today, right? Many people are. Where you don't know something's broken until your customer, and that customer could be the CEO, it could be a marketing manager, it could be, you know, a literal customer sitting outside of your organization, is screaming because their dashboard isn't working or the data isn't updating on the website or whatever it is. And so you wind up finding out afterwards. A true orchestration solution is going to give you this detail ahead of time. Like the second it happens, you'll get an alert. You'll be able to go in and fix it. And in an ideal scenario, you're fixing it before it becomes an issue. And then what we ultimately see with our solution, because there's other orchestration solutions out there, there's other scheduling tools, but when people are coming to us, they're at the point where they need to achieve scale, right? And let's talk about those other solutions real quick. So from an automation standpoint, if you're trying to automate this data pipeline, one of the first questions I got from analysts when I was out there was, well, why don't you just use Informatica? And so I wanted to find out for sure. I thought I knew, but I went and spoke with some customers that are using our Informatica integration. I looked online, I did some research in forums, and the answer is Informatica is a great ETL tool.
I don't want to take away from it, but its job scheduler lacks functionality, right? And it only does job scheduling within its own tool. So an orchestration tool is going to reach into every tool on your data pipeline and integrate with it. And then it's going to run the jobs in each one of those tools as part of a workflow that keeps your whole pipeline running. So you can't use Informatica's inbuilt job scheduling capabilities to do the whole pipeline. You can do Informatica, you might even be able to reach right or left to make a data trigger come through Informatica, but certainly not along the whole pipeline. Other organizations, and this is the most common in the data world, will go to open source tools. So they're going to use Airflow, right? And they like Airflow because data people understand it, they figure it out, and Airflow is a pretty good scheduling tool. The trouble starts at the point where data teams go to flip it to operations. They build the data pipeline and they say, okay, ops, go ahead and run this for us. The ops people are like, oh, hold on, this is not something that we can scale with. This is not a tool that we can all collaborate in. This is not something we can install and run easily ourselves at scale. And so organizations wind up looking for something that has support, that is enterprise grade, so on and so forth, beyond open source schedulers. The other category, which is one that's really evolved over the last couple of years, is cloud schedulers. So there's a lot of companies that want to go and use AWS's scheduler or Azure's scheduler. Azure has Databricks. It's a pretty cool scheduler, but all of these, AWS, Azure, Google Cloud, only work in their own ecosystem. So they're trying to pull off what the SAPs and the Oracles of the world did fifteen years ago, where they want to lock everybody into their own ecosystem. So a lot of their tools only work inside of their cloud ecosystem.
So when you're on Azure using Databricks, you can't reach over to Google and automate something for them, or you can't reach out to Snowflake easily, right? You have to have some other application that's in the middle. And then finally, the other big pain point that we see in this market is that just about every business has a workload automation tool or a traditional job scheduler running at scale in their business. Most of these tools in the market today are legacy. You know, they were built thirty years ago and they work wonderfully well on prem, but they don't do as well in the cloud. So with this drive that you're getting from using data sources that may be on prem or data sources that may be in the cloud and tools that are in the cloud or whatever, they're really struggling to keep up with this aspect. And this is where our customers come to us. They're looking for that capability, the hybrid IT capability that the Stonebranch Universal Automation Center is so good at. So let's go to the next slide now that we understand the problems and talk specifically about doing data pipeline orchestration with the Universal Automation Center. So first, everybody remembers this data pipeline slide from earlier in the deck. The big thing here is how do we accomplish real time automation and the file transfers needed to manage the entire data pipeline? Well, what winds up happening with the Universal Automation Center is we actually sit, you could put it above, you could put it below, you could put it straight through this pipeline, we choose to put it above, but we sit as a layer that sits right above all of these tools. And our application, our platform will centrally schedule and orchestrate all the automated processes within each tool in the pipeline. So the way we do that is we reach in, and for on premise tools, even mainframes and distributed servers, we have what's called an agent.
That agent in other terms or other industries might be called a bot, but we take this little piece of code and we install it on your mainframe, and that allows Universal Automation Center to connect and speak with the mainframe in this case. We also use agentless connections. Those are done via APIs. So just like most of the organizations building software out there, we're able to connect to our partners, our third party partners and even non partners, via these API connections. And so what happens is our tool sits at this top layer and then it reaches down and integrates with all of these other data tools and data sources within your data pipeline, and that allows you to do the central controlling and to run the automation within those tools. So there's a foundational element to this that needs to happen like that, and that integration is key. So what you wind up getting once you're integrated with all these tools and once you're able to centralize the orchestration is observability. And observability is becoming a really big deal. It is a big deal, but it's becoming more and more of a big deal. So this is getting access to log data that you otherwise wouldn't have. You know, you might be running something like stream data through Kafka or even jobs through your ETL program, whatever that is, maybe it's Informatica. And sometimes there's log data inside of it, sometimes there's not, but it's not all connected. Like you can't see that this job ran here and here and here and here and what times it ran in a centralized place. What you get with a tool like Stonebranch is that centralized log data across the entire pipeline. And so that helps with governance, helps with security, helps with compliance, and also helps with root causing issues. The other thing, really thinking back to the session that we had on Thursday of last week, and a lot of what we're hearing in the marketplace, is this data ops term. So data ops is new.
If you're not familiar with it, it's similar to DevOps in that you're trying to build your CI/CD pipeline, so to speak, or your lifecycle, but you're doing it with data. So within our tool, for those of you that are using it for DevOps today, there's some really cool ways that you can achieve that, and you can apply those same principles with promoting your workflows and your code between your dev, test, and prod environments. And so this allows you to do simulations. It allows you to test. It allows you to make sure everything works before you put it into production. The next thing on this list is this idea of centralized control and visibility. So the best way to describe this is in contrast to this bullet. So let's think about what it's like if you don't have centralized control. Basically, you may have twenty tools in your pipeline, you may have five tools in your pipeline, whatever it is, but you have to go into each one of those tools, and usually these tools have some sort of batch scheduler, you know, they'll do time based stuff. And you have to go in and run the scheduler in each one of those tools, and the bigger that pipeline gets, the more tools you're using, the more difficult it is to create that red thread that runs all the way through. And so with the Stonebranch Universal Automation Center, what you're really getting is a single place to build these workflows and to connect the automation points between all of these different tools. And this is what Moritz is going to demo. He's going to show you an output of what that looks like. And then finally, I talked about it a little bit in the last slide, but I'll just mention it here again, one of the reasons that people are coming to us, and we hear it over and over again, is this ability to figure out what broke, right?
When you have that centralized view, when you have the log data, when you have a tool or a platform that gives you the real time alerts when something does go down, you're able to figure it out very quickly and that saves you a lot of time, but it also helps you be proactive versus reactive, where you only hear about it when, you know, your customer lets you know that something's broken. And I mentioned integration a little bit ago. None of this is possible without it. And where I think that Stonebranch has a huge competitive advantage, so to speak, in the market is with all of the integrations that we have along the data pipeline. So this is some of them here, but you've got to be able to tap into the cloud providers, the data sources, which most vendors out there do, but then some of the tools like Databricks and Snowflake, SaaS, these are tools that other vendors don't tap into easily. And we'll talk a little bit more about integration in just a little bit. Let's do a quick survey again. Let me pass it over to Lauren.
All right. I'm going to post our second poll. How often do you add or remove data sources or data tools along your average data pipeline? A, monthly, B, quarterly, C, annually, D, constantly, ad hoc based on line of business asks, or E, never. So we'll just take a few seconds to finish up this poll. Alright, let's see what the results are. So A, monthly, we got eleven percent. B, quarterly, we have seventeen percent. C, annually, we have thirteen percent. D, constantly, is sixteen, or sorry, fifty three percent. And E, never, is seven percent. How does that sound, Scott?
Exactly what I was expecting it to be in that, you know, the majority of people are frequently changing these tools, and that creates a lot of flux in your data pipeline, right? And so, I talked about integration before. One of the things that people come to Stonebranch for are these integrations. And just during this series, Stonebranch Online, last week, we announced the launch of our new Integration Hub.
And for those of you that are long time StoneBranch customers, you'd be familiar with our old marketplace. The marketplace was serviceable. We were able to deliver integrations that way, but it was hard to find them. It wasn't very modern tech. We went through a big exercise where we updated and refreshed the entire thing, and we added a whole bunch of new integrations to the Integration Hub. And so you can find the Integration Hub in the handout section. There's a link there. You can go explore the integrations that are currently published. But the big thing about the Integration Hub is that not only can you find integrations that we've built, you can also find integrations that your peers have built. So there's a place for our customers or our partners to build integrations, submit them to us, we vet them fully, and then we put them out in the marketplace. So it's kind of a place for people to share what they've built. The other thing about just integration in general with StoneBranch is that the hub is the place we put integrations once they're done. But the beauty of what we call our universal integration platform, it's actually in its second version, so universal integration platform v two, is that it's incredibly easy to build integrations, especially if there's APIs involved. So I told this anecdote last Thursday, I'll tell it again because I love the story. Last year we had a prospect come to us and they wanted to orchestrate the data pipeline, but one of the tools they needed was Tableau. And we know everybody uses Tableau or a lot, it's in the upper right hand quadrant, the magic quadrant for business intelligence tools. So we knew we needed to build it. We turned it around in two days, right? We had some developers go on it and we were able to build that in just two days and get it back to the customer or prospect at the time, now customer, and it worked out great, right? 
And that's the speed at which you can build integrations with our integration platform. And of course, that was Stonebranch building it, but we also have customers that are building integrations on their own because we have good documentation. We have, with the rollout of seven point one, which we'll have a session on later this month, additional development resources for our customers and our partners that can go out there and build these on their own. And another story that I love telling is I was speaking with a customer, it would have been two weeks ago now, and when they were first looking at our tool, what they realized when they saw the universal integration platform is that they could build these integrations very easily. So they have a developer in house that's just pumping these things out. And, you know, one of the ones that jumps out at me that we haven't built, and I've got to go back to the customer and ask her if we can put it on our Integration Hub, is DataStage for IBM, which is an ETL tool. And so you see our customers building these things, and they're templated. If you know Python or have somebody who knows Python on the team, it uses a modern language like that, and it's really easy to do. But even if you don't, come to us. More than likely we'll build it for you because our goal is to, we have this great t shirt, it's funny to bring it up, but orchestrate the universe. Like that is what we believe. We want to orchestrate everything, which means we have to tap into it. So reach out to us if there's something not in the list and we'll work with you to get it done amongst other things that we're working on right now. Okay. Speaking of integration and using this tool in general, we believe very much in this citizen automator approach. And so I wanted to flash this screen. I like it because it talks about collaboration. These are integrations we have with Microsoft Teams, Slack, and ServiceNow.
And the importance of these integrations is that they make it easier for end users to tap into automation. These are potentially end users that never want to go into the Universal Automation Center, right? Because it's just yet another tool. But you may have a data scientist or a business user, and I'm speaking from my own heart in marketing, that wants to update a dashboard or update the data pull from the data lake into the data warehouse or whatever predictive tool they're using or their machine learning tool, and they can go into Slack, for instance, and trigger that workflow to run, or maybe that workflow is regularly running and they can go into Slack or Teams and check to see if it did run, right? And if it didn't, why didn't it run? And if it didn't run, Slack or Teams can send a notification or open a ticket with ServiceNow. And Moritz will actually demonstrate this as part of his demo, but I wanted to really make a point about it because, as we talked about in our opening kickoff session, there is this shift from automation tools like Stonebranch being used only by sort of automation people in the IT ops team to automation being democratized and being used by developers, being used by business users, being used by data teams. And so one of the things that you really get with the Stonebranch application is this collaboration environment where all these teams can work in a single tool. And this is where, when I talked about operationalizing and scaling beyond open source, that's one of the things that people are looking for. They're looking for a tool that, you know, IT ops people can use, DevOps people can use, and data ops people or data people can use, and they don't all want to use a GUI or all want to use just an as code sort of model, which brings me to my next slide. So when we talk about data ops in particular, there's a very foundational way that we help data teams and developers do this.
We basically use different instances of what we call our universal controller, and you may have a development controller, you may have a test controller, you may have a production controller, and inbuilt inside of the Universal Automation Center is what we currently call bundle and promote. And so what you would do is just build the workflow, you bundle it, and you promote it to the next environment. It's all seamless. Now this can be done two ways. It can be done the way that I just talked about, which is the inbuilt way. So you might have a web GUI that you're using. It could be that your IT ops team, they don't want to go in and write code, right? They just want to drag and drop, kind of the way you'll see Moritz do in the demo. But you may also have developers that are working on this, and developers and even some data people, they don't want to drag and drop, they want to write the code, they want to use their IDE, they want to whatever. And so Stonebranch offers the as code or jobs as code version of this where you can upload the code to a third party repository. GitHub's one that is very popular. We have an integration for that on the Integration Hub, and they'll just move the data back and forth between environments that way. So when we talk about putting the ops in data ops, we're truly operationalizing this data ops methodology of moving across your sort of continuous improvement and continuous deployment. And you can do it via the web GUI that we have with the tool, you know, like most people would use it, or you can do it via code, which is something we're seeing asked for a lot more often today. So I'm going to turn it over to Moritz. Moritz, why don't you take us on a wonderful demo and show us and the audience everything we got.
Thanks a lot, Scott. Hello, everyone. This is Moritz, and today I'm going to show you how your data pipeline could be orchestrated using the Universal Automation Center.
I say it could be orchestrated because, if you recall one of Scott's earlier slides, the one where he said it looks kind of crazy, then you know that your data pipeline can consist of various tools. And there's quite a lot of tools on the market that are quite good at what they're doing and could already be part of your data pipeline. So, in order to create a relatable pipeline for you today, we picked a bunch of common tools. And in our Universal Automation Center workflow, I'm going to show you what the orchestration of these tools could look like. So we have a couple of sources here on the cloud and also on prem and applications. So in reality, data sources consist of all three types. Right? We have cloud data sources, on premise data sources, and also your applications that you're running that provide some of the data we need to move through our pipeline. For the ingestion part and the transformation part, we have Informatica in our workflow as well as Azure Data Orchestrator. For the data stores, we have Azure Blob, Snowflake, and also, what's not mentioned here, Amazon Web Services S3. For the delivery, we use Tableau, but it could be, in reality, any kind of BI tool that you use, which your data consumers or business users use in the end to view the data that is ingested and moved through a data pipeline in order to get some information out of it. And so I'm going to share my screen, and in a couple of seconds, you should be seeing a nice Tableau dashboard. Right. So, in the same way that your data pipeline can make use of various tools, there are also various people and groups involved in either the creation of the data pipeline, the maintenance of the data pipeline, or also just some people that want to get some information out of the data pipeline, and they are not even aware that there's a whole data pipeline in the background that is running, right?
So, to talk about two viewpoints of such players in your data pipeline, we created the dashboard here in Tableau. So this is something your data consumer, in this case, a business user, is seeing. And all he sees is the data that he currently has access to. And in our case, there's some data missing. So as you can see, there is data divided by regions. In reality, this is some sales data from a company that acts nationwide. And we are missing some sales data from our central region. So, the data consumer, in this case, only sees his dashboard and he knows, okay, the central region has not provided its data yet, so he has to wait. Right? The other point of view I would like to show you is, it could be a data architect or a data engineer. There are all kinds of data related roles that work on a data pipeline or are at least responsible for it in some way, either maintaining it or monitoring it or fixing it, right, when something goes wrong. And what you see here is our universal controller, which is our orchestrator, so to say, of the Universal Automation Center. And what I just refreshed here is a dashboard, which we created specifically for someone who is responsible for maintaining the orchestration of our data pipeline. Right? So, if I switch to our default home dashboard, you would see a lot of other stuff going on, because the controller can be used for various use cases. For someone who is only interested in the data pipeline, we mute all this noise that is in the system to concentrate only on what we want to see, in this case, all our jobs related to our pipeline. If we take a look at what we're seeing here, at first, we see the different types of jobs that make up our data pipeline. So we have a couple of platform related server jobs. We have email jobs. We have Tableau and Snowflake. So all the tools I talked about earlier are represented here. We also see a couple of SLA related things here.
So somebody may want to see when the data pipeline is late in some way or it doesn't meet SLAs that we have defined initially. So, in order to react quickly, we created parts of these dashboards to show exactly this data. We also see all kinds of alerts going on related to our data pipeline orchestration. So, what kind of emails went out, what kind of approvals went out. But without further ado, let's jump into the actual data pipeline orchestration. So, going to our workflow that is running and going to our workflow monitor, you will see the following. Maybe something important to distinguish before we move further. So, there are actually two things to a data pipeline that can be visualized. One is the data flow itself, right? So, data flows from your sources all the way to your presentation layer. And the other, which is what we are seeing here, is the control flow, or let's call it orchestration flow. Right? So, this is not one hundred percent exactly how the data flows through the systems, but it's how we orchestrate the systems involved to make use of the data. So, of course, there's a little bit of overlap between how the data flows and how we control the tools that manipulate the data and move the data. But especially towards the end, these two views are a bit apart, and I'm going to talk about that in a second. So our data pipeline is waiting. And what is it waiting for? So in order to show how event based the universal controller can be, we added a monitor at the beginning of our data pipeline orchestration, a monitor that is monitoring an SQS queue, which is the messaging queue on AWS, for incoming messages. It could also be a webhook that we hooked into, so the data pipeline is not running all the time or sitting in a waiting status. So, we only launch the execution of the pipeline when the data arrives. And this could be event based, right?
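The event-based start described here can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the universal controller implements its monitor: the standard library `queue.Queue` stands in for the SQS queue, and the names `wait_for_trigger` and `run_when_triggered` are hypothetical.

```python
import queue
from typing import Callable, Optional


def wait_for_trigger(messages: "queue.Queue[str]", timeout_s: float) -> Optional[str]:
    """Block until a message arrives, or return None when the timeout expires."""
    try:
        return messages.get(timeout=timeout_s)
    except queue.Empty:
        return None


def run_when_triggered(
    messages: "queue.Queue[str]",
    launch: Callable[[str], None],
    timeout_s: float = 1.0,
) -> bool:
    """Launch the pipeline only if a trigger message showed up in time."""
    message = wait_for_trigger(messages, timeout_s)
    if message is None:
        return False  # nothing arrived, so the pipeline stays idle
    launch(message)
    return True


if __name__ == "__main__":
    inbox: "queue.Queue[str]" = queue.Queue()  # stand-in for the monitored queue
    inbox.put("central region data is ready")
    run_when_triggered(inbox, lambda m: print("launching pipeline:", m))
```

The point of the design is that nothing downstream runs, or even waits on a schedule, until the message is picked up.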
So, we either monitor for arriving data, we hook into a messaging queue, or we have Kafka in the background that tells us when we should start. Right? To simulate the trigger of our data pipeline, I opened my Postman application here to post a new message to the queue that we are monitoring in our controller. As soon as I send a new message into the queue, we will see in our workflow monitor that the monitor goes to success immediately. So it picked up the new message and it knows, okay, the data from the central region may have been collected, or maybe the day or the week has closed in the business there. So we can now extract the data. We have two data sources here. One is SAP, so we get some fresh data out of our SAP application. We also run a Windows-based job to get some data out of a SQL Server database. Before we move the data to a central directory, we want to check that there is sufficient space. Right? To prevent any kind of errors, or files being only partly transferred due to missing space, we do a check, and we put this one on hold so the whole pipeline doesn't go through in a minute. As soon as I release this one, our flow continues, and you see that this space check is going to success, meaning there is sufficient space available in our target directory. So we can do some preparation of the directory. We remove some old data on a file system accessed by a Linux machine. And once everything is in order, we actually transfer the data. So in this case, this is also the data flow. Right? Because we have control of the data, we, in this case, move our SAP extract from the SAP application server or a remote machine to a central directory, which we previously cleaned. Right? On the one hand, we are doing this with our UDM, which is a Stonebranch proprietary file transfer solution for very secure and fast file transfer between agents. So we're using UDM for this.
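The space check mentioned before the transfer can be sketched in a few lines. This is a minimal illustration using Python's standard `shutil.disk_usage`, not the task used in the demo; the function name `has_sufficient_space`, the target directory, and the extract size are assumptions.

```python
import shutil
import tempfile


def has_sufficient_space(target_dir, required_bytes):
    """Return True if the file system holding target_dir has enough free space."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes


# Hypothetical target directory and extract size, for illustration only.
target_dir = tempfile.gettempdir()
extract_size_bytes = 1024

if has_sufficient_space(target_dir, extract_size_bytes):
    status = "ok to transfer"
else:
    status = "hold: not enough space in target directory"
print(status)
```

Running the check before the transfer means a shortage surfaces as one clear failed task instead of a partly written file.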
And for the Windows SQL Server database extract, we're using traditional SFTP, just to show that we can do both. Right? But in this case, we're actually moving the data ourselves. Before the next part of our pipeline starts, we have an approval task in between. We want to involve somebody to take a look and approve before it continues, because there could be some computationally heavy stuff going on with transforming and analyzing the data, and before all these resources are used, we want to make sure that it's worth it, right? So we have somebody approving the continuation. And there's a variety of ways we can get this approval. The one I wanted to show you uses the instant messengers that you may already have in use in your company. This could be Slack, which is quite common, or Microsoft Teams, which is even more common and which we also support. In the case of Slack, the approval could look like this: the message goes to somebody who is subscribed to this channel, or it could be a direct message to somebody who doesn't really know what's going on in this data pipeline orchestration, right? He has his responsibilities and he doesn't want to log into the controller to approve it. He wants to do it right from within his messenger, right? So we added the ability here to approve the continuation of this workflow, in this case, with just the click of a button. Alright. So we approved our orchestration, and we can already see that our approval task recognized the approval and continued with the subsequent tasks. What we're doing here in the background is uploading the data that we extracted from our sources to cloud storage. In the case of the SAP extract, we upload it to Azure Blob Storage. In the case of the SQL Server extract, we upload it to S3. Once this is done, we trigger our Informatica ETL tool to transform the data.
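The approval step described above behaves like a gate in front of the expensive tasks. The following is a rough sketch of that idea only; the classes and task names are hypothetical and do not reflect the UAC, Slack, or Microsoft Teams APIs.

```python
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


class ApprovalTask:
    """A gate that holds downstream tasks until somebody responds."""

    def __init__(self, name):
        self.name = name
        self.decision = Decision.PENDING

    def respond(self, approved):
        self.decision = Decision.APPROVED if approved else Decision.REJECTED


def run_downstream(approval, tasks):
    # Nothing runs until the gate has been explicitly approved.
    if approval.decision is not Decision.APPROVED:
        return []
    return [f"ran {task}" for task in tasks]


approval = ApprovalTask("approve-transformation")
downstream = ["upload_to_cloud_storage", "trigger_etl"]

before = run_downstream(approval, downstream)  # gate still pending
approval.respond(approved=True)                # e.g. a button click in a chat message
after = run_downstream(approval, downstream)
print(before, after)
```

Where the response comes from (a chat button, an email link, the controller itself) does not matter to the gate, which is why the same workflow can support several approval channels.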
There's something interesting here about the workflow that I wanted to give some details on. Informatica has its own integrations to Azure and S3, right? Because Informatica also has a workflow engine. There are subsequent tasks running in Informatica, and Informatica knows how to get its data from cloud storage in this case. Right? So we don't want to push the data all the way to Informatica for it to use. We want to make use of the existing integrations, in this case point-to-point integrations, to get the data from Azure to Informatica so the transformation can start. The same goes for Snowflake, which starts after the transformation has been done. We want to load the data into our data warehouse, and Snowflake also has integrations to Azure. So what is actually happening in the background, the data flow if you will, is like this: we upload the data to our cloud storage. Informatica pulls it from our cloud storage, so the arrow goes first in this direction, then back. It transforms the data and puts it back. And then Snowflake also takes it from Azure. So the data flow would look a little bit different. This is just one example of how data flow and control flow can differ. In our case, we only care about the orchestration. Right? So this is how we orchestrate those tools. Okay. While I was talking, the most important task has finished already. What happened in the background is that we transformed the data from our central sales region, we uploaded it to our data warehouse, which Tableau is actually pointing to to get its data, and we sent an email out to our consumer, our business user, saying, hey, by the way, the central data is now available. You can go back to your dashboard and hit the refresh button. As he refreshes the dashboard, he sees, okay, the new central data is actually available now. And he's happy. Right?
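The difference between control flow and data flow described above can be written down as two separate graphs. This is a simplified sketch with made-up task and system names, using Python's standard `graphlib` (Python 3.9 or later); it is not a representation of the actual workflow definition.

```python
from graphlib import TopologicalSorter

# Control flow: each task maps to the set of tasks it has to wait for.
control_flow = {
    "upload_to_cloud_storage": set(),
    "trigger_transformation": {"upload_to_cloud_storage"},
    "load_warehouse": {"trigger_transformation"},
    "notify_consumer": {"load_warehouse"},
}

# The orchestrator only needs an execution order that respects those dependencies.
execution_order = list(TopologicalSorter(control_flow).static_order())

# Data flow: where the data itself travels. The ETL tool pulls from cloud
# storage and writes back to it, so the data crosses that edge twice.
data_flow = [
    ("orchestrator", "cloud_storage"),
    ("cloud_storage", "etl_tool"),
    ("etl_tool", "cloud_storage"),
    ("cloud_storage", "data_warehouse"),
]

print(execution_order)
```

The control flow is a straight chain, while the data flow doubles back through cloud storage, which is the overlap-but-not-identity between the two views.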
And in order to save time, we also sent him, in the meantime, a very simple email to say, hey, by the way, the central region data was successfully ingested. Please check your dashboard again. It could also look different. This is just one way to notify him, so he doesn't have to call anyone to ask, hey, when is new data available? He gets notified immediately and everyone is happy. Something that happens on the side here shows how different teams can be part of the same orchestration flow. Maybe some data analysts also want to use the data that we ingested or transformed to train their machine learning algorithms, for example. So we also have integrations to Data Factory and to Databricks. The data analytics team, in this case, uses another tool to manipulate the data and to train their algorithms, and we just fed this data to their Databricks compute cluster. The analytics team also uses a different BI tool. Instead of Tableau, they use Power BI, and we deliberately built a failure into this task to show what alerting could look like. Because in this case, our business user is happy. Right? He has his data, but the other team didn't get their data yet because something went wrong. In this case, I manipulated the domain here, so it fails on purpose. And someone who's managing this data pipeline gets a notification. In this case, we attached a ServiceNow incident ticket to it. So, as you can see, just a couple of minutes ago, when it was started, we raised an alert in ServiceNow, and we even attached the output. Okay. This one I have to keep. So just to give you a quick look into what the operator sees from this failure. In this case, of course, this would be automatically assigned to someone who is responsible for Power BI. Right? So he would see immediately what's wrong. In this case, the domain that we are trying to reach does not exist. So he would then go directly to the task here and fix it.
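The failure handling shown here, where a failed task raises an incident with its output attached and is later fixed and rerun, can be sketched as follows. Everything in this example is hypothetical: the domain names, the `refresh_report` task, and the incident list are stand-ins and do not use the ServiceNow or Power BI APIs.

```python
def refresh_report(domain):
    """Stand-in for the BI refresh task; fails for an unknown domain."""
    if domain != "reports.example.com":
        raise ConnectionError(f"domain {domain!r} does not exist")
    return "refreshed"


def run_task(task, settings, incidents):
    """Run a task; on failure, record an incident with the error output attached."""
    try:
        return task(**settings)
    except Exception as error:
        incidents.append({"task": task.__name__, "output": str(error)})
        return None


incidents = []
settings = {"domain": "reports.exmaple.com"}  # deliberately wrong, as in the demo

first_attempt = run_task(refresh_report, settings, incidents)

# The operator reads the attached output, corrects the value, and reruns the task.
settings["domain"] = "reports.example.com"
second_attempt = run_task(refresh_report, settings, incidents)
print(incidents, second_attempt)
```

Attaching the task output to the incident is what lets the person assigned to it see the cause without opening the orchestrator first.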
In this case, we just updated this value. And then you would, right from this monitor, rerun the Power BI task, and you can see it's now running. This will take a couple of minutes to finish. Maybe it will finish in time so I can show you the end result. But moving back to our dashboard, we see that a lot of things have happened. In the meantime, we only have a couple more tasks running for the data pipeline to finish. Some notifications went out, so the data architect or engineer who is responsible knows what has happened. Everybody is on the same page, and there are no inefficiencies in communication, even though there are so many different actors involved. Right? Okay, this one keeps running for a bit longer. I would say I hand back over to you, Scott, to wrap it up. So I love that demo. It's one of my favorites because it really illustrates not only the data pipeline, but the use of our Teams and Slack integrations, Slack specifically in this one, the ServiceNow integration, and all the different communication methods. This is how teams are collaborating to not only build, but then manage these pipelines as they continue to run. There's a funny anecdote that I'll add in here real quick. At the end there, you hit rerun on the Power BI piece. I was speaking with a customer recently, and they like the rerun feature so much that they actually talk about rerun as if it's a person, like, oh, just go talk to Rerun, rerun it. It's like a proper noun for them now because it's such a cool feature that they enjoy, and it's like a person that they go interact with to make things rerun. So I got a kick out of that, and I thought I'd share it. But moving on to this data pipeline again, there are a couple of new integrations that I wanted to highlight. First is the one on the left here. We call it inter-cloud or multi-cloud data transfer between any cloud storage providers. This is a really cool integration if you haven't gotten it yet.
And what this allows you to do is move data, as the name implies, between any cloud storage providers. The traditional way of doing this, if you're on different cloud storage service providers, let's say you have Azure and AWS, is that you have to use an intermediary storage system. With this integration, that intermediary storage system disappears. You're just directly streaming that data between these providers, either time-based in batch or event-based in real time. And this one integration has not just the ones I list here, like Hadoop or SharePoint or Dropbox or AWS and those sorts of things. It's got a huge long list. Now, they're not all active yet, but it's growing, and it's one that I would recommend going and checking out. If you go to the integration hub in the handout section, I think it's one of the first ones you see on the home page there. The other one I wanted to mention is Kafka. We've gotten a request for Kafka a few times. For those of you not familiar with Kafka, it's really a streaming data tool. This is going to be ready by the end of October. For all intents and purposes, it's ready now. If there's anybody that's dying to try it, reach out to me using the email address you see at the end, and maybe we can get some beta testers on it or something like that, but we'll see if we can connect you and get you going. It's going to publish events to Kafka, it's going to be an event monitor for Kafka, and it's going to trigger and wait. The wait piece is critical, right? Because the way Kafka works, it's just streaming. It's always there waiting, just like the SQS monitor that Moritz used at the beginning of the workflow he showed. So that's a really cool one. And I want to make time for Q and A, so I'm going to burn through this slide as a summary. So I think the biggest points are these.
The Stonebranch orchestration tool is best for companies that have existing data tools that are not centralized on any single platform and that they want to keep, right? We're going to connect the dots between vendors and between tool types, and it helps people graduate from that open source scheduler to an enterprise-grade platform. People are coming to us because they want that single orchestration view across all these tools, and they want to be able to implement DataOps methodologies with the dev-test-prod lifecycle promotion. If you're constantly changing tools within your pipeline, think back to the survey we did, where fifty six percent of people were constantly changing, and most of everybody else was changing quarterly or monthly. This is a platform that allows you to either build integrations, get them from us, or have us build them very quickly. So you're not waiting months for somebody to maybe have an integration on the roadmap that might come out at some point. You will build it or we will build it, and it will be part of your pipeline before you know it. And then, as the added bonus for all of this, data pipeline orchestration is one part of a much larger platform. One group of people may be doing data pipeline orchestration, another may be doing cloud automation, and another may be doing traditional workload automation. All of these things can be done within the same platform and scale at an enterprise level. So I'll stop there. Let's go to the Q and A. Let me turn it over to Lauren and we can get some questions answered. All right. If anyone has any questions, just type them into the Q and A field and we'll go ahead and pass them over to Scott and Moritz. Okay. Our first question, I think this one's for you, Moritz. When you are extracting the information from each system, where is it stored? In the source system or in some repository in Stonebranch? That's a good question. So it depends on how you extract it. Right?
There are various ways of extracting data from your sources. It could be stored locally, with the application server, or in your cloud storage, from which you then, via APIs, move it to a common on-premise location, which maybe acts as a distribution hub for your data. So it really depends on where your data sources are located, but with UAC, there are various ways to integrate with them and to pull the data from them, either agentless or agent based. Okay. Our next one is, we use Azure Databricks but can only orchestrate within its ecosystem. Will your method allow us to orchestrate the tool in Azure as well as outside of Azure? Yeah. That's a good question. And we talked about it a little bit earlier. I'll reference back to a success story that I talked about on Thursday. The success story itself was with a giant food manufacturer in Europe, one of the top ten in the world. They were standardized using Databricks in the Azure world, but they also had Informatica, they also had Snowflake, they also had Databricks. They had all kinds of tools that they wanted to connect to, Tableau being one of them. What they wound up doing was using our tool to continue to run the Azure stuff, so we just tapped into Azure and ran stuff using the Azure tools. We just orchestrated them, and then we also tapped into the other tools the same way. So you can continue to use the Azure schedulers and the Google schedulers. Instead of having to go into those schedulers to do the scheduling, you can schedule it from our tool, which gives you that centralized place to do it all from. Okay. Our next question: if I already have UAC, is this included in what I have, and how do I get started with it? That's a great question. If you already have UAC, then you're in luck, because everything I have shown is core functionality of the product. Except maybe some of the integrations that we have published, especially recently.
However, the way of getting those integrations, meaning new job types and new task types, into your controller is by going to the integration hub, downloading them, and then importing them into your controller. Then you have all the different job types you need for your data pipeline. And that's a pretty easy thing to do. We've made it so that you just download a file. If you're on something earlier than 7.0, you have to add that file to your server, kind of the way we've always done it. But in 7.0 and above, from within the GUI, you just import it. You upload it into the controller directly, and you can start using it right from there. That's the benefit of the universal integration platform. Okay. We have another question. I saw your SAP integration in the demo. What version do you integrate with? So you want to take that one, Moritz? Sure. The most common versions that our customers currently use with our SAP connector are ECC based on NetWeaver. However, some of our customers are also slowly moving to the S/4HANA platform. Right? With our current SAP connector, we are certified for the NetWeaver platform, which also covers S/4HANA, so we are able to run SAP jobs on S/4HANA systems. So we are basically covering R/3, meaning ECC, and S/4HANA. And Fiori. Right? So Fiori is another one. What we're seeing within the SAP installed base, just because we're on it, is that there are a lot of people in that in-between stage where they're still running stuff on ECC while they're moving over to S/4HANA. And what our platform winds up being is this great in-between orchestrator. Because a lot of the orchestrators that are on ECC, or third-party ones like Redwood's tool that a lot of people have, only work on one or the other. Right? And so you can schedule both sides of that transition through Stonebranch's single application. Okay, well, that's going to conclude our session for today.
I want to thank you all for joining us today.
Data analytics is one of the fastest-growing areas of automation. The UAC is perfectly tuned to help you orchestrate the automated tasks and jobs required to architect, engineer, and manage complex data pipelines. So, if you’re limiting UAC to managing traditional jobs, you’re only using a fraction of the UAC’s power.
Watch this Stonebranch Online webinar recording to see the Stonebranch data pipeline orchestration solution in action and learn how to:
- Orchestrate complex data pipelines
- Operationalize and scale automation with workflows that connect IT Ops to data teams and developers
- Apply DataOps lifecycle methodologies (dev-test-prod) using the no-code UAC interface or data repositories like GitHub
- Find and use integrations to support the many disparate tools used throughout your data pipelines
Duration: 58:05