From the course: Azure Data Engineer Associate (DP-203) Cert Prep: 4 Monitor and Optimize Data Storage and Data Processing

Schedule and monitor pipeline tests

- [Instructor] Schedule and monitor pipeline tests. It's really important to know how to schedule data engineering jobs, because so many resources are associated with them. And there are some critical questions to answer when scheduling recurring jobs. For example, do you need the largest cluster available, or can you use a smaller one? Will the report be generated in time for your meeting? These are all important considerations. So let's go ahead and take a look at Databricks here and see how you can schedule a recurring job from a notebook. I would just go through here and add this notebook to a schedule and then name it something. In this case, I will name this job "Monthly." And I could toggle between either Manual, where I would just manually run it once, or Scheduled, and I could again go through here and choose when I would want it to run. So every month, and I could specify the particular time that I would want this to run. Now, next up under Cluster, I can also add a brand new cluster here as well. And so in this particular scenario, if I wanted to spin up a cluster with a very large amount of RAM, 36 cores, and a specific version of Spark and Scala, this would be one of the ways that I could do it. Another alternative would be to tell it to run on an existing cluster that's already running. Let's say you have a jobs cluster that's specifically designated for running jobs. Maybe it's actually a very low-powered cluster, because many of your jobs are just reporting jobs and they don't need really huge machines. This is a great way to save money. And then you could go through here and add some parameters. For example, the department would be sales. This is the monthly report, and you could add that in. And then finally, you could go through here and send the alerts to a particular email address.
In this case, I'd swap out example@example.com and maybe add the Start and Success alerts, putting those all in. And then when you go through and click Create, you can see there, here we go: it's scheduled to run monthly, on the second day of the month, and the job was created successfully. And then, if you want, you can even test it out for the first time. After that, you would toggle back and forth between the job and the actual metrics of the system to make sure that the job has enough resources. Looking at the monitoring for the cluster as well is a great way to make sure that you're really getting the best performance possible from your jobs. Finally, if you don't need this job anymore, you can go through here and either pause it or delete it. In this case, I'm going to go ahead and delete that job.
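
The same choices made in the UI walkthrough (job name, monthly schedule, cluster, parameters, email alerts, pausing) can also be written down as a job definition. What follows is a minimal sketch, not the instructor's method: it only builds a Python dictionary and sends nothing to Databricks. The field names are modeled on the Databricks Jobs API 2.1 create-job request and should be checked against the current API reference before use. The notebook path, the cluster ID, the 06:00 run time, the UTC time zone, and the parameter key names are all hypothetical values chosen for illustration.

```python
import json


def monthly_cron(day_of_month, hour, minute=0):
    """Return a Quartz cron expression that fires once a month."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour or minute out of range")
    # Quartz field order: seconds minutes hours day-of-month month day-of-week
    return f"0 {minute} {hour} {day_of_month} * ?"


def build_monthly_job(notebook_path, email, existing_cluster_id=None, new_cluster=None):
    """Build a job definition dict; nothing is sent to Databricks here."""
    # A task runs on an existing cluster or on a new job cluster, not both.
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("pass exactly one of existing_cluster_id or new_cluster")

    task = {
        "task_key": "monthly_report",
        "notebook_task": {
            "notebook_path": notebook_path,
            # Parameter key names are assumptions; the values come from the demo.
            "base_parameters": {"department": "sales", "report": "monthly"},
        },
    }
    if existing_cluster_id is not None:
        task["existing_cluster_id"] = existing_cluster_id
    else:
        task["new_cluster"] = new_cluster

    return {
        "name": "Monthly",
        "tasks": [task],
        "schedule": {
            # Second day of every month; the 06:00 UTC time is an assumption.
            "quartz_cron_expression": monthly_cron(day_of_month=2, hour=6),
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        # Alert on Start and Success, as selected in the walkthrough.
        "email_notifications": {"on_start": [email], "on_success": [email]},
    }


def paused(job):
    """Return a copy of the job definition with its schedule paused."""
    schedule = dict(job["schedule"], pause_status="PAUSED")
    return dict(job, schedule=schedule)


job = build_monthly_job(
    notebook_path="/Reports/monthly_sales",  # hypothetical notebook path
    email="example@example.com",
    existing_cluster_id="hypothetical-jobs-cluster-id",  # hypothetical cluster
)
print(json.dumps(job, indent=2))
```

Passing `existing_cluster_id` corresponds to reusing a small, already-running jobs cluster, while passing a `new_cluster` dictionary instead corresponds to spinning up a dedicated cluster for the run. The `paused` helper mirrors the pause option mentioned at the end of the walkthrough without deleting the job.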
