Data Engineering Fundamentals with Prefect Workflow
Data Engineering Fundamentals: Prefect Data Pipelines on Oracle Cloud Infrastructure (VM and Autonomous DB)
Description
Data engineering is the process of designing and building systems that let people collect and analyze raw data from multiple sources and formats. These systems empower people to find practical applications of the data, which businesses can use to thrive.
Companies of all sizes have huge amounts of disparate data to comb through to answer critical business questions. Data engineering is designed to support the process, making it possible for consumers of data, such as analysts, data scientists and executives, to reliably, quickly and securely inspect all of the data available.
About a decade ago, data analysis was limited to the structured data held in relational databases or ERP systems. Decisions were based on analysis of historical data, and ETL (Extract, Transform, Load) tools were used to feed data warehousing systems. In today's fast-changing world, however, information from non-relational sources also needs to be analyzed quickly.
So, in addition to database transactions, other sources of information such as CSV files, webhooks, HTTP and MQTT messages need to be handled appropriately.
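To make the ETL steps concrete, here is a minimal sketch that extracts rows from a CSV export, transforms them (casting types and dropping incomplete rows) and loads them into a table. It uses only the Python standard library; the in-memory SQLite database stands in for a data warehouse, and the `orders` table, its columns and the sample data are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw export, e.g. a CSV file produced by an ERP system.
RAW_CSV = """order_id,region,amount
1,EMEA,120.50
2,APAC,
3,EMEA,79.50
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: read the CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[int, str, float]]:
    """Transform: cast types and drop rows with a missing amount."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete row; skipped rather than loaded as NULL
        clean.append((int(row["order_id"]), row["region"], float(row["amount"])))
    return clean


def load(conn: sqlite3.Connection, rows: List[Tuple[int, str, float]]) -> int:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded = load(conn, transform(extract(RAW_CSV)))
    print(loaded, conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall())
```

In a classic ETL tool these three steps usually run as one batch job on a schedule; a data pipeline breaks them into separate tasks that can depend on, or be triggered by, other tasks.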
Furthermore, the ETL process has evolved into data pipelines. A data pipeline is a method in which raw data is ingested from various data sources and then moved to a data store, such as a data lake or data warehouse, for analysis. In a data pipeline, dependencies can be defined between tasks. Tasks can also be triggered by events, such as an order being booked or an issue being raised; webhooks are used for this.
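The idea of task dependencies can be sketched with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), which orders tasks so that each one runs only after the tasks it depends on. This is a plain-Python illustration of the concept, not how Prefect schedules tasks internally; the task names and functions are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

Results = Dict[str, Any]


# Hypothetical tasks: each receives the results of the tasks that already ran.
def extract(results: Results) -> List[int]:
    return [3, 1, 2]  # stand-in for rows pulled from a source system


def validate(results: Results) -> bool:
    return all(isinstance(x, int) for x in results["extract"])


def transform(results: Results) -> List[int]:
    return sorted(results["extract"])


def load(results: Results) -> str:
    if not results["validate"]:
        raise ValueError("refusing to load invalid data")
    return f"loaded {len(results['transform'])} rows"


TASKS: Dict[str, Callable[[Results], Any]] = {
    "extract": extract,
    "validate": validate,
    "transform": transform,
    "load": load,
}

# Each task maps to the set of tasks it depends on (its upstream tasks).
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
}


def run_pipeline(tasks: Dict[str, Callable[[Results], Any]],
                 dependencies: Dict[str, Set[str]]) -> Tuple[List[str], Results]:
    """Run every task once, in an order that respects its dependencies."""
    order = list(TopologicalSorter(dependencies).static_order())
    results: Results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results


if __name__ == "__main__":
    order, results = run_pipeline(TASKS, DEPENDENCIES)
    print(order)
    print(results["load"])
```

`validate` and `transform` depend only on `extract`, so either may run first; a workflow tool can run such independent tasks in parallel, while `load` waits for both.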
Prefect is one such recently developed data pipeline (workflow) tool, in which task dependencies can be defined not only statically but also based on events.
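The event-triggered case can be sketched in plain Python: a webhook delivers a JSON payload, and the receiver maps the event name to the task it should trigger. This is a hypothetical, framework-free illustration, not Prefect's webhook API; the event names `order_booked` and `issue_raised`, the payload shape and the handler functions are all assumptions.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: event name -> task to run when that event arrives.
TRIGGERS: Dict[str, Callable[[Dict[str, Any]], str]] = {}


def on_event(name: str):
    """Register the decorated function as the task triggered by `name`."""
    def register(fn: Callable[[Dict[str, Any]], str]):
        TRIGGERS[name] = fn
        return fn
    return register


@on_event("order_booked")
def load_order(data: Dict[str, Any]) -> str:
    # Stand-in for an extract/load task scoped to the newly booked order.
    return f"loaded order {data['order_id']}"


@on_event("issue_raised")
def refresh_issue_report(data: Dict[str, Any]) -> str:
    # Stand-in for a task that refreshes a report when an issue is raised.
    return f"refreshed report for issue {data['issue_id']}"


def handle_webhook(body: bytes) -> Optional[str]:
    """Parse a webhook body (assumed shape: {"event": ..., "data": {...}})
    and run the task registered for that event, if any."""
    message = json.loads(body)
    task = TRIGGERS.get(message.get("event"))
    if task is None:
        return None  # unknown events are ignored rather than failing the receiver
    return task(message.get("data", {}))


if __name__ == "__main__":
    print(handle_webhook(b'{"event": "order_booked", "data": {"order_id": 42}}'))
```

In a real deployment the body would arrive as an HTTP POST from the sending service (for example GitHub), and the triggered task would typically start a pipeline run instead of returning a string.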
This course uses Prefect Cloud, the cloud version of the Prefect workflow tool, which is invoked from a cloud-based virtual machine. Knowledge of Python and shell scripting is essential.
This course covers the following topics:
•Differences between Data Engineering, Data Analysis and Data Science.
•An overview of Data Science and Machine Learning.
•Extract, Transform, Load (ETL) vs. data pipelines.
•Provisioning an Oracle Linux virtual machine on Oracle Cloud Infrastructure.
•Setting up Prefect Cloud data pipelines and the client VM.
•Documentation reference - Prefect Workflow / Data pipelines.
•Hands-on demonstration of a Prefect flow with task dependencies.
•Building a Prefect data pipeline for Oracle Database extraction using Python.
•Introduction to webhooks and hands-on demonstration with Prefect & GitHub.
•Career Path for Data Engineers
Happy Learning!
What You Will Learn!
- What data engineering is and how it differs from data analysis and data science
- Provisioning a virtual machine and an Oracle Cloud Autonomous Database in Oracle Cloud Infrastructure
- Introduction to the data pipeline workflow tool Prefect
- Demonstration of the Prefect client with the Prefect dashboard and its integration
- Building and executing tasks using the Python Prefect libraries, task dependencies, and views in the Prefect dashboard
- Demonstration of webhooks with Prefect
Who Should Attend!
- Computer science students
- IT consultants