Applied ML: Intro to Analytics with Pandas and PySpark
Hands-on training to analyze and prepare data for Machine Learning using Pandas, PySpark and SQL
Description
Exploring and preparing data is a major step in the Machine Learning and Data Science lifecycle, as I mention in my other course, "Applied ML: The Big Picture". Because this step is so foundational, it's important to learn the tools at your disposal and to get a practical understanding of when to choose each one.
This course teaches hands-on techniques for several stages of data processing, exploration, and transformation, alongside visualization. It also exposes the learner to various scenarios, helping them differentiate and choose between the tools in real-world projects.
For each tool, we will cover a variety of techniques and their specific purposes in data analysis and manipulation, using real datasets. Those who wish to learn by practice will need a system with a Python development environment.
For those who already have practical experience, this course can serve as a refresher on the various tools and techniques, helping you make sure you are using the right combination for the problem at hand. It can likewise support interview preparation by refreshing your memory of best data practices for ML and Data Science in Python.
What You Will Learn!
- Get hands-on experience with data preprocessing
- Understand the practical differences between tools such as Pandas and PySpark
- Understand when to use Pandas vs PySpark
- Understand the exploration steps required for Data Science and Machine Learning
Who Should Attend!
- Developers and Analysts curious about the various data manipulation, transformation and analytics tools available in Python