Big Data in Construction. Extract Data from PDF.
Convert PDF documents to Text and Graphics. Data Visualisation. Python OCR. Practical Step-by-Step Course for Beginners.
Description
This course is an introduction to #BigData and #MachineLearning with #Python for absolute beginners who have no background in programming.
In this course, we will go step by step through the main processes related to the topic "Big data and machine learning", using real data as an example. Since the material turned out to be voluminous, I divided the course into five parts.
⇉ This first part is devoted to collecting and extracting data from documents. You will learn how to extract data from drawings and any other documents in PDF format.
⇉ We will work on real data. We will have two datasets consisting of PDF files, which we will convert to text and to tabular form. We will then visualize the resulting data on the Kaggle platform using Python libraries, which will help us present it in graphical form.
⇉ During the course, we will install Python and libraries such as Pandas, seaborn, matplotlib and others. We will upload the resulting data to the Kaggle platform, visualize it there in a Jupyter Notebook and, at the end, upload our data to GitHub.
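To give a feel for the "text to tabular form" step described above, here is a minimal sketch that uses only the Python standard library. The sample text, the DD.MM.YYYY date format and the names `raw_text`, `date_pattern` and `extract_dates` are assumptions made for illustration; they are not taken from the course datasets, and the course itself may solve this differently.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical text, standing in for what a PDF-to-text step might return.
raw_text = """Drawing A-101

Issued: 12.03.2020
Revised: 27.03.2020

Drawing A-102
Issued: 05.04.2020
"""

# Split the text into lines and drop the blank ones.
lines = [line.strip() for line in raw_text.split("\n") if line.strip()]

# Assumed date format DD.MM.YYYY; adjust the pattern to the real documents.
date_pattern = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def extract_dates(text_lines):
    """Return (label, date) pairs for every line that contains a date."""
    rows = []
    for line in text_lines:
        match = date_pattern.search(line)
        if match:
            label = line.split(":")[0]
            found = datetime.strptime(match.group(0), "%d.%m.%Y").date()
            rows.append((label, found))
    return rows


rows = extract_dates(lines)

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["label", "date"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The same idea scales to many files: extract the text of each PDF, run the function over its lines, and collect all rows into one table.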
⚐ Topics covered in this course:
Lecture 2. Python. Choosing a Python IDE. Anaconda. Install Python.
How to convert a PDF to text?
Python or Anaconda?
What is the best Python IDE for beginners?
How do I install VS Code?
How do I install Python?
How to run Python in VS Code?
How do I choose the Python interpreter in VS Code?
Lecture 3. 1st Dataset. PDF files. Tika OCR. Extracting content and metadata.
How do I convert a PDF to TXT in Python?
How can I iterate over files in a given directory?
Install Apache Tika on Windows.
How to split a string into a list?
How do I remove blank strings from a list?
Lecture 4. Regular Expression in Python. Pattern matching in Python.
What is a regular expression, with an example?
How to match a regular expression in Python?
How can I debug a regular expression in Python?
What is the regular expression for date format?
How do you check if an array contains a regular expression?
Create a loop with a regular expression.
Lecture 5. Array and Function in Python. Add data to Array. Create function.
How do you add a string to an array?
How do you find the index of an element in a list?
How can I extract the date from a string?
How to declare and add items to an array in Python?
How do you write a function in Python?
Lecture 6. Pandas DataFrame. Two-dimensional size-mutable, tabular data structure.
Install pandas on Python
How do I create a pandas DataFrame?
Reduce number of columns in a pandas DataFrame
Combine column values into a list in a new column
How to convert array into DataFrame in Python?
How to change column names in pandas Dataframe?
Save a Dataframe as CSV table
Lecture 7. Kaggle. Jupyter Notebook. Create an account. Plotting with matplotlib and seaborn.
Upload a file to a Kaggle kernel
How do you use a Kaggle dataset?
Run Jupyter notebook using Kaggle kernels
Convert a CSV to dataframe in Python Jupyter Notebook
How to use the functions of Pandas Dataframe?
Change the date format of a column in pandas
How do I convert a string to datetime Objects in Python?
Calculate Difference Between Two Dates in Pandas Dataframe
How do I delete a column in pandas DataFrame?
How do I add columns in pandas DataFrame?
How do you visualize a dataset?
How do you plot a DataFrame in pandas?
Lecture 8. 2nd Dataset. Task. Data from PDF. Getting data from PDF drawings.
Independent Work Tasks
Learn to Code - on real data (16 PDF files to chart)
A brief overview of the data in the task
Lecture 9. 2nd Dataset. My solution.
This is my solution.
It may seem very simple and perhaps not the most effective.
Lecture 10. GitHub. GitHub Desktop. Store and manage code
What is GitHub and how do you use it?
What can I use GitHub for?
How do I upload files to GitHub?
Install GitHub Desktop
How to sync with a remote Git repository?
Adding a repository from your local computer to GitHub
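As a small taste of the pandas topics from Lectures 6 and 7, here is a hedged sketch of creating a DataFrame, converting strings to dates, calculating the difference between two dates, renaming and deleting columns, and exporting to CSV. The rows and the column names (`drawing`, `issued`, `revised`) are hypothetical and chosen only for this example; it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical rows, as they might look after extraction from PDF files.
records = [
    ["A-101", "12.03.2020", "27.03.2020"],
    ["A-102", "05.04.2020", "06.04.2020"],
]

# Create a DataFrame from a list of lists and name the columns.
df = pd.DataFrame(records, columns=["drawing", "issued", "revised"])

# Convert the date strings (assumed format DD.MM.YYYY) to datetime values.
df["issued"] = pd.to_datetime(df["issued"], format="%d.%m.%Y")
df["revised"] = pd.to_datetime(df["revised"], format="%d.%m.%Y")

# Add a new column with the difference between the two dates, in days.
df["days_between"] = (df["revised"] - df["issued"]).dt.days

# Rename one column and delete another.
df = df.rename(columns={"drawing": "drawing_no"})
df = df.drop(columns=["issued"])

# Export without the index; pass a file name to to_csv to write a file instead.
csv_text = df.to_csv(index=False)
print(df)
```

In a Kaggle notebook, a CSV saved this way can be loaded again with `pd.read_csv` and plotted with `df.plot` or seaborn.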
⇛ This is a practical course where we will analyse the process of data extraction step by step. You will go through all the steps, from installing Python to data visualization on the Kaggle platform.
When I got acquainted with the topic “Big Data and Machine Learning” myself, I often came across problems when installing software and errors while installing various libraries and tools for working with big data.
⇉ It took me a lot of time to find the right solutions and I would like to save this time for you.
⚐ To understand the topic, I had to search for answers to a large number of questions, and the answers I found were often off-target. In this course, you will find answers to basic questions that are related to the topic of Big Data and Machine Learning. Part 1: Extract Data from PDF.
What You Will Learn!
- How to convert a PDF to text?
- How do I install Python?
- How do you visualize a dataset?
- What is GitHub and how do you use it?
- What is regular expression?
- How do I install VS Code?
- How to run Python in VS Code?
- How do you use kaggle dataset?
- How to install pandas on Python?
- How do I convert a PDF to TXT in Python?
- What is the best Python IDE for beginners?
- How can I iterate over files in a given directory?
- How to Install Apache Tika on Windows?
- How to split a string into a list?
- How do I remove blank strings from a list?
- How do I choose the Python interpreter in VS Code?
- How to match regular expression in Python?
- How can I debug a regular expression in Python?
- What is the regular expression for date format?
- How do you check if an array contains a regular expression?
- How to create loop with regular expression?
- How do you add a string to an array?
- How do you find the index of an element in a list?
- How can I extract the date from a string?
- How do you write a function in Python?
- How do I create a pandas DataFrame?
- How to reduce number of columns in a pandas DataFrame?
- How to convert array into DataFrame in Python?
- How to change column names in pandas Dataframe?
- How do I save a Dataframe as CSV table?
- How do I upload a file to kaggle kernel?
- How to run Jupyter notebook using Kaggle kernels?
- How to convert a CSV to dataframe in Python Jupyter Notebook?
- How do I change the date format of a column in pandas?
- How do I convert a string to datetime Objects in Python?
- How to Calculate Difference Between Two Dates in Pandas Dataframe?
- How do I delete a column in pandas DataFrame?
- How do I add columns in pandas DataFrame?
- How do you plot a DataFrame in pandas?
- What can I use GitHub for?
- How do I upload files to GitHub?
- How to install GitHub Desktop?
- How to sync with a remote Git repository?
- How to add a repository from your local computer to GitHub?
Who Should Attend!
- Practical Step-by-Step Course for Beginners.
- Beginners who are interested in Big Data and Machine Learning using Python
- This course is for beginners so you do not need any special programming knowledge.
- This course can be taken by anyone (students, developers, managers) who is interested in learning big data.
- Designer
- Architect
- BIM Manager
- BIM Engineer
- BIM Specialist
- Professionals in the AEC industry
- Professionals in the construction industry