Pandas Tutorial


Pandas Tutorial - Learn how to manage and analyze data using Python

Pandas is one of the most popular and powerful Python tools for the entire data analysis life cycle: gathering, preparing, analyzing, and presenting data.
With Pandas you can gather data from flat files such as CSV, text, Excel, and JSON.
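As a rough sketch of what reading flat files can look like, the example below loads a small CSV and a small JSON document. The data and the variable names (csv_text, json_text, df_csv, df_json) are made up for illustration, and io.StringIO stands in for a real file on disk so the example runs anywhere.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk (an assumption for illustration).
csv_text = "name,score\nAna,90\nBen,85\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON is read the same way through read_json.
json_text = '[{"name": "Ana", "score": 90}, {"name": "Ben", "score": 85}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

With a real file you would pass its path instead, for example pd.read_csv("my_data.csv"), where the file name is hypothetical. Excel files are read with pd.read_excel, which needs an extra engine such as openpyxl installed.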

You can also read in data from popular databases such as Microsoft SQL Server,
SQLite, MySQL, and Oracle.
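Here is a minimal sketch of reading from a database, using an in-memory SQLite database so there is nothing to install or configure. The sales table and its contents are invented for this example. Other databases generally need their own driver and connection setup, which is not shown here.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory SQLite database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("West", 250)],
)
conn.commit()

# Run a query and get the result back as a dataframe.
df_sales = pd.read_sql_query("SELECT region, amount FROM sales", conn)
conn.close()

print(df_sales)
```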

Munging, or cleaning, data is a breeze in Pandas. It has functions to clean up strings, aggregate data, handle missing values, and compute descriptive statistics that give you further insight into your data.
On top of all this, you can present your data using tables and visually impressive charts.
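To give a feel for these cleaning steps, here is a small sketch on made-up data. The column names (city, sales) and the choice to fill missing values with 0 are assumptions for illustration only. The right way to handle missing values depends on your data.

```python
import pandas as pd

# Made-up messy data: inconsistent spacing and casing, plus one missing value.
raw = pd.DataFrame({
    "city": ["  boston", "Boston ", "denver", "DENVER"],
    "sales": [10.0, None, 30.0, 20.0],
})

# Clean up strings: trim whitespace and normalize the casing.
raw["city"] = raw["city"].str.strip().str.title()

# Handle missing values: here we simply replace them with 0.
raw["sales"] = raw["sales"].fillna(0)

# Aggregate: total sales per city.
totals = raw.groupby("city")["sales"].sum()

# Descriptive statistics (count, mean, min, max, and so on) for one column.
summary = raw["sales"].describe()

print(totals)
print(summary)
```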

In the following sections we will take you through all the steps to get started coding in Python and Pandas. Don't worry if you have no programming experience; we have written this guide assuming you are starting from ground zero.

Let's Get Started!

Open a Jupyter Notebook and import the Pandas library as shown in the steps below. If you have never used or even heard of the Jupyter Notebook, please go to my Jupyter/IPython Notebook tutorial before continuing.

Click the Windows button and start typing command prompt as shown below.

open cmd prompt

Please click on the icon as shown below to start the Command Prompt.

run cmd prompt

Using the Command Prompt, type jupyter notebook as shown below to start the Jupyter application. Note that you need to have Python and all associated Jupyter libraries installed for this to work. Again, if you are confused, please go to my Jupyter/IPython Notebook tutorial before continuing.

start jupyter

If all goes well, your browser will open the Jupyter application and you can start a new notebook (starting a new notebook was covered in a previous tutorial). If you have made it this far, you are already way ahead of the game. Good job! Import the Pandas library as shown below.

import pandas
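In a notebook cell, the import can look like the sketch below. The pd alias is a widely used convention rather than something this tutorial requires, and printing the version is just a quick way to confirm the library loaded.

```python
# Import the library under the conventional short alias "pd".
import pandas as pd

# If this prints a version number, Pandas is installed and ready to use.
print(pd.__version__)
```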


Now that the Pandas library is loaded, let's talk about the basic building block of Pandas. The dataframe is the basic data structure in Pandas, and most likely the one you will use most often. As a Data Scientist, your main goal is to get your data into a Pandas dataframe. Once your data is in a dataframe, you can make use of the vast features the library has to offer. Just as you need to get your data into an Excel file before you can do anything with it in Excel, you need to get your data into a dataframe before you can work with it in Pandas.

The code below shows you how to create a very basic dataframe consisting of two columns using some made-up data. Congrats, you have just created your very first dataframe!

create dataframe
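The original screenshot is not reproduced here, so the following is a minimal sketch of a two-column dataframe. The column names (name, age) and the values are made up and may not match the original example.

```python
import pandas as pd

# Made-up data: each dictionary key becomes a column, each list holds that column's values.
data = {
    "name": ["Ana", "Ben", "Cal"],
    "age": [31, 25, 47],
}

# Build the dataframe from the dictionary.
df = pd.DataFrame(data)

print(df)        # shows the table with a default row index starting at 0
print(df.shape)  # (number of rows, number of columns)
```

Building a dataframe from a dictionary of lists is only one option. Pandas can also build one from a list of rows or from the file and database readers mentioned earlier.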
Where Do I Go From Here?

Enroll in the Free Pandas Tutorial Email Course at the top of this page

There is a lot more to learn about Pandas than just the dataframe. That is why I created a 13-part tutorial series that will show you everything you need to become a ninja with Pandas. All you need to do is go to the top of this page and sign up for the course to get started today.