June 17, 2021 11:56 pm GMT

Pandas Concepts: Introduction

My Assumptions before you continue...

  • You know the basic data structures (list, dict, tuple, and set) in Python.
  • You are familiar with NumPy basics. If not, check out my Colab notebook where I have explained it from the ground up :)
  • You have already set up your data science environment.

Why Pandas if we already have NumPy?

  • NumPy has the following limitations:
    • No support for column names.
    • All elements must share the same data type.
    • No pre-built methods for common analysis tasks.
  • Pandas can handle large amounts of data with ease!
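To make these limitations concrete, here is a minimal sketch that puts the same small table into a NumPy array and into a pandas Dataframe. The column names and values are hand-typed for illustration; they are not rows from the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype for every element: mixing strings
# and integers forces everything to a common type (here, strings).
arr = np.array([["Bulbasaur", 45], ["Charmander", 39], ["Squirtle", 44]])
print(arr.dtype.kind)  # 'U' -> every element is now a Unicode string

# A Dataframe keeps named columns, and each column has its own dtype.
df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle"],
    "hp": [45, 39, 44],
})
print(df.dtypes)

# Pre-built analysis methods work on a column selected by name.
print(df["hp"].mean())
```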

The pandas library overcomes these limitations of NumPy, and it is sometimes called the Swiss army knife of data analysis!

And don't worry about losing the power of vectorization: pandas is built on top of NumPy, so internally it makes extensive use of NumPy code!
By the way, calling pandas a Swiss army knife is no exaggeration of its capabilities!
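Here is a small sketch of what "built on top of NumPy" looks like in practice, using a hypothetical Dataframe with attack and defense columns: arithmetic on whole columns runs without an explicit Python loop, and `to_numpy()` hands back a column's data as a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical stats, typed in by hand for illustration.
df = pd.DataFrame({
    "attack": [49, 52, 48],
    "defense": [49, 43, 65],
})

# Vectorized column arithmetic: no explicit loop over rows.
df["total"] = df["attack"] + df["defense"]

# The column's data can be pulled out as a NumPy array.
totals = df["total"].to_numpy()
print(isinstance(totals, np.ndarray))  # True
print(totals)                          # [ 98  95 113]
```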

Enough of the theory; let's work with real data, otherwise you might leave this blog! So fire up Jupyter Notebook/JupyterLab or whichever IDE you use for data science, and let's start!

Lastly, we are going to use a dataset of Pokémon stats to keep things fun and interesting. You can find it on Kaggle.

Introducing the Pandas Dataframe

It is the primary data structure provided by pandas. Formally, a Dataframe is a two-dimensional, labeled, tabular data structure; in a way, it is a 2D NumPy array on steroids.
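A minimal sketch of what "labeled" means: both axes carry labels (`index` for rows, `columns` for columns), and `.loc` selects data by those labels rather than by integer position. Using Pokémon names as the row index is an illustrative choice, not how the Kaggle file is laid out.

```python
import pandas as pd

df = pd.DataFrame(
    {"type": ["Grass", "Fire", "Water"], "hp": [45, 39, 44]},
    index=["Bulbasaur", "Charmander", "Squirtle"],
)

print(df.ndim)           # 2 -> two-dimensional
print(list(df.index))    # row labels
print(list(df.columns))  # column labels

# Label-based lookup: row "Charmander", column "hp".
print(df.loc["Charmander", "hp"])  # 39
```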

Here's how you read a CSV file as a pandas Dataframe by passing the file path to the read_csv() function.

# it's a convention to import pandas as 'pd'
import pandas as pd

# read a dataset on pokemon stats
pokemon_stats = pd.read_csv("pokemon_stats.csv")
pokemon_stats  # since it's a Jupyter notebook, no need for print()

The file path can be relative or absolute.
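To see both kinds of path in action without downloading anything, here is a sketch that writes a tiny CSV to a temporary directory and reads it back once with an absolute path and once with a path relative to the current working directory. The file name `tiny_pokemon.csv` and its contents are made up for illustration.

```python
import os
import tempfile

import pandas as pd

csv_text = "name,hp\nBulbasaur,45\nCharmander,39\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    abs_path = os.path.join(tmp_dir, "tiny_pokemon.csv")
    with open(abs_path, "w") as f:
        f.write(csv_text)

    # Absolute path: works regardless of the current working directory.
    df_abs = pd.read_csv(abs_path)

    # Relative path: resolved against the current working directory.
    old_cwd = os.getcwd()
    os.chdir(tmp_dir)
    try:
        df_rel = pd.read_csv("tiny_pokemon.csv")
    finally:
        os.chdir(old_cwd)  # restore so the temp dir can be cleaned up

print(df_abs.equals(df_rel))  # True
```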

Displaying pokemon_stats renders it as a table in your Jupyter notebook. Not all 1028 rows and 51 columns are shown, so that the table doesn't occupy your entire screen!
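Rather than counting rows in the truncated view, you can check the dimensions with the `shape` attribute, which returns a `(rows, columns)` tuple. How much of a Dataframe gets rendered is controlled by pandas display options such as `display.max_rows` and `display.max_columns`. The sketch below uses a stand-in Dataframe of zeros with the same dimensions, not the actual dataset.

```python
import numpy as np
import pandas as pd

# Stand-in data with the dimensions mentioned above (1028 x 51).
df = pd.DataFrame(np.zeros((1028, 51)))

print(df.shape)  # (1028, 51)

# Current limits pandas uses when rendering a Dataframe.
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))

# Temporarily raise the row limit, only inside the `with` block.
with pd.option_context("display.max_rows", 100):
    print(pd.get_option("display.max_rows"))  # 100
```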

There's also a read_excel() function if your dataset is an Excel spreadsheet (reading .xlsx files requires the optional openpyxl package).

import pandas as pd

# read an Excel file as a dataframe
df = pd.read_excel("path/to/my_dataset.xlsx")

Whenever I say df, I am referring to a Dataframe.

Displaying first and last few rows

It is common practice to take a quick glance at a few rows from the beginning or the end.
For this, we have the df.head(n=5) and df.tail(n=5) methods, where n is the number of rows to display; its default value is 5.

# display the first four rows
pokemon_stats.head(4)


# display the last four rows
pokemon_stats.tail(4)

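If you don't have the dataset handy, here is a self-contained sketch of the same two methods on a hypothetical 10-row Dataframe. It also shows a documented pandas behavior not covered above: a negative n makes head() return every row except the last |n|.

```python
import pandas as pd

# Hypothetical Dataframe with ids 0..9.
df = pd.DataFrame({"id": range(10)})

print(len(df.head()))              # 5 -> default n=5
print(df.head(3)["id"].tolist())   # [0, 1, 2]
print(df.tail(3)["id"].tolist())   # [7, 8, 9]

# Negative n: head(-8) drops the last 8 rows.
print(df.head(-8)["id"].tolist())  # [0, 1]
```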

We'll go into the anatomy of a Dataframe in the next post, where you will see that it's more than just a table. Until then, enjoy data science!

Bonus: Where Pandas shines

Pandas is useful whenever the data is structured: data stored in CSV files, Excel files, database tables, or anywhere else there is a notion of rows and columns.
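Since database tables are on that list, here is a sketch that loads the result of a SQL query into a Dataframe with `pd.read_sql_query()`, using Python's built-in `sqlite3` module and an in-memory database. The `pokemon` table and its rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with one small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, hp INTEGER)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?)",
    [("Bulbasaur", 45), ("Charmander", 39), ("Squirtle", 44)],
)
conn.commit()

# Each row of the query result becomes a row of the Dataframe,
# and each selected column becomes a named column.
df = pd.read_sql_query("SELECT name, hp FROM pokemon ORDER BY hp", conn)
conn.close()

print(df)
```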

There was a bit of jargon in these explanations, but don't worry: we will come back to these concepts again and again in forthcoming posts :)


Original Link: https://dev.to/kathanvakharia/pandas-concepts-introduction-4n64
