September 23, 2021 07:05 pm GMT

Product developer's guide to getting started with AI Part 2: Surfing through dataframes

TLDR

Surfing through data is a quick, simple way to bring the most relevant information to the top. To go from a searching experience to a surfing experience, all it takes is a solid procedure for filtering, sorting, and grouping.

Outline

  • Introduction
  • Before we begin
  • Filtering dataframes
  • Conditions
  • Grouping dataframes
  • Sorting dataframes
  • Conclusion

Introduction

In the past, people had to use the Dewey Decimal System to discover new information, answer questions, and perform research. It was a tedious, monotonous process: you picked a topic, then looked through each book in an unchanging order. Under the Dewey Decimal System, the results for one person looking for answers on "How to cook" are exactly the same as for another, even when they are looking for a different answer.

(Image source: Reddit)

Nowadays, when people have a question or want to learn, they surf the web to find an answer. You probably found and clicked this article because it looked like it had the answers you're looking for. Search engines crawl sites, collect tons of data, and use ranking techniques to decide which pieces of data go on top.

To surf through a dataframe, follow these 3 steps: filter, order, and group. Together they turn the experience from searching into surfing.

Before we begin

In this guide, we'll use the Titanic dataset in Google Colab. We'll import the dataset, look at the metadata to find the best filters, then slice the data down into groups so we can surf it cleanly. If you need a refresher on any of this, refer back to Part 1, which covers setting up the environment and viewing metadata.

Filtering Dataframes

Back on the Titanic, everyone is boarding the ship when you find a ticket on the floor. It doesn't show the name of its owner. You can make out the first 4 digits, 3734, but the rest are too dirty to read. Being the good Samaritan that you are, you tell the staff and begin your search for the owner.

Let's examine the Ticket and Name entries of the dataframe.

To select just these two columns, we can use loc or iloc as we learned in Part 1, or we can use the filter method. In Pandas, filter accepts an items parameter listing the columns to keep, so for the ticket we pass the Name and Ticket columns. If you are unfamiliar with the column names, refer back to Part 1 and use info to view the metadata.
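
A minimal sketch of the column selection, assuming a dataframe df with the Titanic columns. To keep it runnable on its own, the snippet builds a tiny hypothetical stand-in frame instead of reading the CSV; the file name in the comment is also an assumption.

```python
import pandas as pd

# Hypothetical stand-in rows so the snippet runs without the CSV. With the real
# data you would load it first, e.g. df = pd.read_csv("titanic.csv")
# (the file name is an assumption).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Ticket": ["111111", "222222", "333333"],
    "Age": [22.0, 38.0, 26.0],
})

# filter(items=...) keeps only the listed columns, in the order given.
name_ticket = df.filter(items=["Name", "Ticket"])

# The same selection with loc (all rows, columns by label)
# and iloc (all rows, columns by position).
via_loc = df.loc[:, ["Name", "Ticket"]]
via_iloc = df.iloc[:, [1, 2]]

print(name_ticket)
```

All three calls return the same two-column dataframe; filter is simply the most readable when you already know the column names.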

When matching strings, a great filter to use is str.contains. In this case, we check which ticket numbers contain 3734.
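
A sketch of the ticket lookup on a hypothetical stand-in frame: only the 3734 prefix and the owner's name come from the story, and the other rows and full ticket values are illustrative. Because we know the first four digits, str.startswith is a stricter alternative to str.contains, which also matches 3734 in the middle of a ticket.

```python
import pandas as pd

# Hypothetical stand-in rows; only the "3734" prefix and the owner's name
# come from the story.
df = pd.DataFrame({
    "Name": ["Passenger A", "Allen, Mr. William Henry", "Passenger C"],
    "Ticket": ["111111", "373450", "113734"],
})

# str.contains returns a boolean Series: True wherever "3734" appears.
# regex=False treats the pattern as a literal string, not a regular expression.
mask = df["Ticket"].str.contains("3734", regex=False)

# str.startswith only matches tickets that begin with the known digits.
prefix = df["Ticket"].str.startswith("3734")

owner = df.loc[prefix, ["Name", "Ticket"]]
print(owner)
```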

We see that the owner is a young man named William Henry Allen.

Great job! We found him, but he looks lost. We chat with William, and he has a request for us. He got separated at the checkpoint and needs help locating his business partners. He has forgotten their names but recalls that they're a group of 4 middle-aged men embarking in group Q.

We have a new problem. Unlike the ticket, this time all we know is that we're searching for middle-aged men in group Q. None of these values is unique, so a single filter won't be enough. We'll need to expand our search repertoire with conditionals.

Conditions

As in other programming languages, Pandas supports filtering with conditional operators. We will focus on 2 types of operators: relational and logical. In Pandas, the syntax for applying a condition to a dataframe is df[(conditions)].

The relational operators are > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to), == (equals), and != (not equals). They search through a dataframe by comparing every value in a column with a fixed value. To combine conditions, including conditions on different columns, we chain the relational comparisons together with the logical operators & (and), | (or), and ~ (not). Wrap each comparison in its own parentheses when chaining, because in Python & and | bind more tightly than the relational operators.
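
A short sketch of both operator types on hypothetical toy data whose column names mirror the Titanic dataset. The comments point out where parentheses are required.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Titanic dataset.
df = pd.DataFrame({
    "Age": [22.0, 40.0, 58.0, 35.0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "male", "male"],
})

# Relational operators compare every value in a column with a fixed value
# and return a boolean Series.
older = df["Age"] >= 32          # [False, True, True, True]
first_class = df["Pclass"] == 1  # [False, True, False, False]

# Logical operators combine boolean Series element by element. Each comparison
# needs its own parentheses: without them, df["Age"] >= 32 & df["Pclass"] == 1
# is parsed as a chained comparison and pandas raises a ValueError.
both = df[(df["Age"] >= 32) & (df["Pclass"] == 1)]    # and
either = df[(df["Age"] >= 32) | (df["Pclass"] == 1)]  # or
not_older = df[~(df["Age"] >= 32)]                    # not
```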

To find all middle-aged men in group Q, let's break it down into 2 parts. The 1st part is to capture everyone in group Q (the Embarked column holds each passenger's port code, and Q stands for Queenstown). We can use a single relational operator on the Embarked column: df["Embarked"] == "Q". For the males, we use df["Sex"] == "male". Alternatively, we can use ~ (not) to invert a comparison: ~(df["Sex"] == "female"). Finally, we chain the conditionals together using & (and) to get all males in group Q: df[(df["Embarked"] == "Q") & (df["Sex"] == "male")].
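
A sketch of the group Q filter, assuming df holds the Titanic data; the rows below are hypothetical stand-ins so the snippet runs on its own. It builds both forms of the male condition and checks that they select the same rows.

```python
import pandas as pd

# Hypothetical stand-in rows. In the Titanic data, Embarked holds the port
# code (C, Q or S) and Sex holds "male" or "female".
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Sex": ["male", "female", "male", "male"],
    "Embarked": ["Q", "Q", "S", "Q"],
    "Age": [40.0, 33.0, 45.0, None],  # None becomes NaN (missing age)
})

in_q = df["Embarked"] == "Q"
is_male = df["Sex"] == "male"
# Equivalent form: use ~ (not) to invert the comparison.
not_female = ~(df["Sex"] == "female")

q_males = df[in_q & is_male]
q_males_alt = df[in_q & not_female]
print(q_males)
```

The two forms agree here because Sex has no missing values. If it did, ~(df["Sex"] == "female") would also keep rows with a missing Sex, since a missing value compared with "female" evaluates to False and ~ flips it to True. Missing ages are dealt with in the next step.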

(Image: additional boolean operators for comparing dataframes; see info.)

Grouping Dataframes

With the data reduced, we can begin analyzing how many entries in our search space match what we're looking for. In this case, we're searching for the right age range.

Groupby is useful for counting how many values fall into each group, and from there we can begin slicing the data with head. Start by breaking the ages down into 3 groups: adolescent to adulthood, middle-aged, and senior. Then define the rule for each group. The ranges will be 0-31, 32-55, and 56+ respectively. We know that age must be defined, so we drop all rows with an empty age. In the interval labels that Pandas shows for the bins, [ marks an inclusive bound and ( an exclusive one.
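
A minimal sketch of the cleanup and binning, using hypothetical ages. It assumes pd.cut as the binning tool and right=False, which closes each bin on the left so the bins are [0, 32), [32, 56), and [56, inf); for whole-number ages that matches 0-31, 32-55, and 56+.

```python
import numpy as np
import pandas as pd

# Hypothetical ages standing in for the filtered group Q men; NaN marks a missing age.
df = pd.DataFrame({"Age": [19.0, 31.0, 32.0, 40.5, np.nan, 55.0, 70.5]})

# Age must be defined, so drop the rows where it is missing.
aged = df.dropna(subset=["Age"])

# Bin edges for 0-31, 32-55 and 56+. right=False closes each bin on the left,
# giving [0, 32), [32, 56) and [56, inf): "[" includes the edge, ")" excludes it.
bins = [0, 32, 56, np.inf]
age_group = pd.cut(aged["Age"], bins=bins, right=False)

# Number of ages in each bin, listed in bin order.
print(age_group.value_counts(sort=False))
```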

Create a range of bins, cut the ages into them (for example with pd.cut), and pass the result to groupby. Then use groupby to find the middle-aged rows and extract them.
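
Continuing the sketch with hypothetical rows: the bins get readable labels (the label strings are an assumption), and groupby on the binned ages gives both the count per bin and the rows of the middle-aged bin. observed=False keeps empty bins in the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with missing ages already dropped.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [19.0, 31.0, 32.0, 40.5, 55.0, 70.5],
})

# Bins for 0-31, 32-55 and 56+, with readable labels instead of
# interval notation (the label strings are illustrative).
bins = [0, 32, 56, np.inf]
labels = ["0-31", "32-55", "56+"]
age_group = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# Group rows by their bin. observed=False keeps empty bins in the counts.
grouped = df.groupby(age_group, observed=False)
counts = grouped.size()                   # rows per bin
middle_aged = grouped.get_group("32-55")  # rows in the middle-aged bin

print(counts)
print(middle_aged)
```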

(Image: the data cut down into chunks of 0-31, 32-55, and 56+.)

Sorting Dataframes

The 4 fellows we are looking for are in the middle of the dataframe. We sort it in order by age and slice positions 9 through 12, written iloc[9:13] since 9 + 4 = 13 and the end of a slice is excluded, to get only the middle-aged men in group Q.
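
A sketch of the sort-and-slice step on hypothetical rows. iloc slices by position and excludes the end, which is why iloc[9:13] returns exactly 4 rows. Hard-coded positions only hold for one specific dataframe, so the snippet also shows a between filter that selects the same rows by value.

```python
import pandas as pd

# Hypothetical group Q men with known ages, in arbitrary order.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [40.0, 19.0, 55.0, 31.0, 70.5, 32.0],
})

# Sort by age, then renumber the rows 0..n-1.
by_age = df.sort_values("Age").reset_index(drop=True)

# iloc slices by position and excludes the end, so iloc[start:start + n]
# returns exactly n rows (the article's slice is iloc[9:13], i.e. 9 + 4).
start, n = 2, 3
middle = by_age.iloc[start:start + n]

# A value-based filter selects the same rows without counting positions;
# between() includes both ends by default.
same = by_age[by_age["Age"].between(32, 55)]
print(middle)
```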

There we have it! We've reduced the search space from the 891 passengers in the dataset all the way down to 4 possible people.

(Image: Mr. Patrick Dooley, Mr. John Bourke, Mr. James Farrell, and Dr. William Edward Minahan)

Conclusion

William should be able to reunite with his partners now, just in time, as the foghorn blows and the ship begins to sail off. Stay tuned for what happens to the Titanic as it heads across the sea. We'll take a deeper look in Part 3, Terraforming Dataframes.

(Image source: dearworldlovehistory.com)


Original Link: https://dev.to/mage_ai/product-developers-guide-to-getting-started-with-ai-part-2-surfing-through-dataframes-c7d
