March 13, 2013 10:09 pm GMT

Relational Databases for Dummies

Web apps can be split into two major components: a front-end that displays and collects information, and a back-end for storing the information. In this article, I’ll demonstrate what a relational database is, and how to properly design your database to store your app’s information.

A database stores data in an organized way so that it can be searched and retrieved later. It contains one or more tables. A table is much like a spreadsheet, in that it’s made up of rows and columns. Every row has the same columns, and each row holds one record, with one value in each column. If it helps, think of your tables in the same way that you would a table in Excel.

Fig. 1

Data can be inserted into, retrieved from, updated in, and deleted from a table. The words “create” and “read” are generally used instead of “insert” and “retrieve”, so, collectively, these four operations are affectionately abbreviated as CRUD (create, read, update, delete).
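To make the four CRUD operations concrete, here’s a minimal sketch using Python’s built-in sqlite3 module and a throwaway in-memory database. The table name `people`, its two columns and the updated name “GE Software Team” are hypothetical, chosen only for illustration.

```python
import sqlite3

# A throwaway in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (username TEXT, full_name TEXT)")

# Create: insert a new row.
conn.execute("INSERT INTO people VALUES (?, ?)", ("GEsoftware", "GE Software"))

# Read: retrieve the row back.
created_name = conn.execute(
    "SELECT full_name FROM people WHERE username = ?", ("GEsoftware",)
).fetchone()[0]

# Update: change a value in the existing row (the new name is made up).
conn.execute(
    "UPDATE people SET full_name = ? WHERE username = ?",
    ("GE Software Team", "GEsoftware"),
)
updated_name = conn.execute(
    "SELECT full_name FROM people WHERE username = ?", ("GEsoftware",)
).fetchone()[0]

# Delete: remove the row again.
conn.execute("DELETE FROM people WHERE username = ?", ("GEsoftware",))
remaining = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]

print(created_name, "->", updated_name, "| rows left:", remaining)
# GE Software -> GE Software Team | rows left: 0
```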

A relational database is a type of database that organizes data into tables, and links them, based on defined relationships. These relationships enable you to retrieve and combine data from one or more tables with a single query.
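As a rough illustration of combining data from two related tables with a single query, this sketch joins two small tables on a shared username column. The table layouts here are assumptions for the example; the JOIN itself is standard SQL, and SQLite supports it. Only users with a matching tweet show up in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (username TEXT, full_name TEXT);
    CREATE TABLE tweets (username TEXT, created_at TEXT);
    INSERT INTO users  VALUES ('SSWUGorg', 'SSWUG.ORG'),
                              ('NimbusData', 'Nimbus Data Systems');
    INSERT INTO tweets VALUES ('SSWUGorg', 'Mon, 11 Feb 2013 22:15:37 +0000');
""")

# One query pulls columns from both tables, matching rows on username.
rows = conn.execute("""
    SELECT users.full_name, tweets.created_at
    FROM tweets
    JOIN users ON users.username = tweets.username
""").fetchall()
print(rows)  # [('SSWUG.ORG', 'Mon, 11 Feb 2013 22:15:37 +0000')]
```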

But that was just a bunch of words. To truly understand a relational database, you need to make one yourself. Let’s get started by getting some real data with which we can work.

Step 1: Get Some Data

In the spirit of the Nettuts+ Twitter clone articles (PHP, Ruby on Rails, Django), let’s get some Twitter data. I searched Twitter for “#databases” and took the following sample of ten tweets:

Table 1

full_name | username | text | created_at | following_username
Boris Hadjur | _DreamLead | What do you think about #emailing #campaigns #traffic in #USA? Is it a good market nowadays? do you have #databases? | Tue, 12 Feb 2013 08:43:09 +0000 | Scootmedia, MetiersInternet
Gunnar Svalander | GunnarSvalander | Bill Gates Talks Databases, Free Software on Reddit https://t.co/ShX4hZlA #billgates #databases | Tue, 12 Feb 2013 07:31:06 +0000 | klout, zillow
GE Software | GEsoftware | RT @KirkDBorne: Readings in #Databases: excellent reading list, many categories: https://t.co/S6RBUNxq via @rxin Fascinating. | Tue, 12 Feb 2013 07:30:24 +0000 | DayJobDoc, byosko
Adrian Burch | adrianburch | RT @tisakovich: @NimbusData at the @Barclays Big Data conference in San Francisco today, talking #virtualization, #databases, and #flash memory. | Tue, 12 Feb 2013 06:58:22 +0000 | CindyCrawford, Arjantim
Andy Ryder | AndyRyder5 | https://t.co/D3KOJIvF article about Madden 2013 using AI to prodict the super bowl #databases #bus311 | Tue, 12 Feb 2013 05:29:41 +0000 | MichaelDell, Yahoo
Andy Ryder | AndyRyder5 | https://t.co/rBhBXjma an article about privacy settings and facebook #databases #bus311 | Tue, 12 Feb 2013 05:24:17 +0000 | MichaelDell, Yahoo
Brett Englebert | Brett_Englebert | #BUS311 University of Minnesota’s NCFPD is creating #databases to prevent “food fraud.” https://t.co/0LsAbKqJ | Tue, 12 Feb 2013 01:49:19 +0000 | RealSkipBayless, stephenasmith
Brett Englebert | Brett_Englebert | #BUS311 companies might be protecting their production #databases, but what about their backup files? https://t.co/okJjV3Bm | Tue, 12 Feb 2013 01:31:52 +0000 | RealSkipBayless, stephenasmith
Nimbus Data Systems | NimbusData | @NimbusData CEO @tisakovich @BarclaysOnline Big Data conference in San Francisco today, talking #virtualization, #databases,& #flash memory | Mon, 11 Feb 2013 23:15:05 +0000 | dellock6, rohitkilam
SSWUG.ORG | SSWUGorg | Don’t forget to sign up for our FREE expo this Friday: #Databases, #BI, and #Sharepoint: What You Need to Know! https://t.co/Ijrqrz29 | Mon, 11 Feb 2013 22:15:37 +0000 | drsql, steam_games

Here’s what each column name means:


full_name: The user’s full name
username: The Twitter handle
text: The tweet itself
created_at: The timestamp of the tweet
following_username: A list of people this user follows, separated by commas. For brevity, I limited each list to two people.
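For reference, Table 1 could be stored as a single SQLite table along these lines. This is a sketch with a few assumptions: the table name `tweets_flat` is hypothetical, every column is plain TEXT, and only two of the ten rows are inserted to keep it short. Note that following_username holds a comma-separated string, exactly as in Table 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holding everything, mirroring the five columns of Table 1.
conn.execute("""
    CREATE TABLE tweets_flat (
        full_name          TEXT,
        username           TEXT,
        text               TEXT,
        created_at         TEXT,
        following_username TEXT  -- comma-separated list of handles
    )
""")
conn.executemany(
    "INSERT INTO tweets_flat VALUES (?, ?, ?, ?, ?)",
    [
        ("Andy Ryder", "AndyRyder5",
         "https://t.co/D3KOJIvF article about Madden 2013 using AI to prodict the super bowl #databases #bus311",
         "Tue, 12 Feb 2013 05:29:41 +0000", "MichaelDell, Yahoo"),
        ("Andy Ryder", "AndyRyder5",
         "https://t.co/rBhBXjma an article about privacy settings and facebook #databases #bus311",
         "Tue, 12 Feb 2013 05:24:17 +0000", "MichaelDell, Yahoo"),
    ],
)

# Every row has exactly the same five columns.
columns = [d[0] for d in conn.execute("SELECT * FROM tweets_flat").description]
print(columns)
# ['full_name', 'username', 'text', 'created_at', 'following_username']
```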

This is all real data; you can search Twitter and actually find these tweets.

This is good. The data is all in one place, so it’s easy to find, right? Not exactly. There are a couple of problems with this table. First, there is repetitive data across columns. The “username” and “following_username” columns are repetitive, because both contain the same type of data: Twitter handles. There is another form of repetition within the “following_username” column. Fields should only contain one value, but each of the “following_username” fields contains two.
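Here’s a quick way to see why a field should hold only one value. Searching inside the comma-separated following_username string with a substring test produces a false match. The handle “Dell” below is hypothetical, chosen because it happens to be a substring of “MichaelDell”. Splitting the field into separate values avoids the problem.

```python
# The following_username value for @AndyRyder5 in Table 1.
following = "MichaelDell, Yahoo"

# Naive substring search: "Dell" matches because it occurs inside "MichaelDell".
naive_match = "Dell" in following

# Splitting into one handle per item makes each value atomic.
handles = [h.strip() for h in following.split(",")]
exact_match = "Dell" in handles

print(naive_match, exact_match)  # True False
print(handles)                   # ['MichaelDell', 'Yahoo']
```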

Second, there is repetitive data across rows.

@AndyRyder5 and @Brett_Englebert each tweeted twice, so the rest of their information has been duplicated.

Duplicates are problematic because they make the CRUD operations more challenging. For example, retrieving data would take longer, because time would be wasted scanning duplicate rows. Updating data would also be an issue: if a user changes their Twitter handle, we would need to find every duplicate and update it.
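The update problem can be sketched with a tiny flat table. Because @AndyRyder5 appears in two rows, renaming the handle has to touch both of them. The new handle “AndyRyder6” is hypothetical. An UPDATE with a WHERE clause does reach every copy, but any code that changed only one row would leave the same person with two different handles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets_flat (full_name TEXT, username TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO tweets_flat VALUES (?, ?, ?)",
    [
        ("Andy Ryder", "AndyRyder5", "Tue, 12 Feb 2013 05:29:41 +0000"),
        ("Andy Ryder", "AndyRyder5", "Tue, 12 Feb 2013 05:24:17 +0000"),
        ("SSWUG.ORG", "SSWUGorg", "Mon, 11 Feb 2013 22:15:37 +0000"),
    ],
)

# Renaming one user has to change every duplicated row.
cur = conn.execute(
    "UPDATE tweets_flat SET username = ? WHERE username = ?",
    ("AndyRyder6", "AndyRyder5"),  # hypothetical new handle
)
print(cur.rowcount)  # 2 rows changed for a single user
```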

Repetitive data is a problem, and we can fix it by splitting Table 1 into separate tables. Let’s start by resolving the repetition across columns.

Step 2: Remove Repetitive Data Across Columns

As noted above, the “username” and “following_username” columns in Table 1 are repetitive. This repetition occurred because I was trying to express the follow relationship between users. Let’s improve on Table 1’s design by splitting it into two tables: one just for the following relationships and one for the rest of the information.

Fig. 2

Because @Brett_Englebert follows @RealSkipBayless, the following table will express that relationship by storing @Brett_Englebert as the “from_user” and @RealSkipBayless as the “to_user.” Let’s go ahead and split Table 1 into these two tables:

Table 2: The following table

from_user | to_user
_DreamLead | Scootmedia
_DreamLead | MetiersInternet
GunnarSvalander | klout
GunnarSvalander | zillow
GEsoftware | DayJobDoc
GEsoftware | byosko
adrianburch | CindyCrawford
adrianburch | Arjantim
AndyRyder5 | MichaelDell
AndyRyder5 | Yahoo
Brett_Englebert | RealSkipBayless
Brett_Englebert | stephenasmith
NimbusData | dellock6
NimbusData | rohitkilam
SSWUGorg | drsql
SSWUGorg | steam_games

Table 3: The users table

full_name | username | text | created_at
Boris Hadjur | _DreamLead | What do you think about #emailing #campaigns #traffic in #USA? Is it a good market nowadays? do you have #databases? | Tue, 12 Feb 2013 08:43:09 +0000
Gunnar Svalander | GunnarSvalander | Bill Gates Talks Databases, Free Software on Reddit https://t.co/ShX4hZlA #billgates #databases | Tue, 12 Feb 2013 07:31:06 +0000
GE Software | GEsoftware | RT @KirkDBorne: Readings in #Databases: excellent reading list, many categories: https://t.co/S6RBUNxq via @rxin Fascinating. | Tue, 12 Feb 2013 07:30:24 +0000
Adrian Burch | adrianburch | RT @tisakovich: @NimbusData at the @Barclays Big Data conference in San Francisco today, talking #virtualization, #databases, and #flash memory. | Tue, 12 Feb 2013 06:58:22 +0000
Andy Ryder | AndyRyder5 | https://t.co/D3KOJIvF article about Madden 2013 using AI to prodict the super bowl #databases #bus311 | Tue, 12 Feb 2013 05:29:41 +0000
Andy Ryder | AndyRyder5 | https://t.co/rBhBXjma an article about privacy settings and facebook #databases #bus311 | Tue, 12 Feb 2013 05:24:17 +0000
Brett Englebert | Brett_Englebert | #BUS311 University of Minnesota’s NCFPD is creating #databases to prevent “food fraud.” https://t.co/0LsAbKqJ | Tue, 12 Feb 2013 01:49:19 +0000
Brett Englebert | Brett_Englebert | #BUS311 companies might be protecting their production #databases, but what about their backup files? https://t.co/okJjV3Bm | Tue, 12 Feb 2013 01:31:52 +0000
Nimbus Data Systems | NimbusData | @NimbusData CEO @tisakovich @BarclaysOnline Big Data conference in San Francisco today, talking #virtualization, #databases,& #flash memory | Mon, 11 Feb 2013 23:15:05 +0000
SSWUG.ORG | SSWUGorg | Don’t forget to sign up for our FREE expo this Friday: #Databases, #BI, and #Sharepoint: What You Need to Know! https://t.co/Ijrqrz29 | Mon, 11 Feb 2013 22:15:37 +0000

This is looking better. Now in the users table (Table 3), there is only one column with Twitter handles. In the following table (Table 2), there is only one Twitter handle per field in the “to_user” column.
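The split from Table 1 into Table 2 can also be done in code. This sketch takes (username, following_username) pairs in the style of Table 1 and splits each comma-separated list. It emits one (from_user, to_user) row per handle and drops the duplicates caused by users who tweeted twice. Only three input rows are shown, and the function name `split_following` is hypothetical.

```python
def split_following(rows):
    """Turn (username, "a, b") rows into unique (from_user, to_user) pairs."""
    pairs = []
    seen = set()
    for username, following in rows:
        for handle in following.split(","):
            pair = (username, handle.strip())
            if pair not in seen:  # users who tweeted twice repeat their list
                seen.add(pair)
                pairs.append(pair)
    return pairs


table1_rows = [
    ("_DreamLead", "Scootmedia, MetiersInternet"),
    ("AndyRyder5", "MichaelDell, Yahoo"),
    ("AndyRyder5", "MichaelDell, Yahoo"),
]
for from_user, to_user in split_following(table1_rows):
    print(from_user, to_user)
# _DreamLead Scootmedia
# _DreamLead MetiersInternet
# AndyRyder5 MichaelDell
# AndyRyder5 Yahoo
```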

Edgar F. Codd, the computer scientist who laid down the theoretical basis of relational databases, called this step of removing repetitive data across columns the first normal form (1NF).

Step 3: Remove Repetitive Data Across Rows

Now that we’ve fixed repetitions across columns, we need to fix repetitions across rows. Since the users @AndyRyder5 and @Brett_Englebert each tweeted twice, their information is duplicated in the users table (Table 3). This indicates that we need to pull out the tweets and place them in their own table.
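One way to spot repetition across rows is to group by username and count. This is a sketch against a hypothetical `users` table that holds only the username column of Table 3. The GROUP BY ... HAVING query is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
# The username column of Table 3, one entry per tweet.
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [(u,) for u in [
        "_DreamLead", "GunnarSvalander", "GEsoftware", "adrianburch",
        "AndyRyder5", "AndyRyder5", "Brett_Englebert", "Brett_Englebert",
        "NimbusData", "SSWUGorg",
    ]],
)

# Usernames that occur in more than one row point to duplicated user data.
duplicates = conn.execute("""
    SELECT username, COUNT(*) FROM users
    GROUP BY username
    HAVING COUNT(*) > 1
    ORDER BY username
""").fetchall()
print(duplicates)  # [('AndyRyder5', 2), ('Brett_Englebert', 2)]
```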

Fig. 3

As before, “text” stores the tweet itself. Since the “created_at” column stores the timestamp of the tweet, it makes sense to pull it into this table as well. I also include a reference to the “username” column so we know who published the tweet. Here is the result of placing the tweets in their own table:

After the split, the users table (Table 5) has unique rows for users and their Twitter handles.
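The split into a tweets table and a users table can be sketched in plain Python. This assumes the Table 3 rows are available as (full_name, username, text, created_at) tuples, and that a given username always comes with the same full name. Each tweet keeps a username column pointing back at its author, and each user is stored once. The helper name `split_users_and_tweets` is hypothetical, and only three of the ten rows are used.

```python
def split_users_and_tweets(rows):
    """Split (full_name, username, text, created_at) rows into users and tweets."""
    users = {}   # username -> full_name; a dict keeps each user only once
    tweets = []  # one (text, created_at, username) tuple per tweet
    for full_name, username, text, created_at in rows:
        users[username] = full_name
        tweets.append((text, created_at, username))
    # dicts preserve insertion order (Python 3.7+), so users stay in first-seen order
    user_rows = [(full_name, username) for username, full_name in users.items()]
    return user_rows, tweets


table3_rows = [
    ("GE Software", "GEsoftware",
     "RT @KirkDBorne: Readings in #Databases: excellent reading list, many categories: https://t.co/S6RBUNxq via @rxin Fascinating.",
     "Tue, 12 Feb 2013 07:30:24 +0000"),
    ("Andy Ryder", "AndyRyder5",
     "https://t.co/D3KOJIvF article about Madden 2013 using AI to prodict the super bowl #databases #bus311",
     "Tue, 12 Feb 2013 05:29:41 +0000"),
    ("Andy Ryder", "AndyRyder5",
     "https://t.co/rBhBXjma an article about privacy settings and facebook #databases #bus311",
     "Tue, 12 Feb 2013 05:24:17 +0000"),
]
users, tweets = split_users_and_tweets(table3_rows)
print(users)        # [('GE Software', 'GEsoftware'), ('Andy Ryder', 'AndyRyder5')]
print(len(tweets))  # 3
```

Andy Ryder now appears once among the users, while both of his tweets are still kept.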

Edgar F. Codd called this step of removing repetitive data across rows the second normal form (2NF).


TutsPlus - Code