Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
July 19, 2022 04:15 am GMT

Advanced SQL: Going beyond the Basics

Structured Query Language (SQL) makes a great addition to your toolbox. It may not be as popular as Python or JavaScript, but knowing SQL, especially advanced SQL, offers a number of benefits.

For one thing, knowing advanced SQL techniques can help you get better insights from your data, so that you can communicate your ideas more effectively. And since SQL is subject to the same performance bottlenecks and performance issues as other languages, a deeper knowledge will help you optimize your queries.

Finally, additional knowledge will help you stand out from other developers, who might know CRUD operations but not advanced SQL concepts. This gives you a career advantage.

This article will introduce you to some advanced SQL concepts, including subqueries, running totals, and common table expressions (CTEs). To follow along, you can use any database that supports SQL and an editor of your choice. You can also use an online SQL editor.

The following examples use an online SQL editor along with an SQLite database. If you are working with a database such as PostgreSQL, you might have to change the syntax; however, the underlying concepts are the same.

UNION, INTERSECT, and EXCEPT Operators

You can use sample data from this resource. In this example, all the commands have been combined into a single script in this repository. Run the script and it will create three tables with sample data. Below is the schema of the tables:

Agents table schema
Image courtesy of w3resource

The first five rows of each table are shown below:

First five rows

UNION, INTERSECT, and EXCEPT operators help you combine results from two queries. Operators in SQL are similar to math operators while working with sets. You must remember the following rule:

The number of columns and the order of the columns in the table returned by the first query must be equal to the number of columns and the order of the columns in the table returned by the second query. Moreover, the data type of the corresponding columns must be compatible.

Below is the general syntax for using operators in SQL:

SELECT * FROM TABLE1OPERATORSELECT * FROM TABLE2

UNION Operator

The UNION operator returns a combination of the tables from the two queries. By default, the result will not have duplicate values. However, you can use the UNION ALL operator if you want to keep the duplicates.

The following command will return a list of all distinct cities:

SELECT working_area FROM AGENTSUNIONSELECT cust_city FROM CUSTOMER

The following command will return a list of all cities, including duplicates:

SELECT   working_area FROM   AGENTS UNION ALLSELECT   cust_city FROM   CUSTOMER

Below is an image comparing the outputs of UNION and UNION ALL:

UNION vs. UNION ALL

As you can see, in the right table, London and Bangalore are repeated.

INTERSECT Operator

The INTERSECT operator returns the common elements in the tables returned by the first and second query.

-- Returns a list of common cities in both columnsSELECT   working_area FROM   AGENTS INTERSECT SELECT   cust_city FROM   CUSTOMER

The result is shown below:

INTERSECT operator

EXCEPT Operator

The EXCEPT operator returns all the elements in the table returned by the first query that are not also in the table returned by the second query.

SELECT   working_area FROM   AGENTS EXCEPT SELECT   cust_city FROM   CUSTOMER

The above command will return a table with a single row containing Vancouver. Youll note that Vancouver is the only city that is in CUSTOMER.cust_city but not in AGENTS.working_area.

Subqueries

In SQL, you can have a query inside a query, also known as a subquery. A subquery can also have a query inside it. Subqueries take three main forms:

  • As part of a SELECT statement
  • After FROM
  • After WHERE

Subqueries help you break down complex joins and unions into smaller parts to make your query more readable and maintainable, especially since you can use results from existing queries. Subqueries also make debugging easier, because you can debug each query independently. Understanding subqueries also helps you write recursive SQL.

Subquery with SELECT

Look at the tables named AGENTS and ORDERS. You are tasked with returning a table with two columns. The first column should contain the agent_code from the AGENTS table, and the second should contain the sum of the ORDERS.ord_amount column. The second column should have the same value, or the sum repeated multiple times.

First, return the sum of all order amounts:

SELECT   SUM(ord_amount) FROM   ORDERS

The above command will be your subquery. You could use it as shown below to obtain the required result:

SELECT   agent_code,   (    SELECT       SUM(ord_amount)     FROM       ORDERS  ) total FROM   AGENTS

The command in the second line is the subquery. It returns the sum of the ORDERS.ord_amount column. You can also name the second column, either by using AS or by including your preferred name after the subquery. Below is a screenshot of the partial result:

Partial result

Subquery with FROM

The following SQL command returns a table with the total number of order amounts, an average of order amounts, and the number of orders sold by an agent to a customer:

SELECT   agent_code,   cust_code,   sum(ord_amount) total_amnt,   avg(ord_amount) avg_amount,   count(*) AS num FROM   ORDERS GROUP BY   agent_code,   cust_code

You must use the result to find the total number of orders sold by each agent. You can use the above command as a subquery along with FROM:

SELECT   agent_code,   count(agent_code) AS count FROM   (    SELECT       agent_code,       cust_code,       sum(ord_amount) total_amnt,       avg(ord_amount) avg_amount,       count(*) AS num     FROM       ORDERS     GROUP BY       agent_code,       cust_code  ) GROUP BY   agent_code

The above command returns the total number of customers that each agent sold to. Instead of selecting from an existing table, you select from the table returned by your subquery. Below is the partial result:

Partial result

Subquery with WHEN

Now use the AGENTS and CUSTOMER tables to return a table that only contains common names shared by agents and customers.

First, return a table with the names of every customer, using this command:

SELECT   cust_name FROM   CUSTOMER

Now use the IN operator and the table returned by the above query to find the common names. Below is the command, using the previous command as a subquery:

SELECT   agent_name FROM   AGENTS WHERE   agent_name IN (    SELECT       cust_name     FROM       CUSTOMER  )

The WHERE clause checks if the AGENTS.agent_name is present within the table returned by your subquery. It will return a table with two rows: Benjamin and Charles.

Common Table Expressions (CTEs)

In SQL, you create temporary variables to store the results from your subqueries. These variables are only accessible within the outer query. For the earlier example, in which you returned a table with the number of orders sold by each agent, you can store that result as a CTE to make your queries more readable.

WITH CTE_AGENT_CUSTOMER AS (  SELECT     agent_code,     cust_code,     sum(ord_amount) total_amnt,     avg(ord_amount) avg_amount,     count(*) AS num   FROM     ORDERS   GROUP BY     agent_code,     cust_code) SELECT   agent_code,   count(agent_code) AS count,   AVG(total_amnt) AS AVG,   SUM(total_amnt) AS SUM FROM   CTE_AGENT_CUSTOMER GROUP BY   agent_code

CTEs are required when working with recursive SQL.

Recursive CTEs in SQL

Recursion in SQL is like recursion in any other language. You divide a bigger problem into a smaller problem and keep on solving the smaller problem till you reach the terminating condition. If you dont provide a terminating condition, it will send your SQL command into an infinite loop.

The following are two common problems that can be solved using recursion.

First N Numbers

Your task is to use recursion to print a table with numbers from one to twenty. Below is the recursive code:

WITH NUMBER_CTE AS (  SELECT     1 AS num -- Starting Table  UNION     -- Combine Starting Table with Table from Below Query  SELECT     num + 1 as num -- Update Rule  FROM     NUMBER_CTE -- CTE calling itself, ie recursion  WHERE     num < 20 -- Terminating Condition    ) SELECT   * from   NUMBER_CTE -- Displaying the recursive CTE

Consider the first and the second query of the UNION operator separately.

SELECT   1 AS num -- Starting Table

The above is a straightforward query that returns a column called num with the value of 1 in it.

SELECT   num + 1 as num -- Update RuleFROM   NUMBER_CTE -- CTE calling itself, ie recursionWHERE   num < 20 -- Terminating Condition

The above query increments the value of column num by one as long as the original value is less than 20.

In the first run, the result from the first subquery will be:

num------1------

And the result from the second query will be:

num------2------

The UNION operator will combine both tables and update your CTE to the following table:

num------1------2------

In the second query, since you select from your CTE, the two queries along with the UNION operator will be executed again. The first query will again return:

num------1------

The second query will use the CTE from the previous run and increment each row by one, returning the following:

num------2------3------

The UNION operator again combines the tables from both queries. The updated CTE will be:

num------1------2------3------

This process will keep going until the column contains a row with the value 20. In this case, the WHEN condition wont be satisfied and you will stop the recursion process.

Factorial

Now try a SQL command that creates a table with factorial values. Try to find the factorial of 5.

WITH FACTORIAL AS (  SELECT     5 AS NUM,     1 AS PRODUCT -- Starting Table  UNION     -- Combine Starting Table with Table from Below Query  SELECT     NUM - 1 AS NUM,     PRODUCT * NUM AS PRODUCT -- Update Rule  FROM     FACTORIAL -- CTE calling itself, i.e recursion  WHERE     NUM > 1 -- Terminating Condition    ) SELECT   * FROM   FACTORIAL

Your initial table should have two columnsone to store the current multiplier and one to store the current product. The product will be 1 while the first multiplier will be 5.

The result from the first query will be:

num    product------ ---------5        1------ ---------

The result from the second query will be:

num        product--------   --------------5-1 = 4     5*1 = 5--------   --------------

The new CTE will be:

num    product------ ---------5         1------ ---------4         5------ ---------

In the next run, the result from the first query will be the same:

num    product------ ---------5        1------ ---------

The result from the second query will be calculated based on the CTE from the previous run:

num         product---------   ---------5 - 1 = 4   1 * 5 = 5---------   ---------4 - 1 = 3   5 * 4 = 20---------   ---------

The UNION operator will combine both tables. The updated CTE will be:

num     product------  ---------5         1------  ---------4         5------  ---------3        20------  ---------

This process will keep going until you reach the terminating condition.

General Syntax for Recursive CTEs

Based on the above problems, you can see the general syntax for recursive CTEs in SQL:

WITH CTE AS(    -- Starting Table    UNION -- Combine Starting Table with Table from Below Query    -- Updated Table with Update conditions if any    FROM CTE-- CTE calling itself, i.e recursion    -- Terminating Condition)SELECT * FROM CTE

The terminating condition doesnt necessarily have to be a WHERE condition; it can also be a JOIN condition. Youll look at terminating conditions with JOIN in the following cases.

Add Sample Data

Recursion is commonly used to find hierarchical structures in a company or when working with family trees. Youll need to create a new table in your database. The first column will be the agents code, and the second will be their managers agent code. If an agent has no manager, the value in the second column will be NULL. Below is the hierarchical structure you will be adding to your database:

Hierarchical structure

You can find the script to add the data in this repo.

Task One

Say youre given an agents ID and are asked to return a table with the agents manager, skip-manager, skip-skip-manager, etc., all the way to the top. Below is the command to complete this task:

with HIERARCHY_CTE AS (  SELECT     h1.agent_code,     h1.manager_code   from     HIERARCHY h1   WHERE     h1.agent_code == 'A006'   UNION   SELECT     h2.agent_code,     h2.manager_code   FROM     HIERARCHY h2     JOIN HIERARCHY_CTE ON h2.AGENT_CODE = HIERARCHY_CTE.manager_code) SELECT   * FROM   HIERARCHY_CTE

Again, consider each query individually.

SELECT   h1.agent_code,   h1.manager_code FROM   HIERARCHY h1 WHERE   h1.agent_code == 'A006'

The above query returns the agents code and their managers agent code. Below is the output:

agent_code   manager_code---------   ------------A006         A005

Now look at the second query:

SELECT   h2.agent_code,   h2.manager_code FROM   HIERARCHY h2   JOIN HIERARCHY_CTE ON h2.AGENT_CODE = HIERARCHY_CTE.manager_code

This performs a JOIN between the HIERARCHY table and the CTE you created. It tries to find the rows where the agent_code from HIERARCHY matches the manager_code for the CTE. It then returns the agents code and their managers code, or A006s manager and A006s skip-manager.

Below is the output:

agent_code  manager_code---------   ------------A005        A003

The union returns the following table:

agent_code  manager_code---------   ------------A006        A005---------   ------------A005        A003

In the second run, the first query will again return:

agent_code   manager_code---------    ------------A006           A005

The second query, however, will include A003s manager as well. Below is the output:

agent_code manager_code--------- ------------A005        A003--------- ------------A003        A001

The UNION operator combines the tables. Your updated CTE will be:

agent_code    manager_code---------    ------------A006           A005---------    ------------A005           A003---------    ------------A003           A001

A001 has no manager. In the next run, you will hit the terminating condition.

Task Two

In this task, youll assign a level to each agent. The agents with no boss are level 1, the agents who report to them are level 2, and so on. Below is the command that returns this table:

with HIERARCHY_CTE AS (  SELECT     h1.agent_code,     h1.manager_code,     1 AS level   from     HIERARCHY h1   WHERE     h1.manager_code IS NULL   UNION   SELECT     h2.agent_code,     h2.manager_code,     level + 1 AS level   from     HIERARCHY h2     JOIN HIERARCHY_CTE on h2.manager_code == HIERARCHY_CTE.agent_code) SELECT   * FROM   HIERARCHY_CTE

This is similar to the command from your previous task. The main difference is that you start from the top agent, rather than the bottom.

Running Totals

Running totals are also known as cumulative sums. Consider the ORDERS table in your database. It has columns called ord_amount and ord_date. Try to calculate the cumulative sum on each date record in the ord_date column. Below is the command:

SELECT   ord_date,   SUM(ord_amount) OVER (    ORDER BY       ord_date  ) AS Running_Total FROM   ORDERS

The OVER command tells SQL that you want to find the cumulative sum of the ord_amount column over the whole data set. The column in the ORDER BY clause is used to sort the results by date.

CASE WHEN Statements

CASE WHEN statements in SQL are similar to if-else or switch-case statements in other languages. Below is the general syntax for CASE WHEN statements:

CASEWHEN CONDITION THEN RESULTWHEN CONDITION THEN RESULTWHEN CONDITION THEN RESULTELSE RESULTEND Column_Name

The result in ELSE is returned if none of the conditions return true.

The CUSTOMER table has a column called grade with values from 0-4. You are tasked with creating a new column called grade_letter that maps 1 to A, 2 to B, and everything else to F. You can use CASE WHEN statements to complete this task. Below is the command:

SELECT   *,   CASE WHEN grade == 1 THEN 'A' WHEN grade == 2 THEN 'B' ELSE 'F' END grade_letter FROM   CUSTOMER

Conclusion

This guide offered you some hands-on experience in working with more advanced SQL topics. Implementing these tools will help you gain more insights from your data so that you can write better queries in SQL.

If you are working with SQL, you might be interested in CloudQuery. The SQL-based open source cloud asset inventory integrates with major cloud providers including Azure, AWS, and GCP. You can use it to audit and evaluate your cloud assets from a current and historical perspective.

For more on what CloudQuery can do for you, check out its documentation.

This article was originally posted on https://www.realpythonproject.com/advanced-sql-going-beyond-the-basics/Connect with me on LinkedIn: https://www.linkedin.com/in/rahulbanerj

Original Link: https://dev.to/rahulbanerjee99/advanced-sql-going-beyond-the-basics-890

Share this article:    Share on Facebook
View Full Article

Dev To

An online community for sharing and discovering great ideas, having debates, and making friends

More About this Source Visit Dev To