Phases of the SQL JOIN

Photo by maggie hung on Unsplash

Anyone working on data projects, with any kind of tabular datasets is bound to come to a situation where they will need to join or bring two or more datasets/tables together to form a more relevant dataset/table for analysis.

In this blog, I go over some of the common scenarios of JOINS. Each case of JOIN is explained best using set logic visualization; Venn diagrams and the corresponding SQL syntax that goes with it (the syntax followed here is as used in PostgreSQL, but the logic behind the joins is the same). With the goal of joining two tables, the following types of JOINS are possible. (Consider the green shaded region in all the figures in each section.)

  1. Inner Join

INNER JOIN

An inner join is similar to an intersection in set theory. The common values between two datasets/tables are selected to form an inner joined table.

Image by Author: Inner Join
SELECT * 
FROM table_A
INNER JOIN table_B
ON table_A.matching_column = table_B.matching_column

An inner join is a symmetrical join where the placement of the two tables in the syntax does not matter in terms of the resulting joined dataset/table.

OUTER JOIN

There are a few different types of outer joins. Outer joins define how the values present in one of the tables being joined is managed.

FULL OUTER JOIN — A full outer join defines a dataset/table with all the information present in both the datasets/tables being joined. The resulting table/dataset will also contain NaNs or Null values for columns that correspond to the opposite table the row data actually comes from.

Image by Author: Full Outer Join
SELECT * 
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column

Using just “JOIN” is synonymous to using the full form “FULL OUTER JOIN”. A full outer join is also a symmetrical join. To practice, here’s a link to a codewars kata detailing a basic JOIN problem: problem for practice.

By adding a WHERE clause to the above full outer join syntax, you can get all the rows that are unique to either of the tables being joined, essentially the opposite of an inner join

Image by Author: Full Outer Join with WHERE clause
SELECT * 
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column
-- further filter after the join to return unique values--WHERE table_A.id IS null OR
table_B.id IS null

LEFT OUTER JOIN — A left outer join returns all the information that are in the left table. In case, there is no match with the right table the resulting values are null.

Image by Author: Left Outer Join
SELECT * 
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column

A left outer join is not a symmetrical join. Placement of the table names does matter, depending on what you want your resulting table to show.

NOTE: The table name that follows the FROM statement is the ‘left table’ (here table_A) and the table being joined on is the ‘right table’ (table_B).

Values unique only to the left table can be returned by adding a WHERE clause to a left outer join.

Image by Author: Left Outer Join with WHERE clause
SELECT * 
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column
-- filter after the join to return unique values to left table--WHERE table_B.id IS null

RIGHT OUTER JOIN — The mechanics of a right outer join is similar to a left outer join. A right outer join returns all the information that are in the right table.

Image by Author: Right Outer Join
SELECT * 
FROM table_B
RIGHT OUTER JOIN table_A
ON table_B.matching_column = table_A.matching_column

A right outer join is also not a symmetrical join.

NOTE: The table name that follows the FROM statement is the ‘right table’ (here table_B) and the table being joined on is the ‘left table’ (table_A).

Similarly, values unique only to the right table can be returned by adding a WHERE clause to a right outer join.

Image by Author: Right Outer Join with WHERE clause
SELECT * 
FROM table_B
RIGHT OUTER JOIN table_A
ON table_B.matching_column = table_A.matching_column
-- filter after the join to return unique values to right table--WHERE table_A.id IS null

UNION

The UNION operator allows us to concatenate resulting outputs from two or more SELECT statements. The outputs to be pasted together needs to make logical sense and should match up in such a way that they can be stacked on top of each other (essentially, do the column headings match in a meaningful way).

SELECT column_name(s) FROM table_A
UNION
SELECT column_name(s) FROM table_B

SELF-JOIN

A self join is a special case where the tables being joined can be seen as copies of the same table. The table is not actually copied but SQL performs the command as if it were two copies. The syntax follows similarly like a standard JOIN syntax with the same table referenced in both parts

SELECT table_copy_A.col, table_copy_B.col
FROM table AS table_copy_A
JOIN table AS table_copy_B
ON table_copy_A.some_column = table_copy_B.other_column

I love how this post in LinkedIn explains it so clearly and simply by Eric Weber. It’s a very short post so I encourage you to click and read over.

I’ll be back on this post to add some examples of tables and use cases. So, what kind of JOIN are you looking for?

Data Science Student