Phases of the SQL JOIN
Anyone working on data projects, with any kind of tabular datasets is bound to come to a situation where they will need to join or bring two or more datasets/tables together to form a more relevant dataset/table for analysis.
In this blog, I go over some of the common scenarios of JOINS. Each case of JOIN is explained best using set logic visualization; Venn diagrams and the corresponding SQL syntax that go with it (the syntax followed here is as used in PostgreSQL, but the logic behind the joins is the same). With the goal of joining two tables, the following types of JOINS are possible. (Consider the green shaded region in all the figures in each section.)
- Inner Join
- Outer Join
- Unions
- Self-Join
INNER JOIN
An inner join follows the same logic to an intersection in set theory. All records or rows that match values in the key columns between two tables are selected to form an inner joined table.
SELECT *
FROM table_A
INNER JOIN table_B
ON table_A.matching_column = table_B.matching_column
An inner join is a symmetrical join where the placement of the two tables in the syntax does not matter in terms of the resulting joined dataset/table.
OUTER JOIN
There are a few different types of outer joins. Outer joins define how the values present in one of the tables being joined is managed.
FULL OUTER JOIN — A full outer join defines a dataset/table with all the information present in both the datasets/tables being joined. The resulting table/dataset will also contain NaNs or Null values for columns that correspond to the opposite table the row data actually comes from.
SELECT *
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column
Using just “JOIN” is synonymous to using the full form “FULL OUTER JOIN”. A full outer join is also a symmetrical join. To practice, here’s a link to a codewars kata detailing a basic JOIN problem: problem for practice.
By adding a WHERE clause to the above full outer join syntax, you can get all the rows that are unique to either of the tables being joined, essentially the opposite of an inner join
SELECT *
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column-- further filter after the join to return unique values--WHERE table_A.id IS null OR
table_B.id IS null
LEFT OUTER JOIN — A left outer join returns all the information that are in the left table. In case, there is no match with the right table the resulting values are null.
SELECT *
FROM table_A
LEFT OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column
A left outer join is not a symmetrical join. Placement of the table names does matter, depending on what you want your resulting table to show.
NOTE: The table name that follows the FROM statement is the ‘left table’ (here table_A) and the table being joined on is the ‘right table’ (table_B).
Values unique only to the left table can be returned by adding a WHERE clause to a left outer join.
SELECT *
FROM table_A
FULL OUTER JOIN table_B
ON table_A.matching_column = table_B.matching_column-- filter after the join to return unique values to left table--WHERE table_B.id IS null
RIGHT OUTER JOIN — The mechanics of a right outer join is similar to a left outer join. A right outer join returns all the information that are in the right table.
SELECT *
FROM table_B
RIGHT OUTER JOIN table_A
ON table_B.matching_column = table_A.matching_column
A right outer join is also not a symmetrical join.
NOTE: The table name that follows the FROM statement is the ‘right table’ (here table_B) and the table being joined on is the ‘left table’ (table_A).
Similarly, values unique only to the right table can be returned by adding a WHERE clause to a right outer join.
SELECT *
FROM table_B
RIGHT OUTER JOIN table_A
ON table_B.matching_column = table_A.matching_column-- filter after the join to return unique values to right table--WHERE table_A.id IS null
UNION
The UNION operator allows us to concatenate resulting outputs from two or more SELECT statements. Essentially appends rows of similar of similar data. The outputs to be pasted together needs to make logical sense and should match up in such a way that they can be stacked on top of each other (essentially, do the column headings match in a meaningful way). Other assumptions:
- The column names match
- The column datatype matches
- Both tables have same number of columns
SELECT column_name(s) FROM table_A
UNION
SELECT column_name(s) FROM table_B
SELF-JOIN
A self join is a special case where the tables being joined can be seen as copies of the same table. The table is not actually copied but SQL performs the command as if it were two copies. The syntax follows similarly like a standard JOIN syntax with the same table referenced in both parts
SELECT table_copy_A.col, table_copy_B.col
FROM table AS table_copy_A
JOIN table AS table_copy_B
ON table_copy_A.some_column = table_copy_B.other_column
I love how this post in LinkedIn explains it so clearly and simply by Eric Weber. It’s a very short post so I encourage you to click and read over.
I’ll be back on this post to add some examples of tables and use cases. So, what kind of JOIN are you looking for?