structured query language
cheat sheet. you’re welcome.
SQL is the language used for RDBMS, relational database management systems. Unlike Excel, SQL allows users to set up rules and relationships between databases and handle big data.
SQL started in the 1970’s by IBM. Back then, it was called SEQL, Structured English Query Language.
In 1986, SQL was standardized by ANSI, American National Standards Institute, followed by ISO, International Organization for Standardization, in 1987.
Keyword Definitions
Aggregation - combining data in a categorical level (column); e.g. COUNT(*), SUM(column), AVG(column)
Clause - built-in functions that make up a statement; e.g., SELECT, FROM, WHERE
Column - or “field”; data of the same type
Database - an organized collection of structured information; e.g. collection of tables
Drop - remove the entire database or table (different from truncate)
Expression - values from predicates; case sensitive; e.g., ‘CA’ is the expression in WHERE state = ‘CA’
Foreign Key Relation - where referencing key column with a column from another table
Logical Operator - part of the predicate; e.g., AND, OR
Predicate - conditions; e.g., WHERE clause
Primary Key - the column that uniquely identifies each record in the table; values are unique and NOT NULL; e.g. employee ID can be a primary key since unchangeable and unique per employee vs. employee last name should not be primary key because changeable or possible duplicates
Query - a statement that returns records
R/DBMS - (relational) database management system; e.g. mySQL (most popular, open source), SQL Server (Microsoft), Oracle
Relation Schema - or “table schema”; a format of the records or attributes of the table; i.e. column headers and data types
Relation Instance - a finite set of tuples in the relational database system; no duplicate records
Row - or “record”; a set of data
Schema - the layout/definition of how fields, tables, and relationships are set up; the “schema” of a database
Statement - what you write in SQL to get an answer or make a change
Sub-query - or “subselect”; a query within a query; use parenthesis; usually after the FROM, INSERT or CREATE TABLE AS
Syntax - formula; an arrangement of words that form a statement
Table - made of fields (columns) and records (rows)
Truncate - remove all the rows (records) from the table
Tuple - a single row/record of a table
Types of SQL Commands
DCL - data control language; deals with rights, controls, and permissions of the database system
DDL - data definition language; edit the structure or schema of the database; e.g., create, drop, rename, alter tables
DML - data manipulation language; manipulate data itself or CRUD (credit, read, update, delete)
DQL - data query language; perform queries on the data within schema objects
TCL - transaction control language; deals with transactions within the database
Data Types
Basic Rules
SQL is not case-sensitive
SQL is whitespace independent
Starting with ( /* ) and ending with ( */ ) will make a block comment, usually for introductions - see etiquette
Adding two hyphens ( -- ) in the front will turn the statement into a linear comment
Format for date is ‘yyyy-mm-dd’ (apostrophes included)
Using single (‘) or double (“) apostrophes converts to text or string
If the font turns blue, it usually indicates that it’s a reserved keyword
When filtering null values, you can either write WHERE NOT (column = 0) or WHERE column IS NOT NULL
A schema must be selected to run queries (either double click or run the USE statement)
Use commas ( , ) to return multiple columns from the SELECT clause; e.g. SELECT column1, column2
Using an asterisk ( * ) following the SELECT clause returns all headers; e.g. SELECT *
You can use arithmetic operators with columns; this will create a new column in the output; e.g. SELECT column1 + column2 AS sum
Field names with spaces need to be inside backticks ( ` )
Create header alias with keyword AS; e.g. SELECT stateorigin AS state
Every statement must be terminated with a semicolon ( ; )
If there are multiple statements, highlight the code you want to run
F5 or Ctrl + Enter are shortcuts to run queries; otherwise, use the lightning button on top (MySQL)
When writing queries, press TAB to auto-complete statements
Look up specific columns from specific tables with prefix and period; syntax = table_name.column_name
Look up specific tables from specific database with prefix and period; syntax = database_name.table_name
Create table alias right after table name; rest of statement can reference the new alias; e.g. FROM people ppl
Order of Operations
Order of operations: anything within parenthesis, multiplication/division, addition/subtraction
The order of clauses is important when writing a query. It helps you optimize processing time and understand the semantics of SQL coding
The order in which we write code is not the same order that the SQL engine processes it
Lexical (front-end)
1. SELECT
2. DISTINCT
3. <aggregation>
4. FROM, including JOIN
5. WHERE
6. GROUP BY
7. HAVING
8. ORDER BY
9. LIMIT and OFFSET
Logical (back-end)
1. FROM, including JOIN
2. WHERE
3. GROUP BY
4. <aggregation>
6. HAVING
7. <WINDOW functions>
8. SELECT
9. DISTINCT
10. UNION
11. ORDER BY
12. LIMIT and OFFSET
This is why we cannot use aliases (e.g. SELECT fullguest AS name) in the GROUP BY clause because the SELECT statement had not been run yet
Using subqueries would speed up most queries because SQL will follow the logical order of operations starting from the innermost sub-queries (which reduces the size of the dataset)
Clauses & Keywords
SELECT - which columns to return in the query
SELECT DISTINCT - returns only unique values
SELECT COUNT - returns the number of records in the specified column
FROM - which table to source from
JOIN - merge data from another table
WHERE (or WHERE NOT) - adding conditions to the statement
GROUP BY - group records of same values; when working with aggregated columns (e.g., SUM(), COUNT()), make sure to GROUP BY all the non-aggregated columns
HAVING - adding conditions to GROUP BY; unlike WHERE, HAVING can apply to aggregated data (e.g., SUM(column1) AS total)
ORDER BY - sort order; the default is ASC (ascending) order
ASC - ascending order
DESC - descending order
CAST __ AS CHAR - sort in alphanumeric order; e.g., 100 will appear before 68 because ‘1’ comes before ‘6’
CAST __ AS INT - sort in numerical order; syntax ORDER BY CAST(header_name AS INT)
ORDER BY 1, 2 - sort by first column then the second column
LIMIT n - stop returning results after n results have been returned; syntax LIMIT 10 (this will show up to 10 results)
OFFSET - use with LIMIT clause to skip results; syntax LIMIT 5 OFFSET 5 (this will show the 2nd half of 10 results)
Joining Databases
Example of JOIN outputs
JOIN - (or inner join) look up a different table in statement; syntax SELECT * FROM table1 JOIN table2 ON table2.column = table1.column
USING - shorter syntax for JOIN if header name and number of columns are the same; syntax SELECT * FROM table1 JOIN table2 USING (column1, column2)
LEFT JOIN - (or left outer join) returns all records from the left table even if NULL or no match
RIGHT JOIN - (or right outer join) returns all records from the right table even if NULL or no match (not common)
FULL OUTER JOIN - returns all records including NULL or no match from both tables
CROSS JOIN - each record in table1 gets each record in table2; every possible combination; no ON clause needed; explicit syntax SELECT * FROM table1 CROSS JOIN table2; implicit syntax (not recommended) SELECT * FROM table1, table2
UNION - combines two syntaxes; e.g., SELECT column1, column2, “Active” as status FROM table WHERE date>date2 UNION SELECT column1, column2, “Archive” as status FROM table WHERE date<date2
SELF JOIN - look up value within the same table; e.g. USE database; SELECT table_alias1.column1, table_alias1.column2 FROM table_alias1 JOIN table_alias2 ON table_alias1.column3 = table_alias2.column1
SELF OUTER JOIN - look up value within the same table to show records that are NULL or no match
NATURAL JOIN - combine tables with common columns; NOT recommended because letting machine figure out relationship
IMPLICIT JOIN - lookup multiple tables without the JOIN clause; NOT recommended because confusing for next user
You can join multiple tables by just adding another JOIN ON syntax below in the same statement
While using JOIN function, if both tables have same header, make sure to qualify in statement by using prefix table.header
If table has no column with unique values, you can combine two columns (think =B:B&C:C in excel). Table setting should show more than one key columns. Use syntax SELECT * FROM table1 JOIN table2 ON table1.column1 = table2.column1 AND table1.column2 = table2.column2
Operators
AND - both conditions must be true; e.g. WHERE height > 65 AND weight < 200
OR - at least of the conditions is true; e.g. WHERE height > 65 OR weight < 200
NOT - negate condition; e.g., WHERE NOT (height < 60)
LENGTH - filter records with a specific number of characters; e.g. WHERE LENGTH title <= 25
IN - meets any of the multiple conditions; e.g., WHERE state IN (‘NJ’, ‘CA’, ‘TX’)
BETWEEN - between; e.g., WHERE score BETWEEN 5 AND 10
ALL - all of the subquery values meet the condition
ANY - any of the subquery values meet the condition
EXISTS - subquery returns one or more records
SOME - any of the subquery values meet the condition
LIKE - returns strings with certain characters; NOT case-sensitive; e.g., WHERE state LIKE ‘%ny’
% is used as a ‘wild card’ regardless of the number of characters
_ is used as a placeholder for any single character, including spaces
BINARY where 1 = match and 0 = not match; case sensitive; e.g., SELECT * FROM country WHERE BINARY states LIKE ‘New%’
%a will return everything ending with ‘a’
a% will return everything starting with ‘a’
%a% will return everything including with ‘a’
b%a will return everything starting with ‘b’ and ending with ‘a’
b_ _ _ a will return 5-character strings that start with ‘b’ and end with ‘a’
%short%girl% will return results that have the words “short” and “girl”
%short_girl% will return results that have the words “short girl”
REGEXP - regular expression; similar to LIKE but searches for more complex strings and patterns; YES case sensitive; e.g., WHERE name REGEXP ‘ly’
. (period) is used as a placeholder for any single character, including spaces
‘( )’ parenthesis is a reserved character; escape it by using ‘\\’ (two back slashes) before the open or close parenthesis
‘{ }’ curly brackets are used as character quantifiers
whitespace is matched using ‘\s’
Adding ‘?’ matches one or zero white spaces; e.g. ‘\s?’
tabs are matched using ‘\t’
newlines are matched using ‘\n’ or ‘\r\n’
test out the regex codes on regex101 (external link)
ab will return everything including ‘ab’
^ab will return everything starting with ‘ab’
ab$ will return everything ending with ‘ab’
ab|cd|ef will return everything that has ‘ab’ OR ‘cd’ OR ‘ef’
ab|^cd|ef$ will return everything has ‘ab’ OR starts with ‘cd’ OR ends with ‘ef’
[a-e]f will return anything within brackets (‘a’ to ‘e’) immediately followed by ‘f’
a.b will return everything that has ‘a’, ‘any character’, and ‘b’
ca*t the asterisk will match to any number of the previous thing, in this case ‘a’; e.g. cat, caat, caaat, caaaat
[A-z]+ the plus sign will match to one or more of the previous thing, in this case one or more letters
[0-9]{3} will return everything that includes 3 digits between zero to nine
[A-z1-9]{5}$ will return everything that includes a 5-character string of any letter or non-zero number and nothing else following that