structured query language

cheat sheet. you’re welcome.

SQL is the language used for RDBMS, relational database management systems. Unlike Excel, SQL allows users to set up rules and relationships between databases and handle big data.

SQL started in the 1970’s by IBM. Back then, it was called SEQL, Structured English Query Language.

In 1986, SQL was standardized by ANSI, American National Standards Institute, followed by ISO, International Organization for Standardization, in 1987.


Keyword Definitions

Aggregation - combining data in a categorical level (column); e.g. COUNT(*), SUM(column), AVG(column)

Clause - built-in functions that make up a statement; e.g., SELECT, FROM, WHERE

Column - or “field”; data of the same type

Database - an organized collection of structured information; e.g. collection of tables

Drop - remove the entire database or table (different from truncate)

Expression - values from predicates; case sensitive; e.g., ‘CA’ is the expression in WHERE state = ‘CA’

Foreign Key Relation - where referencing key column with a column from another table

Logical Operator - part of the predicate; e.g., AND, OR

Predicate - conditions; e.g., WHERE clause

Primary Key - the column that uniquely identifies each record in the table; values are unique and NOT NULL; e.g. employee ID can be a primary key since unchangeable and unique per employee vs. employee last name should not be primary key because changeable or possible duplicates

Query - a statement that returns records

R/DBMS - (relational) database management system; e.g. mySQL (most popular, open source), SQL Server (Microsoft), Oracle

Relation Schema - or “table schema”; a format of the records or attributes of the table; i.e. column headers and data types

Relation Instance - a finite set of tuples in the relational database system; no duplicate records

Row - or “record”; a set of data

Schema - the layout/definition of how fields, tables, and relationships are set up; the “schema” of a database

Statement - what you write in SQL to get an answer or make a change

Sub-query - or “subselect”; a query within a query; use parenthesis; usually after the FROM, INSERT or CREATE TABLE AS

Syntax - formula; an arrangement of words that form a statement

Table - made of fields (columns) and records (rows)

Truncate - remove all the rows (records) from the table

Tuple - a single row/record of a table


Types of SQL Commands

DCL - data control language; deals with rights, controls, and permissions of the database system

DDL - data definition language; edit the structure or schema of the database; e.g., create, drop, rename, alter tables

DML - data manipulation language; manipulate data itself or CRUD (credit, read, update, delete)

DQL - data query language; perform queries on the data within schema objects

TCL - transaction control language; deals with transactions within the database


Data Types


Basic Rules

  • SQL is not case-sensitive

  • SQL is whitespace independent

  • Starting with ( /* ) and ending with ( */ ) will make a block comment, usually for introductions - see etiquette

  • Adding two hyphens ( -- ) in the front will turn the statement into a linear comment

  • Format for date is ‘yyyy-mm-dd’ (apostrophes included)

  • Using single (‘) or double (“) apostrophes converts to text or string

  • If the font turns blue, it usually indicates that it’s a reserved keyword

  • When filtering null values, you can either write WHERE NOT (column = 0) or WHERE column IS NOT NULL

  • A schema must be selected to run queries (either double click or run the USE statement)

  • Use commas ( , ) to return multiple columns from the SELECT clause; e.g. SELECT column1, column2

  • Using an asterisk ( * ) following the SELECT clause returns all headers; e.g. SELECT *

  • You can use arithmetic operators with columns; this will create a new column in the output; e.g. SELECT column1 + column2 AS sum

  • Field names with spaces need to be inside backticks ( ` )

  • Create header alias with keyword AS; e.g. SELECT stateorigin AS state

  • Every statement must be terminated with a semicolon ( ; )

  • If there are multiple statements, highlight the code you want to run

  • F5 or Ctrl + Enter are shortcuts to run queries; otherwise, use the lightning button on top (MySQL)

  • When writing queries, press TAB to auto-complete statements

  • Look up specific columns from specific tables with prefix and period; syntax = table_name.column_name

  • Look up specific tables from specific database with prefix and period; syntax = database_name.table_name

  • Create table alias right after table name; rest of statement can reference the new alias; e.g. FROM people ppl


Order of Operations

  • Order of operations: anything within parenthesis, multiplication/division, addition/subtraction

  • The order of clauses is important when writing a query. It helps you optimize processing time and understand the semantics of SQL coding

  • The order in which we write code is not the same order that the SQL engine processes it

  • Lexical (front-end)

    1. SELECT

    2. DISTINCT

    3. <aggregation>

    4. FROM, including JOIN

    5. WHERE

    6. GROUP BY

    7. HAVING

    8. ORDER BY

    9. LIMIT and OFFSET

  • Logical (back-end)

    1. FROM, including JOIN

    2. WHERE

    3. GROUP BY

    4. <aggregation>

    6. HAVING

    7. <WINDOW functions>

    8. SELECT

    9. DISTINCT

    10. UNION

    11. ORDER BY

    12. LIMIT and OFFSET

  • This is why we cannot use aliases (e.g. SELECT fullguest AS name) in the GROUP BY clause because the SELECT statement had not been run yet

  • Using subqueries would speed up most queries because SQL will follow the logical order of operations starting from the innermost sub-queries (which reduces the size of the dataset)


Clauses & Keywords

SELECT - which columns to return in the query

SELECT DISTINCT - returns only unique values

SELECT COUNT - returns the number of records in the specified column

FROM - which table to source from

JOIN - merge data from another table

WHERE (or WHERE NOT) - adding conditions to the statement

GROUP BY - group records of same values; when working with aggregated columns (e.g., SUM(), COUNT()), make sure to GROUP BY all the non-aggregated columns

HAVING - adding conditions to GROUP BY; unlike WHERE, HAVING can apply to aggregated data (e.g., SUM(column1) AS total)

ORDER BY - sort order; the default is ASC (ascending) order

ASC - ascending order

DESC - descending order

CAST __ AS CHAR - sort in alphanumeric order; e.g., 100 will appear before 68 because ‘1’ comes before ‘6’

CAST __ AS INT - sort in numerical order; syntax ORDER BY CAST(header_name AS INT)

ORDER BY 1, 2 - sort by first column then the second column

LIMIT n - stop returning results after n results have been returned; syntax LIMIT 10 (this will show up to 10 results)

OFFSET - use with LIMIT clause to skip results; syntax LIMIT 5 OFFSET 5 (this will show the 2nd half of 10 results)


Joining Databases

Example of JOIN outputs

JOIN - (or inner join) look up a different table in statement; syntax SELECT * FROM table1 JOIN table2 ON table2.column = table1.column

USING - shorter syntax for JOIN if header name and number of columns are the same; syntax SELECT * FROM table1 JOIN table2 USING (column1, column2)

LEFT JOIN - (or left outer join) returns all records from the left table even if NULL or no match

RIGHT JOIN - (or right outer join) returns all records from the right table even if NULL or no match (not common)

FULL OUTER JOIN - returns all records including NULL or no match from both tables

CROSS JOIN - each record in table1 gets each record in table2; every possible combination; no ON clause needed; explicit syntax SELECT * FROM table1 CROSS JOIN table2; implicit syntax (not recommended) SELECT * FROM table1, table2

UNION - combines two syntaxes; e.g., SELECT column1, column2, “Active” as status FROM table WHERE date>date2 UNION SELECT column1, column2, “Archive” as status FROM table WHERE date<date2

SELF JOIN - look up value within the same table; e.g. USE database; SELECT table_alias1.column1, table_alias1.column2 FROM table_alias1 JOIN table_alias2 ON table_alias1.column3 = table_alias2.column1

SELF OUTER JOIN - look up value within the same table to show records that are NULL or no match

NATURAL JOIN - combine tables with common columns; NOT recommended because letting machine figure out relationship

IMPLICIT JOIN - lookup multiple tables without the JOIN clause; NOT recommended because confusing for next user

  • You can join multiple tables by just adding another JOIN ON syntax below in the same statement

  • While using JOIN function, if both tables have same header, make sure to qualify in statement by using prefix table.header

  • If table has no column with unique values, you can combine two columns (think =B:B&C:C in excel). Table setting should show more than one key columns. Use syntax SELECT * FROM table1 JOIN table2 ON table1.column1 = table2.column1 AND table1.column2 = table2.column2


Operators

  • AND - both conditions must be true; e.g. WHERE height > 65 AND weight < 200

  • OR - at least of the conditions is true; e.g. WHERE height > 65 OR weight < 200

  • NOT - negate condition; e.g., WHERE NOT (height < 60)

  • LENGTH - filter records with a specific number of characters; e.g. WHERE LENGTH title <= 25

  • IN - meets any of the multiple conditions; e.g., WHERE state IN (‘NJ’, ‘CA’, ‘TX’)

  • BETWEEN - between; e.g., WHERE score BETWEEN 5 AND 10

  • ALL - all of the subquery values meet the condition

  • ANY - any of the subquery values meet the condition

  • EXISTS - subquery returns one or more records

  • SOME - any of the subquery values meet the condition

  • LIKE - returns strings with certain characters; NOT case-sensitive; e.g., WHERE state LIKE ‘%ny’

    • % is used as a ‘wild card’ regardless of the number of characters

    • _ is used as a placeholder for any single character, including spaces

    • BINARY where 1 = match and 0 = not match; case sensitive; e.g., SELECT * FROM country WHERE BINARY states LIKE ‘New%’

%a will return everything ending with ‘a’
a% will return everything starting with ‘a’
%a% will return everything including with ‘a’
b%a will return everything starting with ‘b’ and ending with ‘a’
b_ _ _ a will return 5-character strings that start with ‘b’ and end with ‘a’
%short%girl% will return results that have the words “short” and “girl”
%short_girl% will return results that have the words “short girl”

  • REGEXP - regular expression; similar to LIKE but searches for more complex strings and patterns; YES case sensitive; e.g., WHERE name REGEXP ‘ly’

    • . (period) is used as a placeholder for any single character, including spaces

    • ‘( )’ parenthesis is a reserved character; escape it by using ‘\\’ (two back slashes) before the open or close parenthesis

    • ‘{ }’ curly brackets are used as character quantifiers

    • whitespace is matched using ‘\s’

    • Adding ‘?’ matches one or zero white spaces; e.g. ‘\s?

    • tabs are matched using ‘\t’

    • newlines are matched using ‘\n’ or ‘\r\n’

    • test out the regex codes on regex101 (external link)

ab will return everything including ‘ab’
^ab will return everything starting with ‘ab’
ab$ will return everything ending with ‘ab’
ab|cd|ef will return everything that has ‘ab’ OR ‘cd’ OR ‘ef’
ab|^cd|ef$ will return everything has ‘ab’ OR starts with ‘cd’ OR ends with ‘ef’
[a-e]f will return anything within brackets (‘a’ to ‘e’) immediately followed by ‘f’
a.b will return everything that has ‘a’, ‘any character’, and ‘b’
ca*t the asterisk will match to any number of the previous thing, in this case ‘a’; e.g. cat, caat, caaat, caaaat
[A-z]+ the plus sign will match to one or more of the previous thing, in this case one or more letters
[0-9]{3} will return everything that includes 3 digits between zero to nine
[A-z1-9]{5}$ will return everything that includes a 5-character string of any letter or non-zero number and nothing else following that