calculus

derivatives, integrals, & limits

Derivatives

Derivative - the measure of the rate at which the value y changes with respect to a change in the variable x in the function y = f(x); the first derivative is written as f’(x) and the second derivative as f’’(x) or f^(2)(x)

Differentiation - the process of finding the derivative, the rate of change

Coefficient - the number used to multiply the variable

Slope-intercept formula - y = mx + c

m = the slope or “gradient”; rise over run; change in y over change in x; formula is (y2 - y1) / (x2 - x1)
c = the y-intercept; where the line crosses the y-axis (the value of y when x = 0)
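
As a quick illustration of the slope formula, here is a minimal sketch that recovers m and c from two points on a line. The function name slope_intercept and the sample points are made up for this example.

```python
def slope_intercept(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # value of y when x = 0
    return m, c


m, c = slope_intercept((1, 5), (3, 9))
print(m, c)  # 2.0 3.0
```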

Rules/reminders when finding the derivative:

Partial Derivatives

Partial Derivative - the partial derivative of a function of several variables, such as f(x, y), is the derivative with respect to one of those variables while the others are held constant

the partial derivative is written as ∂f/∂x (∂ is called “curly d” or “partial”; it is sometimes also read as “del”, a name shared with the ∇ operator) or fx or fy (x/y is written in subscript, so ‘f sub x’ or ‘f sub y’)
If z = f(x, y), then ∂z/∂x is asking for the partial derivative with respect to ‘x’ (and keep ‘y’ as constant) and ∂z/∂y is asking for the partial derivative with respect to ‘y’ (and keep ‘x’ as constant)
Same rule applies for three variables; e.g. when finding a partial derivative of f(x, y, z), fx differentiates with respect to x while keeping y and z as constants

Rules when finding partial derivatives:

Chain Rule Example: z = x^3y + 3x^2 + y^3, x = 4 + t^2, y = 5t^3

Here z is a composite function of t (through x and y), so the chain rule must be used to find dz/dt. The formula is dz/dt = [∂z/∂x * dx/dt] + [∂z/∂y * dy/dt]

1. Partial derivative of z with respect to x and keep y constant
2. Differentiate x with respect to t
3. Multiply results from 1 and 2
4. Partial derivative of z with respect to y and keep x constant
5. Differentiate y with respect to t
6. Multiply results from 4 and 5
7. Add results from 3 and 6
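
The seven steps above can be sketched in Python for the example z = x^3y + 3x^2 + y^3 with x = 4 + t^2 and y = 5t^3. The helper names (z, x_of, y_of, dz_dt_chain, dz_dt_numeric) are chosen for this sketch only, and the finite-difference check is an added sanity test rather than part of the method.

```python
def z(x, y):
    return x**3 * y + 3 * x**2 + y**3


def x_of(t):
    return 4 + t**2


def y_of(t):
    return 5 * t**3


def dz_dt_chain(t):
    """dz/dt from the chain rule, following the seven steps."""
    x, y = x_of(t), y_of(t)
    dz_dx = 3 * x**2 * y + 6 * x  # step 1: partial of z w.r.t. x (y constant)
    dx_dt = 2 * t                 # step 2: derivative of x w.r.t. t
    dz_dy = x**3 + 3 * y**2       # step 4: partial of z w.r.t. y (x constant)
    dy_dt = 15 * t**2             # step 5: derivative of y w.r.t. t
    return dz_dx * dx_dt + dz_dy * dy_dt  # steps 3, 6 and 7


def dz_dt_numeric(t, h=1e-6):
    """Central-difference estimate of dz/dt, used only as a cross-check."""
    def composed(s):
        return z(x_of(s), y_of(s))
    return (composed(t + h) - composed(t - h)) / (2 * h)


print(dz_dt_chain(1))    # 3810
print(dz_dt_numeric(1))  # close to 3810
```

At t = 1 both x and y equal 5, so the two products are 405 * 2 = 810 and 200 * 15 = 3000, which add to 3810.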

Higher Order Partial Derivatives - partial derivatives taken more than once; when the successive derivatives are taken with respect to different variables, the result is called a “mixed partial derivative”. In the case of 2 (or more) variables, the partial derivative is first taken with respect to one variable. Then, the partial derivative of that result is taken (a second derivative), this time with respect to the other variable. Example: ∂f/∂x (or fx) is the partial derivative of f with respect to x, and ∂/∂y(∂f/∂x) (or fxy) is the partial derivative of fx with respect to y

Clairaut's Theorem - symmetry of second derivatives; this theorem tells us that the mixed partial derivatives are equal regardless of the order of differentiation, provided they are continuous at the point
▸ fxy(a,b) = fyx(a,b)
▸ fxyz = fyxz = fzxy (each variable was differentiated once)
▸ fyyx = fyxy = fxyy (x was differentiated once; y was differentiated twice)

Example: f = x^2y^3 + 4xy^2

fx = 2xy^3 + 4y^2
fxy = 6xy^2 + 8y
fy = 3x^2y^2 + 8xy
fyx = 6xy^2 + 8y
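
To see the equality of the mixed partials numerically, here is a rough sketch that differentiates f = x^2y^3 + 4xy^2 by central differences in both orders and compares against the hand-computed 6xy^2 + 8y. The helpers d_dx and d_dy and the step size h are assumptions of this sketch.

```python
def f(x, y):
    return x**2 * y**3 + 4 * x * y**2


def fxy_exact(x, y):
    # Hand-computed mixed partial from the example: 6xy^2 + 8y
    return 6 * x * y**2 + 8 * y


def d_dx(g, h=1e-4):
    """Return a function approximating the partial derivative of g w.r.t. x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, h=1e-4):
    """Return a function approximating the partial derivative of g w.r.t. y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)


f_xy = d_dy(d_dx(f))  # differentiate in x first, then in y
f_yx = d_dx(d_dy(f))  # differentiate in y first, then in x

point = (2.0, 3.0)
print(f_xy(*point), f_yx(*point), fxy_exact(*point))  # all close to 132
```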

Gradient, Divergence, Curl

Gradient (∇) - like the derivative, the gradient measures how steep a slope is; it denotes the direction of the greatest change of a scalar function

While a multi-dimensional function f(x, y, z) has partial derivatives, the gradient of a function ∇f(x) is the collection of all its partial derivatives into a vector; i.e. a full derivative of a multivariable function
∇ is an upside-down delta sign called “del”, indicating change; or also called “nabla”; nabla refers to the symbol itself, and del refers to the operator it represents
Gradient points in the direction of the greatest increase or steepest ascent; not to be confused with coordinates, which indicate location and not direction; e.g. if ∇f(x, y, z) = (5, 10, 81), then f increases fastest by moving mostly in the z direction (the largest component); at a point where the gradient equals zero the function is flat (e.g. at a maximum or minimum)

Scalar - a quantity having only magnitude (size), not direction; e.g. temperature

Scalar valued function - a function that takes one or more values but returns a single value; e.g. f(x,y,z) = x^2+2yz^5

Vector - a quantity having direction as well as magnitude, especially as determining the position of one point in space relative to another; e.g. velocity which is speed + direction

Vector valued function - an extension of scalar functions, but unlike scalar functions, the vector function takes one or more values and returns multiple values; a function where the domain is a subset of the real numbers and the range is a set of vectors; e.g. ∇f(x,y,z) would output three values, one for each dimension of a three-dimensional space

Divergence - measure of the quantity of flux (density) emanating from any point of the vector field

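Formula (two dimensions) is ∇ ⋅ F = ∂F(1)/∂x + ∂F(2)/∂y, where F(1) and F(2) are the x- and y-components of F
Uses the same ∇ symbol as the gradient, followed by a dot: ∇ ⋅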
Divergence is a vector operator that operates on a vector field, producing a scalar field giving the quantity of the vector field's source at each point

Zero-divergence - when ∇ ⋅ F is zero, it means the density stays constant (i.e. fluid flows freely)

Sink - negative divergence; more dense after a momentary fluid motion

Source - positive divergence; less dense during a momentary motion

The hat symbol (^) over i and j indicates a unit vector: î points in the x-direction and ĵ points in the y-direction. So a field written as F = (x^2 - y^2)î + 2xyĵ has the component (x^2 - y^2) in the x-direction and 2xy in the y-direction.

Curl - measures the “rotation” in the vector field

Formula (two dimensions) is ∇ × F = ∂F(2)/∂x - ∂F(1)/∂y
Uses the same ∇ symbol as the gradient, followed by a cross: ∇ ×
Positive number suggests counter-clockwise direction, while negative number suggests clockwise direction
Typically used in three-dimensional fields
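
As a rough numerical sketch, the two-dimensional divergence and curl can be estimated for the field with (x^2 - y^2) in the x-direction and 2xy in the y-direction. By hand, the divergence is 2x + 2x = 4x and the curl is 2y - (-2y) = 4y. The helper names (F, partial, divergence, curl) and the step size h are assumptions of this sketch.

```python
def F(x, y):
    """Vector field: component 0 is the x-direction, component 1 the y-direction."""
    return (x**2 - y**2, 2 * x * y)


def partial(component, var, x, y, h=1e-5):
    """Central-difference partial derivative of one component of F."""
    if var == "x":
        return (F(x + h, y)[component] - F(x - h, y)[component]) / (2 * h)
    return (F(x, y + h)[component] - F(x, y - h)[component]) / (2 * h)


def divergence(x, y):
    # dF(1)/dx + dF(2)/dy
    return partial(0, "x", x, y) + partial(1, "y", x, y)


def curl(x, y):
    # dF(2)/dx - dF(1)/dy
    return partial(1, "x", x, y) - partial(0, "y", x, y)


print(divergence(1.0, 2.0))  # close to 4  (4x at x = 1)
print(curl(1.0, 2.0))        # close to 8  (4y at y = 2)
```

At the point (1, 2) the divergence is positive (a source) and the curl is positive (counter-clockwise rotation).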

Integrals

Integral - or the antiderivative; assigns numbers to functions in a way that describes displacement, area, volume, and other concepts that arise by combining infinitesimal (very small) data

Example: f(x) = x^2
Differentiation: f’(x) = 2x = g(x)
Integration: ∫g(x) dx = x^2 + C (i.e. the opposite of the derivative)

Integration - the process of finding the integral or area under the curve; useful for measuring accumulation over time

Integrand - the function that is to be integrated

Definite integral - has a lower and upper limit, thus resulting in a constant result; this formula gives a number (or area)

Indefinite integral - no limits are applied and a mandatory arbitrary constant “C” is applied; this formula gives a function
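
To make the difference concrete, here is a small sketch using g(x) = 2x. The indefinite integral is the function x^2 + C, while the definite integral from 0 to 3 is the number 9. The midpoint-rule helper midpoint_integral is an illustrative choice for approximating the area, not the only way to do it.

```python
def g(x):
    return 2 * x


def G(x):
    """One antiderivative of g (the indefinite integral with C = 0)."""
    return x**2


def midpoint_integral(func, a, b, n=1000):
    """Approximate the area under func on [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


area_numeric = midpoint_integral(g, 0.0, 3.0)
area_exact = G(3.0) - G(0.0)  # the constant C cancels in the subtraction
print(area_numeric, area_exact)  # both close to 9
```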

Rules when finding the integral:

Why add “C”? When initially differentiating a function, the constant disappears because the derivative of any constant is zero. Since integration works backward and one does not know if there was a constant involved initially, this “C” is added as a placeholder.

Integration by Parts - (or partial integration) the process that finds the integral of a product of functions in terms of the integral of the product of their derivative and antiderivative.

The formula is ∫u v dx = u∫v dx −∫u' (∫v dx) dx
Or (shortened) ∫u dv = uv - ∫v du
When determining which function to assign to ‘u’ go in this order:
1. inverse trigonometric functions (sin^-1(x))
2. logarithmic functions (ln(x))
3. algebraic functions (x^3)
4. trigonometric functions (cos(x))
5. exponential functions (e^x)
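
As a worked sketch of the rule, take ∫x e^x dx. Following the ordering above, the algebraic factor is u = x and dv = e^x dx, so du = dx and v = e^x, giving x e^x - e^x + C. The code below checks that result on the interval [0, 1] against a numerical estimate; the Simpson's rule helper is an assumption of this sketch.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def antiderivative(x):
    # uv - ∫v du with u = x and v = e^x
    return x * math.exp(x) - math.exp(x)


numeric = simpson(lambda x: x * math.exp(x), 0.0, 1.0)
by_parts = antiderivative(1.0) - antiderivative(0.0)
print(numeric, by_parts)  # both close to 1
```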

Limits

Limit - the value that a function approaches as the input approaches some value; describes how a function behaves near a point (instead of at that point exactly)

L'Hospital's Rule - if substituting the limit point gives an indeterminate form (e.g., 0/0 or ∞/∞), then this rule tells us to differentiate the numerator and differentiate the denominator separately and then take the limit; lim x→c f(x)/g(x) = lim x→c f’(x)/g’(x), provided the limit on the right exists
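
A small illustration, using the standard example lim x→0 sin(x)/x (chosen for this sketch): substituting x = 0 gives 0/0, and differentiating top and bottom gives cos(x)/1, which equals 1 at x = 0. The code compares that value with the original ratio evaluated closer and closer to 0.

```python
import math


def ratio(x):
    """The original quotient f(x)/g(x) with f = sin and g(x) = x."""
    return math.sin(x) / x


# Ratio of the derivatives, f'(x)/g'(x) = cos(x)/1, evaluated at the limit point
lhopital_value = math.cos(0.0) / 1.0

for x in (0.1, 0.01, 0.001):
    print(x, ratio(x))  # values approach lhopital_value

print(lhopital_value)  # 1.0
```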

Lines & Curves

Tangent line - a line that touches a curve at one point; its slope is the instantaneous rate of change at that single point in the curve

Secant line - a line that connects two points in a curve; its slope is the average rate of change between the two points of the curve

choose x values just before and after the x used to calculate the tangent line
you want to understand how the secant slope behaves as the two points get closer together and it approaches the tangent slope (reference the definition of limit)
use formula (y2 - y1) / (x2 - x1)

Example: f(x) = x^3

Derivative ▸ f’(x) = 3x^2 and f’’(x) = 6x
Slope of tangent line at x = 2 ▸ mtan = f’(2) = 12
Slope of secant line at [1, 3] ▸ msec = (y2 - y1) / (x2 - x1) = (f(3) - f(1)) / (3 - 1) = 13
Slope of secant line at [1.9, 2.1] ▸ msec = 12.01
lim x→2 (f(x) - f(2)) / (x - 2)
lim x→2 (x^3 - 8) / (x - 2) (factor difference of cubes)
lim x→2((x - 2)(x^2 + 2x + 4))/(x - 2) (cancel out the (x - 2))
lim x→2 (x^2 + 2x + 4) = 12
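
The example above can be replayed in a few lines: the secant slope over a shrinking interval around x = 2 approaches the tangent slope 12. The function names secant_slope and tangent_slope are made up for this sketch.

```python
def f(x):
    return x**3


def secant_slope(x1, x2):
    """Average rate of change of f between x1 and x2."""
    return (f(x2) - f(x1)) / (x2 - x1)


def tangent_slope(x):
    """Instantaneous rate of change, from f'(x) = 3x^2."""
    return 3 * x**2


print(secant_slope(1, 3))  # 13.0
for h in (0.1, 0.01, 0.001):
    # Interval [2 - h, 2 + h] shrinks around x = 2
    print(h, secant_slope(2 - h, 2 + h))

print(tangent_slope(2))  # 12
```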

Maxima / Minima
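Points where the function reaches a highest or lowest value; at a smooth maximum or minimum the slope of the tangent line to the function is zero. Local maxima/minima refer to a particular interval, while the absolute maximum/minimum is the highest/lowest value over the entire domain. There can be multiple local maxima/minima but only one absolute maximum value and one absolute minimum value.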

Steps to find the local min/max:

Set the derivative equal to zero and solve for x to get the critical value(s)
Plug x values slightly lower and slightly higher than each critical value into f’(x) to figure out the direction of the slope (negative to positive means minimum, positive to negative means maximum).
Alternatively, since the slope is zero at the critical point x = c (f’(c) = 0), find the second derivative f’’(x): f’’(c) > 0 means minimum, f’’(c) < 0 means maximum, and f’’(c) = 0 is inconclusive.
Plug in the min/max x value into the original function f(x) to find the y-value
Final answer would be that the local min/max is located at (x, y)
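
The steps above can be sketched for a sample function, f(x) = x^3 - 3x (chosen for this illustration). Its derivative 3x^2 - 3 is zero at x = -1 and x = 1, and the sign of f’(x) on either side of each critical value decides the type. The helper classify and the offset eps are assumptions of this sketch.

```python
def f(x):
    return x**3 - 3 * x


def f_prime(x):
    return 3 * x**2 - 3


def f_second(x):
    return 6 * x


critical_points = [-1.0, 1.0]  # solutions of 3x^2 - 3 = 0, found by hand


def classify(c, eps=0.1):
    """Sign test on f'(x) just left and right of the critical value c."""
    left, right = f_prime(c - eps), f_prime(c + eps)
    if left > 0 > right:
        kind = "local maximum"
    elif left < 0 < right:
        kind = "local minimum"
    else:
        kind = "neither"
    return kind, (c, f(c))


for c in critical_points:
    # f_second(c) gives the same verdict: negative -> maximum, positive -> minimum
    print(classify(c), f_second(c))
```

For this function the local maximum is at (-1, 2) and the local minimum is at (1, -2).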

Gamma & Beta Functions

Factorial - denoted by (!); the product of all positive integers less than or equal to n; exception for zero 0! = 1; formula is n! = n(n - 1)(n - 2)…1

Gamma Function (Γ) - also “Euler's integral of the second kind”; the analytic continuation (a technique to extend the domain of definition of a given analytic function) of the factorial, satisfying Γ(n) = (n−1)! for positive integers n and Γ(z+1) = zΓ(z) (the recurrence follows from integration by parts)
But why extend? Factorials are only defined for non-negative integers (a discrete set), and the Gamma function generalizes the factorial to real and complex numbers (a continuous set)

Formula is Γ(z) = ∫ x^(z-1) e^(-x) dx, integrated from x = 0 to ∞ (valid when the real part of z is positive)
Or also Γ(z+1) = ∫ x^z e^(-x) dx, integrated from x = 0 to ∞
When z is a positive integer Γ(z) = (z - 1)!, or equivalently Γ(z+1) = z!
Defined for all complex numbers except for non-positive integers; e.g. accepted inputs are 3, 4.5, -7.2
Interpolates the factorial function to non-integer values; i.e., it fills in the gaps between the factorial points
useful for defining several probability distributions, such as Gamma distribution, Chi-squared distribution, Student's t-distribution, and Beta distribution; these distributions are used for Hypothesis Testing, Bayesian Analysis, building statistical models like LDA (Latent Dirichlet Allocation), and in stochastic processes.
useful for modeling situations involving continuous change
useful in identities and proofs in analytic contexts
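
These properties can be checked with math.gamma from Python's standard library, which covers real inputs only. The helper is_pole is made up for this sketch.

```python
import math

# Γ(n) = (n - 1)! for positive integers n
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Non-integer inputs, including negative ones, are accepted
print(math.gamma(4.5), math.gamma(-7.2))

# Γ(1/2) = sqrt(pi), a standard closed-form value between the factorial points
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

# The recurrence Γ(z+1) = zΓ(z) also holds between the integers
z = 4.5
assert math.isclose(math.gamma(z + 1), z * math.gamma(z))


def is_pole(x):
    """True if math.gamma rejects x (it raises ValueError at 0, -1, -2, ...)."""
    try:
        math.gamma(x)
    except ValueError:
        return True
    return False


print(is_pole(0), is_pole(-3), is_pole(-7.2))  # True True False
```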

Beta Function (β) - also “Euler's integral of the first kind”; a function of two positive inputs defined by an integral over [0, 1], closely related to the Gamma function and used in integral equations and many other mathematical operations

Formula is β(x, y) = ∫ t^(x - 1) (1 - t)^(y - 1) dt, integrated from t = 0 to 1
Expressed as β(x, y) where x and y are real numbers greater than zero
Symmetric function β(x, y) = β(y, x)
Gamma is a one-variable function and Beta is a two-variable function
Relationship to Gamma: β(x, y) = Γ(x)Γ(y) / Γ(x + y)
useful for modeling portfolio management through the preferential attachment process
a component of beta distribution, which is a dynamic, continuously updated probability distribution with two parameters; useful for machine learning when modeling uncertainty about the probability of success of a given experiment
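
The relationship to Gamma and the symmetry property can be checked numerically. This sketch computes β(2, 3) two ways: from Γ(x)Γ(y) / Γ(x + y), and by a midpoint-rule approximation of the defining integral. Both function names and the choice n = 10000 are assumptions of this sketch.

```python
import math


def beta_via_gamma(x, y):
    """β(x, y) = Γ(x)Γ(y) / Γ(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def beta_via_integral(x, y, n=10_000):
    """Midpoint-rule estimate of the integral of t^(x-1) (1-t)^(y-1) on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += t ** (x - 1) * (1 - t) ** (y - 1)
    return total * h


print(beta_via_gamma(2, 3))     # Γ(2)Γ(3)/Γ(5) = 1 * 2 / 24 = 1/12
print(beta_via_integral(2, 3))  # close to 1/12
print(beta_via_gamma(3, 2))     # symmetry: same value as β(2, 3)
```

The midpoint rule works well here because the integrand is smooth for x, y ≥ 1; for inputs below 1 the integrand blows up at an endpoint and a more careful method would be needed.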

Graph comparison: the factorial defines only the top-right portion of the graph, while the gamma function expands the domain

Side note: factoring

Factoring Difference of Squares
a^2 - b^2 = (a + b)(a - b)

Factoring Difference and Sum of Cubes
a^3 - b^3 = (a - b)(a^2 + ab + b^2)
a^3 + b^3 = (a + b)(a^2 - ab + b^2)

Factoring Trinomials
f(x) = ax^2 + bx + c
when a = 1

find two values (d and e) that ADD to ‘b’ and MULTIPLY to ‘c’
factor to (x + d)(x + e)

when a ≠ 1

multiply a × c
find two values (d and e) that MULTIPLY to (a × c) and ADD to ‘b’, then replace ‘b’
ax^2 + bx + c = ax^2 + dx + ex + c
split into two groups and factor each so that they share a binomial: (ax^2 + dx) + (ex + c) = px(qx + f) + g(qx + f)
factor out the shared binomial (the distributive method in reverse): (px + g)(qx + f)
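
The search for d and e can be sketched as a brute-force loop over integers. The function name split_middle_term is made up for this example, and it only handles integer coefficients.

```python
def split_middle_term(a, b, c):
    """Find integers d, e with d + e == b and d * e == a * c, or None."""
    target = a * c
    for d in range(-abs(target), abs(target) + 1):
        e = b - d
        if d * e == target:
            return d, e
    return None  # no integer split exists, so this method does not apply


# a = 1: x^2 + 5x + 6 -> d, e = 2, 3 -> (x + 2)(x + 3)
print(split_middle_term(1, 5, 6))   # (2, 3)

# a != 1: 6x^2 + 11x + 3 -> a * c = 18, d, e = 2, 9
# 6x^2 + 2x + 9x + 3 = 2x(3x + 1) + 3(3x + 1) = (2x + 3)(3x + 1)
print(split_middle_term(6, 11, 3))  # (2, 9)

# x^2 + x + 1 has no integer split
print(split_middle_term(1, 1, 1))   # None
```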
