Free Cheatsheet · SL 4.7–4.9 · AHL 4.14–4.18

IB Math AA HL Probability Distributions — Complete Cheatsheet

Every formula, GDC syntax, and exam trap for IB Mathematics Analysis & Approaches HL Probability Distributions — Discrete RVs, the Binomial distribution, the Normal distribution, standardisation, and combined Binomial-Normal patterns. Hand-built by an IBO-certified Singapore tutor with 15+ years of IB experience.

Topic: Probability Distributions · Syllabus: SL 4.7–4.9, AHL 4.14–4.18 · Read time: ~16 minutes · Last updated: Apr 2026

Probability Distributions is the largest single topic in IB Mathematics Analysis & Approaches HL — a single Section B question can carry 14 to 22 marks, and combined Binomial-Normal questions appear on almost every Paper 2. The traps are silent: writing $N(\mu, \sigma)$ instead of $N(\mu, \sigma^2)$, multiplying instead of squaring in $\text{Var}(aX + b)$, rounding the intermediate $p$ between sub-parts, or forgetting that "at least 1" should be solved by complement. The GDC saves you nothing if your distribution statement is wrong.

This cheatsheet condenses every formula and GDC trick from SL 4.7–4.9 and AHL 4.14–4.18 into one revisable page. It is organised into three major blocks — a general intro to discrete and continuous random variables, a full treatment of the Binomial distribution, and a full treatment of the Normal distribution — and finishes with the combined Binomial-Normal exam pattern that defines HL Section B. Scroll to the bottom for the printable PDF and gated full library.

§1 — Discrete & Continuous Random Variables SL 4.7–4.8

Discrete vs continuous: discrete has $\mathbb{P}(X=x)$ at points; continuous has area under $f(x)$ for intervals.

Discrete random variable

Total prob: $\displaystyle\sum P(X = x) = 1$
Expectation: $E(X) = \sum x \cdot P(X = x)$
Variance: $\text{Var}(X) = E(X^2) - [E(X)]^2$
Linear E: $E(aX + b) = a E(X) + b$
Linear Var: $\text{Var}(aX + b) = a^2 \text{Var}(X)$
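These identities can be checked with a minimal Python sketch; the three-value distribution below is invented for illustration.

```python
# Sketch: E(X), Var(X), and the linear rules, for a small hypothetical
# discrete distribution (values and probabilities are invented).
dist = {1: 0.2, 2: 0.5, 3: 0.3}                  # P(X = x)
assert abs(sum(dist.values()) - 1) < 1e-12       # total probability = 1

E = sum(x * p for x, p in dist.items())          # E(X) = sum x P(X = x)
E2 = sum(x * x * p for x, p in dist.items())     # E(X^2)
Var = E2 - E**2                                  # Var(X) = E(X^2) - [E(X)]^2

a, b = -5, 2                                     # Y = 2 - 5X: a < 0 is still squared
E_Y = a * E + b                                  # E(aX + b) = aE(X) + b
Var_Y = a**2 * Var                               # Var(aX + b) = a^2 Var(X)

print(round(E, 2), round(Var, 2))        # 2.1 0.49
print(round(E_Y, 2), round(Var_Y, 2))    # -8.5 12.25
```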

Continuous random variable (PDF)

Total prob: $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$
Interval prob: $P(a \le X \le b) = \displaystyle\int_a^b f(x)\,dx$
Mode: maximum of $f(x)$
Median $m$: $\displaystyle\int_{-\infty}^m f(x)\,dx = \tfrac{1}{2}$
Point prob: $P(X = k) = 0$   (always, for continuous)
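These facts can be checked numerically. The sketch below assumes a hypothetical PDF $f(x) = 2x$ on $[0, 1]$, for which $P(a \le X \le b) = b^2 - a^2$ and the median solves $m^2 = \tfrac{1}{2}$.

```python
import math

# Sketch: total probability, interval probability, and the median for a
# hypothetical continuous PDF f(x) = 2x on [0, 1] (zero elsewhere).
def f(x: float) -> float:
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 1)           # should be ~1
interval = integrate(f, 0.3, 0.6)    # P(0.3 <= X <= 0.6) = 0.6^2 - 0.3^2
m = 1 / math.sqrt(2)                 # median: solves m^2 = 1/2
half = integrate(f, 0, m)            # area up to the median, ~0.5

print(round(total, 4), round(interval, 4), round(half, 4))  # 1.0 0.27 0.5
```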

Discrete vs Continuous

| Discrete | Continuous |
| --- | --- |
| $P(X = r)$ is meaningful | $P(X = r) = 0$ always |
| Use Binomial for counts of successes | Use Normal for measurements |
| $P(X < k) \neq P(X \le k)$ | $P(X < k) = P(X \le k)$ |
Trap: Don't confuse $P(X \le r)$ (CDF) with $P(X = r)$ (PDF). For continuous variables, individual point probabilities are always zero: use intervals.

§2 — Binomial Distribution: BINS & Notation SL 4.8, AHL 4.14

BINS checklist

  • B — Fixed $n$ trials.
  • I — Independent trials.
  • N — 2 outcomes only (success/failure).
  • S — Same $p$ each trial.

All four conditions must hold for the binomial model to apply.

Notation

  • $X \sim B(n, p)$.
  • $n$ = number of trials.
  • $p$ = probability of success.
  • $X$ = number of successes; $X = 0, 1, 2, \ldots, n$.
Note: State the BINS conditions explicitly in your working to earn the M1 mark. Write "Let $X \sim B(n, p)$" before any GDC calculation.

§3 — Binomial PDF, Mean & Variance SL 4.8, AHL 4.14

PDF formula

$$P(X = r) = \binom{n}{r} p^r (1 - p)^{n-r} = \dfrac{n!}{r!(n - r)!}\, p^r q^{n-r}, \quad q = 1 - p$$

Mean and variance

Mean: $E(X) = np$
Variance: $\text{Var}(X) = np(1 - p)$
SD: $\text{SD}(X) = \sqrt{np(1 - p)}$
Mode: $\approx \lfloor (n + 1) p \rfloor$
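The PDF formula and the mean/variance identities can be verified by brute force; the sketch below uses a hypothetical $X \sim B(12,\ 0.3)$.

```python
import math

# Sketch: the binomial PDF and its mean/variance identities, checked by
# brute force for a hypothetical X ~ B(12, 0.3).
n, p = 12, 0.3

def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = C(n, r) p^r (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

pmf = [binom_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * q for r, q in enumerate(pmf))
var = sum(r * r * q for r, q in enumerate(pmf)) - mean**2

print(round(sum(pmf), 6))                          # 1.0  (total probability)
print(round(mean, 6), round(n * p, 6))             # 3.6 3.6  (E(X) = np)
print(round(var, 6), round(n * p * (1 - p), 6))    # 2.52 2.52 (Var = np(1-p))
```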

§4 — GDC & CDF Translation SL 4.8

GDC syntax

  • TI-84 exact: binompdf(n,p,r) $= P(X = r)$.
  • TI-84 cumulative: binomcdf(n,p,r) $= P(X \le r)$.
  • $P(X \ge r)$: 1 - binomcdf(n,p,r-1).
  • Casio: STAT $\to$ DIST $\to$ BINM. Bpd: exact. Bcd: cumulative.

CDF phrases — translation table

| Phrase | Notation |
| --- | --- |
| at most $r$ | $P(X \le r)$ |
| at least $r$ | $1 - P(X \le r - 1)$ |
| more than $r$ | $1 - P(X \le r)$ |
| fewer than $r$ | $P(X \le r - 1)$ |
| exactly $r$ | $P(X = r)$ |
| between $a$ and $b$ (inclusive) | $P(X \le b) - P(X \le a - 1)$ |
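The translation table can be encoded directly. This is a sketch assuming a hypothetical $X \sim B(10,\ 0.4)$, with a helper `cdf` playing the role of binomcdf.

```python
import math

# Sketch: the phrase-to-notation table as code, for a hypothetical
# X ~ B(10, 0.4); cdf(r) = P(X <= r) stands in for binomcdf.
n, p = 10, 0.4
pmf = [math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

def cdf(r: int) -> float:
    return sum(pmf[: r + 1])         # P(X <= r); cdf(-1) = 0

at_most    = lambda r: cdf(r)            # "at most r"
at_least   = lambda r: 1 - cdf(r - 1)    # "at least r"
more_than  = lambda r: 1 - cdf(r)        # "more than r"
fewer_than = lambda r: cdf(r - 1)        # "fewer than r"
exactly    = lambda r: pmf[r]            # "exactly r"
between    = lambda a, b: cdf(b) - cdf(a - 1)  # "between a and b", inclusive

# Sanity checks: complement pairs and the inclusive "between".
print(abs(at_least(3) + fewer_than(3) - 1) < 1e-12)   # True
print(abs(between(0, n) - 1) < 1e-12)                 # True
```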

Conditional probability for binomial

$$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$

Typical: $P(X = r \mid X \le k) = \dfrac{P(X = r)}{P(X \le k)}$ — only valid when $r \le k$.

Trick: "At least 1" questions almost always use the complement: $P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - p)^n$. This avoids summing many CDF terms.

§5 — Finding Minimum $n$ AHL 4.14

Method 1 — algebraic

  1. Translate "$P(X \ge 1) > k$" to "$P(X = 0) < 1 - k$".
  2. Substitute: $(1 - p)^n < 1 - k$.
  3. Take logs and divide by $\ln(1 - p)$, which is negative, so the inequality flips: $n > \dfrac{\ln(1 - k)}{\ln(1 - p)}$.
  4. Round UP to the next integer. Verify by substituting back.
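Method 1 can be sketched in Python, assuming a hypothetical $p = 0.0548$ and target $P(X \ge 1) > 0.99$:

```python
import math

# Sketch of Method 1 for the minimum n, with hypothetical p = 0.0548 and
# target P(X >= 1) > 0.99, i.e. (1 - p)^n < 0.01.
p, k = 0.0548, 0.99

n_exact = math.log(1 - k) / math.log(1 - p)   # both logs negative, ratio > 0
n_min = math.ceil(n_exact)                    # round UP to the next integer

# Verify by substituting back: n_min satisfies the inequality, n_min - 1 fails.
assert (1 - p) ** n_min < 1 - k
assert (1 - p) ** (n_min - 1) >= 1 - k

print(round(n_exact, 2), n_min)   # 81.71 82
```

The two asserts are the "verify by substituting back" step from Method 2: they confirm the boundary is in the right place.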

Method 2 — GDC table/graph

  1. Set $Y_1 = $ binomcdf(X, p, r), $Y_2 = $ target.
  2. GRAPH $\to$ CALC $\to$ Intersect, OR use TABLE with $\Delta\text{Tbl} = 1$ and scroll.
  3. Always verify by substituting back.
Trap: When solving $np(1 - p) = k$ for $p$, remember the quadratic has two solutions: $p = \dfrac{n \pm \sqrt{n^2 - 4nk}}{2n}$. Give both answers unless the context restricts $p$.

§6 — Linear Transformations & Finding $p$ AHL 4.14

Linear transformations $Y = aX + b$

E: $E(Y) = a E(X) + b$
Var: $\text{Var}(Y) = a^2 \text{Var}(X)$

Adding $b$ shifts the expectation but does not change the variance. The coefficient $a$ is always squared, even if $a < 0$. Example: $Y = 2 - 5X$, so $a = -5$, $b = 2$. $E(Y) = 2 - 5 E(X)$ and $\text{Var}(Y) = (-5)^2 \text{Var}(X) = 25 \text{Var}(X)$.

Finding $p$ from mean / variance

Given $E(X) = np = m$ and $\text{Var}(X) = np(1 - p)$:

$$1 - p = \dfrac{\text{Var}(X)}{E(X)} = \dfrac{np(1 - p)}{np}, \quad \text{then } n = \dfrac{E(X)}{p}$$

Trick: If both $E(X)$ and $\text{Var}(X)$ are given, divide: $\dfrac{\text{Var}(X)}{E(X)} = 1 - p$. This avoids the quadratic entirely.
Trap: $\text{Var}(aX + b) = a^2 \text{Var}(X)$, not $a \cdot \text{Var}(X)$. The coefficient is always squared.
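The divide-don't-solve trick in code, using hypothetical given values $E(X) = 3.6$ and $\text{Var}(X) = 2.52$:

```python
# Sketch: recovering p and n from the mean and variance, using the
# hypothetical given values E(X) = 3.6 and Var(X) = 2.52.
E, Var = 3.6, 2.52

p = 1 - Var / E    # Var/E = 1 - p, so no quadratic is needed
n = E / p          # then n = E(X)/p

print(round(p, 6), round(n, 6))   # 0.3 12.0
```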

§7 — False Binomials & Combined Patterns AHL 4.14–4.18

False binomials

  • First success: Geometric distribution, $P(X = k) = (1 - p)^{k - 1} p$.
  • Without replacement, small $N$: Hypergeometric (not in syllabus).
  • Dependent trials: Neither binomial nor geometric.
  • Exception: if $N$ is large and you sample without replacement, the distribution is approximately Binomial.

Normal $\to$ Binomial pattern

  1. Stage 1: use the Normal distribution to find $p = P(\text{item meets condition})$.
  2. Stage 2: a sample of $n$ items, $X \sim B(n, p)$.
  3. Trigger phrase: "A sample of $n$ is selected...".
  4. Carry the exact GDC value for $p$ — never round between parts.

§8 — Normal Distribution Properties SL 4.9, AHL 4.14

Normal bell curve: symmetric about the mean $\mu$, total area $= 1$.

$X \sim N(\mu, \sigma^2)$. Key properties:

  • Symmetric about $\mu$.
  • Mean $=$ Median $=$ Mode $= \mu$.
  • $P(X = k) = 0$ always (continuous).
  • Total area under the curve $= 1$.
  • Inflection points at $\mu \pm \sigma$.
  • Asymptotic to the $x$-axis.
  • $\sigma \uparrow$: wider; $\sigma \downarrow$: narrower.
Trap: Write $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ (variance), not $\sigma$ (standard deviation). This is a common notation error that loses marks.

§9 — Empirical Rule (P1, no GDC) SL 4.9

Empirical rule: 68% within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, 99.7% within $\mu \pm 3\sigma$.

The 68/95/99.7 rule — MEMORISE

  • $P(\mu - \sigma < X < \mu + \sigma) \approx 68\%$
  • $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 95\%$
  • $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.7\%$

Derived values (one-tail):

  • $P(X > \mu + \sigma) \approx 16\%$
  • $P(X > \mu + 2\sigma) \approx 2.5\%$
  • $P(X > \mu + 3\sigma) \approx 0.15\%$
  • $P(\mu < X < \mu + \sigma) \approx 34\%$
  • $P(\mu + \sigma < X < \mu + 2\sigma) \approx 13.5\%$
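The 68/95/99.7 figures (and the one-tail values derived from them) can be verified with Python's stdlib `statistics.NormalDist`, which computes the same quantity as the GDC's normalcdf:

```python
from statistics import NormalDist

# Sketch: verifying the empirical-rule figures on the standard normal.
Z = NormalDist(0, 1)

def within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) on the standard scale."""
    return Z.cdf(k) - Z.cdf(-k)

print(round(within(1), 4))      # 0.6827  (the "68%")
print(round(within(2), 4))      # 0.9545  (the "95%")
print(round(within(3), 4))      # 0.9973  (the "99.7%")
print(round(1 - Z.cdf(1), 4))   # 0.1587  (one-tail "about 16%")
```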

Symmetry rules (Paper 1, no GDC)

  • $P(X < \mu) = 0.5$.
  • $P(X < \mu - k) = P(X > \mu + k)$.
  • $P(\mu < X < \mu + k) = P(\mu - k < X < \mu)$.
  • Same $\sigma$, different $\mu$: $b - \mu_X = c - \mu_Y \Rightarrow P(X > b) = P(Y > c)$.
Trick: On Paper 1 (no GDC), identify $\mu \pm k\sigma$ values first, then apply the empirical rule and symmetry. This is faster than trying to derive probabilities from scratch.

§10 — Standardisation & $z$-scores AHL 4.14

If $X \sim N(\mu, \sigma^2)$, then:

$$Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$$

  • Interpretation 1: $z$ = number of standard deviations above the mean. $z > 0$: above; $z < 0$: below.
  • Interpretation 2: $z$ = position on the standard scale (mean 0, SD 1).
  • "$k$ SDs above mean" $\Rightarrow z = k \Rightarrow X = \mu + k\sigma$.
Note: Standardisation converts any normal distribution to the standard normal $Z \sim N(0, 1)$. This allows comparison of values from different normal distributions using $z$-scores — larger $|z|$ means more standard deviations from the mean.
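Both interpretations in one short sketch, using two hypothetical exam papers $X \sim N(62, 8^2)$ and $Y \sim N(55, 5^2)$ (the numbers are invented for illustration):

```python
from statistics import NormalDist

# Sketch: z-scores as "SDs above the mean", for two hypothetical papers.
def z(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma

zx = z(70, 62, 8)   # 1.0 SD above the mean on paper X
zy = z(64, 55, 5)   # 1.8 SDs above the mean on paper Y: relatively better

# Standardisation maps X ~ N(mu, sigma^2) onto Z ~ N(0, 1):
assert abs(NormalDist(62, 8).cdf(70) - NormalDist(0, 1).cdf(zx)) < 1e-12

print(zx, zy)   # 1.0 1.8
```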

§11 — Normal GDC Skills & Finding $\mu, \sigma$ SL 4.9, AHL 4.14

GDC normal syntax

  • TI-84 forward: normalcdf(a, b, mu, sigma). Use $\pm 10^{99}$ for $\pm\infty$.
  • TI-84 inverse: invNorm(p, mu, sigma) gives $x$ such that $P(X < x) = p$.
  • For $P(X > x) = p$: use $1 - p$ in invNorm.
  • Casio: STAT $\to$ DIST $\to$ NORM. Ncd: forward. InvN: inverse. Enter $\sigma$ before $\mu$!

Finding $\mu$ and $\sigma$

  1. Step 1: from $P(X < a) = p$, get $z = $ invNorm(p, 0, 1).
  2. Step 2: use $\dfrac{a - \mu}{\sigma} = z$.
  3. Two unknowns: two conditions $\to$ two equations, solve simultaneously.
  4. IQR method: $\sigma = \dfrac{\text{IQR}}{2 \times 0.6745}$   ($z_{0.75} = 0.6745$ — memorise).
Trick: When given the IQR, use $\sigma = \text{IQR}/(2 \times 0.6745)$ to find $\sigma$ directly. Then find $\mu$ from any single probability condition.
Trap: On Casio calculators, enter $\sigma$ before $\mu$ — the opposite order to TI-84. This is a common source of error in exams.
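The simultaneous-equation method can be sketched with the stdlib `NormalDist`; the two conditions below ($P(X < 50) = 0.2$, $P(X < 80) = 0.9$) are hypothetical:

```python
from statistics import NormalDist

# Sketch: finding mu and sigma from two hypothetical probability conditions.
a, p1 = 50, 0.2   # P(X < 50) = 0.2
b, p2 = 80, 0.9   # P(X < 80) = 0.9

Z = NormalDist(0, 1)
z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)   # invNorm on the standard scale

# Solve a = mu + z1*sigma and b = mu + z2*sigma simultaneously:
sigma = (b - a) / (z2 - z1)
mu = a - z1 * sigma

# Verify both conditions against the fitted distribution.
X = NormalDist(mu, sigma)
assert abs(X.cdf(a) - p1) < 1e-9 and abs(X.cdf(b) - p2) < 1e-9

print(round(mu, 2), round(sigma, 2))   # 61.89 14.13
```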

§12 — Two-Stage & Comparison Patterns AHL 4.14–4.18

Two-stage pattern: Normal $\to$ Binomial

  1. Find $p = P(\text{condition})$ from the Normal distribution.
  2. Sample of $n$ items: $X \sim B(n, p)$.
  3. Trigger: "a sample of $n$...".
  4. Carry the exact GDC $p$ forward — never round between parts.

Combined exam pattern (typical Section B)

  • Part (a): basic normal probability $\to$ Part (b): find $\sigma$ via invNorm.
  • Part (c): use that $p$ in Binomial $\to$ Part (d): conditional probability or find $n$.

Comparing two distributions

Given $X \sim N(\mu_1, \sigma^2)$ and $Y \sim N(\mu_2, \sigma^2)$ (equal standard deviations):

$$P(X > a) = P(Y > b) \iff \dfrac{a - \mu_1}{\sigma} = \dfrac{b - \mu_2}{\sigma} \iff a - \mu_1 = b - \mu_2$$

Percentile comparison: compute $z_X = \dfrac{x - \mu_X}{\sigma_X}$ and $z_Y = \dfrac{y - \mu_Y}{\sigma_Y}$. Larger $z$ means relatively better performance.

Trick: When you see "a sample of $n$ is selected..." after a normal-distribution question, this is the trigger to switch to a Binomial model with $p$ from the normal calculation.

§13 — Exam Traps & Attack Plan All sections

Top exam traps

  1. $N(\mu, \sigma^2)$: write $\sigma^2$, not $\sigma$.
  2. $\text{Var}(aX + b) = a^2 \text{Var}(X)$, not $a \cdot \text{Var}(X)$.
  3. Both values of $p$ when solving the variance quadratic.
  4. Round UP, not down, for minimum $n$.
  5. "First success" is geometric, not binomial: $(1 - p)^{k - 1} p$.
  6. "At least 1" $= 1 - P(X = 0)$.
  7. Casio: enter $\sigma$ before $\mu$.
  8. Without replacement: check whether $N$ is small.
  9. The 68/95/99.7 rule applies only at $\pm 1, 2, 3\sigma$, not at intermediate multiples.
  10. State the distribution before the GDC step — earns M1.

Attack plan — when you see, do this

| See... | Do... |
| --- | --- |
| "count of" | Binomial |
| "mass / height / time" | Normal |
| "$k$ SDs above" | $X = \mu + k\sigma$ |
| "P1, no GDC" | Symmetry + 68/95/99.7 |
| "find min $n$" | Complement + log |
| "$\mu$ and $\sigma$?" | 2 equations from 2 conditions |
| "conditional probability" | $P(A \cap B) / P(B)$ |
| "first success" | Geometric $(1 - p)^{k - 1} p$ |
| "sample from normal" | 2-stage Normal $\to$ Binomial |
| "IQR given" | $\sigma = \text{IQR} / (2 \times 0.6745)$ |

Notation summary

  • $X \sim B(n, p)$ — Binomial.
  • $X \sim N(\mu, \sigma^2)$ — Normal.
  • $Z \sim N(0, 1)$ — Standard Normal.
  • $E(X) = \mu$, $\text{Var}(X) = \sigma^2$.

Always write the distribution first, then use the GDC — this earns M1 every time.

Worked Example — Combined Normal & Binomial

Question (HL Paper 2 style — 10 marks)

The masses (in grams) of mangoes from a Singapore farm are normally distributed with mean $\mu = 320$ and standard deviation $\sigma = 25$. A mango is classified as "premium" if its mass exceeds 360 g. A box of 12 mangoes is selected at random.

(a) Find the probability that a randomly chosen mango is premium.
(b) Find the probability that exactly 3 mangoes in the box are premium.
(c) Find the probability that at least 1 mango in the box is premium.

Solution

  1. (a) Let $M \sim N(320, 25^2)$. Then $p = P(M > 360) = $ normalcdf(360, 1E99, 320, 25) $\approx 0.0548$.  (M1)(A1)
  2. BINS check: 12 fixed trials, independent (random sample), 2 outcomes (premium / not), same $p$.  (R1)
  3. Let $X \sim B(12, 0.0548)$ (carry the exact GDC value, not the rounded $0.055$).  (M1)
  4. (b) $P(X = 3) = $ binompdf(12, 0.0548, 3) $\approx 0.0218$.  (A1)
  5. (c) Use the complement: $P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - 0.0548)^{12}$.  (M1)
  6. $P(X \ge 1) = 1 - 0.9452^{12} \approx 1 - 0.5085 = 0.492$.  (A1)

Examiner's note: The trap in part (b) is rounding $p$ to $0.055$ between sub-parts, which gives $P(X = 3) \approx 0.0220$ instead of $0.0218$ — close, but lose-the-A1 close on tighter problems. Always carry the unrounded GDC value forward. Also, in part (c), $P(X \ge 1)$ is much faster via the complement than summing $P(X = 1) + P(X = 2) + \ldots + P(X = 12)$.
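As a numeric cross-check, the whole two-stage calculation can be reproduced with stdlib Python standing in for the GDC (same givens: $\mu = 320$, $\sigma = 25$, cutoff 360 g, $n = 12$):

```python
import math
from statistics import NormalDist

# Cross-check of the worked example; NormalDist plays the role of normalcdf.
M = NormalDist(320, 25)
p = 1 - M.cdf(360)                      # (a) P(M > 360)

n = 12
p_exact_3 = math.comb(n, 3) * p**3 * (1 - p)**(n - 3)   # (b) like binompdf
p_at_least_1 = 1 - (1 - p)**n                           # (c) complement

print(round(p, 4))             # 0.0548
print(round(p_exact_3, 4))     # 0.0218
print(round(p_at_least_1, 3))  # 0.492
```

Note that `p` is carried forward at full precision between parts, exactly as the markscheme demands.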

Common Student Questions

When can I model a situation with the Binomial distribution?
Use the BINS checklist: B — fixed number of trials $n$; I — independent trials; N — exactly two outcomes (success / failure); S — same probability $p$ in every trial. All four conditions must hold. State BINS explicitly in your working and then write "Let $X \sim B(n, p)$" before any GDC step — this earns the M1 mark.
Should I write $N(\mu, \sigma)$ or $N(\mu, \sigma^2)$?
Always write $X \sim N(\mu, \sigma^2)$ — the second parameter is the variance, not the standard deviation. This is a common notation slip that loses marks even when the GDC calculation is correct. The same convention is used in the IB formula booklet.
How do I find the minimum $n$ in a binomial "at least one" question?
Use the complement: $P(X \ge 1) > k$ becomes $(1 - p)^n < 1 - k$. Take logs: $n > \dfrac{\ln(1 - k)}{\ln(1 - p)}$. The dividing log is negative, so the inequality flips — careful! Always round UP to the next integer (because $n$ must be a whole number that satisfies the strict inequality), then verify by substituting back into the original CDF.
What is the empirical rule and when is it actually exact?
The empirical rule states $P(\mu - \sigma < X < \mu + \sigma) \approx 68\%$, $P(\text{within } 2\sigma) \approx 95\%$, and $P(\text{within } 3\sigma) \approx 99.7\%$. These are approximations that the IB treats as known facts on Paper 1 (no GDC). They only apply at exactly $\pm 1, 2,$ or $3$ standard deviations from the mean — not at $1.5\sigma$ or $2.5\sigma$.
What does $\text{Var}(aX + b)$ equal — is it $a \cdot \text{Var}(X) + b$?
No. $\text{Var}(aX + b) = a^2 \cdot \text{Var}(X)$. Adding a constant does not change the variance at all — only multiplying does, and the multiplier is squared. So if $Y = 2 - 5X$, then $\text{Var}(Y) = 25 \cdot \text{Var}(X)$, even though the coefficient is negative. This squared-coefficient rule is one of the most heavily tested traps in HL Section A.

Get the printable PDF version

Same cheatsheet, formatted for A4 print — keep it next to your study desk. Free for signed-in users.