Why counting first?
Most probability problems start with "how many ways…". Before you can compute the chance of an event, you need to know how many outcomes there are AND how many are favorable. That's counting – the foundation.
Imagine the question: "What's the probability of being dealt a flush?" To answer that, you need TWO counts – how many flushes are there in a 52-card deck, and how many possible 5-card hands are there in total? Both are counting problems. Probability is just favorable / total.
The four calculators in this family form a sequence. Factorial (n!) is the foundation – the number of ways to arrange n distinct items in a row. Permutations P(n, r) generalise that to "arrange r items from n where order matters". Combinations C(n, r) generalise to "choose r items from n where order does NOT matter". Once you have the counts, probability questions become P = favorable ÷ total.
Two questions to ask yourself for any counting problem: (1) Are we picking ALL items or just some? (2) Does the ORDER of the picked items matter? Those two questions narrow you to factorial, P(n, r), or C(n, r).
Counting comes first. Once you know how many outcomes exist and how many are favorable, probability is a simple division.
Factorial – n!
n! (read "n factorial") is the product of every whole number from 1 to n. It tells you the number of ways to arrange n distinct items in a row.
Definition: n! = n × (n−1) × (n−2) × … × 2 × 1. So 5! = 5 × 4 × 3 × 2 × 1 = 120.
Why does this count arrangements? Imagine you have 5 books and 5 slots on a shelf. The first slot has 5 choices. The second has 4 (one book is already placed). The third has 3, then 2, then 1. Multiply: 5 × 4 × 3 × 2 × 1 = 120 different shelf-orders.
0! is defined as 1, by convention. There is exactly one way to arrange zero items (the empty arrangement). The convention makes downstream formulas like C(n, 0) work cleanly.
Factorial grows fast. 10! is over 3 million. 20! is over 2 quintillion (2,432,902,008,176,640,000). 70! exceeds the number of atoms in the observable universe. The Factorial calculator caps at n = 170 because 171! exceeds the largest value a JavaScript double-precision number can hold, so the result overflows to Infinity.
- 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1
- = 42 × 5 × 4 × 3 × 2 × 1
- = 210 × 4 × 3 × 2 × 1
- = 840 × 3 × 2 × 1
- = 2520 × 2 × 1
- = 5040
- 7! = 5,040 – the number of ways 7 distinct people can stand in a queue.
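The slot-by-slot multiplication above translates directly into a loop. A minimal Python sketch (the function name is ours; Python's standard library already provides `math.factorial`):

```python
import math

def factorial(n: int) -> int:
    """Number of ways to arrange n distinct items in a row."""
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    result = 1
    for k in range(2, n + 1):  # empty range when n <= 1, so 0! = 1! = 1
        result *= k
    return result

print(factorial(7))   # 5040, matching the worked example
print(factorial(0))   # 1, the empty arrangement
print(factorial(20) == math.factorial(20))  # True
```

Note how 0! = 1 falls out for free: the loop body never runs, so the initial value 1 is returned unchanged.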
Factorial counts arrangements of all n items. It's the building block for permutations and combinations.
Permutations – P(n, r)
P(n, r) is the number of ordered arrangements of r items chosen from n. ORDER MATTERS – different orderings count as different outcomes.
Formula: P(n, r) = n! ÷ (n − r)! = n × (n−1) × (n−2) × … × (n − r + 1). The product has r terms.
Real-life parallel: awarding gold, silver, bronze among 10 runners. The roles are distinct (gold ≠ silver), so the order of selection matters. P(10, 3) = 10 × 9 × 8 = 720 different ways to award the medals.
When r = n, P(n, n) = n! – the full permutation. All n items arranged in some order. When r = 0, P(n, 0) = 1 (the empty arrangement).
Permutations are the right tool whenever the prompt distinguishes positions ("first / second / third"), assigns different roles (president / VP / secretary), or treats different orderings as different outcomes (PIN codes, word arrangements, seating).
- How many 4-digit PINs can you make from 0–9 with NO digit used twice?
- P(10, 4) = 10 × 9 × 8 × 7 = 5,040 distinct PINs.
- Compare: with repetition allowed, you have 10 × 10 × 10 × 10 = 10,000.
- The difference is whether the same digit can appear twice.
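The falling product n × (n−1) × … × (n − r + 1) is easy to compute directly. A small Python sketch (the function name is ours; Python 3.8+ also ships `math.perm`):

```python
def permutations(n: int, r: int) -> int:
    """P(n, r): ordered arrangements of r items chosen from n."""
    if r < 0 or r > n:
        return 0
    result = 1
    for k in range(n, n - r, -1):  # r falling factors: n, n-1, ..., n-r+1
        result *= k
    return result

print(permutations(10, 4))  # 5040 no-repeat PINs
print(permutations(10, 3))  # 720 medal assignments
print(permutations(5, 5))   # 120 = 5!, the full permutation
```

Multiplying only r factors avoids computing two huge factorials and then dividing, which matters once n grows large.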
P(n, r) = ordered arrangements. Use it when "first / second / third" or distinct roles are part of the question.
Combinations – C(n, r)
C(n, r) is the number of unique groups of r items chosen from n. ORDER DOES NOT MATTER – the same group counted in different orders is still one group.
Formula: C(n, r) = n! ÷ (r! × (n − r)!).
Real-life parallel: dealing a 5-card poker hand from a 52-card deck. The hand {A♠, K♥, Q♦, J♣, 10♠} is the same hand whether the cards came out in that order or any of the 5! = 120 orderings of those cards. So we count the orderings ONCE – that's the r! division.
C(52, 5) = 52! / (5! × 47!) = 2,598,960. Two and a half million distinct 5-card hands.
Symmetry: C(n, r) = C(n, n − r). Choosing r items to KEEP is the same as choosing n − r items to LEAVE BEHIND. So C(20, 7) = C(20, 13). The Combinations calculator uses the smaller of r and n − r internally to keep computation cheap.
Why is C(n, r) = P(n, r) ÷ r!? Because every combination of r items can be arranged in r! orders – and a permutation counts each of those orders as different. Divide by r! to collapse them back into one group.
- Each handshake is a pair of people – order doesn't matter (A shaking B is the same handshake as B shaking A).
- C(20, 2) = (20 × 19) / 2 = 190 distinct handshakes.
- Compare to permutations: P(20, 2) = 20 × 19 = 380. Twice as many because A→B and B→A are counted separately. Wrong tool for handshakes.
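The symmetry shortcut and a divide-as-you-go evaluation can be sketched in a few lines of Python (the function name is ours; `math.comb` is the standard-library equivalent). After each step the partial result is itself a binomial coefficient, so the integer division is always exact:

```python
def combinations(n: int, r: int) -> int:
    """C(n, r): unordered groups of r items chosen from n."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)          # symmetry: C(n, r) = C(n, n - r)
    result = 1
    for k in range(1, r + 1):  # after step k, result == C(n - r + k, k)
        result = result * (n - r + k) // k
    return result

print(combinations(20, 2))   # 190 handshakes
print(combinations(52, 5))   # 2598960 five-card hands
print(combinations(20, 7) == combinations(20, 13))  # True, by symmetry
```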
C(n, r) = unordered groups. Use it whenever the prompt asks for "selections", "groups", "teams", "hands", or any case where {A, B, C} = {C, B, A}.
Combinations vs Permutations – the test
One question decides which tool: does the order matter? If yes, use P(n, r). If no, use C(n, r). The math is closely related: P(n, r) = C(n, r) × r!.
If the prompt distinguishes positions, slots, or roles – order matters. Use P(n, r). Examples: race medals, PIN codes, seating arrangements, word/letter arrangements, picking president/VP/secretary.
If the prompt asks for a group, a hand, a team, a selection – order doesn't matter. Use C(n, r). Examples: poker hands, lottery tickets, pizza toppings, study groups, handshakes.
When in doubt, ask: would swapping two items change the answer? For race medals – yes (gold-silver ≠ silver-gold). For poker hands – no (same hand). That tells you which tool.
Math relationship: P(n, r) = C(n, r) × r!. Every group of r items can be arranged in r! orders. Permutations count each ordering separately; combinations count each group once. So permutations are never fewer than combinations, and strictly more when r ≥ 2.
- As a study group (no roles): C(20, 3) = 1,140
- As president / VP / secretary (distinct roles): P(20, 3) = 6,840
- 6,840 = 1,140 × 6, where 6 = 3! = ways to assign 3 roles among 3 chosen students.
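The role-assignment arithmetic above checks out in a couple of lines, using Python's built-in `math.comb` and `math.perm`:

```python
import math

groups = math.comb(20, 3)        # study groups: no roles, order ignored
arrangements = math.perm(20, 3)  # president / VP / secretary: roles matter

print(groups)        # 1140
print(arrangements)  # 6840
print(arrangements == groups * math.factorial(3))  # True: P(n, r) = C(n, r) * r!
```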
Order matters → permutation. Order does NOT matter → combination. P(n, r) = C(n, r) × r!.
Single-event probability – the foundation
Probability of an event = favorable outcomes ÷ total outcomes (when each outcome is equally likely). This is the classical definition.
Bag with 3 red and 5 white balls. Draw one. P(red) = 3 ÷ 8 = 0.375 = 37.5%. Each ball is equally likely; 3 of the 8 are favorable.
For more complex problems, computing favorable and total counts is the hard part – that's where the counting calculators above help. P(flush) = (count of flushes) ÷ (count of 5-card hands) = (4 × C(13, 5)) ÷ C(52, 5).
The complement is just 1 − P. P(not red) = 1 − 0.375 = 0.625. Often the complement is much easier to compute than the event itself, which leads to the "complement trick" we'll see for at-least-one problems.
- 52 cards, 13 are hearts.
- P(heart) = 13 / 52 = 1/4 = 25%.
- P(not heart) = 1 − 0.25 = 75%.
- P(red) = 26/52 = 50%, since hearts and diamonds are both red.
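The favorable ÷ total recipe, including the flush count from above, in Python (`fractions.Fraction` keeps the ratios exact; the helper name is ours):

```python
from fractions import Fraction
import math

def probability(favorable: int, total: int) -> Fraction:
    """Classical probability: favorable / total, equally likely outcomes."""
    return Fraction(favorable, total)

print(probability(13, 52))  # 1/4 -- drawing a heart

flushes = 4 * math.comb(13, 5)  # 5 cards of one suit, any of the 4 suits
hands = math.comb(52, 5)        # all possible 5-card hands
print(float(probability(flushes, hands)))  # ~0.00198
```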
P(event) = favorable / total. For complex events, the calculators help you count favorable and total.
What probability really means – long-run frequency, not a schedule
The most common probability misconception: "P = 0.25 means it happens once in every 4 trials." It does not. Probability is a long-run average, not a schedule.
P = 0.25 means: as the number of trials grows large, the proportion of "yes" outcomes converges to 25%. In any small batch, even 20 attempts, the actual count can be 0, 5, 10, or anything else. There's no per-batch guarantee.
This matters in real life. A 1-in-100 chance of failure per launch does NOT mean "fine for 99 launches and broken on the 100th." It can fail on the very first launch. Or never in 500. The probability is the limit, not the calendar.
Practically: when you read a probability, read it as "in the long run, this fraction", not as a guarantee about timing. The "1 in N" form is a friendlier way to express the same number, but it inherits the same caveat. "1 in 25" is the long-run rate; the actual gap between events is variable.
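A quick simulation makes the batch-vs-long-run distinction concrete. A Python sketch (the seed, batch size, and trial counts are arbitrary choices of ours):

```python
import random

random.seed(42)  # arbitrary seed, for repeatability
p = 0.25         # the long-run rate

# Five small batches of 20 trials: the per-batch counts scatter around 5.
batch_counts = [sum(random.random() < p for _ in range(20)) for _ in range(5)]
print(batch_counts)  # counts vary from batch to batch

# One long run: the observed proportion settles near 0.25.
n = 100_000
hits = sum(random.random() < p for _ in range(n))
print(hits / n)  # close to 0.25, but only in the long run
```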
Probability = long-run frequency. Don't extrapolate small-batch behavior from a long-run number.
P(A and B) and P(A or B) – for independent events
When two events don't influence each other, multiplying gives "both happen", and inclusion-exclusion gives "at least one happens".
AND: P(A and B) = P(A) × P(B), only when A and B are independent (one doesn't change the probability of the other). Two coin flips are independent. Two dice rolls are independent. Drawing two cards WITHOUT replacement is NOT independent – the deck shrinks.
OR: P(A or B) = P(A) + P(B) − P(A and B). We add P(A) and P(B), then subtract the overlap (the both-happen case) once because it's counted twice in the addition. This is the inclusion-exclusion principle in its simplest form.
Mutually exclusive events (where P(A and B) = 0) collapse the OR formula to just P(A) + P(B). Example: rolling a 1 OR a 2 on a single die – both can't happen on one roll, so just 1/6 + 1/6 = 2/6.
- P(coin = H) = 1/2, P(die = 6) = 1/6, independent.
- P(both) = 1/2 × 1/6 = 1/12 ≈ 8.3%
- P(at least one) = 1/2 + 1/6 − 1/12 = 6/12 + 2/12 − 1/12 = 7/12 ≈ 58.3%
- P(neither) = 1 − 7/12 = 5/12 ≈ 41.7%
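The coin-and-die example, worked with exact fractions in Python:

```python
from fractions import Fraction

p_heads = Fraction(1, 2)  # fair coin
p_six = Fraction(1, 6)    # fair die; the two events are independent

p_both = p_heads * p_six             # AND: multiply (independence required)
p_either = p_heads + p_six - p_both  # OR: inclusion-exclusion
p_neither = 1 - p_either             # complement of "at least one"

print(p_both)     # 1/12
print(p_either)   # 7/12
print(p_neither)  # 5/12
```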
Multiply for "both", add and subtract overlap for "at least one". Only valid when events are independent.
At least one in n trials – the complement trick
When the question is "what's the probability of AT LEAST ONE X in n trials?", computing P(none) and subtracting from 1 is far easier than summing all the cases.
Direct counting is painful. "At least one six in 4 dice rolls" requires summing P(exactly 1 six) + P(exactly 2 sixes) + P(exactly 3 sixes) + P(exactly 4 sixes). Four binomial terms.
Complement trick: ask the inverse question. P(no sixes in 4 rolls) = (5/6)⁴, since each roll independently has a 5/6 chance of NOT being a six. That's a single product. Then 1 − (5/6)⁴ ≈ 51.77% is "at least one six".
This is the famous de Méré problem from 17th-century gambling – a French nobleman bet on "at least one six in 4 rolls" and made money long-term, while his bet on "at least one double-six in 24 throws of two dice" lost long-term (≈ 49.1%). The math vindicated his empirical observation.
The trick generalises to any "at least one X" problem with independent identical trials. The trials must be independent and have the SAME per-trial probability for the simple form to apply.
- Direct count is awful – pairs, triples, all overlapping.
- Complement: P(all 23 birthdays are distinct) = (365/365) × (364/365) × … × (343/365)
- ≈ 0.4927 – about a 49% chance of all-distinct birthdays.
- 1 − 0.4927 ≈ 50.7% – slightly more likely than not that two share.
- Surprising. With 50 people the chance climbs to ~97%.
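Both complements, the dice bets and the birthday problem, in a few lines of Python:

```python
from fractions import Fraction

# At least one six in 4 rolls: 1 - P(no sixes)
p_at_least_one_six = 1 - Fraction(5, 6) ** 4
print(float(p_at_least_one_six))  # ~0.5177

# De Mere's losing bet: at least one double-six in 24 throws of two dice
p_double_six = 1 - Fraction(35, 36) ** 24
print(float(p_double_six))  # ~0.4914

# Birthday problem: P(some shared birthday among 23 people)
p_all_distinct = 1.0
for k in range(23):  # factors 365/365 down to 343/365
    p_all_distinct *= (365 - k) / 365
print(1 - p_all_distinct)  # ~0.5073
```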
For "at least one" → compute P(none) and subtract from 1. The complement trick saves enormous case-summing.
Binomial – k successes in n trials
When you need P(exactly k successes) – or P(at most k) or P(at least k) – in n independent trials with the same per-trial probability p, the binomial formula is the tool.
Formula for exactly k successes: P(X = k) = C(n, k) × p^k × (1 − p)^(n − k). Three pieces:
C(n, k) – the count of "patterns" of k successes among n trials. Combinations again.
p^k – the probability of getting k successes on those specific k trials.
(1 − p)^(n − k) – the probability of getting (n − k) failures on the remaining trials.
For "at most k", sum P(X = 0) + P(X = 1) + … + P(X = k). For "at least k", sum P(X = k) + P(X = k+1) + … + P(X = n). The binomial calculator handles all three operators.
- P(X = 3) = C(5, 3) × 0.5³ × 0.5²
- = 10 × 0.125 × 0.25
- = 10 × 0.03125
- = 0.3125 = 31.25%
- Compare: P(X = 0) = 1/32 ≈ 3.1%, P(X = 5) = 1/32 ≈ 3.1%. The middle counts are most likely – that's the binomial bell shape.
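The three-piece formula and the tail sums translate directly into Python (the helper names are ours; `math.comb` is from the standard library):

```python
from math import comb

def binomial_pmf(n: int, k: int, p: float) -> float:
    """P(exactly k successes in n independent trials, per-trial prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_at_least(n: int, k: int, p: float) -> float:
    """P(at least k successes): sum the upper tail of the pmf."""
    return sum(binomial_pmf(n, j, p) for j in range(k, n + 1))

print(binomial_pmf(5, 3, 0.5))       # 0.3125 -- exactly 3 heads in 5 flips
print(binomial_at_least(5, 3, 0.5))  # 0.5 -- at least 3 heads in 5 flips
```

"At most k" is the mirror image: sum `binomial_pmf(n, j, p)` for j from 0 to k.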
Binomial = C(n, k) × p^k × (1−p)^(n−k). Sum across a range for at-most or at-least cases.
Conditional probability – P(A | B)
"Given that B happened, what's the probability of A?" Restrict your world to just the B-slice, then ask what fraction of that slice is also A.
Notation: P(A | B), read "the probability of A given B".
Formula: P(A | B) = P(A ∩ B) ÷ P(B). Divide the overlap (both happen) by P(B).
Intuition: knowing B happened restricts our universe to just the part where B is true. We then ask, "of that restricted universe, what fraction also has A?" That fraction is the conditional probability.
P(A | B) is generally NOT equal to P(B | A). "P(rain | clouds)" is different from "P(clouds | rain)" – clouds without rain happen all the time; rain without clouds basically never. Bayes' theorem (next section) is the formula that flips between them.
- A face card is drawn from a 52-card deck. What's the probability it's a king?
- P(king and face) = 4/52 (the four kings)
- P(face) = 12/52 (jacks, queens, kings)
- P(king | face) = (4/52) ÷ (12/52) = 4/12 = 1/3 ≈ 33.3%
- Without conditioning: P(king) = 4/52 ≈ 7.7%. Knowing it's a face card raises the probability.
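The face-card example with exact fractions in Python (the helper name is ours):

```python
from fractions import Fraction

def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """P(A | B) = P(A and B) / P(B): restrict to B, then take A's share."""
    return p_a_and_b / p_b

p_king_and_face = Fraction(4, 52)  # the four kings are all face cards
p_face = Fraction(12, 52)          # jacks, queens, kings

print(conditional(p_king_and_face, p_face))  # 1/3
```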
P(A | B) restricts to B's slice, then asks what fraction is A. Use it for any "given X" probability question.
Bayes' theorem – flipping the conditional
Bayes' theorem flips a known forward conditional P(B | A) into the inverse P(A | B). Critical for medical-test reasoning, spam filters, forensics – any "I observed X, what should I now believe about Y?" problem.
Formula: P(A | B) = P(B | A) × P(A) ÷ P(B). Three inputs:
P(B | A) – the FORWARD conditional. Often called the likelihood. For a medical test: "if you have the disease, what's the chance the test is positive?" (test sensitivity).
P(A) – the PRIOR. Your belief about A before seeing any evidence. For a disease, this is its prevalence in the population.
P(B) – the TOTAL probability of observing B. Combines true positives AND false positives.
The most-cited example is the medical-test puzzle. A 99% sensitive test for a disease with 1% prevalence. You test positive. The intuitive guess is "around 99% chance I have it." The actual answer (with a 5% false positive rate) is around 17%. Why? Because the prior is so small that even a few false positives among the 99% of the population who are healthy swamp the true positives among the 1% who are sick.
This is the "base rate fallacy" – humans systematically ignore prior probabilities when interpreting evidence. Bayes' theorem is the antidote.
- Prior P(sick) = 0.01 (1% prevalence).
- Sensitivity P(positive | sick) = 0.99.
- False positive rate P(positive | healthy) = 0.05.
- Total P(positive) = (0.01 × 0.99) + (0.99 × 0.05) = 0.0099 + 0.0495 = 0.0594.
- Bayes: P(sick | positive) = (0.99 × 0.01) / 0.0594 ≈ 0.167.
- About 16.7% – much lower than the 99% intuitive guess. The base rate matters more than the test accuracy.
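The medical-test numbers, wired into a small Python helper (the function name is ours):

```python
def bayes_posterior(prior: float, sensitivity: float,
                    false_positive_rate: float) -> float:
    """P(sick | positive) from the prior, P(pos | sick), P(pos | healthy)."""
    # Total probability of a positive: true positives + false positives.
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / p_positive

posterior = bayes_posterior(prior=0.01, sensitivity=0.99,
                            false_positive_rate=0.05)
print(round(posterior, 3))  # 0.167 -- not the intuitive 0.99
```

Try raising the prior to 0.1: the same test now yields a posterior near 0.69, which is the base rate effect in action.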
Bayes' theorem flips P(B | A) into P(A | B) using the prior. When priors are small, even very accurate evidence can yield surprisingly LOW posterior probabilities.
Common mistakes
Five recurring pitfalls in counting and probability problems.
1. Confusing combinations with permutations. The fix: ask "does the order matter?" before reaching for a formula. Different orderings = different outcomes → permutation. Same outcome regardless of order → combination.
2. Multiplying for non-independent events. P(A) × P(B) only works for independent events. Drawing two cards WITHOUT replacement is NOT independent – use P(A) × P(B | A) instead.
3. Reading 'or' as exclusive. In probability, 'A or B' usually means 'A, or B, or both'. Inclusion-exclusion handles the both-happen overlap by subtracting it once. For mutually exclusive events the overlap is 0 and the correction disappears.
4. Confusing P(A | B) with P(B | A). They're not equal in general. Bayes' theorem is the bridge. The medical-test surprise is the canonical example.
5. Treating "1 in N" as a guarantee. It's a long-run average, not a per-batch guarantee. A 1-in-100 chance can hit on the first attempt or never in 500.
Order? Independence? Does OR include the both-happen case? Forward vs inverse conditional? Long-run vs schedule? – the five questions to ask on any probability problem.
Frequently asked questions
When do I use combinations vs permutations?
Ask if the order matters. If a different order = a different outcome (race medals, PIN codes, seating), use permutations. If a different order = the same outcome (poker hands, lottery, study groups), use combinations. Math link: P(n, r) = C(n, r) × r!.
Why does 0! = 1?
By convention, because there is exactly one way to arrange zero items (the empty arrangement). The convention also keeps formulas like C(n, 0) = n! / (0! × n!) = 1 consistent at the boundaries.
How is the at-least-one probability calculated?
1 − (1 − p)^n, where p is the per-trial probability and n is the number of trials. This is the "complement trick" – instead of summing all the success cases, compute the chance of NONE happening (a single product) and subtract from 1.
What is Bayes' theorem in plain English?
A formula to flip a known conditional. If you know "given the disease, the test is positive 99% of the time" (P(positive | sick)) and the disease's prevalence (P(sick)) and the overall positive rate (P(positive)), Bayes gives you "given a positive test, what's the chance of disease" (P(sick | positive)). Critical for any "update belief after evidence" problem.
Does P = 0.25 mean it happens once in every 4 trials?
No. P = 0.25 is a long-run average. In any small batch (say 20 attempts), the event might happen 0 times, or 5 times, or 10 – there's no per-batch guarantee. The actual rate converges to 25% only over many, many trials.
When can I use P(A and B) = P(A) ร P(B)?
Only when A and B are INDEPENDENT – when one event doesn't change the probability of the other. Two coin flips, two dice rolls, two days of weather (roughly). Drawing two cards WITHOUT replacement is NOT independent.
Why is P(A or B) = P(A) + P(B) − P(A and B)?
The "both happen" case is counted once by P(A) and a second time by P(B) – adding them double-counts it. We subtract the overlap once to fix the count. Inclusion-exclusion in its simplest form. For mutually exclusive events the overlap is 0 and the correction disappears.
Open the calculators
Four calculators, one conceptual sequence – factorial, permutations, combinations, and the probability tool with seven modes including conditional and Bayes.