jameshunt (.us)

Did you know that in a group of only 23 people, there’s a 50% chance that at least two of those people share the exact same birthday?

Yup. It’s called The Birthday Paradox, and it’s got some interesting math behind it. Surprisingly, it has some pretty far-reaching implications for cryptography, and highlights a fundamental thing I’ve noticed about crypto: your intuition is usually wrong.

## The Math

Let’s flip the problem on its head to make the math a little easier. Instead of calculating the probability that people are sharing birthdays, let’s consider the probability that no one shares a birthday. Together, these two quantities must add up to 1, since there are no other possible outcomes – either we have collisions, or we don’t.

We’ll call the probability of unique birthdays P. Let’s also ignore February 29th (sorry leap-year kids), and assume that we have a uniform distribution (that is, no dates are special). This gives us a total of 365 possible values, each one equally likely.

If we have a group with only one person in it, we have all 365 possible dates to choose from, so \(P=1\).

\$\$ P_1 = 365 / 365 = 1 \$\$

If we add a person, there are now 364 unique birthdays left, so the problem becomes one of selection:

\$\$ P_2 = {365 – 1} / 365 = 0.9973 \$\$

There’s a 99.73% chance these two people won’t share a birthday.

Incidentally (although it’s easy to miss in a cursory glance), we can calculate the total probability \(P\) by multiplying the two independent event probabilities we have so far:

\$\$ P = P_2 P_1 = ({365 – 1} / 365) (365 / 365) = 0.9973 \$\$

If we add a third person, we not only have to pick a unique birthday for the second person (\(P2=365−1365\)), we now have to pick a unique date from the remaining birthdays for the third person, or:

\$\$ P_3 = {365 – 2} / 365 = 0.9945 \$\$

Again, we can calculate the total probability \(P\) from the independent event probabilities:

\$\$ P = P_3 P_2 P_1 \$\$

\$\$ P = ({365 – 2} / 365) ({365 – 1} / 365) (365 / 365) \$\$

\$\$ P = 0.9945 × 0.9973 × 1 = 0.9918 \$\$

Since the probabilities of each event after the first are less than 1, the compound probability is less than either independent event. This is where we start to see that the intuition is wrong, because most people think in terms of the individual events.

In a more general form, for a set of \(N\) unique values, selecting \(k\) of them leads to a probability \(P\) that all values are unique of:

\$\$ P = {N – 1} / N {N – 2} / N … {N – (k -1 )}/N \$\$

There will be \(k−1\) terms in this equation, so we can simplify:

\$\$ P = {(N – 1) (N – 2) … (N – (k – 1))} / N^{k-1} \$\$

Introducing an \(N\) term to both numerator denominator, we start to approach \(N!\):

\$\$ P = (N / N) ({(N – 1) (N – 2) … (N – (k – 1))} / N^{k-1}) \$\$

Then the denominator simplifies to \(Nk−1+1\), which is just \(Nk\):

\$\$ P = {N (N – 1) (N – 2) … (N – k)} / N^k \$\$

Further simplification can be had by introducing the remainder of the factorial into both the numerator and denominator:

\$\$ P = {N (N – 1) (N – 2) … (N – (k – 1)) (N – k)! } / {N^k (N – k)!} \$\$

This allows us to collapse the numerator to just \(N!\):

\$\$ P = N! / {N^k (N – k)!} \$\$

That is the final (factorial) form of what we’ll call The Birthday Bound.