When I was an undergraduate, 78 years ago, it took a long time for me to grasp the concepts of probability and randomness. All the examples seemed to involve drawing red, black, and white balls from an urn. My thought was, "okay, I'll try to play, but what is this all about?"
Then, to make matters really confusing, the examples began to distinguish between drawing the balls out of the urn and hanging onto them, versus drawing the balls out of the urn and putting them back into the urn. I never understood why people would want to draw balls out of an urn in the first place, much less put each one back into the urn before drawing the next one --- "but I might draw it again!" was my concern. I was not a very bright undergraduate.
In those courses, the distinction being made by these examples was "sampling with replacement" versus "sampling without replacement".
I think I might have a better handle on it now, 78 years later:
- When they draw balls on television for lotteries, it is done without replacement. They don't draw a 41 and then put it back in the hopper so that it can possibly be drawn again for the same contest.
- When your iPod shuffles its tunes, it does so without replacement if you listen to all the tunes. But if you restart it, the random process starts all over again; restarting the iPod puts all the tunes back in the mix. Across restarts, in other words, it samples with replacement.
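For anyone who, like my undergraduate self, would rather see it than read about urns, here is a minimal sketch of the two kinds of draw using Python's standard `random` module. The function names and the 49-ball hopper are my own illustrative assumptions, not anything a lottery or Apple actually uses.

```python
import random


def draw_without_replacement(items, k, rng):
    """Draw k items; each item can come up at most once (lottery style)."""
    return rng.sample(items, k)


def draw_with_replacement(items, k, rng):
    """Draw k items; every draw sees the full collection, so repeats can occur."""
    return rng.choices(items, k=k)


rng = random.Random(0)          # seeded only so the run is repeatable
balls = list(range(1, 50))      # hypothetical hopper with balls 1..49

lottery = draw_without_replacement(balls, 6, rng)
urn = draw_with_replacement(balls, 6, rng)

print("without replacement:", lottery)  # six distinct numbers, always
print("with replacement:   ", urn)      # six numbers, duplicates allowed
```

The first draw can never contain the same ball twice; the second one can, because each ball goes back in before the next draw.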
This distinction is at the heart of the questions circulating on the internet about whether the "shuffle" feature of iPods is truly random. Many people become concerned, puzzled, or disconcerted when some tunes on their iPods seem to come up more often while others are played rarely, if ever.
More specifically, when an iPod does a shuffle, it reorders the songs much the way a Vegas dealer shuffles a deck of cards, then plays them back in the new order. So if you keep listening for the week or so it takes to complete the list, you will hear everything, just once. But people generally listen only to the first few dozen songs. In theory, that sample should be evenly distributed among all the artists and albums in their collections. So why do you typically get three Wilco songs in an hour while Aretha Franklin waits in the wings forever?
The answer lies in the difference between sampling with replacement versus sampling without replacement. This guy (quoted in the same article) is off the mark:
Paul Kocher, president of Cryptography Research, puts it another way:
"Our brains aren't wired to understand randomness."
It may well be true that we don't understand randomness, but with the iPod shuffle the real problem is distinguishing between "with replacement" and "without replacement".
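To see how much the restart matters, here is a rough simulation. It is only a sketch under my own assumptions: a hypothetical library of 1,000 songs, 40 listening sessions of 25 songs each, and a player that either reshuffles at every restart or keeps one shuffled queue going across sessions. It is not a description of how any real iPod is implemented.

```python
import random
from collections import Counter


def simulate_listening(num_songs, session_length, sessions, restart, rng):
    """Return a Counter mapping song index -> number of times it was played."""
    plays = Counter()
    queue = []
    for _ in range(sessions):
        if restart:
            # Restarting discards the unplayed songs and reshuffles everything.
            queue = []
        for _ in range(session_length):
            if not queue:
                # Build a fresh shuffled queue of the whole library.
                queue = list(range(num_songs))
                rng.shuffle(queue)
            plays[queue.pop()] += 1
    return plays


rng = random.Random(42)  # seeded only so the run is repeatable
restarted = simulate_listening(1000, 25, 40, restart=True, rng=rng)
persistent = simulate_listening(1000, 25, 40, restart=False, rng=rng)

print("reshuffle on every restart:", len(restarted), "distinct songs heard")
print("one continuous shuffle:    ", len(persistent), "distinct songs heard")
print("most-played song with restarts:", max(restarted.values()), "plays")
```

Both listeners hear 1,000 plays in total. The continuous shuffle covers every song exactly once. With a reshuffle at every restart, each song has a 25 in 1,000 chance of making it into any given session, so in expectation only about 64 percent of the library gets heard at all, while some songs come up several times.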
Next revision for the iPod? Make the shuffle feature random without replacement, even if you stop listening.
[Thanks to Craig Newmark, of Newmark's Door, for the pointer.]