OddThinking

A blog for odd things and odd thoughts.

Shuffling and Ownage

Last week, Mike Pope wrote about card-shuffling.

By his own admission, he didn’t know the best way, and he described a number of sub-optimal solutions – generally O(n²) – some of which had an unbounded worst case, and one of which displayed biases.

One commenter pointed to John Viega’s solution, which assigned a random number to each card, sorted the result, and reshuffled if any two of the random numbers came out the same. It was about O(n log n) on average, but it had an unbounded worst case.
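
That approach can be sketched as follows (my own Java rendering for illustration, not Viega’s code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class SortShuffle {
    // Viega-style shuffle as described above: tag each card with a
    // random key, sort by the keys, and start over if any two keys
    // collide. Average cost is O(n log n); the retry loop is why the
    // worst case is unbounded (though with double keys a collision is
    // extremely rare in practice).
    public static <T> List<T> shuffle(List<T> cards, Random rng) {
        int n = cards.size();
        while (true) {
            double[] keys = new double[n];
            Set<Double> seen = new HashSet<>();
            boolean collision = false;
            for (int i = 0; i < n; i++) {
                keys[i] = rng.nextDouble();
                if (!seen.add(keys[i])) { collision = true; break; }
            }
            if (collision) continue; // duplicate keys: reshuffle
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) order[i] = i;
            Arrays.sort(order, Comparator.comparingDouble(i -> keys[i]));
            List<T> out = new ArrayList<>(n);
            for (int i : order) out.add(cards.get(i));
            return out;
        }
    }
}
```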

So along I came, and ever so helpfully added a comment to the article with an algorithm I stole off a Commodore-64 program and have been using (very occasionally) for over 20 years. It is short and bounded, runs in O(n) time with a low constant factor, and has a small memory footprint.
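
The post never shows the code, but – as the follow-ups below make clear – the algorithm in question is the naive swap shuffle, along these lines (a Java sketch, not the original Commodore-64 code):

```java
import java.util.Random;

public class NaiveShuffle {
    // The naive swap shuffle: for every position, swap with a uniformly
    // random position anywhere in the deck. Short, bounded, O(n) – but
    // not uniform. Shuffling 3 cards gives 3^3 = 27 equally likely swap
    // sequences spread over 3! = 6 permutations, and 27 is not divisible
    // by 6, so some permutations must turn up more often than others.
    public static void shuffle(int[] deck, Random rng) {
        for (int i = 0; i < deck.length; i++) {
            int j = rng.nextInt(deck.length); // drawn from the WHOLE deck
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
        }
    }
}
```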

Pope came back to say he’d tried it, but he was not sure of the randomness. I wasn’t sure how to respond to that – to me it was obvious that it was completely random. I made a note to go back and leave another comment, but before I got a chance…

… along came Jeff Atwood at Coding Horror, and he waded in with a post. (Actually, Eric Lippert argued against my algorithm first in a comment on the article, but I saw Jeff Atwood’s post before Eric Lippert’s comment.) Atwood brought up an algorithm very similar to the one that I had proposed, and then tried to shoot it down on two grounds.

One, that it was too complicated. I didn’t feel that it was complicated at all. Then again, I have been using it for a while and it seems natural. Anyway, quicksort is more complicated than bubblesort, but you implement it once, and it’s done, right?

The other was that its random number generator wasn’t secure enough. (There’s a very slight strawman here – my original didn’t mention the seeding of the random number generator, but (a) Atwood wasn’t trying to directly attack my solution, he was trying to attack a naive solution that he himself would have tried, and (b) Atwood’s solution wasn’t much different to the one I would have chosen, had I bothered to mention it.)

My response to that was to grumble to myself about security experts who think every single project has security implications, resulting in terrible scope creep. Yes, if you are implementing an online casino, the security of randomisation is a real problem. However, for the vast majority of implementations, it simply isn’t an issue. For a single-player implementation of Klondike it isn’t important. My main use of card-shuffling is for Monte Carlo simulations. The idea that someone could crack the code and start predicting what was going to come next is irrelevant in these situations.

If you have a choice between implementing securely or insecurely for the same price, obviously you should take the secure method – hey, you never know when your code will be re-used – but how much does the secure method cost?

Atwood recommends generating a GUID per card, and sorting by that! Oh dear! Not only is it O(n log n) but it has a large constant factor. My Monte Carlo simulations are going to be dominated by the card-shuffling, especially when testing a game like Blackjack with simplistic play and 416-card decks. I’ll stick with my algorithm, thanks!
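
For concreteness, a Java rendering of the GUID-per-card idea (my sketch, not Atwood’s code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class GuidShuffle {
    // Atwood-style shuffle: tag every card with a freshly generated
    // GUID and sort by the tags. Unbiased for all practical purposes
    // (a collision between 122-bit random GUIDs is negligible), but
    // O(n log n) with a large constant factor: one GUID generation and
    // several 128-bit comparisons per card, versus a single swap.
    public static <T> List<T> shuffle(List<T> cards) {
        List<T> out = new ArrayList<>(cards);
        Map<T, UUID> tag = new IdentityHashMap<>();
        for (T c : out) tag.put(c, UUID.randomUUID());
        out.sort(Comparator.comparing(tag::get));
        return out;
    }
}
```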

I made a note to go back to leave another comment, but before I got a chance…

… Atwood posted another follow-up in which he called the algorithm I used “naïve”. He then quietly demonstrated that it wasn’t truly random at all.

You could have knocked me down with a feather. I was completely wrong. Perhaps “naïve” is the word!

My algorithm had biases – exactly the sort of biases that you don’t want in a Monte Carlo simulation.

Atwood points out that a slightly different implementation fixes the problem, and seems to offer the best of both worlds.
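
The fix is the Fisher-Yates (Knuth) shuffle mentioned in the comments below. The only change from the naive version is that the swap partner at each step is drawn from the still-unshuffled portion of the deck, never the whole deck:

```java
import java.util.Random;

public class FisherYates {
    // Fisher-Yates / Knuth shuffle: at step i, pick the swap partner
    // only from positions 0..i (the part of the deck not yet fixed).
    // Every one of the n! permutations is produced by exactly one
    // sequence of choices, so the result is uniform – and it is still
    // O(n), bounded, and in-place.
    public static void shuffle(int[] deck, Random rng) {
        for (int i = deck.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // 0 <= j <= i, NOT the whole deck
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
        }
    }
}
```

With n! equally likely choice sequences mapping one-to-one onto n! permutations, every ordering has probability exactly 1/n! – the property the naive swap shuffle lacks.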

In summary: I’ve been owned by Jeff Atwood. Eric Lippert owns a part-share of me, too.

7 Comments
Categories: Doubleplus Geek, Insufficiently Advanced Technology, S/W Dev
Tags: cards, Coding Horror, randomness, security, shuffling

Comments

  1. Back in the days of Windows 95, I wrote a program that shuffled 4 million uniquely numbered cards. It produced a file containing the card number, a prize description (most of which were, “You can enter the 2nd chance draw!”) and a verification code. The file was sent to a printing firm, which printed up the 4 million cards.

    I got the shuffling algorithm badly wrong twice before finally devising a variant of Knuth’s algorithm. At the time I was proud that it ran in O(number of cards) time and O(number of types of card) space while giving a pragmatically – if not theoretically – random distribution. Looking back, I’m just relieved that there wasn’t a lotteries board asking to see my random number generator 🙂

  2. I could’ve told you, had I known you were involved in the first place.

    I read a bunch about the derivation of the Fisher-Yates shuffle a few years back, because it’s given as the answer to “how do I shuffle an array?” in the Perl documentation’s FAQs, and bias in shuffles came up in a number of threads on Perl Monks. One that I remember well discussed doing a sort where the comparison function returns a random answer each time it is called, based on the output of a random number generator – (@shuffled = sort { rand < .5 } @orig) – which seems to work, but suffers from bias. It took me a while to understand why. That was my first contact with the subject matter.

    So when I read Jeff’s article, I looked at that algorithm, scratched my head, went “err hmm… that looks biased”, and then put it out of my mind and went about my ways. I was not at all surprised to read Jeff’s tail-between-legs follow-up about that algorithm being broken and how it took him a while to understand why.

    Anything that needs to be analysed stochastically is notoriously insidious to get right – particularly if it involves randomness. And of course we all know that “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.” —John von Neumann

  3. This would be a good interview question.

    The correct answer of course is std::random_shuffle.

    I had a brief play with it just now. Here it is shuffling a three-card deck 600,000 times:

    abc: 99973
    acb: 100118
    bac: 99733
    bca: 99955
    cab: 100043
    cba: 100178
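
    For comparison, the same experiment rendered in Java, using the standard library’s Collections.shuffle (which is documented to be a Fisher-Yates shuffle), produces a similarly flat distribution:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class UniformityCheck {
    // Repeat the experiment above: shuffle a three-card deck many
    // times and count how often each of the 6 orderings appears.
    public static Map<String, Integer> counts(int trials, long seed) {
        Random rng = new Random(seed);
        Map<String, Integer> tally = new TreeMap<>();
        for (int t = 0; t < trials; t++) {
            List<String> deck = new ArrayList<>(List.of("a", "b", "c"));
            Collections.shuffle(deck, rng);
            tally.merge(String.join("", deck), 1, Integer::sum);
        }
        return tally;
    }

    public static void main(String[] args) {
        counts(600000, 1).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```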

  4. The correct answer of course is std::random_shuffle.

    Heh. That answer would be correct, except that you started with the premise of C++… :-)

    The standard libraries of most languages probably have an equivalent. The Perl FAQ entry starts out “if your Perl isn’t ancient, use List::Util::shuffle.” Atwood is a .NET guy; that huge framework more likely than not also has a shuffling function somewhere. Of course programmers are prone to reinvent seemingly-trivial functionality before they think of thumbing through their tool’s docs.

  5. That answer would be correct, except that you started with the premise of C++… 🙂

    Guilty as charged…

    Actually whilst poking around with this I found that the Visual C++ implementation of the std::random_shuffle algorithm correctly handles the fact that rand() has only a certain degree of precision, whereas the gcc implementation does not. This may introduce a subtle bias in the gcc version if the number of items to be shuffled approaches RAND_MAX.

  6. I would submit that Jeff Atwood was told by someone else about his error, and in fact was owned himself. Jeff blogged on it, but didn’t actually own anyone. There’s some code ninja who has actually owned you by proxy…

    Likely this person.

    I actually agreed when he said it was complicated, because I couldn’t imagine myself doing it. I mean physically, with a set of cards. If I was being really anal about shuffling cards, I’d do the following. In Java:

    public static List<Card> shuffle(List<Card> cards)
    {
        // Insert each card at a uniformly random position among those
        // already placed (an insertion-style shuffle).
        final List<Card> out = new LinkedList<Card>();
        for (final Card c : cards)
        {
            final int index = (int) (Math.random() * (out.size() + 1));
            out.add(index, c);
        }
        return out;
    }

    The test I wrote says it’s not “naive”. Now all you need to do is choose a nice list which does O(1) inserts + O(1) lookups (clearly, an exercise for the reader).

    PS: Damn I can’t do code listing…

    This may introduce a subtle bias in the gcc version if the number of items to be shuffled approaches RAND_MAX.

    This limitation is not correct, and the reality is much worse than your statement. The problem arises if the state space of the RNG is smaller than the number of outcomes you want to choose among, or if the state space of the seed is smaller than the state space of the RNG. If you want to shuffle 52 cards, you have 52! output states, so you need at least 226 bits of state in the RNG (and it has to be seeded with at least that many bits).
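
    The 226-bit figure is ceil(log2(52!)), which is easy to check:

```java
public class StateBits {
    // Number of random bits needed to distinguish all permutations of
    // n cards: ceil(log2(n!)) = ceil(log2(2) + log2(3) + ... + log2(n)).
    // Summing logarithms avoids overflowing on n! itself.
    public static int bitsNeeded(int n) {
        double log2Factorial = 0;
        for (int k = 2; k <= n; k++) {
            log2Factorial += Math.log(k) / Math.log(2);
        }
        return (int) Math.ceil(log2Factorial);
    }
}
```

    For n = 52 this gives 226 bits – far more than the 31 or 32 bits of state behind a typical rand() implementation.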

    As cited above, see the Wikipedia entry for Fisher-Yates shuffle.

    Andrew
