Algorithms to Live By: The Computer Science of Human Decisions. Brian Christian
Читать онлайн книгу.something that you will never search is a complete waste; searching something you never sorted is merely inefficient.
The question, of course, becomes how to estimate ahead of time what your future usage will be.
The poster child for the advantages of sorting would be an Internet search engine like Google. It seems staggering to think that Google can take the search phrase you typed in and scour the entire Internet for it in less than half a second. Well, it can’t—but it doesn’t need to. If you’re Google, you are almost certain that (a) your data will be searched, (b) it will be searched not just once but repeatedly, and (c) the time needed to sort is somehow “less valuable” than the time needed to search. (Here, sorting is done by machines ahead of time, before the results are needed, and searching is done by users for whom time is of the essence.) All of these factors point in favor of tremendous up-front sorting, which is indeed what Google and its fellow search engines do.
So, should you alphabetize your bookshelves? For most domestic bookshelves, almost none of the conditions that make sorting worthwhile are true. It’s fairly rare that we find ourselves searching for a particular title. The costs of an unsorted search are pretty low: for every book, if we know roughly where it is we can put our hands on it quickly. And the difference between the two seconds it would take to find the book on a sorted shelf and the ten seconds it would take to scan for it on an unsorted one is hardly a deal breaker. We rarely need to find a title so urgently that it’s worth spending preparatory hours up front to shave off seconds later on. What’s more, we search with our quick eyes and sort with slow hands.
The verdict is clear: ordering your bookshelf will take more time and energy than scanning through it ever will.
Your unsorted bookshelf might not be an everyday preoccupation, but your email inbox almost certainly is—and it’s another domain where searching beats sorting handily. Filing electronic messages by hand into folders takes about the same amount of time as filing physical papers in the real world, but emails can be searched much more efficiently than their physical counterparts. As the cost of searching drops, sorting becomes less valuable.
Steve Whittaker is one of the world’s experts on how people handle their email. A research scientist at IBM and professor at UC Santa Cruz, Whittaker, for almost two decades, has been studying how people manage personal information. (He wrote a paper on “email overload” in 1996, before many people even had email.) In 2011, Whittaker led a study of the searching and sorting habits of email users, resulting in a paper titled “Am I Wasting My Time Organizing Email?” Spoiler alert: the conclusion was an emphatic Yes. “It’s empirical, but it’s also experiential,” Whittaker points out. “When I interview people about these kinds of organizational problems, that’s something that they characteristically talk about, is that they sort of wasted a part of their life.”
Computer science shows that the hazards of mess and the hazards of order are quantifiable and that their costs can be measured in the same currency: time. Leaving something unsorted might be thought of as an act of procrastination—passing the buck to one’s future self, who’ll have to pay off with interest what we chose not to pay up front. But the whole story is subtler than that. Sometimes mess is more than just the easy choice. It’s the optimal choice.
Sorts and Sports
The search-sort tradeoff suggests that it’s often more efficient to leave a mess. Saving time isn’t the only reason we sort things, though: sometimes producing a final order is an end in itself. And nowhere is that clearer than on the sporting field.
In 1883, Charles Lutwidge Dodgson developed incredibly strong feelings about the state of British lawn tennis. As he explains:
At a Lawn Tennis Tournament, where I chanced, some while ago, to be a spectator, the present method of assigning prizes was brought to my notice by the lamentations of one of the Players, who had been beaten (and had thus lost all chance of a prize) early in the contest, and who had had the mortification of seeing the 2nd prize carried off by a Player whom he knew to be quite inferior to himself.
Normal spectators might chalk up such “lamentations” to little more than the sting of defeat, but Dodgson was no ordinary sympathetic ear. He was an Oxford lecturer in mathematics, and the sportsman’s complaints sent him on a deep investigation of the nature of sports tournaments.
Dodgson was more than just an Oxford mathematician—in fact, he’s barely remembered as having been one. Today he’s best known by his pen name, Lewis Carroll, under which he wrote Alice’s Adventures in Wonderland and many other beloved works of nineteenth-century literature. Fusing his mathematical and literary talents, Dodgson produced one of his lesser-known works: “Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method.”
Dodgson’s complaint was directed at the structure of the Single Elimination tournament, where players are paired off with one another and eliminated from competition as soon as they lose a single match. As Dodgson forcefully argued, the true second-best player could be any of the players eliminated by the best—not just the last-eliminated one. Ironically, in the Olympics we do hold bronze medal matches, by which we appear to acknowledge that the Single Elimination format doesn’t give us enough information to determine third place.* But in fact this format doesn’t tell us enough to determine second place either—or, indeed, anything except the winner. As Dodgson put it, “The present method of assigning prizes is, except in the case of the first prize, entirely unmeaning.” Said plainly, the silver medal is a lie.
“As a mathematical fact,” he continued, “the chance that the 2nd best Player will get the prize he deserves is only 16/31sts; while the chance that the best 4 shall get their proper prizes is so small, that the odds are 12 to 1 against its happening!”
Despite the powers of his pen, it appears that Dodgson had little impact on the world of lawn tennis. His solution, an awkward take on triple elimination where the defeat of someone who had defeated you could also eliminate you, never caught on. But if Dodgson’s solution was cumbersome, his critique of the problem was nevertheless spot on. (Alas, silver medals are still being handed out in Single Elimination tournaments to this day.)
But there’s also a deeper insight in Dodgson’s logic. We humans sort more than our data, more than our possessions. We sort ourselves.
The World Cup, the Olympics, the NCAA, NFL, NHL, NBA, and MLB—all of these implicitly implement sorting procedures. Their seasons, ladders, and playoffs are algorithms for producing rank order.
One of the most familiar algorithms in sports is the Round-Robin format, where each of n teams eventually plays every one of the other n − 1 teams. While arguably the most comprehensive, it’s also one of the most laborious. Having every team grapple with every other team is like having guests exchange hugs at our dinner party: the dreaded O(n2), quadratic time.
Ladder tournaments—popular in sports like badminton, squash, and racquetball—put players in a linear ranking, with each player allowed to issue a direct challenge to the player immediately above them, exchanging places if they prevail. Ladders are the Bubble Sorts of the athletic world and are thus also quadratic, requiring O(n2) games to reach a stable ranking.
Perhaps the most prevalent tournament format, however, is a bracket tournament—as in the famous NCAA basketball “March Madness,” among many others. The March Madness tournament progresses from the “Round of 64” and the “Round of 32” to the “Sweet 16,” “Elite Eight,” “Final Four,” and the finals. Each round divides the field in half: does that sound familiarly logarithmic? These tournaments are effectively Mergesort, beginning with unsorted pairs of teams and collating, collating, collating them.
We know that Mergesort operates in linearithmic time—O(n log n)—and so, given that there are 64 teams, we can expect to only need something like 6 rounds (192 games), rather than the whopping 63 rounds (2,016 games) it would take to do a Ladder or Round-Robin. That’s a huge improvement: algorithm design at work.
Six rounds of March Madness sounds about right,