Robert Walker’s entire career depends on the answer to that question. Walker is the sports book director at the Mirage Hotel and Casino in Las Vegas, which means that every week he fields thousands of bets in sports ranging from pro football to Ivy League basketball. For all those games, Walker has to offer a line (or point spread), which lets bettors know which team is favored to win and by how many points. The way the line works is simple. Say the Giants are favored this week by three and a half points over the Rams. If you bet on the Giants, they have to win by four points or more for you to win the bet. Conversely, if you bet on the Rams, they have to lose by three points or less (or win), for you to walk away with the casino’s money. In other sports, bets are framed in terms of odds: if you bet on the favorite, you might have to put down $150 to get $100 back, while if you bet on the underdog, you’d have to lay down $75 to win $100.
As a bookmaker Walker's job is not to try to pick what team will win. He leaves that to the gamblers, at least in theory. Instead, his job is to make sure that the gamblers bet roughly the same amount of money on one team as on the other. If he does that, then he knows that he will win half the bets he’s taken in and lose the other half. Why would Walker be satisfied with just breaking even? Because bookies make more money on every bet they win than they lose on every bet they get wrong. If you place a point-spread bet with a bookie, you have to put up $11 to win $10. Imagine there are only two bettors, one who bets on the favorite and the other who bets on the underdog. Walker takes in $22 ($11 from each of them). He pays out $21 to the winner. The $1 he keeps is his profit. That slim advantage, which is known as the vigorish, or the vig, is what pays the bookie’s bills. And the bookie keeps that advantage only when he avoids having too much money riding on one side of a bet.
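The arithmetic of the vig can be checked in a few lines. The sketch below is only an illustration of the two-bettor example, generalized to any number of equal-sized bets; the function name and its parameters are assumptions made for this example, not anything a sports book actually runs.

```python
def bookie_profit(bets_on_favorite: int, bets_on_underdog: int,
                  favorite_covers: bool, stake: int = 11,
                  winnings: int = 10) -> int:
    """Bookmaker's net result in dollars for one game.

    Assumes every bettor puts up `stake` dollars to win `winnings`
    dollars (the $11-to-win-$10 terms described in the text).
    """
    # Every bettor hands over the stake up front.
    total_taken_in = stake * (bets_on_favorite + bets_on_underdog)
    # Only the bettors on the side that covers get paid.
    winners = bets_on_favorite if favorite_covers else bets_on_underdog
    # Each winner gets the stake back plus the winnings ($21 in the text).
    total_paid_out = winners * (stake + winnings)
    return total_taken_in - total_paid_out
```

With one bettor on each side, the bookmaker keeps $1 whichever team covers. With three bettors on the favorite and one on the underdog, the result depends entirely on the game: a loss of $19 if the favorite covers and a gain of $23 if it does not. That dependence is what a balanced line is meant to remove.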
To keep that from happening, Walker needs to massage the point spread so that bets keep coming in for both teams. “The line we want is the line that’ll split the public, because that’s when you start earning that vig,” he said. In the week before the 2001 Super Bowl, for instance, the Mirage’s opening line had the Baltimore Ravens favored by two and a half points. But soon after the line was posted, the Mirage booked a couple of early $3,000 bets on Baltimore. That’s not much money, but it was enough to convince Walker to raise the point spread to three. If everyone wanted to bet on Baltimore, chances were the line wasn’t right. So the line moved. The opening line is set by the bookmaker, but it shifts largely in response to what bettors do—much as stock prices rise and fall with investor demand.
In theory, you could set the opening line wherever and simply allow it to adjust from there automatically, so that the point spread would rise or fall anytime there was a significant imbalance between the amounts wagered on each side. The Mirage would have no problem doing this; its computerized database tracks the bets as they come in. But bookies place a premium on making the opening line as accurate as possible, because if they set it badly they’re going to get stuck taking a lot of bad bets. Once a line opens, though, it’s out of the bookie’s hands, and a game’s point spread ends up representing bettors’ collective judgment of what the final outcome of that game will be. As Bob Martin, who was essentially the country’s oddsmaker in the 1970s, said, “Once you put a number on the board, it becomes public property.”
The public, it turns out, is pretty smart. It does not have a crystal ball: point spreads only weakly predict the final scores of most NFL games, for instance. But it is very hard for even well-informed gamblers to beat the final spread consistently. In about half the games, favorites cover the spread, while in the other half underdogs beat the spread. This is exactly what a bookie wants to have happen. And there are no obvious mistakes in the market’s judgment—like, say, home teams winning more than the crowd predicts they will, or road underdogs being consistently undervalued. Flaws in the crowd’s judgment are found occasionally, but when they are they’re typically like the one documented in a recent paper that found that in weeks fifteen, sixteen, and seventeen of the NFL season, home underdogs have historically been a good bet. So you have to search hard to outperform the betting crowd. Roughly three-quarters of the time, the Mirage’s final line will be the most reliable forecast of the outcomes of NFL games that you can find.
The same is true in many other sports. Because sports betting is a kind of ready-made laboratory to study predictions and their outcomes, a host of academics have perused gambling markets to see how efficient—that is, how good at capturing all the available information—they are. The results of their studies are consistent: in general, in most major sports the market is relatively efficient. In some cases, the crowd’s performance is especially good: in horse racing, for instance, the final odds reliably predict the race’s order of finish (that is, the favorite wins most often, the horse with the second-lowest odds is the second-most-often winner, and so on) and also provide, in economist Raymond D. Sauer’s words, “reasonably good estimates of the probability of winning.” In other words, a three-to-one horse will win roughly a quarter of the time. There are exceptions: odds are less accurate in those sports and games where the betting market is smaller and less liquid (meaning that the odds can change dramatically thanks to only a few bets), like hockey or golf or small-college basketball games. These are often the sports where professional gamblers can make real money, which makes sense given that we know the bigger the group, the more accurate it becomes. And there are also some interesting quirks: in horseracing, for instance, people tend to bet on long shots slightly more often than they should and bet on favorites slightly less often than they should. (This seems to be a case of risk-seeking behavior: bettors, especially bettors who have been losing, would rather take a flyer on a long shot that offers the possibility of big returns than grind it out by betting on short-odds favorites.) But on the whole, if bettors aren’t collectively foreseeing the future, they’re doing the next best thing.
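The step from “three-to-one” to “a quarter of the time” is a one-line conversion. Here is a minimal sketch of it; the function name is made up for this example, and it ignores the track’s cut, which in practice pushes the quoted odds slightly away from the true probabilities.

```python
from fractions import Fraction


def implied_win_probability(odds_against: int, odds_for: int = 1) -> Fraction:
    """Win probability implied by odds quoted as `odds_against`-to-`odds_for`.

    Odds of 3-to-1 mean three losing outcomes are expected for every
    winning one, so the implied chance of winning is 1 / (3 + 1).
    """
    return Fraction(odds_for, odds_against + odds_for)


# The example from the text: a three-to-one horse.
three_to_one = implied_win_probability(3)
```

If the odds are good estimates in Sauer’s sense, horses sent off at three-to-one should win about `three_to_one` of their races, which is exactly one in four.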
The stock market did not pause to mourn. Within minutes, investors started dumping the stocks of the four major contractors who had participated in the Challenger launch: Rockwell International, which built the shuttle and its main engines; Lockheed, which managed ground support; Martin Marietta, which manufactured the ship’s external fuel tank; and Morton Thiokol, which built the solid-fuel booster rocket. Twenty-one minutes after the explosion, Lockheed’s stock was down 5 percent, Martin Marietta’s was down 3 percent, and Rockwell was down 6 percent.
Morton Thiokol’s stock was hit hardest of all. As the finance professors Michael T. Maloney and J. Harold Mulherin report in their fascinating study of the market’s reaction to the Challenger disaster, so many investors were trying to sell Thiokol stock and so few people were interested in buying it that a trading halt was called almost immediately. When the stock started trading again, almost an hour after the explosion, it was down 6 percent. By the end of the day, its decline had almost doubled, so that at market close, Thiokol’s stock was down nearly 12 percent. By contrast, the stocks of the three other firms started to creep back up, and by the end of the day their value had fallen only around 3 percent.
What this means is that the stock market had, almost immediately, labeled Morton Thiokol as the company that was responsible for the Challenger disaster. The stock market is, at least in theory, a machine for calculating the present value of all the “free cash flow” a company will earn in the future. (Free cash flow is the money that’s left over after a company has paid all its bills and its taxes, has accounted for depreciation, and has invested in the business. It’s the money you’d get to take home and put in the bank if you were the sole owner of the company.) The steep decline in Thiokol’s stock price—especially compared with the slight declines in the stock prices of its competitors—was an unmistakable sign that investors believed that Thiokol was responsible, and that the consequences for its bottom line would be severe.
As Maloney and Mulherin point out, though, on the day of the disaster there were no public comments singling out Thiokol as the guilty party. While the New York Times article on the disaster that appeared the next morning did mention two rumors that had been making the rounds, neither of the rumors implicated Thiokol, and the Times declared, “There are no clues to the cause of the accident.”
Regardless, the market was right. Six months after the explosion, the Presidential Commission on the Challenger revealed that the O-ring seals on the booster rockets made by Thiokol—seals that were supposed to prevent hot exhaust gases from escaping—became less resilient in cold weather, creating gaps that allowed the gases to leak out. (The physicist Richard Feynman famously demonstrated this at a congressional hearing by dropping an O-ring in a glass of ice water. When he pulled it out, the drop in temperature had made it brittle.) In the case of the Challenger, the hot gases had escaped and burned into the main fuel tank, causing the cataclysmic explosion. Thiokol was held liable for the accident. The other companies were exonerated.
In other words, within a half hour of the shuttle blowing up, the stock market knew what company was responsible. To be sure, this was a single event, and it’s possible that the market’s singling out of Thiokol was just luck. Or perhaps the company’s business seemed especially susceptible to a downturn in the space program. Possibly the trading halt had sent a signal to investors to be wary. These all are important cautions, but there is still something eerie about what the market did. That’s especially true because in this case the stock market was working as a pure weighing machine, undistorted by the factors—media speculation, momentum trading, and Wall Street hype—that make it a peculiarly erratic mechanism for aggregating the collective wisdom of investors. That day, it was just buyers and sellers trying to figure out what happened and getting it right.
How did they get it right? That’s the question that Maloney and Mulherin found so vexing. First, they looked at the records of insider trades to see if Thiokol executives, who might have known that their company was responsible, had dumped stock on January 28. They hadn’t. Nor had executives at Thiokol’s competitors, who might have heard about the O-rings and sold Thiokol’s stock short. There was no evidence that anyone had dumped Thiokol stock while buying the stocks of the other three contractors (which would have been the logical trade for someone with inside information). Savvy insiders alone did not cause that first-day drop in Thiokol’s price. It was all those investors—most of them relatively uninformed—who simply refused to buy the stock.
But why did they not want Thiokol’s stock? Maloney and Mulherin were finally unable to come up with a convincing answer to that question. In the end, they assumed that insider information was responsible for the fall in Thiokol’s price, but they could not explain how. Tellingly, they quoted the Cornell economist Maureen O’Hara, who has said, “While markets appear to work in practice, we are not sure how they work in theory.”
Maybe. But it depends on what you mean by “theory.” If you strip the story down to its basics, after all, what happened that January day was this: a large group of individuals (the actual and potential shareholders of Thiokol’s stock, and the stocks of its competitors) was asked a question—“how much less are these four companies worth now that the Challenger has exploded?”—that had an objectively correct answer. Those are conditions under which a crowd’s average estimate—which is, dollar weighted, what a stock price is—is likely to be accurate. Perhaps someone did, in fact, have inside knowledge of what had happened to the O-rings. But even if no one did, it’s plausible that once you aggregated all the bits of information about the explosion that all the traders in the market had in their heads that day, it added up to something close to the truth. As was true of those who helped John Craven find the Scorpion, even if none of the traders was sure that Thiokol was responsible, collectively they were certain it was.
The market was smart that day because it satisfied the four conditions that characterize wise crowds: diversity of opinion (each person should have some private information, even if it’s just an eccentric interpretation of the known facts), independence (people’s opinions are not determined by the opinions of those around them), decentralization (people are able to specialize and draw on local knowledge), and aggregation (some mechanism exists for turning private judgments into a collective decision). If a group satisfies those conditions, its judgment is likely to be accurate. Why? At heart, the answer rests on a mathematical truism. If you ask a large enough group of diverse, independent people to make a prediction or estimate a probability and then average those estimates, the errors each of them makes in coming up with an answer will cancel themselves out. Each person’s guess, you might say, has two components: information and error. Subtract the error, and you’re left with the information.
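The truism is easy to see in a small simulation. The sketch below assumes the simplest possible version of “information plus error”: every guess is the true value plus independent random noise with no shared bias. The function name, the noise model, and all the numbers are assumptions chosen for illustration, not data from any study mentioned here.

```python
import random
from statistics import mean


def simulate_crowd(n_guessers: int, truth: float, noise_sd: float,
                   seed: int = 0) -> tuple[float, float]:
    """Return (error of the averaged guess, average individual error).

    Each guess is modeled as truth + independent Gaussian error.
    This independence is an assumption of the sketch; errors that
    all lean the same way would not cancel.
    """
    rng = random.Random(seed)
    guesses = [truth + rng.gauss(0.0, noise_sd) for _ in range(n_guessers)]
    # How far the crowd's single averaged answer is from the truth.
    group_error = abs(mean(guesses) - truth)
    # How far a typical member's own answer is from the truth.
    typical_individual_error = mean(abs(g - truth) for g in guesses)
    return group_error, typical_individual_error


group_error, individual_error = simulate_crowd(
    n_guessers=1000, truth=1000.0, noise_sd=100.0, seed=1)
```

The averaged guess can never be further from the truth than the typical individual guess, and with many independent guessers it is usually far closer. The sketch also shows why diversity and independence matter: change the noise so that everyone errs in the same direction and the average inherits that error intact.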
Now, even with the errors canceled out, it’s possible that a group’s judgment will be bad. For the group to be smart, there has to be at least some information in the “information” part of the “information minus error” equation. (If you’d asked a large group of children to buy and sell stocks in the wake of the Challenger disaster, it’s unlikely they would have picked out Thiokol as the culprit.) What is striking, though—and what makes a phrase like “the wisdom of crowds” meaningful—is just how much information a group’s collective verdict so often contains. In cases like Francis Galton’s experiment or the Challenger explosion, the crowd is holding a nearly complete picture of the world in its collective brain.
Perhaps this isn’t surprising. After all, we are the products of evolution, and presumably we have been equipped to make sense of the world around us. But who knew that, given the chance, we can collectively make so much sense of the world? After all, think about what happens if you ask a hundred people to run a 100-meter race, and then average their times. The average time will not be better than the time of the fastest runners. It will be worse. It will be a mediocre time. But ask a hundred people to answer a question or solve a problem, and the average answer will often be at least as good as the answer of the smartest member. With most things, the average is mediocrity. With decision making, it’s often excellence. You could say it’s as if we’ve been programmed to be collectively smart.
Who Wants to Be a Millionaire? was a simple show in terms of structure: a contestant was asked multiple-choice questions, which got successively more difficult, and if she answered fifteen questions in a row correctly she walked away with $1 million. The show’s gimmick was that if a contestant got stumped by a question, she could pursue three avenues of assistance. First, she could have two of the four multiple-choice answers removed (so she’d have at least a fifty-fifty shot at the right response). Second, she could place a call to a friend or relative, a person whom, before the show, she had singled out as one of the smartest people she knew, and ask him or her for the answer. And third, she could poll the studio audience, which would immediately cast its votes by computer. Everything we think we know about intelligence suggests that the smart individual would offer the most help. And, in fact, the “experts” did okay, offering the right answer—under pressure—almost 65 percent of the time. But they paled in comparison to the audiences. Those random crowds of people with nothing better to do on a weekday afternoon than sit in a TV studio picked the right answer 91 percent of the time.
Now, the results of Who Wants to Be a Millionaire? would never stand up to scientific scrutiny. We don’t know how smart the experts were, so we don’t know how impressive outperforming them was. And since the experts and the audiences didn’t always answer the same questions, it’s possible, though not likely, that the audiences were asked easier questions. Even so, it’s hard to resist the thought that the success of the Millionaire audience was a modern example of the same phenomenon that Francis Galton caught a glimpse of a century ago.
As it happens, the possibilities of group intelligence, at least when it came to judging questions of fact, were demonstrated by a host of experiments conducted by American sociologists and psychologists between 1920 and the mid-1950s, the heyday of research into group dynamics. Although in general, as we’ll see, the bigger the crowd the better, the groups in most of these early experiments—which for some reason remained relatively unknown outside of academia—were relatively small. Yet they nonetheless performed very well. The Columbia sociologist Hazel Knight kicked things off with a series of studies in the early 1920s, the first of which had the virtue of simplicity. In that study, Knight asked the students in her class to estimate the room’s temperature, and then took a simple average of the estimates. The group guessed 72.4 degrees, while the actual temperature was 72 degrees. This was not, to be sure, the most auspicious beginning, since classroom temperatures are so stable that it’s hard to imagine a class’s estimate being too far off base. But in the years that followed, far more convincing evidence emerged, as students and soldiers across America were subjected to a barrage of puzzles, intelligence tests, and word games. The sociologist Kate H. Gordon asked two hundred students to rank items by weight, and found that the group’s “estimate” was 94 percent accurate, which was better than all but five of the individual guesses. In another experiment, students were asked to look at ten piles of buckshot—each a slightly different size than the rest—that had been glued to a piece of white cardboard, and rank them by size. This time, the group’s guess was 94.5 percent accurate. A classic demonstration of group intelligence is the jelly-beans-in-the-jar experiment, in which invariably the group’s estimate is superior to the vast majority of the individual guesses. When finance professor Jack Treynor ran the experiment in his class with a jar that held 850 beans, the group estimate was 871. Only one of the fifty-six people in the class made a better guess.
There are two lessons to draw from these experiments. First, in most of them the members of the group were not talking to each other or working on a problem together. They were making individual guesses, which were aggregated and then averaged. This is exactly what Galton did, and it is likely to produce excellent results. (In a later chapter, we’ll see how having members interact changes things, sometimes for the better, sometimes for the worse.) Second, the group’s guess will not be better than that of every single person in the group each time. In many (perhaps most) cases, there will be a few people who do better than the group. This is, in some sense, a good thing, since especially in situations where there is an incentive for doing well (like, say, the stock market) it gives people reason to keep participating. But there is no evidence in these studies that certain people consistently outperform the group. In other words, if you run ten different jelly-bean-counting experiments, it’s likely that each time one or two students will outperform the group. But they will not be the same students each time. Over the ten experiments, the group’s performance will almost certainly be the best possible. The simplest way to get reliably good answers is just to ask the group each time.
A similarly blunt approach also seems to work when wrestling with other kinds of problems. The theoretical physicist Norman L. Johnson has demonstrated this using computer simulations of individual “agents” making their way through a maze. Johnson, who does his work at the Los Alamos National Laboratory, was interested in understanding how groups might be able to solve problems that individuals on their own found difficult. So he built a maze—one that could be navigated via many different paths, some shorter, and some longer—and sent a group of agents into the maze one by one. The first time through, they just wandered around, the way you would if you were looking for a particular café in a city where you’d never been before. Whenever they came to a turning point—what Johnson called a “node”—they would randomly choose to go right or left. Therefore some people found their way, by chance, to the exit quickly, others more slowly. Then Johnson sent the agents back into the maze, but this time he allowed them to use the information they’d learned on their first trip, as if they’d dropped bread crumbs behind them the first time around. Johnson wanted to know how well his agents would use their new information. Predictably enough, they used it well, and were much smarter the second time through. The average agent took 34.3 steps to find the exit the first time, and just 12.8 steps to find it the second.
The key to the experiment, though, was this: Johnson took the results of all the trips through the maze and used them to calculate what he called the group’s “collective solution.” He figured out what a majority of the group did at each node of the maze, and then plotted a path through the maze based on the majority’s decisions. (If more people turned left than right at a given node, that was the direction he assumed the group took. Tie votes were broken randomly.) The group’s path was just nine steps long, which was not only shorter than the path of the average individual (12.8 steps), but as short as the path that even the smartest individual had been able to come up with. It was also as good an answer as you could find. There was no way to get through the maze in fewer than nine steps, so the group had discovered the optimal solution. The obvious question that follows, though, is: The judgment of crowds may be good in laboratory settings and classrooms, but what happens in the real world?
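The “collective solution” amounts to a majority vote at every node, followed by a walk along the winning directions. The sketch below shows that procedure on a toy maze. It is not Johnson’s simulation: the four-node maze, the three agents, and every name in the code are hypothetical, and each agent is simplified to a fixed left-or-right preference at every node.

```python
import random
from collections import Counter

# Hypothetical maze: each node maps a direction to the node it leads to.
MAZE = {
    "start": {"L": "a", "R": "b"},
    "a": {"L": "exit", "R": "b"},
    "b": {"L": "c", "R": "a"},
    "c": {"L": "exit", "R": "exit"},
}

# Hypothetical agents: the direction each one prefers at each node.
AGENTS = [
    {"start": "L", "a": "R", "b": "L", "c": "L"},
    {"start": "R", "a": "L", "b": "R", "c": "L"},
    {"start": "L", "a": "L", "b": "L", "c": "R"},
]


def collective_choices(agent_choices, rng):
    """Majority direction at each node; ties are broken at random."""
    nodes = {node for choices in agent_choices for node in choices}
    majority = {}
    for node in sorted(nodes):
        votes = Counter(c[node] for c in agent_choices if node in c)
        top = max(votes.values())
        tied = sorted(d for d, n in votes.items() if n == top)
        majority[node] = rng.choice(tied)
    return majority


def path_length(maze, choices, start="start", goal="exit", max_steps=100):
    """Number of steps taken when following `choices` from start to goal."""
    node, steps = start, 0
    while node != goal:
        if steps >= max_steps:
            raise ValueError("choices loop without reaching the exit")
        node = maze[node][choices[node]]
        steps += 1
    return steps


individual_lengths = [path_length(MAZE, agent) for agent in AGENTS]
group_length = path_length(MAZE, collective_choices(AGENTS, random.Random(0)))
```

In this toy case the three agents need four, three, and two steps, while the majority path needs two: shorter than the average individual and as short as the best one, which is the pattern the text describes. Nothing guarantees that result in every maze. A majority path can even loop, which is why the walk carries a step limit.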
In May 1968, the U.S. submarine Scorpion disappeared on its way back to Newport News after a tour of duty in the North Atlantic. Although the navy knew the sub’s last reported location, it had no idea what had happened to the Scorpion, and only the vaguest sense of how far it might have traveled after it had last made radio contact. As a result, the area where the navy began searching for the Scorpion was a circle twenty miles wide and many thousands of feet deep. You could not imagine a more hopeless task. The only possible solution, one might have thought, was to track down three or four top experts on submarines and ocean currents, ask them where they thought the Scorpion was, and search there. But, as Sherry Sontag and Christopher Drew recount in their book Blind Man’s Bluff, a naval officer named John Craven had a different plan.
First, Craven concocted a series of scenarios—alternative explanations for what might have happened to the Scorpion. Then he assembled a team of men with a wide range of knowledge, including mathematicians, submarine specialists, and salvage men. Instead of asking them to consult with each other to come up with an answer, he asked each of them to offer his best guess about how likely each of the scenarios was. To keep things interesting, the guesses were in the form of wagers, with bottles of Chivas Regal as prizes. And so Craven’s men bet on why the submarine ran into trouble, on its speed as it headed to the ocean bottom, on the steepness of its descent, and so forth.
Needless to say, no one of these pieces of information could tell Craven where the Scorpion was. But Craven believed that if he put all the answers together, building a composite picture of how the Scorpion died, he’d end up with a pretty good idea of where it was. And that’s exactly what he did. He took all the guesses, and used a formula called Bayes’s theorem to estimate the Scorpion’s final location. (Bayes’s theorem is a way of calculating how new information about an event changes your preexisting expectations of how likely the event was.) When he was done, Craven had what was, roughly speaking, the group’s collective estimate of where the submarine was.
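For readers who have not met Bayes’s theorem, the update it describes looks roughly like this. The sketch is not a reconstruction of Craven’s calculation, whose details are not given here. The three search zones, the prior beliefs, and the likelihoods are all invented for illustration, and the function name is an assumption of this example.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior belief in each hypothesis after seeing one piece of evidence.

    prior[h]      -- belief in hypothesis h before the evidence
    likelihood[h] -- probability of seeing the evidence if h were true
    """
    # Weight each prior belief by how well it explains the evidence.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Rescale so that the updated beliefs again sum to one.
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Invented numbers: three hypothetical search zones.
prior = {"zone_1": Fraction(1, 2), "zone_2": Fraction(1, 4), "zone_3": Fraction(1, 4)}
likelihood = {"zone_1": Fraction(1, 10), "zone_2": Fraction(1, 2), "zone_3": Fraction(2, 5)}
posterior = bayes_update(prior, likelihood)
```

Here the zone that started as the favorite ends up the least likely of the three, because the evidence is hard to square with it. Repeating the update for each new scrap of evidence is one way that many weak guesses can add up to a sharp estimate.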
The location that Craven came up with was not a spot that any individual member of the group had picked. In other words, not one of the members of the group had a picture in his head that matched the one Craven had constructed using the information gathered from all of them. The final estimate was a genuinely collective judgment that the group as a whole had made, as opposed to representing the individual judgment of the smartest people in it. It was also a genuinely brilliant judgment. Five months after the Scorpion disappeared, a navy ship found it. It was 220 yards from where Craven’s group had said it would be.
What’s astonishing about this story is that the evidence that the group was relying on in this case amounted to almost nothing. It was really just tiny scraps of data. No one knew why the submarine sank, no one had any idea how fast it was traveling or how steeply it fell to the ocean floor. And yet even though no one in the group knew any of these things, the group as a whole knew them all.
The second kind of problem is what’s usually called a coordination problem. Coordination problems require members of a group (market, subway riders, college students looking for a party) to figure out how to coordinate their behavior with each other, knowing that everyone else is trying to do the same. How do buyers and sellers find each other and trade at a fair price? How do companies organize their operations? How can you drive safely in heavy traffic? These are all problems of coordination. The final kind of problem is a cooperation problem. As their name suggests, cooperation problems involve the challenge of getting self-interested, distrustful people to work together, even when narrow self-interest would seem to dictate that no individual should take part. Paying taxes, dealing with pollution, and agreeing on definitions of what counts as reasonable pay are all examples of cooperation problems.
A word about structure. The first half of this book is, you might say, theory, although leavened by practical examples. There’s a chapter for each of the three problems (cognition, coordination, and cooperation), and there are chapters covering the conditions that are necessary for the crowd to be wise: diversity, independence, and a particular kind of decentralization. The first half begins with the wisdom of crowds, and then explores the three conditions that make it possible, before moving on to deal with coordination and cooperation.
The second part of the book consists of what are essentially case studies. Each of the chapters is devoted to a different way of organizing people toward a common (or at least loosely common) goal, and each chapter is about the way collective intelligence either flourishes or flounders. In the chapter about corporations, for instance, the tension is between a system in which only a few people exercise power and a system in which many have a voice. The chapter about markets starts with the question of whether markets can be collectively intelligent, and ends with a look at the dynamics of a stock-market bubble.
There are many stories in this book of groups making bad decisions, as well as groups making good ones. Why? Well, one reason is that this is the way the world works. The wisdom of crowds has a far more important and beneficial impact on our everyday lives than we recognize, and its implications for the future are immense. But in the present, many groups struggle to make even mediocre decisions, while others wreak havoc with their bad judgment. Groups work well under certain circumstances, and less well under others. Groups generally need rules to maintain order and coherence, and when they’re missing or malfunctioning, the result is trouble. Groups benefit from members talking to and learning from each other, but too much communication, paradoxically, can actually make the group as a whole less intelligent. While big groups are often good for solving certain kinds of problems, big groups can also be unmanageable and inefficient. Conversely, small groups have the virtue of being easy to run, but they risk having too little diversity of thought and too much consensus. Finally, Mackay was right about the extremes of collective behavior: there are times—think of a riot, or a stock-market bubble—when aggregating individual decisions produces a collective decision that is utterly irrational. The stories of these kinds of mistakes are negative proofs of this book’s argument, underscoring the importance to good decision making of diversity and independence by demonstrating what happens when they’re missing.
Diversity and independence are important because the best collective decisions are the product of disagreement and contest, not consensus or compromise. An intelligent group, especially when confronted with cognition problems, does not ask its members to modify their positions in order to let the group reach a decision everyone can be happy with. Instead, it figures out how to use mechanisms—like market prices, or intelligent voting systems—to aggregate and produce collective judgments that represent not what any one person in the group thinks but rather, in some sense, what they all think. Paradoxically, the best way for a group to be smart is for each person in it to think and act as independently as possible.
Perhaps the most severe critic of the stupidity of groups was the French writer Gustave Le Bon, who in 1895 published the polemical classic The Crowd: A Study of the Popular Mind. Le Bon was appalled by the rise of democracy in the West in the nineteenth century and dismayed by the idea that ordinary people had come to wield political and cultural power. But his disdain for groups went deeper than that. A crowd, Le Bon argued, was more than just the sum of its members. Instead, it was a kind of independent organism. It had an identity and a will of its own, and it often acted in ways that no one within the crowd intended. When the crowd did act, Le Bon argued, it invariably acted foolishly. A crowd might be brave or cowardly or cruel, but it could never be smart. As he wrote, “In crowds it is stupidity and not mother wit that is accumulated.” Crowds “can never accomplish acts demanding a high degree of intelligence,” and they are “always intellectually inferior to the isolated individual.” Strikingly, for Le Bon, the idea of “the crowd” included not just obvious examples of collective wildness, like lynch mobs or rioters. It also included just about any kind of group that could make decisions.
So Le Bon lambasted juries, which “deliver verdicts of which each individual juror would disapprove.” Parliaments, he argued, adopt laws that each of their members would normally reject. In fact, if you assembled smart people who were specialists in a host of different fields and asked them to “make decisions affecting matters of general interest,” the decisions they would reach would be no better, on the whole, than those “adopted by a gathering of imbeciles.”
Over the course of this book I follow Le Bon’s lead in giving the words “group” and “crowd” broad definitions, using the words to refer to everything from game-show audiences to multibillion-dollar corporations to a crowd of sports gamblers. Some of the groups in this book, like the management teams in Chapter 9, are tightly organized and very much aware of their identities as groups. Other crowds, like the herds of cars caught in traffic that I write about in Chapter 7, have no formal organization at all. And still others, like the stock market, exist mainly as an ever-changing collection of numbers and dollars. These groups are all different, but they have in common the ability to act collectively to make decisions and solve problems, even if the people in the groups aren’t always aware that’s what they’re doing. And what is demonstrably true of some of these groups (namely, that they are smart and good at problem solving) is potentially true of most, if not all, of them. In that sense, Gustave Le Bon had things exactly backward. If you put together a big enough and diverse enough group of people and ask them to “make decisions affecting matters of general interest,” that group’s decisions will, over time, be “intellectually (superior) to the isolated individual,” no matter how smart or well-informed he is.
This intelligence, or what I’ll call “the wisdom of crowds,” is at work in the world in many different guises. It’s the reason the Internet search engine Google can scan a billion Web pages and find the one page that has the exact piece of information you were looking for. It’s the reason it’s so hard to make money betting on NFL games, and it helps explain why, for the past fifteen years, a few hundred amateur traders in the middle of Iowa have done a better job of predicting election results than Gallup polls have. The wisdom of crowds has something to tell us about why the stock market works (and about why, every so often, it stops working). The idea of collective intelligence helps explain why, when you go to the convenience store in search of milk at two in the morning, there is a carton of milk waiting there for you, and it even tells us something important about why people pay their taxes and help coach Little League. It’s essential to good science. And it has the potential to make a profound difference in the way companies do business.
In one sense, this book tries to describe the world as it is, looking at things that at first glance may not seem similar but that are ultimately very much alike. But this book is also about the world as it might be. One of the striking things about the wisdom of crowds is that even though its effects are all around us, it’s easy to miss, and, even when it’s seen, it can be hard to accept. Most of us, whether as voters or investors or consumers or managers, believe that valuable knowledge is concentrated in a very few hands (or, rather, in a very few heads). We assume that the key to solving problems or making good decisions is finding that one right person who will have the answer. Even when we see a large crowd of people, many of them not especially well-informed, do something amazing like, say, predict the outcomes of horse races, we are more likely to attribute that success to a few smart people in the crowd than to the crowd itself. As sociologists Jack B. Soll and Richard Larrick put it, we feel the need to “chase the expert.” The argument of this book is that chasing the expert is a mistake, and a costly one at that. We should stop hunting and ask the crowd (which, of course, includes the geniuses as well as everyone else) instead. Chances are, it knows.
Galton’s destination was the annual West of England Fat Stock and Poultry Exhibition, a regional fair where the local farmers and townspeople gathered to appraise the quality of each other’s cattle, sheep, chickens, horses, and pigs. Wandering through rows of stalls examining workhorses and prize hogs may seem to have been a strange way for a scientist (especially an elderly one) to spend an afternoon, but there was a certain logic to it. Galton was a man obsessed with two things: the measurement of physical and mental qualities, and breeding. And what, after all, is a livestock show but a big showcase for the effects of good and bad breeding?
Breeding mattered to Galton because he believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. At the International Exhibition of 1884 in London, for instance, he set up an “Anthropometric Laboratory,” where he used devices of his own making to test exhibition-goers on, among other things, their “Keenness of Sight and of Hearing, Colour Sense, Judgment of Eye, [and] Reaction Time.” His experiments left him with little faith in the intelligence of the average person, “the stupidity and wrong-headedness of many men and women being so great as to be scarcely credible.” Only if power and control stayed in the hands of the select, well-bred few, Galton believed, could a society remain healthy and strong.
As he walked through the exhibition that day, Galton came across a weight-judging competition. A fat ox had been selected and placed on display, and members of a gathering crowd were lining up to place wagers on the weight of the ox. (Or rather, they were placing wagers on what the weight of the ox would be after it had been “slaughtered and dressed.”) For sixpence, you could buy a stamped and numbered ticket, where you filled in your name, your address, and your estimate. The best guesses would receive prizes.
Eight hundred people tried their luck. They were a diverse lot. Many of them were butchers and farmers, who were presumably expert at judging the weight of livestock, but there were also quite a few people who had, as it were, no insider knowledge of cattle. “Many non-experts competed,” Galton wrote later in the scientific journal Nature, “like those clerks and others who have no expert knowledge of horses, but who bet on races, guided by newspapers, friends, and their own fancies.” The analogy to a democracy, in which people of radically different abilities and interests each get one vote, had suggested itself to Galton immediately. “The average competitor was probably as well fitted for making a just estimate of the dressed weight of the ox, as an average voter is of judging the merits of most political issues on which he votes,” he wrote.
Galton was interested in figuring out what the “average voter” was capable of because he wanted to prove that the average voter was capable of very little. So he turned the competition into an impromptu experiment. When the contest was over and the prizes had been awarded, Galton borrowed the tickets from the organizers and ran a series of statistical tests on them. Galton arranged the guesses (which totaled 787 in all, after he had to discard thirteen because they were illegible) in order from highest to lowest and graphed them to see if they would form a bell curve. Then, among other things, he added all the contestants’ estimates, and calculated the mean of the group’s guesses. That number represented, you could say, the collective wisdom of the Plymouth crowd. If the crowd were a single person, that was how much it would have guessed the ox weighed.
Galton undoubtedly thought that the average guess of the group would be way off the mark. After all, mix a few very smart people with some mediocre people and a lot of dumb people, and it seems likely you’d end up with a dumb answer. But Galton was wrong. The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds. After it had been slaughtered and dressed, the ox weighed 1,198 pounds. In other words, the crowd’s judgment was essentially perfect. Perhaps breeding did not mean so much after all. Galton wrote later: “The result seems more creditable to the trustworthiness of a democratic judgment than might have been expected.” That was, to say the least, an understatement.
Crowd behaviour is often associated with irrationality. Crowds form mobs and cults. They panic and the herd instinct is often wrong and easily swayed. At least that is the common perception. But scientist and polymath Francis Galton discovered that not all crowd behaviour was negative. Indeed he found that if you asked enough people the same question, they might come up with better answers than even the experts.
It was in 1906 that Galton made his discovery of what is known as the wisdom of crowds. He attended a farmers' fair in Plymouth where he was intrigued by a weight guessing contest. The goal was to guess the weight of an ox when it was butchered and dressed. Around 800 people entered the contest and wrote their guesses on tickets. The person who guessed closest to the butchered weight of the ox won a prize.
After the contest Galton took the tickets and ran a statistical analysis on them. He discovered that the average guess of all the entrants was remarkably close to the actual weight of the butchered ox. In fact it was under by only 1 lb for an ox that weighed 1,198 lb. This collective guess was not only better than that of the actual winner of the contest but also better than the guesses made by cattle experts at the fair. It seemed that democracy of thought could produce amazing results.
However, to benefit from the wisdom of crowds several conditions must be in place. First each individual member of the crowd must have their own independent source of information. Second they must make individual decisions and not be swayed by the decisions of those around them. And third, there must be a mechanism in place that can collate these diverse opinions.
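The three conditions can be illustrated with a small simulation. This is only a sketch under stated assumptions: the guessers are hypothetical, each one draws an independent noisy estimate around the true weight (the Gaussian noise and its 75 lb spread are invented for illustration and are not Galton's data), and the collating mechanism is a plain average.

```python
import random
from statistics import mean

TRUE_WEIGHT_LB = 1198   # dressed weight of the ox, as reported above
CROWD_SIZE = 787        # number of legible tickets
NOISE_SD_LB = 75        # ASSUMPTION: spread of individual guesses (hypothetical)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Conditions 1 and 2: every guess is drawn independently of all the others.
guesses = [rng.gauss(TRUE_WEIGHT_LB, NOISE_SD_LB) for _ in range(CROWD_SIZE)]

# Condition 3: a mechanism that collates the guesses (here, the mean).
crowd_guess = mean(guesses)
crowd_error = abs(crowd_guess - TRUE_WEIGHT_LB)

# Compare the crowd with its own members.
individual_errors = [abs(g - TRUE_WEIGHT_LB) for g in guesses]
average_individual_error = mean(individual_errors)
beaten = sum(err > crowd_error for err in individual_errors)

print(f"crowd error: {crowd_error:.1f} lb")
print(f"average individual error: {average_individual_error:.1f} lb")
print(f"individuals beaten by the crowd: {beaten} of {CROWD_SIZE}")
```

Whatever the seed, the crowd's error can never exceed the average individual error (this follows from the triangle inequality). How much smaller it turns out to be depends on the guesses being independent, because errors that everyone shares do not cancel out.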
Internet search engines are a good example of the wisdom of crowds in action. It is the reason the pages you search for come up near the top of the search engine list.
In general terms, the more people link to a page and the more popular it is, the higher it ranks. Another highly visible example of crowd decision making can be found in the television game show Who Wants To Be A Millionaire. When the player does not know which one of four answers is correct, they can ask the audience. Each member of the audience makes a separate and individual vote for the answer they favour. These votes are then collected and the results displayed. Often it is obvious from the result that one particular answer has found favour. And that is the one the player generally goes along with. In 95% of cases it is correct.
First Published in Nature (1907), No. 1949, Vol. 75, 450-451.
In these democratic days, any investigation into the trustworthiness and peculiarities of popular judgments is of interest. The material about to be discussed refers to a small matter, but is much to the point.
A weight-judging competition was carried on at the annual show of the West of England Fat Stock and Poultry Exhibition recently held at Plymouth. A fat ox having been selected, competitors bought stamped and numbered cards, for 6d. each, on which to inscribe their respective names, addresses, and estimates of what the ox would weigh after it had been slaughtered and "dressed." Those who guessed most successfully received prizes. About 800 tickets were issued, which were kindly lent me for examination after they had fulfilled their immediate purpose. These afforded excellent material.
The judgments were unbiased by passion and uninfluenced by oratory and the like. The sixpenny fee deterred practical joking, and the hope of a prize and the joy of competition prompted each competitor to do his best. The competitors included butchers and farmers, some of whom were highly expert in judging the weight of cattle; others were probably guided by such information as they might pick up, and by their own fancies.
The average competitor was probably as well fitted for making a just estimate of the dressed weight of the ox, as an average voter is of judging the merits of most political issues on which he votes, and the variety among the voters to judge justly was probably much the same in either case. After weeding thirteen cards out of the collection, as being defective or illegible, there remained 787 for discussion. I arrayed them in order of the magnitudes of the estimates, and converted the cwt., quarters, and lbs., in which they were made, into lbs., under which form they will be treated.
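For readers unfamiliar with the units, the conversion Galton describes is simple arithmetic. What follows is a minimal sketch, assuming the imperial hundredweight of 112 lb and the quarter of 28 lb; the function name and the sample ticket are hypothetical and are not taken from his cards.

```python
# Imperial units: 1 cwt (hundredweight) = 4 quarters, 1 quarter = 28 lb.
LB_PER_QUARTER = 28
LB_PER_CWT = 4 * LB_PER_QUARTER  # 112 lb


def to_pounds(cwt: int, quarters: int, lbs: int) -> int:
    """Convert an estimate written as cwt / quarters / lbs into pounds."""
    return cwt * LB_PER_CWT + quarters * LB_PER_QUARTER + lbs


# Hypothetical ticket reading "10 cwt. 2 qrs. 22 lbs."
example_ticket_lb = to_pounds(10, 2, 22)
print(example_ticket_lb)  # 1198
```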
Distribution of the estimates of the dressed weight of a particular living ox, made by 787 different persons.
According to the democratic principle of "one vote one value," the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters (for fuller explanation see "One Vote, One Value," NATURE, February 28, p. 414). Now the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.; so the vox populi was in this case 9 lb., or 0.8 per cent of the whole weight too high. The distribution of the estimates about their middlemost value was of the usual type, so far that they clustered closely in its neighbourhood and became rapidly more sparse as the distance from it increased.
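The "middlemost estimate" is what we now call the median, and the figures in this paragraph are easy to check. The sketch below uses the two weights quoted in the article; the five toy guesses and the helper names are hypothetical, chosen only to show why every estimate other than the median is outvoted.

```python
from statistics import median

MEDIAN_LB = 1207   # middlemost estimate reported in the article
ACTUAL_LB = 1198   # dressed weight reported in the article

error_lb = MEDIAN_LB - ACTUAL_LB            # 9 lb too high
error_pct = 100 * error_lb / ACTUAL_LB      # about 0.75, printed as 0.8


def votes_against(estimates, proposal):
    """Count the voters who find a proposed figure too high or too low."""
    too_high = sum(e < proposal for e in estimates)  # they guessed lower
    too_low = sum(e > proposal for e in estimates)   # they guessed higher
    return too_high, too_low


# Hypothetical toy guesses, not Galton's data.
toy_guesses = [1100, 1180, 1207, 1230, 1290]
toy_median = median(toy_guesses)

print(error_lb, round(error_pct, 1))          # 9 0.8
print(votes_against(toy_guesses, toy_median)) # (2, 2): no majority objects
print(votes_against(toy_guesses, 1230))       # (3, 1): a majority says too high
```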
Diagram from the tabular values.
But they were not scattered symmetrically. One quarter of them deviated more than 45 lb. above the middlemost (3.7 per cent.), and another quarter deviated more than 29 lb. below it (2.4 per cent.), therefore the range of the two middle quarters, that is, of the middlemost half, lay within those limits.
It would be an equal chance that the estimate written on any card picked at random out of the collection lay within or without those limits. In other words, the "probable error" of a single observation may be reckoned as 1/2 (45+29), or 37 lb. (3.1 per cent.). Taking this for the p.e. of the normal curve that is best adapted for comparison with the observed values, the results are obtained which appear in above table, and graphically in the diagram.
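The arithmetic behind the "probable error" can be reproduced in a few lines. This is a sketch using only the figures quoted above; the conversion to a standard deviation is an addition of ours (it relies on the standard property that, for a normal curve, half of the observations fall within one probable error of the centre) and does not appear in Galton's text.

```python
from statistics import NormalDist

MEDIAN_LB = 1207
QUARTILE_DEVIATIONS_LB = (45, 29)   # the two quartile distances quoted above

# Galton's probable error: the average of the two quartile deviations.
probable_error_lb = sum(QUARTILE_DEVIATIONS_LB) / 2   # 37.0


def as_percent_of_median(deviation_lb):
    """Express a deviation in lb as a percentage of the middlemost estimate."""
    return 100 * deviation_lb / MEDIAN_LB


for deviation in (*QUARTILE_DEVIATIONS_LB, probable_error_lb):
    print(deviation, round(as_percent_of_median(deviation), 1))
# 45 3.7
# 29 2.4
# 37.0 3.1

# ASSUMPTION (not in the article): translate the p.e. into a standard
# deviation using the 75th-percentile z-score of the normal curve.
z_75 = NormalDist().inv_cdf(0.75)           # about 0.6745
implied_sigma_lb = probable_error_lb / z_75
print(round(implied_sigma_lb, 1))
```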
The abnormality of the distribution of the estimates now becomes manifest, and is of this kind. The competitors may be imagined to have erred normally in the first instance, and then to have magnified all errors that were negative and to have minified all those that were positive. The lower half of the "observed" curve agrees for a large part of its range with a normal curve having the p.e.=45, and the upper half with one having its p.e.=29. I have not sufficient knowledge of the mental methods followed by those who judge weights to offer a useful opinion as to the cause of this curious anomaly. It is partly a psychological question, in answering which the various psychophysical investigations of Fechner and others would have to be taken into account. Also the anomaly may be partly due to the use of a small variety of different methods, or formulae, so that the estimates are not homogeneous in that respect.
It appears then, in this particular instance, that the vox populi is correct to within 1 per cent of the real value, and that the individual estimates are abnormally distributed in such a way that it is an equal chance whether one of them, selected at random, falls within or without the limits of -3.7 per cent and +2.4 per cent of their middlemost value.
This result is, I think, more creditable to the trustworthiness of a democratic judgment than might have been expected.
The authorities of the more important cattle shows might do service to statistics if they made a practice of preserving the sets of cards of this description, that they may obtain on future occasions, and loaned them under proper restrictions, as those have been, for statistical discussion. The fact of the cards being numbered makes it possible to ascertain whether any given set is complete.
A facsimile of the original article is available here. Galton's piece resulted in a few letters to the editor of Nature that were printed (along with Galton's responses) a few weeks later. The letters can be seen here and here in their original format but are reproduced below. In them Galton reveals that he had also calculated the mean answer as opposed to just the median revealed in the article.
Nature March 28 1907, NO. 1952, VOL. 75
LETTERS TO THE EDITOR.
IN reference to the weight-judging competition, Mr. Galton says that "the average competitor was probably as well fitted for making a just estimate of the dressed weight of the ox as an average voter is of judging the merits of most political issues on which he votes." These competitions are very popular in Cornwall; but I do not think that Mr. Galton at all realises how large a percentage of the voters (the great majority, I should suspect) are butchers, farmers, or men otherwise occupied with cattle. To these men the ability to estimate the meat-equivalent weight of a living animal is an essential part of their business; and, as an instance of their training, I may mention that one of the butchers here has a son under thirteen years of age who is an adept at this work, and is already, I am told, one of the best weight-judges in the district. This boy has been trained to it by his father, and already surpasses his instructor. Moreover, many of the competitors doubtlessly compete frequently, compare notes afterwards, and correct future estimates by past experience. Now the point of all this is that, in so far as this state of things prevails, we have to deal with, not a vox populi, but a vox expertorum. I am afraid that the majority of such competitors know far more of their business, are far better trained, and are better fitted to form a judgment, than are the majority of voters of any party, and of either the uneducated or the so-called "educated" classes. I heartily wish that the case were otherwise.
F. H. PERRY-COSTE.
Polperro, Cornwall, March 21.
I INFERRED that many non-experts were among the competitors, (1) because they were too numerous (about 800) to be mostly experts; (2) because of the abnormally wide vagaries of judgment at either end of the scale; (3) because of the prevalence of a sporting instinct, such as leads persons who know little about horses to bet on races. But I have no facts whereby to test the truth of my inference. It would be of service in future competitions if a line headed "Occupation" were inserted in the cards, after those for the address.
MR. HOOKER, in NATURE of March 21, seems not to have quite appreciated my principal contention in the letters "One Vote, One Value" and "Vox Populi" of February 28 and March 7 respectively. It was to show that the verdict given by the ballot-box must be the Median estimate, because every other estimate is condemned in advance by a majority of the voters. This being the case, I examined the votes in a particular instance according to the most appropriate method for dealing with medians, quartiles, &c. I had no intention of trespassing into the technical and much-discussed question of the relative merits of the Median and of the several kinds of Mean, and beg to be excused from not doing so now except in two particulars. First, that it may not be sufficiently realised that the suppression of any one value in a series can only make the difference of one half-place to the median, whereas if the series be small it may make a great difference to the mean; consequently, I think my proposal that juries should openly adopt the median when estimating damages, and councils when estimating money grants, has independent merits of its own, besides being in strict accordance with the true theory of the ballot-box. Secondly, Mr. Hooker's approximate calculation from my scanty list of figures, of what the mean would be of all the figures, proves to be singularly correct; he makes it 1196 lb. (which is the mean of the deviates at 5°, 15°, 95°), whereas it should have been 1197 lb. This shows well that a small orderly sample is as useful for calculating means as a very much larger random sample, and that the compactness of a table of centiles is no hindrance to their wider use. I regret to be unable to learn the proportion of the competitors who were farmers, butchers, or non-experts. It would be well in future competitions to have a line on the cards for "occupation."
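Galton's first point, that removing one value shifts the median by only half a place but can move the mean a long way, is easy to see with numbers. The five damages estimates below are hypothetical, with one deliberately extreme value; they are not drawn from any of the letters.

```python
from statistics import mean, median

# Hypothetical estimates from a small panel; the last one is extreme.
estimates = [100, 110, 120, 130, 1000]
without_extreme = estimates[:-1]   # suppress a single value

mean_before, mean_after = mean(estimates), mean(without_extreme)
median_before, median_after = median(estimates), median(without_extreme)

print(mean_before, mean_after)       # 292 115
print(median_before, median_after)   # 120 115.0

# The mean moves by 177; the median moves by 5, from the 3rd value to the
# midpoint of the 2nd and 3rd values (one half-place).
mean_shift = abs(mean_before - mean_after)
median_shift = abs(median_before - median_after)
print(mean_shift, median_shift)      # 177 5.0
```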
Certainly many non-experts competed, like those clerks and others who have no expert knowledge of horses, but who bet on races, guided by newspapers, friends, and their own fancies.
However, it should be noted that in another letter a few weeks earlier Galton had not shown much faith in the mean as a useful measure of collective judgement.
Nature, Volume 75, Issue 1948, p. 414 (1907).
ONE VOTE, ONE VALUE
A CERTAIN class of problems do not as yet appear to be solved according to scientific rules, though they are of much importance and of frequent recurrence. Two examples will suffice. (1) A jury has to assess damages. (2) The council of a society has to fix on a sum of money, suitable for some particular purpose. Each voter, whether of the jury or of the council, has equal authority with each of his colleagues. How can the right conclusion be reached, considering that there may be as many different estimates as there are members? That conclusion is clearly not the average of all the estimates, which would give a voting power to "cranks" in proportion to their crankiness. One absurdly large or small estimate would leave a greater impress on the result than one of reasonable amount, and the more an estimate diverges from the bulk of the rest, the more influence would it exert. I wish to point out that the estimate to which least objection can be raised is the middlemost estimate, the number of votes that it is too high being exactly balanced by the number of votes that it is too low. Every other estimate is condemned by a majority of voters as being either too high or too low, the middlemost alone escaping this condemnation. The number of voters may be odd or even. If odd, there is one middlemost value; thus in 11 votes the middlemost is the 6th; in 99 votes the middlemost is the 50th. If the number of voters be even, there are two middlemost values, the mean of which must be taken; thus in 12 votes the middlemost lies between the 6th and the 7th; in 100 votes between the 50th and the 51st. Generally, in 2n-1 votes the middlemost is the nth; in 2n votes it lies between the nth and the (n + 1)th.
I suggest that the process for a jury on their retirement should be (1) to discuss and interchange views; (2) for each juryman to write his own independent estimate on a slip of paper; (3) for the foreman to arrange the slips in order of the values written on them; (4) to take the average of the 6th and 7th as the verdict, which might finally be approved by a substantive proposition. Similarly as regards the resolutions of councils, having regard to the above (2n - 1) and 2n remarks.
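Galton's rule for 2n - 1 and 2n votes, and the foreman's procedure for a jury of twelve, can be written out directly. This is a sketch of the rule as stated in the letter; the function name and the twelve sample awards are hypothetical.

```python
def middlemost(estimates):
    """Return Galton's middlemost value of a list of estimates.

    With 2n - 1 votes it is the nth value in sorted order; with 2n votes
    it is the mean of the nth and (n + 1)th values.
    """
    ordered = sorted(estimates)          # step (3): arrange the slips in order
    count = len(ordered)
    if count == 0:
        raise ValueError("at least one estimate is required")
    if count % 2 == 1:                   # 2n - 1 votes
        n = (count + 1) // 2
        return ordered[n - 1]            # positions are 1-based in the letter
    n = count // 2                       # 2n votes
    return (ordered[n - 1] + ordered[n]) / 2


# The positions quoted in the letter, using votes numbered 1..N.
print(middlemost(range(1, 12)))    # 6     (11 votes: the 6th)
print(middlemost(range(1, 100)))   # 50    (99 votes: the 50th)
print(middlemost(range(1, 13)))    # 6.5   (12 votes: between 6th and 7th)
print(middlemost(range(1, 101)))   # 50.5  (100 votes: between 50th and 51st)

# Hypothetical slips from a jury of twelve, in pounds sterling.
jury_slips = [50, 400, 120, 90, 150, 100, 110, 5000, 130, 95, 140, 105]
verdict = middlemost(jury_slips)   # step (4): average of the 6th and 7th
print(verdict)                     # 115.0
```

Note that the single slip for 5000 has no more pull on the verdict than any other slip above the middle, which is exactly the protection against "cranks" that Galton is after.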
A classic demonstration of group intelligence is the jelly-beans-in-the-jar experiment, in which invariably the group’s estimate is superior to the vast majority of the individual guesses. When finance professor Jack Treynor ran the experiment in his class with a jar that held 850 beans, the group estimate was 871. Only one of the fifty-six people in the class made a better guess.
Here is a short video demonstrating the experiment.
Another successful experiment.
“Men, it has been well said, think in herds. It will be seen that they go mad in herds while they only recover their senses one by one.”
Charles Mackay, 1841
Extraordinary Popular Delusions and The Madness of Crowds
“Anyone taken as an individual is tolerably sensible and reasonable – as a member of a group he becomes a blockhead.”
“The mass never comes up to the standard of its best member, but on the contrary degrades itself to a level with the lowest”
Henry David Thoreau
“Madness is the exception in individuals but the rule in groups”
"I do not believe in the collective wisdom of individual ignorance.”
"In crowds it is stupidity, not mother wit that is accumulated.” Crowds "can never accomplish acts demanding a high degree of intelligence," and they are "always intellectually inferior to the isolated individual."
Gustave Le Bon
Not everyone has agreed with the sentiments of the above commentators. For example:
It is possible that the many, no one of whom taken singly may be a good man, may yet taken all together be better than the few, not individually but collectively … Each individual will be a worse judge than the experts, but when all work together, they are better, or at any rate no worse.
Aristotle, 4th Century BC
"Where our government has information on a known extremist and that information is not shared and acted upon as it should have, so that this extremist boarded a plane with dangerous explosives that could have cost nearly 300 lives, a systemic failure has occurred and I consider that totally unacceptable," Mr Obama said.
"We need to learn from this episode and act quickly to fix the flaws in our system because our security is at stake and lives are at stake."
Markets and trust.
Local field workers are at the front lines of development projects. They are the ones who are ultimately responsible for translating those great ideas - seeming panaceas for poverty - into meaningful impact.
Problem is, big time donors, who are wildly excited about the great-idea-of-the-moment, expect big time results from these field workers, not just by now, but by
Expectations like that cannot be met in the blink of an eye. This project may be about getting more money down to farmers, but it takes a lot of time and concerted effort to get money to flow in a system that is completely devoid of any semblance of trust.
Functional markets are built on trust. Think about it: You implicitly trust that you’ll get the perfect non-fat, extra-hot, half-sweet venti chai latte from the stranger behind the counter mere seconds after you order it (at least I do). The barista, in turn, trusts that you will front the cash before you indulge in your afternoon pick-me-up.
Small holder farmers, however, have never been able to trust seed suppliers to offer reliable products and services, and vegetable buyers have never been able to trust small holders to supply a reasonable quantity and quality of produce. There is zero institutional trust.
[L]ow income societies have less trust than rich societies…. What is important is the radius of trust. Do you trust only the members of your immediate family? Or does the circle widen to include your extended family, or your clan, or your village, or your ethnic group, or all the way to strangers? In a low-trust society, you trust your friends and family, but nobody else. (William Easterly, White Man’s Burden)
A comment posted with the photo stated:
Queuing is never in China's vocabulary and cutting a queue is perceived as normal. You could encounter this act in almost anywhere such as restaurants, banks, toilets or ATMs. Once I experienced this in a super-mart. A customer shouted at the counter girl that since he bought only one small item he should be served first and it would be ridiculous for him to go to the end of the queue for paying. Surprisingly, the counter girl gave in.
The picture here depicted a scene where the Chinese are forced to queue up for purchasing the train tickets before their Chinese New Year. Noticed that they are so worried that people may cut into their queue they have to hug or arm-lock one another.
Here's a normal day getting onto public transport in China.
Here's how it looks on designated "Queuing Days"
Of course, the Chinese reaction to queuing, or lack thereof, can be viewed in a completely rational manner. Time spent in a queue is time wasted. It is likely that a society wastes a huge amount of resources when people stand in queues. This time could be spent more productively instead of waiting in a queue. Is there a more efficient way of organising queues as a coordination and allocation mechanism?
Steven Landsburg, the armchair economist, has a theory on this: a foolproof method to shorten queues.
You spend too much time waiting in lines. "Too much" isn't some vague value judgment—it's a precise economic calculation. A good place in line is a valuable commodity, but it's not ordinarily traded in the marketplace. And this "missing market" inevitably produces inefficient outcomes.
Under the current rules, line formation suffers from economic inefficiencies because we enter lines without regard to the interests of later arrivals who queue behind us. How to make line formation more efficient? Change the rules so that new arrivals go to the front of the line instead of the back. Then the addition of a new person in line would impose no costs at all on those who come later. With that simple reform, lines would be a lot shorter. People who got pushed back beyond a certain point would give up and go home. (Well, actually they'd leave the line and try to re-enter as newcomers, but let's suppose for the moment that we can effectively prohibit that behavior.) On average, we'd spend less time waiting, and we'd be happier.
On Friday we looked at the model of Perfect Competition and how market participants' behaviour is coordinated to bring about the most efficient allocation of resources even though no one is acting explicitly to achieve this. This view of Adam Smith's Invisible Hand was proven theoretically by Arrow and Debreu and shown to be true in an experimental setting by Vernon Smith. See chapter five of The Wisdom of Crowds for more on this. Arrow, Debreu and Vernon Smith were all awarded the Nobel Prize in Economics. The conditions necessary for a market to achieve the most efficient outcome are:
- Perfect information
- All participants are perfectly rational
- Freedom of entry and exit
- No transaction costs
In these short clips from South Park we see how a lack of barriers to entry means that super-normal profits are eliminated in the long run.
First the boys come up with a great idea....
Then they realise that with freedom of entry, positive profits are only a short run phenomenon
This is an excellent primer of the sub-prime crisis using champagne flutes to explain how it all went wrong. He also mentions the role of ratings agencies which should be considered in terms of herding and independence (or lack thereof) as discussed in chapter three of The Wisdom of Crowds.
To learn how the pension fund manager from a Norwegian village got caught up in this mess watch the slide show on the subprime crisis (some of the language may not be to everyone's tastes):
FA Hayek would be delighted with this approach to using tacit or local knowledge. This piece from The Wall Street Journal takes a completely different view and argues that there should be greater centralisation in the US education system.
But above all, we need to change the desperate centralization that exists in Irish Education. Our central command system, operated from Marlborough Street, prevents schools from deciding what's important to them. Why shouldn't a school decide to make excellence in science and maths, or excellence in the Arts, a central part of its work, by having the freedom to hire expert graduates at whatever level of salary is required? We must give schools the autonomy to decide for themselves and reduce the over-reliance on the centre.
This is Charlie Chaplin's inimitable take on the specialisation of labour from his 1936 film Modern Times, a classic.
Part One: The Assembly Line
Part Two: The Feeding Machine and Chaos!
And here is the equally classic quote from Adam Smith's The Wealth of Nations.
Note that Smith estimates somewhere between a 240-fold and a 4,800-fold increase in productivity from dividing the labour in this pin factory!
To take an example, therefore, from a very trifling manufacture; but one in which the division of labour has been very often taken notice of, the trade of the pin-maker; a workman not educated to this business (which the division of labour has rendered a distinct trade), nor acquainted with the use of the machinery employed in it (to the invention of which the same division of labour has probably given occasion), could scarce, perhaps, with his utmost industry, make one pin in a day, and certainly could not make twenty.
But in the way in which this business is now carried on, not only the whole work is a peculiar trade, but it is divided into a number of branches, of which the greater part are likewise peculiar trades. One man draws out the wire, another straights it, a third cuts it, a fourth points it, a fifth grinds it at the top for receiving the head; to make the head requires two or three distinct operations; to put it on, is a peculiar business, to whiten the pins is another; it is even a trade by itself to put them into the paper; and the important business of making a pin is, in this manner, divided into about eighteen distinct operations, which, in some manufactories, are all performed by distinct hands, though in others the same man will sometimes perform two or three of them.
I have seen a small manufactory of this kind where ten men only were employed, and where some of them consequently performed two or three distinct operations. But though they were very poor, and therefore but indifferently accommodated with the necessary machinery, they could, when they exerted themselves, make among them about twelve pounds of pins in a day. There are in a pound upwards of four thousand pins of a middling size. Those ten persons, therefore, could make among them upwards of forty-eight thousand pins in a day. Each person, therefore, making a tenth part of forty-eight thousand pins, might be considered as making four thousand eight hundred pins in a day.
But if they had all wrought separately and independently, and without any of them having been educated to this peculiar business, they certainly could not each of them have made twenty, perhaps not one pin in a day; that is, certainly, not the two hundred and fortieth, perhaps not the four thousand eight hundredth part of what they are at present capable of performing, in consequence of a proper division and combination of their different operations.
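Smith's arithmetic, and the 240-fold and 4,800-fold figures noted above, can be checked in a few lines. The sketch uses only the numbers in the passage and treats his "about" and "upwards of" as exact for the sake of the calculation; the variable names are ours.

```python
# Figures from the passage, taken as round numbers.
WORKERS = 10
POUNDS_OF_PINS_PER_DAY = 12
PINS_PER_POUND = 4_000

# Output with the division of labour.
factory_output = POUNDS_OF_PINS_PER_DAY * PINS_PER_POUND   # 48,000 pins a day
output_per_worker = factory_output // WORKERS              # 4,800 pins a day

# Output of one untrained man working alone: at most 20, perhaps only 1.
SOLO_BEST_CASE = 20
SOLO_WORST_CASE = 1

lower_gain = output_per_worker // SOLO_BEST_CASE    # 240-fold
upper_gain = output_per_worker // SOLO_WORST_CASE   # 4,800-fold

print(factory_output, output_per_worker)   # 48000 4800
print(lower_gain, upper_gain)              # 240 4800
```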