### Books

• Mathematics and Humour. 1980
• Innumeracy: mathematical illiteracy and its consequences. 1988
• Beyond Numeracy: an uncommon dictionary of mathematics. 1991
• A Mathematician Reads the Newspaper. 1995
• Once Upon a Number. 1998

## Books : reviews

### John Allen Paulos. Once Upon a Number: the hidden mathematical logic of stories. Penguin. 1998

rating : 3.5 : worth reading
review : 11 January 2010

Paulos here investigates the differences, and similarities, between everyday and mathematical reasoning. His main investigation is of statistical arguments, with smaller forays into logic and complexity. He contrasts the anecdotes, stories, and myths that we use in everyday life with the mathematical counterparts that have grown and been refined from them. His main thrust is that these give us two connected but different ways of understanding the world, and that we should be alert to these differences. It is important that we don't go about:

p5. mistaking anecdotes for statistical evidence or ... taking averages to be descriptive of individual cases

It's not that one way is right, and one is wrong. They offer two possible ways of understanding the world.

p26. Stories and statistics offer us complementary choices of knowing a lot about a few people or knowing a little about many people.

Each has its problems, and each its advantages. For example, anecdotal reasoning can lead us to make incorrect generalisations, because the very richness of the story that makes it so compelling gives us so much information that there are accidental correlations.

pp26-27. If the number of traits considered is large compared to the number of people being surveyed, there will appear to be more of a relationship among the traits than actually obtains. ...
You can find perfect correlations that mean nothing for any N people and N characteristics. ...
Just as stories are sometimes a corrective to the excessive abstraction of statistics, statistics are sometimes a corrective to the misleading richness of stories.
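The effect Paulos describes here is easy to demonstrate. In this sketch (the survey is entirely invented: 10 people, 1100 yes/no traits), the pigeonhole principle guarantees accidental "perfect correlations", since there are more traits than possible answer patterns:

```python
import random

random.seed(0)

PEOPLE, TRAITS = 10, 1100   # more traits than 2**10 = 1024 possible answer patterns

# Invented survey: each trait is a random yes/no answer for each person.
survey = [tuple(random.randint(0, 1) for _ in range(PEOPLE))
          for _ in range(TRAITS)]

# Find pairs of traits with identical answers across all ten people:
# "perfect correlations" that are guaranteed by the pigeonhole principle
# and mean nothing at all.
seen = {}
matches = []
for i, pattern in enumerate(survey):
    if pattern in seen:
        matches.append((seen[pattern], i))
    else:
        seen[pattern] = i

print(f"{len(matches)} pairs of traits are perfectly correlated by accident")
```

With 1100 traits but only 1024 possible patterns of ten yes/no answers, at least 76 such meaningless matches must occur, however the answers fall.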

The mathematical approach can help us see when these errors might occur. He illustrates this point with several standard statistical "paradoxes", none the worse for being repeated. One very good point he makes, in the case of "digging the dirt" on celebrities and others, is that we should weigh the evidence presented against the effort taken to produce that evidence:

p55. the ratio of the amount of dirt unearthed to the time and resources spent digging for it (or for something that can pass for dirt). ... I don't think I have a particularly disreputable group of friends and acquaintances, but few could withstand a 30-million-dollar investigation into their private lives.

He has lots of good little anecdotes (deliberately used to be memorable!), although occasionally they don't give enough background or explanation. (Maybe he didn't want to clutter the main text with too much detail, but that's what appendices are for.) For example, he presents a paradox about the odds of poker hands with wild cards, but doesn't explain the details:

p85. The less probable the hand, the higher its rank. Three of a kind is less probable (and hence of a higher rank) than two pair, ... the introduction of wild cards and the discretion that they allow players can jumble the order of the various possible hands.
With two wild cards it becomes more likely that you will be dealt three of a kind than two pair. (Any pair combined with a wild card is three of a kind.) Since in this situation you are less likely to obtain two pair, such a hand should beat three of a kind. Suppose you change the rules and declare this by fiat, so that players choosing between two pair and three of a kind will now choose two pair. Under these altered rules it again becomes more likely that they will be dealt two pair rather than three of a kind.

On the face of it, this looks weird. How can changing the rules change the probability of the hands? What I think is going on is that, given the rules, a player holding wild cards decides which of two (or more) possible hands to declare, and declares the "better" one. If three of a kind is the better hand, you will use your wild cards to make three of a kind even when they could also make two pair, and three of a kind becomes more likely under this choice. Conversely, if two pair is deemed to be the better hand, then, dealt exactly the same cards, you would use your wild cards to make two pair, which then, because of this different choice, becomes the more likely hand. (But I may be wrong, since I don't know poker!) The point, however, is that, given choices (wild cards), things become complicated and difficult to order; life is full of such choices, and the choices we make can affect the value of what we have chosen.
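My reading of the paradox can be checked by simulation. This is a sketch under my own simplified rules (suits are ignored, so flushes and straights don't exist here; two fully wild jokers are added to the deck; the classification logic is mine, not Paulos's): each hand is declared under whichever ranking is in force, and the frequencies flip accordingly.

```python
import random
from collections import Counter

def classify(hand, prefer_trips):
    """Best declaration for a 5-card hand, jokers ('W') fully wild.
    Simplified: suits ignored, so no flushes or straights."""
    wilds = hand.count("W")
    counts = sorted(Counter(c for c in hand if c != "W").values(), reverse=True)
    counts += [0, 0]                       # pad so indexing is always safe
    pairs = sum(1 for c in counts if c >= 2)
    singles = sum(1 for c in counts if c == 1)
    if counts[0] + wilds >= 4:             # four (or five) of a kind
        return "higher"
    if counts[1] > 0 and counts[0] + counts[1] + wilds >= 5:  # full house
        return "higher"
    can_trips = counts[0] + wilds >= 3
    can_two_pair = pairs + min(wilds, singles) >= 2  # wilds pair up singletons
    if can_trips and can_two_pair:         # the player's choice decides
        return "three of a kind" if prefer_trips else "two pair"
    if can_trips:
        return "three of a kind"
    if can_two_pair:
        return "two pair"
    return "lower"

random.seed(1)
deck = [r for r in range(2, 15) for _ in range(4)] + ["W", "W"]
for prefer_trips in (True, False):
    tally = Counter(classify(random.sample(deck, 5), prefer_trips)
                    for _ in range(100_000))
    label = "trips ranked higher:" if prefer_trips else "two pair ranked higher:"
    print(label, dict(tally))
```

Whichever of the two hands is ranked higher turns out to be dealt (i.e. declared) more often, so no consistent ranking by rarity exists.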

There is another place where he doesn't explain everything that's needed. That's in the section on extensional versus intensional definitions. He explains what an extensional definition is ...

p87. Standard scientific and mathematical logic is termed extensional since objects and sets are determined by their extensions (i.e., by their members). That is, entities are the same if they have the same members, even if they are referred to differently. In everyday intensional ... logic, this isn't so. Entities that are equal in extensional logic can't always be interchanged in intensional logic.

... but not what an intensional definition is. The difference is that one is by membership {Alice, Bob, Eve}, and one is by property {everyone over 5'6" tall in this room}. It might so happen that the two definitions currently refer to the same members (Alice, Bob, and Eve happen to be the only people of the specified height in the room at the moment), but that could change. One common step in mathematical proofs is to "substitute equals for equals": if I see "2+2", I can substitute "4". But Paulos points out that we cannot safely substitute an extensional definition for an intensional one (or vice versa): what if Alice leaves the room? The important point is that we use intensional definitions all the time, and can easily find ourselves in difficulties, silliness, or even tragedy, if the "equals for equals" rule is followed blindly. The point is that mathematics is being used to model the real world, and the fit isn't perfect, so we need to think about what we are doing:
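The distinction can be mimicked in code (the names, heights, and cut-off are my own invented illustration): an extensional set lists its members once and for all, while an intensional definition is a predicate re-evaluated against the current situation.

```python
# Extensional: defined by listing the members.
extensional = {"Alice", "Bob", "Eve"}

# Intensional: defined by a property, evaluated against the current situation.
heights = {"Alice": 170, "Bob": 180, "Eve": 175, "Dan": 160}  # cm, invented
room = {"Alice", "Bob", "Eve"}

def tall_people_in_room():
    return {p for p in room if heights[p] > 167}  # roughly 5'6"

# Right now the two definitions pick out the same people ...
assert tall_people_in_room() == extensional

# ... but the intensional one tracks the world; the extensional one doesn't.
room.remove("Alice")
assert tall_people_in_room() == {"Bob", "Eve"}
assert extensional == {"Alice", "Bob", "Eve"}  # unchanged: no longer interchangeable
```

The moment Alice leaves the room, "substituting equals for equals" between the two definitions stops being valid.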

p92. any statistical study on a structured entity---a game, a welfare system, marriages, a historical era---is likely to be fatally flawed if it fails to take the structure into account, say by mindlessly substituting extensionally equivalent entities for one another within the study.

Converting well-understood English statements into mathematics is non-trivial for a wide range of reasons.

pp100-101. ... interpretation sometimes depending on verb tense, for example. The following two arguments are not equivalent, despite having the same form.

A cat needs water to survive.
Therefore my cat Puffin needs water to survive.

A dog is barking in the backyard.
Therefore my dog Ginger is barking in the backyard.

He then makes a very interesting suggestion. Recently, "situational logic" has tried to take context into account. (I've been reading a bit about this lately. The Liar, for example, uses it to analyse certain logical paradoxes. I've also recently read Vicious Circles, again by Barwise, for reasons that can be found in my review. I read Once Upon a Number in order to have a break, and read something a bit different. So I was surprised partway through when Paulos says that Barwise was his thesis advisor! Fortunately I'm already inured to this kind of coincidence, so I didn't mind that, coincidentally, Paulos also has an explanation of how this sort of thing happens all the time.) Anyway, Paulos suggests that there needs to be a similar situational approach to other areas of mathematical modelling:

p105. Just as situation semantics attempts to accommodate more of the richness of everyday understanding in an extended formal logical calculus, so a "situation statistics" should be developed that builds in some of the checks on wayward probabilities that commonsense narrative suggests.

There is also a section on complexity. In another intriguing suggestion, Paulos speculates about the underlying mathematical cause of self-organisation in Kauffman's Random Boolean Networks:

p165. What happens, however, is that one observes "order for free"---more or less stable cycles of light configurations, different ones for different initial conditions. As far as I know, the result is only empirical, but I suspect it may be a consequence of a Ramsey-type theorem too difficult to prove.

He also gives a solution to the "problem of induction" that I hadn't come across before (the usual problem is that justifying "the future will be like the past" by using the fact that, in the past, the future has been like the past, is circular). I rather like it, since it fits in with ideas of complexity and emergence rather well:

p167. Charles Sanders Peirce and Hans Reichenbach advanced a different pragmatic justification of induction. It amounts roughly to this: Maybe induction does not work, but if anything does, induction will. Maybe there is no persisting order in the universe, but if there is any on any level of abstraction, induction will eventually find it (or its absence) on the next higher level.

This idea that induction works across levels is interesting. Levels are important in complexity and emergence (an emergent property usually being described at a higher level than the properties it emerges from). I'd never thought of stream-of-consciousness literature as an example of emergence:

p170. the stream-of-consciousness novels of James Joyce and Virginia Woolf in the early part of the twentieth century can be seen as the beginning of an attempt to discern pattern on one level by simply describing the most mundane actions and thoughts on a lower level.

He then goes on to describe a measure of complexity.

p171. Zurek defined physical entropy to be the sum of Claude Shannon's information content, measuring the improbability or surprise inherent in a yet to be fully revealed entity, and Chaitin's complexity, measuring the algorithmic information content of what's already been revealed.

His speculation that follows is fascinating:

pp171-172. Imagine two readers encountering a new short story or novel. One is a very sophisticated littérateur, while the other is quite naive. For the first reader the story holds few surprises, its twists and tropes being old hat. The second reader, however, is amazed at the plot, the characters, the verbal artistry. The question is, How much information is in the story? ... The first reader's mind already has encoded within it large portions of the story's complexity; the second reader's mind is relatively unencumbered by such complexity. The Shannon information content of the story---its improbability or the surprise it engenders---is thus less for the first reader whose mind's complexity is, in this regard, greater, whereas the opposite holds for the second reader. As they read the story both readers' judgments of improbability or surprise dwindle, albeit at different rates, and their minds' complexity rises, again, differentially. The sum of these two---the physical entropy---remains constant and is a measure of the information content of the mind-story system.

The reason it is fascinating (to me, at least!) is because of problems I have with some definitions of emergence, which require the observer to experience "surprise" at the emergent property. So, the second time the same thing happens, it's no longer emergent, then? But taking this definition into account, the second time it happens, the complexity of the observer has changed. So maybe there's something more going on here. Hmm.
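Zurek's split can be toyed with using compression as a crude stand-in for complexity: model the sophisticated reader's mind as a preset dictionary of tropes, and the "surprise" of the story as its compressed size given that dictionary. (The texts are invented, and using zlib's preset-dictionary feature as a proxy for the Shannon/Chaitin split is my own loose analogy, not Zurek's construction.)

```python
import zlib

# The "mind" of the sophisticated reader: tropes already encoded.
prior = (b"Once upon a time there was a princess who lived in a castle. "
         b"She went on a long journey and lived happily ever after. ")

# The story, reusing many of those phrases.
story = (b"Once upon a time there was a dragon who lived in a cave. "
         b"He went on a long journey and lived happily ever after. ")

def surprise(text, mind=b""):
    """Compressed size of text, given what the mind already contains."""
    if mind:
        co = zlib.compressobj(level=9, zdict=mind)  # condition on prior knowledge
    else:
        co = zlib.compressobj(level=9)
    return len(co.compress(text) + co.flush())

naive = surprise(story)                  # naive reader: everything is new
sophisticated = surprise(story, prior)   # much of the story is "old hat"
print(f"naive reader: {naive} bytes of surprise; "
      f"sophisticated reader: {sophisticated} bytes")
```

The same story compresses to fewer bytes against the sophisticated reader's dictionary: less surprise for the mind that already carries more of the story's complexity.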

He finishes off with a nice little example that not all equally probable things are equally probable (depending on how we look at them).

pp182-183. I was told that in some states hand-picked lottery numbers won more often than did machine-generated numbers. ... the claim is not necessarily nonsense. In fact, it nicely illustrates one way in which personal wishes can sometimes seem to affect large, impersonal phenomena.
Consider the following simplified lottery. In a comically small town, the mayor draws a number from a fishbowl every Saturday night. Balls numbered from 1 to 10 are in the bowl, and only two townspeople bet each week. George picks a number between 1 and 10 at random. Martha, on the other hand, always picks 9, her lucky number. Although George and Martha are equally likely to win, the number 9 will win more frequently than will any other number. The reason is that two conditions must be met for a number to win: it must be drawn on Saturday night by the mayor and it must be chosen by one of the participants. Since Martha always picks 9, the second condition in her case is always met, so whenever the mayor draws a 9 from the bowl, 9 wins. This is not the case with, say, 4. The mayor may draw a 4 from the bowl, but chances are George will not have picked a 4 so it will not often be a winning number. George and Martha have equal chances of winning, and each ball has an equal chance of being chosen by the mayor, but not all numbers have the same chance of winning.

So, all in all, some great food for thought.