Statistical Analysis with Excel For Dummies. Joseph Schmuller
heads-tails split in the data is consistent with a fair coin. Think of it as the idea that nothing in the results of the study is out of the ordinary.
An alternative hypothesis is possible: The coin isn’t a fair one, and it's loaded to produce an unequal number of heads and tails. This hypothesis says that any heads-tails split is consistent with an unfair coin. The alternative hypothesis is called, believe it or not, the alternative hypothesis. The statistical notation for the alternative hypothesis is H1.
With the hypotheses in place, toss the coin 100 times and note the number of heads and tails. If the results are something like 90 heads and 10 tails, it's a good idea to reject H0. If the results are around 50 heads and 50 tails, don't reject H0.
Similar ideas apply to the reading speed example I give earlier, in the section “Samples and populations.” One sample of children receives reading instruction under a new method designed to increase reading speed, and the other learns via a traditional method. Measure the children's reading speeds before and after instruction and tabulate the improvement for each child. The null hypothesis, H0, is that one method isn't different from the other. If the improvements are greater with the new method than with the traditional method (so much greater that it's unlikely that the methods aren't different from one another), reject H0. If they're not greater, don't reject H0.
Notice that I did not say “accept H0.” The way the logic works, you never accept a hypothesis. You either reject H0 or don't reject H0.
Here’s a real-world example to help you understand this idea. Whenever a defendant goes on trial, that person is presumed innocent until proven guilty. Think of innocent as H0. The prosecutor’s job is to convince the jury to reject H0. If the jurors reject, the verdict is guilty. If they don’t reject, the verdict is not guilty. The verdict is never innocent. That would be like accepting H0.
Back to the coin tossing example. Remember, I said that “around 50 heads and 50 tails” is what you could expect from 100 tosses of a fair coin. What does around mean? Also, I said if it’s 90-10, reject H0. What about 85-15? 80-20? 70-30? Exactly how different from 50-50 does the split have to be for you to reject H0? In the reading speed example, how much greater does the improvement have to be to reject H0?
I don't answer these questions now. Statisticians have formulated decision rules for situations like this, and you explore those rules throughout the book.
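To get a feel for the kind of number such a rule might look at, here's a small Python sketch. It's an illustration only, not a decision rule from this book (which works in Excel). For a fair coin, it computes the exact probability of a split at least as lopsided as the one observed in 100 tosses. The function name `two_sided_p_value` and the choice to count lopsidedness in both directions (too many heads or too many tails) are assumptions made for this example.

```python
from math import comb


def two_sided_p_value(heads: int, tosses: int = 100) -> float:
    """Probability, assuming a fair coin, of a heads-tails split at least
    as far from an even split as the one observed (in either direction)."""
    half = tosses / 2
    distance = abs(heads - half)
    # Count every possible sequence whose head count is at least this far from half.
    sequences = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - half) >= distance
    )
    # Each of the 2**tosses sequences is equally likely for a fair coin.
    return sequences / 2 ** tosses


# The splits asked about in the text, plus 50-50 and 60-40 for comparison.
for heads in (50, 60, 70, 80, 85, 90):
    p = two_sided_p_value(heads)
    print(f"{heads}-{100 - heads} split: probability under a fair coin = {p:.3g}")
```

A small probability means the split would be surprising if the coin were fair. How small is small enough to reject H0 is exactly the question the decision rules address.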
Two types of error
Whenever you evaluate the data from a study and decide to reject H0 or to not reject H0, you can never be absolutely sure that your decision is correct. You never really know what the true state of the world is. In the context of the coin tossing example, that means you never know for certain if the coin is fair or not. All you can do is make a decision based on the sample data you gather. To be certain about the coin, you'd need the data for the entire population of tosses, which means you'd have to keep tossing the coin until the end of time.
Because you’re never certain about your decisions, it’s possible to make an error regardless of what you decide. As I mention earlier in this chapter, the coin could be fair and you just happen to get 99 heads in 100 tosses. That’s not likely, and that’s why you reject H0. It’s also possible that the coin is biased, yet you just happen to toss 50 heads in 100 tosses. Again, that’s not likely and you don’t reject H0 in that case.
Although not likely, those errors are possible. They lurk in every study that involves inferential statistics. Statisticians have named them Type I and Type II.
If you reject H0 and you shouldn't, that's a Type I error. In the coin example, that's rejecting the hypothesis that the coin is fair, when in reality it’s a fair coin.
If you don't reject H0 and you should have, that's a Type II error. That happens if you don't reject the hypothesis that the coin is fair and in reality it's biased.
How do you know if you've made either type of error? You don't, at least not right after you make your decision to reject or not reject H0. (If it were possible to know, you wouldn't make the error in the first place!) All you can do is gather more data and see if the additional data are consistent with your decision.
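Although you can't know whether a particular decision was an error, you can calculate how often each type of error would occur if you're willing to assume a specific decision rule and a specific amount of bias. The Python sketch below does that under loudly stated assumptions: the rule "reject H0 if 100 tosses produce 40 or fewer heads, or 60 or more" and a biased coin that lands heads 70 percent of the time are both hypothetical, chosen only for illustration. The helper names `binom_pmf` and `reject_probability` are likewise made up for this example.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def reject_probability(p_heads: float, n: int = 100,
                       low: int = 40, high: int = 60) -> float:
    """Chance that the hypothetical rule 'reject H0 if heads <= low
    or heads >= high' rejects H0, for a coin with P(heads) = p_heads."""
    return sum(
        binom_pmf(k, n, p_heads)
        for k in range(n + 1)
        if k <= low or k >= high
    )


# Type I error: the coin is fair, but the rule rejects H0 anyway.
type_1 = reject_probability(0.5)

# Type II error: the coin is biased (assumed 70% heads), but the rule
# fails to reject H0.
type_2 = 1 - reject_probability(0.7)

print(f"Type I error rate (fair coin rejected):      {type_1:.4f}")
print(f"Type II error rate (biased coin not caught): {type_2:.4f}")
```

Moving the cutoffs farther from 50 makes Type I errors rarer but Type II errors more common, so a rule can't drive both error rates to zero with a fixed number of tosses.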
If you think of H0 as a tendency to maintain the status quo and not interpret anything as being out of the ordinary (no matter how it looks), a Type II error means you missed out on something big. Looked at in that way, Type II errors form the basis of many historical ironies.
Here’s what I mean: In the 1950s, a particular TV show gave talented young entertainers a few minutes to perform on stage and a chance to compete for a prize. The audience voted to determine the winner. The producers held auditions around the country to find people for the show. Many years after the show went off the air, the producer was interviewed. The interviewer asked him if he had ever turned down anyone at an audition whom he shouldn’t have.
“Well,” said the producer, “once a young singer auditioned for us and he seemed really odd.”
“In what way?” asked the interviewer.
“In a couple of ways,” said the producer. “He sang really loud, gyrated his body and his legs when he played the guitar, and he had these long sideburns. We figured this kid would never make it in show business, so we thanked him for showing up, but we sent him on his way.”
“Wait a minute — are you telling me you turned down …?”
“That's right. We actually said no … to Elvis Presley!”
Now that's a Type II error.
Some Excel Fundamentals
A chapter on data evaluation might seem an odd place to talk about Excel fundamentals. This section and the next one help you get started with the statistical work that begins in Chapter 2 and continues throughout the book.
Figure 1-2 shows the Excel user interface in Windows 10. The tabbed band across the top is called the Ribbon (as it is on the Mac and the iPad).
FIGURE 1-2: The Excel interface in Windows.
Microsoft has developed shorthand for describing a mouse-click on a command button that lives on a tab on the Ribbon, and I use that shorthand throughout this book. The shorthand is
Tab | Command Button
To indicate clicking on the Insert tab’s Recommended Charts category button, for example, I write
Insert | Recommended Charts
When I click that button (with some data-containing cells selected), the Insert Chart dialog box, shown in Figure 1-3, appears.
FIGURE 1-3: Clicking Insert | Recommended Charts opens this box.
Notice that its Recommended Charts tab is open. Clicking the All Charts tab