Practical Data Analysis with JMP, Third Edition. Robert Carver
…2s, but when displayed in the data table, a 1 appears as the word Male and a 2 as the word Female. This recoding is a column property within JMP.
The asterisk next to RIAGENDR in the Columns panel indicates that this column has one or more special column properties defined.
14. At the bottom of the dialog box, clear the check box marked Use Value Labels and click Apply. Now, look in the Data Table (NHANES 2016 tab) and see that the column displays 1s and 2s rather than the value labels.
15. Click the word Notes under Column Properties (middle left of the dialog box). You will see a brief description of the contents of this column. As you work with data tables and create your own columns, you should get into the habit of annotating columns with informative notes.
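Steps 13 and 14 separate a stored value from the label that is displayed for it. A few lines of Python can sketch that idea. This is not how JMP implements value labels. It is a minimal illustration that assumes a hypothetical dictionary `VALUE_LABELS`, which plays the role of the Value Labels column property, and a hypothetical helper `display`, which stands in for the Use Value Labels check box.

```python
# Stored values of the RIAGENDR column: the data themselves are the codes 1 and 2.
riagendr = [1, 2, 2, 1]

# Hypothetical stand-in for JMP's Value Labels column property:
# a mapping from each stored code to the label shown on screen.
VALUE_LABELS = {1: "Male", 2: "Female"}


def display(values, labels, use_value_labels=True):
    """Return what a table viewer would show for each stored value.

    With use_value_labels=True, a code that has a label is shown as that label
    and a code without one falls back to its raw value. With False, every
    value is shown as its raw code.
    """
    if not use_value_labels:
        return [str(v) for v in values]
    return [labels.get(v, str(v)) for v in values]


print(display(riagendr, VALUE_LABELS))                          # ['Male', 'Female', 'Female', 'Male']
print(display(riagendr, VALUE_LABELS, use_value_labels=False))  # ['1', '2', '2', '1']
```

Switching labels off changes only what is displayed. The stored list `riagendr` is never modified, which parallels what happens in the data table when you clear the Use Value Labels check box.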
In this table, we also encounter missing observations once again. Missing data is a common issue in survey data, and it crops up whenever a respondent does not answer a question or an interviewer does not record a response. In a JMP data table, a black dot (·) indicates missing numeric data; missing character data appears as a blank cell. As a general rule, we should think carefully about missing observations and why they occur in a particular set of data.
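For instance, look at row 1 of this data table. We have missing observations for several variables, including the columns representing the respondent’s age in months (RIDAGEMN) and pregnancy status (RIDEXPRG). Further inspection shows that respondent #83732 was a 62-year-old male. These values are missing because the questions do not apply to him (age in months is recorded only for very young children, and pregnancy status does not apply to a male respondent), not because he failed to answer.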
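The two kinds of missing values described above can be mimicked in a short Python sketch. It assumes that `None` stands in for a missing numeric value (JMP's black dot) and an empty string stands in for missing character data (JMP's blank). Only respondent #83732's age, sex, and the two missing columns come from the discussion above; the column names `respondent`, `age_years`, `sex`, and `comment`, the second row, and the helpers `is_missing` and `count_missing` are hypothetical.

```python
# Hypothetical rows modeled on the NHANES example.
# None = missing numeric value; "" = missing character value.
# Column names other than RIDAGEMN and RIDEXPRG are illustrative assumptions.
rows = [
    {"respondent": 83732, "age_years": 62, "sex": "Male",
     "RIDAGEMN": None, "RIDEXPRG": None, "comment": ""},
    # Entirely hypothetical second row; the RIDEXPRG code is made up.
    {"respondent": 90000, "age_years": 30, "sex": "Female",
     "RIDAGEMN": None, "RIDEXPRG": 2, "comment": "hypothetical row"},
]


def is_missing(value):
    """Treat None (missing numeric) and "" (missing character) as missing."""
    return value is None or value == ""


def count_missing(rows):
    """Return {column name: number of rows in which that column is missing}."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts


for column, n in count_missing(rows).items():
    print(f"{column}: {n} missing")
```

Counting is only the first step. The more important question is why each value is missing: a blank pregnancy status for a male respondent, for example, reflects the structure of the survey rather than a failure to respond.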
Creating a Data Table
In this book, we will almost always analyze the data tables that are available at https://support.sas.com/en/books/authors/robert-carver.html. Once you start to practice data analysis on your own, you will often need to create your own data tables. Refer to Appendix B (“Data Management”) for details about entering data from a keyboard, transferring data from the web, reading in an Excel worksheet, or assembling a data table from several other data tables.
Raw Case Data and Summary Data
This book is accompanied by more than 50 data tables, most of which contain “casewise” data, meaning that each row of the data table refers to one observational unit. For example, each row of the NHANES table represents a person. Each row of the Concrete table is one batch of concrete at a moment in time. Like most statistical software, JMP is intended to analyze raw data. However, sometimes data come to us in a partially processed or summarized state. It is still possible to construct a data table and do some limited analysis of this type of data.
For example, consider public opinion surveys that have been reported in the news. Yale University and George Mason University, collaborating in the Yale Program on Climate Change Communication, published “Politics & Global Warming, April 2019.”³ The research team used “a nationally representative survey (N = 1,291 registered voters)” (Leiserowitz et al., p. 4). Among other issues, they asked respondents whether they think global warming is happening. Among all voters, 70% expressed the view that global warming is real. The researchers broke down the total sample by political group, as shown in Table 2.1 (created from the report’s Executive Summary).
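Because a reported percentage summarizes the answers of individual respondents, it can be converted back into an approximate count. The Python sketch below assumes that the 70% figure is a simple, unweighted proportion of the 1,291 voters rounded to the nearest whole percent. Published survey percentages are often weighted, so this illustrates only the arithmetic and does not reconstruct the report's data. The function name `count_range` is hypothetical.

```python
import math


def count_range(percent, n):
    """Return (low, high): the whole-number counts k, 0 <= k <= n, whose
    percentage 100*k/n rounds half-up to the integer `percent`.

    Assumes an unweighted proportion reported to the nearest whole percent.
    """
    low = math.ceil((percent - 0.5) * n / 100)
    upper_bound = (percent + 0.5) * n / 100      # excluded: it would round up
    high = math.ceil(upper_bound) - 1            # largest k strictly below it
    return max(low, 0), min(high, n)


N = 1291       # registered voters in the sample (from the report)
PERCENT = 70   # reported share saying global warming is happening

point = round(PERCENT / 100 * N)   # single best guess
low, high = count_range(PERCENT, N)
print(point, low, high)            # 904 898 910
```

Even under these assumptions, a whole-number percentage pins the count down only to a range of about a dozen voters, and the individual responses behind it remain invisible.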
Table 2.1: Sample of U.S. Voters’ Belief that Global Warming Is Happening
Voter group | Percentage agreeing
Liberal Democrats | 95
Moderate/Conservative Democrats | 87
Liberal/Moderate Republicans | 63
Conservative Republicans | 38
Tables like this are common in news reports. We could easily transfer this one into a data table, with one major caveat that often confuses introductory students: we must understand how the layout of the table relates to its content. In JMP, we usually expect each row to represent one observational unit, each column to represent one variable, and each cell to contain one data value. This table does not satisfy these assumptions.
To see why this is not a raw data table, we should go back and think about how the raw data were generated. We know that the respondents to this question (the observational units) were the 1,291 registered voters in the survey sample. The interviewers asked them many questions, and Table 2.1 tallies the responses to two of them. The first variable is each person’s political group, defined by party and ideology. The second variable is the response each person gave when asked, “Do you think that global warming is happening?” Respondents could agree, disagree, or offer no opinion. Because these responses are categories rather than numbers, this is a nominal variable.
So, how does this clarify the contents of Table 2.1? The first column lists four voter groups, ordered from most liberal to most conservative; the cell entries are unambiguously categorical. The second column is a little tricky. It appears to be continuous data, looking very much like numbers. However, these numbers are not measurements of observational units (people). They summarize some of the answers provided by the respondents, indicating the percentage of each voter group expressing agreement with the global warming question. More precisely, each value is the relative frequency of the “agree” response within one “level” of the ordinal voter group variable. In short, this table represents one ordinal and one nominal variable, summarizing the responses of nearly 1,300 individuals who are “invisible” when the information is presented this way. In later chapters, we will learn how to use columns of frequencies. For now, our introduction to data types and sources is complete.
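The relationship between casewise raw data and a summary like Table 2.1 can be sketched in Python. The records below are a tiny hypothetical sample, not the survey data: each tuple is one respondent (the observational unit) with values for two categorical variables, voter group and response. The names `GROUPS`, `records`, and `percent_agreeing` are hypothetical. The helper computes, within each voter group, the relative frequency of the “agree” response, which is the kind of number that appears in the second column of the table.

```python
from collections import Counter

# Ordinal levels of the voter-group variable, most liberal to most conservative.
GROUPS = [
    "Liberal Democrats",
    "Moderate/Conservative Democrats",
    "Liberal/Moderate Republicans",
    "Conservative Republicans",
]

# Hypothetical casewise records: one row per respondent, one value per cell.
records = [
    ("Liberal Democrats", "agree"),
    ("Liberal Democrats", "agree"),
    ("Moderate/Conservative Democrats", "agree"),
    ("Moderate/Conservative Democrats", "no opinion"),
    ("Liberal/Moderate Republicans", "agree"),
    ("Liberal/Moderate Republicans", "disagree"),
    ("Conservative Republicans", "disagree"),
    ("Conservative Republicans", "agree"),
    ("Conservative Republicans", "disagree"),
    ("Conservative Republicans", "no opinion"),
]


def percent_agreeing(records, groups):
    """Summarize casewise records into {group: percentage answering 'agree'}."""
    group_sizes = Counter(group for group, _ in records)
    agree_counts = Counter(group for group, answer in records if answer == "agree")
    return {
        g: 100 * agree_counts[g] / group_sizes[g]
        for g in groups
        if group_sizes[g] > 0   # skip groups with no respondents
    }


for group, pct in percent_agreeing(records, GROUPS).items():
    print(f"{group} | {pct:.0f}")
```

The summary keeps four numbers and discards the ten rows behind them. Going the other way is impossible without knowing each group's size, which is why a summary table supports only limited analysis.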
Application
Now that you have completed all the activities in this chapter, use the concepts and techniques that you have learned to respond to these questions.
1. Use your primary textbook or the daily newspaper to locate a table that summarizes some data. Read the article accompanying the table and identify the variable, data type, and observational units.
2. Return to the Concrete data table. Browse through the column notes and explain what variables these columns represent: Cement, SPgate, and FineAgg.
3. In the NHANES 2016 data table, one nominal variable appears as continuous data within the Columns panel. Find the misclassified variable and correct it by changing its modeling type to nominal.
4. The NHANES 2016 data table was assembled by scientific researchers. Why don’t we consider this data table to be experimental data?
5. Find the data table called Military. We will use this table in later chapters. This table contains rank, gender, and race information about more than a million U.S. military personnel. Use a technique presented in this chapter to create a data table containing a random sample of 500 individuals from this data table.
6. In Chapter 3, we will work with a data table called NIKKEI225. Open this table and examine the columns and metadata contained within the table. Write a short paragraph that describes the contents of the table, explains how the data were collected (experiment, observation, or survey), and defines each variable and its modeling type.
7. In later chapters, we will work with a data table called Earthquakes. Open this table and examine the columns and metadata contained within the table. Write a short paragraph that describes the contents of the table, explains how the data were collected (experiment, observation, or survey), and defines each variable and its modeling type.
8. In later chapters, we will work with a data table called Tobacco Use. Open this table and examine the columns and metadata contained within the table. Write a short paragraph that describes the contents of the table, explains how the data were collected (experiment, observation, or survey), and defines each variable and its modeling type.
9. Open the Dolphins data table, which we will work with in a later chapter. What are the variable(s), observational units, and data types represented in this table?
10. Open the data table TimeUse, which we will analyze more fully in later chapters. Write a few sentences to explain the contents of the columns named marst, empstat, sleeping, and telff.
11. Open the States data table, which contains statistics about the 50 U.S. states and the District of Columbia. Write a short paragraph describing the contents of the table and, in particular, define the columns called poverty, fed_spend2010, and homicide.
Endnotes