SAS Statistics by Example. Ron Cody, EdD
2. Select Import Data.
3. Choose Microsoft Excel.
4. Select the Excel workbook that you want to convert.
5. Name the SAS data set (SampleData in this example).
6. Click Finish (at the bottom right of the window) to complete the conversion.
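If you prefer to write code rather than use the wizard, a roughly equivalent conversion can be sketched with PROC IMPORT. The workbook path below is hypothetical. DBMS=XLSX typically requires SAS/ACCESS Interface to PC Files to be installed, so treat this as a sketch under that assumption rather than a drop-in replacement for the wizard.

```sas
* Hypothetical workbook path: replace it with the location of your own file;
proc import datafile='c:\books\Statistics by Example\SampleData.xlsx'
            out=SampleData   /* creates the temporary data set WORK.SAMPLEDATA */
            dbms=xlsx        /* assumes SAS/ACCESS Interface to PC Files is available */
            replace;         /* overwrite SampleData if it already exists */
   getnames=yes;             /* use the first row of the worksheet as variable names */
run;
```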
Naming conventions for SAS data sets are the same as for SAS variable names. The names must be 32 characters or less in length, they must start with a letter or an underscore, and the remaining characters must be letters, numbers, or underscores.
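The following sketch illustrates these rules in a DATA step. The data set name and variable names are hypothetical; the commented-out line shows a name that SAS would reject.

```sas
data Sample_Names;          /* letters, digits, and underscores: a valid name */
   _ID         = 1;         /* a name may start with an underscore */
   Age2023     = 45;        /* digits are allowed after the first character */
   Gender_Code = 'M';       /* underscores are allowed anywhere */
   /* 2ndVisit = 1;   invalid name: it starts with a digit */
run;
```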
Now that you have converted your Excel workbook into a temporary SAS data set, you can list the observations in the data set and inspect the descriptor portion of the data set. SAS provides you with several ways to do this.
One way to see a listing of the data in a SAS data set is to use a SAS procedure called PROC PRINT. The following program demonstrates how to use PROC PRINT to list the observations in the SampleData data set:
Program 1.1: Using PROC PRINT to List the Observations in a SAS Data Set
proc print data=SampleData;
run;
Amazingly enough, this is a complete SAS program. Notice that each statement in this two-line SAS program ends in a semicolon. When you write SAS programs, you can use as many lines as you want to write a statement; you can even put more than one statement on a line (though this is not recommended for stylistic reasons). The semicolon is the logical end of a SAS statement. You are free to add extra spaces on a line or place extra blank lines in your program to make it more readable.
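Because the semicolon, not the line break, ends a statement, the three versions of Program 1.1 sketched below are equivalent as far as SAS is concerned. This only illustrates free-format syntax; the second version is the conventional style.

```sas
* Version 1: both statements on one line (legal, but harder to read);
proc print data=SampleData; run;

* Version 2: one statement per line, the usual style;
proc print data=SampleData;
run;

* Version 3: one statement spread over several lines with extra spaces;
proc print
     data = SampleData
;
run;
```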
To run this program from Display Manager, click the Submit icon on the toolbar.
Here is the output you get from running Program 1.1:
At the top of the three rightmost columns, you see the SAS variable names, which are the same names that were stored in the first row of your workbook. The first column, labeled Obs (short for observation), was generated by SAS and shows the observation number.
Each row of the listing represents a row from the workbook.
Next, let’s see how to display the data descriptor portion of this data set. Program 1.2 is one way to do this:
Program 1.2: Using PROC CONTENTS to Display the Data Descriptor Portion of a SAS Data Set
title "Displaying the Descriptor Portion of a SAS Data Set";
proc contents data=SampleData;
run;
Notice that I have added a TITLE statement to this program. With a TITLE statement, you can enter a title that will print across the top of every page of output. TITLE statements belong to a class of SAS statements known as global statements. The title that you enter stays in effect for the remainder of your SAS session, unless you replace it with another TITLE statement. To remove all titles from your output, submit a null TITLE statement, like this:
title;
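The sketch below shows a title staying in effect across procedures until a null TITLE statement clears it. It reuses the SampleData data set; the title text is illustrative.

```sas
title "Listing of SampleData";    /* stays in effect for the procedures that follow */
proc print data=SampleData;
run;

proc contents data=SampleData;    /* printed with the same title */
run;

title;                            /* null TITLE statement: removes all titles */
proc print data=SampleData;       /* printed with no title */
run;
```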
When you submit Program 1.2, you will see the following output:
The first two lines of output show that the data set name is SAMPLEDATA. (The full name is WORK.SAMPLEDATA. The prefix WORK. tells SAS that this is a temporary SAS data set.) Also shown in these lines are the number of observations (5) and the number of variables (3). Let’s skip down to the portion of the output labeled Alphabetic List of Variables and Attributes. Here you see that the variables Age and ID are stored as numeric types and Gender is stored as a character type.
Variable Types in SAS Data Sets
SAS has only two variable types: numeric and character. By default, all numeric values are stored in 8 bytes, allowing for approximately 15 significant figures, depending on your operating system. Character values are stored 1 byte per character and can be from 1 to 32,767 bytes in length.
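To see both types and their storage lengths, you can create a small data set with a LENGTH statement and inspect it with PROC CONTENTS. The data set name Types and its variables are hypothetical, chosen only for illustration.

```sas
data Types;                   /* hypothetical data set for illustration */
   length Name  $ 20          /* character: 20 bytes, one per character */
          Score 8;            /* numeric: 8 bytes, the default length */
   Name  = 'Ron';
   Score = 98.6;
run;

proc contents data=Types;     /* lists Name as Char (length 20), Score as Num (length 8) */
run;
```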
Temporary versus Permanent SAS Data Sets
SAS data sets can be either temporary or permanent. A temporary SAS data set is one that exists for the duration of your SAS session but is not saved when you exit SAS. Permanent SAS data sets, as the name implies, remain when you exit SAS and can be accessed in future SAS sessions. The Import Wizard example discussed previously used the Work library. Choosing the Work library caused the SAS data set SAMPLEDATA to be a temporary data set.
SAS data set identifiers are divided into two parts, separated by a period. The part before the period is called a library reference (libref for short) and identifies the folder where SAS has stored the data set. The part following the period is the data set name. Both parts of this identifier must satisfy the naming conventions mentioned earlier, except that a libref can be at most 8 characters long.
For example, if your data set is called SURVEY and is stored in a library called MYDATA, SAS uses the following notation to identify the file:
mydata.survey
If you wanted to put this data set on your disk drive in the C:\MYSASFILES folder, you would write a statement called a LIBNAME statement that associates the c:\mysasfiles folder with the MYDATA library reference, like this:
libname mydata 'c:\mysasfiles';
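Once the LIBNAME statement has been submitted, any data set whose name carries the MYDATA prefix is stored permanently in that folder. The sketch below assumes that the folder c:\mysasfiles already exists and copies the temporary SampleData data set into a permanent data set called Survey.

```sas
libname mydata 'c:\mysasfiles';     /* the folder must already exist */

data mydata.survey;                 /* two-level name: a permanent data set */
   set SampleData;                  /* one-level name: same as WORK.SampleData */
run;

proc contents data=mydata.survey;   /* shows the library and the file location */
run;
```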
Creating a SAS Data Set from Raw Data
If you have your data in a text file, SAS can read the text file and create a SAS data set. The text file can contain either data values separated by delimiters or data values in fixed columns.
Data Values Separated by Delimiters
SAS can read data values from a text file in which each value is separated from the next value by a delimiter. By default, SAS expects one or more spaces between data values. However, it is easy to specify other delimiters, such as commas. Let’s start by reading a small text file in which spaces are used as delimiters. Here’s a listing of this file:
Raw Data with Blanks as Delimiters: File c:\books\Statistics by Example\delim.txt
1 23 M
2 33 F
3 18 F
4 45 M
5 41 M
6 . F
In this file, the three data values on each line represent an ID number, Age, and Gender, respectively. Before you write a SAS program to read this text file, notice that subject 6 has a missing value for Age. Because the values are separated by delimiters, you need a way to mark that Age is missing for that subject: if you simply left the value blank, SAS would treat the consecutive blanks as a single delimiter and read F as the Age value. When blanks are the delimiters, you can use a period to represent a missing value. In the next example, which uses a CSV file, you do not need to use periods for missing values.
Program 1.3 will read this text file and create a SAS data set called Sample2:
Program 1.3: Reading Data from a Text File That Uses Spaces as Delimiters
data Sample2;
   infile 'c:\books\statistics by example\delim.txt';
   length Gender $ 1;
   input ID Age Gender $;
run;
The INFILE statement tells SAS