Fundamentals of Programming in SAS. James Blum

                                    of Home    METRO
      South Carolina    N    $0    $137,500        3
      South Carolina    N    $0     $95,000        4
      South Carolina    N    $0     $45,000        3
      North Carolina    N    $0    $162,500        0
      North Carolina    N    $0     $45,000        1
      North Carolina    N    $0      $5,000        1

      Most SAS procedures, including PROC PRINT, can take advantage of BY-group processing for data that is sorted into groups. The procedure must use a BY statement that corresponds to the sorting in the data set. If the data is sorted using PROC SORT, the BY statement in a subsequent procedure does not have to completely match the BY statement in PROC SORT; however, it must match the first level of sorting if only one variable is included, the first two levels if two variables are included, and so forth. It must also match ordering, ascending or descending, on each included variable. Program 2.3.6 groups output from the PRINT procedure based on BY grouping constructed with PROC SORT.

      Program 2.3.6: BY-Group Processing in PROC PRINT

      proc sort data=bookdata.ipums2005mini out=work.sorted;
        by MortgageStatus State descending HomeValue;
      run;

      proc print data=work.sorted noobs label;
        by MortgageStatus State;
        var MortgagePayment HomeValue Metro;
        label HomeValue='Value of Home' state='State';
        format HomeValue MortgagePayment dollar9. MortgageStatus $9.;
      run;

       The original data is sorted first on MortgageStatus, then on State, and finally in descending order of HomeValue for each combination of MortgageStatus and State.

       PROC PRINT uses a BY statement matching on the MortgageStatus and State variables, which groups the output into sections based on each unique combination of values for these two variables, with the final sorting on HomeValue appearing in each table. Note that a BY statement with only MortgageStatus can be used as well, but a BY statement with only State cannot, because State is not the primary sort variable in the data.

      Output 2.3.6: BY-Group Processing in PROC PRINT (First 2 of 6 Groups Shown)

      MortgageStatus=No, owned State=North Carolina

      MortgagePayment    Value of Home    METRO
                   $0         $162,500        0
                   $0          $45,000        1
                   $0           $5,000        1

      MortgageStatus=No, owned State=South Carolina

      MortgagePayment    Value of Home    METRO
                   $0         $137,500        3
                   $0          $95,000        4
                   $0          $45,000        3
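
      The note on Program 2.3.6 that a BY statement with only MortgageStatus is also valid can be sketched as follows. This is a minimal illustration that assumes work.sorted was created by the PROC SORT step in Program 2.3.6. Moving State into the VAR statement is a choice made here so that it still appears in the listing; it is not taken from the original program.

```sas
/* Assumes work.sorted from Program 2.3.6, sorted by
   MortgageStatus, State, and descending HomeValue.   */

/* Valid: MortgageStatus is the first sort key. */
proc print data=work.sorted noobs label;
  by MortgageStatus;
  var State MortgagePayment HomeValue Metro;
  label HomeValue='Value of Home';
  format HomeValue MortgagePayment dollar9. MortgageStatus $9.;
run;

/* Not valid for this sort order: State is only the second sort key,
   so SAS is expected to stop with an error once the State values
   fall out of order across MortgageStatus groups.
proc print data=work.sorted noobs label;
  by State;
run;
*/
```

      The second step is left commented out so that the sketch can be submitted as a whole; removing the comment delimiters should reproduce the sorting error described above.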

      The structure of BY groups in PROC PRINT can be altered slightly through use of an ID statement, as shown in Program 2.3.7. Assuming the variables listed in the ID statement match those in the BY statement, BY-group variables are placed as the left-most columns of each table, rather than between tables.

      Program 2.3.7: Using BY and ID Statements Together in PROC PRINT

      proc print data=work.sorted noobs label;
        by MortgageStatus State;
        id MortgageStatus State;
        var MortgagePayment HomeValue Metro;
        label HomeValue='Value of Home' state='State';
        format HomeValue MortgagePayment dollar9. MortgageStatus $9.;
      run;

      Output 2.3.7: Using BY and ID Statements Together in PROC PRINT (First 2 of 6 Groups Shown)

      MortgageStatus    State             MortgagePayment    Value of Home    METRO
      No, owned         North Carolina                 $0         $162,500        0
                                                       $0          $45,000        1
                                                       $0           $5,000        1

      MortgageStatus    State             MortgagePayment    Value of Home    METRO
      No, owned         South Carolina                 $0         $137,500        3
                                                       $0          $95,000        4
                                                       $0          $45,000        3

      PROC PRINT is limited in its ability to do computations (later in this text, the REPORT procedure is used to create various summary tables); however, it can compute sums of numeric variables with the SUM statement, as shown in Program 2.3.8.

      Program 2.3.8: Using the SUM Statement in PROC PRINT

      proc print data=work.sorted noobs label;
        by MortgageStatus State;
        id MortgageStatus State;
        var MortgagePayment HomeValue Metro;
        sum MortgagePayment HomeValue;
        label HomeValue='Value of Home' state='State';
        format HomeValue MortgagePayment dollar9. MortgageStatus $9.;
      run;

      Output 2.3.8: Using the SUM Statement in PROC PRINT (Last of 6 Groups Shown)

      MortgageStatus    State             MortgagePayment    Value of Home    METRO
      Yes, mort         South Carolina               $360          $75,000        4
                                                     $500          $65,000        3
                                                     $200          $32,500        4
      Yes, mort         South Carolina             $1,060         $172,500
      Yes, mort                                    $2,200         $315,000
                                                   $4,230          1200000

      Sums are produced at the end of each BY group (the SUMBY statement is available to modify this behavior) and at the end of the full table. Note that the format applied to the HomeValue column is not wide enough to display the grand total with its dollar sign and commas. If a format is of insufficient width, SAS removes what it determines to be the least important characters, and if the width cannot display the value with full precision, SAS may substitute a different format. It is therefore considered good programming practice to determine the minimum format width needed for all values a format is applied to. See Chapter Note 3 in Section 2.12 for further discussion on format widths.
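
      One way to act on the format-width advice above is sketched below. It assumes the grand total for HomeValue is the 1200000 shown in Output 2.3.8, which needs 10 characters to display as $1,200,000, so DOLLAR10. is applied to HomeValue. The SUMBY statement is included only as an illustration of limiting subtotals: with SUMBY MortgageStatus, subtotals are expected when MortgageStatus changes rather than at the end of every State group.

```sas
/* Assumes work.sorted from Program 2.3.6. */
proc print data=work.sorted noobs label;
  by MortgageStatus State;
  id MortgageStatus State;
  var MortgagePayment HomeValue Metro;
  sum MortgagePayment HomeValue;
  /* Illustrative: subtotal only when MortgageStatus changes */
  sumby MortgageStatus;
  label HomeValue='Value of Home' State='State';
  /* DOLLAR10. leaves room for the dollar sign, two commas, and seven digits */
  format HomeValue dollar10. MortgagePayment dollar9. MortgageStatus $9.;
run;
```

      If the data set changes, the required width should be rechecked against the largest total that the SUM statement can produce.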

      Producing tables of statistics like those shown for the case study in Outputs 2.2.1 and 2.2.2 uses the MEANS procedure. This section covers the fundamentals of PROC MEANS, including how to select variables for analysis, choose statistics, and separate analyses across categories.

      To begin, make sure the BookData library is assigned as done in Chapter 1, submit PROC CONTENTS on the IPUMS2005Basic SAS data set from the BookData library, and review the output. Also, to ensure familiarity with the data, open the data set for viewing or run the PRINT procedure to direct it to an output table. Once these steps are complete, enter and submit the code given in Program 2.4.1.
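
      The preparatory steps just described might look like the following sketch. The LIBNAME path is a hypothetical placeholder and must be replaced with the actual location of the book's data; the OBS= data set option is an addition made here to keep the listing short, not something the text requires.

```sas
/* Hypothetical path: substitute the folder that holds the book's data sets */
libname BookData 'C:\path\to\BookData';

/* Review variable names, types, and labels */
proc contents data=BookData.IPUMS2005Basic;
run;

/* Print only the first 10 observations for a quick look at the data */
proc print data=BookData.IPUMS2005Basic(obs=10);
run;
```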

      Program 2.4.1: Default Statistics and Behavior for PROC MEANS

      options nolabel;

      proc means data=BookData.IPUMS2005Basic;
      run;

      For variables that have labels, PROC MEANS includes them as a column in the output table; using NOLABEL in the OPTIONS statement suppresses their use. Here DATA= is technically optional because, when it is omitted, SAS uses the last data set created in the session. If no data sets have been created during the session, which is the most likely scenario currently, PROC MEANS does not have a data set to process unless this option is provided. Beyond having a data set to work with, no other options or statements are required for PROC MEANS to compile and execute successfully. In this case, the default behavior, as shown in Output 2.4.1, is to summarize all numeric variables on a set of five statistics: number of nonmissing

