Fundamentals of Programming in SAS. James Blum

observations, mean, standard deviation, minimum, and maximum.

      Output 2.4.1: Default Statistics and Behavior for PROC MEANS

Variable                 N          Mean       Std Dev       Minimum       Maximum
SERIAL             1159062     621592.24     359865.41     2.0000000    1245246.00
COUNTYFIPS         1159062    42.2062901    78.9543285             0   810.0000000
METRO              1159062     2.5245354     1.3085302             0     4.0000000
CITYPOP            1159062       2916.66      12316.27             0      79561.00
MortgagePayment    1159062   500.2042634   737.9885592             0       7900.00
HHIncome           1159062      63679.84      66295.97     -29997.00    1739770.00
HomeValue          1159062    2793526.49    4294777.18       5000.00    9999999.00

      SAS differentiates variable types as numeric and character only; therefore, variables that are stored as numeric but are not quantitative are summarized even if those summaries do not make sense. Here, the Serial, CountyFIPS, and Metro variables are stored as numbers, but their means and standard deviations are not meaningful because the variables are nominal. It is, of course, important to understand the true role and level of measurement (for instance, nominal versus ratio) of each variable in the data set being analyzed.
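      A quick way to see which variables PROC MEANS summarizes by default is to check their stored types. The sketch below is not from the book: it assumes the Sashelp.Class sample data set that ships with most SAS installations, in which Name and Sex are character and Age, Height, and Weight are numeric.

```sas
/* Assumption: Sashelp.Class is available (sample data shipped with SAS). */
/* List each variable with its stored type (Num or Char).                 */
proc contents data=Sashelp.Class;
run;

/* With no VAR statement, every numeric variable is summarized;           */
/* the character variables (Name and Sex) are skipped.                    */
proc means data=Sashelp.Class;
run;
```

      Age is summarized here simply because it is stored as a number, which is the same behavior that produces the rows for Serial, CountyFIPS, and Metro in Output 2.4.1.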

      To select the variables for analysis, the MEANS procedure includes the VAR statement. Any variable listed in the VAR statement must be numeric and should also be appropriate for quantitative summary statistics. As in the previous example, the summary for each variable is listed in its own row in the output table. (If only one variable is provided, it is named in the header above the table instead of in the first column.) Program 2.4.2 modifies Program 2.4.1 to summarize only the truly quantitative variables from BookData.IPUMS2005Basic, with the results shown in Output 2.4.2.

      Program 2.4.2: Selecting Analysis Variables Using the VAR Statement in MEANS

      proc means data=BookData.IPUMS2005Basic;
        var Citypop MortgagePayment HHIncome HomeValue;
      run;

      Output 2.4.2: Selecting Analysis Variables Using the VAR Statement in MEANS

Variable                 N          Mean       Std Dev       Minimum       Maximum
CITYPOP            1159062       2916.66      12316.27             0      79561.00
MortgagePayment    1159062   500.2042634   737.9885592             0       7900.00
HHIncome           1159062      63679.84      66295.97     -29997.00    1739770.00
HomeValue          1159062    2793526.49    4294777.18       5000.00    9999999.00
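      The parenthetical note about a single analysis variable is easy to check. A minimal sketch, reusing the data set from Program 2.4.2 and keeping only HHIncome in the VAR statement:

```sas
/* One analysis variable: the variable is named in a header line   */
/* above the table ("Analysis Variable : HHIncome") and the        */
/* Variable column is dropped from the table itself.               */
proc means data=BookData.IPUMS2005Basic;
  var HHIncome;
run;
```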

      The default summary statistics for PROC MEANS can be modified by including statistic keywords as options in the PROC MEANS statement. The SAS Documentation lists the available keywords, and any subset of them may be used. The keywords replace the default statistic set, and the order in which they are listed determines the order of the statistic columns in the table. One common set of statistics is the five-number summary (minimum, first quartile, median, third quartile, and maximum), and Program 2.4.3 provides a way to generate these statistics for the four variables summarized in the previous example.

      Program 2.4.3: Setting the Statistics to the Five-Number Summary in MEANS

      proc means data=BookData.IPUMS2005Basic min q1 median q3 max;
        var Citypop MortgagePayment HHIncome HomeValue;
      run;

      Output 2.4.3: Setting the Statistics to the Five-Number Summary in MEANS

Variable                  Minimum  Lower Quartile          Median  Upper Quartile         Maximum
CITYPOP                         0               0               0               0        79561.00
MortgagePayment                 0               0               0     830.0000000         7900.00
HHIncome                -29997.00        24000.00        47200.00        80900.00      1739770.00
HomeValue                 5000.00       112500.00       225000.00      9999999.00      9999999.00
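      Because the keyword order sets the column order, the same statistics can be requested in any arrangement. The following sketch is an illustration rather than a program from the book; N, NMISS, MEDIAN, MEAN, and STD are standard PROC MEANS statistic keywords, and the choice of two analysis variables is arbitrary.

```sas
/* Columns appear in the order the keywords are listed:          */
/* N, N Miss, Median, Mean, Std Dev. The default set (N, Mean,   */
/* Std Dev, Minimum, Maximum) is replaced, so Minimum and        */
/* Maximum are not shown unless MIN and MAX are also listed.     */
proc means data=BookData.IPUMS2005Basic n nmiss median mean std;
  var HHIncome HomeValue;
run;
```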

      Confidence limits for the mean are included in the keyword set, both as a pair with the CLM keyword and separately with LCLM and UCLM. The default confidence level is 95%, but it can be changed by setting the error rate with the ALPHA= option; the confidence level is 100(1 - ALPHA)%, so ALPHA=0.01 gives 99% limits. Consider Program 2.4.4, which constructs 99% confidence intervals for the means and places the estimated mean between the lower and upper limits.

      Program 2.4.4: Using the ALPHA= Option to Modify Confidence Levels

      proc means data=BookData.IPUMS2005Basic lclm mean uclm alpha=0.01;
        var Citypop MortgagePayment HHIncome HomeValue;
      run;

      Output 2.4.4: Using the ALPHA= Option to Modify Confidence Levels

Variable           Lower 99% CL for Mean          Mean  Upper 99% CL for Mean
CITYPOP                          2887.19       2916.66                2946.12
MortgagePayment              498.4385749   500.2042634            501.9699520
HHIncome                        63521.22      63679.84               63838.46
HomeValue                     2783250.94    2793526.49             2803802.04
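      For comparison, the CLM keyword requests both limits as a pair, so the mean cannot be placed between them. The sketch below is not from the book. Its first step uses CLM for Citypop; its second step recomputes the interval by hand as mean ± t × s / sqrt(n), using the rounded N, mean, and standard deviation reported for Citypop in Output 2.4.2. The data set name Work.CheckCLM and its variable names are hypothetical, chosen only for this illustration.

```sas
/* CLM prints the lower and upper limits as adjacent columns. */
proc means data=BookData.IPUMS2005Basic mean clm alpha=0.01;
  var Citypop;
run;

/* Hand check of the 99% interval for Citypop.                     */
/* Work.CheckCLM is a hypothetical scratch data set; the inputs    */
/* are the rounded values shown in Output 2.4.2.                   */
data Work.CheckCLM;
  n     = 1159062;                   /* number of observations     */
  xbar  = 2916.66;                   /* sample mean                */
  s     = 12316.27;                  /* sample standard deviation  */
  alpha = 0.01;                      /* error rate: 99% confidence */
  t     = tinv(1 - alpha/2, n - 1);  /* two-sided t critical value */
  half  = t * s / sqrt(n);           /* half-width of the interval */
  lower = xbar - half;
  upper = xbar + half;
run;

proc print data=Work.CheckCLM noobs;
  var lower xbar upper;
  format lower xbar upper 10.2;
run;
```

      Because the inputs are already rounded to two decimal places, the hand-computed limits should agree with Output 2.4.4 only up to rounding in the last digit.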

      There are also options for controlling the column display; rounding can be controlled by the MAXDEC= option (maximum number of decimal places). Program 2.4.5 modifies the previous example to report the statistics to a single decimal place.

      Program 2.4.5: Using MAXDEC= to Control Precision of Results

      proc means data=BookData.IPUMS2005Basic lclm mean uclm alpha=0.01 maxdec=1;
        var Citypop MortgagePayment HHIncome HomeValue;
      run;

      Output 2.4.5: Using MAXDEC= to Control Precision of Results

Variable           Lower 99% CL for Mean          Mean  Upper 99% CL for Mean
CITYPOP                           2887.2        2916.7                 2946.1
MortgagePayment                    498.4         500.2                  502.0
HHIncome                         63521.2       63679.8                63838.5
HomeValue                      2783250.9     2793526.5              2803802.0

      MAXDEC= is limited in that it sets the precision for all columns. Also, no direct formatting of the statistics is available. The REPORT procedure, introduced in Chapter 4 and discussed in detail in Chapters 6 and 7, provides much more control over the displayed table at the cost of increased complexity of the syntax.

      In several instances, it is desirable to split an analysis across a set of categories and, if those categories are defined by a variable in the data set, PROC MEANS can separate those analyses using a CLASS statement. The CLASS statement accepts either numeric or character variables; however, the role assigned to class variables by SAS is special. Any variable included in the CLASS statement (regardless of type) is taken as categorical, which results in each distinct value of the variable corresponding to a unique category. Therefore, variables used in the CLASS statement should provide useful groupings or, as shown in Section 2.5, be formatted into a set of desired groups. Two examples follow, the first (Program 2.4.6) providing an illustration of a reasonable class variable, the second (Program 2.4.7) showing a poor choice.

      Program 2.4.6: Setting a Class Variable in PROC MEANS

      proc means data=BookData.IPUMS2005Basic;
        class MortgageStatus;
        var HHIncome;
      run;

      Output 2.4.6: Setting a Class Variable in PROC MEANS

Analysis Variable : HHIncome
MortgageStatus                                    N Obs       N      Mean   Std Dev    Minimum     Maximum
N/A                                              303342  303342  37180.59  39475.13  -19998.00  1070000.00
No, owned free and clear                         300349  300349  53569.08  63690.40  -22298.00  1739770.00
Yes, contract to purchase                          9756    9756  51068.50  46069.11   -7599.00   834000.00
Yes, mortgaged/ deed of trust or similar debt    545615  545615  84203.70  72997.92  -29997.00  1407000.00

      In this data, MortgageStatus provides a clear set of distinct categories and is potentially useful for subsetting the summarization of the data. In Program 2.4.7, Serial is used as an extreme example of a poor choice since Serial is unique to each household.
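      One way to screen a candidate class variable before running PROC MEANS is to count its distinct values. The sketch below is not from the book; it uses the NLEVELS option of PROC FREQ, with NOPRINT on the TABLES statement so that only the level counts are displayed rather than the full frequency tables.

```sas
/* Count the distinct values of each candidate class variable.   */
/* NOPRINT suppresses the frequency tables themselves, which     */
/* would be very long for a variable such as Serial.             */
proc freq data=BookData.IPUMS2005Basic nlevels;
  tables MortgageStatus Serial / noprint;
run;
```

      A handful of levels, such as the four shown for MortgageStatus in Output 2.4.6, suggests a workable grouping. A count close to the number of observations, as would be expected for an identifier such as Serial, indicates that each group would contain little more than a single record.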

      Program 2.4.7: A Poor Choice for a Class Variable

      proc means data=BookData.IPUMS2005Basic;
        class Serial;
        var HHIncome;
      run;

      Output 2.4.7: A Poor Choice for a Class Variable (Partial Table Shown)

Analysis