SAS Support

Part 1 (Proc Print, Proc Contents, Proc Means)

Using SAS Procedures

(commands=finan_descriptives.sas)

 This handout covers the use of SAS procedures to get simple descriptive statistics and to carry out a few basic statistical tests, using two data sets: the March data, and the Business data. The procedures introduced are:

 

  • Proc Print
  • Proc Contents
  • Proc Means
  • Proc Freq
  • Proc Boxplot
  • Proc Univariate
  • Proc Gplot

Check the SAS Procedures Guide or SAS online documentation for more information about these procedures.  

Creating the March Data Set:

Commands to read the raw data file, MARFLT.DAT, using a data step are shown below:

 data march;

  infile "marflt.dat";

  input    flight 1-3

           @4 date mmddyy6.

           @10 time time5.

           orig $ 15-17

           dest $ 18-20

           @21 miles comma5.

           mail 26-29

           freight 30-33

           boarded 34-36

           transfer 37-39

           nonrev 40-42

           deplane 43-45

           capacity 46-48;

format date mmddyy10. time time5. miles comma5.;

            label flight="Flight number"

                 orig  ="Origination City"

                 dest  ="Destination City";

run;

 Alternatively, you can import the Excel file, MARCH.XLS, by using the SAS Import Wizard, or by using Proc Import commands, as shown below:

 PROC IMPORT OUT= WORK.MARCH

            DATAFILE= "MARCH.XLS"

            DBMS=EXCEL REPLACE;

     SHEET="march$";

     GETNAMES=YES;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

RUN;

Note: if you use the data step commands to read in the raw data, only the variables named in the label statement (flight, orig, and dest) will have labels, but if you import the data from Excel, SAS will give each variable a label that corresponds to the name of the variable on the first row of the Excel file.
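
Labels can also be added or changed after a data set has been created, without re-reading the raw data. The commands below are a minimal sketch, assuming the MARCH data set is in the WORK library; the label text is purely illustrative and is not part of the original data.

```sas
/* Add or change variable labels on an existing data set. */
proc datasets library = work nolist;
  modify march;
  label miles   = "Distance in miles"      /* illustrative label text */
        boarded = "Passengers boarded";    /* illustrative label text */
run;
quit;
```

Running Proc Contents afterwards should show the new labels in the Label column.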

 

Proc Print:

Proc Print can be used to view a SAS data set. Proc Print is named somewhat deceptively, because it does not actually send data to a printer, but simply lists the values of each variable in the output window. To get a listing of all cases and all variables in a data set, use the following syntax:

proc print;

run;

By default, Proc Print will list values for the most recently created SAS data set. However, to be more specific, you can tell SAS the data set that you wish to have printed by using the data = option in the proc print statement, as shown below.  This option is highly recommended.

 

proc print data = march;

run;

 

To list the first 10 observations in the data set, use the (obs= ) data set option, immediately following the data set name.

 

proc print data = march(obs=10);

run;

 The cases that are listed can be restricted by using combinations of the firstobs= and obs= data set options.  The firstobs= data set option tells SAS the first observation in the data set to process.  The obs= data set option tells SAS the last observation to process. To list observations 182 through 185, the following commands could be used.

 

proc print data = march(firstobs=182 obs=185);

run;

 

Obs  FLIGHT  DATE       DEPART  ORIG  DEST  MILES  MAIL  FREIGHT  BOARDED  TRANSFER  NONREV  DEPLANE  CAPACITY

182  921     09MAR1990  17:11   LGA   DFW    1383   284      150       88         6       6       99       180

183  302     09MAR1990  20:22   LGA   WAS     229   454      631      106        21       5      112       180

184  431     09MAR1990  18:50   LGA   LAX    2475   373      339      142        14       4      160       210

185  308     09MAR1990  21:06   LGA   ORD     740   371      408      135        15       3      147       210

 

The variables that are printed in proc print can be restricted by giving a variable list in a var statement after the proc print statement. Variables will be printed in the order they are listed, and the order need not follow the order of the variables in the data set. Some examples of listing variables are shown below:

 

proc print data=march;

  var date time orig dest miles;

run;

 

proc print data=march;

  var date -- miles;

run;

 

To get a listing of the values in a data set with the variable labels (if any) displayed, use the label option:

 

proc print data = march label;

  var date -- miles;

run;

 

To get a listing of a data set without the observation numbers, use the noobs option:

 

proc print data = march label noobs;

   var date -- miles;

run;
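
The options described above can be combined in a single step. The following sketch, which assumes the MARCH data set created earlier, lists the first five observations for a few selected variables, with labels as column headings and without observation numbers:

```sas
/* Combine data set options (firstobs=, obs=) with the label and noobs options. */
proc print data = march(firstobs=1 obs=5) label noobs;
  var flight orig dest miles;
run;
```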

 

Proc Contents:

 This procedure gives information on a SAS data set, including the name of the data set, the number of observations, the names of variables, the type of each variable (numeric-num or character-char), and any labels or formats that have been assigned to variables. By default, the variables are listed in alphabetic order. The position of each variable in the data set is listed in the # column of the output. If the data set has been sorted, information about the sorting variable(s) is also displayed.  A simple example of Proc Contents is shown in the example below.

 

proc contents data = march;

run;

                                     The CONTENTS Procedure

 

      Data Set Name        WORK.MARCH                             Observations          635

      Member Type          DATA                                   Variables             13

      Engine               V9                                     Indexes               0

      Created              Friday, August 18, 2006 06:03:28 PM    Observation Length    96

      Last Modified        Friday, August 18, 2006 06:03:28 PM    Deleted Observations  0

      Protection                                                  Compressed            NO

      Data Set Type                                               Sorted                NO

      Label

      Data Representation  WINDOWS_32

      Encoding             wlatin1  Western (Windows)

 

 

                                Engine/Host Dependent Information

 

Data Set Page Size          8192

Number of Data Set Pages    8

First Data Page             1

Max Obs per Page            84

Obs in First Data Page      61

Number of Data Set Repairs  0

File Name                   C:\DOCUME~1\kwelch\LOCALS~1\Temp\SAS Temporary

                            Files\_TD2508\march.sas7bdat

Release Created             9.0101M3

Host Created                XP_PRO

 

                           Alphabetic List of Variables and Attributes

 

                  #    Variable    Type    Len    Format       Label

 

                  9    boarded     Num       8

                 13    capacity    Num       8

                  2    date        Num       8    MMDDYY10.

                 12    deplane     Num       8

                  5    dest        Char      3                 Destination City

                  1    flight      Num       8                 Flight number

                  8    freight     Num       8

                  7    mail        Num       8

                  6    miles       Num       8    COMMA5.

                 11    nonrev      Num       8

                  4    orig        Char      3                 Origination City

                  3    time        Num       8    TIME5.

                 10    transfer    Num       8

 

If you wish to get a list of variables in the order of their position in the data set (creation order), use the varnum option:

 

proc contents data = march varnum;

run;

 

These commands list the variables in the format shown below:

                              Variables in Creation Order

 

                  #    Variable    Type    Len    Format       Label

 

                  1    flight      Num       8                 Flight number

                  2    date        Num       8    MMDDYY10.

                  3    time        Num       8    TIME5.

                  4    orig        Char      3                 Origination City

                  5    dest        Char      3                 Destination City

                  6    miles       Num       8    COMMA5.

                  7    mail        Num       8

                  8    freight     Num       8

                  9    boarded     Num       8

                 10    transfer    Num       8

                 11    nonrev      Num       8

                 12    deplane     Num       8

                 13    capacity    Num       8
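
The variable information can also be saved to a data set instead of being printed, which is convenient when a data set has many variables. The commands below are a sketch using the out= and noprint options of Proc Contents; the data set name march_vars is hypothetical. In the output data set, TYPE is coded 1 for numeric and 2 for character variables.

```sas
/* Save the variable attributes to a data set rather than printing them. */
proc contents data = march
              out = march_vars(keep = name type length varnum label)
              noprint;
run;

/* List the attributes in creation order. */
proc sort data = march_vars;
  by varnum;
run;

proc print data = march_vars noobs;
run;
```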

 

Proc Means:

This procedure generates simple descriptive statistics for numeric variables in a SAS data set.  The following syntax is the simplest version of Proc Means. By default it produces descriptive statistics for all numeric variables in the most recently created data set, in the order in which they were originally entered. The default statistics produced are the n, mean, standard deviation, minimum, and maximum.

 

proc means;

run;

 

Getting Descriptive Statistics for Selected Variables

 

SAS will give descriptive statistics for all numeric variables in the data set by default.  To get descriptive statistics for specific variables, list them, separated by blanks. SAS will display the variables in the order that you specify.

 

proc means data = march;

  var mail freight boarded transfer nonrev deplane;

run;

 

You can also use a variable list, as shown below:

 

proc means data = march;

  var mail -- deplane;

run;

                                The MEANS Procedure

 

         Variable      N            Mean         Std Dev         Minimum         Maximum

         -------------------------------------------------------------------------------

         mail        634     381.0031546      74.6288128     195.0000000     622.0000000

         freight     634     333.9511041      98.1122248      21.0000000     631.0000000

         boarded     633     132.3570300      43.4883098      13.0000000     241.0000000

         transfer    635      14.4062992       5.3362008               0      29.0000000

         nonrev      635       4.1133858       1.9243731               0       9.0000000

         deplane     635     146.7842520      45.4289656      18.0000000     250.0000000

         -------------------------------------------------------------------------------
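
The default output shows seven decimal places, which is usually more than is needed. As a small sketch, the maxdec= option in the proc means statement limits the number of decimal places displayed (two are requested here; the choice is arbitrary):

```sas
/* Same request as above, but display at most two decimal places. */
proc means data = march maxdec = 2;
  var mail -- deplane;
run;
```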

 

Getting Descriptive Statistics for Groups of Cases Using the Class Statement:

 

Proc Means can produce statistics for subgroups of cases by using a CLASS statement.  The data do not need to be sorted for this method to work. SAS will produce one output table with separate statistics for each destination (DEST); because no var statement is given, all numeric variables are summarized. Partial output is shown below the commands.

 

proc means data = march;

    class dest;

run;

                                        The MEANS Procedure

 

 Destination     N

 City          Obs   Variable   Label             N           Mean        Std Dev        Minimum

 -----------------------------------------------------------------------------------------------

 CPH            27   flight     Flight number    27    387.0000000              0    387.0000000

                     date                        27       11031.93      9.2691628       11017.00

                     time                        27       42000.00              0       42000.00

                     miles                       27        3856.00              0        3856.00

                     mail                        26    401.3846154     78.6359088    271.0000000

                     freight                     27    331.3703704     96.0361103     71.0000000

                     boarded                     27    132.1851852     24.4383637     81.0000000

                     transfer                    27     14.1851852      5.2185839      5.0000000

                     nonrev                      27      4.0740741      1.8589891      1.0000000

                     deplane                     27    150.4444444     24.9728057    103.0000000

                     capacity                    27    250.0000000              0    250.0000000

 

 DFW            62   flight     Flight number    62    951.5000000     30.7489837    921.0000000

                     date                        62       11032.00      9.0172876       11017.00

                     time                        62       49770.00       12188.70       37680.00

                     miles                       62        1383.00              0        1383.00

                     mail                        62    370.3870968     84.2615668    195.0000000

                     freight                     62    338.0806452    101.1148184    132.0000000

                     boarded                     61    107.0491803     32.4491532     31.0000000

                     transfer                    62     14.8870968      5.4203770      5.0000000

                     nonrev                      62      4.2741935      1.9432047              0

                     deplane                     62    116.3225806     33.6587378     35.0000000

                     capacity                    62    180.0000000              0    180.0000000

 

 FRA            27   flight     Flight number    27    622.0000000              0    622.0000000

                     date                        27       11031.93      9.2691628       11017.00

                     time                        27       44340.00              0       44340.00

                     miles                       27        3857.00              0        3857.00

                     mail                        27    375.3703704     88.8447861    239.0000000

                     freight                     26    333.8076923     95.1146757    175.0000000

                     boarded                     27    178.1111111     27.4202807    110.0000000

                     transfer                    27     13.3333333      5.8572769              0

                     nonrev                      27      4.4814815      1.5284575      2.0000000

                     deplane                     27    195.9259259     28.1478595    135.0000000

                     capacity                    27    250.0000000              0    250.0000000

 

 LAX           123   flight     Flight number   123    464.2032520    271.7639393    114.0000000

                     date                       123       11032.07      8.9879152       11017.00

                     time                       123       46186.34       15015.08       25800.00

                     miles                      123        2475.00              0        2475.00

 -----------------------------------------------------------------------------------------------

 

You can use more than one variable in the class statement, as in the example below. SAS will produce one block of output for each date, and for each destination within a date. Be careful that you don’t produce too much output with this!

 

proc means data = march n mean min max;

    class date dest;

run;
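
One way to avoid a very long listing is to send the statistics to a data set instead of the output window. The commands below are a sketch, not part of the original command file: the noprint option suppresses the printed table, the output statement saves the requested statistics, and the nway option keeps only the rows for each date-by-destination combination. The names boarded_by_day, n_boarded, and mean_boarded are hypothetical.

```sas
/* Save the n and mean of BOARDED for each date and destination. */
proc means data = march noprint nway;
  class date dest;
  var boarded;
  output out = boarded_by_day n = n_boarded mean = mean_boarded;
run;

/* Look at the first few rows of the saved statistics. */
proc print data = boarded_by_day(obs=10) noobs;
run;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_, which SAS adds to every Proc Means output data set.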

 

Getting Additional Statistics from Proc Means:

Additional statistics can be requested by the use of keywords in the proc statement. The list below shows the statistics that can be requested from Proc Means.

 

N:            Number of nonmissing cases.

NMISS:        Number of missing cases.

MEAN:         Sample mean.

MEDIAN:       50th percentile. Also available: P1, P5, P10, P25, P75, P90, P95, P99.

STD:          Standard deviation.

MIN:          Minimum value.

MAX:          Maximum value.

RANGE:        Range of values.

SUM:          Sum of all values.

VAR:          Variance.

USS:          Uncorrected sum of squares.

CSS:          Corrected sum of squares.

CV:           Coefficient of variation.

STDERR:       Standard error of the mean.

T:            Student's t statistic for testing whether the population mean is equal to zero.

PRT:          The p-value of the t statistic testing whether the population mean is zero.

SUMWGT:       The sum of the weights. If there are no sample weights, then SUMWGT=N (the number of nonmissing cases).

SKEWNESS:     Skewness.

KURTOSIS:     Kurtosis.

CLM:          Two-sided confidence limits for the mean. 95% CI is the default.

LCLM:         Lower one-sided confidence limit for the mean. 95% one-sided CI is the default.

UCLM:         Upper one-sided confidence limit for the mean. 95% one-sided CI is the default.

 

Any number of statistics can be requested. You must list all statistics that are desired, because the defaults will no longer be in effect once you begin listing statistics to display. Here are some examples of using Proc Means, with selected statistics being requested:

 

proc means data = march n mean min max skewness kurtosis;

  var boarded transfer;

run;

 

The following commands will produce 95% two-sided confidence limits for the means of the variables BOARDED and TRANSFER.

 

proc means data = march n mean clm;

  var boarded transfer;

run;

 

                                           The MEANS Procedure

 

                                                       Lower 95%       Upper 95%

                 Variable      N            Mean     CL for Mean     CL for Mean

                 ---------------------------------------------------------------

                 boarded     633     132.3570300     128.9627219     135.7513382

                 transfer    635      14.4062992      13.9904621      14.8221363

                 ---------------------------------------------------------------

 

To produce 99% two-sided confidence limits, use the alpha= option.

 

proc means data = march n mean clm alpha=.01;

  var boarded transfer;

run;
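
The limits reported by the clm keyword can be checked by hand as mean ± t × (std / sqrt(n)), where t is the appropriate percentile of the t distribution with n-1 degrees of freedom. The data step below is a minimal sketch that plugs in the N, mean, and standard deviation for BOARDED from the output shown earlier; the variable names (xbar, sd, se, tcrit) are illustrative only. The result should agree closely with the 95% limits for BOARDED in the table above; replacing 0.975 with 0.995 gives the limits for alpha=.01.

```sas
/* Hand calculation of the two-sided 95% confidence limits for BOARDED. */
data _null_;
  n     = 633;                    /* N for boarded, from the Proc Means output */
  xbar  = 132.3570300;            /* mean of boarded                            */
  sd    = 43.4883098;             /* standard deviation of boarded              */
  se    = sd / sqrt(n);           /* standard error of the mean                 */
  tcrit = tinv(0.975, n - 1);     /* 97.5th percentile of t with n-1 df         */
  lower = xbar - tcrit * se;
  upper = xbar + tcrit * se;
  put lower= upper=;              /* results are written to the SAS log         */
run;
```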

 

 

Part 2 (Proc Freq, Proc Boxplot)

Proc Freq:

This procedure produces frequency tables for either character or numeric variables, and can also produce cross-tabulations of two variables, as well as calculate many statistics for two-way tables.  Note: this procedure is most useful for categorical variables with not too many categories. In general it is not recommended that this procedure be used for continuous variables that can have many possible values, which may generate a great deal of output.

One-way frequencies:

The example below shows how to produce one-way frequency tables. 

proc freq data = march;

   tables date dest;

run;

 

                                    The FREQ Procedure

                                                        Cumulative    Cumulative

                       date    Frequency     Percent     Frequency      Percent

                 ---------------------------------------------------------------

                 03/01/1990          21        3.31            21         3.31

                 03/02/1990          21        3.31            42         6.62

                 03/03/1990          21        3.31            63         9.94

                 03/04/1990          21        3.31            84        13.25

                 03/05/1990          20        3.15           104        16.40

                 03/06/1990          18        2.84           122        19.24

                   . . .

                 03/27/1990          18        2.84           550        86.75

                 03/28/1990          21        3.31           571        90.06

                 03/29/1990          21        3.31           592        93.38

                 03/30/1990          21        3.31           613        96.69

                 03/31/1990          21        3.31           634       100.00

 

                                      Frequency Missing = 1

 

                              

                                           Destination City

 

                                                     Cumulative    Cumulative

                    dest    Frequency     Percent     Frequency      Percent

                    ---------------------------------------------------------

                    CPH           27        4.26            27         4.26

                    DFW           62        9.78            89        14.04

                    FRA           27        4.26           116        18.30

                    LAX          123       19.40           239        37.70

                    LON           58        9.15           297        46.85

                    ORD           92       14.51           389        61.36

                    PAR           27        4.26           416        65.62

                    PRD            1        0.16           417        65.77

                    QAS            1        0.16           418        65.93

                    WAS          154       24.29           572        90.22

                    YYZ           62        9.78           634       100.00

 

                                      Frequency Missing = 1
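
By default, missing values are reported only in the "Frequency Missing" line and are left out of the percentages. The commands below are a sketch of two commonly used variations: the missing option in the tables statement treats missing as a category of its own, the nocum option drops the cumulative columns, and order=freq in the proc freq statement lists the categories from most to least frequent.

```sas
/* One-way table for DEST, sorted by frequency, with missing values as a category. */
proc freq data = march order = freq;
  tables dest / missing nocum;
run;
```

Sorting by frequency also makes rare categories (here PRD and QAS, with one flight each) easy to spot at the bottom of the table.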

 

Two-Way Cross-Tabulations:

Two-way frequency tables, or cross-tabulations, can also be generated by listing 2 variables with an asterisk (*) between them.  List the row variable first, followed by the column variable. To illustrate cross-tabulations, we use the SAS data set SASDATA2.BUSINESS. We first submit a libname statement to define the library where the data set is stored.

 

libname sasdata2 "c:\temp\sasdata2";

proc freq data = sasdata2.business;

   tables industry * nation;

run;

 

                             Table of INDUSTRY by NATION

 

                INDUSTRY(Industry)     NATION(Nationality)

 

                Frequency   |

                Percent     |

                Row Pct     |

                Col Pct     |Britain |France  |Germany |Japan   |U.S.    |  Total

                ------------+--------+--------+--------+--------+--------+

                Automobiles |      2 |      3 |      6 |     14 |      7 |     32

                            |   1.57 |   2.36 |   4.72 |  11.02 |   5.51 |  25.20

                            |   6.25 |   9.38 |  18.75 |  43.75 |  21.88 |

                            |  12.50 |  30.00 |  75.00 |  32.56 |  14.00 |

                ------------+--------+--------+--------+--------+--------+

                Electronics |      1 |      3 |      1 |     12 |     11 |     28

                            |   0.79 |   2.36 |   0.79 |   9.45 |   8.66 |  22.05

                            |   3.57 |  10.71 |   3.57 |  42.86 |  39.29 |

                            |   6.25 |  30.00 |  12.50 |  27.91 |  22.00 |

                ------------+--------+--------+--------+--------+--------+

                Food        |     11 |      2 |      0 |     11 |     19 |     43

                            |   8.66 |   1.57 |   0.00 |   8.66 |  14.96 |  33.86

                            |  25.58 |   4.65 |   0.00 |  25.58 |  44.19 |

                            |  68.75 |  20.00 |   0.00 |  25.58 |  38.00 |

                ------------+--------+--------+--------+--------+--------+

                Oil         |      2 |      2 |      1 |      6 |     13 |     24

                            |   1.57 |   1.57 |   0.79 |   4.72 |  10.24 |  18.90

                            |   8.33 |   8.33 |   4.17 |  25.00 |  54.17 |

                            |  12.50 |  20.00 |  12.50 |  13.95 |  26.00 |

                ------------+--------+--------+--------+--------+--------+

                Total             16       10        8       43       50      127

                               12.60     7.87     6.30    33.86    39.37   100.00

 

By default, Proc Freq produces a frequency table with the count (Frequency) in each cell, the total percent (Percent, which adds to 100% across all cells in the table), the row percent (Row Pct, which adds to 100% across a given row), and column percent (Col Pct, which adds to 100% down a given column). To omit any of these items, specify options in the tables statement, as shown below:

 

proc freq data = sasdata2.business;

   tables industry * nation/ norow nocol nopercent;

run;

                                Table of INDUSTRY by NATION

 

                INDUSTRY(Industry)     NATION(Nationality)

 

                Frequency   |Britain |France  |Germany |Japan   |U.S.    |  Total

                ------------+--------+--------+--------+--------+--------+

                Automobiles |      2 |      3 |      6 |     14 |      7 |     32

                ------------+--------+--------+--------+--------+--------+

                Electronics |      1 |      3 |      1 |     12 |     11 |     28

                ------------+--------+--------+--------+--------+--------+

                Food        |     11 |      2 |      0 |     11 |     19 |     43

                ------------+--------+--------+--------+--------+--------+

                Oil         |      2 |      2 |      1 |      6 |     13 |     24

                ------------+--------+--------+--------+--------+--------+

                Total             16       10        8       43       50      127
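
Statistics for a two-way table are requested in the same way, as options after the slash. The commands below are a sketch that adds the chisq option (chi-square tests of association) and the expected option (expected cell counts under independence) to the previous example. Several cells in this table have small counts, so the chi-square approximation should be interpreted with caution here.

```sas
/* Chi-square tests and expected counts for INDUSTRY by NATION. */
proc freq data = sasdata2.business;
  tables industry * nation / chisq expected norow nocol nopercent;
run;
```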

 

You can use the list option after a slash to get the output as a list, rather than in a table.

 

proc freq data = sasdata2.business;

   tables industry * nation/ list;

run;

 

                                       The FREQ Procedure

                                                              Cumulative    Cumulative

           INDUSTRY       NATION     Frequency     Percent     Frequency      Percent

           ---------------------------------------------------------------------------

           Automobiles    Britain           2        1.57             2         1.57

           Automobiles    France            3        2.36             5         3.94

           Automobiles    Germany           6        4.72            11         8.66

           Automobiles    Japan            14       11.02            25        19.69

           Automobiles    U.S.              7        5.51            32        25.20

           Electronics    Britain           1        0.79            33        25.98

           Electronics    France            3        2.36            36        28.35

           Electronics    Germany           1        0.79            37        29.13

           Electronics    Japan            12        9.45            49        38.58

           Electronics    U.S.             11        8.66            60        47.24

           Food           Britain          11        8.66            71        55.91

           Food           France            2        1.57            73        57.48

           Food           Japan            11        8.66            84        66.14

           Food           U.S.             19       14.96           103        81.10

           Oil            Britain           2        1.57           105        82.68

           Oil            France            2        1.57           107        84.25

           Oil            Germany           1        0.79           108        85.04

           Oil            Japan             6        4.72           114        89.76

           Oil            U.S.             13       10.24           127       100.00
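
Note that the list format leaves out combinations that do not occur in the data: there is no line for Food and Germany above, because that cell has a count of zero. As a sketch, adding the sparse option should make Proc Freq list every possible combination, including those with zero frequency:

```sas
/* List format, including combinations with a frequency of zero. */
proc freq data = sasdata2.business;
  tables industry * nation / list sparse;
run;
```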

 

Proc Boxplot:

This procedure produces side-by-side box and whisker plots for a continuous variable, displayed for each level of a categorical variable. The data set must first be sorted by the categorical variable. The syntax to produce a box plot is shown below. The plot statement lists the continuous variable you wish to display first; the second variable, after the *, is the categorical variable that forms the X-axis categories.

 

proc sort data = sasdata2.business;

  by industry;

run;

proc boxplot data = sasdata2.business;

  plot sales * industry ;

run;

 By default, the characteristics of the box plot are as follows (modified from the SAS 9.1 documentation):

  • The length of the box represents the interquartile range (the distance between the 25th and the 75th percentiles).
  • The plus inside the box represents the mean of the continuous variable.
  • The horizontal line inside the box represents the median of the continuous variable.
  • The vertical lines at the top and bottom of the box extend to the minimum and maximum values of the continuous variable.

You can change the display, so that SAS shows outliers in the graph, by using the boxstyle=schematic option.

 

proc boxplot data = sasdata2.business;

  plot sales * industry /boxstyle=schematic;

run;
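
The proc sort step shown earlier re-orders the permanent data set itself. If you would rather leave the stored data set unchanged, one option is to write the sorted copy to a temporary data set with the out= option and plot that instead. This is a sketch; the data set name business_sorted is hypothetical.

```sas
/* Sort into a temporary (WORK) data set, leaving sasdata2.business untouched. */
proc sort data = sasdata2.business out = business_sorted;
  by industry;
run;

/* Plot from the sorted temporary copy. */
proc boxplot data = business_sorted;
  plot sales * industry / boxstyle=schematic;
run;
```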

For further options for Proc Boxplot, see the SAS online documentation at:

http://support.sas.com/onlinedoc/913/docMainpage.jsp

Part 3 (Proc Univariate, Proc Gplot)

Proc Univariate:

This procedure is useful for getting in-depth numeric descriptions and graphical information on the distribution of a continuous numeric variable. By default, Proc Univariate generates simple descriptive statistics, information on selected quantiles (e.g., the median and the 5th, 25th, 75th, and 95th percentiles), and one-sample tests of H0: mu = 0, including a one-sample t-test, a sign test, and a one-sample Wilcoxon signed-rank test. It can also produce simple text-based graphics, including a box plot, a stem-and-leaf plot or histogram, and a normal q-q plot, as well as publication-quality graphics. Simple syntax to invoke Proc Univariate and the default output are shown below:

 

proc univariate data = march;

   var boarded;

run;

                                                               

                                    The UNIVARIATE Procedure

                                       Variable:  boarded

 

                                             Moments

 

                 N                         633    Sum Weights                633

                 Mean                132.35703    Sum Observations         83782

                 Std Deviation      43.4883098    Variance            1891.23309

                 Skewness            -0.171214    Kurtosis            -0.5806126

                 Uncorrected SS       12284396    Corrected SS        1195259.31

                 Coeff Variation     32.856819    Std Error Mean      1.72850513

 

 

                                   Basic Statistical Measures

 

                         Location                    Variability

 

                     Mean     132.3570     Std Deviation           43.48831

                     Median   136.0000     Variance                    1891

                     Mode      88.0000     Range                  228.00000

                                           Interquartile Range     66.00000

 

 

                                   Tests for Location: Mu0=0

 

                        Test           -Statistic-    -----p Value------

 

                        Student's t    t  76.57312    Pr > |t|    <.0001

                        Sign           M     316.5    Pr >= |M|   <.0001

                        Signed Rank    S  100330.5    Pr >= |S|   <.0001

 

 

                                    Quantiles (Definition 5)

                                     Quantile      Estimate

 

                                     100% Max           241

                                     99%                223

                                     95%                199

                                     90%                188

                                     75% Q3             165

                                     50% Median         136

                                     25% Q1              99

                                     10%                 75

                                     5%                  58

                                     1%                  34

                                     0% Min              13

 

 

                                      Extreme Observations

 

                              ----Lowest----        ----Highest---

 

                              Value      Obs        Value      Obs

                                 13       96          225      561

                                 14      633          229      231

                                 21      508          232       67

                                 25      448          232      339

                                 30       91          241      126

 

 

                                          Missing Values

 

                                                  -----Percent Of-----

                           Missing                             Missing

                             Value       Count     All Obs         Obs

 

                                 .           2        0.31      100.00
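
The tests for location above use the default null value, Mu0=0, which is rarely of interest for a count such as BOARDED. A minimal sketch of testing a different null value with the mu0= option follows. The value 130 is an arbitrary choice made for illustration and does not come from the March data.

```sas
/* Test whether the center of BOARDED differs from 130 rather than 0. */
/* The value 130 is a hypothetical null value used only as an example. */
proc univariate data = march mu0 = 130;
   var boarded;
run;
```

The Student's t, sign, and signed rank tests are then all computed relative to 130.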

 

Proc Univariate displays the five lowest and five highest values by default. To identify these cases by the value of a particular variable, use the ID statement. Only the first 8 characters of an ID variable are displayed in the output. In the output below, all five of the lowest values of BOARDED were for flights to WAS, while four of the five highest values were for flights to LON and one was for a flight to FRA.

 

proc univariate data = march;

   var boarded;

   id dest;

run;

                                    Extreme Observations

                       --------Lowest-------        -------Highest-------

                       Value   dest      Obs        Value   dest      Obs

                          13   WAS        96          225   FRA       561

                          14   WAS       633          229   LON       231

                          21   WAS       508          232   LON        67

                          25   WAS       448          232   LON       339

                          30   WAS        91          241   LON       126
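
The number of extreme cases listed can be changed from the default of five. A minimal sketch, using the nextrobs= option on the Proc Univariate statement with an arbitrarily chosen value of 10:

```sas
/* List the 10 lowest and 10 highest values of BOARDED,      */
/* each identified by its destination (the ID variable).     */
/* nextrobs=10 is an illustrative choice; the default is 5.  */
proc univariate data = march nextrobs = 10;
   var boarded;
   id dest;
run;
```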

 

To get text-based graphics, use the plot option. It produces a box plot, a normal probability plot, and either a stem and leaf plot (for smaller samples) or a histogram (for larger samples). The histogram statement causes SAS to produce a graphics-based histogram in the graph window. The qqplot statement produces a normal Q-Q plot that can be used to compare the distribution of a variable to that of a normal distribution with the same mean and standard deviation (mu=est sigma=est). These commands produce all the descriptive statistics shown above, plus text-based graphs in the output window and high-quality graphs in the SAS/Graph window:

 

proc univariate data = march plot;

   histogram;

   qqplot / normal(mu=est sigma=est);

   var boarded;

run;

 

The text-based graphs are shown below (the high-quality graphs appear separately in the graph window):

 

                                    The UNIVARIATE Procedure

                                       Variable:  boarded

 

                                  Histogram                #             Boxplot

                     245+*                                 1                |

                        .*                                 2                |

                        .**                                4                |

                        .***                               6                |

                        .*********                        17                |

                        .***************                  30                |

                        .****************                 32                |

                        .**********************           43                |

                        .***************************      54             +-----+

                        .*****************************    57             |     |

                        .**************************       51             |     |

                        .***********************          45             *--+--*

                        .********************             40             |     |

                        .**************************       52             |     |

                        .*******************              38             |     |

                        .*************************        49             +-----+

                        .****************                 31                |

                        .***************                  29                |

                        .**********                       19                |

                        .********                         16                |

                        .****                              7                |

                        .***                               6                |

                        .*                                 2                |

                      15+*                                 2                |

                         ----+----+----+----+----+----

                         * may represent up to 2 counts

 

                              Normal Probability Plot

                       245+                                                  *

                          |                                                ++*

                          |                                              ++***

                          |                                           +++***

                          |                                         +****

                          |                                      ****

                          |                                    ***

                          |                                 ***

                          |                              ****

                          |                            ***+

                          |                          ***+

                          |                        ***

                          |                      ***

                          |                    ***

                          |                  ***

                          |                ***

                          |             +***

                          |           ****

                          |         ***

                          |      ****

                          |    **+

                          | ***

                          |*+

                        15+*

                           +----+----+----+----+----+----+----+----+----+----+

                               -2        -1         0        +1        +2
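
The plots above give a visual check of normality. As a hedged sketch, the normal option on the Proc Univariate statement requests formal tests of normality, and the normal option on the histogram statement overlays a fitted normal curve on the high-resolution histogram. The use of BOARDED simply continues the example above.

```sas
/* normal (on the proc statement) adds a Tests for Normality table. */
/* histogram / normal(...) overlays a normal density whose mean and */
/* standard deviation are estimated from the data.                  */
proc univariate data = march normal;
   var boarded;
   histogram boarded / normal(mu=est sigma=est);
run;
```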

 

 

 

Proc Univariate can also be used with a class statement to produce descriptive statistics for numeric variables across levels of a categorical variable. The following syntax shows how to get information for the variable BOARDED, for each destination:

 

proc univariate data = march plot;

   class dest;

   histogram; 

   qqplot / normal(mu=est sigma=est);

   var boarded;

run;
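
With many destinations, the class statement produces a long listing. One way to shorten it is an ods select statement, sketched here under the assumption that only the Basic Statistical Measures and Quantiles tables are wanted (ODS table names BasicMeasures and Quantiles):

```sas
/* Keep only two of the Proc Univariate tables for each destination. */
/* The selection applies to the next procedure step only.            */
ods select BasicMeasures Quantiles;
proc univariate data = march;
   class dest;
   var boarded;
run;
```

Submitting ods trace on; before the procedure writes the names of all tables it produces to the log, which helps when choosing what to select.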

 

Proc Gplot:

 

This procedure is used to produce publication-quality bivariate scatter plots. One or more plot statements tell SAS which two variables are to be plotted; in a request of the form y*x, the first variable is placed on the vertical axis and the second on the horizontal axis. Give the goptions statement before running the plots to set up the output for your printer. The option target=winprtm indicates a monochrome printer; use target=winprtg for a grayscale printer and target=winprtc for a color printer. Use the quit statement to stop Proc Gplot.

 

goptions reset=all;

goptions device=win target=winprtm;

proc gplot data = sasdata2.business;

   plot sales*employs;

run; quit;

 

 

To produce a plot using specific plotting symbols, use symbol statements:

 

symbol1 color=black value = dot;

proc gplot data = sasdata2.business;

   plot sales*employs;

run; quit;
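
Titles and axis labels can be set in a similar way, with title and axis statements. The sketch below is illustrative only: the title and label text are assumptions about what SALES and EMPLOYS measure, not descriptions taken from the Business data.

```sas
goptions reset=all;

/* Title and label wording are hypothetical. */
title "Sales by Number of Employees";
axis1 label=("Number of employees");      /* horizontal axis */
axis2 label=(angle=90 "Sales");           /* vertical axis, rotated label */
symbol1 color=black value=dot;

proc gplot data = sasdata2.business;
   plot sales*employs / haxis=axis1 vaxis=axis2;
run; quit;

title;   /* clear the title so it does not carry over to later output */
```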

To get different plotting symbols for different groups of cases, use syntax similar to that shown below:

 

goptions reset=all;

goptions device=win target=winprtm;

proc gplot data = sasdata2.business;

   plot sales*employs = industry;

run; quit;

 

 

 

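Regression lines can be added to the plot by using symbol statements, with interpol=rl for a linear regression interpolation. When a grouping variable is used, the symbol statements are assigned to the groups in order, so each group gets its own plotting symbol, line type, and fitted line.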

 

goptions reset=all;

goptions device=win target=winprtm;

symbol1 color=black value=star interpol=rl line=1;

symbol2 color=black value=dot interpol=rl line=2;

symbol3 color=black value=circle interpol=rl line=3;

symbol4 color=black value=triangle interpol=rl line=4;

 

proc gplot data = sasdata2.business;

   plot sales*employs = industry;

run; quit;

 

 