Tips for SAS Users

 

Last updated: September 1999

 

The following are some good habits when running SAS jobs. These will help you minimize the amount of space you require for your temporary files and for your permanent SAS data sets.

This page also includes specific commands you can use to:

*Work with tapes in SAS (tips #9 and #10).

*Delete your SAS working directories without going to /tmp (tip #11).

*Work with files bigger than 2 GB (tip #12).

*Increase the amount of memory that SAS allocates to a program by default (tip #13).

 

1) Use the LENGTH statement in your data step to set the default length of variables or to define the length of specific variables.

 To define a default length, the line to add would be:

length default=3;

 Example:

 data new2;

length default=3;

set lib.medpar84;

 

 

To define the length of specific variables, list them in a LENGTH statement:

 data new2;

length id 3 todo 3 gam5 3;

 

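For binary (0/1) variables, length=3 is appropriate.

For integer variables that take on a relatively limited range of values, length=5 is appropriate.

The longest length you will need for most integer-valued numeric variables is 6. Variables with fractional values are best left at the default length of 8, since shortening them can lose precision.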
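
As a rough check on these lengths, the sketch below prints the largest integer that each numeric length can store exactly. It assumes the IEEE floating-point layout used by SAS on UNIX hosts, where an n-byte numeric keeps 8n-12 mantissa bits, so integers are exact up to 2 to the power (8n-11). That formula and the helper name max_exact_int are assumptions added here, not something stated on this page, and other platforms differ.

```shell
# Largest integer stored exactly in an n-byte SAS numeric, ASSUMING the
# IEEE layout on UNIX hosts (1 sign bit, 11 exponent bits, 8n-12 mantissa bits).
# max_exact_int is a hypothetical helper name used only for this sketch.
max_exact_int() {
    echo $(( 1 << (8 * $1 - 11) ))
}

# Print the limit for every length from 3 (the UNIX minimum) to 8 (the default).
for n in 3 4 5 6 7 8; do
    echo "length $n: $(max_exact_int "$n")"
done
```

Under that assumption, length 3 is exact up to 8192 and length 6 up to 137438953472, so check the largest value a variable can take before shortening it.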

 

You can also use the LENGTH statement to shorten variables defined in previous data sets.

 Example:

data new2;

length id 3 state 3 county 3;

set lib1.medicare88;

 

2) Delete a temporary file once you're done with it.

To do so use the DATASETS procedure with the DELETE statement.

Example:

 

data info851;

set info85;

if aicadp91=1 and aicadp88=0;

 

[SAS procedures here]

 

proc datasets nolist;

delete info851;

 

3) Use KEEP or DROP when defining new data sets to keep only the variables you are interested in.

Example:

data hosp88;

set aha.newaha88;

year=1988;

mprovno=pps;

keep mprovno bsc88 cicbed88 cicbd288 micbed88 micbd288 occ88 medres88

cclab88 ohsurg88 ftmres88 hospbd88 rankbd88 year;

 

4) Compress observations in an output SAS data set using the COMPRESS option.

Example:

data new (compress=yes);

set lib.medicare85;

 

You can also use this as a system option and it will be applied automatically to all datasets:

options compress=yes [other SAS system options here];

 

 

5) Limit the size of the data set you use to run preliminary tests of your program using the OBS= option.

 

Example:

data sub85;

set lib.all85 (obs=1000);

proc glm data=sub85;

model icd90=female black state1-state49;

 

 

Again, you can also use this as a system option:

 

options obs=1000 [other SAS system options here];

 

You can remove it once the program is running smoothly.

 

6) Use space in your own directory to write temporary data sets.

To do so, add the option -work /path when submitting your SAS job.

Example:

 

sas -work /home/veronica prog1.sas &

 

This command will run the SAS job in prog1.sas in the background.

It will also write the temporary files in the directory /home/veronica.

(Do NOT specify someone else's directory.)
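
Before submitting a job with -work, it can help to confirm that the directory exists and that you can write to it. The sketch below is one way to do that; check_work_dir is a hypothetical helper name, the choice of $HOME is an assumption, and the sas command is only printed, not run.

```shell
# check_work_dir DIR (hypothetical helper): succeeds only if DIR is an
# existing directory that you can write to.
check_work_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

work_dir="$HOME"    # assumption: your own home directory, as in the example above

if check_work_dir "$work_dir"; then
    # Print the command instead of running it, so the sketch works without SAS.
    echo "sas -work $work_dir prog1.sas &"
else
    echo "cannot use '$work_dir' as a SAS work directory" >&2
fi
```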

 

7) Give a temporary data set a permanent name and then delete it later in the program. This will save some space in the /tmp directory.

Example:

 

data lib.reg85;

set lib.medicare85;

 

[SAS procedures here]

 

proc datasets lib=lib nolist;

delete reg85;

 

8) If you want to be sent an email as soon as your program completes, try:

 

sas program.sas ; echo program.sas is done | mail youruserid

 

This can be useful to let you know that your tape is done and you can throw in the next one, start your next job, etc. This way you won't have to regularly check the drive's status.

If you only want an email when your program fails, try:

 

sas program.sas || echo "your program didn't work" | mail youruserid

 

Of course, write anything you want to appear in your email after the echo.
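
The difference between the two command lines above is the shell operator: ";" runs the echo no matter how SAS exits, while "||" runs it only when SAS returns a nonzero exit status. The sketch below illustrates this with a hypothetical stand-in function, run_job, in place of the real sas command, and a hypothetical notify function that prints the message instead of piping it to mail.

```shell
# Do not abort this demo script when a command fails on purpose.
set +e

# run_job STATUS: hypothetical stand-in for "sas program.sas";
# it simply exits with the status given as its argument.
run_job() { return "$1"; }

# notify MESSAGE: prints the text that would be piped to mail.
notify() { echo "$*"; }

# ";" runs the second command regardless of the first command's status.
always_msg=$(run_job 1 ; notify "program.sas is done")

# "||" runs the second command only when the first one fails.
fail_msg=$(run_job 1 || notify "program.sas did not work")
ok_msg=$(run_job 0 || notify "program.sas did not work")

echo "after ';'            : $always_msg"
echo "after '||' (failure) : $fail_msg"
echo "after '||' (success) : ${ok_msg:-<no message>}"
```

Which SAS outcomes produce a nonzero status (for example, warnings as well as errors) may depend on your SAS release, so check its documentation before relying on "||".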

 

9) When using tapes with SAS, always use the TAPE engine in the libname specification.

Example:

libname anyname tape '/dev/rmt/0mn';

data new;

set anyname.data8801;

 

10) If you need to write more than one data file to a tape, add the following data set option to each output data set:

fileclose=leave

 

Example:

libname tape1 tape '/dev/rmt/0mn';

data tape1.data87 (fileclose=leave);

set new.data87;

 

data tape1.data88 (fileclose=leave);

set new.data88;

 

11) If you need to kill a SAS job, after issuing the

kill -9 job#

command, use the following command to delete the SAS work directory related to this job:

 

cleanwork /tmp

 

=> You don't need to go to the /tmp directory to locate and remove your SAS working directory.

 

Hint:

For this to work, add the following to your setenv PATH statement in your .cshrc file:

/usr/local/src/sas612/utilities/bin

(If you put this at the end of the setenv PATH statement, don't forget the ":" before this path.)
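
The hint above is written for csh. For sh or bash users, a rough equivalent is sketched below; placing it in a file such as .profile is an assumption, since this page only covers .cshrc. It appends the utilities directory to PATH only if it is not already there, with the ":" separator included.

```shell
# Directory that holds cleanwork, as given in the hint above.
SAS_UTIL_DIR=/usr/local/src/sas612/utilities/bin

# Append the directory to PATH unless it is already listed.
case ":$PATH:" in
    *":$SAS_UTIL_DIR:"*) ;;                      # already on PATH, nothing to do
    *) PATH="$PATH:$SAS_UTIL_DIR" ;;             # note the ":" before the new path
esac
export PATH
```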

 

12) SAS 6.12 ON SOLARIS 2.6

When working with a data set bigger than 2 GB, you do not need to modify your code. Solaris 2.6 and SAS 6.12 can handle files larger than 2 GB. However, if you are running a SAS release earlier than 6.12, or Solaris 2.5.1 or earlier, see (12b).

 

 

12b) If you are running into problems when trying to write a data set bigger than 2 GB, you can use partitioned libraries to overcome this problem. Use the following syntax:

  LIBNAME anyname 'path' TYPE=PARTITION PARTSIZE=2G;

 

Example 1:

 LIBNAME bigdata '/dirA' TYPE=PARTITION PARTSIZE=2G;

 

This will create a partitioned library named bigdata.

The path /dirA is used as the path for all partitions.

SAS will create the files in that directory in more than one partition.

 

You can omit TYPE=PARTITION when including PARTSIZE because partition is the default.

 

To allow files in the SAS work directory to grow larger than 2 GB, use the WORK option when submitting your job.

Example:

sas -WORK \(/tmp \) myprog.sas &

The "\" characters are not typos. You should include them so that SAS creates the work directory in /tmp AND allows files to get bigger than 2 GB.

 

Example 2:

 LIBNAME example1 ('/part3','/part4') partsize=1.8g;

 

This creates a partitioned library named example1 that includes the directories /part3 and /part4.

The SAS System treats these two directories as a single library with 3.6G of disk space. The SAS System assumes TYPE=PARTITION.

 

 Notes

1) If you try to write more data than will fit onto the disk(s) where the partitions are located, SAS treats this situation as a normal disk full situation.

 

2) How SAS Names the Partitions

The SAS System creates one partition for each path you specify. The first partition resides in the first path specified and has a normal SAS data set name (such as x.ssd01). Subsequent partitions reside in subsequent paths and have names of the form x.ssd01.1, where the trailing number increases by one with each partition.

 

To show how the PARTSIZE option works, here is a sample program:

 

options ls=72;

libname test '/tmp' partsize=10M;

libname old '/data19/va/veronica/VA_WAREHOUS/BMAD8590';

data test.test91;

set old.new91;

 

/*The next procedure accesses the "big" data set*/

proc contents data=test.test91;

proc freq data=test.test91;

tables year;

run;

 

*****************************************

The original data set is:

-rw-r--r-- 1 veronica veronica 21094400 Mar 17 15:33 new91.ssd01

 

 

After using the partsize option in the libname statement I get:

 

-rw-r--r-- 1 veronica veronica 10485760 Mar 18 15:13 test91.ssd01

-rw-r--r-- 1 veronica veronica 10485760 Mar 18 15:13 test91.ssd01.1

-rw-r--r-- 1 veronica veronica 139264 Mar 18 15:13 test91.ssd01.2

 

=> You don't have to list the separate files in your program when accessing the "big" data set. Just use the "original" name of the file, even though it is now divided into more than one file.
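
The file sizes in the listing above can be checked with shell arithmetic. The sketch below confirms that partsize=10M corresponds to the 10485760-byte partitions and adds up the three pieces so the result can be compared with the File Size (bytes) value that PROC CONTENTS reports for this data set. The variable names are illustrative only.

```shell
# Size of one full partition: 10M taken as 10 * 1024 * 1024 bytes.
part_size=$(( 10 * 1024 * 1024 ))

# Sizes of the three files from the listing (test91.ssd01, .ssd01.1, .ssd01.2).
total=$(( 10485760 + 10485760 + 139264 ))

echo "partition size: $part_size bytes"
echo "total size:     $total bytes"
```

The total, 21110784 bytes, matches the File Size (bytes) line in the PROC CONTENTS output for TEST.TEST91.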

 

 The proc contents of the "big" file will list the expansion files as follows:

 

The SAS System 1

15:16 Tuesday, March 18, 1997

 

CONTENTS PROCEDURE

 

Data Set Name: TEST.TEST91                     Observations:         285651
Member Type:   DATA                            Variables:            10
Engine:        V612                            Indexes:              0
Created:       15:16 Tuesday, Mar 18, 1997     Observation Length:   73
Last Modified: 15:17 Tuesday, Mar 18, 1997     Deleted Observations: 0
Protection:                                    Compressed:           NO
Data Set Type:                                 Sorted:               NO
Label:

 

-----Engine/Host Dependent Information-----

 

Data Set Page Size: 8192

Number of Data Set Pages: 2574

File Format: 607

First Data Page: 1

Max Obs per Page: 111

Obs in First Data Page: 90

File Name: /tmp/test91.ssd01

Inode Number: 13

Access Permission: rw-r--r--

Owner Name: veronica

File Size (bytes): 21110784

Expansion Files: /tmp/test91.ssd01.1, /tmp/test91.ssd01.2

 

-----Alphabetic List of Variables and Attributes-----

 

 #    Variable    Type    Len    Pos    Label
---------------------------------------------------------------------
 3    B           Num       8     19
 7    BALLOWD     Num       8     41    61 Allowed charges
 8    BEXPDT1     Num       8     49    67 Date 1st Expense
 5    BHCPCS      Char      5     28    57 HCFA Common Procedure Code
 4    BPROCIN     Char      1     27    51 Processing Indicator Code
 9    BREIMB      Num       8     57    75 Claim payment amount
 6    BSUBCHG     Num       8     33    60 Billed charges
 1    HICBIC      Char     11      0    Health insurace claim number
 2    SINDXDTE    Num       8     11
10    YEAR        Num       8     65

The SAS System 2

15:16 Tuesday, March 18, 1997

 

                               Cumulative    Cumulative
YEAR    Frequency    Percent    Frequency       Percent
-------------------------------------------------------
  85        15488        5.4        15488           5.4
  86        21721        7.6        37209          13.0
  87        26874        9.4        64083          22.4
  88        32852       11.5        96935          33.9
  89        35238       12.3       132173          46.3
  90        48912       17.1       181085          63.4
  91       104566       36.6       285651         100.0

 

 

13) To increase the amount of RAM available for a SAS program, use the -memsize 0 option when submitting your job.

Example:

sas -memsize 0 myprog.sas &

 ****************************************************************

For questions, send e-mail to veronica@newage3.stanford.edu