Experience Reading SAS Portable Format Datasets
December 22, 1993
Revised September 2, 1994
Revised (Vendor info only) April 14, 1997
There are two main kinds of SAS portable format datasets. These are CPORT and XPORT. Both are popularly referred to as Transport datasets, but they are quite different, and completely incompatible.
The CPORT datasets have the advantage that they can contain a wide variety of SAS objects, not just datasets. However, they have no backwards portability at all. Indeed, I am informed by SAS tech support that even lateral portability is not to be taken for granted. The earliest version of SAS for Unix to be able to read CPORT files generated under MVS 6.07 is 6.07.3 - not the more widespread release 6.07.2.
The true portable format is the XPORT format. Supposedly this has full forwards and backwards compatibility (but see below). Only datasets can be transferred with XPORT, not catalogs, formats, etc. This is the format you would use to send data to another site.
Using SAS to Read the Portable Format
The first positive step towards reading a SAS portable file is identification. Files obtained from other users, or from far away, are seldom what they are described to be. This is especially true of SAS portable files, since the procedures that produce them vary across SAS versions and platforms, and procedures such as PROC COPY and PROC XCOPY can produce either kind of file, depending on circumstances. So take the file and, using some program capable of displaying the first few bytes of a non-line-structured file, find the contents of the first 16 bytes.
    type      first 16 bytes
    system    'SAS 6.03 '
    XPORT     'HEADER RECORD***'
    CPORT     'LIB CONTROL PC D'
    CPORT     '**COMPRESSED** *'
I have included the binary system file for completeness (note that the version number in its header varies with the release), but remember that there is some portability among the Unix platforms; at least, I find that Sun and HP Unix SAS system files are interchangeable.
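The identification step can also be scripted. The following is a minimal sketch in Python, not part of SAS: the signatures are copied from the table above, and the function names (classify_header, classify_file) are made up for this illustration. The check for a native system file is an assumption based on the single 'SAS 6.03' example, so treat that branch as a hint only.

```python
# Signatures copied from the table above; the result is only as good as that table.
SIGNATURES = {
    b"HEADER RECORD***": "XPORT",
    b"LIB CONTROL PC D": "CPORT",
    b"**COMPRESSED** *": "CPORT",
}


def classify_header(first16: bytes) -> str:
    """Classify a file from its first 16 bytes."""
    kind = SIGNATURES.get(first16[:16])
    if kind is not None:
        return kind
    # Assumption: a native system file starts with 'SAS' followed by a
    # version string, as in the 'SAS 6.03' example; the rest varies.
    if first16.startswith(b"SAS"):
        return "system"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:  # binary mode: no newline translation
        return classify_header(f.read(16))
```

Calling classify_file('stuff.portable') returns one of 'XPORT', 'CPORT', 'system' or 'unknown'. An 'unknown' result on a file that is supposed to be a transport file is itself a hint that the file was damaged in transit.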
If the file is one of the two CPORT structures, look down a few bytes to find the SAS version number (it will be in ASCII). If your version of SAS is that or later, try reading the file with PROC CIMPORT:
    libname save 'directoryname';
    proc cimport library=save infile='stuff.portable';
    run;
Here and below, assume the putative portable file is called stuff.portable.
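Looking "down a few bytes" for the version number in a CPORT file can be approximated with a pattern search. This Python sketch assumes only that the version appears as ASCII text of the form digit, period, two digits (such as 6.07) somewhere in the first 200 bytes. The exact offset is not given here, so the function returns every candidate rather than a single answer. The name version_candidates and the 200-byte limit are choices made for this illustration.

```python
import re

# Assumption: the version is ASCII text such as "6.07" near the start of the file.
VERSION_RE = re.compile(rb"\d\.\d\d")


def version_candidates(header: bytes, limit: int = 200) -> list:
    """Return every ASCII string shaped like a SAS release number
    found in the first `limit` bytes, in order of appearance."""
    return [m.group().decode("ascii") for m in VERSION_RE.finditer(header[:limit])]
```

If more than one candidate comes back, check the hex dump by eye before deciding whether your own release is recent enough.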
If the file is type XPORT, try the following:
    libname trans xport 'stuff.portable';
    proc contents data=trans._all_;
    run;
This will give you the SAS dataset names in the listing file. With that information you can read the SAS datasets with an ordinary data step:
    libname trans xport 'stuff.portable';
    libname save 'directoryname';
    data save.name1;
      set trans.sasfilename;
    run;
With SAS/PC use 'sasv5xpt' instead of 'xport' in the libname statement.
Of course a lot can go wrong. Some of it is easy to diagnose from the hex dump. For example, if the dataset has undergone a character conversion, or had bytes swapped, that will stand out immediately. If line endings have been inserted, deleted or modified, that isn't so easy to detect, but look for a suspicious 0D 0A in the first 100 bytes. Trailing null characters will add zeros to the data. I don't know what a Ctrl-Z at the end would do. Except for byte swapping, there is no recovery from any of these errors other than to return to the original platform and recreate the export file.
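The checks described above can be collected into a small script. This is a rough Python sketch for XPORT files only, and every check is a heuristic: a file that passes may still be damaged, and trailing nulls are only suspicious, not proof of damage. The names swap_pairs and diagnose are invented for this example. The EBCDIC test assumes that a character conversion would have used something like IBM code page 037, which may not match what actually happened to a given file.

```python
XPORT_MAGIC = b"HEADER RECORD***"


def swap_pairs(data: bytes) -> bytes:
    """Swap each adjacent pair of bytes (the effect of `dd conv=swab`).
    Applying it twice restores the original."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)  # leave a trailing odd byte alone
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)


def diagnose(data: bytes) -> list:
    """Return a list of human-readable findings; empty if nothing stands out."""
    findings = []
    start = data[:16]
    if start == swap_pairs(XPORT_MAGIC):
        findings.append("bytes swapped in pairs")
    # Assumption: a character conversion would look like ASCII -> EBCDIC (cp037).
    if start == XPORT_MAGIC.decode("ascii").encode("cp037"):
        findings.append("header converted to EBCDIC")
    if b"\r\n" in data[:100]:
        findings.append("CR LF (0D 0A) in first 100 bytes")
    stripped = data.rstrip(b"\x00")
    if len(stripped) < len(data):
        findings.append("%d trailing null bytes" % (len(data) - len(stripped)))
    if data.endswith(b"\x1a"):
        findings.append("Ctrl-Z (1A) at end of file")
    return findings
```

Byte swapping is the one recoverable case: if diagnose reports it, writing swap_pairs(data) back out in binary mode gives a file worth trying again.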
If the source is an IBM mainframe there are still more subtle error possibilities. According to the documentation, unless files are written with a record format of FB, a record length of 80 and a blocksize of 8000, they will not be readable. Less plausibly, the documentation asserts that XPORT datasets must reside on unlabeled tapes (how would SAS know?) and that the Unix dd command can be used to reblock disk datasets to conform to this requirement. (Disk files under Unix don't have block sizes, and most tape formats under Unix have fixed 512 byte blocks). Nevertheless, it seems clear that SAS depends on properties of the dataset beyond the actual string of bytes, although how or why remains unclear.
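Whatever SAS actually depends on, the figures quoted from the documentation at least allow a simple arithmetic check on a file that has reached disk. The Python sketch below uses only the record length of 80 and block size of 8000 mentioned above. The function name record_layout is made up for this illustration, and a clean result says nothing about whether the mainframe wrote the file correctly in the first place.

```python
RECORD_LENGTH = 80   # record length quoted from the documentation
BLOCK_SIZE = 8000    # block size quoted from the documentation


def record_layout(size_in_bytes: int) -> dict:
    """Express a file size in 80-byte records and 8000-byte blocks."""
    records, leftover = divmod(size_in_bytes, RECORD_LENGTH)
    blocks = -(-size_in_bytes // BLOCK_SIZE)  # ceiling division
    return {
        "records": records,
        "leftover_bytes": leftover,  # nonzero means the last record is incomplete
        "blocks": blocks,
        "records_per_block": BLOCK_SIZE // RECORD_LENGTH,
    }
```

For a file on disk, pass os.path.getsize('stuff.portable') as the argument. A nonzero leftover_bytes suggests that bytes were added or lost somewhere along the way.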
SAS technical support has prepared a 9 page article with the title "An Overview of Transporting SAS Files between Hosts" by David Shinn and David Driggs. This is a bit more forthcoming than the 167 pages of SAS Technical Report P-195, "Transporting SAS Files between Host Systems", which is offered as the official guide, but Shinn and Driggs really only cover the old mainframe versions (MVS, CMS, VMS). Their discussion of PC and Unix SAS is the source of the most dubious advice in the preceding paragraph.
In some environments it turns out to be hard to keep binary datasets from being mangled. The SAS listserv has many postings from MVS users about an MVS ftp program that unblocks data, even if binary transfer mode is used. The same listserv has postings from VMS users who were unable to export valid transport datasets, because VMS was adding carriage control as it copied the file. Starting with SAS Release 6.08, there is a SAS option CC=NONE that can be specified on the libname statement when the transport file is created. This will indicate to VMS that subsequent copies do not need carriage control added.
Doing without SAS
There are three programs I know of for converting statistical and database files among formats: DBMS/COPY, STAT/Transfer and Data Junction.
STAT/Transfer version 2.0 is straightforward enough: it reads and writes XPORT format only (although STAT/Transfer calls it "SAS Transport format"). In brief testing, it could read my XPORT files, and SAS could read the files STAT/Transfer wrote. This is a very easy-to-use package, both from the command line and through the menus, but given a CPORT file, it attempted to read the whole file before croaking. A nice feature of the program is variable selection (which is faster in the most recent version of the program).
DBMS/COPY version 3 offers four possible SAS portable formats. Two of these, called "SAS Transport V6" and "SAS Transport V6 Compressed", are in fact CPORT format, although any CPORT files I wrote with SAS version 6.04 went unrecognized by this package. This is presumably a backwards compatibility problem. The remaining formats, called "SAS Xport Transport Engine" and "SAS Transport V5", are actual XPORT formats, and the package did read the XPORT files I used for testing. I was able to read all four kinds of file produced by DBMS/COPY with the appropriate language in SAS 6.07.2.
This package covered more formats than STAT/Transfer, and included a separate module (dbmsplus) that did variable and record selection, as well as new variable generation. Unfortunately DBMS/COPY suffered from numerous user interface defects. In particular, the menus never ask for a name for the output dataset, but the transformation process doesn't start until the F2 key is pressed and a name is supplied. Also, I find it incredibly annoying that the package does not look for files in the current directory. I did not test the Unix version.
Data Junction version 4.2 runs on PCs and various flavors of Unix, and claims to be able to convert into and out of XPORT format. I tested only the Unix version, but the user interface was so clumsy I was unable to conduct a test, or even exit from the program. This prevented the next person from using it, as the license was not returned when I killed it.
Out of this bunch, STAT/Transfer is the program you want to buy.
Vendors Mentioned:
Circle Systems (Stat/Transfer $145/$95 academic)
1001 Fourth Ave Suite 3200
Seattle WA 98154
Conceptual Software (DBMS/Copy)
Tools and Techniques (Data Junction)