Delwiche, Lora D., and Susan J. Slaughter. The Little SAS Book: A Primer, 5th Edition [Chapter 2.5-2.7]
Reading Raw Data Arranged in Columns
Use column input if each variable's values are found in the same place in every data line, and all the values are character or standard numeric. Standard numeric data contain only numerals, decimal points, plus and minus signs, and E for scientific notation. Anytime you have non-standard data, informats become handy. A good example of column input is survey data, which are often coded into single digits (0-9). The advantages of column input over list input include:
- spaces are not required between values
- missing values can be left blank
- character data can have embedded spaces
- you can skip unwanted variables
How does it work?
After the INPUT keyword, list the first variable's name. If the variable is character, leave a space and then place a $. Then, for any variable (character or numeric), leave a space and list the column or range of columns for that variable. Repeat for each variable you want to read.
Example
----+----1----+----2----+----3----+----4-
INPUT Name $ 1-10 Age 11-13 Height 14-18;
Note: the columns are positions of the characters or numbers in the data line, NOT columns like those you see in a spreadsheet. (I was very confused about this in the beginning, so be aware.)
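To make the column positions concrete, here is a minimal sketch that uses the INPUT statement above with a few made-up data lines. The data set name people and the three records are hypothetical, and DATALINES is used instead of an external file so the step is self-contained. Note that the second record has no space between Age and Height, and the third leaves Age blank.

```sas
* Hypothetical records: Name in columns 1-10, Age in 11-13, Height in 14-18;
DATA people;
   INPUT Name $ 1-10 Age 11-13 Height 14-18;
   DATALINES;
Ann Lee    34 64.5
Bob        2770.25
Cy Young      68.0
;
RUN;

* Print the data set to check how each record was split;
PROC PRINT DATA = people;
RUN;
```

In the printed output, the blank Age in the third record should show up as a missing value (a period), and Name keeps its embedded space.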
Reading Raw Data Not in Standard Format
Again, standard numeric data contains only numerals, decimal points, plus and minus signs, and E for scientific notation, so anytime you have non-standard data, informats become handy.
Dates are the most common non-standard data. Using date informats, SAS will convert conventional forms of dates into a number: the number of days since January 1, 1960. Don't ask me why January 1, 1960, please. The authors of the book didn't know either.
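A small sketch of that conversion, assuming dates typed as MM-DD-YYYY. The data set name dates and the variable names Entered and Readable are made up for illustration. Entered is left unformatted so you can see the raw day count, while Readable holds the same value displayed with the DATE9. format.

```sas
* Read three dates and keep both the raw number and a readable copy;
DATA dates;
   INPUT Entered MMDDYY10.;
   Readable = Entered;          * same value, displayed differently;
   FORMAT Readable DATE9.;
   DATALINES;
01-01-1960
01-02-1960
10-28-2012
;
RUN;

PROC PRINT DATA = dates;
RUN;
```

The first two rows should print Entered as 0 and 1, since January 1, 1960 is day zero.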
There are 3 general types of informats: character, numeric, and date.
Character: $informatw.
Numeric: informatw.d
Date: informatw.
$ indicates character informats, informat is the name of the informat, w is the total width, and d is the number of decimal places. The period is very important as well. Without a period, SAS may try to interpret the informat as a variable name, which, by default, cannot contain any special characters except the underscore.
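Here is a minimal sketch of how w and d behave, using one hypothetical record. The variable names Code, Implied, and Explicit are my own. The point it illustrates is that d only matters when the raw value has no decimal point: SAS then places one d digits from the right, while a decimal point already present in the data is used as is.

```sas
* Columns 1-3: character, 5-8: digits with no decimal point, 10-13: digits with one;
DATA widths;
   INPUT Code $3. +1 Implied 4.1 +1 Explicit 4.1;
   DATALINES;
abc 1234 12.3
;
RUN;

* Implied is read as 123.4 and Explicit as 12.3;
PROC PRINT DATA = widths;
RUN;
```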
Example: Pumpkin-Carving Contest
Each line contains name, age, type (carved or decorated), date entered, and scores. The spacing matters, because formatted input reads by position:

Alicia Grossman  13 c 10-28-2012 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2012 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2012 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2012 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2012 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2012 7.8 8.4 8.5 7.9 8.0

* Create a SAS data set named contest;
* Read the file Pumpkin.dat using formatted input;
DATA contest;
   INFILE 'C:\MyRawData\Pumpkin.dat';
   INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
         (Score1 Score2 Score3 Score4 Score5) (4.1);
RUN;
* Print the data set to make sure the file was read correctly;
PROC PRINT DATA = contest;
   TITLE 'Pumpkin Carving Contest';
RUN;
The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide. Variable Age has an informat of 3., so it is numeric, three columns wide, and has no decimal places. Each +1 skips one column. Variable Type has an informat of $1., a single character. Variable Date has an informat of MMDDYY10. and reads dates in the form MM-DD-YYYY or MM/DD/YYYY, each 10 columns wide. The remaining variables, Score1 through Score5, all require the same informat, 4.1. By putting the variables and the informat in separate sets of parentheses, you only have to list the informat once.
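If you don't have Pumpkin.dat on disk, the following sketch is one way to try the same INPUT statement. It is my own variation, not the book's program: it reads two of the records in-stream with DATALINES, abbreviates the score list as Score1-Score5, and adds a FORMAT statement so Date prints as a date rather than a day count.

```sas
* Variation on the contest program: in-stream data and a numbered range list;
DATA contest;
   INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
         (Score1-Score5) (4.1);
   FORMAT Date MMDDYY10.;       * display the stored number as a date;
   DATALINES;
Alicia Grossman  13 c 10-28-2012 7.8 6.5 7.2 8.0 7.9
Jose Martinez     7 d 10-31-2012 8.9 9.510.0 9.7 9.0
;
RUN;

PROC PRINT DATA = contest;
   TITLE 'Pumpkin Carving Contest';
RUN;
```

Jose's second and third scores run together as 9.510.0 in the raw data, but because each score is read from a fixed four-column slot, they should still come out as 9.5 and 10.0.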
Want to see the results of the PRINT procedure? Why don’t you try it yourself. =]
Honestly, if you understand 2.7 Reading Raw Data Not in Standard Format, 2.5 and 2.6 are very easy to comprehend, because reading data not in standard format is built upon reading data separated by spaces and arranged in columns. I will move on with my study. However, if you are curious to know what is covered in 2.5, but are too lazy to read the book, please comment below so we can have a thorough discussion on how to read raw data in SAS.