read.spss {foreign} | R Documentation |
read.spss reads a file stored by the SPSS save or export commands.
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = TRUE, reencode = NA)
file |
Character string: the name of the file to read. |
use.value.labels |
Convert variables with value labels into R factors with those levels? |
to.data.frame |
Logical: return a data frame? |
max.value.labels |
Only variables with value labels and at most this many unique values will be converted to factors if use.value.labels = TRUE. |
trim.factor.names |
Logical: trim trailing spaces from factor levels? |
trim_values |
Logical: should trailing spaces in values and value labels be ignored when matching for use.value.labels = TRUE? |
reencode |
Logical: should character strings be re-encoded to the current locale? The default, NA, means to do so only in a UTF-8 locale. Alternatively, a character string specifying an encoding to assume. |
This uses modified code from the PSPP project (http://www.gnu.org/software/pspp/) for reading the SPSS formats.
Occasionally in SPSS, value labels will be added to some values of a
continuous variable (e.g. to distinguish different types of missing
data), and you will not want these variables converted to factors. By
setting max.value.labels you can specify that variables with a
large number of distinct values are not converted to factors even if
they have value labels. In addition, variables will not be converted
to factors if there are non-missing values that have no value label.
The value labels are then returned in the "value.labels"
attribute of the variable.
If SPSS variable labels are present, they are returned as the
"variable.labels" attribute of the answer.
Fixed-length strings (including value labels) are padded on the right
with spaces by SPSS, and so are read that way by R. The default
argument trim_values = TRUE causes trailing spaces to be ignored
when matching to value labels, as examples have been seen where the
strings and the value labels had different amounts of padding. See
the examples for sub for ways to remove trailing spaces
in character data.
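As a minimal sketch of the trailing-space removal mentioned above, the following applies sub to a character vector. The vector padded is made-up illustration data standing in for a fixed-length string variable as SPSS would store it; it is not actual read.spss output.

```r
# Hypothetical data: fixed-width strings padded on the right with spaces
padded <- c("male    ", "female  ", "n/a     ")

# Remove one or more trailing spaces from each element;
# elements without trailing spaces would be returned unchanged
trimmed <- sub(" +$", "", padded)

print(trimmed)
```

The pattern " +$" matches only literal spaces at the end of a string, so interior spaces are left alone.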
A list (or data frame) with one component for each variable in
the saved data set.
If what looks like a Windows codepage was recorded in the SPSS file,
it is attached (as a number) as attribute "codepage" to the result.
There may be attributes "label.table" and "variable.labels".
Attribute "label.table" is a named list of value labels with one
element per variable, either NULL or a named character vector.
Attribute "variable.labels" is a named character vector whose names
are the short variable names and whose elements are the long names.
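One way to inspect the "variable.labels" attribute is to tabulate short names against long names. The sketch below assumes a hypothetical helper, variable_label_table, and uses a hand-built mock object in place of a real read.spss result, so the data shown are illustrative only.

```r
# Hypothetical helper: tabulate short variable names against their long labels
variable_label_table <- function(x) {
  labels <- attr(x, "variable.labels", exact = TRUE)
  if (is.null(labels)) {
    # No variable labels were recorded; return an empty table
    return(data.frame(name = character(0), label = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(name = names(labels), label = unname(labels),
             stringsAsFactors = FALSE, row.names = NULL)
}

# Mock object standing in for a read.spss() result (an assumption, not real output)
mock <- structure(
  list(AGE = c(34, 51), SEX = c(1, 2)),
  variable.labels = c(AGE = "Age in years", SEX = "Sex of respondent")
)

print(variable_label_table(mock))
```

With a real file, the same helper could be applied to the value returned by read.spss, since the attribute is described above as a named character vector.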
If SPSS value labels are converted to factors the underlying numerical codes will not in general be the same as the SPSS numerical values, since the numerical codes in R are always 1,2,3,...
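The difference between SPSS values and R factor codes can be illustrated with base R alone. This sketch builds a factor by hand from made-up codes and labels (spss_values and value_labels are hypothetical); it mimics the effect described above and does not reproduce the internals of read.spss.

```r
# Hypothetical SPSS-style numeric values and their value labels
spss_values <- c(0, 5, 9, 5, 0)
value_labels <- c(No = 0, Yes = 5, Refused = 9)

# Build a factor whose levels are the value labels
f <- factor(spss_values, levels = value_labels, labels = names(value_labels))

# The underlying integer codes number the levels consecutively,
# regardless of the original SPSS values
codes <- as.integer(f)
print(codes)

# The original values can be recovered by looking the labels up again
recovered <- unname(value_labels[as.character(f)])
print(recovered)
```

If the original numeric values matter for later analysis, reading with use.value.labels = FALSE keeps them as they are.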
You may see warnings about the file encoding for SPSS save
files: it is possible such files contain non-ASCII character data
which need re-encoding. The most common occurrence is Windows codepage
1252, a superset of Latin-1. The encoding is recorded (as an integer)
in attribute "codepage" of the result if it looks like a
Windows codepage. Automatic re-encoding is done only in UTF-8
locales: see the reencode argument.
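Where automatic re-encoding has not happened, character data can be converted by hand with iconv. The sketch below is an illustration under assumptions: the bytes are made-up sample data in codepage 1252, and the encoding name "CP1252" is assumed to be recognised by the platform's iconv, which can vary between systems.

```r
# Illustration data: the bytes of "cafe" with an e-acute (0xE9) in codepage 1252
raw_cp1252 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Re-encode from CP1252 to UTF-8 (the name "CP1252" is platform-dependent)
utf8 <- iconv(raw_cp1252, from = "CP1252", to = "UTF-8")

# Inspect the declared encoding and the number of characters
print(Encoding(utf8))
print(nchar(utf8, type = "chars"))
```

The same call can be applied to a whole character column of the result once the source encoding is known, for example from the "codepage" attribute.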
Saikat DebRoy and the R Core team
## Not run:
read.spss("datafile")
## don't convert value labels to factor levels
read.spss("datafile", use.value.labels = FALSE)
## convert value labels to factors for variables with at most
## ten distinct values.
read.spss("datafile", max.value.labels = 10)
## End(Not run)