Convert input to a factor.

The base function base::as.factor() is not a generic, but to_factor() is. By default, to_factor() is a wrapper for base::as.factor(). Note that to_factor() differs slightly from the haven::as_factor() method provided by the haven package.
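Being a generic means other classes can supply their own conversion method while everything else falls back to base::as.factor(). The following base R sketch illustrates that S3 pattern only. my_to_factor() and the "coded" class are hypothetical and this is not labelled's source code. The default method delegates to base::as.factor(), and the class-specific method builds levels from metadata stored on the object, in the same spirit as the haven_labelled method listed under Usage.

```r
# Hypothetical generic illustrating S3 dispatch (not labelled's implementation)
my_to_factor <- function(x, ...) UseMethod("my_to_factor")

# Default method: simply wrap base::as.factor()
my_to_factor.default <- function(x, ...) base::as.factor(x)

# Method for a hypothetical "coded" class: the values are codes and the
# "lookup" attribute is a named vector (names = labels, values = codes)
my_to_factor.coded <- function(x, ...) {
  lookup <- attr(x, "lookup")
  lab <- names(lookup)[match(unclass(x), lookup)]
  factor(lab, levels = names(lookup))
}

coded <- structure(c(1, 2, 1), lookup = c(yes = 1, no = 2), class = "coded")

my_to_factor(c("b", "a", "b"))  # dispatches to the default method
my_to_factor(coded)             # dispatches to the "coded" method
```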

unlabelled(x) is a shortcut for to_factor(x, strict = TRUE, unclass = TRUE, labelled_only = TRUE).

Usage

to_factor(x, ...)

# S3 method for class 'haven_labelled'
to_factor(
  x,
  levels = c("labels", "values", "prefixed"),
  ordered = FALSE,
  nolabel_to_na = FALSE,
  sort_levels = c("auto", "none", "labels", "values"),
  decreasing = FALSE,
  drop_unused_labels = FALSE,
  user_na_to_na = FALSE,
  strict = FALSE,
  unclass = FALSE,
  explicit_tagged_na = FALSE,
  ...
)

# S3 method for class 'data.frame'
to_factor(
  x,
  levels = c("labels", "values", "prefixed"),
  ordered = FALSE,
  nolabel_to_na = FALSE,
  sort_levels = c("auto", "none", "labels", "values"),
  decreasing = FALSE,
  labelled_only = TRUE,
  drop_unused_labels = FALSE,
  strict = FALSE,
  unclass = FALSE,
  explicit_tagged_na = FALSE,
  ...
)

unlabelled(x, ...)

Arguments

x: Object to coerce to a factor.
...: Other arguments passed down to methods.
levels: What should be used for the factor levels: the labels, the values or labels prefixed with values?
ordered: TRUE for ordinal factors, FALSE (default) for nominal factors.
nolabel_to_na: Should values with no label be converted to NA?
sort_levels: How should the factor levels be sorted? (see Details)
decreasing: Should levels be sorted in decreasing order?
drop_unused_labels: Should unused value labels be dropped? (applied only if strict = FALSE)
user_na_to_na: Convert user-defined missing values into NA?
strict: Convert to factor only if all values have a defined label?
unclass: If not converted to a factor (when strict = TRUE), convert to a character or numeric vector by applying base::unclass()?
explicit_tagged_na: Should tagged NA (cf. haven::tagged_na()) be kept as explicit factor levels?
labelled_only: For a data.frame, convert only labelled variables to factors?
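drop_unused_labels and user_na_to_na are not exercised in the Examples section. The sketch below assumes the labelled package (and its dependency haven) is installed. It shows a label that is defined but never observed, and an SPSS-style vector created with haven::labelled_spss() in which 9 is declared as a user-defined missing value.

```r
library(labelled)

# Value 3 ("maybe") has a label but is never observed
v <- labelled(c(1, 1, 2), c(yes = 1, no = 2, maybe = 3))
levels(to_factor(v))                             # "maybe" is kept as a level
levels(to_factor(v, drop_unused_labels = TRUE))  # the unused "maybe" level is dropped

# SPSS-style vector where 9 is declared as a user-defined missing value
s <- haven::labelled_spss(
  c(1, 2, 9),
  labels = c(yes = 1, no = 2, "don't know" = 9),
  na_values = 9
)
to_factor(s, user_na_to_na = TRUE)  # the third element becomes NA
```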

Details

If some values don't have a label, automatic labels are created using the value itself (for example, level "2"), unless nolabel_to_na is TRUE, in which case those values are converted to NA.

If sort_levels == 'values', the levels are sorted according to the values of x. If sort_levels == 'labels', the levels are sorted by their labels. If sort_levels == 'none', the levels follow the order in which the value labels are defined in x, and any automatically created labels are added at the end. If sort_levels == 'auto' (the default), 'none' is used unless some values don't have a defined label, in which case 'values' is applied.
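The ordering rules above can be sketched in base R. sketch_levels() is a hypothetical helper, not part of labelled, and is only meant to make the rules concrete. It assumes levels = "labels", nolabel_to_na = FALSE and decreasing = FALSE, and uses the value itself as the automatic label. With the vector v from the Examples section, it reproduces the level orders shown there.

```r
# Hypothetical helper, not labelled's implementation. Assumes
# levels = "labels", nolabel_to_na = FALSE and decreasing = FALSE.
sketch_levels <- function(x, value_labels,
                          sort_levels = c("auto", "none", "labels", "values")) {
  sort_levels <- match.arg(sort_levels)
  observed <- unique(x[!is.na(x)])
  # Observed values without a label get an automatic label: the value itself
  no_label <- sort(setdiff(observed, value_labels))
  all_labels <- c(value_labels, setNames(no_label, as.character(no_label)))
  # "auto" switches to "values" only when automatic labels were needed
  if (sort_levels == "auto") {
    sort_levels <- if (length(no_label) > 0) "values" else "none"
  }
  ord <- switch(sort_levels,
    none   = seq_along(all_labels),    # definition order, automatic labels last
    values = order(all_labels),        # by underlying value
    labels = order(names(all_labels))  # by label text
  )
  names(all_labels)[ord]
}

v <- c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA)
v_labels <- c(yes = 1, no = 3, "don't know" = 9)
sketch_levels(v, v_labels)            # "auto" resolves to "values"
sketch_levels(v, v_labels, "none")
sketch_levels(v, v_labels, "labels")
```

Because order() on character vectors follows the collation of the current locale, the 'labels' order of this sketch may vary between locales for labels with non-ASCII characters.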

When applied to a data.frame, only labelled vectors are converted to factors by default. Use labelled_only = FALSE to convert all variables to factors.
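A short sketch of the effect of labelled_only, assuming the labelled package is installed. The column names are arbitrary.

```r
library(labelled)

df <- data.frame(
  a = labelled(c(1, 2, 1), labels = c(No = 1, Yes = 2)),
  d = c(10, 20, 30)
)
str(to_factor(df))                         # `a` becomes a factor, `d` stays numeric
str(to_factor(df, labelled_only = FALSE))  # `d` is converted as well
```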

unlabelled() is a shortcut for quickly removing value labels from a vector or a data.frame. If all observed values have a value label, the vector is converted into a factor. Otherwise, the vector is unclassed. To remove value labels in all cases, use remove_val_labels().
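The difference between unlabelled() and remove_val_labels() can be checked with two small vectors, assuming the labelled package is installed. As in the strict conversion example, the unclassed result keeps its "labels" attribute.

```r
library(labelled)

full <- labelled(c(1, 2, 1), labels = c(No = 1, Yes = 2))     # every value has a label
partial <- labelled(c(1, 2, 3), labels = c(No = 1, Yes = 2))  # 3 has no label

unlabelled(full)            # converted to a factor
unlabelled(partial)         # unclassed to a numeric vector, not a factor
remove_val_labels(partial)  # value labels removed regardless of coverage
```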

Examples

v <- labelled(
  c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA),
  c(yes = 1, no = 3, "don't know" = 9)
)
to_factor(v)
#>  [1] yes        2          2          2          no         don't know
#>  [7] yes        no         2          <NA>      
#> Levels: yes 2 no don't know
to_factor(v, nolabel_to_na = TRUE)
#>  [1] yes        <NA>       <NA>       <NA>       no         don't know
#>  [7] yes        no         <NA>       <NA>      
#> Levels: yes no don't know
to_factor(v, "p")
#>  [1] [1] yes        [2] 2          [2] 2          [2] 2          [3] no        
#>  [6] [9] don't know [1] yes        [3] no         [2] 2          <NA>          
#> Levels: [1] yes [2] 2 [3] no [9] don't know
to_factor(v, sort_levels = "v")
#>  [1] yes        2          2          2          no         don't know
#>  [7] yes        no         2          <NA>      
#> Levels: yes 2 no don't know
to_factor(v, sort_levels = "n")
#>  [1] yes        2          2          2          no         don't know
#>  [7] yes        no         2          <NA>      
#> Levels: yes no don't know 2
to_factor(v, sort_levels = "l")
#>  [1] yes        2          2          2          no         don't know
#>  [7] yes        no         2          <NA>      
#> Levels: 2 don't know no yes

x <- labelled(c("H", "M", "H", "L"), c(low = "L", medium = "M", high = "H"))
to_factor(x, ordered = TRUE)
#> [1] high   medium high   low   
#> Levels: low < medium < high

# Strict conversion
v <- labelled(c(1, 1, 2, 3), labels = c(No = 1, Yes = 2))
to_factor(v)
#> [1] No  No  Yes 3  
#> Levels: No Yes 3
to_factor(v, strict = TRUE) # Not converted because 3 does not have a label
#> <labelled<double>[4]>
#> [1] 1 1 2 3
#> 
#> Labels:
#>  value label
#>      1    No
#>      2   Yes
to_factor(v, strict = TRUE, unclass = TRUE)
#> [1] 1 1 2 3
#> attr(,"labels")
#>  No Yes 
#>   1   2 

df <- data.frame(
  a = labelled(c(1, 1, 2, 3), labels = c(No = 1, Yes = 2)),
  b = labelled(c(1, 1, 2, 3), labels = c(No = 1, Yes = 2, DK = 3)),
  c = labelled(
    c("a", "a", "b", "c"),
    labels = c(No = "a", Maybe = "b", Yes = "c")
  ),
  d = 1:4,
  e = factor(c("item1", "item2", "item1", "item2")),
  f = c("itemA", "itemA", "itemB", "itemB"),
  stringsAsFactors = FALSE
)
if (require(dplyr)) {
  glimpse(df)
  glimpse(unlabelled(df))
}
#> Rows: 4
#> Columns: 6
#> $ a <dbl+lbl> 1, 1, 2, 3
#> $ b <dbl+lbl> 1, 1, 2, 3
#> $ c <chr+lbl> "a", "a", "b", "c"
#> $ d <int> 1, 2, 3, 4
#> $ e <fct> item1, item2, item1, item2
#> $ f <chr> "itemA", "itemA", "itemB", "itemB"
#> Rows: 4
#> Columns: 6
#> $ a <dbl> 1, 1, 2, 3
#> $ b <fct> No, No, Yes, DK
#> $ c <fct> No, No, Maybe, Yes
#> $ d <int> 1, 2, 3, 4
#> $ e <fct> item1, item2, item1, item2
#> $ f <chr> "itemA", "itemA", "itemB", "itemB"