R/lookfor.R
look_for.Rd
look_for
emulates the lookfor
Stata command in R. It searches
the variable names of regular R data frames as well as
variable labels (descriptions), factor levels and value labels.
The command is meant to help users find variables in large datasets.
look_for(
data,
...,
labels = TRUE,
values = TRUE,
ignore.case = TRUE,
details = c("basic", "none", "full")
)
lookfor(
data,
...,
labels = TRUE,
values = TRUE,
ignore.case = TRUE,
details = c("basic", "none", "full")
)
generate_dictionary(
data,
...,
labels = TRUE,
values = TRUE,
ignore.case = TRUE,
details = c("basic", "none", "full")
)
# S3 method for look_for
print(x, ...)
look_for_and_select(
data,
...,
labels = TRUE,
values = TRUE,
ignore.case = TRUE
)
convert_list_columns_to_character(x)
lookfor_to_long_format(x)
Based on the behavior of the lookfor
command in Stata.
a data frame or a survey object
optional list of keywords, a character string (or several character strings), which can be
formatted as a regular expression suitable for a base::grep()
pattern, or a vector of keywords;
displays all variables if not specified
whether or not to search variable labels (descriptions); TRUE
by default
whether or not to search within values (factor levels or value labels); TRUE
by default
whether or not to ignore case when matching the keywords;
TRUE
by default (matching is case-insensitive)
level of detail to add about each variable: "basic", "none" or "full" (full details can be time-consuming for big data frames; FALSE
is equivalent to "none"
and TRUE
to "full")
a tibble returned by look_for()
a tibble data frame featuring the variable position, name and description (if it exists) in the original data frame
When no keyword is provided, it will produce a data dictionary of the overall data frame.
The function looks into the variable names for matches to the keywords. If available,
variable labels are included in the search scope.
Variable labels of data frames imported with the foreign or
memisc packages are also taken into account (see to_labelled()).
If no keyword is
provided, it will return all variables of data.
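The search scope described above can be approximated in base R. This is only a sketch under stated assumptions, not the package's implementation: sketch_look_for() is a hypothetical helper, variable labels are assumed to live in the "label" attribute of each column, and several keywords are assumed to behave like alternatives in a single regular expression.

```r
# Hypothetical helper (not part of the package): a base-R approximation of
# keyword matching over variable names, variable labels and factor levels.
sketch_look_for <- function(data, ..., ignore.case = TRUE) {
  keywords <- c(...)
  # No keyword: every variable is returned
  if (length(keywords) == 0) {
    return(names(data))
  }
  # Assumption: several keywords are treated as alternatives
  pattern <- paste(keywords, collapse = "|")
  hit <- function(x) any(grepl(pattern, x, ignore.case = ignore.case))
  keep <- vapply(names(data), function(v) {
    label <- attr(data[[v]], "label", exact = TRUE) # variable label, if any
    hit(v) || hit(label) || hit(levels(data[[v]]))
  }, logical(1))
  names(data)[keep]
}

# Match on variable names
sketch_look_for(iris, "pet", "sp")

# Match on a factor level
sketch_look_for(iris, "vers")

# Match on a variable label stored in the "label" attribute
d <- iris
attr(d$Sepal.Length, "label") <- "Sepal length in centimetres"
sketch_look_for(d, "centim")
```

The real function returns a tibble with positions, labels and further details rather than a plain character vector; the sketch only illustrates which parts of a data frame are searched.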
look_for(), lookfor()
and generate_dictionary()
are equivalent.
By default, results will be summarized when printing. To deactivate default printing,
use dplyr::as_tibble().
lookfor_to_long_format()
can be used to reshape the results into a long format, with one row per factor level
and per value label.
Use convert_list_columns_to_character()
to convert named list columns into character vectors
(see examples).
look_for_and_select()
is a shortcut that searches for variables and
applies dplyr::select()
to return a data frame containing only the matching
variables.
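As a rough illustration of that shortcut, the same selection can be done in two explicit steps. This sketch assumes the labelled package is installed and that the result of look_for() exposes a variable column holding the matched names (as in the printed output); base-R subsetting stands in for dplyr::select().

```r
library(labelled)

# Step 1: names of the variables matched by the keyword "s"
vars <- look_for(iris, "s")$variable

# Step 2: keep only those columns (base-R stand-in for dplyr::select())
selected <- iris[, vars, drop = FALSE]
head(selected)
```

Doing it in two steps is handy when the list of matched names needs to be inspected or filtered before subsetting.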
look_for(iris)
#> pos variable label col_type values
#> 1 Sepal.Length — dbl
#> 2 Sepal.Width — dbl
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
#> 5 Species — fct setosa
#> versicolor
#> virginica
# Look for a single keyword.
look_for(iris, "petal")
#> pos variable label col_type values
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
look_for(iris, "s")
#> pos variable label col_type values
#> 1 Sepal.Length — dbl
#> 2 Sepal.Width — dbl
#> 5 Species — fct setosa
#> versicolor
#> virginica
iris %>% look_for_and_select("s") %>% head()
#> Sepal.Length Sepal.Width Species
#> 1 5.1 3.5 setosa
#> 2 4.9 3.0 setosa
#> 3 4.7 3.2 setosa
#> 4 4.6 3.1 setosa
#> 5 5.0 3.6 setosa
#> 6 5.4 3.9 setosa
# Search with a regular expression
look_for(iris, "petal|species")
#> pos variable label col_type values
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
#> 5 Species — fct setosa
#> versicolor
#> virginica
look_for(iris, "s$")
#> pos variable label col_type values
#> 5 Species — fct setosa
#> versicolor
#> virginica
# Search with several keywords
look_for(iris, "pet", "sp")
#> pos variable label col_type values
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
#> 5 Species — fct setosa
#> versicolor
#> virginica
look_for(iris, "pet", "sp", "width")
#> pos variable label col_type values
#> 2 Sepal.Width — dbl
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
#> 5 Species — fct setosa
#> versicolor
#> virginica
look_for(iris, "Pet", "sp", "width", ignore.case = FALSE)
#> pos variable label col_type values
#> 3 Petal.Length — dbl
#> 4 Petal.Width — dbl
# look_for() can also search within factor levels or value labels
look_for(iris, "vers")
#> pos variable label col_type values
#> 5 Species — fct setosa
#> versicolor
#> virginica
# Quicker search without variable details
look_for(iris, details = "none")
#> pos variable label
#> 1 Sepal.Length —
#> 2 Sepal.Width —
#> 3 Petal.Length —
#> 4 Petal.Width —
#> 5 Species —
# To obtain more details about each variable
look_for(iris, details = "full")
#> pos variable label col_type values
#> 1 Sepal.Length — dbl range: 4.3 - 7.9
#> 2 Sepal.Width — dbl range: 2 - 4.4
#> 3 Petal.Length — dbl range: 1 - 6.9
#> 4 Petal.Width — dbl range: 0.1 - 2.5
#> 5 Species — fct setosa
#> versicolor
#> virginica
# To deactivate default printing, convert to tibble
look_for(iris, details = "full") %>%
dplyr::as_tibble()
#> # A tibble: 5 × 13
#> pos varia…¹ label col_t…² levels value…³ class type na_va…⁴ na_ra…⁵ uniqu…⁶
#> <int> <chr> <chr> <chr> <name> <named> <nam> <chr> <named> <named> <int>
#> 1 1 Sepal.… NA dbl <NULL> <NULL> <chr> doub… <NULL> <NULL> 35
#> 2 2 Sepal.… NA dbl <NULL> <NULL> <chr> doub… <NULL> <NULL> 23
#> 3 3 Petal.… NA dbl <NULL> <NULL> <chr> doub… <NULL> <NULL> 43
#> 4 4 Petal.… NA dbl <NULL> <NULL> <chr> doub… <NULL> <NULL> 22
#> 5 5 Species NA fct <chr> <NULL> <chr> inte… <NULL> <NULL> 3
#> # … with 2 more variables: n_na <int>, range <named list>, and abbreviated
#> # variable names ¹variable, ²col_type, ³value_labels, ⁴na_values, ⁵na_range,
#> # ⁶unique_values
# To convert named lists into character vectors
look_for(iris) %>% convert_list_columns_to_character()
#> # A tibble: 5 × 6
#> pos variable label col_type levels value_labels
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Sepal.Length NA dbl "" ""
#> 2 2 Sepal.Width NA dbl "" ""
#> 3 3 Petal.Length NA dbl "" ""
#> 4 4 Petal.Width NA dbl "" ""
#> 5 5 Species NA fct "setosa; versicolor; virginica" ""
# Long format with one row per factor level and per value label
look_for(iris) %>% lookfor_to_long_format()
#> # A tibble: 7 × 6
#> pos variable label col_type levels value_labels
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Sepal.Length NA dbl NA NA
#> 2 2 Sepal.Width NA dbl NA NA
#> 3 3 Petal.Length NA dbl NA NA
#> 4 4 Petal.Width NA dbl NA NA
#> 5 5 Species NA fct setosa NA
#> 6 5 Species NA fct versicolor NA
#> 7 5 Species NA fct virginica NA
# Both functions can be combined
look_for(iris) %>%
lookfor_to_long_format() %>%
convert_list_columns_to_character()
#> # A tibble: 7 × 6
#> pos variable label col_type levels value_labels
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Sepal.Length NA dbl NA NA
#> 2 2 Sepal.Width NA dbl NA NA
#> 3 3 Petal.Length NA dbl NA NA
#> 4 4 Petal.Width NA dbl NA NA
#> 5 5 Species NA fct setosa NA
#> 6 5 Species NA fct versicolor NA
#> 7 5 Species NA fct virginica NA
# Labelled data
if (FALSE) {
data(fertility, package = "questionr")
look_for(children)
look_for(children, "id")
children %>% look_for_and_select("id")
look_for(children) %>%
lookfor_to_long_format() %>%
convert_list_columns_to_character()
}