When your data contains missing values, the affected observations are removed when the model is fitted. However, if you later apply a backward stepwise approach to reduce the model by minimizing AIC, you may encounter an error because the number of rows in use has changed.
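The row-count problem can be reproduced with base R alone. This sketch uses the airquality dataset shipped with R rather than the titanic data from the examples. Removing a predictor that contains missing values brings back rows that were previously excluded. stats::step() compares the number of rows before and after each refit and stops with "number of rows in use has changed: remove missing values?" when they differ.

```r
# Base R only; airquality ships with the datasets package.
# Ozone has 37 missing values and Solar.R has 7.
full <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# The same model without Solar.R, as a backward step would fit it
reduced <- lm(Ozone ~ Wind + Temp, data = airquality)

# Dropping Solar.R restores the rows where only Solar.R was missing
n_full <- nobs(full)       # 111 rows used
n_reduced <- nobs(reduced) # 116 rows used
```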
Usage
step_with_na(model, ...)
# Default S3 method
step_with_na(model, ..., full_data = eval(model$call$data))
# S3 method for class 'svyglm'
step_with_na(model, ..., design)
Arguments
- model
A model object.
- ...
Additional parameters passed to stats::step().
- full_data
Full data frame used for the model, including missing data.
- design
Survey design previously passed to survey::svyglm().
Details
step_with_na() applies the following strategy:
- recomputes the model using only complete cases;
- applies stats::step();
- recomputes the reduced model using the full original dataset.
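The three steps above can be sketched in base R for models fitted with stats::lm() or stats::glm(). This is a simplified illustration, not the package's implementation. The helper step_complete_cases() is hypothetical. It assumes that the model call has a data argument and that every variable in the formula is a column of full_data. Unlike step_with_na(), it has no S3 methods and does not handle survey designs or nnet::multinom().

```r
# Hypothetical helper illustrating the complete-cases strategy.
# Assumes an lm()/glm() model whose formula variables are all columns of full_data.
step_complete_cases <- function(model, full_data, ...) {
  # 1. Keep only rows with no missing value in any variable of the formula
  vars <- all.vars(stats::formula(model))
  keep <- stats::complete.cases(full_data[, vars, drop = FALSE])
  cc_data <- full_data[keep, , drop = FALSE]
  cc_model <- stats::update(model, data = cc_data)

  # 2. Backward/both-direction AIC selection on a fixed set of rows
  reduced <- stats::step(cc_model, ...)

  # 3. Refit the selected formula on the full data; rows with NA in the
  #    remaining variables are dropped again by the default na.action
  stats::update(reduced, data = full_data)
}

# Usage: add a hypothetical predictor with missing values to airquality
set.seed(1)
d <- airquality
d$noise <- sample(c(1, 2, NA), nrow(d), replace = TRUE)
mod <- lm(Ozone ~ Solar.R + Wind + Temp + noise, data = d)
mod_reduced <- step_complete_cases(mod, full_data = d, trace = 0)
```

stats::step() refits candidate models by evaluating their calls in the frame of its caller. For this reason, cc_data and full_data are kept as local variables of the helper, where those calls can find them.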
step_with_na() has been tested with stats::lm(), stats::glm(), nnet::multinom() and survey::svyglm(). It may work with other types of models, but without any guarantee.
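In some cases, it may be necessary to provide the full dataset initially used to estimate the model through full_data.
step_with_na() may not work inside other functions, because the default value of full_data, eval(model$call$data), may then fail to find the original data frame. In that case, try passing full_data explicitly.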
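The following base R sketch illustrates why the default value of full_data can fail. It uses a hypothetical helper, fit_in_function(). A model stores only the name of its data in its call. Evaluating that name from another environment may therefore not find the object. This is the same kind of error shown in the first example below ("object 'd' not found").

```r
# Hypothetical helper: the data frame only exists inside the function
fit_in_function <- function() {
  local_df <- airquality
  lm(Ozone ~ Wind, data = local_df)
}
m <- fit_in_function()

# The call records the symbol local_df, not the data frame itself
data_name <- deparse(m$call$data)

# Evaluating that symbol from the global environment fails, which is
# what a default such as eval(model$call$data) runs into
msg <- tryCatch(
  eval(m$call$data),
  error = function(e) conditionMessage(e)
)
```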
Examples
set.seed(42)
d <- titanic |>
dplyr::mutate(
Group = sample(
c("a", "b", NA),
dplyr::n(),
replace = TRUE
)
)
mod <- glm(as.factor(Survived) ~ ., data = d, family = binomial())
# step(mod) should produce an error, because Group contains missing values
mod2 <- step_with_na(mod, full_data = d)
mod2
# \donttest{
## WITH SURVEY ---------------------------------------
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: ‘survey’
#> The following object is masked from ‘package:graphics’:
#>
#> dotchart
ds <- d |>
dplyr::mutate(Survived = as.factor(Survived)) |>
srvyr::as_survey()
mods <- survey::svyglm(
Survived ~ Class + Group + Sex,
design = ds,
family = quasibinomial()
)
mod2s <- step_with_na(mods, design = ds)
#> Start: AIC=1471.56
#> Survived ~ Class + Group + Sex
#>
#> Df Deviance AIC
#> - Group 1 1462.6 1469.9
#> <none> 1462.2 1471.6
#> - Class 3 1527.9 1530.3
#> - Sex 1 1712.6 1716.5
#>
#> Step: AIC=1469.94
#> Survived ~ Class + Sex
#>
#> Df Deviance AIC
#> <none> 1462.6 1469.9
#> - Class 3 1528.4 1528.8
#> - Sex 1 1713.1 1714.8
mod2s
#> Independent Sampling design (with replacement)
#> Called via srvyr
#> Sampling variables:
#> - ids: `1`
#>
#> Call: svyglm(formula = Survived ~ Class + Sex, design = design, family = quasibinomial())
#>
#> Coefficients:
#> (Intercept) Class2nd Class3rd ClassCrew SexMale
#> 2.0682 -0.9526 -1.6582 -0.8808 -2.4213
#>
#> Degrees of Freedom: 2200 Total (i.e. Null); 2196 Residual
#> Null Deviance: 2769
#> Residual Deviance: 2229 AIC: NA
# }