Add the number of observations in a new column n_obs
, taking into account any
weights if they have been defined.
Usage
tidy_add_n(x, model = tidy_get_model(x))
Details
For continuous variables, it corresponds to all valid observations contributing to the model.
For categorical variables coded with treatment or sum contrasts,
each model term could be associated to only one level of the original
categorical variable. Therefore, n_obs
will correspond to the number of
observations associated with that level. n_obs
will also be computed for
reference rows. For polynomial contrasts (defined with stats::contr.poly()
),
all levels will contribute to the computation of each model term. Therefore,
n_obs
will be equal to the total number of observations. For Helmert and custom
contrasts, only rows contributing positively (i.e. with a positive contrast)
to the computation of a term will be considered for estimating n_obs
. The
result could therefore be difficult to interpret. For a better understanding
of which observations are taken into account to compute n_obs
values, you
could look at model_compute_terms_contributions()
.
For interaction terms, only rows contributing to all the terms of the
interaction will be considered to compute n_obs
.
For binomial logistic models, tidy_add_n()
will also return the
corresponding number of events (n_event
) for each term, taking into account
any defined weights. Observed proportions could be obtained as n_obs / n_event
.
Similarly, a number of events will be computed for multinomial logistic
models (nnet::multinom()
) for each level of the outcome (y.level
),
corresponding to the number of observations equal to that outcome level.
For Poisson models, n_event
will be equal to the number of counts per term.
In addition, a third column exposure
will be computed. If no offset is
defined, exposure is assumed to be equal to 1 (eventually multiplied by
weights) per observation. If an offset is defined, exposure
will be equal
to the (weighted) sum of the exponential of the offset (as a reminder, to
model the effect of x
on the ratio y / z
, a Poisson model will be defined
as glm(y ~ x + offset(log(z)), family = poisson)
). Observed rates could be
obtained with n_event / exposure
.
For Cox models (survival::coxph()
), an individual could be coded
with several observations (several rows). n_obs
will correspond to the
weighted number of observations which could be different from the number of
individuals n_ind
. tidy_add_n()
will also compute a (weighted) number of
events (n_event
) according to the definition of the survival::Surv()
object.
Exposure time is also returned in exposure
column. It is equal to the
(weighted) sum of the time variable if only one variable time is passed to
survival::Surv()
, and to the (weighted) sum of time2 - time
if two time
variables are defined in survival::Surv()
.
For competing risk regression models (tidycmprsk::crr()
), n_event
takes
into account only the event of interest defined by failcode.
The (weighted) total number of observations (N_obs
), of individuals
(N_ind
), of events (N_event
) and of exposure time (Exposure
) are
stored as attributes of the returned tibble.
See also
Other tidy_helpers:
tidy_add_coefficients_type()
,
tidy_add_contrasts()
,
tidy_add_estimate_to_reference_rows()
,
tidy_add_header_rows()
,
tidy_add_pairwise_contrasts()
,
tidy_add_reference_rows()
,
tidy_add_term_labels()
,
tidy_add_variable_labels()
,
tidy_attach_model()
,
tidy_disambiguate_terms()
,
tidy_identify_variables()
,
tidy_plus_plus()
,
tidy_remove_intercept()
,
tidy_select_variables()
Examples
if (FALSE) { # interactive()
lm(Petal.Length ~ ., data = iris) |>
tidy_and_attach() |>
tidy_add_n()
lm(Petal.Length ~ ., data = iris, contrasts = list(Species = contr.sum)) |>
tidy_and_attach() |>
tidy_add_n()
lm(Petal.Length ~ ., data = iris, contrasts = list(Species = contr.poly)) |>
tidy_and_attach() |>
tidy_add_n()
lm(Petal.Length ~ poly(Sepal.Length, 2), data = iris) |>
tidy_and_attach() |>
tidy_add_n()
df <- Titanic |>
dplyr::as_tibble() |>
dplyr::mutate(Survived = factor(Survived, c("No", "Yes")))
glm(
Survived ~ Class + Age + Sex,
data = df, weights = df$n, family = binomial,
contrasts = list(Age = contr.sum, Class = "contr.helmert")
) |>
tidy_and_attach() |>
tidy_add_n()
glm(
Survived ~ Class * (Age:Sex),
data = df, weights = df$n, family = binomial,
contrasts = list(Age = contr.sum, Class = "contr.helmert")
) |>
tidy_and_attach() |>
tidy_add_n()
glm(response ~ age + grade * trt, gtsummary::trial, family = poisson) |>
tidy_and_attach() |>
tidy_add_n()
glm(
response ~ trt * grade + offset(log(ttdeath)),
gtsummary::trial,
family = poisson
) |>
tidy_and_attach() |>
tidy_add_n()
}