| Title: | Post-Linkage Data Analysis |
|---|---|
| Description: | Provides a suite of statistical tools for post-linkage data analysis (PLDA), designed to account for record linkage errors in downstream modeling. The package implements a familiar, formula-based regression interface that adjusts for linkage uncertainty, accommodating workflows where direct access to unlinked primary files is restricted. It consolidates diverse adjustment methodologies, all of which support generalized linear models (linear, logistic, Poisson, and Gamma). These methodologies include weighting approaches (Chambers (2009) <https://hdl.handle.net/10779/uow.27788247>; Chambers et al. (2023) <doi:10.1002/wics.1596>), mixture modeling (Slawski et al. (2025) <doi:10.1093/jrsssa/qnae083>), and Bayesian mixture modeling (Gutman et al. (2016) <doi:10.1002/sim.6586>). For time-to-event data, both the weighting (Vo et al. (2024) <doi:10.1002/sim.9960>) and mixture modeling approaches accommodate Cox proportional hazards models, while the Bayesian approaches extend to parametric survival analysis. Additionally, the package leverages mixture modeling for contingency table analyses and Bayesian methods to enable the multiple imputation of latent match status. |
| Authors: | Priyanjali Bukke [aut, cre], Gauri Kamat [aut], Jiahao Cui [aut], Roee Gutman [aut], Martin Slawski [aut], Zhenbang Wang [ctb], Brady T. West [ctb], Emanuel Ben-David [ctb], Guoqing Diao [ctb] |
| Maintainer: | Priyanjali Bukke <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0.9000 |
| Built: | 2026-05-13 02:40:45 UTC |
| Source: | https://github.com/postlink-group/postlink |
The postlink package provides a unified suite of statistical tools
designed to rigorously account for record linkage errors in downstream modeling.
Record linkage is often error-prone. When datasets are merged using noisy or non-unique
identifiers, mismatches (false links) are inadvertently introduced. Ignoring these errors
acts as a contaminant in regression analysis, typically leading to significantly attenuated
estimates and biased statistical inference. postlink equips researchers with
methodologies to propagate linkage uncertainty into their models, specifically accommodating
"secondary analysis" workflows where direct access to the primary, unlinked files is restricted.
A Two-Phase Workflow
The package is built on a modular, object-oriented S3 architecture that decouples the specification of linkage error from the substantive statistical modeling. This provides a familiar, standard formula-based modeling interface.
Phase 1: Adjustment Specification
The analyst encapsulates the linked data and the chosen error-adjustment methodology
by using one of the adj* constructor functions. These functions validate the
data and return a structured adjustment object:
adjELE(): Specifies the Exchangeable Linkage Error (ELE) model,
which corrects for bias using known or audited mismatch rates.
adjMixture(): Specifies a frequentist mixture model approach
that treats match status as a latent variable, estimating error rates directly from
data (e.g., using match scores) via the EM algorithm.
adjMixBayes(): Specifies a Bayesian mixture model approach,
enabling parameter estimation and multiple imputation of latent match statuses
using Stan.
Phase 2: Estimation & Inference
The adjustment object is subsequently passed to a standard modeling wrapper, integrating the error correction into the familiar R modeling syntax:
plglm(): Generalized Linear Models (linear, logistic, Poisson, Gamma).
plcoxph(): Cox Proportional Hazards models.
plctable(): Contingency table analyses.
plsurvreg(): Parametric survival models.
As a note, estimation and inference supported for each type of adjustment
object vary. Refer to the adj* documentation for models currently supported.
Post-estimation tools function as expected in standard R workflows (e.g.,
summary(), predict(), vcov(), and confint()).
These methods specially are derived to account for the additional steps
introduced by the linkage error adjustment.
Note:
While the two-phase workflow is recommended for standard analyses, the package's
architecture intentionally isolates the core logic of each method into independent
internal routines. Advanced users, developers, or researchers running large-scale
simulations can bypass the wrapper functions and formula-parsing overhead entirely.
By supplying pre-computed design matrices and response vectors, the underlying
computational functions can be directly used if preferred (e.g., coxphELE(),
glmMixture(), survregMixBayes()).
Maintainer: Priyanjali Bukke [email protected]
Authors:
Gauri Kamat
Jiahao Cui
Roee Gutman
Martin Slawski
Other contributors:
Zhenbang Wang [contributor]
Brady T. West [contributor]
Emanuel Ben-David [contributor]
Guoqing Diao [contributor]
Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, 4, 1-15.
Chambers, R. L., Fabrizi, E., Ranalli, M. G., Salvati, N., & Wang, S. (2023). Robust regression using probabilistically linked data. Wiley Interdisciplinary Reviews: Computational Statistics, 15(2), e1596. doi:10.1002/wics.1596
Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115–129. doi:10.1002/sim.6586
Slawski, M., West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083
Vo, T. H., Garès, V., Zhang, L. C., Happe, A., Oger, E., Paquelet, S., & Chauvet, G. (2024). Cox regression with linked data. Statistics in Medicine, 43(2), 296-314. doi:10.1002/sim.9960
Useful links:
Report bugs at https://github.com/postlink-group/postlink/issues
Specifies the linked data and information on the underlying record linkage process for regression of linked data assuming exchangeable linkage errors (ELE) as developed by Chambers (2009) and Vo et al. (2024). These approaches correct for bias from mismatch error via weighting matrices estimated using known mismatch rates or clerical reviews (audit samples).
adjELE( linked.data, m.rate, audit.size = NULL, blocks = NULL, weight.matrix = c("ratio", "LL", "BLUE", "all") )adjELE( linked.data, m.rate, audit.size = NULL, blocks = NULL, weight.matrix = c("ratio", "LL", "BLUE", "all") )
linked.data |
A data.frame containing the data after record linkage. |
m.rate |
Numeric vector; known or estimated probability of mismatch for
each record or block. Values must be between 0 and 1. Can be a single global rate,
a vector of length equal to the number of unique blocks, or a vector of
length equal to the number of rows in |
audit.size |
Numeric vector; If the m.rate is estimated, provide sample
sizes for the clerical review audit. Used for variance estimation.
If provided, must align with |
blocks |
A vector or an unquoted variable name found in |
weight.matrix |
Character; the method for estimating the weight matrix. Must be one of "ratio" (default), "LL", "BLUE", or "all". |
The constructor validates consistency between the mismatch rates, audit sizes,
and block identifiers. If blocks are provided, m.rate must
be specified either per-block (length equals number of unique blocks) or
per-record (length equals number of rows).
Explicit provision of linked.data is strongly recommended for
reproducibility and to ensure the adjustment object fully encapsulates
the necessary data for downstream model fitting.
An object of class c("adjELE", "adjustment"). To minimize
memory overhead, the underlying linked.data is stored by reference
within an environment inside this object.
The internal algorithmic structure for the estimating equations was informed by the foundational work presented in Chambers (2009) and Vo et al. (2024). Additionally, the original code provided in the Appendix of Chambers (2009) was utilized as a benchmark oracle during the unit testing phase of package development to check for numerical accuracy and validity of the implementation.
Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, 4, 1-15.
Chambers, R. L., Fabrizi, E., Ranalli, M. G., Salvati, N., & Wang, S. (2023). Robust regression using probabilistically linked data. Wiley Interdisciplinary Reviews: Computational Statistics, 15(2), e1596. doi:10.1002/wics.1596
Vo, T. H., Garès, V., Zhang, L. C., Happe, A., Oger, E., Paquelet, S., & Chauvet, G. (2024). Cox regression with linked data. Statistics in Medicine, 43(2), 296-314. doi:10.1002/sim.9960
plglm() for generalized linear regression modeling
plcoxph() for Cox proportional hazards regression modeling
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE")data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE")
Specifies the linked data and information on the underlying record linkage process for a mismatch error adjustment using a Bayesian framework based on mixture modeling as developed by Gutman et al. (2016). This framework uses a mixture model for pairs of linked records whose two components reflect distributions conditional on match status, i.e., correct match or mismatch. Posterior inference is carried out via data augmentation or multiple imputation.
adjMixBayes(linked.data = NULL, priors = NULL)adjMixBayes(linked.data = NULL, priors = NULL)
linked.data |
A data.frame containing the linked dataset. |
priors |
A named For GLM models, intercept and slope priors are decoupled. Use
|
Explicit provision of linked.data is strongly recommended for
reproducibility and to ensure the adjustment object fully encapsulates
the necessary data for downstream model fitting.
An object of class c("adjMixBayes", "adjustment"). To minimize
memory overhead, the underlying linked.data is stored by reference
within an environment inside this object.
Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115–129. doi:10.1002/sim.6586
plglm() for generalized linear regression modeling
plsurvreg() for parametric survival modeling
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) # Construct the Bayesian mixture adjustment object adj_bayes2 <- adjMixBayes( linked.data = lifem_small, priors = list( intercept2 = "normal(0, 10)", beta2 = "normal(0, 0.01)", theta = "beta(2, 2)" ) ) class(adj_bayes2)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) # Construct the Bayesian mixture adjustment object adj_bayes2 <- adjMixBayes( linked.data = lifem_small, priors = list( intercept2 = "normal(0, 10)", beta2 = "normal(0, 0.01)", theta = "beta(2, 2)" ) ) class(adj_bayes2)
Specifies the linked data and information on the underlying record linkage process for the "General Framework for Regression with Mismatched Data" developed by Slawski et al. (2025). This framework uses a mixture model for pairs of linked records whose two components reflect distributions conditional on match status, i.e., correct match or mismatch. Inference is based on composite likelihood and the EM algorithm. Examples of information about the underlying record linkage process that can be incorporated into the method if available are the assumed overall mismatch rate, safe matches, predictors of match status, or predicted probabilities of correct matches.
adjMixture( linked.data = NULL, m.formula = ~1, m.rate = NULL, safe.matches = NULL )adjMixture( linked.data = NULL, m.formula = ~1, m.rate = NULL, safe.matches = NULL )
linked.data |
A data.frame containing the linked dataset. If |
m.formula |
A one-sided formula object for the mismatch indicator model, with the covariates on the right of "~". The default is an intercept-only model corresponding to a constant mismatch rate. |
m.rate |
Numeric; an optional estimate (a proportion between 0 and 1)
to constrain the global mismatch rate estimate. Defaults to |
safe.matches |
A logical vector or an unquoted variable name found in
|
The constructor assumes that any variables defined in m.formula and
safe.matches are in linked.data or in the same environment.
Explicit provision of linked.data is strongly recommended for
reproducibility and to ensure the adjustment object fully encapsulates
the necessary data for downstream model fitting.
An object of class c("adjMixture", "adjustment"). To minimize
memory overhead, the underlying linked.data is stored by reference
within an environment inside this object.
Slawski, M., West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083
Bukke, P., Ben-David, E., Diao, G., Slawski, M., & West, B. T. (2025). Cox Proportional Hazards Regression Using Linked Data: An Approach Based on Mixture Modeling. In Frontiers of Statistics and Data Science (pp. 181-200). Singapore: Springer Nature Singapore. doi:10.1007/978-981-96-0742-6_8
Slawski, M., Diao, G., Ben-David, E. (2021). A pseudo-likelihood approach to linear regression with partially shuffled data. Journal of Computational and Graphical Statistics. 30(4), 991-1003. doi:10.1080/10618600.2020.1870482
plglm() for generalized linear regression modeling
plcoxph() for Cox proportional hazards regression modeling
plctable() for contingency table analysis
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk )# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk )
The brfss data set contains a randomly selected subsample of 2,000 respondents
from the 2013 Behavioral Risk Factor Surveillance System (BRFSS) survey.
The BRFSS is an ongoing, state-based telephone survey conducted by the U.S.
Centers for Disease Control and Prevention (CDC) that collects information on
health-related risk behaviors, chronic health conditions, and the use of preventive
health services from non-institutionalized adults.
data(brfss)data(brfss)
A data frame with 2,000 rows and 8 variables.
In the context of post-linkage data analysis (PLDA), this data set is partitioned into two separate files to simulate a secondary analysis scenario where disparate data sources must be linked using quasi-identifiers (e.g., state, month of interview, sex, and marital status) prior to performing regression analyses (e.g., predicting weight based on height, physical health, and mental health).
Pre-processing steps have been applied to this subsample to remove missing values
("Refused" or NA) and to filter out extreme outliers in physical measurements.
X: State of residence (categorical).
imonth: Month of interview (categorical).
Weight: Weight of the respondent in pounds. Records with a weight
less than 50 lbs or greater than 650 lbs were excluded.
Height: Height of the respondent, formulated as (feet * 100 + inches).
Records with a height less than 54 inches or greater than 84 inches were excluded.
Physhlth: Number of days with poor physical health. Records marked
"Refused" or NA were excluded.
Menthlth: Number of days with poor mental health. Records marked
"Refused" or NA were excluded.
Exerany: Indicator of whether the respondent engaged in physical
activity. Records marked "Refused" or NA were excluded.
m.rate: Block-wise (by interview month) mismatch rate.
Centers for Disease Control and Prevention (CDC) (2013). "Behavioral Risk Factor Surveillance System Survey Data." U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Available at https://www.cdc.gov/brfss/annual_data/annual_2013.html.
Computes Wald confidence intervals for the model coefficients.
## S3 method for class 'coxphELE' confint(object, parm, level = 0.95, ...)## S3 method for class 'coxphELE' confint(object, parm, level = 0.95, ...)
object |
An object of class |
parm |
A specification of which parameters are to be given confidence intervals, either a vector of numbers or names. If missing, all parameters are considered. |
level |
The confidence level required (default 0.95). |
... |
Additional arguments (currently ignored). |
A matrix (or vector) with lower and upper confidence limits for each parameter.
library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Compute confidence intervals confint(fit) # 95% CI for all coefficients confint(fit, parm = "treatment", level = 0.90) # 90% CI for a specific parameterlibrary(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Compute confidence intervals confint(fit) # 95% CI for all coefficients confint(fit, parm = "treatment", level = 0.90) # 90% CI for a specific parameter
Computes Wald confidence intervals for the coefficients of the outcome model and the mismatch indicator model.
## S3 method for class 'coxphMixture' confint(object, parm, level = 0.95, ...)## S3 method for class 'coxphMixture' confint(object, parm, level = 0.95, ...)
object |
An object of class |
parm |
A specification of which parameters are to be given confidence intervals, either a vector of numbers or names. If missing, all parameters are considered. |
level |
The confidence level required (default is 0.95). |
... |
Additional arguments (currently ignored). |
The intervals are calculated based on the variance-covariance matrix returned
by vcov.coxphMixture, using the standard normal approximation:
Estimate +/- z_crit * SE.
A matrix (or vector) with lower and upper confidence limits for each parameter.
library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Extract Confidence Intervals confint(fit) confint(fit, parm = "treatment", level = 0.90)library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Extract Confidence Intervals confint(fit) confint(fit, parm = "treatment", level = 0.90)
Computes Wald-type confidence intervals for the estimated cell probabilities of the correctly matched population.
## S3 method for class 'ctableMixture' confint(object, parm, level = 0.95, ...)## S3 method for class 'ctableMixture' confint(object, parm, level = 0.95, ...)
object |
An object of class |
parm |
A specification of which parameters are to be given confidence intervals. If missing, all parameters (cells) are considered. |
level |
The confidence level required. Defaults to 0.95. |
... |
Additional arguments (currently ignored). |
The intervals are calculated using the standard error estimates derived from
vcov.ctableMixture. The lower and upper bounds are truncated
at 0 and 1, respectively, to ensure valid probability estimates.
A matrix with columns giving lower and upper confidence limits for each parameter.
set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Compute confidence intervals # 95% CI for all cell probabilities confint(fit) # 90% CI for specific cells by name confint(fit, parm = c("(low, yes)", "(high, no)"), level = 0.90)set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Compute confidence intervals # 95% CI for all cell probabilities confint(fit) # 90% CI for specific cells by name confint(fit, parm = c("(low, yes)", "(high, no)"), level = 0.90)
glmELE ObjectsComputes Wald confidence intervals for one or more parameters in a glmELE object.
## S3 method for class 'glmELE' confint(object, parm, level = 0.95, weight.matrix = NULL, ...)## S3 method for class 'glmELE' confint(object, parm, level = 0.95, weight.matrix = NULL, ...)
object |
An object of class |
parm |
A specification of which parameters are to be given confidence intervals, either a vector of numbers or names. If missing, all parameters are considered. |
level |
The confidence level required. |
weight.matrix |
Character string specifying the weighting method to use. Defaults to the first method found. |
... |
Additional arguments passed to methods. |
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter.
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) confint(fit)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) confint(fit)
Computes posterior credible intervals for the regression coefficients in a
fitted glmMixBayes model. By default, intervals are returned for all
coefficients in component 1 of the mixture model. A subset of
coefficients can be selected using parm.
## S3 method for class 'glmMixBayes' confint(object, parm = NULL, level = 0.95, ...)## S3 method for class 'glmMixBayes' confint(object, parm = NULL, level = 0.95, ...)
object |
A |
parm |
Optional. Names or numeric indices of coefficients for which
credible intervals should be returned. If |
level |
Probability level for the credible intervals. Defaults to
|
... |
Not used. |
A matrix with one row per coefficient and two columns giving the lower and upper credible interval bounds. Row names correspond to coefficient names.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) confint(fit)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) confint(fit)
Computes Wald confidence intervals for one or more parameters in a glmMixture object.
## S3 method for class 'glmMixture' confint(object, parm, level = 0.95, ...)## S3 method for class 'glmMixture' confint(object, parm, level = 0.95, ...)
object |
An object of class |
parm |
A specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
The confidence level required. |
... |
Additional arguments (currently ignored). |
The intervals are calculated based on the sandwich variance estimator:
Estimate +/- z_crit * SE.
For Gaussian and Gamma families, a t-distribution is used with residual degrees of freedom.
For Binomial and Poisson families, a standard normal distribution is used.
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter.
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) confint(fit)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) confint(fit)
Computes posterior credible intervals for the regression coefficients,
mixing weight, and family-specific distribution parameters from a fitted
survMixBayes model. The returned intervals are organized by mixture
component and parameter type. Component 1 corresponds to the correct-match
component and component 2 to the incorrect-match component.
## S3 method for class 'survMixBayes' confint(object, parm = NULL, level = 0.95, ...)## S3 method for class 'survMixBayes' confint(object, parm = NULL, level = 0.95, ...)
object |
An object of class |
parm |
Optional character vector selecting which interval blocks to
return. For example, |
level |
Probability level for the credible intervals. Defaults to
|
... |
Further arguments (unused). |
A named list of credible intervals. Elements coef1 and
coef2 are matrices with one row per regression coefficient and two
columns giving the lower and upper interval bounds for components 1 and 2,
respectively, where component 1 is the correct-match component and
component 2 is the incorrect-match component. Elements such as theta, shape1,
shape2, scale1, and scale2 are numeric vectors of
length 2 containing the lower and upper credible interval bounds for the
corresponding scalar parameters.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # Calculate 95% credible intervals for all parameters confint(fit, level = 0.95) # Extract credible intervals specifically for the mixing weight confint(fit, parm = "theta", level = 0.90)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # Calculate 95% credible intervals for all parameters confint(fit, level = 0.95) # Extract credible intervals specifically for the mixing weight confint(fit, parm = "theta", level = 0.90)
Fit Cox proportional hazards regression adjusted for mismatched data based on the approach developed in Vo et al., 2024 assuming exchangeable linkage errors. Block-wise mismatch rates are assumed to be known.
coxphELE( x, y, cens, m.rate, blocks, audit.size = NULL, control = list(init.beta = NULL), ... )coxphELE( x, y, cens, m.rate, blocks, audit.size = NULL, control = list(init.beta = NULL), ... )
x |
A matrix or data.frame of covariates (design matrix). |
y |
A numeric vector of observed time-to-event outcomes. |
cens |
A numeric vector indicating censoring status (1 = censored, 0 = event).
Note: This is the reverse of the standard |
m.rate |
block-wise mismatch rates (should be a vector with length equal to the number of blocks) - by default assume a single block. |
blocks |
block indicators. |
audit.size |
a vector of block sizes in the audit sample (selected by simple random sampling) if used to estimate the m.rate (optional). If a single value is provided, assume the same value for all blocks and put out a warning. |
control |
an optional list variable to of control arguments including "init.beta" for the initial outcome model coefficient estimates) - by default is the naive estimator. |
... |
the option to directly pass "control" arguments |
a list of results from the function called depending on the "family" specified.
coefficients |
the outcome model coefficient estimates |
var |
the variance-covariance matrix |
linear.predictors |
the linear predictors |
means |
Column means of the covariate matrix |
n |
Number of observations. |
nevent |
Number of events. |
Vo, T. H., Garès, V., Zhang, L. C., Happe, A., Oger, E.,
Paquelet, S., & Chauvet, G. (2024). Cox regression with linked data.
Statistics in Medicine, 43(2), 296-314.
set.seed(101) n <- 250 # 1. Simulate true data age <- rnorm(n, 60, 8) treatment <- rbinom(n, 1, 0.5) # Simulate true hazard # Older age and lack of treatment increase hazard true_hazard <- exp(0.05 * (age - 60) - 0.5 * treatment) true_time <- rexp(n, rate = true_hazard) cens_time <- rexp(n, rate = 0.1) obs_time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 1 = event, 0 = censored # 2. Induce 15% exchangeable linkage errors mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age linked_treatment <- treatment false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age[false_link_idx] linked_treatment[mis_idx] <- treatment[false_link_idx] # 3. Prepare matrices for the internal routine cens_ele <- 1 - status X_mat <- cbind(age = linked_age, treatment = linked_treatment) # 4. Fit the model directly # cens = 1 for censored, 0 for event fit_ele <- coxphELE(x = X_mat, y = obs_time, cens = cens_ele, m.rate = 0.15) print(fit_ele$coefficients)set.seed(101) n <- 250 # 1. Simulate true data age <- rnorm(n, 60, 8) treatment <- rbinom(n, 1, 0.5) # Simulate true hazard # Older age and lack of treatment increase hazard true_hazard <- exp(0.05 * (age - 60) - 0.5 * treatment) true_time <- rexp(n, rate = true_hazard) cens_time <- rexp(n, rate = 0.1) obs_time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 1 = event, 0 = censored # 2. Induce 15% exchangeable linkage errors mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age linked_treatment <- treatment false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age[false_link_idx] linked_treatment[mis_idx] <- treatment[false_link_idx] # 3. Prepare matrices for the internal routine cens_ele <- 1 - status X_mat <- cbind(age = linked_age, treatment = linked_treatment) # 4. Fit the model directly # cens = 1 for censored, 0 for event fit_ele <- coxphELE(x = X_mat, y = obs_time, cens = cens_ele, m.rate = 0.15) print(fit_ele$coefficients)
Fits a Cox proportional hazards regression adjusting for mismatched data using a mixture modeling framework in the secondary analysis setting. The method relies on a two-component mixture model where true matches follow the Cox model and mismatches follow the marginal distribution of the survival outcome. Variance estimates are obtained via Louis' method.
coxphMixture( x, y, cens, z, m.rate = NULL, safe.matches = NULL, control = list(), ... )coxphMixture( x, y, cens, z, m.rate = NULL, safe.matches = NULL, control = list(), ... )
x |
A matrix or data.frame of covariates (design matrix). |
y |
A numeric vector of observed time-to-event outcomes. |
cens |
A numeric vector indicating censoring status (1 = censored, 0 = event).
Note: This is the reverse of the standard |
z |
A matrix or data.frame of mismatch covariates (e.g., match scores, blocking variables). Used to model the probability of a mismatch. |
m.rate |
An optional numeric value between 0 and 1 specifying the assumed overall mismatch rate upper bound. If provided, the mismatch indicator model is constrained such that the average estimated mismatch rate does not exceed this bound. |
safe.matches |
A logical vector indicating records known to be correct matches (TRUE). These records are fixed as matches (probability 1) during estimation. Defaults to all FALSE. |
control |
An optional list of control parameters. Parameters can also be
passed directly via
|
... |
Additional arguments passed to |
An list of results:
coefficients |
Estimated coefficients for the outcome model (beta). |
m.coefficients |
Estimated coefficients for the mismatch model (gamma). |
var |
Variance-covariance matrix of the estimates. |
linear.predictors |
Linear predictors for the outcome model. |
means |
Column means of the covariate matrix |
n |
Number of observations. |
nevent |
Number of events. |
match.prob |
Posterior probabilities that each observation is a correct match. |
objective |
Value of the negative log pseudo-likelihood at each iteration. |
converged |
Logical indicating if the algorithm converged. |
Lambdahat0 |
Estimated baseline cumulative hazard. |
gLambdahat0 |
the baseline cumulative hazard for the marginal density of the response variable (using Nelson-Aalen estimator) |
Bukke, P., Ben-David, E., Diao, G., Slawski, M., & West, B. T. (2025). Cox Proportional Hazards Regression Using Linked Data: An Approach Based on Mixture Modelling.
library(survival) set.seed(123) n <- 200 # Generate covariates x_cov <- seq(-3, 3, length = n) d_cov <- rep(0:1, each = n/2) X <- cbind(d_cov, x_cov, x_cov * d_cov) # True parameters b <- c(-1.5, 1, 0.5) sigma <- 0.25 mu <- X %*% b y <- exp(drop(mu)) * rweibull(n, shape = 1/sigma) # Censoring cens <- (y >= 1.5) y[cens] <- 1.5 # Generate mismatch errors ps <- rbeta(n, 4.5, 0.5) logit_ps <- log(ps / (1 - ps)) mp <- cbind(1, logit_ps) gamma_true <- c(-0.5, 1) m <- 1 - rbinom(n, prob = plogis(mp %*% gamma_true), size = 1) yperm <- y shuffled_ix <- sample(which(m == 1)) yperm[shuffled_ix] <- yperm[sample(shuffled_ix)] # Fit model fit <- coxphMixture(x = X, y = yperm, cens = as.numeric(cens), z = matrix(logit_ps, ncol = 1), control = list(max.iter = 50)) print(fit) summary(fit)library(survival) set.seed(123) n <- 200 # Generate covariates x_cov <- seq(-3, 3, length = n) d_cov <- rep(0:1, each = n/2) X <- cbind(d_cov, x_cov, x_cov * d_cov) # True parameters b <- c(-1.5, 1, 0.5) sigma <- 0.25 mu <- X %*% b y <- exp(drop(mu)) * rweibull(n, shape = 1/sigma) # Censoring cens <- (y >= 1.5) y[cens] <- 1.5 # Generate mismatch errors ps <- rbeta(n, 4.5, 0.5) logit_ps <- log(ps / (1 - ps)) mp <- cbind(1, logit_ps) gamma_true <- c(-0.5, 1) m <- 1 - rbinom(n, prob = plogis(mp %*% gamma_true), size = 1) yperm <- y shuffled_ix <- sample(which(m == 1)) yperm[shuffled_ix] <- yperm[sample(shuffled_ix)] # Fit model fit <- coxphMixture(x = X, y = yperm, cens = as.numeric(cens), z = matrix(logit_ps, ncol = 1), control = list(max.iter = 50)) print(fit) summary(fit)
Estimates the cell probabilities of a two-way contingency table in the presence of linkage errors. The function implements the methodology described in Slawski et al. (2025), modeling the observed table as a mixture of correctly matched records (following a saturated model) and mismatched records (assumed to follow an independence model).
ctableMixture(tab, m.rate, control = list(), ...)ctableMixture(tab, m.rate, control = list(), ...)
tab |
A numeric matrix or table of counts representing the observed two-way contingency table. |
m.rate |
A numeric value between 0 and 1 indicating the assumed rate of mismatched records in the data. |
control |
A list of control parameters. See |
... |
Additional control arguments. If not provided in
|
In the absence of linkage errors, the standard estimator for cell probabilities is the matrix of observed relative frequencies. When linkage errors are present (specifically mismatches), this estimator is biased.
This function corrects for this bias using an Expectation-Maximization (EM) algorithm. The observed data is modeled as a mixture:
where:
(m.rate) is the mismatch rate (fixed/known).
is the distribution of correct matches (saturated model).
is the distribution of mismatches (independence model).
The algorithm iteratively updates the posterior probability of a record being a correct match (E-step) and the estimates for the saturated and independence distributions (M-step).
A list of results:
phat: Matrix of estimated cell probabilities for the correctly matched population (the target parameter).
phat0: Matrix of estimated cell probabilities for the mismatched population (independence model).
var: Estimated variance-covariance matrix of the estimators in phat.
ftable: The estimated contingency table of counts for the correctly matched population (adjusted for bias).
objective: The final value of the negative log-likelihood.
converged: Logical indicating if the algorithm converged within max.iter.
Slawski, M., West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A.
# Generate Synthetic Data set.seed(1234) K <- 3; L <- 4 n <- 1000 # Define true probabilities for a KxL table cellprobs <- c(0.18, 0.05, 0.03, 0.04, 0.02, 0.14, 0.02, 0.02, 0.10, 0.21, 0.15, 0.04) matrix_probs <- matrix(cellprobs, nrow = K, ncol = L) # Generate multinomial counts dat <- stats::rmultinom(n = n, size = 1, prob = cellprobs) obs_idx <- apply(dat, 2, function(x) which(x == 1)) X <- ceiling(obs_idx / L) # Row indices Y <- (obs_idx %% L); Y[Y == 0] <- L # Col indices # Introduce Linkage Error (Mismatches) alpha <- 0.20 # 20% mismatch rate n_mismatch <- round(n * alpha) Y_perm <- Y # Shuffle the first n_mismatch Y values to break dependence Y_perm[1:n_mismatch] <- sample(Y[1:n_mismatch]) # Create Observed Table (with error) tab_obs <- table(X, Y_perm) # Apply Adjustment Method fit <- ctableMixture(tab = tab_obs, m.rate = alpha) # Inspect Results print(fit$converged) # Compare estimated Correct Counts vs True Counts (approx) print(round(fit$ftable)) print(round(table(X, Y))) # True table without errors# Generate Synthetic Data set.seed(1234) K <- 3; L <- 4 n <- 1000 # Define true probabilities for a KxL table cellprobs <- c(0.18, 0.05, 0.03, 0.04, 0.02, 0.14, 0.02, 0.02, 0.10, 0.21, 0.15, 0.04) matrix_probs <- matrix(cellprobs, nrow = K, ncol = L) # Generate multinomial counts dat <- stats::rmultinom(n = n, size = 1, prob = cellprobs) obs_idx <- apply(dat, 2, function(x) which(x == 1)) X <- ceiling(obs_idx / L) # Row indices Y <- (obs_idx %% L); Y[Y == 0] <- L # Col indices # Introduce Linkage Error (Mismatches) alpha <- 0.20 # 20% mismatch rate n_mismatch <- round(n * alpha) Y_perm <- Y # Shuffle the first n_mismatch Y values to break dependence Y_perm[1:n_mismatch] <- sample(Y[1:n_mismatch]) # Create Observed Table (with error) tab_obs <- table(X, Y_perm) # Apply Adjustment Method fit <- ctableMixture(tab = tab_obs, m.rate = alpha) # Inspect Results print(fit$converged) # Compare estimated Correct Counts vs True Counts (approx) print(round(fit$ftable)) print(round(table(X, Y))) # True table without errors
Fits a generalized linear model (GLM) accounting for exchangeable linkage errors (ELE) as defined by Chambers (2009). It solves the unbiased estimating equations resulting from the modified mean function induced by mismatch errors.
glmELE( x, y, family = "gaussian", m.rate, audit.size = NULL, blocks, weight.matrix = "all", control = list(init.beta = NULL), ... )glmELE( x, y, family = "gaussian", m.rate, audit.size = NULL, blocks, weight.matrix = "all", control = list(init.beta = NULL), ... )
x |
A numeric matrix of predictors (design matrix). |
y |
A numeric vector of responses. |
family |
the type of regression model ("gaussian" - default, "poisson", "binomial", "gamma"). Standard link functions are used ("identity" for Gaussian, "log" for Poisson and Gamma, and "logit" for binomial). |
m.rate |
A numeric vector of mismatch rates. If the length is 1, it is replicated for all blocks. If length > 1, it must match the number of unique blocks. |
audit.size |
a vector of block sizes in the audit sample (selected by simple random sampling) if used to estimate the m.rate (optional). If a single value is provided, assume the same value for all blocks and put out a warning. |
blocks |
A vector indicating the block membership of each observation. |
weight.matrix |
A character string specifying the weighting method ("ratio-type", "Lahiri-Larsen", "BLUE", or "all" (default)) |
control |
an optional list variable to of control arguments including "init.beta" for the initial outcome model coefficient estimates) - by default is the naive estimator when the weight matrix is ratio-type or Lahiri-Larsen and is the Lahiri-Larsen estimator for the BLUE weight matrix. |
... |
Pass control arguments directly. |
A list of results:
coefficients |
A named vector (or matrix) of coefficients for the outcome model. |
residuals |
The working residuals, defined as |
fitted.values |
The fitted mean values of the outcome model, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
The linear fit on the link scale. |
deviance |
The deviance of the weighted outcome model at convergence. |
null.deviance |
The deviance of the weighted null outcome model. |
var |
The estimated variance-covariance matrix of the parameters (sandwich estimator). |
dispersion |
The estimated dispersion parameter (e.g., variance for Gaussian, 1/shape for Gamma). |
rank |
The numeric rank of the fitted linear model. |
df.residual |
The residual degrees of freedom. |
df.null |
The residual degrees of freedom for the null model. |
family |
The |
call |
The matched call. |
Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, 4, 1-15.
data(brfss) brfss <- na.omit(brfss) x <- cbind(1, subset(brfss, select = c(Height,Physhlth,Menthlth,Exerany))) y <- brfss$Weight fit <- glmELE(x, y, family = "gaussian", m.rate = unique(brfss$m.rate), blocks = brfss$imonth, weight.matrix = "BLUE")data(brfss) brfss <- na.omit(brfss) x <- cbind(1, subset(brfss, select = c(Height,Physhlth,Menthlth,Exerany))) y <- brfss$Weight fit <- glmELE(x, y, family = "gaussian", m.rate = unique(brfss$m.rate), blocks = brfss$imonth, weight.matrix = "BLUE")
Fits a Bayesian two-component mixture generalized linear model (GLM) using Stan. Each observation is assumed to arise from one of two latent components with component-specific regression coefficients.
glmMixBayes( X, y, family = "gaussian", priors = NULL, control = list(iterations = 10000, burnin.iterations = 1000, seed = sample.int(.Machine$integer.max, 1), cores = getOption("mc.cores", 1L)), ... )glmMixBayes( X, y, family = "gaussian", priors = NULL, control = list(iterations = 10000, burnin.iterations = 1000, seed = sample.int(.Machine$integer.max, 1), cores = getOption("mc.cores", 1L)), ... )
X |
A numeric design matrix (N x K), typically from |
y |
A response vector of length N. For |
family |
One of |
priors |
A named Intercept and slope priors are decoupled. Use |
control |
A named
Values supplied through |
... |
Optional arguments that override elements of
|
The function supports Gaussian, Poisson, Binomial, and Gamma outcome families and returns posterior samples of the component-specific regression parameters and mixture weight.
An object of class "glmMixBayes" containing (at least):
m_samplesPosterior draws of aligned latent component labels (matrix of size draws × N), where component 1 corresponds to the correct-match component and component 2 to the incorrect-match component.
estimates$coefficientsPosterior draws of regression coefficients for the correct-match component (component 1; draws × K).
estimates$m.coefficientsPosterior draws of regression coefficients for the incorrect-match component (component 2; draws × K).
estimates$dispersionPosterior draws of the dispersion parameter for the correct-match component (component 1; family-specific).
estimates$m.dispersionPosterior draws of the dispersion parameter for the incorrect-match component (component 2; family-specific).
familyThe GLM family used in the model.
callThe matched function call.
Mixture models are invariant to permutations of component labels, which can lead to label switching in MCMC output. To ensure interpretable posterior summaries, this function applies a post-processing step that aligns component labels across posterior draws.
First, an optional global swap of labels (1 and 2) is performed
if component 2 is more frequent overall. Then, labels are
aligned across draws using the ECR-ITERATIVE-1
relabeling algorithm.
Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115–129. doi:10.1002/sim.6586
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 795–809. doi:10.1111/1467-9868.00265
Papastamoulis, P. (2016). label.switching: An R package for dealing with the label switching problem in MCMC outputs. Journal of Statistical Software, 69(1), 1–24. doi:10.18637/jss.v069.c01
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death fit <- glmMixBayes( X = x, y = y, family = "gaussian", control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) )data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death fit <- glmMixBayes( X = x, y = y, family = "gaussian", control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) )
Fits a generalized linear model (GLM) accounting for mismatch errors using a mixture model framework in the secondary analysis setting. The variance-covariance matrix is estimated using the sandwich formula.
glmMixture( x, y, family, z = cbind(rep(1, nrow(x))), m.rate = NULL, safe.matches = NULL, control = list(), ... )glmMixture( x, y, family, z = cbind(rep(1, nrow(x))), m.rate = NULL, safe.matches = NULL, control = list(), ... )
x |
Design matrix for the primary outcome model (numeric matrix or data frame). |
y |
Response vector for the primary outcome model. |
family |
A family object (e.g., |
z |
Design matrix for the mismatch indicator model (mismatch covariates). If NULL, an intercept-only model is assumed. |
m.rate |
The assumed overall mismatch rate (a proportion between 0 and 1). If provided, it imposes a constraint on the mismatch model intercept. |
safe.matches |
Logical vector; |
control |
An optional list of control parameters. Arguments passed via
|
... |
Additional arguments passed to |
A list of results:
coefficients |
A named vector of coefficients for the outcome model. |
m.coefficients |
A named vector of coefficients for the mismatch indicator model (gamma). |
match.prob |
The posterior correct match probabilities (weights) for each observation. |
residuals |
The working residuals, defined as |
fitted.values |
The fitted mean values of the outcome model, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
The linear fit on the link scale. |
deviance |
The deviance of the weighted outcome model at convergence. |
null.deviance |
The deviance of the weighted null outcome model. |
var |
The estimated variance-covariance matrix of the parameters (sandwich estimator). |
dispersion |
The estimated dispersion parameter (e.g., variance for Gaussian, 1/shape for Gamma). |
objective |
A vector tracking the negative log pseudo-likelihood at each iteration of the EM algorithm. |
converged |
Logical indicating if the EM algorithm converged within |
rank |
The numeric rank of the fitted linear model. |
df.residual |
The residual degrees of freedom. |
df.null |
The residual degrees of freedom for the null model. |
family |
The |
call |
The matched call. |
Slawski, M.*, West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083
Slawski, M.*, Diao, G., Ben-David, E. (2021). A pseudo-likelihood approach to linear regression with partially shuffled data. Journal of Computational and Graphical Statistics. 30(4), 991-1003. doi:10.1080/10618600.2020.1870482
data(lifem) x <- cbind(1, poly(lifem$unit_yob, 3, raw = TRUE)) y <- lifem$age_at_death z <- cbind(1, lifem$commf, lifem$comml) fit <- glmMixture(x, y, family = "gaussian", z, m.rate = 0.05, safe.matches = lifem$hndlnk)data(lifem) x <- cbind(1, poly(lifem$unit_yob, 3, raw = TRUE)) y <- lifem$age_at_death z <- cbind(1, lifem$commf, lifem$comml) fit <- glmMixture(x, y, family = "gaussian", z, m.rate = 0.05, safe.matches = lifem$hndlnk)
The lifem data set contains a subset of data from the Life-M project (https://life-m.org/) on 3,238 individuals born between 1883
to 1906. These records were obtained from linking birth certificates and death certificates either of two
ways. A fraction of the records (2,159 records) were randomly sampled to be “hand-linked at some level” (HL).
These records are high quality and were manually linked at some point by trained research assistants.
The remaining records were “purely machine-linked” (ML) based on probabilistic record linkage without clerical
review. The Life-M team expects the mismatch rate among these records to be around 5% (Bailey et al.
2022). Of interest is the relationship between age at death and year of birth. The lifem demo data set consists of 2,159 hand-linked records
and 1,079 records that were randomly sampled from the purely machine-linked records (~2:1 HL-ML ratio).
data(lifem)data(lifem)
a data frame with 3,238 rows and 6 variables
yob: year of birth (value from 1883 and 1906)
unit_yob: yob re-scaled to the unit interval for analysis (between 0 and 1). If X is the yob, we use the following: (X – min(X)) / (max(X) – min(X)) = a * X + b, a = 1/(max(X) – min(X)), b = -min(X)*a
age_at_death: age at death (in years)
hndlnk: whether record was purely machine-linked or hand-linked at some level.
commf: commonness score of first name (between 0 and 1). It is based on the 1940 census. It is a ratio of the log count of the individual’s first name over the log count of the most commonly occurring first name in the census.
comml: commonness score of last name (between 0 and 1). It is based on the 1940 census. It is a ratio of the log count of the individual’s last name over the log count of the most commonly occurring last name in the census.
Bailey, Martha J., Lin, Peter Z., Mohammed, A.R. Shariq, Mohnen, Paul, Murray, Jared, Zhang, Mengying, and Prettyman, Alexa. LIFE-M: The Longitudinal, Intergenerational Family Electronic Micro-Database. Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2022-12-21. < doi:10.3886/E155186V5 >
Generic function for pooling parameter estimates from Bayesian mixture models using posterior draws of the latent component indicators.
mi_with(object, ...)mi_with(object, ...)
object |
A fitted Bayesian mixture model object. |
... |
Additional arguments passed to methods. |
A pooled model object combining parameter estimates across posterior component-indicator draws.
# mi_with() is a generic function for posterior allocation–based pooling. # See ?mi_with.glmMixBayes for a complete example illustrating its use # with Bayesian GLM mixture models.# mi_with() is a generic function for posterior allocation–based pooling. # See ?mi_with.glmMixBayes for a complete example illustrating its use # with Bayesian GLM mixture models.
Use posterior draws of the latent match indicators from glmMixBayes()
to repeatedly identify which records are treated as correct matches, refit the
requested regression model on those records, and pool the resulting estimates.
Each retained posterior draw defines one subset of records classified as
correct matches. The function fits the specified lm() or glm()
model to that subset, extracts the estimated coefficients and their covariance
matrix, and combines the results across draws using multiple-imputation
pooling rules.
## S3 method for class 'glmMixBayes' mi_with( object, data, formula, family = NULL, min_n = NULL, quietly = TRUE, ... )## S3 method for class 'glmMixBayes' mi_with( object, data, formula, family = NULL, min_n = NULL, quietly = TRUE, ... )
object |
A |
data |
A data.frame with all candidate records in the same row order as used in the model. |
formula |
Model formula for refitting on each draw (required). |
family |
A |
min_n |
Minimum number of records required to fit the model for a given
posterior draw. The default is |
quietly |
If |
... |
Additional arguments passed through (currently unused). |
An object of class c("mi_link_pool_glm", "mi_link_pool")
containing pooled coefficient estimates, standard errors, confidence
intervals, and related summary information.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) pooled_fit <- mi_with( object = fit, data = lifem_small, formula = age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = gaussian() ) print(pooled_fit)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) pooled_fit <- mi_with( object = fit, data = lifem_small, formula = age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = gaussian() ) print(pooled_fit)
Use posterior draws of the latent match indicators from survregMixBayes()
to repeatedly identify which records are treated as correct matches, refit a
Cox proportional hazards model on those records, and pool the resulting
estimates using multiple-imputation pooling rules.
Each retained posterior draw defines one subset of records classified as
correct matches. The function fits the specified survival::coxph()
model to that subset, extracts the estimated coefficients and covariance
matrix, and combines the results across draws using Rubin's rules.
## S3 method for class 'survMixBayes' mi_with( object, data, formula, min_n = NULL, quietly = TRUE, ties = "efron", ... )## S3 method for class 'survMixBayes' mi_with( object, data, formula, min_n = NULL, quietly = TRUE, ties = "efron", ... )
object |
A |
data |
A data.frame with all candidate records in the same row order as used in the model. |
formula |
Model formula for refitting on each draw (required), typically
of the form |
min_n |
Minimum number of records required to fit the model for a given
posterior draw. The default is |
quietly |
If |
ties |
Method for handling tied event times in |
... |
Additional arguments passed to |
An object of class c("mi_link_pool_survreg", "mi_link_pool")
containing pooled coefficient estimates, standard errors, confidence
intervals, and related summary information.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) pooled_obj <- mi_with( object = fit, data = linked_df, formula = survival::Surv(time, status) ~ trt ) print(pooled_obj)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) pooled_obj <- mi_with( object = fit, data = linked_df, formula = survival::Surv(time, status) ~ trt ) print(pooled_obj)
plcoxph fits Cox proportional hazards models to linked data, adjusting for
potential mismatch errors. It serves as a wrapper around the internal fitcoxph
function, for compatibility with the coxph syntax.
plcoxph( formula, adjustment, subset, na.action, model = TRUE, x = FALSE, y = FALSE, control = list(), ... )plcoxph( formula, adjustment, subset, na.action, model = TRUE, x = FALSE, y = FALSE, control = list(), ... )
formula |
A formula object, with the response on the left of a ~ operator,
and the terms on the right. The response must be a survival object as returned
by the |
adjustment |
An object inheriting from class |
subset |
An optional vector specifying a subset of observations. |
na.action |
A function for handling NAs. |
model |
Logical; if |
x, y
|
Logical; if |
control |
A list of parameters for controlling the linkage error adjustment process. |
... |
Additional arguments passed to the internal function. |
An object representing the fitted model. The specific class and structure of the
returned object depend directly on the adjustment method provided:
If adjustment is of class adjELE, returns an object of class coxphELE.
If adjustment is of class adjMixture, returns an object of class coxphMixture.
adjELE, adjMixture, coxphELE, coxphMixture
library(survival) set.seed(101) n <- 250 # Simulate true survival data x <- rnorm(n) true_hazard <- exp(0.5 * x) true_time <- rexp(n, true_hazard) true_status <- rbinom(n, 1, 0.8) # Induce linkage mismatch errors match_score <- rbeta(n, 5, 1) is_mismatch <- rbinom(n, 1, 1 - match_score) obs_time <- true_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) # Shuffle time and status together for mismatched records shuffled_idx <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled_idx] obs_status[mismatch_idx] <- obs_status[shuffled_idx] linked_data <- data.frame(time = obs_time, status = obs_status, x = x, match_score) # Specify the Adjustment Method adj <- adjMixture( linked.data = linked_data, m.formula = ~ match_score ) # Fit the Adjusted Cox Proportional Hazards Model fit <- plcoxph( Surv(time, status) ~ x, adjustment = adj, control = list(max.iter = 50) )library(survival) set.seed(101) n <- 250 # Simulate true survival data x <- rnorm(n) true_hazard <- exp(0.5 * x) true_time <- rexp(n, true_hazard) true_status <- rbinom(n, 1, 0.8) # Induce linkage mismatch errors match_score <- rbeta(n, 5, 1) is_mismatch <- rbinom(n, 1, 1 - match_score) obs_time <- true_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) # Shuffle time and status together for mismatched records shuffled_idx <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled_idx] obs_status[mismatch_idx] <- obs_status[shuffled_idx] linked_data <- data.frame(time = obs_time, status = obs_status, x = x, match_score) # Specify the Adjustment Method adj <- adjMixture( linked.data = linked_data, m.formula = ~ match_score ) # Fit the Adjusted Cox Proportional Hazards Model fit <- plcoxph( Surv(time, status) ~ x, adjustment = adj, control = list(max.iter = 50) )
plctable constructs a contingency table and adjusts the fitted model for
mismatch errors.
plctable( formula, adjustment, subset, na.action, exclude = c(NA, NaN), control = list(), ... )plctable( formula, adjustment, subset, na.action, exclude = c(NA, NaN), control = list(), ... )
formula |
a formula object with the left and right hand sides specifying the column and row variable of the flat table, respectively. |
adjustment |
An object inheriting from class |
subset |
An optional vector specifying a subset of observations to be used. |
na.action |
A function which indicates what should happen when the data contain NAs. |
exclude |
Vector of values to be excluded when forming the table (passed to |
control |
A list of parameters for controlling the linkage error adjustment process. |
... |
Additional arguments passed to the internal function. |
An object representing the fitted model. The specific class and structure of the
returned object depend directly on the adjustment method provided:
If adjustment is of class adjMixture, returns an object of class ctableMixture.
set.seed(102) n <- 400 # Simulate true categorical data exposure <- sample(c("low", "high"), n, replace = TRUE) # True relationship: high exposure -> higher chance of disease prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_outcome <- ifelse(runif(n) < prob_disease, "disease", "healthy") # Induce linkage (mismatch) errors at a fixed overall rate true_mismatch_rate <- 0.20 is_mismatch <- rbinom(n, 1, true_mismatch_rate) obs_outcome <- true_outcome mismatch_idx <- which(is_mismatch == 1) # Shuffle outcomes for the mismatched records obs_outcome[mismatch_idx] <- sample(obs_outcome[mismatch_idx]) linked_df <- data.frame(exposure, outcome = obs_outcome) # Specify the Adjustment Method adj <- adjMixture( linked.data = linked_df, m.rate = true_mismatch_rate ) # Fit the adjusted contingency table model fit <- plctable( ~ exposure + outcome, adjustment = adj )set.seed(102) n <- 400 # Simulate true categorical data exposure <- sample(c("low", "high"), n, replace = TRUE) # True relationship: high exposure -> higher chance of disease prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_outcome <- ifelse(runif(n) < prob_disease, "disease", "healthy") # Induce linkage (mismatch) errors at a fixed overall rate true_mismatch_rate <- 0.20 is_mismatch <- rbinom(n, 1, true_mismatch_rate) obs_outcome <- true_outcome mismatch_idx <- which(is_mismatch == 1) # Shuffle outcomes for the mismatched records obs_outcome[mismatch_idx] <- sample(obs_outcome[mismatch_idx]) linked_df <- data.frame(exposure, outcome = obs_outcome) # Specify the Adjustment Method adj <- adjMixture( linked.data = linked_df, m.rate = true_mismatch_rate ) # Fit the adjusted contingency table model fit <- plctable( ~ exposure + outcome, adjustment = adj )
plglm fits generalized linear models (GLMs) to linked data, incorporating
adjustments for linkage error as specified in the provided adjustment object.
It mimics the interface of glm to ensure familiarity for users.
plglm( formula, family = gaussian, adjustment, subset, na.action, model = TRUE, x = FALSE, y = FALSE, control = list(), ... )plglm( formula, family = gaussian, adjustment, subset, na.action, model = TRUE, x = FALSE, y = FALSE, control = list(), ... )
formula |
A symbolic description of the model to be fitted. |
family |
A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. |
adjustment |
An object inheriting from class |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
A function which indicates what should happen when the data
contain NAs. The default is set by the |
model |
Logical; if |
x, y
|
Logical; if |
control |
A list of parameters for controlling the linkage error adjustment process. |
... |
Additional arguments passed to the underlying fitting function. |
This function attempts to extract the linked data from the adjustment
object. It supports both reference-based storage (via data_ref) and
direct list components (adjustment$data). If the data is not present
(e.g., NULL), the function will attempt to resolve variables from the
environment of the formula.
It applies the standard model.frame processing steps (formula parsing,
subsetting, NA handling) and dispatches the resulting design matrix and
response vector to the appropriate fitglm method.
An object representing the fitted model. The specific class and structure of the
returned object depend directly on the adjustment method provided:
If adjustment is of class adjELE, returns an object of class glmELE.
If adjustment is of class adjMixture, returns an object of class glmMixture.
If adjustment is of class adjMixBayes, returns an object of class glmMixBayes.
adjELE, adjMixture, adjMixBayes, glmELE, glmMixture, glmMixBayes
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object )# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object )
plsurvreg fits parametric survival models (Accelerated Failure Time models)
to linked data. It serves as a wrapper for the internal engines, compatible with
survreg specifications.
plsurvreg( formula, adjustment, subset, na.action, dist = "weibull", model = TRUE, x = FALSE, y = FALSE, control = list(), ... )plsurvreg( formula, adjustment, subset, na.action, dist = "weibull", model = TRUE, x = FALSE, y = FALSE, control = list(), ... )
formula |
A formula object, with the response on the left of a ~ operator,
and the terms on the right. The response must be a survival object as returned
by the |
adjustment |
An object inheriting from class |
subset |
An optional vector specifying a subset of observations. |
na.action |
A function for handling NAs. |
dist |
Character string specifying the survival distribution (currently it must be "weibull" or "gamma"). |
model |
Logical; if |
x, y
|
Logical; if |
control |
A list of control parameters. |
... |
Additional arguments passed to the internal engine. |
An object representing the fitted model. The specific class and structure of the
returned object depend directly on the adjustment method provided:
If adjustment is of class adjMixBayes, returns an object of class survregMixBayes.
library(survival) set.seed(202) n <- 200 # Simulate Weibull AFT data trt <- rbinom(n, 1, 0.5) true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, 0.05) true_obs_time <- pmin(true_time, cens_time) true_status <- as.numeric(true_time <= cens_time) # Induce linkage mismatch errors by... is_mismatch <- rbinom(n, 1, 0.2) # ~20% overall mismatch rate obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) # Shuffle time and status together for mismatched records shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt) # Specify the Bayesian Mixture Adjustment adj <- adjMixBayes(linked.data = linked_df) # Fit the Adjusted Parametric Survival Model fit <- plsurvreg( Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 2000, burnin.iterations = 500) )library(survival) set.seed(202) n <- 200 # Simulate Weibull AFT data trt <- rbinom(n, 1, 0.5) true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, 0.05) true_obs_time <- pmin(true_time, cens_time) true_status <- as.numeric(true_time <= cens_time) # Induce linkage mismatch errors by... is_mismatch <- rbinom(n, 1, 0.2) # ~20% overall mismatch rate obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) # Shuffle time and status together for mismatched records shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt) # Specify the Bayesian Mixture Adjustment adj <- adjMixBayes(linked.data = linked_df) # Fit the Adjusted Parametric Survival Model fit <- plsurvreg( Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 2000, burnin.iterations = 500) )
Computes linear predictors or risk scores from a fitted coxphELE model.
If newdata is provided, the method attempts to construct the model matrix
from the call stored in the object.
## S3 method for class 'coxphELE' predict(object, newdata = NULL, type = c("lp", "risk"), ...)## S3 method for class 'coxphELE' predict(object, newdata = NULL, type = c("lp", "risk"), ...)
object |
An object of class |
newdata |
Optional new data frame to obtain predictions for. If omitted, the linear predictors of the original data are returned. |
type |
The type of prediction required. Type |
... |
Additional arguments (currently ignored). |
If newdata is supplied, the function attempts to retrieve the formula
from object$call or object$formula. If the original model was fitted
using the internal engine directly without a wrapper that stores the call/formula,
prediction on new data will fail.
A numeric vector of predictions.
library(survival) set.seed(105) # Simulate linked data subject to mismatch error n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.7) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Induce 15% linkage error mis_idx <- sample(1:n, size = 0.15 * n) x1[mis_idx] <- x1[sample(mis_idx)] x2[mis_idx] <- x2[sample(mis_idx)] # Linked data linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = x1, x2 = x2 ) # Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj) # 1. Extract linear predictors for the original data lp_train <- predict(fit, type = "lp") head(lp_train) # 2. Predict hazard ratios (risk) for a new cohort new_cohort <- data.frame(x1 = c(0, 1.5, -1), x2 = c(0, 1, 1)) risk_scores <- predict(fit, newdata = new_cohort, type = "risk") print(risk_scores)library(survival) set.seed(105) # Simulate linked data subject to mismatch error n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.7) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Induce 15% linkage error mis_idx <- sample(1:n, size = 0.15 * n) x1[mis_idx] <- x1[sample(mis_idx)] x2[mis_idx] <- x2[sample(mis_idx)] # Linked data linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = x1, x2 = x2 ) # Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj) # 1. Extract linear predictors for the original data lp_train <- predict(fit, type = "lp") head(lp_train) # 2. Predict hazard ratios (risk) for a new cohort new_cohort <- data.frame(x1 = c(0, 1.5, -1), x2 = c(0, 1, 1)) risk_scores <- predict(fit, newdata = new_cohort, type = "risk") print(risk_scores)
Compute fitted values and predictions for the outcome model component of the mixture. The predictions are conditional on the latent status being a "correct match".
## S3 method for class 'coxphMixture' predict( object, newdata, type = c("lp", "risk", "expected", "survival"), se.fit = FALSE, na.action = stats::na.pass, reference = "strata", ... )## S3 method for class 'coxphMixture' predict( object, newdata, type = c("lp", "risk", "expected", "survival"), se.fit = FALSE, na.action = stats::na.pass, reference = "strata", ... )
object |
An object of class |
newdata |
Optional new data frame. If missing, predictions are for the original data. |
type |
The type of prediction.
|
se.fit |
Logical; whether to compute standard errors (based on the sandwich/Louis variance). |
na.action |
Function to handle missing values in |
reference |
Reference for centering (currently ignored, defaults to uncentered). |
... |
Additional arguments passed to methods. |
When newdata is supplied, the function constructs the model matrix using
the terms from the original fit. Standard errors are computed using the
estimated variance-covariance matrix of the mixture model coefficients.
For type = "expected" and "survival", the function reconstructs the
cumulative baseline hazard step function using the Breslow
estimator stored in the object and evaluates it at the time points found in
newdata.
A vector or matrix of predictions, or a list containing fit and
se.fit if standard errors are requested.
library(survival) set.seed(205) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) match_score <- runif(n, 0.5, 1.0) mis_idx <- which(rbinom(n, 1, prob = 1 - match_score) == 1) x1[mis_idx] <- x1[sample(mis_idx)] x2[mis_idx] <- x2[sample(mis_idx)] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = x1, x2 = x2, match_score = match_score ) # Fit the Cox PH Mixture Model # Note: We set `y = TRUE` to store the response for baseline hazard reconstruction adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, y = TRUE, control = list(max.iter = 15)) # 1. Extract linear predictors for the original data lp_train <- predict(fit, type = "lp") head(lp_train) # 2. Predict hazard ratios (risk) and expected events for a new cohort new_cohort <- data.frame( time = c(1.0, 2.5, 0.5), # Required for expected/survival types x1 = c(0, 1.5, -1), x2 = c(0, 1, 1) ) # Predict risk (exp(lp)) risk_scores <- predict(fit, newdata = new_cohort, type = "risk") print(risk_scores) # Predict expected number of events based on the reconstructed baseline hazard exp_events <- predict(fit, newdata = new_cohort, type = "expected") print(exp_events)library(survival) set.seed(205) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) match_score <- runif(n, 0.5, 1.0) mis_idx <- which(rbinom(n, 1, prob = 1 - match_score) == 1) x1[mis_idx] <- x1[sample(mis_idx)] x2[mis_idx] <- x2[sample(mis_idx)] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = x1, x2 = x2, match_score = match_score ) # Fit the Cox PH Mixture Model # Note: We set `y = TRUE` to store the response for baseline hazard reconstruction adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, y = TRUE, control = list(max.iter = 15)) # 1. Extract linear predictors for the original data lp_train <- predict(fit, type = "lp") head(lp_train) # 2. Predict hazard ratios (risk) and expected events for a new cohort new_cohort <- data.frame( time = c(1.0, 2.5, 0.5), # Required for expected/survival types x1 = c(0, 1.5, -1), x2 = c(0, 1, 1) ) # Predict risk (exp(lp)) risk_scores <- predict(fit, newdata = new_cohort, type = "risk") print(risk_scores) # Predict expected number of events based on the reconstructed baseline hazard exp_events <- predict(fit, newdata = new_cohort, type = "expected") print(exp_events)
glmELE ObjectsObtains predictions and optionally standard errors from a fitted glmELE object.
This method handles new data, factor level consistency, offsets, and confidence intervals.
## S3 method for class 'glmELE' predict( object, newdata = NULL, weight.matrix = NULL, type = c("link", "response"), se.fit = FALSE, interval = c("none", "confidence"), level = 0.95, na.action = stats::na.pass, ... )## S3 method for class 'glmELE' predict( object, newdata = NULL, weight.matrix = NULL, type = c("link", "response"), se.fit = FALSE, interval = c("none", "confidence"), level = 0.95, na.action = stats::na.pass, ... )
object |
An object of class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors from the object are used. |
weight.matrix |
Character string specifying the weighting method to use (e.g., "ratio", "LL", "BLUE"). Defaults to the first method found in the object. |
type |
The type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. |
se.fit |
Logical; if |
interval |
Type of interval calculation. Can be "none" or "confidence". |
level |
The confidence level required (default 0.95). |
na.action |
A function determining what should be done with missing values in |
... |
Additional arguments passed to methods. |
If se.fit = FALSE and interval = "none", a vector of predictions.
Otherwise, a list containing:
fit |
Predictions (or a matrix with columns |
se.fit |
Estimated standard errors (if requested). |
residual.scale |
The dispersion parameter used. |
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) predict(fit)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) predict(fit)
Predictions from a glmMixBayes model
## S3 method for class 'glmMixBayes' predict( object, newx, type = c("link", "response"), se.fit = FALSE, interval = c("none", "credible"), level = 0.95, ... )## S3 method for class 'glmMixBayes' predict( object, newx, type = c("link", "response"), se.fit = FALSE, interval = c("none", "credible"), level = 0.95, ... )
object |
A |
newx |
A numeric matrix of new observations (n_new x K) with columns aligned
to the design matrix |
type |
Either |
se.fit |
Logical; if |
interval |
Either |
level |
Probability level for the credible interval (default 0.95). |
... |
Not used. |
If se.fit = FALSE and interval = "none", a numeric vector of predicted values.
Otherwise, a matrix with columns for the fit, (optional) se.fit, and (optional)
credible interval bounds.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) newx <- cbind(1, poly(c(0.2, 0.5, 0.8), 3, raw = TRUE)) predict(fit, newx = newx, type = "response")data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) newx <- cbind(1, poly(c(0.2, 0.5, 0.8), 3, raw = TRUE)) predict(fit, newx = newx, type = "response")
Obtains predictions and optionally estimates standard errors of those predictions
from a fitted glmMixture object.
## S3 method for class 'glmMixture' predict( object, newdata = NULL, type = c("link", "response"), se.fit = FALSE, interval = c("none", "confidence"), level = 0.95, na.action = stats::na.pass, ... )## S3 method for class 'glmMixture' predict( object, newdata = NULL, type = c("link", "response"), se.fit = FALSE, interval = c("none", "confidence"), level = 0.95, na.action = stats::na.pass, ... )
object |
An object of class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
The type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. |
se.fit |
Logical switch indicating if standard errors are required. |
interval |
Type of interval calculation ("none" or "confidence"). |
level |
Confidence level. |
na.action |
Function determining what should be done with missing values in |
... |
Additional arguments (currently ignored). |
If newdata is omitted, the predictions are based on the data used for the fit.
In that case, type = "link" corresponds to object$linear.predictors and
type = "response" corresponds to object$fitted.values. If newdata is supplied, the function manually constructs the design matrix
from the terms object stored in the model. Standard errors are computed using the
sandwich covariance matrix (object$var).
If se.fit = FALSE, a vector of predictions.
If se.fit = TRUE, a list with components:
fit |
Predictions. |
se.fit |
Estimated standard errors. |
residual.scale |
A scalar giving the square root of the dispersion used in computing the standard errors. |
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) predict(fit)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) predict(fit)
Computes posterior predictions for each latent component of a
survMixBayes model. By default, predictions are returned on the
linear predictor scale for both components.
## S3 method for class 'survMixBayes' predict( object, newdata = NULL, se.fit = FALSE, interval = c("none", "credible"), level = 0.95, ... )## S3 method for class 'survMixBayes' predict( object, newdata = NULL, se.fit = FALSE, interval = c("none", "credible"), level = 0.95, ... )
object |
A |
newdata |
A numeric matrix of new observations ( |
se.fit |
Logical; if |
interval |
Either |
level |
Probability level for the credible interval (default 0.95). |
... |
Not used. |
Component 1 is interpreted as the correct-match component and component 2 as the incorrect-match component (after label-switching correction).
A list with two components, component1 and component2,
corresponding to the two latent mixture components.
If se.fit = FALSE and interval = "none", each element is a
numeric vector of posterior mean linear predictors.
Otherwise, each element is a matrix containing the fitted values and,
optionally, posterior SDs and credible interval bounds.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) # Create a design matrix for new covariate values newdata <- stats::model.matrix(~ trt, data = data.frame(trt = c(0, 1))) # Predict posterior mean linear predictors for each latent component preds <- predict(fit, newdata = newdata, se.fit = TRUE, interval = "credible") print(preds$component1) print(preds$component2)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) # Create a design matrix for new covariate values newdata <- stats::model.matrix(~ trt, data = data.frame(trt = c(0, 1))) # Predict posterior mean linear predictors for each latent component preds <- predict(fit, newdata = newdata, se.fit = TRUE, interval = "credible") print(preds$component1) print(preds$component2)
adjELE ObjectsProvides a concise summary of the adjustment object created by adjELE,
including linkage error assumptions, blocking structure, and weight estimation settings.
## S3 method for class 'adjELE' print(x, digits = 3, ...)## S3 method for class 'adjELE' print(x, digits = 3, ...)
x |
An object of class |
digits |
Integer; the number of significant digits to use when printing numeric values. Defaults to 3. |
... |
Additional arguments passed to methods. |
This method inspects the internal structure of the adjustment object. It calculates summaries for mismatch rates and audit sizes (e.g., means/ranges) if they vary across blocks, providing a snapshot of the error assumption complexity. It safely handles cases where the reference data is missing or empty.
Invisibly returns the input object x.
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") print(adj_object)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") print(adj_object)
adjMixBayes ObjectsProvides a concise summary of the Bayesian adjustment object created by
adjMixBayes.
## S3 method for class 'adjMixBayes' print(x, ...)## S3 method for class 'adjMixBayes' print(x, ...)
x |
An object of class |
... |
Additional arguments passed to methods. |
This method inspects the reference-based data environment to report the number of linked records without copying the full dataset. It safely handles cases where the linked data is unspecified (NULL). It also prints the user-specified priors or outlines the defaults that will be used.
Invisibly returns x.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) adj_obj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) # Implicitly calls print.adjMixBayes() adj_objdata(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) adj_obj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) # Implicitly calls print.adjMixBayes() adj_obj
adjMixture ObjectsProvides a concise summary of the adjustment object created by adjMixture,
including dataset dimensions, model specifications, and constraints.
## S3 method for class 'adjMixture' print(x, digits = 3, ...)## S3 method for class 'adjMixture' print(x, digits = 3, ...)
x |
An object of class |
digits |
Integer; the number of significant digits to use when printing numeric values (e.g., mismatch rates). Defaults to 3. |
... |
Additional arguments passed to methods. |
This method inspects the reference-based data environment to report the number of linked records without copying the full dataset. It considers cases where components (like constraints or safe matches) are unspecified.
Invisibly returns the input object x.
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Print specified adjustment print(adj_object)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Print specified adjustment print(adj_object)
Prints a short summary of the fitted CoxPH model with linkage error adjustment, displaying the call (if available) and the estimated coefficients.
## S3 method for class 'coxphELE' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'coxphELE' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
The number of significant digits to use when printing. |
... |
Additional arguments passed to |
Invisibly returns the input object x.
library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Print the basic model output print(fit)library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Print the basic model output print(fit)
Print a coxphMixture Object
## S3 method for class 'coxphMixture' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'coxphMixture' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
The number of significant digits to use. |
... |
Additional arguments. |
Invisibly returns the input object x.
library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Explicitly call the print method print(fit)library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Explicitly call the print method print(fit)
Prints the estimated contingency table (corrected for linkage error) and a summary of the adjustment parameters used by the mixture model.
## S3 method for class 'ctableMixture' print(x, digits = 3, ...)## S3 method for class 'ctableMixture' print(x, digits = 3, ...)
x |
An object of class |
digits |
Integer; the number of significant digits to use when printing numeric values. Defaults to 3. |
... |
Additional arguments passed to |
The argument x, invisibly.
set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Explicitly call the print method print(fit)set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Explicitly call the print method print(fit)
glmELE ObjectPrints the function call and the estimated coefficient matrices from a fitted
glmELE object.
## S3 method for class 'glmELE' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'glmELE' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
The number of significant digits to print. Defaults to
|
... |
Additional arguments passed to methods. |
Invisibly returns the input object x.
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) print(fit)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) print(fit)
Print a glmMixBayes model object
## S3 method for class 'glmMixBayes' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'glmMixBayes' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
Minimum number of significant digits to show. |
... |
Further arguments (unused). |
The input x, invisibly.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) print(fit)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) print(fit)
Print a glmMixture Object
## S3 method for class 'glmMixture' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'glmMixture' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
The number of significant digits to use. |
... |
Additional arguments. |
Invisibly returns the input object x.
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) print(fit)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) print(fit)
Print pooled regression results
## S3 method for class 'mi_link_pool_glm' print(x, digits = max(3L, getOption("digits") - 2L), ...)## S3 method for class 'mi_link_pool_glm' print(x, digits = max(3L, getOption("digits") - 2L), ...)
x |
An object of class |
digits |
the number of significant digits to print. |
... |
further arguments (unused). |
The input x, invisibly.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) pooled_fit <- mi_with( object = fit, data = lifem_small, formula = age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = gaussian() ) print(pooled_fit, digits = 4)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) pooled_fit <- mi_with( object = fit, data = lifem_small, formula = age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = gaussian() ) print(pooled_fit, digits = 4)
Print pooled Cox regression results
## S3 method for class 'mi_link_pool_survreg' print(x, digits = max(3L, getOption("digits") - 2L), ...)## S3 method for class 'mi_link_pool_survreg' print(x, digits = max(3L, getOption("digits") - 2L), ...)
x |
An object of class |
digits |
the number of significant digits to print. |
... |
further arguments (unused). |
The input x, invisibly.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) pooled_obj <- mi_with( object = fit, data = linked_df, formula = survival::Surv(time, status) ~ trt ) print(pooled_obj, digits = 4)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) pooled_obj <- mi_with( object = fit, data = linked_df, formula = survival::Surv(time, status) ~ trt ) print(pooled_obj, digits = 4)
Prints the model call and posterior mean regression coefficients for the first mixture component of the fitted survival model. In this package, component 1 is interpreted as the correct-match component and component 2 as the incorrect-match component.
## S3 method for class 'survMixBayes' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'survMixBayes' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class |
digits |
Minimum number of significant digits to show. |
... |
Further arguments (unused). |
The input x, invisibly.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) print(fit)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) print(fit)
Produces a summary of the fitted CoxPH model with linkage error adjustment, including coefficient estimates, hazard ratios, standard errors, z-statistics, and p-values.
## S3 method for class 'coxphELE' summary(object, conf.int = 0.95, ...)## S3 method for class 'coxphELE' summary(object, conf.int = 0.95, ...)
object |
An object of class |
conf.int |
The confidence level for the confidence intervals (default is 0.95). |
... |
Additional arguments (currently ignored). |
An object of class summary.coxphELE containing:
call |
The matched call. |
coefficients |
A matrix with columns for coefficients, hazard ratios (exp(coef)), standard errors, z-values, and p-values. |
conf.int |
A matrix of confidence intervals for the hazard ratios. |
n |
The number of observations. |
nevent |
The number of events. |
library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Generate and print the detailed statistical summary sum_fit <- summary(fit) print(sum_fit)library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Generate and print the detailed statistical summary sum_fit <- summary(fit) print(sum_fit)
summary method for class coxphMixture. Provides a detailed summary
of the fitted model, including coefficients, hazard ratios, standard errors,
z-statistics, and p-values for both the outcome model and the mismatch model.
## S3 method for class 'coxphMixture' summary(object, conf.int = 0.95, scale = 1, ...)## S3 method for class 'coxphMixture' summary(object, conf.int = 0.95, scale = 1, ...)
object |
An object of class |
conf.int |
The confidence level for the confidence intervals of the hazard ratios. |
scale |
Scale factor for the standard errors (default is 1). |
... |
Additional arguments. |
An object of class summary.coxphMixture containing:
call |
The function call. |
n |
Total number of observations. |
nevent |
Number of events. |
coefficients |
Matrix of coefficients for the outcome model. |
m.coefficients |
Matrix of coefficients for the mismatch model. |
conf.int |
Matrix of confidence intervals for the hazard ratios. |
logtest |
Log-likelihood information (Outcome Model). |
avgcmr |
The average posterior probability of a correct match. |
library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Print detailed statistical summary sum_fit <- summary(fit) print(sum_fit)library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Print detailed statistical summary sum_fit <- summary(fit) print(sum_fit)
Provides a detailed summary of the ctableMixture model fit, including
the estimated cell probabilities with standard errors, convergence status,
and a Chi-squared test of independence performed on the adjusted counts.
## S3 method for class 'ctableMixture' summary(object, ...)## S3 method for class 'ctableMixture' summary(object, ...)
object |
An object of class |
... |
Additional arguments (currently ignored). |
An object of class summary.ctableMixture containing:
call |
The function call. |
m.rate |
The assumed mismatch rate. |
ftable |
The estimated contingency table of correctly matched counts. |
coefficients |
A matrix containing estimates, standard errors, z-values, and p-values for cell probabilities. |
chisq |
The result of a Pearson's Chi-squared test on the adjusted table. |
converged |
Logical indicating if the EM algorithm converged. |
iterations |
Number of iterations performed. |
set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Generate the detailed summary object sum_fit <- summary(fit) # 5. Access specific components of the summary print(sum_fit$coefficients) print(sum_fit$chisq)set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Generate the detailed summary object sum_fit <- summary(fit) # 5. Access specific components of the summary print(sum_fit$coefficients) print(sum_fit$chisq)
glmELE ObjectSummarizes the results from a glmELE fit, providing coefficient estimates,
standard errors, test statistics, and p-values for each weighting method used.
## S3 method for class 'glmELE' summary(object, ...)## S3 method for class 'glmELE' summary(object, ...)
object |
An object of class |
... |
Additional arguments passed to methods. |
An object of class "summary.glmELE", which is a list containing:
call |
The matched call. |
family |
The family object used. |
coefficients |
A list of matrices, one per weighting method, containing estimates, SEs, t/z values, and p-values. |
dispersion |
The estimated dispersion parameter(s). |
deviance |
The deviance of the fitted model. |
df.residual |
The residual degrees of freedom. |
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) summary(fit)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) summary(fit)
Summary method for glmMixBayes models
## S3 method for class 'glmMixBayes' summary(object, ...)## S3 method for class 'glmMixBayes' summary(object, ...)
object |
An object of class |
... |
Not used. |
An object of class "summary.glmMixBayes", which is printed with a custom method.
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) summary(fit)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) summary(fit)
summary method for class glmMixture.
## S3 method for class 'glmMixture' summary(object, dispersion = NULL, ...)## S3 method for class 'glmMixture' summary(object, dispersion = NULL, ...)
object |
An object of class |
dispersion |
The dispersion parameter for the family used. If NULL, it is inferred from object. |
... |
Additional arguments. |
An object of class summary.glmMixture containing:
call |
The component from object. |
family |
The component from object. |
df.residual |
The residual degrees of freedom. |
coefficients |
Matrix of coefficients for the outcome model. |
m.coefficients |
Matrix of coefficients for the mismatch model. |
dispersion |
Estimated dispersion parameter. |
cov.unscaled |
The estimated covariance matrix. |
match.prob |
The posterior match probabilities. |
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) summary(fit)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) summary(fit)
Computes posterior summaries for the regression coefficients, mixing weight,
and component-specific distribution parameters in a fitted
survMixBayes model. Throughout, component 1 is interpreted as the
correct-match component and component 2 as the incorrect-match component.
## S3 method for class 'survMixBayes' summary(object, probs = c(0.025, 0.5, 0.975), ...)## S3 method for class 'survMixBayes' summary(object, probs = c(0.025, 0.5, 0.975), ...)
object |
An object of class |
probs |
Numeric vector of probabilities used to compute posterior
quantiles for the model parameters. The default,
|
... |
Further arguments (unused). |
An object of class summary.survMixBayes containing posterior
quantile summaries for the regression coefficients in both mixture
components, the mixing weight, and any family-specific distribution
parameters included in the fitted model. Component 1 corresponds to the
correct-match component and component 2 to the incorrect-match component.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) fit_summary <- summary(fit, probs = c(0.025, 0.5, 0.975)) print(fit_summary)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) fit_summary <- summary(fit, probs = c(0.025, 0.5, 0.975)) print(fit_summary)
Fits a Bayesian two-component parametric survival regression model using Stan. Each observation is assumed to arise from one of two latent components with component-specific survival regression parameters.
survregMixBayes( X, y, dist = "weibull", priors = NULL, control = list(iterations = 10000, burnin.iterations = 1000, seed = sample.int(.Machine$integer.max, 1), cores = getOption("mc.cores", 1L)), ... )survregMixBayes( X, y, dist = "weibull", priors = NULL, control = list(iterations = 10000, burnin.iterations = 1000, seed = sample.int(.Machine$integer.max, 1), cores = getOption("mc.cores", 1L)), ... )
X |
A numeric design matrix ( |
y |
A survival response. This can be either a two-column numeric matrix
with columns |
dist |
Character string specifying the parametric survival distribution
used for both mixture components. Supported values are |
priors |
A named Intercept and slope priors are decoupled. Use |
control |
A named
Values supplied through |
... |
Optional overrides for elements of |
The function supports "gamma" and "weibull" component
distributions, with both components sharing the same family. Right-censored
survival outcomes are supported.
Posterior draws are returned for the component-specific regression parameters and mixing weight. To improve interpretability of posterior summaries, the function applies a post-processing step that aligns component labels across posterior draws.
An object of class "survMixBayes" containing (at least):
m_samplesPosterior draws of aligned latent component labels (matrix of size draws × N), where component 1 corresponds to the correct-match component and component 2 to the incorrect-match component.
estimates$coefficientsPosterior draws of regression coefficients for the correct-match component (component 1; draws × K).
estimates$m.coefficientsPosterior draws of regression coefficients for the incorrect-match component (component 2; draws × K).
estimates$thetaPosterior draws of the mixing weight for the correct-match component (component 1; vector of length draws).
estimates$shapePosterior draws of the shape parameter for the correct-match component (component 1; family-specific).
estimates$m.shapePosterior draws of the shape parameter for the incorrect-match component (component 2; family-specific).
estimates$scalePosterior draws of the scale parameter for the correct-match component (component 1; Weibull only).
estimates$m.scalePosterior draws of the scale parameter for the incorrect-match component (component 2; Weibull only).
familyThe survival distribution used in the model.
callThe matched function call.
Mixture models are invariant to permutations of component labels, which can lead to label switching in MCMC output. To ensure interpretable posterior summaries, this function applies a post-processing step that aligns component labels across posterior draws.
First, an optional global swap of labels (1 and 2) is performed
if component 2 is more frequent overall. Then, labels are
aligned across draws using the ECR-ITERATIVE-1
relabeling algorithm.
Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115–129. doi:10.1002/sim.6586
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 795–809. doi:10.1111/1467-9868.00265
Papastamoulis, P. (2016). label.switching: An R package for dealing with the label switching problem in MCMC outputs. Journal of Statistical Software, 69(1), 1–24. doi:10.18637/jss.v069.c01
# Example: Bayesian mixture survival model fit to linked survival data # with induced linkage mismatch errors # 1. Simulate a linked survival dataset set.seed(301) n <- 150 X <- matrix(rnorm(n * 2), ncol = 2) colnames(X) <- c("x1", "x2") # Generate survival times from a Weibull AFT model true_time <- rweibull( n, shape = 1.5, scale = exp(0.5 * X[, 1] - 0.5 * X[, 2]) ) # Apply right-censoring cens_time <- rexp(n, rate = 0.1) event <- as.integer(true_time <= cens_time) obs_time <- pmin(true_time, cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] event[mismatch_idx] <- event[shuffled] y <- cbind(time = obs_time, event = event) # 2. Fit the Bayesian two-component mixture survival model # Note: Iterations are set artificially low for run time fit <- survregMixBayes( X = X, y = y, dist = "weibull", control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # 3. Inspect posterior summaries # (Label switching is handled automatically) cat("Component 1 (Correct Links):\n") print(colMeans(fit$estimates$coefficients)) cat("Component 2 (Incorrect Links):\n") print(colMeans(fit$estimates$m.coefficients)) cat("Estimated probability of correct linkage:\n") print(mean(fit$estimates$theta))# Example: Bayesian mixture survival model fit to linked survival data # with induced linkage mismatch errors # 1. Simulate a linked survival dataset set.seed(301) n <- 150 X <- matrix(rnorm(n * 2), ncol = 2) colnames(X) <- c("x1", "x2") # Generate survival times from a Weibull AFT model true_time <- rweibull( n, shape = 1.5, scale = exp(0.5 * X[, 1] - 0.5 * X[, 2]) ) # Apply right-censoring cens_time <- rexp(n, rate = 0.1) event <- as.integer(true_time <= cens_time) obs_time <- pmin(true_time, cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] event[mismatch_idx] <- event[shuffled] y <- cbind(time = obs_time, event = event) # 2. Fit the Bayesian two-component mixture survival model # Note: Iterations are set artificially low for run time fit <- survregMixBayes( X = X, y = y, dist = "weibull", control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # 3. Inspect posterior summaries # (Label switching is handled automatically) cat("Component 1 (Correct Links):\n") print(colMeans(fit$estimates$coefficients)) cat("Component 2 (Incorrect Links):\n") print(colMeans(fit$estimates$m.coefficients)) cat("Estimated probability of correct linkage:\n") print(mean(fit$estimates$theta))
Extracts the variance-covariance matrix of the main parameters (outcome model coefficients)
from a fitted coxphELE model.
## S3 method for class 'coxphELE' vcov(object, ...)## S3 method for class 'coxphELE' vcov(object, ...)
object |
An object of class |
... |
Additional arguments (currently ignored). |
A matrix of the estimated covariances between the parameter estimates.
library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Extract the sandwich variance-covariance matrix vmat <- vcov(fit) print(vmat)library(survival) set.seed(104) n <- 200 # 1. Simulate covariates age_centered <- rnorm(n, 0, 5) treatment <- rbinom(n, 1, 0.5) # 2. Simulate true survival times true_time <- rexp(n, rate = exp(0.05 * age_centered - 0.6 * treatment)) cens_time <- rexp(n, rate = 0.2) time <- pmin(true_time, cens_time) status <- as.numeric(true_time <= cens_time) # 3. Induce 15% Exchangeable Linkage Error (ELE) mis_idx <- sample(1:n, size = floor(0.15 * n)) linked_age <- age_centered linked_trt <- treatment # False links drawn uniformly from the target population false_link_idx <- sample(1:n, size = length(mis_idx), replace = TRUE) linked_age[mis_idx] <- age_centered[false_link_idx] linked_trt[mis_idx] <- treatment[false_link_idx] linked_data <- data.frame(time = time, status = status, age = linked_age, treatment = linked_trt) # 4. Fit the adjusted Cox PH model adj <- adjELE(linked.data = linked_data, m.rate = 0.15) fit <- plcoxph(Surv(time, status) ~ age + treatment, adjustment = adj) # 5. Extract the sandwich variance-covariance matrix vmat <- vcov(fit) print(vmat)
Extracts the variance-covariance matrix of the main parameters from a fitted
coxphMixture object. The matrix is estimated using Louis' method (1982)
to account for the missing data structure (latent match status) inherent in the
mixture model.
## S3 method for class 'coxphMixture' vcov(object, ...)## S3 method for class 'coxphMixture' vcov(object, ...)
object |
An object of class |
... |
Additional arguments (currently ignored). |
A matrix of the estimated covariances between the parameter estimates.
The rows and columns correspond to the outcome model coefficients (beta)
and the mismatch model coefficients (gamma).
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 44(2), 226-233.
library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Extract the variance-covariance matrix # Note: For outcome coefficients (beta) and mismatch coefficients (gamma) vmat <- vcov(fit) print(vmat)library(survival) set.seed(201) # Simulate survival data (N = 200) n <- 200 x1 <- rnorm(n) x2 <- rbinom(n, 1, 0.5) true_time <- rexp(n, rate = exp(0.5 * x1 - 0.5 * x2)) cens_time <- rexp(n, rate = 0.5) # Simulate auxiliary match scores and linkage errors # Lower match scores correspond to a higher probability of mismatch match_score <- runif(n, 0.5, 1.0) is_mismatch <- rbinom(n, 1, prob = 1 - match_score) # Induce linkage errors by shuffling covariates of mismatched records linked_x1 <- x1 linked_x2 <- x2 mis_idx <- which(is_mismatch == 1) shuffled_idx <- sample(mis_idx) linked_x1[mis_idx] <- x1[shuffled_idx] linked_x2[mis_idx] <- x2[shuffled_idx] linked_data <- data.frame( time = pmin(true_time, cens_time), status = as.numeric(true_time <= cens_time), x1 = linked_x1, x2 = linked_x2, match_score = match_score ) # Fit the Cox PH Mixture Model (Slawski et al., 2023) adj <- adjMixture(linked.data = linked_data, m.formula = ~ match_score) fit <- plcoxph(Surv(time, status) ~ x1 + x2, adjustment = adj, control = list(max.iter = 15)) # Extract the variance-covariance matrix # Note: For outcome coefficients (beta) and mismatch coefficients (gamma) vmat <- vcov(fit) print(vmat)
Extracts the estimated variance-covariance matrix of the cell probabilities
from a fitted ctableMixture object. The variance is estimated using
the observed information matrix (via the Hessian of the mixture log-likelihood).
## S3 method for class 'ctableMixture' vcov(object, ...)## S3 method for class 'ctableMixture' vcov(object, ...)
object |
An object of class |
... |
Additional arguments (currently ignored). |
A matrix of the estimated covariances between the cell probability estimates. The row and column names correspond to the cells of the table in row-major order (e.g., "(Row1, Col1)", "(Row1, Col2)", ...).
set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Extract the variance-covariance matrix of the cell probabilities vmat <- vcov(fit) print(vmat)set.seed(125) n <- 300 # 1. Simulate true categorical data with dependency exposure <- sample(c("low", "high"), n, replace = TRUE) # Induce dependency - High exposure -> higher disease probability prob_disease <- ifelse(exposure == "high", 0.7, 0.3) true_disease <- ifelse(runif(n) < prob_disease, "yes", "no") # 2. Induce 15% linkage error mis_idx <- sample(1:n, size = floor(0.15 * n)) obs_disease <- true_disease obs_disease[mis_idx] <- sample(obs_disease[mis_idx]) linked_df <- data.frame(exposure = exposure, disease = obs_disease) # 3. Fit the adjusted contingency table model adj <- adjMixture(linked.data = linked_df, m.rate = 0.15) fit <- plctable(~ exposure + disease, adjustment = adj) # 4. Extract the variance-covariance matrix of the cell probabilities vmat <- vcov(fit) print(vmat)
glmELE ObjectExtracts the variance-covariance matrix of the main parameters for a specific weighting method.
## S3 method for class 'glmELE' vcov(object, weight.matrix = NULL, ...)## S3 method for class 'glmELE' vcov(object, weight.matrix = NULL, ...)
object |
An object of class |
weight.matrix |
Character string specifying which weighting method to return. Defaults to the first method found in the object. |
... |
Additional arguments passed to methods. |
A matrix of the estimated covariances between the parameter estimates.
data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) vcov(fit)data(brfss, package = "postlink") adj_object <- adjELE(linked.data = brfss, m.rate = unique(brfss$m.rate), blocks = imonth, weight.matrix = "BLUE") fit <- plglm(Weight ~ Height + Physhlth + Menthlth + Exerany, family = "gaussian", adjustment = adj_object) vcov(fit)
Posterior covariance matrix for glmMixBayes coefficients
## S3 method for class 'glmMixBayes' vcov(object, ...)## S3 method for class 'glmMixBayes' vcov(object, ...)
object |
A |
... |
Not used. |
Posterior covariance matrix of the regression coefficients for component 1 (the correct-match component).
data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) vcov(fit)data(lifem) # lifem data preprocessing # For computational efficiency in the example, we work with a subset of the lifem data. lifem <- lifem[order(-(lifem$commf + lifem$comml)), ] lifem_small <- rbind( head(subset(lifem, hndlnk == 1), 100), head(subset(lifem, hndlnk == 0), 20) ) x <- cbind(1, poly(lifem_small$unit_yob, 3, raw = TRUE)) y <- lifem_small$age_at_death adj <- adjMixBayes( linked.data = lifem_small, priors = list(theta = "beta(2, 2)") ) fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj, control = list( iterations = 200, burnin.iterations = 100, seed = 123 ) ) vcov(fit)
Returns the variance-covariance matrix of the main parameters of a fitted
glmMixture object. The matrix is estimated using a sandwich estimator
to account for the mixture structure.
## S3 method for class 'glmMixture' vcov(object, ...)## S3 method for class 'glmMixture' vcov(object, ...)
object |
An object of class |
... |
Additional arguments (currently ignored). |
A matrix of the estimated covariances between the parameter estimates. Row and column names correspond to the parameter names (coefficients, dispersion, etc.).
# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) vcov(fit)# Load the LIFE-M demo dataset data(lifem) # Phase 1: Adjustment Specification # We model the correct match indicator via logistic regression using # name commonness scores (commf, comml) and a 5% expected mismatch rate. adj_object <- adjMixture( linked.data = lifem, m.formula = ~ commf + comml, m.rate = 0.05, safe.matches = hndlnk ) # Phase 2: Estimation & Inference # Fit a Gaussian regression model utilizing a cubic polynomial for year of birth. fit <- plglm( age_at_death ~ poly(unit_yob, 3, raw = TRUE), family = "gaussian", adjustment = adj_object ) vcov(fit)
Returns the empirical posterior covariance matrix of the regression
coefficients for component 1 of a fitted survMixBayes model.
In this package, component 1 is interpreted as the correct-match component.
## S3 method for class 'survMixBayes' vcov(object, ...)## S3 method for class 'survMixBayes' vcov(object, ...)
object |
A |
... |
Further arguments (unused). |
Posterior covariance matrix of the regression coefficients for component 1, interpreted as the correct-match component.
set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # Extract the empirical posterior covariance matrix for component 1 vcov_mat <- vcov(fit) print(vcov_mat)set.seed(301) n <- 150 trt <- rbinom(n, 1, 0.5) # Simulate Weibull AFT data true_time <- rweibull(n, shape = 1.5, scale = exp(1 + 0.8 * trt)) cens_time <- rexp(n, rate = 0.1) true_obs_time <- pmin(true_time, cens_time) true_status <- as.integer(true_time <= cens_time) # Induce linkage mismatch errors in approximately 20% of records is_mismatch <- rbinom(n, 1, 0.2) obs_time <- true_obs_time obs_status <- true_status mismatch_idx <- which(is_mismatch == 1) shuffled <- sample(mismatch_idx) obs_time[mismatch_idx] <- obs_time[shuffled] obs_status[mismatch_idx] <- obs_status[shuffled] linked_df <- data.frame(time = obs_time, status = obs_status, trt = trt) adj <- adjMixBayes(linked.data = linked_df) fit <- plsurvreg( survival::Surv(time, status) ~ trt, dist = "weibull", adjustment = adj, control = list(iterations = 200, burnin.iterations = 100, seed = 123) ) # Extract the empirical posterior covariance matrix for component 1 vcov_mat <- vcov(fit) print(vcov_mat)