Run the Variational Bayes patient phenotyping model

runModel(
  biomarkers,
  gmm_X,
  logit_X,
  gmm_delta = 1e-06,
  logit_delta = 1e-16,
  gmm_maxiters = 200,
  logit_maxiters = 10000,
  gmm_init = "kmeans",
  gmm_initParams = NULL,
  gmm_prior = NULL,
  logit_prior = NULL,
  gmm_stopIfELBOReverse = FALSE,
  gmm_verbose = FALSE,
  logit_verbose = FALSE
)

Arguments

biomarkers: A character vector of data column names identifying the EHR variables that are biomarkers.
gmm_X: n x p data matrix (or data frame that will be converted to a matrix).
logit_X: The input design matrix for the VB logit. The intercept column is assumed to be included already.
gmm_delta: Convergence threshold for the VB GMM: the algorithm stops when the change in ELBO falls below this value.
logit_delta: Convergence threshold for the VB logit: the algorithm stops when the change in ELBO falls below this value.
gmm_maxiters: The maximum number of iterations for the VB GMM.
logit_maxiters: The maximum number of iterations for the VB logit.
gmm_init: The method used to initialise the clusters: one of "random", "kmeans" or "dbscan".
gmm_initParams: Named parameters for an initialiser that needs its own settings, e.g. dbscan requires 'eps' and 'minPts'.
gmm_prior: An informative prior for the GMM.
logit_prior: An informative prior for the logit.
gmm_stopIfELBOReverse: Stop the VB iterations if the ELBO reverses direction (TRUE or FALSE).
gmm_verbose: Print out information per iteration to track progress in case of long-running experiments.
logit_verbose: Print out information per iteration to track progress in case of long-running experiments.
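
The interaction between the delta, maxiters and gmm_stopIfELBOReverse arguments can be illustrated with a small stand-alone sketch. It does not call VBphenoR: stop_iteration() is a hypothetical helper and the ELBO trace is made up. It assumes that "change in ELBO" means the absolute difference between consecutive iterations and that a "reversal" means the ELBO decreasing; the package's internal rules may differ.

```r
# Hypothetical helper (not part of VBphenoR): given a trace of ELBO values,
# return the iteration at which a VB loop would stop under the assumed rules.
stop_iteration <- function(elbo, delta = 1e-6, maxiters = 200,
                           stopIfELBOReverse = FALSE) {
  n <- min(length(elbo), maxiters)
  if (n < 2) return(n)
  for (i in 2:n) {
    change <- elbo[i] - elbo[i - 1]
    # Assumed reversal rule: the ELBO went down instead of up
    if (stopIfELBOReverse && change < 0) return(i)
    # Assumed convergence rule: absolute change smaller than delta
    if (abs(change) < delta) return(i)
  }
  n  # iteration budget (or the trace) was exhausted first
}

# Made-up ELBO trace: rises, dips at iteration 5, then flattens out
elbo_trace <- c(-120, -80, -60, -55, -57, -56.9999999, -56.9999999)

stop_iteration(elbo_trace)                            # stops on delta
stop_iteration(elbo_trace, stopIfELBOReverse = TRUE)  # stops at the dip
stop_iteration(elbo_trace, maxiters = 3)              # stops on maxiters
```

Under these assumptions, stopping on reversal ends the run earlier than waiting for the change to fall below delta, which matches the use of a small gmm_maxiters together with gmm_stopIfELBOReverse=TRUE when searching for a minor component.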

Value

A list containing:

prevalence - The mean probability of the latent phenotype given the data and priors.
biomarker_shift - A data frame containing the biomarker shifts from normal for the phenotype.
gmm - The VB GMM results. For details see help(vb_gmm_cavi).
logit - The VB Logit results. For details see help(logit_CAVI).

Examples

if (FALSE) {
##Example 1: Use the internal Sickle Cell Disease data to find the rare
##           phenotype.  SCD is extremely rare so we use DBSCAN to initialise
##           the VB GMM. We also use an informative prior for the mixing
##           coefficient and stop iterations when the ELBO starts to reverse
##           so that we stop when the minor (SCD) component is reached.

library(data.table)

# Load the SCD example data supplied with the VBphenoR package
data(scd_cohort)

# We will use the SCD biomarkers to discover the SCD latent class.
# X1 is the data matrix for the VB GMM.
X1 <- scd_cohort[,.(CBC,RC)]

# We need to supply DBSCAN hyper-parameters as we will initialise VBphenoR
# with DBSCAN. See help(dbscan) for details of these parameters.
initParams <- c(eps = 0.15, minPts = 5)

# Set an informative prior for the VB GMM mixing coefficient alpha
# hyper-parameter
prior_gmm <- list(
  alpha = 0.001
)

# Set informative priors for the beta coefficients of the VB logit
prior_logit <- list(
  mu = c(1,
         mean(scd_cohort$age),
         mean(scd_cohort$highrisk),
         mean(scd_cohort$CBC),
         mean(scd_cohort$RC)),
  Sigma = diag(1, 5)  # Simplest isotropic case
)

# X2 is the design matrix for the VB logit
X2 <- scd_cohort[,.(age,highrisk,CBC,RC)]
X2[,age:=as.numeric(age)]
X2[,highrisk:=as.numeric(highrisk)]
X2[,Intercept:=1]
setcolorder(X2, c("Intercept","age","highrisk","CBC","RC"))

# Run the patient phenotyping model

# State which columns are the biomarkers
biomarkers <- c('CBC', 'RC')
set.seed(123)

pheno_result <- runModel(biomarkers,
                        gmm_X=X1, gmm_init="dbscan",
                        gmm_initParams=initParams,
                        gmm_maxiters=20, gmm_prior=prior_gmm,
                        gmm_stopIfELBOReverse=TRUE,
                        logit_X=X2, logit_prior=prior_logit
)

# Biomarker shifts for phenotype of interest
pheno_result$biomarker_shift
}
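
The input preparation in Example 1 relies on data.table syntax. As a rough base-R sketch of the same shapes, the code below builds a biomarker matrix and a design matrix with an explicit intercept column from a synthetic data frame. The toy data frame and all its values are made up for illustration, and whether runModel treats a plain matrix for logit_X exactly as it treats the data.table above is an assumption.

```r
# Synthetic stand-in for a cohort table; every value here is made up.
set.seed(1)
toy <- data.frame(
  age      = sample(18:90, 6, replace = TRUE),
  highrisk = c(0L, 1L, 0L, 0L, 1L, 0L),
  CBC      = rnorm(6),
  RC       = rnorm(6)
)

# GMM input: biomarker columns only, as a numeric n x p matrix
biomarkers <- c("CBC", "RC")
gmm_X <- as.matrix(toy[, biomarkers])

# Logit input: explicit intercept column first, then the covariates
covariates <- c("age", "highrisk", "CBC", "RC")
logit_X <- cbind(Intercept = 1, as.matrix(toy[, covariates]))

# A prior mean vector for the logit needs one entry per design-matrix column
prior_mu <- c(1, colMeans(toy[, covariates]))
stopifnot(length(prior_mu) == ncol(logit_X))

dim(gmm_X)         # n rows, one column per biomarker
colnames(logit_X)  # "Intercept" followed by the covariate names
```

Keeping the intercept as the first column mirrors the setcolorder() call in the example, so the entries of the prior mean line up with the design-matrix columns in the same order.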