Run the Variational Bayes patient phenotyping model
runModel(
  biomarkers,
  gmm_X,
  logit_X,
  gmm_delta = 1e-06,
  logit_delta = 1e-16,
  gmm_maxiters = 200,
  logit_maxiters = 10000,
  gmm_init = "kmeans",
  gmm_initParams = NULL,
  gmm_prior = NULL,
  logit_prior = NULL,
  gmm_stopIfELBOReverse = FALSE,
  gmm_verbose = FALSE,
  logit_verbose = FALSE
)
biomarkers - The EHR variables that are biomarkers: a vector of data column names corresponding to the biomarker variables.
gmm_X - The n x p data matrix for the VB GMM (or a data frame that will be converted to a matrix).
logit_X - The input design matrix for the VB logit. The intercept column is assumed to be included.
gmm_delta - Change in ELBO that triggers stopping of the VB GMM algorithm.
logit_delta - Change in ELBO that triggers stopping of the VB logit algorithm.
gmm_maxiters - The maximum number of iterations for the VB GMM.
logit_maxiters - The maximum number of iterations for the VB logit.
gmm_init - Method used to initialise the clusters: one of "random", "kmeans" or "dbscan".
gmm_initParams - Parameters for an initialiser that requires its own parameters, e.g. dbscan requires 'eps' and 'minPts'.
gmm_prior - An informative prior for the GMM.
logit_prior - An informative prior for the logit.
gmm_stopIfELBOReverse - Stop the VB GMM iterations if the ELBO reverses direction (TRUE or FALSE).
gmm_verbose - Print out information per VB GMM iteration to track progress in case of long-running experiments.
logit_verbose - Print out information per VB logit iteration to track progress in case of long-running experiments.
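The two stopping controls described above (a minimum change in ELBO, and an optional stop when the ELBO reverses direction) can be illustrated with a small sketch. The helper `should_stop()` and the ELBO values are hypothetical and are not part of VBphenoR; the package's internal test may differ, for example in how it measures the change.

```r
# Hypothetical helper, NOT part of VBphenoR: decide whether to stop,
# given the ELBO values recorded so far.
should_stop <- function(elbo, delta = 1e-6, stop_if_reverse = FALSE) {
  n <- length(elbo)
  if (n < 2) return(FALSE)                 # need two values to measure a change
  change <- elbo[n] - elbo[n - 1]
  if (stop_if_reverse && change < 0) {
    return(TRUE)                           # the ELBO decreased: reversal
  }
  abs(change) < delta                      # change small enough: converged
}

# Made-up ELBO trace, for illustration only
elbo_trace <- c(-120.0, -101.5, -100.2, -100.4)

should_stop(elbo_trace[1:3])                      # still improving
should_stop(elbo_trace, stop_if_reverse = TRUE)   # last step went down
```

Under this reading, a very small delta combined with a low iteration cap and a reversal stop (as in the SCD example) means that the reversal check or the cap, rather than the delta threshold, is likely to end the run.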
A list containing:
prevalence - The mean probability of latent phenotype given the data and priors.
biomarker_shift - A data frame containing the biomarker shifts from normal for the phenotype.
gmm - The VB GMM results. For details see help(vb_gmm_cavi).
logit - The VB Logit results. For details see help(logit_CAVI).
if (FALSE) {
##Example 1: Use the internal Sickle Cell Disease data to find the rare
## phenotype. SCD is extremely rare so we use DBSCAN to initialise
## the VB GMM. We also use an informative prior for the mixing
## coefficient and stop iterations when the ELBO starts to reverse
## so that we stop when the minor (SCD) component is reached.
library(data.table)
# Load the SCD example data supplied with the VBphenoR package
data(scd_cohort)
# We will use the SCD biomarkers to discover the SCD latent class.
# X1 is the data matrix for the VB GMM.
X1 <- scd_cohort[,.(CBC,RC)]
# We need to supply DBSCAN hyper-parameters as we will initialise VBphenoR
# with DBSCAN. See help(dbscan) in the dbscan package for details of these
# parameters.
initParams <- c(0.15, 5)
names(initParams) <- c('eps','minPts')
# Set an informative prior for the VB GMM mixing coefficient alpha
# hyper-parameter
prior_gmm <- list(
  alpha = 0.001
)
# Set informative priors for the beta coefficients of the VB logit
prior_logit <- list(mu=c(1,
                         mean(scd_cohort$age),
                         mean(scd_cohort$highrisk),
                         mean(scd_cohort$CBC),
                         mean(scd_cohort$RC)),
                    Sigma=diag(1,5))         # Simplest isotropic case
# X2 is the design matrix for the VB logit
X2 <- scd_cohort[,.(age,highrisk,CBC,RC)]
X2[,age:=as.numeric(age)]
X2[,highrisk:=as.numeric(highrisk)]
X2[,Intercept:=1]
setcolorder(X2, c("Intercept","age","highrisk","CBC","RC"))
# Run the patient phenotyping model
# Need to state what columns are the biomarkers
biomarkers <- c('CBC', 'RC')
set.seed(123)
pheno_result <- runModel(biomarkers,
                         gmm_X=X1, gmm_init="dbscan",
                         gmm_initParams=initParams,
                         gmm_maxiters=20, gmm_prior=prior_gmm,
                         gmm_stopIfELBOReverse=TRUE,
                         logit_X=X2, logit_prior=prior_logit
)
# Biomarker shifts for phenotype of interest
pheno_result$biomarker_shift
}
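The example builds the logit design matrix with data.table and adds the intercept column by hand. As a minimal sketch, the same layout can be produced in base R with `model.matrix()`. The data frame `df` below is synthetic stand-in data, and whether `runModel` accepts this matrix in place of the data.table used above is an assumption that has not been checked against the package.

```r
# Synthetic stand-in data (NOT the scd_cohort data); the column names
# mirror those used in the example.
set.seed(1)
df <- data.frame(
  age      = sample(20:80, 6, replace = TRUE),
  highrisk = sample(0:1, 6, replace = TRUE),
  CBC      = rnorm(6),
  RC       = rnorm(6)
)

# model.matrix() returns a numeric matrix with a leading column of ones
# named "(Intercept)"; rename it to match the example's column name.
X2_base <- model.matrix(~ age + highrisk + CBC + RC, data = df)
colnames(X2_base)[1] <- "Intercept"

# The prior mean needs one entry per design-matrix column, so a prior
# built like prior_logit in the example would have length ncol(X2_base).
mu0 <- c(1, colMeans(X2_base[, -1]))
stopifnot(length(mu0) == ncol(X2_base))
```

Building the matrix this way keeps the column order fixed by the formula, so the entries of the prior mean line up with the columns without a separate reordering step.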