Plotting logistic regression curves
Logistic regression models can be implemented in Monolix to identify and estimate predictor parameters and covariate effects that influence the probability of a binary outcome.
Probability vs Covariate
The following R function plotLogRegVsCovariate()
function generates a visualization of how changes in a selected covariate influence the predicted probabilities from the logistic regression model.
The function works by varying one covariate of interest while holding all other covariates constant at user-defined reference values. It prepares a modified covariate dataset, runs a Simulx simulation with the specified number of replicates, and extracts the simulation results. It then summarizes the predicted probabilities (e.g., median, 5th, and 95th percentiles) across replicates and creates a plot showing the relationship between the covariate and the simulated response probability.
The output is a ggplot object that visualizes the median predicted probability as a function of the covariate, with a shaded ribbon indicating uncertainty bounds and annotated with reference covariate values. This function is especially useful for exploring covariate sensitivity in model-based simulations and generating informative visual summaries for reports or publications.
The function plotLogRegVsCovariate
takes the following arguments:
covariate
(string) - The name of the covariate you want to vary in the simulation. This is the main variable of interest for which the effect on predicted probability will be assessed.ref_values
(named list) - A list of reference values for all other covariates that should be held constant during the simulation. Each element must be named according to the covariate it corresponds to.df_cov
(data frame) - A data frame containing covariate values. The specified covariate will vary according to this input, while others will be overwritten with their reference values fromref_values
.path.data
(string) - File path where the modified covariate data CSV will be saved for Simulx to use. Should end with a slash (e.g.,"./simdata/"
).nb_reps
(integer, default = 500) - The number of replicate simulations to perform. More replicates improve the precision of estimated prediction intervals.hex
(string, default = "#3A5FCD") - The hex code for the color to be used in the plot. Controls both the line and ribbon fill color.labels
(named list, default = empty list) - Optional pretty labels for covariates to be used in the plot axes and subtitle. Keys should match covariate names; values are the display labels.
plotLogRegVsCovariate <- function(
covariate, # covariate of interest
ref_values, # named list of reference values for other covariates
df_cov,
path.data,
nb_reps = 500, # default reps if no nb provided
hex = "#3A5FCD", # default color if none provided
labels = list())
{
# Prepare covariate data: vary `covariate`, fix others to reference
df_cov_c <- df_cov
for (ref_var in names(ref_values)) {
if (ref_var != covariate) {
df_cov_c[[ref_var]] <- ref_values[[ref_var]]
}
}
# Save covariate data to CSV for Simulx
cov_file <- paste0(path.data, "df_cov_", covariate, ".csv")
write.csv(df_cov_c, cov_file, row.names = FALSE, quote = FALSE)
# Set up Simulx covariate and group
defineCovariateElement(name = "cov_c", element = cov_file)
setGroupElement(group = "simulationGroup1", elements = c("cov_c"))
setNbReplicates(nb = nb_reps)
# Run simulation
runScenario()
# Save simulation project
saveProject(projectFile = paste0("logreg_Responder_vs_", covariate, "_", nb_reps, "reps.smlx"))
# Extract and merge results
prob <- getSimulationResults()$res$p
res <- getSimulationResults()$individualParameters$simulationGroup1
data <- merge(prob, res, by = c("rep", "id", "original_id"))
# Summarize predicted probabilities
c_sym <- rlang::sym(covariate)
df_summary <- data %>%
group_by(id) %>%
summarise(
covariate = first(!!c_sym),
p_median = median(p),
p_lower = quantile(p, 0.05),
p_upper = quantile(p, 0.95),
.groups = "drop") %>%
arrange(covariate)
# Use pretty label for x-axis and title
pretty_cov <- ifelse(covariate %in% names(labels), labels[[covariate]], covariate)
# Format reference values into a readable string with nice labels
ref_string <- paste(
sapply(names(ref_values), function(var) {
if (var != covariate) {
label <- ifelse(var %in% names(labels), labels[[var]], var)
paste0(label, " = ", ref_values[[var]])}}),
collapse = ", ")
# Generate plot
plot <- ggplot(df_summary, aes(x = covariate, y = p_median)) +
geom_line(color = hex, linewidth = 1.2) +
geom_ribbon(aes(ymin = p_lower, ymax = p_upper), alpha = 0.5, fill = hex) +
geom_hline(yintercept = 0.5, linetype = "solid", color = "red", linewidth = 0.7) +
labs(title = paste("Predicted Probability vs", pretty_cov),
subtitle = paste("Reference values:", ref_string),
x = pretty_cov,
y = "Predicted Probability (P)") +
theme_minimal(base_size = 14)
return(plot)
}
Examples
Effect of Karnofsky Score on Predicted Response Probability
This example demonstrates how to use plotLogRegVsCovariate()
to simulate and visualize the effect of the Karnofsky score (karno
) on the predicted probability of being a responder.
Once a logistic model has been established and the covariate effects have been estimated in Monolix, the logistic regression curve — displaying the probability of a specific outcome versus a continuous covariate — can be generated using a simple R script.
Only three R packages are required: ggplot2
, dplyr
, and lixoftConnectors
. The starting point is a Monolix project containing the logistic regression model, which is then exported to Simulx for simulations.
rm(list = ls())
# Set working directory to script location
path.prog <- suppressWarnings(paste0(dirname(rstudioapi::getSourceEditorContext()$path), "/"))
setwd(path.prog)
# Load required libraries
suppressWarnings(library(ggplot2))
suppressWarnings(library(dplyr))
suppressWarnings(library(lixoftConnectors)) # For MonolixSuite integration
# Define MonolixSuite path and project
path.software <- "C:/Program Files/Lixoft/MonolixSuite2024R1"
path.prj <- path.prog
# path.prj <- "C:/Users/..."
name.prj <- "logreg_signpred.mlxtran"
# Initialize Lixoft Connectors and import Monolix project to Simulx
initializeLixoftConnectors(path.software, software = "monolix")
loadProject(paste0(path.prj,name.prj))
exportProject(settings = list(targetSoftware = "simulx", filesNextToProject = T))
The simulation is configured to compute the probability of an outcome based on the predictor variable , where all terms — the population intercept and covariate effects — were estimated in Monolix.
A new output element representing the probability p=1/(1+exp(−x))
is defined, where x
refers to the full linear predictor , not just the intercept
The simulation uses 100 subjects and 500 replicates, with mlx_PopUncertainSA
enabled to account for parameter uncertainty—introducing variability between replicates by resampling population parameters. Simulation results include both the binary response (mlx_responder
) and the continuous probability (mlx_probability
).
# Define output element: the probability derived from the logistic function
setAddLines("p=1/(1+exp(-x))")
defineOutputElement(name = "probability", element = list(data = data.frame(time = 0), output = "p"))
# Use uncertainty on population parameters
setGroupElement(group = "simulationGroup1", elements = c("mlx_PopUncertainSA"))
# Use responder and probability output vectors
setGroupElement(group = "simulationGroup1", elements = c("mlx_responder", "probability"))
nb_sbj = 100
setGroupSize("simulationGroup1", nb_sbj)
# Define simulation scenario: simulate only, no endpoint computation
setScenario(c(simulation = TRUE, endpoints = FALSE))
setNbReplicates(nb = 500)
The covariate dataset is loaded, labels are set, and reference scenarios are defined by fixing trtm
and tumorsize0
.
# Load covariate dataset
df_cov <- read.csv(paste0(path.prog, "/df_sim.csv"))
# df_cov <- unique(getCovariateElements()$mlx_Cov$data)
# Label map
label_map <- list(karno = "Karnofsky Score",
tumorsize0 = "Tumor Size at BL",
trtm = "Treatment Arm")
# Set reference values for other covariates
ref_vals_karno_typical <- list(trtm = "Placebo" , tumorsize0 = 0)
ref_vals_karno_nontypical1 <- list(trtm = "Treatment", tumorsize0 = 45)
ref_vals_karno_nontypical2 <- list(trtm = "Placebo" , tumorsize0 = 45)
ref_vals_karno_nontypical3 <- list(trtm = "Treatment", tumorsize0 = 60)
# Run and collect plots
p1_0 <- plotLogRegVsCovariate("karno", ref_vals_karno_typical , df_cov, path.prog, labels = label_map)
p1_1 <- plotLogRegVsCovariate("karno", ref_vals_karno_nontypical1, df_cov, path.prog, hex = "#CD3333", labels = label_map)
p1_2 <- plotLogRegVsCovariate("karno", ref_vals_karno_nontypical2, df_cov, path.prog, hex = "#2E8B57" ,labels = label_map)
p1_3 <- plotLogRegVsCovariate("karno", ref_vals_karno_nontypical3, df_cov, path.prog, hex = "#CDCD00", labels = label_map)
The function is called for each scenario to generate plots illustrating how predictions change with karno
under different clinical conditions.
Predicted probability vs Kanofsky score | |
---|---|
Impact of Baseline Tumor Size on Predicted Response Probability
The simulations are based on the same Monolix project as in the example above, except that this time the probability is plotted against tumor size at baseline.
ref_vals_tumorsize0_typical <- list(trtm = "Placebo" , karno = 100)
ref_vals_tumorsize0_nontypical1 <- list(trtm = "Treatment", karno = 70)
ref_vals_tumorsize0_nontypical2 <- list(trtm = "Placebo" , karno = 70)
ref_vals_tumorsize0_nontypical3 <- list(trtm = "Treatment", karno = 50)
p2_0 <- plotLogRegVsCovariate("tumorsize0", ref_vals_tumorsize0_typical , df_cov, path.prog, labels = label_map)
p2_1 <- plotLogRegVsCovariate("tumorsize0", ref_vals_tumorsize0_nontypical1, df_cov, path.prog, hex = "#CD3333", labels = label_map)
p2_2 <- plotLogRegVsCovariate("tumorsize0", ref_vals_tumorsize0_nontypical2, df_cov, path.prog, hex = "#2E8B57", labels = label_map)
p2_3 <- plotLogRegVsCovariate("tumorsize0", ref_vals_tumorsize0_nontypical3, df_cov, path.prog, hex = "#CDCD00", labels = label_map)
Predicted probability vs tumor size at baseline | |
---|---|
Probability vs Full Predictor
The plotLogRegVsPredictor()
function is designed to visualize the relationship between the predictor variable and the predicted probabilities from a logistic regression model. It provides a clear and interpretable summary by binning the predictor, summarizing the predicted probabilities within each bin, and then plotting these summaries.
The plotLogRegVsPredictor
function is designed to visually represent the relationship between a continuous predictor (x
) and predicted probabilities (p
), typically derived from Simulx simulations with replicates. This function returns a ggplot
object, offering a clear graphical summary. It first involves binning the predictor x
based on quantiles. Subsequently, within these created bins, the predicted probabilities p
are summarized by their median, 5th, and 95th percentiles across replicates. This allows for a robust plot of the summarized probabilities against the predictor values. This visualization is usefull for illustrating the overall behavior between a predictor and probability. Furthermore, with its group-specific option, it effectively visualizes covariate sensitivity.
The function plotLogRegVsPredictor
takes the following arguments:
data
(data frame): The input data frame. It must contain a predictor variable namedx
and a predicted probabilities column namedp
group_col
(string, optional): The name of a column indata
to use for grouping. If provided, the quantile binning and probability summaries will be computed separately for each unique level within this column.group_levels
(character vector, optional): A subset of levels fromgroup_col
to include in the analysis. If specified, only data corresponding to these levels will be processed.prob_seq
(numeric vector, default = seq(0, 1, 0.1)): A sequence of probabilities used to calculate the quantile breaks for thex
predictor.x_break
(numeric, default = 2.5): The interval between major tick marks on the x-axis of the plot.hex
(string, default = "#3A5FCD"): A hexadecimal color code used for the plot's line and ribbon fill.custom_title
(string, optional): A custom title for the plot. If provided, this will override the automatically generated title.label_map
(named list, default = empty list): An optional list for providing more descriptive or "pretty" labels for your covariates.
plotLogRegVsPredictor <- function(data,
group_col = NULL,
group_levels = NULL,
prob_seq = seq(0, 1, 0.1),
x_break = 2.5,
hex = "#3A5FCD",
custom_title = NULL,
label_map = list()) {
# Input validation
required_cols <- c("x", "p")
missing_cols <- required_cols[!required_cols %in% names(data)]
if (length(missing_cols) > 0) {
stop(paste("Data frame must contain columns:", paste(missing_cols, collapse = ", ")))
}
# ---- STEP 1: QUANTILE-BASED BINNING ----
group_sym <- if (!is.null(group_col)) rlang::sym(group_col) else NULL
# Filter data to specific group levels if specified
data_filtered <- if (!is.null(group_col) && !is.null(group_levels)) {
if (!group_col %in% names(data)) {
stop(paste("Group column", group_col, "not found in data frame"))
}
data %>% filter(!!group_sym %in% group_levels)
} else {
data
}
data_filtered <- data_filtered %>%
arrange_at(vars(any_of(c("rep", "id", "x"))))
# Create quantile-based bins
data_binned <- if (!is.null(group_col)) {
data_filtered %>%
group_by(!!group_sym) %>%
mutate(quantiles = list(quantile(x, probs = prob_seq, na.rm = TRUE)),
bins = cut(x, breaks = quantiles[[1]], include.lowest = TRUE)) %>%
ungroup()
} else {
# Overall quantiles
quantile_breaks <- quantile(data_filtered$x, probs = prob_seq, na.rm = TRUE)
data_filtered %>%
mutate(bins = cut(x, breaks = quantile_breaks, include.lowest = TRUE))
}
# ---- STEP 2: SUMMARY STATISTICS ----
summary_data <- data_binned %>%
group_by(bins) %>%
summarise(median_p = median(p, na.rm = TRUE),
p5 = quantile(p, 0.05, na.rm = TRUE),
p95 = quantile(p, 0.95, na.rm = TRUE),
bin_mid = round(mean(x, na.rm = TRUE), 3),
n_obs = n(), # Add observation count for reference
.groups = "drop") %>%
filter(!is.na(bins), n_obs > 0) # Remove bins with insufficient data
# Store group information for automatic title generation
if (!is.null(group_col) && !is.null(group_levels) && length(group_levels) == 1) {
attr(summary_data, "group_col") <- group_col
attr(summary_data, "group_value") <- group_levels[1]
}
# ---- STEP 3: TITLE GENERATION ----
generate_plot_title <- function(summary_data, label_map, custom_title) {
# Use custom title if provided
if (!is.null(custom_title)) return(custom_title)
base_title <- "Predicted Probability"
# Extract group information from attributes
group_col <- attr(summary_data, "group_col")
group_value <- attr(summary_data, "group_value")
# Use label_map if provided, otherwise use column name
if (!is.null(group_col) && !is.null(group_value)) {
group_label <- if (group_col %in% names(label_map)) {
label_map[[group_col]]
} else {
group_col
}
return(paste0(base_title, " - ", group_label, ": ", group_value))
}
return(base_title)
}
plot_title <- generate_plot_title(summary_data, label_map, custom_title)
# ---- STEP 4: PLOTTING ----
min_x <- min(summary_data$bin_mid, na.rm = TRUE)
max_x <- max(summary_data$bin_mid, na.rm = TRUE)
# Create the plot
p <- ggplot(summary_data, aes(x = bin_mid, y = median_p)) +
geom_line(color = hex, linewidth = 1) +
geom_ribbon(aes(ymin = p5, ymax = p95), fill = hex, alpha = 0.3) +
geom_hline(yintercept = 0.5, linetype = "solid", color = "red", linewidth = 0.8) +
scale_x_continuous(name = "Predictor (x)",
breaks = seq(min_x, max_x, by = x_break)) +
scale_y_continuous(name = "Predicted Probability (P)",
limits = c(0, 1),
breaks = seq(0, 1, 0.2),
expand = expansion(mult = c(0.01, 0.01))) +
labs(title = plot_title) +
theme_minimal()
return(p)
}
Examples
This example builds upon the Monolix project previously discussed in the "Probability vs. Covariate" section. The Monolix project incorporates an established logistic regression model that describes the probability of being a responder. This model accounts for the impact of several covariates, including Karnofsky score, tumor size at baseline, and the effects of different treatment groups.
Only four R packages are required: ggplot2
, dplyr
, rlang
and lixoftConnectors
. The starting point is a Monolix project containing the logistic regression model, which is then exported to Simulx for simulations.
rm(list = ls())
path.prog <- suppressWarnings(paste0(dirname(rstudioapi::getSourceEditorContext()$path), "/"))
setwd(path.prog)
# Load required libraries
suppressWarnings(library(ggplot2))
suppressWarnings(library(dplyr))
suppressWarnings(library(rlang))
suppressWarnings(library(lixoftConnectors))
# Define MonolixSuite path and project
path.software <- "C:/Program Files/Lixoft/MonolixSuite2024R1"
path.prj <- path.prog
# path.prj <- "C:/Users/..."
name.prj <- "logreg_signpred.mlxtran"
# Initialize Lixoft Connectors and import Monolix project to Simulx
initializeLixoftConnectors(path.software, software = "monolix")
loadProject(paste0(path.prj,name.prj))
exportProject(settings = list(targetSoftware = "simulx", filesNextToProject = T))
A new output element representing the probability p=1/(1+exp(−x))
is defined, where x
refers to the full linear predictor, not just the intercept. The simulation uses 500 subjects and 500 replicates, with mlx_PopUncertainSA
enabled to account for parameter uncertainty—introducing variability between replicates by resampling population parameters. The original mlx_Cov
element is used for covariate information in the simulation. Simulation results include both the binary response mlx_responder
and the continuous probability mlx_probabilit.
The simulation scenario is then launched, and the results are retrieved.
# Define output element: the probability derived from the logistic function
setAddLines("p=1/(1+exp(-x))")
defineOutputElement(name = "probability", element = list(data = data.frame(time = 0), output = "p"))
# Use uncertainty on population parameters
setGroupElement(group = "simulationGroup1", elements = c("mlx_PopUncertainSA"))
# Use responder and probability output vectors
setGroupElement(group = "simulationGroup1", elements = c("mlx_responder", "probability"))
setGroupElement(group = "simulationGroup1", elements = c("mlx_Cov"))
setGroupSize("simulationGroup1", 500)
# Define simulation scenario: simulate only, no endpoint computation
setScenario(c(simulation = TRUE, endpoints = FALSE))
setNbReplicates(nb = 500)
# Run simulation
runScenario()
# Save simulation project
# saveProject(projectFile = paste0("logreg_Responder_vs_predictor_500reps.smlx"))
# Extract results
prob <- getSimulationResults()$res$p
res <- getSimulationResults()$individualParameters$simulationGroup1
Initially, simulation results are merged from the prob
and res
datasets. Subsequently, categorical groups are created for the continuous covariates karno
(Karnofsky score) and tumorsize0
(tumor size at baseline), with these groups being formed based on specific ranges. While optional, these groups are later used to visualize the probability versus a predictor for different subgroups along the logistic sigmoid curve.
Additionally, a label_map
is defined. This aids in generating clearer plot titles and labels for these new groups and other covariates. The subsequent section then presents example uses of the plotLogRegVsPredictor
function, demonstrating how to create plots for the entire dataset, for specific subgroups, and with customized appearances.
# Merge results & create groups for continuous covariates (optional)
data <- merge(prob, res, by = c("rep", "id", "original_id")) %>%
mutate(Karno_group = case_when(karno>=min(karno) & karno<66 ~ paste0("[",min(karno),"; 66)"),
karno>=66 & karno<81 ~ "[66; 81)",
TRUE ~ paste0("[81; ",max(karno),"]" )),
TS0_group = case_when(tumorsize0>=min(tumorsize0)& tumorsize0 <34.5 ~ paste0("[",min(tumorsize0),"; 34.5)"),
tumorsize0>=34.5 & tumorsize0 < 47.5 ~ paste0("[34.5; 47.5)"),
TRUE ~ paste0("[47.5; ",max(tumorsize0),"]")))
#-------------------------------------------------------------------------------
# Define label mapping for cleaner titles (project-specific)
label_map <- list(
Karno_group = "Karnofsky Performance Status",
TS0_group = "Baseline Tumor Size",
trtm = "Treatment Group"
)
# Example usage patterns:
# 1. Entire dataset with default settings
plot1 <- plotLogRegVsPredictor(data)
# 2. Specific tumor size group with custom color
plot2 <- plotLogRegVsPredictor(data,
group_col = "TS0_group",
group_levels = "[21.6; 34.5)",
hex = "#CD3333",
label_map = label_map)
# 3. Treatment comparison with fine-grained binning
plot3 <- plotLogRegVsPredictor(data,
group_col = "trtm",
group_levels = "Treatment",
prob_seq = seq(0, 1, 0.075), # 7.5% quantiles
x_break = 0.5,
hex = "#2E8B57",
label_map = label_map)
# 4. Custom title override
plot4 <- plotLogRegVsPredictor(data,
group_col = "Karno_group",
group_levels = "[81; 97]",
custom_title = "High Performance Patients",
x_break = 1,
hex = "#CDCD00")
Predicted probability (p) vs predictor ( x) | |
---|---|