
I'm working on a nonlinear regression analysis in R and have run into an issue with error propagation. Specifically, I'm using predictNLS() from the propagate package to estimate the uncertainty of my predictions, but in some cases the resulting intervals exceed biologically plausible limits (e.g., prediction intervals going above 1 or below 0, which makes no sense in my biological context).

Below is a reproducible example of my code. It simulates an exponential relationship, fits a nonlinear least squares (NLS) model, and then uses predictNLS to estimate prediction errors. My challenge is handling instances where the error propagation results in biologically impossible values.

Reproducible Example:

# simulate exponential relationship
set.seed(123)
# generate random x values between 0 and 60
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)

data <- data.frame(sr = x, fipar = y)

# create a nls model to fit the data
model <- nls(fipar ~ 1 - exp(-a * sr), data = data, start = list(a = 0.001))

# create an observed and predicted dataframe
data$predicted <- predict(model, data)

library(ggplot2)
library(dplyr)  # provides %>%

data %>%
  ggplot(aes(x = sr, y = fipar)) + 
  geom_point() +
  geom_line(aes(y = predicted), color = "red")

# estimate the errors using predictNLS() from the propagate package
library(propagate)

newdat <- data.frame(sr = seq(1, 60, 1))

prediction_se <- predictNLS(model, newdata = newdat, interval = "prediction")

prediction_se$summary$sr <- newdat$sr

prediction_se$summary %>%
  ggplot(aes(x = sr, y = Prop.Mean.1)) + 
  ylim(0, 1.2) +
  geom_point() +
  geom_ribbon(aes(ymin = `Prop.2.5%`, ymax = `Prop.97.5%`), alpha = 0.2) +
  geom_hline(yintercept = 1)

[Plot: observed fipar against sr with the fitted curve and 95% prediction band; a horizontal line at y = 1 marks the biologically plausible upper limit, which the band exceeds.]

My natural reaction would be to simply replace any interval value > 1 with 1 (see the sketch below), but I suspect that overlooks a number of statistical assumptions. Are there recommended approaches to constrain the prediction intervals, or to adjust the error estimation so that it respects biological constraints? Perhaps this question belongs on the stats Stack Exchange, but I figured I'd ask here too.
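To illustrate, the clamping I have in mind is just this (a sketch only; the column names are the ones predictNLS() returns, and the truncation only changes the plot, not the underlying error model):

# naive clamping (sketch): truncate the propagated interval to [0, 1];
# this only masks the problem in the plot, the error model is unchanged
prediction_se$summary$`Prop.2.5%`  <- pmax(prediction_se$summary$`Prop.2.5%`, 0)
prediction_se$summary$`Prop.97.5%` <- pmin(prediction_se$summary$`Prop.97.5%`, 1)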

2 Comments
  • If you are predicting probabilities or other outcomes with a ceiling, you need to use a distribution that matches your desired outcome. Commented Mar 28, 2024 at 3:36
  • Thanks! So I tried a GLM with a logit link function, and the results now respect the biological constraints. Commented Mar 28, 2024 at 3:54

1 Answer


A GLM with a logit link function worked:

# Fit the GLM model (the logit link keeps fitted values between 0 and 1)
model <- glm(fipar ~ sr, 
             data = data, 
             family = binomial(link = "logit"))

# Create an observed and predicted dataframe (one predict() call, reused)
pred <- predict(model, data, type = "response", se.fit = TRUE)
data$predicted <- pred$fit
data$se <- pred$se.fit

# Plot the data, the fitted probabilities, and a ±1 SE ribbon
library(ggplot2)
ggplot(data, aes(x = sr, y = fipar)) + 
  geom_point() +
  geom_line(aes(y = predicted), color = "red") +
  geom_ribbon(aes(ymin = predicted - se, ymax = predicted + se), alpha = 0.2)
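If the ±1 SE band still brushes past 1 at the extremes, one variant (a sketch under the assumption that a 95% band is wanted; not part of the original answer) is to form the interval on the link scale and back-transform with plogis(), which keeps the ribbon strictly inside (0, 1):

# sketch: fit and SE computed on the logit (link) scale, interval formed
# there, then back-transformed with plogis() so it cannot leave (0, 1)
pred_link <- predict(model, data, type = "link", se.fit = TRUE)
data$lower <- plogis(pred_link$fit - 1.96 * pred_link$se.fit)
data$upper <- plogis(pred_link$fit + 1.96 * pred_link$se.fit)

ggplot(data, aes(x = sr, y = fipar)) +
  geom_point() +
  geom_line(aes(y = predicted), color = "red") +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2)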

[Plot: observed fipar against sr with the fitted GLM curve (red) and ±1 SE ribbon.]


1 Comment

I think the “standard”, even canonical, link for a binomial model is the logit. Putting it in explicitly does no harm, but calling it a “custom function” adds an air of complexity that isn’t really needed.
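For what it's worth (a quick check, not from the original comment), R already uses the logit link as the binomial default:

# binomial() defaults to the logit link, so link = "logit" is optional
binomial()$link
#> [1] "logit"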
