
I'm working on a nonlinear regression analysis in R and have run into an issue with error propagation. Specifically, I'm using predictNLS() from the propagate package to estimate the uncertainty of my predictions, but in some cases the resulting intervals exceed biologically plausible limits (e.g., prediction intervals going above 1 or below 0, which makes no sense in my biological context).

Below is a reproducible example of my code. It simulates an exponential relationship, fits a nonlinear least squares (NLS) model, and then uses predictNLS to estimate prediction errors. My challenge is handling instances where the error propagation results in biologically impossible values.

Reproducible Example:

# simulate exponential relationship
set.seed(123)
# generate random x values between 0 and 60
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)

data <- data.frame(sr = x, fipar = y)

# create a nls model to fit the data
model <- nls(fipar ~ 1 - exp(-a * sr), data = data, start = list(a = 0.001))

# create an observed and predicted dataframe
data$predicted <- predict(model, data)

library(ggplot2)
library(dplyr)  # provides %>%

data %>%
  ggplot(aes(x = sr, y = fipar)) + 
  geom_point() +
  geom_line(aes(y = predicted), color = "red")

# estimate the errors using predictNLS() from the propagate package
library(propagate)

newdat <- data.frame(sr = seq(1, 60, 1))

prediction_se <- predictNLS(model, newdata = newdat, interval = "prediction")

prediction_se$summary$sr <- newdat$sr

prediction_se$summary %>%
  ggplot(aes(x = sr, y = Prop.Mean.1)) + 
  ylim(0, 1.2) +
  geom_point() +
  geom_ribbon(aes(ymin = `Prop.2.5%`, ymax = `Prop.97.5%`), alpha = 0.2) +
  geom_hline(yintercept = 1)

[Plot: observed fipar against sr with the fitted curve and 95% prediction band; a horizontal line at y = 1 marks the biologically plausible upper limit, which the band exceeds.]

My natural reaction would be to simply replace any interval value > 1 with 1 (see the sketch below), but I suspect that overlooks a number of statistical assumptions. Are there recommended approaches to constrain the prediction intervals, or to adjust the error estimation so that it respects biological constraints? Perhaps this question belongs on the stats Stack Exchange, but I figured I'd ask here too.
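To illustrate, the clamping I have in mind is just this (a sketch only; the column names are the ones predictNLS() returns, and the truncation only changes the plot, not the underlying error model):

# naive clamping (sketch): truncate the propagated interval to [0, 1];
# this only masks the problem in the plot, the error model is unchanged
prediction_se$summary$`Prop.2.5%`  <- pmax(prediction_se$summary$`Prop.2.5%`, 0)
prediction_se$summary$`Prop.97.5%` <- pmin(prediction_se$summary$`Prop.97.5%`, 1)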

2 Comments
  • If you are predicting probabilities or other outcomes with a ceiling, you need to use a distribution that matches your desired outcome. Commented Mar 28, 2024 at 3:36
  • Thanks! So I tried a GLM with a logit link function, and the results now respect the biological constraints. Commented Mar 28, 2024 at 3:54

1 Answer


A GLM with a logit link function worked:

# Fit the GLM model (the logit link keeps fitted values between 0 and 1)
model <- glm(fipar ~ sr, 
             data = data, 
             family = binomial(link = "logit"))

# Create an observed and predicted dataframe (one predict() call, reused)
pred <- predict(model, data, type = "response", se.fit = TRUE)
data$predicted <- pred$fit
data$se <- pred$se.fit

# Plot the data, the fitted probabilities, and a ±1 SE ribbon
library(ggplot2)
ggplot(data, aes(x = sr, y = fipar)) + 
  geom_point() +
  geom_line(aes(y = predicted), color = "red") +
  geom_ribbon(aes(ymin = predicted - se, ymax = predicted + se), alpha = 0.2)
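If the ±1 SE band still brushes past 1 at the extremes, one variant (a sketch under the assumption that a 95% band is wanted; not part of the original answer) is to form the interval on the link scale and back-transform with plogis(), which keeps the ribbon strictly inside (0, 1):

# sketch: fit and SE computed on the logit (link) scale, interval formed
# there, then back-transformed with plogis() so it cannot leave (0, 1)
pred_link <- predict(model, data, type = "link", se.fit = TRUE)
data$lower <- plogis(pred_link$fit - 1.96 * pred_link$se.fit)
data$upper <- plogis(pred_link$fit + 1.96 * pred_link$se.fit)

ggplot(data, aes(x = sr, y = fipar)) +
  geom_point() +
  geom_line(aes(y = predicted), color = "red") +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2)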

[Plot: observed fipar against sr with the fitted GLM curve (red) and ±1 SE ribbon.]


1 Comment

I think the “standard”, even canonical, link for a binomial model is the logit. Putting it in explicitly does no harm, but calling it a “custom function” adds an air of complexity that isn’t really needed.
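For what it's worth (a quick check, not from the original comment), R already uses the logit link as the binomial default:

# binomial() defaults to the logit link, so link = "logit" is optional
binomial()$link
#> [1] "logit"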
