Weird edge case for `check_heteroskedasticity` plots #408

mattansb · 2022-03-24T11:29:34Z

What's this weirdness?

library(performance)
#> Warning: package 'performance' was built under R version 4.1.3
library(see)

set.seed(1)
x <- rpois(360, 1.7)
y <- x + rnorm(length(x))

m <- lm(y ~ x)

plot(check_heteroskedasticity(m))
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : pseudoinverse used at -0.068209
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : neighborhood radius 2.0697
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : reciprocal condition number 1.9085e-015
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : There are other near singularities as well. 4.1579
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> -0.068209
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
#> 2.0697
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 1.9085e-015
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 4.1579

plot(x, y)

Created on 2022-03-24 by the reprex package (v2.0.1)

strengejacke · 2024-07-14T00:14:26Z

Could be related to #642, where @bwiernik suggested using a different smooth function for "non-continuous" scales.

bwiernik · 2024-07-14T10:00:25Z

Yeah

strengejacke · 2024-07-14T11:32:36Z

For these plots, we use fitted() for the x-axis and scaled residuals for the y-axis. But when do we decide whether the x-axis is "categorical"? E.g., adding a continuous variable to the model makes the plot look much more "usual":

set.seed(1)
d <- data.frame(x = rpois(360, 1.7), x2 = rnorm(360))
d$y <- d$x + rnorm(length(d$x))

m <- lm(y ~ x + x2, data = d)
performance::check_heteroscedasticity(m) |> plot()

fitted() in the above case returns 360 unique values, the same as the number of observations.

For Mattan's example, fitted() returns 27 unique values, far fewer than the 360 observations:

set.seed(1)
d <- data.frame(x = rpois(360, 1.7))
d$y <- d$x + rnorm(length(d$x))

m <- lm(y ~ x, data = d)
length(unique(fitted(m)))
#> [1] 27

We must either think of a way to determine the "spread" of data points along the x-axis (even in the first example, they all "spread" around integer values), or decide whether to require that the number of unique fitted values be at least x% of nobs.
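A minimal sketch of the "x% of nobs" idea, for discussion only. `unique_fitted_ratio()` is a hypothetical helper, not part of performance. The rounding step is an assumption: the 27 unique values reported above are more than the number of distinct counts one would expect from `rpois(360, 1.7)`, which suggests that some fitted values differ only by floating-point noise.

```r
# Hypothetical helper (not part of performance): share of distinct fitted values.
unique_fitted_ratio <- function(model, digits = 8) {
  # Rounding is an assumption, meant to merge values that differ only by
  # floating-point noise.
  f <- round(stats::fitted(model), digits)
  length(unique(f)) / stats::nobs(model)
}

set.seed(1)
d <- data.frame(x = rpois(360, 1.7), x2 = rnorm(360))
d$y <- d$x + rnorm(nrow(d))

ratio_count_only <- unique_fitted_ratio(lm(y ~ x, data = d))       # count predictor only
ratio_with_cont  <- unique_fitted_ratio(lm(y ~ x + x2, data = d))  # plus a continuous predictor
```

A cutoff on this ratio would still have to be chosen; nothing in this thread settles what x% should be.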

bwiernik · 2024-07-14T13:07:00Z

I think that plot is fine, even though it's sort of clustered

I think if it's either of these cases:

- It's a discrete model like Poisson or Binomial or Negative Binomial or ordinal (though I think we already have a different plot for binomial)
- The number of discrete fitted values is "small", maybe 10 or fewer?

And maybe let's have an argument that can be set to force one form or the other?
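A rough sketch of how the "small number of fitted values" rule and a forcing argument could fit together. The function name, the `discrete` argument, and the rounding step are all assumptions made for illustration; only the default of 10 comes from the suggestion above.

```r
# Hypothetical helper (not part of performance).
# `max_unique = 10` follows the "10 or fewer" suggestion; `discrete` is an
# assumed name for the argument that forces one form or the other.
treat_fitted_as_discrete <- function(model, max_unique = 10, discrete = NULL,
                                     digits = 8) {
  if (!is.null(discrete)) {
    return(isTRUE(discrete))  # an explicit user choice overrides the heuristic
  }
  # Round before counting so that floating-point noise does not inflate the count.
  n_unique <- length(unique(round(stats::fitted(model), digits)))
  n_unique <= max_unique
}

set.seed(1)
x <- rpois(360, 1.7)
y <- x + rnorm(length(x))
m <- lm(y ~ x)

auto_choice   <- treat_fitted_as_discrete(m)                    # heuristic decides
forced_smooth <- treat_fitted_as_discrete(m, discrete = FALSE)  # user forces the usual plot
```

With `discrete = NULL` the heuristic decides; any other value short-circuits it, so users can opt out when the automatic choice looks wrong.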

mattansb · 2024-07-14T13:17:52Z

> I think if it's either of these cases:
>
> It's a discrete model like Poisson or Binomial or Negative Binomial or ordinal (though I think we already have a different plot for binomial)

But a discrete model ≠ discrete predictions (fitted values), so is this necessary?

bwiernik · 2024-07-14T23:56:48Z

Yeah actually thinking about it, the homogeneity of variance plot really only applies to Gaussian models.

Maybe we detect based on the predictors all being factors and/or binary?
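One way such a check might look, as a sketch only. `all_predictors_categorical()` is a hypothetical helper, and treating character and logical columns as categorical is an assumption on top of "factors and/or binary".

```r
# Hypothetical helper (not part of performance): TRUE if every predictor in the
# model frame is a factor, character, logical, or has at most two distinct values.
all_predictors_categorical <- function(model) {
  mf <- stats::model.frame(model)
  # Drop the response (first column) and special columns such as "(weights)".
  keep <- seq_along(mf) > 1 & !grepl("^\\(", names(mf))
  predictors <- mf[keep]
  if (length(predictors) == 0) {
    return(FALSE)  # intercept-only model: nothing to classify
  }
  is_categorical <- function(v) {
    is.factor(v) || is.character(v) || is.logical(v) ||
      length(unique(stats::na.omit(v))) <= 2
  }
  all(vapply(predictors, is_categorical, logical(1)))
}

set.seed(1)
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 20)),  # three-level factor
  b = rep(0:1, times = 30),                      # binary numeric
  z = seq_len(60) / 10                           # continuous
)
d$y <- rnorm(60)

only_categorical <- all_predictors_categorical(lm(y ~ g + b, data = d))
with_continuous  <- all_predictors_categorical(lm(y ~ g + b + z, data = d))
```

Note that a check like this would not flag the original example: there `x` is a numeric count with more than two distinct values, so it is neither a factor nor binary.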

bwiernik self-assigned this Mar 24, 2022

strengejacke added the "3 investigators ❔❓" label (Need to look further into this issue) Mar 24, 2022

strengejacke mentioned this issue Oct 26, 2023

DHARMa implementation for new check_residuals() function #643

Merged

15 tasks
