Weird edge case for check_heteroskedasticity plots #408
Comments
Yeah |
For these plots, we use:
set.seed(1)
d <- data.frame(x = rpois(360, 1.7), x2 = rnorm(360))
d$y <- d$x + rnorm(length(d$x))
m <- lm(y ~ x + x2, data = d)
performance::check_heteroscedasticity(m) |> plot()
For Mattan's example:
set.seed(1)
d <- data.frame(x = rpois(360, 1.7))
d$y <- d$x + rnorm(length(d$x))
m <- lm(y ~ x, data = d)
length(unique(fitted(m)))
#> [1] 27
We must either find a way to determine the "spread" of data points across the x axis (even in the first example, they all "spread" around integer values), or decide whether we want at least x% unique fitted values relative to nobs. |
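The second option (unique fitted values relative to nobs) could be sketched roughly like this. Note that `share_unique_fitted()` and its `digits` argument are hypothetical names, not part of performance, and the rounding step is an assumption meant to keep floating-point noise from inflating the count of unique values. Any cut-off applied to the result would still need to be agreed on.

```r
# Hypothetical helper, not part of the performance package.
# Returns the share of unique fitted values relative to the number of
# observations. Rounding (an assumption) guards against tiny
# floating-point differences between otherwise identical fitted values.
share_unique_fitted <- function(model, digits = 8) {
  f <- round(stats::fitted(model), digits)
  length(unique(f)) / stats::nobs(model)
}

# Same data as Mattan's example
set.seed(1)
d <- data.frame(x = rpois(360, 1.7))
d$y <- d$x + rnorm(length(d$x))
m <- lm(y ~ x, data = d)

# A small share suggests the fitted values cluster on a few points
share_unique_fitted(m)
```

A low share would point to the clustered case, a share near 1 to the usual continuous case.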
I think that plot is fine, even though it's sort of clustered. I think if it's either of these cases:
And maybe let's have an argument that can be set to force one form or the other? |
But a discrete model ≠ discrete predictions (fitted values), so is this necessary? |
Yeah, actually, thinking about it, the homogeneity of variance plot really only applies to Gaussian models. Maybe we detect based on the predictors all being factors and/or binary? |
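A minimal sketch of that kind of detection, assuming a plain lm-style model whose model frame can be retrieved with `stats::model.frame()`. `all_predictors_discrete()` is a hypothetical name, and treating "at most two unique values" as binary is an assumption, not something performance does.

```r
# Hypothetical helper, not part of the performance package.
# Returns TRUE if every predictor column in the model frame is a factor,
# character, logical, or has at most two unique values (treated as binary).
all_predictors_discrete <- function(model) {
  mf <- stats::model.frame(model)
  # Position of the response column in the model frame
  resp <- attr(stats::terms(model), "response")
  predictors <- mf[, -resp, drop = FALSE]
  is_discrete <- vapply(predictors, function(v) {
    is.factor(v) || is.character(v) || is.logical(v) || length(unique(v)) <= 2
  }, logical(1))
  all(is_discrete)
}

# Factor plus binary predictor
m_discrete <- lm(mpg ~ factor(cyl) + am, data = mtcars)
all_predictors_discrete(m_discrete)

# Continuous predictor
m_continuous <- lm(mpg ~ wt, data = mtcars)
all_predictors_discrete(m_continuous)
```

This only looks at the predictors, so it does not address the point that a discrete model is not the same as discrete fitted values.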
What's this weirdness?
Created on 2022-03-24 by the reprex package (v2.0.1)