Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ggdist::curve_interval for plotting curves in Bayesian models #455

Open
mattansb opened this issue Aug 14, 2021 · 11 comments
Open

Use ggdist::curve_interval for plotting curves in Bayesian models #455

mattansb opened this issue Aug 14, 2021 · 11 comments
Labels
Enhancement 💥 Implemented features can be improved or revised Feature idea 🔥 New feature or request

Comments

@mattansb
Copy link
Member

This gives an overall smoother line / interval (less "lumpy").

Read more, here >>

Linear regression plot

m <- rstanarm::stan_glm(mpg ~ hp, data = mtcars)

plot(modelbased::estimate_relation(m, length = 100))

image

Looks lumpy...

library(ggplot2)

emt <- emmeans::emmeans(m, ~ hp, at = list(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100)))

tidybayes::gather_emmeans_draws(emt) |>
  ggdist::curve_interval(.value, .along = hp, .width = 0.95)  |>
  ggplot(aes(hp, .value)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.4) +
  geom_line()

image

Mmmmm... smoothhhhh

JN-plot smooth() plot

formula <- brms::brmsformula(
  mpg ~ s(wt),
  sigma ~ s(wt),
  beta ~ s(wt),
  family = brms::exgaussian()
)

m <- brms::brm(formula, data = mtcars, chains = 1, iter = 200)

plot(modelbased::estimate_slopes(m, trend = "wt", at = "wt", length = 100))

image

library(ggplot2)

emt <- emmeans::emtrends(m, ~ wt, "wt", at = list(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100)))

tidybayes::gather_emmeans_draws(emt) |>
  ggdist::curve_interval(.value, .along = wt, .width = 0.95)  |>
  transform(.sig = .lower > 0 | .upper < 0) |>
  transform(.part.group = {function(x) rep(seq_along(x$lengths), x$length)}(rle(.sig))) |> 
  ggplot(aes(wt, .value)) + 
  geom_hline(yintercept = 0) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, group = .part.group,  fill = .sig), alpha = 0.4) +
  geom_line()

image

@DominiqueMakowski
Copy link
Member

Mmh interesting, I'll try to include that... I suppose it will go somewhere here:

https://github.com/easystats/modelbased/blob/eed1a1572fd71197aaa979dbcb857c6d0be5f459/R/visualisation_recipe.estimate_predicted.R#L292-L293


image

this one looks funny tho

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Aug 15, 2021

oh actually no, this should be something that must be done at the level of bayestestR, when the computation of the CI is done. Maybe we could add an option to describe_posterior like smooth_ci or something?

@DominiqueMakowski
Copy link
Member

or in ci_method add "curve" for curve_interval method

@mattansb
Copy link
Member Author

Yeah, that sounds right (:

@mattansb
Copy link
Member Author

this one looks funny tho

A very hurtful thing to say to the poor graph....

@mattansb
Copy link
Member Author

Actually, do we need this implemented in bayestestR? It would essentially be a wrapper around ggdist::curve_interval, no?

@DominiqueMakowski
Copy link
Member

Well it seems to be like it belongs there, because if I understand it's a different method of computing CIs, different from hdi, quantile etc. Perhaps mostly used for plotting, true, but still

Moreover, modelbased now simply runs insight:

https://github.com/easystats/modelbased/blob/3ae98f03d5311b883276c67c18dd2cf5b49836ee/R/estimate_predicted.R#L219-L226

which in turn calls bayestestR ^^

https://github.com/easystats/insight/blob/f4e5c6d312f96fd671bc1cdc91d274bc567ad41e/R/get_predicted_ci.R#L435-L437

so yeah for generalizability (for people that use insight::get_predicted directly) it'd be convenient to have that feature in bayestestR

@bwiernik
Copy link
Contributor

I think it's okay if the curve estimation and the tabling have slightly different estimation. It would be okay to restrict the ggdist calls to see

@mattansb
Copy link
Member Author

Another example with moderation:

m <- rstanarm::stan_glm(mpg ~ factor(cyl) * hp, data = mtcars,
                        chains = 1, iter = 200, refresh = 0)

em_cyl <- emmeans::emmeans(m, ~ cyl + hp,
                           at = list(hp = seq(50, 350, by = 15))) |>
  tidybayes::gather_emmeans_draws()

library(ggplot2)

em_cyl |>
  ggdist::median_hdci(.width = 0.9) |>
  transform(cyl = factor(cyl)) |>
  ggplot(aes(hp, .value, ymin = .lower, ymax = .upper, 
             fill = cyl, color = cyl, group = cyl)) + 
  geom_ribbon(alpha = 0.2, color = NA) + 
  geom_line()

em_cyl |>
  ggdist::curve_interval(.width = 0.9) |>
  transform(cyl = factor(cyl)) |>
  ggplot(aes(hp, .value, ymin = .lower, ymax = .upper, 
             fill = cyl, color = cyl, group = cyl)) + 
  geom_ribbon(alpha = 0.2, color = NA) + 
  geom_line()

Created on 2021-08-15 by the reprex package (v2.0.0)

@mjskay
Copy link

mjskay commented Aug 18, 2021

Chiming in since @DominiqueMakowski pinged me on mjskay/ggdist#95. I'm not sure in what context you're planning to use curve_interval, but it's worth pointing out a few things you may want to ponder:

  • It is giving a joint band rather than pointwise intervals, which may or may not be what you want. I'm not sure where in your API this sits and if users are expecting pointwise or joint intervals there in general.
  • I hesitate to suggest it being the default for anything at the moment as it is a bit experimental, though I'd certainly be happy if folks put it through its paces. Unlike pointwise intervals, getting good joint bands is nontrivial and can quite possibly give weird results on some datasets. I'll say that I think what I have in curve_interval is more robust than anything else I've seen, but I wouldn't be surprised if someone came along with a dataset that caused it to do weird stuff.
  • Internally it currently requires {posterior} (only required as Suggests) so you might need to convince me to rewrite part of it to not use posterior, rewrite it for me (I wouldn't object!), or accept {posterior} as a dep in Suggests.
  • A slightly more reliable calculation of band depth (one step in the algorithm for finding the simultaneous bands) as implemented in {fda} can be used with curve_interval via .interval = "bd-mbd" but this requires the {fda} package, which pulls in a bunch of dependencies. Worth pondering if you are thinking about defaults.

@mjskay
Copy link

mjskay commented Aug 19, 2021

One last thought from me: I was trying to figure out why your pointwise intervals were so noisy looking in the first place and realized you are using HDIs. HDIs can be very noisy, as you have discovered. If you use quantile intervals instead you'll find the pointwise bands are much smoother, besides having other benefits (like being invariant under transformation). FWIW if I were picking a default pointwise interval type I would lean in that direction.

@strengejacke strengejacke added Enhancement 💥 Implemented features can be improved or revised Feature idea 🔥 New feature or request labels Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement 💥 Implemented features can be improved or revised Feature idea 🔥 New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants