Hands-on Exercise 4c

Hands-on Exercise

Statistical Analysis

Author

Yi Fang

Published

May 6, 2025

Modified

May 22, 2025

Visualising Uncertainty

1 Learning Outcome

Visualising uncertainty is relatively new in statistical graphics. To create statistical graphics for visualising uncertainty, the following will be covered in this hands-on exercise.

to plot statistics error bars by using ggplot2,
to plot interactive error bars by combining ggplot2, plotly and DT,
to create advanced by using ggdist, and
to create hypothetical outcome plots (HOPs) by using ungeviz package.

2 Getting Started

2.1 Installing and loading the packages

For the purpose of this exercise, the following R packages will be used:

tidyverse, a family of R packages for data science process,
plotly for creating interactive plot,
gganimate for creating animation plot,
DT for displaying interactive html table,
crosstalk for for implementing cross-widget interactions (currently, linked brushing and filtering), and
ggdist for visualising distribution and uncertainty.
stringr for automatically wrapping text

pacman::p_load(plotly, crosstalk, DT, 
               ggdist, ggridges, colorspace,
               gganimate, tidyverse, stringr)

2.2 Data import

For the purpose of this exercise, Exam_data.csv will be used.

exam <- read_csv("data/Exam_data.csv")

3 Visualising the uncertainty of point estimates: ggplot2 methods

A point estimate is a single number, such as a mean. Uncertainty, on the other hand, is expressed as standard error, confidence interval, or credible interval.

Important

Don’t confuse the uncertainty of a point estimate with the variation in the sample

This section will go through how to plot error bars of maths scores by race by using data provided in exam tibble data frame.

Firstly, code chunk below will be used to derive the necessary summary statistics.

my_sum <- exam %>%
  group_by(RACE) %>%
  summarise(
    n=n(),
    mean=mean(MATHS),
    sd=sd(MATHS)
    ) %>%
  mutate(se=sd/sqrt(n-1))

Note

group_by() of dplyr package is used to group the observation by RACE,
summarise() is used to compute the count of observations, mean, standard deviation
mutate() is used to derive standard error of Maths by RACE, and
the output is save as a tibble data table called my_sum.

Next, the code chunk below will be used to display my_sum tibble data frame in an html table format.

Table
Code

RACE	n	mean	sd	se
Chinese	193	76.50777	15.69040	1.132357
Indian	12	60.66667	23.35237	7.041005
Malay	108	57.44444	21.13478	2.043177
Others	9	69.66667	10.72381	3.791438

knitr::kable(head(my_sum), format = 'html')

3.1 Plotting standard error bars of point estimates

Standard error bars of means maths score by race will be plotted as shown below.

Plot
Code

ggplot(my_sum) +
  geom_errorbar(
    aes(x=RACE, 
        ymin=mean-se, 
        ymax=mean+se), 
    width=0.2, 
    colour="black", 
    alpha=0.9, 
    linewidth=0.5) +
  geom_point(aes
           (x=RACE, 
            y=mean), 
           stat="identity", 
           color="red",
           size = 1.5,
           alpha=1) +
  ggtitle("Standard error of mean maths score by race")

Note

The error bars are computed by using the formula mean+/-se.
For geom_point(), it is important to indicate stat=“identity”.

3.2 Plotting confidence interval of point estimates

Instead of plotting the standard error bar of point estimates, the confidence intervals of mean maths score by race can also be plotted.

Plot
Code

ggplot(my_sum) +
  geom_errorbar(
    aes(x=reorder(RACE, -mean), 
        ymin=mean-1.96*se, 
        ymax=mean+1.96*se), 
    width=0.2, 
    colour="black", 
    alpha=0.9, 
    linewidth=0.5) +
  geom_point(aes
           (x=RACE, 
            y=mean), 
           stat="identity", 
           color="red",
           size = 1.5,
           alpha=1) +
  labs(x = "Maths score",
       title = "95% confidence interval of mean maths score by race")

3.3 Visualising the uncertainty of point estimates with interactive error bars

This section will cover how to plot interactive error bars for the 99% confidence interval of mean math score by race as shown below.

Plot
Code

shared_df = SharedData$new(my_sum)

bscols(widths = c(6,6),
       ggplotly((ggplot(shared_df) +
                   geom_errorbar(aes(
                     x=reorder(RACE, -mean),
                     ymin=mean-2.58*se, 
                     ymax=mean+2.58*se), 
                     width=0.2, 
                     colour="black", 
                     alpha=0.9, 
                     size=0.5) +
                   geom_point(aes(
                     x=RACE, 
                     y=mean, 
                     text = paste("Race:", `RACE`, 
                                  "<br>N:", `n`,
                                  "<br>Avg. Scores:", round(mean, digits = 2),
                                  "<br>95% CI:[", 
                                  round((mean-2.58*se), digits = 2), ",",
                                  round((mean+2.58*se), digits = 2),"]")),
                     stat="identity", 
                     color="red", 
                     size = 1.5, 
                     alpha=1) + 
                   xlab("Race") + 
                   ylab("Average Scores") + 
                   theme_minimal() + 
                   theme(axis.text.x = element_text(
                     angle = 45, vjust = 0.5, hjust=1)) +
                   ggtitle("99% Confidence interval of average <br>maths scores by race")), 
                tooltip = "text"), 
       DT::datatable(shared_df, 
                     rownames = FALSE, 
                     class="compact", 
                     width="100%", 
                     options = list(pageLength = 10,
                                    scrollX=T), 
                     colnames = c("No. of pupils", 
                                  "Avg Scores",
                                  "Std Dev",
                                  "Std Error")) %>%
         formatRound(columns=c('mean', 'sd', 'se'),
                     digits=2))

4 Visualising Uncertainty: ggdist package

ggdist is an R package that provides a flexible set of ggplot2 geoms and stats designed especially for visualising distributions and uncertainty.
It is designed for both frequentist and Bayesian uncertainty visualization, taking the view that uncertainty visualization can be unified through the perspective of distribution visualization:
- for frequentist models, one visualises confidence distributions or bootstrap distributions (see vignette(“freq-uncertainty-vis”));
- for Bayesian models, one visualises probability distributions (see the tidybayes package, which builds on top of ggdist).

4.1 Visualising the uncertainty of point estimates: ggdist methods

In the code chunk below, stat_pointinterval() of ggdist is used to build a visual for displaying distribution of maths scores by race.

Plot
Code

exam %>%
  ggplot(aes(x = RACE, 
             y = MATHS)) +
  stat_pointinterval() +
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Mean Point + Multiple-interval plot")

Note

This function comes with many arguments, it is best to read the syntax reference for more details.

For example, in the code chunk below the following arguments are used:

.width = 0.95
.point = median
.interval = qi

Plot
Code

exam %>%
  ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(.width = 0.95,
  .point = median,
  .interval = qi) +
  labs(
    title = "Visualising confidence intervals of median math score",
    subtitle = "Median Point + Multiple-interval plot")

4.2 Visualising the uncertainty of point estimates: ggdist methods

To show 99% and 95% confidence, refer to the following code.

# For 99% CI
exam %>%
  ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(.width = 0.99,
  .point = median,
  .interval = qi) +
  labs(
    title = "Visualising confidence intervals of median math score",
    subtitle = "Median Point + Multiple-interval plot")

# For 95% CI
exam %>%
  ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(.width = 0.95,
  .point = median,
  .interval = qi) +
  labs(
    title = "Visualising confidence intervals of median math score",
    subtitle = "Median Point + Multiple-interval plot")

Gentle advice: This function comes with many arguments, it is advisable to read the syntax reference for more details.

4.3 Visualising the uncertainty of point estimates: ggdist methods

In the code chunk below, stat_gradientinterval() of ggdist is used to build a visual for displaying distribution of maths scores by race.

Plot
Code

exam %>%
  ggplot(aes(x = RACE, 
             y = MATHS)) +
  stat_gradientinterval(   
    fill = "skyblue",      
    show.legend = TRUE     
  ) +                        
  labs(
    title = "Visualising confidence intervals of mean math score",
    subtitle = "Gradient + interval plot")

Gentle advice: This function comes with many arguments, it is advisable to read the syntax reference for more details.

5 Visualising Uncertainty With Hypothetical Outcomes Plots (HOPs)

5.1 Installing ungeviz package

devtools::install_github("wilkelab/ungeviz")

5.2 Launch the application in R

library(ungeviz)

5.3 Visualing Uncertainty with Hypothetical Outcome Plots (HOPs)

Next, the code chunk below will be used to build the HOPs.

Plot
Code

ggplot(data = exam, 
       (aes(x = factor(RACE), 
            y = MATHS))) +
  geom_point(position = position_jitter(
    height = 0.3, 
    width = 0.05), 
    size = 0.4, 
    color = "#0072B2", 
    alpha = 1/2) +
  geom_hpline(data = sampler(25, 
                             group = RACE), 
              height = 0.6, 
              color = "#D55E00") +
  theme_bw() + 
  transition_states(.draw, 1, 3)