Skip to contents

TL;DR

After having learned how nowcasting works, we add unobserved dates to our delay triangle and forecasting, i.e., project the number of cases beyond the data already observed.

Forecasting

Nowcasting is the estimation of the events that have already started but not yet reported. In epidemiology, the events as cases of an infectious disease that has onset its first symptoms but is not yet captured from the notification system. Forecasting is simply the estimation of the events that have not yet started their onset of first symptoms, i.e., cases that are in a probable incubation period. From the data structure that any database has is the cases that have a date of onset greater than present date. In the language of delay triangle, forecasting will only be to estimate the part that is completely unobserved.

Parameter K > 0

The nowcasting_inla function has a functional parameter that flags to the model parsed to inla to construct the delay triangle with dates greater than the present date. This is done by the setting the K parameter to a value greater than 0, like K=1,2,3,K = 1,2,3,\dots. We proceed as in any other example of nowcasting. We will be forecasting as was done for nowcasting in the article Nowcasting for decision making.

First, we cut the lazy data loaded with the namespace of the package to a date of interest. For more details on why we cut on this date, see the article Nowcasting for decision making. And after this we parse to the nowcasting_inla function returns the date of onset and date of report columns, a flag for returning a data.frame with the case counts by week and setting the K parameter to a desired forecasting horizon, here we forecast one week ahead of the present date.

Before forecasting, we construct the case counts curve from the lazy data, by using the internal function data.w_no_age, we construct the curve by summarizing the case counts by each date of onset of the first symptoms present in the database.

library(tidyverse)
library(lubridate)
library(nowcaster)

## To see Nowcasting as if we were on the verge of a rise in the curve
data("sragBH")
srag_now <- sragBH |> 
  filter(DT_DIGITA <= "2020-07-04")

data_by_week <- data.w_no_age(dataset = srag_now, 
                             date_onset = DT_SIN_PRI, 
                             date_report = DT_DIGITA, 
                             K = 0) |> 
  group_by(date_onset) |> 
  tally()

data_by_week |>  
  ggplot(aes(x = date_onset, 
             y = n))+
  geom_line()+
  theme_bw()+
  labs(x = 'Date of onset of symptons', 
       y = 'Nº Cases')+
  scale_color_manual(values = c('grey50', 'black'), 
                     name = '')+
  scale_x_date(date_breaks = '2 weeks', 
               date_labels = '%V/%y', 
               name = 'Date in Weeks')

Now, to forecast one week ahead, we just flag the K = 1, and proceed as in normal nowcasting calling, the code is:

nowcasting_bh_no_age <- nowcasting_inla(dataset = srag_now, 
                                        date_onset = DT_SIN_PRI, 
                                        date_report = DT_DIGITA, 
                                        data.by.week = T, 
                                        K = 1)
head(nowcasting_bh_no_age$data)
#> # A tibble: 6 × 4
#>   dt_event   delay     Y  Time
#>   <date>     <dbl> <dbl> <int>
#> 1 2019-12-28     0     1     1
#> 2 2019-12-28     1     2     1
#> 3 2019-12-28     2     4     1
#> 4 2019-12-28     3     5     1
#> 5 2019-12-28     4     2     1
#> 6 2019-12-28     5     2     1

To visualize the forecast, we plot the observed curve and the nowcasting and forecasting estimate, change the colors for each part. We plot together the nowcasting and forecasting estimates, the curve observed one year later, to validate the estimate made.


## Nowcasting estimate filtering
nowcasting_estimate<-nowcasting_bh_no_age$total |> 
  filter(dt_event <= "2020-06-27")
## Forecasting estimate filtering
forecasting_estimate<-nowcasting_bh_no_age$total |> 
  filter(dt_event >= "2020-06-27")

## One year after observed
data_one_year_after<-sragBH %>%
  filter(DT_SIN_PRI <= "2020-07-11") %>% 
  mutate(
    D_SIN_PRI_2 = DT_SIN_PRI - as.numeric(format(DT_SIN_PRI, "%w"))
  ) %>% 
  group_by(D_SIN_PRI_2) %>% 
  tally()


## Plotting
nowcasting_estimate |> 
  ggplot(aes(x = dt_event, y = Median, col = "Nowcasting"))+
  geom_line()+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = "Nowcasting", fill = "Nowcasting"), 
              alpha = 0.2, 
              show.legend = F)+
  geom_line(data = forecasting_estimate, 
            aes(x = dt_event, y = Median, col = "Forecasting"))+
  geom_ribbon(data = forecasting_estimate, 
              aes(ymin = LI, ymax = LS, col = "Forecasting", fill = "Forecasting"), 
              alpha = 0.2,
              show.legend = F)+
  geom_line(data = data_by_week, 
            aes(date_onset, y = n, col = 'Observed'))+
  geom_line(data = data_one_year_after, 
            mapping = aes(x = D_SIN_PRI_2, y = n, 
                          color = "Observed after one year")) + 
  theme_bw() +
  theme(legend.position = "bottom", 
        axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('orange','grey50', 'black', 'red'), 
                     name = '')+
  scale_fill_manual(values = c('orange','grey50'), 
                     name = '')+
  scale_x_date(date_breaks = '2 weeks', 
               date_labels = '%V/%y', 
               name = 'Date in Weeks')+
  labs(x = '', 
       y = 'Nº Cases')

Uncertainty grows

As we try to forecasting longer ahead of the present date, the uncertainty has to grow. This is seen by the opening of the confidence intervals on the last graph. Although it is wide, the realized curve still falls inside it, indicating that it is assertive of the tendency of the curve at least. This can be taken to support decision-making when planning ahead of the present date. To any other forecasting horizon further than 1 week ahead, the confidence interval will be wider than this, to help a little, we can try forecasting with an imposed structured on the data. Now, we forecasting the parsing of the columns for age to the nowcasting function, this can reduce the opening of the confidence interval.

nowcasting_bh_age <- nowcasting_inla(dataset = srag_now, 
                                     date_onset = DT_SIN_PRI, 
                                     date_report = DT_DIGITA, 
                                     data.by.week = T,
                                     K = 1,
                                     age_col = Idade)
head(nowcasting_bh_age$data)
#> # A tibble: 6 × 5
#>   dt_event   delay fx_etaria     Y  Time
#>   <date>     <dbl> <fct>     <dbl> <int>
#> 1 2019-12-28     0 1             0     1
#> 2 2019-12-28     0 2             0     1
#> 3 2019-12-28     0 3             0     1
#> 4 2019-12-28     0 4             0     1
#> 5 2019-12-28     0 5             0     1
#> 6 2019-12-28     0 6             0     1

And we plot the nowcasting estimate and forecasting estimate again:


## Nowcasting estimate filtering
nowcasting_estimate<-nowcasting_bh_age$total |> 
  filter(dt_event <= "2020-06-27")
## Forecasting estimate filtering
forecasting_estimate<-nowcasting_bh_age$total |> 
  filter(dt_event >= "2020-06-27")

## Plotting
nowcasting_estimate |> 
  ggplot(aes(x = dt_event, y = Median, col = "Nowcasting"))+
  geom_line()+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = "Nowcasting", fill = "Nowcasting"), 
              alpha = 0.2, 
              show.legend = F)+
  geom_line(data = forecasting_estimate, 
            aes(x = dt_event, y = Median, col = "Forecasting"))+
  geom_ribbon(data = forecasting_estimate, 
              aes(ymin = LI, ymax = LS, col = "Forecasting", fill = "Forecasting"), 
              alpha = 0.2,
              show.legend = F)+
  geom_line(data = data_by_week, 
            aes(date_onset, y = n, col = 'Observed'))+
  geom_line(data = data_one_year_after, 
            mapping = aes(x = D_SIN_PRI_2, y = n, 
                          color = "Observed after one year")) + 
  theme_bw() +
  theme(legend.position = "bottom", 
        axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('orange','grey50', 'black', 'red'), 
                     name = '')+
  scale_fill_manual(values = c('orange','grey50'), 
                     name = '')+
  scale_x_date(date_breaks = '2 weeks', 
               date_labels = '%V/%y', 
               name = 'Date in Weeks')+
  labs(x = '', 
       y = 'Nº Cases')

Conclusion

In this vignette, we explored the forecasting estimates that the model can produces. We have add the K parameter different than 0 meaning we want an estimate that goes beyond the partial unobserved data. We further compared forecasting produced considering age class as another predictor, as explored on the Nowcasting on Structured data, we fit a model to each of the age classes present on the data but now we make use of K equals to 1, meaning to add a line of completely unobserved dates to the delay triangle.