Hell no, fitting 16 regressions at the same time.

Straight into the pool

This post assumes you are already familiar with regression. No attempt will be made to explain the concept or justify any decision. The purpose of this post is to demonstrate a cool feature you can use to fit several models at once.

Required packages

tidyverse (of course, 😀)
wakefield (to generate random data)
gtsummary (for table outputs)

Problem

For a strange reason, which I recently encountered, I had to fit a different regression for each region in the country to test the true effect of what i was seeing.

I could fit it one by one, a pain it would be. But then, the only reason why we write computer programs is because we are lazy and smart too. We use our smartness to benefit ourselves.

require(wakefield)

Loading required package: wakefield

require(ggpubr)

Loading required package: ggpubr

Loading required package: ggplot2

require(gtsummary)

Loading required package: gtsummary

#StandWithUkraine

require(tidyverse)  ## If you are an ardent follower of tidyverse, always load it last

Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()       masks stats::filter()
✖ lubridate::hour()     masks wakefield::hour()
✖ dplyr::id()           masks wakefield::id()
✖ lubridate::interval() masks wakefield::interval()
✖ dplyr::lag()          masks stats::lag()
✖ lubridate::minute()   masks wakefield::minute()
✖ lubridate::month()    masks wakefield::month()
✖ lubridate::second()   masks wakefield::second()
✖ lubridate::year()     masks wakefield::year()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

regs <- c(  ## regions in the country
  "Ashanti",
  "Bono",
  "Bono East",
  "Ahafo",
  "Central",
  "Eastern",
  "Greater Accra",
  "Northern",
  "Savannah",
  "North East",
  "Upper East",
  "Upper West",
  "Volta",
  "Oti",
  "Western",
  "Western North"
)

Generate the data

dset <- tibble(
  reg = character(),
  gender = factor(),
  age = integer(),
  income = numeric()
)

for( r in regs){
  n <-
    r_sample_integer(1, x = c(20:30)) # the number of records to generate for each region. The higher the range, the more time it takes, be conservative
  for (i in 1:n) {
    dset <-
      dset %>%
      add_row(
        reg = r,
        gender = gender(n) |> as.character(),
        age = age(n) |> as.integer(),
        income = income(n) |> as.numeric()
      )
  }
}

## Show a sample of the data
dset |> sample_n(5)

# A tibble: 5 × 4
  reg           gender   age income
  <chr>         <chr>  <int>  <dbl>
1 Savannah      Female    84 69635.
2 Upper East    Male      71 40048.
3 Northern      Female    83 29449.
4 Greater Accra Female    63  3092.
5 Western North Female    28 40313.

Number of records generated: 25.

Business time

(?) is an anonymous function. An anonymous function is ran once in the code 
and no more references can be made to the function. It is great when used in iterations
as the function is only used within the iteration and discarded afterwards. 

Don't worry, you will get around and love it too.

_?_ can be any valid variable name. I often use x

First of all the data is grouped based on the region and nested. The nest function comes from dplyr
Then we mutate, remember to mutate either creates a new column or modifies an existing one. In this case, we are creating a new one which we call model
It will contain the lm output for each column.
x from the anonymous function, which contains the nested data is supplied as an argument to the data argument

mod_data <- dset %>% 
  group_by(reg) %>% 
 nest() %>%
  mutate(
    model = map(data, (x) lm( x$income ~ ., data = x))
  )

mod_data

# A tibble: 16 × 3
# Groups:   reg [16]
   reg           data               model 
   <chr>         <list>             <list>
 1 Ashanti       <tibble [529 × 3]> <lm>  
 2 Bono          <tibble [784 × 3]> <lm>  
 3 Bono East     <tibble [841 × 3]> <lm>  
 4 Ahafo         <tibble [484 × 3]> <lm>  
 5 Central       <tibble [400 × 3]> <lm>  
 6 Eastern       <tibble [576 × 3]> <lm>  
 7 Greater Accra <tibble [625 × 3]> <lm>  
 8 Northern      <tibble [784 × 3]> <lm>  
 9 Savannah      <tibble [841 × 3]> <lm>  
10 North East    <tibble [400 × 3]> <lm>  
11 Upper East    <tibble [900 × 3]> <lm>  
12 Upper West    <tibble [400 × 3]> <lm>  
13 Volta         <tibble [441 × 3]> <lm>  
14 Oti           <tibble [900 × 3]> <lm>  
15 Western       <tibble [484 × 3]> <lm>  
16 Western North <tibble [625 × 3]> <lm>

Extracting our models

(m) is an anonymous function. It is executed and discarded. The function is not stored in the global environment and cannot be referenced.
tbl_stack and tbl_regression are from the gtsummary package.
tbl_stack puts a tbl_object on top of each other.
tbl_regression displays a regression model, nicely formatted
the mod_data$model contains many models.
The map function loops over the values in the column and produces a list of tbl_regression objects – Which is used by the tbl_stack function

stk <- tbl_stack(map(mod_data$model, (m)
                     tbl_regression(m, conf.int =T,  label = 
                                      list(age = "Age",
                                     gender  = "Gender")
                                    )),
                 group_header = mod_data$reg)

In all its glory

stk %>% 
  as_hux_table()  # require the _huxtable_ package to be installed

Group	Characteristic	Beta	95% CI	p-value
Ashanti	Gender
	Female	—	—
	Male	-1,454	-5,982, 3,075	0.5
	Age	8.7	-99, 116	0.9
Bono	Gender
	Female	—	—
	Male	569	-3,380, 4,518	0.8
	Age	-1.4	-96, 93	>0.9
Bono East	Gender
	Female	—	—
	Male	969	-2,756, 4,695	0.6
	Age	40	-51, 131	0.4
Ahafo	Gender
	Female	—	—
	Male	988	-4,260, 6,237	0.7
	Age	-187	-310, -64	0.003
Central	Gender
	Female	—	—
	Male	-161	-5,503, 5,180	>0.9
	Age	28	-97, 153	0.7
Eastern	Gender
	Female	—	—
	Male	2,269	-2,336, 6,874	0.3
	Age	58	-54, 170	0.3
Greater Accra	Gender
	Female	—	—
	Male	196	-4,723, 5,115	>0.9
	Age	-30	-145, 86	0.6
Northern	Gender
	Female	—	—
	Male	294	-3,615, 4,203	0.9
	Age	18	-80, 115	0.7
Savannah	Gender
	Female	—	—
	Male	-547	-4,425, 3,330	0.8
	Age	2.2	-89, 93	>0.9
North East	Gender
	Female	—	—
	Male	-817	-6,354, 4,720	0.8
	Age	-56	-196, 84	0.4
Upper East	Gender
	Female	—	—
	Male	4,391	565, 8,217	0.025
	Age	80	-15, 174	0.10
Upper West	Gender
	Female	—	—
	Male	188	-5,543, 5,919	>0.9
	Age	-47	-186, 93	0.5
Volta	Gender
	Female	—	—
	Male	-346	-4,997, 4,305	0.9
	Age	-17	-132, 97	0.8
Oti	Gender
	Female	—	—
	Male	2,185	-1,684, 6,055	0.3
	Age	-86	-179, 8.1	0.073
Western	Gender
	Female	—	—
	Male	3,078	-1,759, 7,916	0.2
	Age	8.2	-107, 123	0.9
Western North	Gender
	Female	—	—
	Male	1,832	-3,080, 6,744	0.5
	Age	-4.4	-123, 114	>0.9
CI = Confidence Interval