Building reproducible
workflows with {targets}

Welcome!

Plan for today

  • Why care about workflows?

  • How {targets} works

  • Hands-on {targets} practice!

About me

Andrew Heiss

  • Assistant professor of public policy, Georgia State University

  • Data visualization, statistics, and causal inference

Andrew's headshot

Follow along

All the materials for today are accessible at

andhs.co/northwestern

Why care about workflows?

Statistical research
is a complicated,
messy process!

Itty bitty pieces

  • Data
  • Statistical results
  • Code
  • Fieldwork
  • Interviews
  • Analysis
  • Figures
  • Images
  • Tables
  • Citations
  • Your actual words

Each of these comes
from different places!

 

Each of these can be
in a different state!

Approaches for handling all the itty bitty pieces

The Office model

Put everything in one document

  • Everything lives in one .docx file

The Engineering model

Embrace the bittiness and compile it all at the end

  • Everything lives separately and is combined in the end
  • Quarto!

Approaches for handling different states

YOLO workflow

Try to remember to re-run all the scripts whenever the data changes, replace old figures/tables/values with new ones, and manually run everything in the right order.

Procedural workflow

Carefully document the precise order that your scripts run, maybe even with a master script that runs everything for you. Run the master script when data changes and rebuild the whole thing every time. Maybe get fancy with things like Quarto caching/freezing.

Functional workflow

Divide workflow into separate objects and let software keep track of which things are out of date and orchestrate which things need to re-run. Run one command to rebuild the whole project, skipping dependencies that don’t need to build again.

My own workflow journey

YOLO workflow

01_clean.R + 02_analysis.R + 03_plots.R

Procedural workflow

R Markdown/Quarto websites (example)

01_clean.Rmd + 02_analysis.Rmd + 03_plots.Rmd + caching

Functional workflow

Makefiles (example) → {targets} pipelines (example)

How {targets} works

{targets} documentation

General workflow

  • Create functions that make things (or “targets”; distinct objects that you can do stuff with)
  • Build these targets with tar_make()
    • {targets} keeps track of upstream and downstream dependencies and skips targets if nothing has changed
  • Load a target into an R session with tar_load(target_name) or blah <- tar_read(target_name)

Anatomy of _targets.R

_targets.R
library(targets)

# General pipeline settings
# ---------------------------
tar_option_set(
  packages = c("tibble") # Packages that your targets need for their tasks.
)

# Load functions
# ----------------
# Run the R scripts in the R/ folder with your custom functions:
tar_source()

# Actual pipeline
# -----------------
list(
  tar_target(
    name = data,  # Conceptually the same as saying `data <- tibble(...)`
    command = tibble(x = rnorm(100), y = rnorm(100))
  ),
  tar_target(
    name = model,  # Conceptually the same as saying `model <- coefficients(...)`
  
    command = coefficients(lm(y ~ x, data = data))
  )
)
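Because tar_source() loads helper scripts from the R/ folder, the targets above are usually written as calls to your own functions. A minimal sketch of what such a file might contain (the file name and function names are hypothetical, and base data.frame stands in for tibble so the sketch is self-contained):

```r
# R/functions.R -- custom functions picked up by tar_source()
# (illustrative sketch; these names are not from the slides)

# Simulate a small dataset (base data.frame used here for self-containment)
make_data <- function(n = 100) {
  data.frame(x = rnorm(n), y = rnorm(n))
}

# Fit a simple linear model and return its coefficients
fit_model <- function(df) {
  coefficients(lm(y ~ x, data = df))
}
```

With helpers like these in place, the pipeline targets reduce to `tar_target(data, make_data())` and `tar_target(model, fit_model(data))`.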

Viewing the pipeline

tar_glimpse()

tar_visnetwork()

Building the pipeline

Build the whole pipeline:

tar_make()
#> + data dispatched                           
#> ✔ data completed [5ms, 1.75 kB]
#> + model dispatched
#> ✔ model completed [2ms, 113 B]
#> ✔ ended pipeline [132ms, 2 completed, 0 skipped]

Build it again and everything gets skipped:

tar_make()
#> ✔ skipped pipeline [61ms, 2 skipped]

Change something in model, then re-run:

tar_make()
#> + model dispatched                          
#> ✔ model completed [1ms, 108 B]
#> ✔ ended pipeline [88ms, 1 completed, 1 skipped]

Build specific targets:

tar_make(model)

Build multiple targets:

tar_make(c(data, model))

Use tidyselect selectors:

tar_make(starts_with("model_"))
tar_make(contains("tbl"))

Using targets

In a different R script or Quarto file:

library(targets)

# This loads the target as its name
tar_load(data)

# Do stuff with it
plot(data)

If you don’t want to use the target’s actual name, use tar_read():

library(targets)

# This lets you assign the target to a new object
my_neat_data <- tar_read(data)

# Do stuff with it
plot(my_neat_data)

Behind the scenes

{targets} stores each target as an extension-less file (RDS format by default) in _targets/objects/.

You can access a full data frame of all the target metadata if you really want:

tar_meta() |> View()

Neat advanced stuff

  • Automatic parallel processing
  • Automatic remote HPC processing
  • Store targets in the cloud
  • Programmatically generate targets

{targets} and elections

That’s all really abstract—
let’s practice {targets} together!

andhs.co/northwestern