FairChoices Architecture – User Guide | FairChoices

Summary

FairChoices includes an online user interface designed to guide the development of health benefits packages, with a strong emphasis on intervention cost-effectiveness. The platform also provides an input tool for data entry and updates and a validation tool that ensures data quality and consistency.

Technology Stack

FairChoices is built on the following technologies and frameworks:

  • R programming language: Used to write the component code that makes up the end-to-end analytical tool
  • Shiny: An R package that makes it easy to build interactive web apps straight from R
  • React JS: Used for the beta version of the web interface and input tool, currently under development

General Guidelines and Rules of Thumb

Data Frame Principle

“Data frame in, data frame out” – In general, anything passed between components should be a data frame. You can do whatever computation you want within a particular component, but keep the interfaces between components in data frame format.
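
As a minimal sketch of this principle (the function and column names here are illustrative, not the actual FairChoices schema), a component takes a data frame and returns a data frame:

library(dplyr)

# Illustrative component: data frame in, data frame out.
# Column names (intervention, coverage) are hypothetical.
scale_coverage <- function(df, target = 0.8) {
  df %>%
    mutate(coverage_scaled = pmin(coverage * 1.1, target))
}

interventions <- data.frame(
  intervention = c("ART", "Measles vaccination"),
  coverage     = c(0.55, 0.72)
)

scaled <- scale_coverage(interventions)  # still a data frame, ready for the next component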

Benefits of Data Frame Approach

  • Legibility: Each step produces human-readable, simplified output
  • Data Storage: Consistent format for storage and retrieval
  • Logging: Easy to track and log each step of the process
  • Testing: Simplified testing and validation of each component

System Architecture

Fig. 1: System Architecture Diagram

User Interface (Shiny/React)
    ↓
API Layer (Plumber)
    ↓
Business Logic (R Functions)
    ↓
Data Processing & Models
    ↓
Data Storage (PostgreSQL/CSV)

Component Overview

User Interface Layer

Interactive web applications built with Shiny and React JS, providing intuitive interfaces for health benefit package design

API Layer

RESTful API built with the Plumber package, handling requests between the frontend and backend systems
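
A minimal sketch of a Plumber route (the endpoint path, parameter, and return value are hypothetical, not the actual FairChoices API):

# api/plumber.R -- illustrative endpoint only
#* Return cost-effectiveness results for a country
#* @param iso3 Three-letter country code
#* @get /results
function(iso3 = "ETH") {
  # In the real API this would call the business-logic functions;
  # here we return a small placeholder data frame.
  data.frame(iso3 = iso3, icer = NA_real_)
}

The API can then be served locally with plumber::plumb("api/plumber.R")$run(port = 8000).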

Business Logic Layer

Custom R functions implementing health economics models, cost-effectiveness calculations, and optimization algorithms
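As an illustration of the kind of function this layer contains, a simple cost-effectiveness calculation might look like the following (function and column names are hypothetical):

library(dplyr)

# Illustrative: compute a cost-effectiveness ratio for each intervention,
# then rank interventions from most to least cost-effective.
# Column names (cost, dalys_averted) are assumed, not the actual schema.
compute_cer <- function(df) {
  df %>%
    mutate(cer = cost / dalys_averted) %>%
    arrange(cer)
}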

Data Processing Layer

Data cleaning, transformation, and model execution components that process demographic and epidemiological data

Data Storage Layer

PostgreSQL database for structured data and CSV files for reference datasets, including GBD data and intervention parameters
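Reading from both stores uses standard R tooling; a sketch, with placeholder connection details and table/file names:

library(DBI)
library(RPostgres)
library(readr)

# Connect to PostgreSQL (credentials and database name are placeholders)
con <- dbConnect(
  RPostgres::Postgres(),
  dbname   = "fairchoices",
  host     = "localhost",
  user     = Sys.getenv("PG_USER"),
  password = Sys.getenv("PG_PASSWORD")
)

gbd_deaths <- dbReadTable(con, "gbd_deaths")                # structured data
params     <- read_csv("data/intervention_parameters.csv")  # reference CSV

dbDisconnect(con)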

File Structure

The FairChoices project follows a well-organized directory structure to maintain code quality and facilitate collaboration:

FairChoices/
├── config/
│   └── init.R                 # Environment setup and initialization
├── data/
│   └── *.csv / *.rds          # Raw and processed data files
├── scripts/
│   ├── clean/                 # Scripts for cleaning and preprocessing data
│   ├── fxns/                  # Custom R functions used across the app
│   ├── model/                 # Model logic and statistical scripts
│   ├── objects/               # Precomputed objects or constants
│   └── process/               # Scripts for processing or transforming data
├── user-interface/
│   ├── app/                   # Main Shiny app entry point
│   │   └── src/
│   │       ├── global.R       # Global variables and shared code
│   │       ├── ui.R           # User interface layout
│   │       └── server.R       # Server logic
│   └── [other_shiny_apps]/    # Additional modular Shiny apps
└── api/
    ├── controllers/           # Logic for handling API requests
    ├── models/                # Data models and business logic
    ├── functions/             # Helper functions for APIs
    └── plumber.R              # API routes and Plumber configuration

Directory Descriptions

config/

Contains initialization scripts, environment variables, and configuration files for different deployment environments

data/

Stores raw data files (CSV, RDS) and serves as the central repository for all input datasets including GBD data, intervention parameters, and country-specific information

scripts/

Organized into subdirectories for different types of processing: cleaning, functions, modeling, and data transformation scripts

user-interface/

Contains all frontend code including Shiny applications, React components, and user interface assets

api/

Backend API implementation using the Plumber package, organized into controllers, models, and routing configuration

Extracting Data from Global Burden of Disease

Data extraction from the Global Burden of Disease (GBD) study follows a systematic approach using the GBD Results Tool at https://vizhub.healthdata.org/gbd-results.

GBD Data Categories

1. Cause of Death or Injury

Measures: Deaths, YLDs, Prevalence, Incidence
Metric: Number, Rate
Cause: Select all causes
Location: Select all countries and territories (filter in cleaning)
Age: Select all (filter in cleaning)
Sex: Male, Female, Both
Year: 2019

2. Impairment

Measures: YLDs, Prevalence
Metric: Rate
Impairment: Select all impairments (filter in cleaning)
Cause: Select all causes
Location: Select all countries and territories (filter in cleaning)

3. Etiology

Measures: Deaths, YLDs
Metric: Rate
Etiology: Select all etiologies (filter in cleaning)
Cause: Select all causes
Location: Select all countries and territories (filter in cleaning)

4. Injuries by Nature

Measures: YLDs, Prevalence, Incidence
Metric: Rate
Injury: Select all injuries (filter in cleaning)
Cause: To be determined
Location: Select all countries and territories (filter in cleaning)

Data Processing Note

All data should be downloaded with broad selection criteria and filtered during the cleaning process to ensure consistency and maintain data integrity.
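In practice, this means downloading the broad extract and restricting it in the cleaning scripts, for example (the file name is a placeholder; the column names follow the GBD Results Tool export format but are shown only as an illustration):

library(dplyr)
library(readr)

# Read the broad GBD extract, then filter during cleaning
gbd_raw <- read_csv("data/raw/gbd_cause_of_death.csv")

gbd_clean <- gbd_raw %>%
  filter(
    year == 2019,
    metric_name %in% c("Number", "Rate"),
    sex_name %in% c("Male", "Female", "Both")
  )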

Clean

Data cleaning is performed through dedicated scripts saved in the scripts/clean folder, following established rules and best practices.

Cleaning Rules

  • PostgreSQL Schema: All cleaned data should be reflected in the PostgreSQL schema
  • Long Format: Clean data tables should be stored in long format for optimal storage and querying (see the sketch after this list)
  • Minimal Overlap: Clean data tables should have as little overlapping information (columns) as possible with other clean data tables
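
A minimal sketch of the long-format conversion mentioned above, using toy values and hypothetical column names:

library(dplyr)
library(tidyr)

# Illustrative wide table (toy values)
wide <- data.frame(
  country     = c("Ethiopia", "Nepal"),
  deaths_2018 = c(100, 200),
  deaths_2019 = c(110, 190)
)

# Pivot to long format: one row per country-year
long <- wide %>%
  pivot_longer(
    cols         = starts_with("deaths_"),
    names_to     = "year",
    names_prefix = "deaths_",
    values_to    = "deaths"
  ) %>%
  mutate(year = as.integer(year))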

Cleaning Process

Step 1: Raw Data Assessment

Evaluate raw data quality, completeness, and structure before beginning the cleaning process

Step 2: Standardization

Standardize variable names, formats, and coding schemes across all datasets

Step 3: Validation

Implement data validation checks to identify outliers, missing values, and inconsistencies
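A minimal sketch of such checks (column names and thresholds are illustrative):

library(dplyr)

# Illustrative validation: missing values, negative rates, and extreme outliers
validate_rates <- function(df) {
  stopifnot(!any(is.na(df$val)))   # no missing values allowed
  stopifnot(all(df$val >= 0))      # rates cannot be negative
  outliers <- df %>% filter(val > mean(val) + 5 * sd(val))
  if (nrow(outliers) > 0) warning(nrow(outliers), " potential outlier rows flagged")
  invisible(df)
}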

Step 4: Transformation

Transform data into appropriate formats for analysis, including long format conversion and variable derivation

Scripts

The scripts/clean folder contains scripts that take raw data, clean it, and save it to data/clean. This cleaned data serves as input for all subsequent analytic scripts, including the demography, epidemiology, and modeling components.

Script Requirements

Mandatory Metadata

The beginning of each script must contain basic metadata on where the raw data came from, including data source, URL, processor name, and processing date.

Template Structure

# Source:
#   - World Population Prospects 2022
#   - https://population.un.org/wpp/Download/Standard/MostUsed/
# Processor:
#   - Sarah Bolongaita
#   - 2023-05-12

# Load required libraries
library(tidyverse)
library(readr)

# Read raw data
raw_data <- read_csv("data/raw/population_data.csv")

# Data cleaning steps
cleaned_data <- raw_data %>%
  # Standardize variable names
  rename(
    country = `Country or Area`,
    year = Year,
    population = `Value`
  ) %>%
  # Filter for relevant years
  filter(year >= 2000) %>%
  # Remove missing values
  drop_na() %>%
  # Standardize country names
  mutate(country = case_when(
    country == "United States" ~ "USA",
    country == "United Kingdom" ~ "UK",
    TRUE ~ country
  ))

# Save cleaned data
write_csv(cleaned_data, "data/clean/population_clean.csv")

Script Categories

Data Ingestion Scripts

Scripts that download, import, and perform initial validation of raw data from external sources like GBD, WHO, and World Bank

Cleaning Scripts

Data transformation scripts that standardize formats, handle missing values, and ensure consistency across datasets

Validation Scripts

Quality assurance scripts that check data integrity, identify outliers, and validate assumptions

Integration Scripts

Scripts that combine multiple cleaned datasets and create analysis-ready data frames for modeling
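For example, an integration script might join cleaned epidemiological and demographic tables on their shared keys (file and column names are placeholders):

library(dplyr)
library(readr)

epi <- read_csv("data/clean/gbd_deaths_clean.csv")
pop <- read_csv("data/clean/population_clean.csv")

# Combine on shared keys into an analysis-ready data frame
analysis_df <- epi %>%
  left_join(pop, by = c("country", "year", "sex", "age_group"))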

Best Practices

  • Modularity: Keep scripts focused on specific tasks for better maintainability
  • Error Handling: Implement comprehensive error handling and logging
  • Documentation: Include detailed comments explaining complex transformations
  • Version Control: Use git for tracking changes and collaboration
  • Testing: Include unit tests for critical data transformations (see the sketch below)
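
A minimal unit-test sketch using the testthat package (clean_population() is a hypothetical cleaning function, shown only to illustrate the pattern):

library(testthat)

test_that("cleaned population data is complete and long", {
  cleaned <- clean_population("data/raw/population_data.csv")
  expect_true(all(c("country", "year", "population") %in% names(cleaned)))
  expect_false(any(is.na(cleaned$population)))
})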

Template Availability

The scripts/clean folder contains a template (scripts/clean/template.R) to help users get started with proper script structure and metadata formatting.