FairChoices Architecture
Technical documentation of the system architecture, file structure, and development guidelines
Summary
FairChoices includes an online user interface designed to guide the development of health benefits packages, with a strong emphasis on intervention cost-effectiveness. The platform also comprises an input tool for data entry and updates, as well as a validation tool to ensure data quality and consistency.
Technology Stack
We use the following technologies and frameworks to power FairChoices:
- R programming language: Used to write the component code that implements the end-to-end analytical tool
- Shiny: A package that makes it easy to build interactive web apps straight from R
- React JS: Used for the beta version of the web interface and input tool, which is under development
General Guidelines and Rules of Thumb
Data Frame Principle
“Data frame in, data frame out” – Any integration between components should generally be passed as a data frame. You can do whatever computation you want within a particular component, but exchange data between components in data frame format (see the sketch after the list below).
Benefits of Data Frame Approach
- Legibility: Each step produces human-readable, simplified output
- Data Storage: Consistent format for storage and retrieval
- Logging: Easy to track and log each step of the process
- Testing: Simplified testing and validation of each component
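As a minimal sketch of the principle (function and column names are illustrative, not part of the FairChoices codebase):

```r
# Hypothetical component: data frame in, data frame out.
compute_dalys <- function(epi_df) {
  stopifnot(is.data.frame(epi_df))                   # enforce the contract on the way in
  result <- transform(epi_df, dalys = ylds + ylls)   # internal computation is free-form
  as.data.frame(result)                              # enforce the contract on the way out
}

# Integration between components: the output of one step is a plain data
# frame that the next component (storage, logging, testing) can consume.
epi <- data.frame(cause = c("IHD", "Stroke"), ylds = c(10, 5), ylls = c(90, 60))
out <- compute_dalys(epi)
```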
System Architecture
Fig. 1: System Architecture Diagram
Component Overview
User Interface Layer
Interactive web applications built with Shiny and React JS, providing intuitive interfaces for health benefits package design
API Layer
RESTful API built with the Plumber package, handling requests between frontend and backend systems (see the sketch after this overview)
Business Logic Layer
Custom R functions implementing health economics models, cost-effectiveness calculations, and optimization algorithms
Data Processing Layer
Data cleaning, transformation, and model execution components that process demographic and epidemiological data
Data Storage Layer
PostgreSQL database for structured data and CSV files for reference datasets, including GBD data and intervention parameters
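To illustrate the API layer described above, a minimal Plumber endpoint might look like the following (the route, file path, and column names are assumptions, not the actual FairChoices API):

```r
# api/plumber.R -- illustrative endpoint only
#* Return intervention parameters for one country
#* @param iso3 Three-letter country code
#* @get /interventions/<iso3>
function(iso3) {
  params <- read.csv("data/clean/intervention_parameters.csv")  # assumed file location
  params[params$iso3 == iso3, , drop = FALSE]                   # data frame out
}

# Launched from a separate script, e.g.:
# plumber::pr("api/plumber.R") |> plumber::pr_run(port = 8000)
```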
File Structure
The FairChoices project follows a well-organized directory structure to maintain code quality and facilitate collaboration:
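The top-level layout, reconstructed from the directory descriptions below, looks roughly like this:

```
FairChoices/
├── config/           # initialization and environment configuration
├── data/             # raw and cleaned input datasets
├── scripts/          # cleaning, functions, modeling, transformation
├── user-interface/   # Shiny applications and React components
└── api/              # Plumber API implementation
```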
Directory Descriptions
config/
Contains initialization scripts, environment variables, and configuration files for different deployment environments
data/
Stores raw data files (CSV, RDS) and serves as the central repository for all input datasets including GBD data, intervention parameters, and country-specific information
scripts/
Organized into subdirectories for different types of processing: cleaning, functions, modeling, and data transformation scripts
user-interface/
Contains all frontend code including Shiny applications, React components, and user interface assets
api/
Backend API implementation using Plumber package, with organized controllers, models, and routing configuration
Extracting Data from Global Burden of Disease
Data extraction from the Global Burden of Disease (GBD) study follows a systematic approach using the GBD Results Tool at https://vizhub.healthdata.org/gbd-results.
GBD Data Categories
1. Cause of Death or Injury
- Measures: Deaths, YLDs, Prevalence, Incidence
- Metric: Number, Rate
- Cause: Select all causes
- Location: Select all countries and territories (filter in cleaning)
- Age: Select all (filter in cleaning)
- Sex: Male, Female, Both
- Year: 2019
2. Impairment
- Measures: YLDs, Prevalence
- Metric: Rate
- Impairment: Select all impairments (filter in cleaning)
- Cause: Select all causes
- Location: Select all countries and territories (filter in cleaning)
3. Etiology
- Measures: Deaths, YLDs
- Metric: Rate
- Etiology: Select all etiologies (filter in cleaning)
- Cause: Select all causes
- Location: Select all countries and territories (filter in cleaning)
4. Injuries by Nature
- Measures: YLDs, Prevalence, Incidence
- Metric: Rate
- Injury: Select all injuries (filter in cleaning)
- Cause: To be determined
- Location: Select all countries and territories (filter in cleaning)
Data Processing Note
All data should be downloaded with broad selection criteria and filtered during the cleaning process to ensure consistency and maintain data integrity.
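As a sketch of this note, a cleaning script would apply the filters after download rather than in the GBD Results Tool itself (the file name and column names are assumptions based on a typical GBD export):

```r
library(dplyr)

gbd_raw <- read.csv("data/raw/gbd_cause_2019.csv")   # broad download from the Results Tool

gbd_filtered <- gbd_raw %>%
  filter(
    year == 2019,
    metric_name %in% c("Number", "Rate"),
    measure_name %in% c("Deaths", "YLDs (Years Lived with Disability)",
                        "Prevalence", "Incidence")
  )
```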
Clean
Data cleaning is performed through dedicated scripts saved in the scripts/clean folder, following established rules and best practices.
Cleaning Rules
- PostgreSQL Schema: All cleaned data should be reflected in the PostgreSQL schema
- Long Format: Clean data tables should be stored in long format for optimal storage and querying (see the sketch after this list)
- Minimal Overlap: Clean data tables should have as little overlapping information (columns) with other clean data tables as possible
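A minimal sketch of the long-format rule, assuming illustrative table and column names:

```r
library(tidyr)

wide <- data.frame(
  iso3   = c("ETH", "ETH"),
  cause  = c("IHD", "Stroke"),
  deaths = c(100, 80),
  ylds   = c(40, 60)
)

# One value column keyed by identifier columns: one row per iso3 x cause x measure.
long <- pivot_longer(wide, cols = c(deaths, ylds),
                     names_to = "measure", values_to = "value")
# This shape maps directly onto a PostgreSQL table and keeps column overlap
# between clean tables to a minimum.
```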
Cleaning Process
Step 1: Raw Data Assessment
Evaluate raw data quality, completeness, and structure before beginning the cleaning process
Step 2: Standardization
Standardize variable names, formats, and coding schemes across all datasets
Step 3: Validation
Implement data validation checks to identify outliers, missing values, and inconsistencies (see the sketch after these steps)
Step 4: Transformation
Transform data into appropriate formats for analysis, including long format conversion and variable derivation
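A small sketch of the kind of checks Step 3 might run (column names and rules are illustrative):

```r
validate_clean_table <- function(df) {
  issues <- character(0)
  if (anyNA(df$value))
    issues <- c(issues, "missing values in 'value'")
  if (any(df$value < 0, na.rm = TRUE))
    issues <- c(issues, "negative values in 'value'")
  if (any(duplicated(df[c("iso3", "cause", "measure")])))
    issues <- c(issues, "duplicated iso3/cause/measure rows")
  if (length(issues) > 0)
    warning(paste(issues, collapse = "; "))
  invisible(length(issues) == 0)   # TRUE if the table passed all checks
}
```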
Scripts
The scripts/clean folder contains scripts that take raw data, clean them, and save them to data/clean. This data serves as input for all subsequent analytic scripts including demography, epidemiology, and modeling components.
Script Requirements
Mandatory Metadata
The beginning of each script must contain basic metadata on where the raw data came from, including data source, URL, processor name, and processing date.
Template Structure
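The authoritative template is scripts/clean/template.R (see Template Availability below); as a rough illustration, the mandatory metadata header of a cleaning script might look like this (the file name and placeholders are examples):

```r
# scripts/clean/clean_gbd_cause.R
# ------------------------------------------------------------------
# Data source : Global Burden of Disease (GBD) 2019 - Cause of Death or Injury
# URL         : https://vizhub.healthdata.org/gbd-results
# Processed by: <name of the person who processed the data>
# Date        : <YYYY-MM-DD>
# ------------------------------------------------------------------
# Reads raw data from data/raw/, cleans it, and saves the result to data/clean/.
```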
Script Categories
Data Ingestion Scripts
Scripts that download, import, and perform initial validation of raw data from external sources like GBD, WHO, and World Bank
Cleaning Scripts
Data transformation scripts that standardize formats, handle missing values, and ensure consistency across datasets
Validation Scripts
Quality assurance scripts that check data integrity, identify outliers, and validate assumptions
Integration Scripts
Scripts that combine multiple cleaned datasets and create analysis-ready data frames for modeling
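As an illustration of the integration-script category (file names, join keys, and columns are assumptions):

```r
library(dplyr)

epi    <- read.csv("data/clean/gbd_cause_long.csv")
params <- read.csv("data/clean/intervention_parameters.csv")

# Combine cleaned tables into one analysis-ready data frame for the modeling components.
analysis_input <- epi %>%
  inner_join(params, by = c("iso3", "cause")) %>%
  as.data.frame()
```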
Best Practices
- Modularity: Keep scripts focused on specific tasks for better maintainability
- Error Handling: Implement comprehensive error handling and logging (see the sketch after this list)
- Documentation: Include detailed comments explaining complex transformations
- Version Control: Use git for tracking changes and collaboration
- Testing: Include unit tests for critical data transformations
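A minimal sketch of the error-handling and logging practice (the wrapper name and messages are illustrative):

```r
run_step <- function(label, expr) {
  message(Sys.time(), " | starting: ", label)
  tryCatch(
    expr,   # evaluated lazily here, so errors are caught by the handler below
    error = function(e) {
      message(Sys.time(), " | FAILED: ", label, " - ", conditionMessage(e))
      stop(e)
    }
  )
}

# Usage inside a cleaning script:
# gbd_raw <- run_step("read raw GBD extract", read.csv("data/raw/gbd_cause_2019.csv"))
```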
Template Availability
The scripts/clean folder contains a template (scripts/clean/template.R) to help users get started with proper script structure and metadata formatting.