FairChoices runs on four abstracted components:
See below for a detailed breakdown of how these components are implemented. For details on the specification of these components — that is, their analytical framework — check out our analytical overview.
We use two technologies and/or frameworks to power FairChoices:
Data frame in, data frame out: Generally, any integration between components — for example, the loading of clean data into the impact model or the integration between the impact model and the demography model — should be formatted in a data frame. You can do whatever computation you want within a particular component — transform the data frame into a matrix, break out several individual vectors, whatever is optimal — but generally try to keep the integration between components in a data frame format. This helps with legibility, data storage, logging, and testing as each step along the way will be able to produce a human-readable, simplified output.
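As a minimal sketch of this convention, the component below accepts a data frame, does its arithmetic on plain vectors internally, and hands back a data frame. The function name `project_deaths` and the columns `age`, `population`, and `mortality_rate` are hypothetical and chosen only for illustration; they are not part of FairChoices.

```r
# Hypothetical component illustrating "data frame in, data frame out".
# Function and column names are assumptions, not FairChoices code.
project_deaths <- function(df) {
  stopifnot(is.data.frame(df))

  # Inside the component, use whatever structure is convenient
  # (here, plain numeric vectors).
  deaths <- df$population * df$mortality_rate

  # Hand a data frame to the next component so the intermediate
  # result stays human-readable and easy to log or test.
  data.frame(age = df$age, deaths = deaths)
}

input <- data.frame(
  age = c(0, 1, 2),
  population = c(1000, 950, 900),
  mortality_rate = c(0.05, 0.01, 0.005)
)

result <- project_deaths(input)
print(result)
```

Because both ends of the component are data frames, the intermediate result can be printed, saved, or compared in a test without knowing how the computation was done internally.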
Fig. 1: Schematic of the FairChoices model
The data/raw folder contains raw data. This data can come from an online source (e.g., GBD, WPP) or be created manually if the information it contains comes from the academic literature (e.g., intervention effect sizes). If the data comes from an online source, it should be saved into the data/raw folder exactly as it was downloaded (i.e., without any manual modifications in Excel). All data cleaning must be done in R through a data cleaning script, which must be saved in the scripts/clean folder.
… data/raw file name should be the date of download in YYYYMMDD format.
… data/raw folder, you are responsible for also cleaning the data in R.
https://vizhub.healthdata.org/gbd-results
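The YYYYMMDD naming rule can be sketched in base R as follows. The helper `raw_file_path` is hypothetical, and the sketch assumes the file name consists only of the download date plus an extension; any subfolder layout inside data/raw is not specified here and is left out.

```r
# Hypothetical helper: build a data/raw path from the download date.
# Assumes the file name is just the date in YYYYMMDD format plus an extension.
raw_file_path <- function(download_date, extension = "csv") {
  stamp <- format(as.Date(download_date), "%Y%m%d")
  file.path("data", "raw", paste0(stamp, ".", extension))
}

example_path <- raw_file_path("2023-05-12")
print(example_path)
```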
The data/clean folder contains cleaned data that originates from the data/raw folder. The only way that data can be added to the data/clean folder is through a data cleaning script (saved in the scripts/clean folder).
… data/clean data should be reflected in the PostgreSQL schema.
The scripts/clean folder contains scripts that take data from the data/raw folder, clean it to a basic level, and save it to the data/clean folder. This cleaned data will be used by a wide variety of FairChoices users and will be the input data for all subsequent analytic scripts (e.g., demography, epidemiology).
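The raw-to-clean flow described above might look roughly like the sketch below. It is not the project's template: the function `clean_raw_file`, the file names, and the cleaning rules (standardised column names, incomplete rows dropped) are assumptions for illustration. The demo writes to a temporary directory so it can be run without touching a real data/raw folder.

```r
# Hypothetical cleaning step: read a raw file, clean it to a basic level,
# and save the result under data/clean. Names and rules are assumptions.
clean_raw_file <- function(raw_path, clean_path) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)

  # Basic cleaning only: lower-case snake_case column names,
  # and drop rows with missing values.
  names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))
  cleaned <- raw[complete.cases(raw), , drop = FALSE]

  dir.create(dirname(clean_path), recursive = TRUE, showWarnings = FALSE)
  write.csv(cleaned, clean_path, row.names = FALSE)
  cleaned
}

# Demo in a temporary directory that mimics the data/raw and data/clean layout.
root <- tempfile("fairchoices_demo_")
dir.create(file.path(root, "data", "raw"), recursive = TRUE)
raw_path <- file.path(root, "data", "raw", "20230512.csv")
clean_path <- file.path(root, "data", "clean", "population.csv")

# The raw file is written once and never edited by hand afterwards.
writeLines(c("Country Name,Population", "A,100", "B,", "C,300"), raw_path)

cleaned <- clean_raw_file(raw_path, clean_path)
print(cleaned)
```

The raw file is left exactly as written, and the only route into the clean folder is the function call, which mirrors the rule that cleaned data is produced solely by a script.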
Each script must begin with basic metadata on where the raw data came from: the data source (with the URL if relevant), who processed the raw data, and the date it was last processed (see example below). The scripts/clean folder contains a template (scripts/clean/template.R) to help users get started.
# Source:
# - World Population Prospects 2022
# - https://population.un.org/wpp/Download/Standard/MostUsed/
# Processor:
# - Sarah Bolongaita
# - 2023-05-12