Using tabular files as inputs
Most of the vignettes in heemod, including vignette("homogeneous", package = "heemod"), vignette("non-homogeneous", package = "heemod") and vignette("probabilistic", package = "heemod"), provide input to heemod through the console. Models with large numbers of states and transition probabilities can be unwieldy to enter at the keyboard; in that case, and possibly others, it may be more convenient to specify model inputs in files and have those files read into the system.
This vignette demonstrates how to make and use files to input states, transition matrices, parameters, and other inputs to heemod. As an example, we will use the total hip replacement model from Decision Modelling for Health Economic Evaluation, which is discussed in vignette("non-homogeneous", package = "heemod"). It further introduces run_models_from_files, a function that reads those files and then runs a suite of analyses (including, depending on inputs, deterministic and probabilistic sensitivity analyses, and acceptability curve calculation) and optionally writes out results to disk.
To define a model in heemod, we have to provide the states, transition probabilities and model parameters at minimum. We show here how to provide these inputs via CSV, XLS or XLSX files. Hereinafter, these types of files are referenced as tabular inputs. (Optional additional tabular inputs will be discussed later.)
Each of these inputs must be provided in a separate file. We deliver links to these files to heemod via a specification file. This file acts as an ‘umbrella’ file atop of the single inputs. We first have a look at the specification file and then examine each input file separately.
Warning for users of the XLS format:
The XLS (and XLSX) files are read in using the package readxl. As of version 0.1.1 of this package, we are aware of the following issue: if an XLS file contains strings and numeric values in one column, the numeric values get rounded to 6 decimal places (regardless of the format or precision in the XLS file). We therefore recommend using only the CSV or XLSX format.
Let us look at the specification file first, read in simply as a .csv file:
specificationFile = system.file("extdata", "THR/THR_specification.csv", package = "heemod")
read.csv(specificationFile)
data | file | .comment | .comment 2 |
---|---|---|---|
state | THR_states.csv | list of states and their costs and effects | |
tm | THR_transition_probs.csv | transition probabilities | |
parameters | THR_parameters.csv | parameters of the models | |
demographics | THR_demographic_table.csv | demographic file | |
data | input_dataframes | folder that stores the dataframes to be input | input_dataframes |
output | output | folder to store outputs |
The file contains two mandatory columns data and file, as well as optional columns with comments which are ignored in processing (below we explain why). The data column must have rows with the following keywords:
- state: tabular file with a list of model states,
- tm: tabular file with transition probabilities,
- parameters: tabular file with model parameters.
Optionally, the following rows can be provided (or the rows can be omitted):
- demographics: tabular file with description of the population to run the models on,
- data: a directory containing tables to be loaded; these can be .csv, .xls or .xlsx,
- output: a directory to save the output graphics.
All the files and directories must be in the same directory as the specification file itself.
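The requirements on the specification file can be checked before any model is run. The following is a minimal sketch in base R, assuming the specification has already been read into a data frame with columns data and file. check_specification is a hypothetical helper written for this illustration; it is not a heemod function.

```r
# Hypothetical helper (not part of heemod): verify that a specification
# data frame contains the mandatory columns and keywords described above.
check_specification <- function(spec) {
  mandatory_cols <- c("data", "file")
  missing_cols <- setdiff(mandatory_cols, names(spec))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  mandatory_rows <- c("state", "tm", "parameters")
  missing_rows <- setdiff(mandatory_rows, spec$data)
  if (length(missing_rows) > 0) {
    stop("Missing row(s): ", paste(missing_rows, collapse = ", "))
  }
  invisible(TRUE)
}

# A specification with the three mandatory rows and one optional row.
spec <- data.frame(
  data = c("state", "tm", "parameters", "output"),
  file = c("THR_states.csv", "THR_transition_probs.csv",
           "THR_parameters.csv", "output"),
  stringsAsFactors = FALSE
)
spec_ok <- check_specification(spec)
```

Dropping one of the mandatory rows (for example spec[-1, ]) makes the helper stop with a message naming the missing keyword.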
All tabular files are read using an unexported convenience function, heemod:::f_read_file. As noted in its documentation, this function ignores columns whose headers start with .comment (in any combination of upper and lower case). Here is the data frame created by reading in the file above using f_read_file:
specificationFile = system.file("extdata", "THR/THR_specification.csv", package = "heemod")
heemod:::f_read_file(specificationFile)
data | file |
---|---|
state | THR_states.csv |
tm | THR_transition_probs.csv |
parameters | THR_parameters.csv |
demographics | THR_demographic_table.csv |
data | input_dataframes |
output | output |
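The comment-dropping behaviour can be imitated in a few lines of base R. This is only a sketch of the idea under the assumption that the rule is "header starts with .comment, ignoring case"; drop_comment_columns is a hypothetical name and the code is not taken from f_read_file.

```r
# Hypothetical helper (not heemod's f_read_file): drop every column whose
# header starts with ".comment", ignoring case.
drop_comment_columns <- function(df) {
  is_comment <- grepl("^\\.comment", names(df), ignore.case = TRUE)
  df[, !is_comment, drop = FALSE]
}

# Illustrative input with two comment columns in different capitalisation.
raw <- data.frame(
  data = c("state", "tm"),
  file = c("THR_states.csv", "THR_transition_probs.csv"),
  .comment = c("list of states", "transition probabilities"),
  .Comment.2 = c("", ""),
  stringsAsFactors = FALSE
)
cleaned <- drop_comment_columns(raw)
names(cleaned)
```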
Now let’s have a look at the file with states:
state_file = system.file("extdata", "THR/THR_states.csv", package="heemod")
heemod:::f_read_file(state_file)
.model | state | cost | qaly | .discount.qaly |
---|---|---|---|---|
standard | PrimaryTHR | 0 | 0.00 | 0.015 |
standard | SuccessfulPrimary | 0 | 0.85 | NA |
standard | RevisionTHR | 5294 | 0.30 | NA |
standard | SuccessfulRevision | 0 | 0.75 | NA |
standard | Death | 0 | 0.00 | NA |
new | PrimaryTHR | 0 | 0.00 | 0.015 |
The columns state, cost and qaly should be familiar from other heemod vignettes. The first column, .model, allows us to specify states for multiple models in a single file. The final column specifies the discount rate for qaly. We’ll discuss the columns from the simple to the complex.
Just as when defining a state from the keyboard, the user can specify variable names for costs and benefits. The names are handed over to the main function as run_models_from_files(cost = cost, effect = qaly). Therefore, we have to use the same names in the function call as in the state file. If discounting is used, the rates must be named .discount.<effect name> and .discount.<cost name>.
Discount rates can be specified for any variable by adding a column with the name .discount followed by the variable name. For example, in the file above we have .discount.qaly, and could also have defined .discount.cost or other columns along the same lines. We omitted .discount.cost here not because it makes a lot of sense to discount QALYs but not costs, but to illustrate that if no discount rate is specified for a particular variable, that variable is assumed to be undiscounted.
A given cost or effect variable is always discounted at the same rate in all states. A single discount rate can be duplicated across states, or specified in one state and left blank elsewhere. Specifying two different discount rates for a single variable in different states will cause an error. (Specifying different discount rates for different variables is allowed, but not common in practice - use at your own risk.)
The value of 0.015 means that, in this model, future QALYs are discounted at a rate of 1.5% per Markov cycle. It is up to the user to match the discount rate to the time step of the model. If the scales differ (for example, an annual discount rate and a one-month time step of the Markov model), the discounting will differ from what is expected. If continuous discounting is assumed, the convenience function convert_timescale_of_parameters(value, scale_from, scale_to) can be used to rescale values.
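To make the time-scale issue concrete, here is a small base R sketch of two standard ways to turn an annual rate into a per-cycle rate. It is generic arithmetic under the assumption of a one-month cycle, and it is not the implementation of convert_timescale_of_parameters.

```r
annual_rate <- 0.015
cycles_per_year <- 12  # assumption for this example: one-month Markov cycle

# Discrete compounding: the monthly rate that compounds to the annual rate.
monthly_compound <- (1 + annual_rate)^(1 / cycles_per_year) - 1

# Continuous discounting: the rate scales linearly with the period length.
monthly_continuous <- annual_rate / cycles_per_year

# Check: twelve monthly periods reproduce the annual discount factor.
all.equal((1 + monthly_compound)^cycles_per_year, 1 + annual_rate)
```

Entering the annual value 0.015 directly in a model with monthly cycles would discount far more heavily than either of these per-cycle rates.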
The first column in the file above is called .model. This mandatory first column denotes the model that each row relates to. For our example we define two models: standard and new. Each state can be defined either once for all models, or once for each model. If a state is defined for exactly one model, the program will duplicate the state to all other models - that is, the state is assumed to be identical for all models. If a state is defined separately for each model, each model will get the state defined for it. Defining a state for more than one model, but not for all, will cause an error, because there is no way to automatically decide which version of the state should be used for other models. (To be explicit, if we had models “standard”, “new”, and “radical”, we would need to specify each state either once or three times - we could not specify a state for just two of the models.)
Naturally, we have to list all states. If a state occurs in all models with the same costs and effects, it is sufficient to specify it only once and assign it to any one of the models. In general, it may be least confusing to specify all the repeated states for a single model, though it is not required. As explained in the previous paragraph, a state that differs between models must be specified separately for each model. As a minimum, however, each model has to be mentioned at least once in the states file to ensure correct processing. See the example: the ‘PrimaryTHR’ state has the same values for both models, yet it is specified twice so that the model called new is mentioned.
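The duplication rule can be written down compactly. The sketch below, in base R, is an illustration of the rule as described in the text and not heemod's parser; expand_states and the three-row input are hypothetical.

```r
# Hypothetical helper illustrating the rule: a state given for one model is
# copied to all models, a state given for every model is kept, and anything
# in between is an error.
expand_states <- function(states) {
  models <- unique(states$.model)
  pieces <- lapply(split(states, states$state), function(rows) {
    if (anyDuplicated(rows$.model) > 0) {
      stop("State '", rows$state[1], "' is defined twice for one model.")
    }
    if (nrow(rows) == length(models)) {
      return(rows)  # defined once per model: keep as is
    }
    if (nrow(rows) == 1) {
      out <- rows[rep(1, length(models)), , drop = FALSE]
      out$.model <- models  # defined once: copy to every model
      return(out)
    }
    stop("State '", rows$state[1], "' is defined for some but not all models.")
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

states <- data.frame(
  .model = c("standard", "standard", "new"),
  state  = c("PrimaryTHR", "Death", "PrimaryTHR"),
  cost   = c(0, 0, 0),
  stringsAsFactors = FALSE
)
expanded <- expand_states(states)
```

Here Death is given only for standard and is copied to new, while PrimaryTHR is already present for both models and is left alone.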
We will now explain the process that translates the states file into a model object for the heemod function run_models. Other files are parsed similarly.
The tabular file contains 5 states for the standard model, while only one state of the new model is explicitly mentioned. In processing, however, states specified for only one model are duplicated, so that each model is fully defined in an element of a list:
this_file <- system.file("extdata", "THR/THR_states.csv", package = "heemod")
state_info <- heemod:::f_parse_multi_spec(this_file, split_on = ".model", group_vars = "state")
class(state_info)
## [1] "list"
state_info
$standard
| | .model | state | cost | qaly | .discount.qaly |
---|---|---|---|---|---|
9 | standard | PrimaryTHR | 0 | 0.00 | 0.015 |
1 | standard | SuccessfulPrimary | 0 | 0.85 | NA |
3 | standard | RevisionTHR | 5294 | 0.30 | NA |
5 | standard | SuccessfulRevision | 0 | 0.75 | NA |
7 | standard | Death | 0 | 0.00 | NA |
$new
| | .model | state | cost | qaly | .discount.qaly |
---|---|---|---|---|---|
10 | new | PrimaryTHR | 0 | 0.00 | 0.015 |
2 | new | SuccessfulPrimary | 0 | 0.85 | NA |
4 | new | RevisionTHR | 5294 | 0.30 | NA |
6 | new | SuccessfulRevision | 0 | 0.75 | NA |
8 | new | Death | 0 | 0.00 | NA |
Each element of this list is transformed into a string defining the states for a model. Below is the string for new:
states <- heemod:::f_create_state_definitions_from_tabular(state_info[[2]])
states$state_command
## [1] "define_state_list(PrimaryTHR = define_state(cost=0, qaly=discount(0, 0.015)), SuccessfulPrimary = define_state(cost=0, qaly=discount(0.85, 0.015)), RevisionTHR = define_state(cost=5294, qaly=discount(0.3, 0.015)), SuccessfulRevision = define_state(cost=0, qaly=discount(0.75, 0.015)), Death = define_state(cost=0, qaly=discount(0, 0.015)))"
This command is (up to spacing) exactly what would be typed at the keyboard to define the states.
The second input file listed in the specification file defines the transition probabilities.
transprob = system.file("extdata", "THR/THR_transition_probs.csv", package = "heemod")
heemod:::f_read_file(transprob)
.model | from | to | prob |
---|---|---|---|
standard | PrimaryTHR | SuccessfulPrimary | C |
standard | PrimaryTHR | Death | 0.02 |
standard | SuccessfulPrimary | SuccessfulPrimary | C |
standard | SuccessfulPrimary | RevisionTHR | pHRFailStandard |
standard | SuccessfulPrimary | Death | mr |
standard | RevisionTHR | SuccessfulRevision | C |
standard | RevisionTHR | Death | 0.02+mr |
standard | SuccessfulRevision | SuccessfulRevision | C |
standard | SuccessfulRevision | RevisionTHR | 0.04 |
standard | SuccessfulRevision | Death | mr |
standard | Death | Death | 1 |
new | SuccessfulPrimary | RevisionTHR | pHRFailNew |
As with the state file above, the values are specified separately for the two models, and probabilities specified for only one model are carried over to the others. Probabilities that differ from one model to another must be specified separately for each model. Unspecified transition probabilities are assumed to be 0, which makes transition probability matrices easier to specify for sparsely connected models with many states. A probability can be defined by any expression that could be used in defining the matrix at the keyboard: a number, C, a parameter name, or a function call. In the example above, P[SuccessfulPrimary -> RevisionTHR] is specified in the transition probability file as pHRFailStandard for the standard model and pHRFailNew for the new model; these parameters are in turn defined in the parameter file. This is the only transition probability that differs between the two models; all others are the same.
As for the state file, the transition probability file is parsed into a list of complete transition probability specifications. These lists are then used to create the same strings that would be entered at the keyboard, and the strings are evaluated to create the transition matrices. Below we show the string that creates the transition probability matrix for the standard model. Note that all the 0 probabilities have been inserted in the appropriate positions.
## [1] "define_matrix( 0,C,0,0,0.02,0,C,pHRFailStandard,0,mr,0,0,0,C,0.02+mr,0,0,0.04,C,mr,0,0,0,0,1 , state_names= c(\"PrimaryTHR\",\"SuccessfulPrimary\",\"RevisionTHR\",\"SuccessfulRevision\",\"Death\") )"
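The zero-filling step can be reproduced with base R alone. The sketch below is an illustration and not heemod's code: it starts from the from/to/prob rows of the standard model shown earlier, fills a character matrix with "0", and flattens it row by row into the comma-separated cell list seen in the define_matrix string.

```r
states <- c("PrimaryTHR", "SuccessfulPrimary", "RevisionTHR",
            "SuccessfulRevision", "Death")

# Transitions of the standard model, as listed in the transition file.
trans <- data.frame(
  from = c("PrimaryTHR", "PrimaryTHR",
           "SuccessfulPrimary", "SuccessfulPrimary", "SuccessfulPrimary",
           "RevisionTHR", "RevisionTHR",
           "SuccessfulRevision", "SuccessfulRevision", "SuccessfulRevision",
           "Death"),
  to = c("SuccessfulPrimary", "Death",
         "SuccessfulPrimary", "RevisionTHR", "Death",
         "SuccessfulRevision", "Death",
         "SuccessfulRevision", "RevisionTHR", "Death",
         "Death"),
  prob = c("C", "0.02",
           "C", "pHRFailStandard", "mr",
           "C", "0.02+mr",
           "C", "0.04", "mr",
           "1"),
  stringsAsFactors = FALSE
)

# Every cell defaults to "0"; listed transitions overwrite their cell.
mat <- matrix("0", nrow = length(states), ncol = length(states),
              dimnames = list(states, states))
mat[cbind(trans$from, trans$to)] <- trans$prob

# Flatten row by row (t() is needed because R stores matrices by column).
cells <- paste(t(mat), collapse = ",")
cells
```

Because the cells are kept as text, expressions such as 0.02+mr and the complement marker C pass through unchanged and are only evaluated later.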
The third mandatory input is the parameter file. Just as for state files and transition probability files, parameter files give the system enough information to create the command that would otherwise be entered at the keyboard using define_parameters. The name of the parameter is given in the parameter column, the value in the value column. Just as at the keyboard, a parameter can be specified as any expression: as a value, through a previously defined parameter, using a formula or a function call.
Parameter files can also define low and high values for discrete sensitivity analysis and distributions for probabilistic sensitivity analysis. Values in the optional low and high columns are parsed into a call to define_sensitivity
, creating a heemod sensitivity object for the deterministic sensitivity analysis (DSA). Just like the parameter values for the main run, low and high values can be defined by expressions, and will be evaluated at run time like other parameters. Only parameters with both low and high values defined will be included in the sensitivity object.
Similarly, the optional psa
column is parsed into a call to define_distribution
, and can give any of the distributions allowed by heemod. Parameters that do not have a value in this column will not be included in the probabilistic sensitivity analysis.
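The selection rules for the two sensitivity analyses amount to simple filters on the parameter table. The following base R sketch applies them to a three-row excerpt typed in by hand; it only illustrates which rows qualify and does not build heemod sensitivity or distribution objects.

```r
# Hand-typed excerpt of a parameter table (values copied from the example).
pars_excerpt <- data.frame(
  parameter = c("lngamma", "gamma", "age_init"),
  value = c("0.3740968", "exp(lngamma)", "60"),
  low = c(0.2791966, NA, NA),
  high = c(0.468997, NA, NA),
  psa = c("normal(0.27, 0.001)", NA, NA),
  stringsAsFactors = FALSE
)

# DSA: only parameters with both a low and a high value.
in_dsa <- pars_excerpt$parameter[!is.na(pars_excerpt$low) &
                                   !is.na(pars_excerpt$high)]

# PSA: only parameters with a non-empty entry in the psa column.
in_psa <- pars_excerpt$parameter[!is.na(pars_excerpt$psa) &
                                   nzchar(pars_excerpt$psa)]
```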
Our example contains 17 parameters; we illustrate the principles on a few of them:
param_file = system.file("extdata", "THR/THR_parameters.csv", package = "heemod")
pars <- heemod:::f_read_file(param_file)
dim(pars)
## [1] 17 5
A parameter can be defined through a formula using another preceding parameter:
pars[which(pars$parameter %in% c("lngamma", "gamma")), ]
parameter | value | low | high | psa |
---|---|---|---|---|
lngamma | 0.3740968 | 0.2791966 | 0.468997 | normal(0.27, 0.001) |
gamma | exp(lngamma) | NA | NA |
It is not meaningful to define separate low and high values for a parameter defined in terms of another parameter, because it cannot be changed separately.
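Why order matters for such definitions can be seen by evaluating the two expressions one after the other. This base R sketch is a simplification for illustration, assuming plain sequential evaluation; it does not reproduce how heemod evaluates parameters internally.

```r
# Parameter definitions as text, in the order they appear in the file.
definitions <- c(lngamma = "0.3740968", gamma = "exp(lngamma)")

# Evaluate each expression in turn, storing results where later
# expressions can find them.
env <- new.env()
for (name in names(definitions)) {
  value <- eval(parse(text = definitions[[name]]), envir = env)
  assign(name, value, envir = env)
}

gamma_value <- get("gamma", envir = env)
```

Changing lngamma and re-running the loop changes gamma with it, which is why gamma has no low and high values of its own.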
The use of internal heemod variables is permitted: at rows 8 and 10, we use the markov_cycle variable without defining it:
pars[which(pars$parameter %in% c("age_init", "age")), ]
parameter | value | low | high | psa |
---|---|---|---|---|
age_init | 60 | NA | NA | |
age | age_init + markov_cycle | NA | NA |
A parameter can be defined as a function call, as demonstrated at row 13. f_look_up_values_df is an internal heemod function discussed below; user-defined functions are also allowed if sourced before the run.
pars[which(pars$parameter %in% c("mr")), ]
parameter | value | low | high | psa |
---|---|---|---|---|
mr | f_look_up_values_df(mr_table, age = age, sex = sex_str, numeric_cols = "age") | NA | NA |
Two elements of the specification file are used primarily during runtime, rather than to define elements of the model. We discuss them in this section.
Sometimes data of various kinds is required for an analysis, for example mortality rates for people of different ages in a specific population or costs arising in different situations. Such data can be encapsulated in function calls (for example, in the get_who_mr function, which gives access to a large number of WHO mortality tables), or loaded into the environment after starting R but before running a model. The specification file can also have a data row that specifies a subdirectory containing data frames to be loaded; these can be saved as .csv, .xls, or .xlsx files. Multiple files can be placed here, and each filename (without the extension) is used as a dataframe name. To avoid confusion over versions, having multiple files with the same base name but different extensions in the directory will cause an error.
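The naming rule for files in the data directory can be sketched as follows. data_frame_names is a hypothetical helper, costs.csv is an invented file name, and the code only illustrates the rule; it is not taken from heemod.

```r
# Hypothetical helper: derive data frame names from file names and refuse
# directories where two files share a base name.
data_frame_names <- function(files) {
  base_names <- tools::file_path_sans_ext(basename(files))
  dup <- unique(base_names[duplicated(base_names)])
  if (length(dup) > 0) {
    stop("Same base name with different extensions: ",
         paste(dup, collapse = ", "))
  }
  base_names
}

# "costs.csv" is an invented example; mr_table.xlsx is from the THR example.
df_names <- data_frame_names(c("mr_table.xlsx", "costs.csv"))
```

A directory holding both mr_table.xlsx and mr_table.csv would be rejected, since it would be unclear which file defines mr_table.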
The example specification file above tells run_models_from_files that those tabular files are located in the subdirectory input_dataframes of the directory containing the specification file and other inputs. The data frames are loaded into an environment that is placed in the search path just before the analyses are run, and an on.exit detach statement ensures that this environment is removed when the analysis finishes (or exits with an error). For added safety, the environment is called heemod_temp_variables_envir_detach_me, which should be a good hint for anyone spotting it. Once the data frames are in the search path, they can be queried with f_look_up_values_df. The first argument is the name of the data frame.
mort = system.file("extdata", "THR/input_dataframes/mr_table.xlsx", package = "heemod")
heemod:::f_read_file(mort)
age | sex | value |
---|---|---|
35 | Males | 0.00151 |
45 | Males | 0.00393 |
55 | Males | 0.01090 |
65 | Males | 0.03160 |
75 | Males | 0.08010 |
85 | Males | 0.18790 |
35 | Females | 0.00099 |
45 | Females | 0.00260 |
55 | Females | 0.00670 |
65 | Females | 0.01930 |
75 | Females | 0.05350 |
85 | Females | 0.15480 |
Note that this dataframe is used in the definition of parameter mr above as f_look_up_values_df(mr_table, age = age, sex = sex_str, numeric_cols = "age"). This function selects the appropriate value from the dataframe mr_table, finding the correct row based on the age and sex arguments (which will be different, of course, for a data frame with different columns); it returns the value column, as this is the only column not mentioned in the call.
Since we specify numeric_cols = "age", each age is considered a lower boundary: for instance, a value of 0.00151 is returned for Males whose age is 35 or larger but smaller than 45. When the numeric_cols = argument is omitted, an exact match is required (and an error will be thrown if one is not found).
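The lower-boundary matching can be imitated with findInterval from base R. The sketch below is an illustration only: look_up_lower_bound is a hypothetical helper, not f_look_up_values_df, and its handling of ages above the last boundary (it returns the last row) is an assumption of this example.

```r
# Male rows of the mortality table shown above, plus one female row.
mr_excerpt <- data.frame(
  age = c(35, 45, 55, 65, 75, 85, 85),
  sex = c(rep("Males", 6), "Females"),
  value = c(0.00151, 0.00393, 0.01090, 0.03160, 0.08010, 0.18790, 0.15480),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat each tabulated age as the lower boundary of an
# interval and return the value of the interval containing `age`.
look_up_lower_bound <- function(tbl, age, sex) {
  rows <- tbl[tbl$sex == sex, ]
  rows <- rows[order(rows$age), ]
  idx <- findInterval(age, rows$age)
  if (idx == 0) {
    stop("Age is below the smallest boundary in the table.")
  }
  rows$value[idx]
}

mr_40 <- look_up_lower_bound(mr_excerpt, age = 40, sex = "Males")
mr_45 <- look_up_lower_bound(mr_excerpt, age = 45, sex = "Males")
```

An age of 40 falls in the interval starting at 35, while an age of exactly 45 already belongs to the next interval.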
When we run a set of analyses at once, it is useful to store the results automatically. When the data column has the value output, the corresponding file value specifies the directory into which outputs should be written. The outputs include the state count graphs for each model and the discrete and probabilistic sensitivity analysis graphs.
The function run_models_from_files brings together all the tabular inputs previously discussed, and then runs the various analyses specified by those inputs: not only the base model runs, but also discrete and probabilistic sensitivity analyses, and analyses over various demographic groups. This allows modelers to specify their models in several files, and then obtain an entire set of results with a single command. run_models_from_files takes in the specification file discussed above, and also makes use of several more rows that the file can contain.
To run the analysis, simply point the code at the model specification file: run_models_from_files(<spec file>). For our example, this reads:
modelDir <- system.file("extdata", "THR/", package = "heemod")
THRexample <- heemod:::run_models_from_files(base_dir = modelDir,
ref_file = "THR_specification.csv",
N.prob = 100,
cost = cost, effect = qaly,
base_model = "standard",
save_outputs = FALSE,
overwrite = TRUE,
init = c(1000, rep(0, 4)), cycles = 40,
method = "end")
## Running model 'standard'...
## Running model 'new'...
## Running analysis for model 'standard'.
## Running analysis for model 'new'.
The results are identical to those obtained for the same model in vignette("non-homogeneous", package = "heemod"). We can also compare the results of the model with the results obtained by Briggs et al. in the spreadsheet provided on the web site for their book referenced above.
We first show the number of people in each state (excluding for brevity Primary THR, which is 0 except for the first time period) from the two models side by side, with Briggs et al.’s results on the left, and the heemod results on the right:
# Import the Briggs' spreadsheet
briggs_file <- system.file("extdata", "THR/Briggs/Ex35sol.xls", package = "heemod")
briggs <- readxl::read_excel(briggs_file, sheet = 6, col_names = FALSE, skip = 5)[1:61, 4:8]
briggs[is.na(briggs[, 1]), 1] <- 0
briggs[1, 2:5] <- 0
colnames(briggs) <- paste0(c("PrimTHR", "SuccessPrim", "RevisTHR", "SuccessRevis", "Death"), "")
# Pre-process the heemod results
hmd.res <- attr(THRexample$model_runs, "eval_model_list")$standard$counts
names(hmd.res) <- c("PrimTHR ", "SuccessPrim ", "RevisTHR ", "SuccessRevis ", "Death ")
Numerical comparison reveals identity up to rounding error:
identical(
as.numeric(as.matrix(round(briggs[1:40, ], 10))),
as.numeric(as.matrix(round(hmd.res[1:40, ], 10)))
)
## [1] TRUE
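An alternative to rounding both sides before calling identical is a tolerance-based comparison with all.equal. The sketch below uses two small invented matrices rather than the model output, and same_up_to is a hypothetical helper name.

```r
# Hypothetical helper: TRUE when two numeric objects agree within `tol`.
same_up_to <- function(x, y, tol = 1e-10) {
  isTRUE(all.equal(as.numeric(x), as.numeric(y), tolerance = tol))
}

# Invented example values: b differs from a only by a tiny perturbation.
a <- matrix(c(1000, 980.123456789012, 0, 19.876543210988), nrow = 2)
b <- a + 1e-12

exact_match <- identical(a, b)   # strict, bitwise comparison
close_match <- same_up_to(a, b)  # comparison within the tolerance
```

The tolerance-based check does not depend on where a value happens to sit relative to a rounding boundary, which can occasionally trip up the round-then-identical approach.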