This vignette presents an example of how to use the costmodelr
package. You can access all the vignettes for this package in R by typing browseVignettes("costmodelr")
.
You can download a template of a cost model with working code here
The costmodelr
package provides a set of utility functions for turning a set of cost assumptions into a tidy table that has one row for each cost on each date that cost is incurred.
The core output format has just five columns as follows:
date | id | quantity | price_gbp | real_or_nominal |
---|---|---|---|---|
2017-01-02 | oo_0_1 | 4 | 50 | real |
2017-01-04 | oo_0_2 | 2 | 100 | real |
2017-01-10 | oo_0_3 | 1 | 150 | real |
Additional information is associated with each cost that enables final results to be filtered and cross tabulated. This can then be joined onto the core output to produce a dataframe like this:
date | id | quantity | category_1 | category_2 | category_3 | gdp_deflator | green_book_discount | cost_gbp_real | cost_gbp_nominal | period |
---|---|---|---|---|---|---|---|---|---|---|
2017-01-02 | oo_0_1 | 4 | a | b | c | 1.028260 | 1.063935 | 200 | 205.6521 | alpha |
2017-01-04 | oo_0_2 | 2 | a | b | d | 1.028373 | 1.064135 | 200 | 205.6747 | alpha |
2017-01-10 | oo_0_3 | 1 | b | b | d | 1.028713 | 1.064737 | 150 | 154.3069 | live |
This output can be accessed from cost_model$cost_dataframe
.
The categorisation columns are provided by the user and can be altered from the defaults using costmodelr::setting_categorisation_columns()
or setting_append_to_categorisation_columns()
.
Since this output dataframe is tidy, it is easy to perform aggregations and filtering, and produce tabular and graphical output that summarises actual and forecasted costs.
Assumptions are input into the model from dataframes, typically read in from .csv
files, which must be provided in a specific format. There are a number of different assumption types, such as 'one off costs', recurring costs' and 'staff costs'.
The cost model should be iniitalised with 'key dates', which control the time period over which outputs will be generated.
Key dates look like this:
date | period |
---|---|
2017-01-01 | alpha |
2017-01-05 | beta |
2017-01-10 | live |
In addition to a 'date' columns, any other columns in the key dates dataframe will be added to the final outputs, and therefore can be used for cross tabulation.
One off costs are costs that occur only once. costmodelr
does not need to perform complex computations on these assupmptions, and so the input is similar to the output; there is a one to one correspondence between the rows in the assumption file and output rows.
The input format is as follows:
price_in_original_currency | real_or_nominal | currency | quantity | date | category_1 | category_2 | category_3 | Description | Source |
---|---|---|---|---|---|---|---|---|---|
50 | real | GBP | 4 | 02/01/17 | a | b | c | abc | x |
100 | real | GBP | 2 | 04/01/17 | a | b | d | acbd | y |
150 | real | GBP | 1 | 10/01/17 | b | b | d | bbd | z |
The output would look as follows:
date | id | quantity | category_1 | category_2 | category_3 | gdp_deflator | green_book_discount | cost_gbp_real | cost_gbp_nominal | period |
---|---|---|---|---|---|---|---|---|---|---|
2017-01-02 | oo_0_1 | 4 | a | b | c | 1.028260 | 1.063935 | 200 | 205.6521 | alpha |
2017-01-04 | oo_0_2 | 2 | a | b | d | 1.028373 | 1.064135 | 200 | 205.6747 | alpha |
2017-01-10 | oo_0_3 | 1 | b | b | d | 1.028713 | 1.064737 | 150 | 154.3069 | live |
Allows you to model costs that happen at a given frequency. Includes options that allow costs to grow at given % and fixed rates.
The input format is as follows:
price_in_original_currency | real_or_nominal | currency | quantity | frequency | first_date | growth_in_cost_percent_per_annum | growth_in_cost_absolute_per_annum | growth_in_quantity_percent_per_annum | growth_in_quantity_absolute_per_annum | category_1 | category_2 | category_3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1000 | real | GBP | 1 | day | 2017-01-01 | 36.87754 | 0 | 0 | 0.00 | capital | hardware | cpu |
1000 | real | GBP | 2 | day | 2017-01-09 | 0.00000 | 0 | 0 | 365.25 | capital | software | analysis |
The output would look as follows:
date | id | quantity | category_1 | category_2 | category_3 | gdp_deflator | green_book_discount | cost_gbp_real | cost_gbp_nominal | period |
---|---|---|---|---|---|---|---|---|---|---|
2017-01-01 | rc_0_1 | 1 | capital | hardware | cpu | 1.028204 | 1.063835 | 1000.000 | 1028.204 | alpha |
2017-01-02 | rc_0_1 | 1 | capital | hardware | cpu | 1.028260 | 1.063935 | 1010.000 | 1038.543 | alpha |
2017-01-03 | rc_0_1 | 1 | capital | hardware | cpu | 1.028317 | 1.064035 | 1020.100 | 1048.986 | alpha |
2017-01-04 | rc_0_1 | 1 | capital | hardware | cpu | 1.028373 | 1.064135 | 1030.301 | 1059.534 | alpha |
2017-01-05 | rc_0_1 | 1 | capital | hardware | cpu | 1.028430 | 1.064236 | 1040.604 | 1070.188 | beta |
2017-01-06 | rc_0_1 | 1 | capital | hardware | cpu | 1.028487 | 1.064336 | 1051.010 | 1080.950 | beta |
2017-01-07 | rc_0_1 | 1 | capital | hardware | cpu | 1.028543 | 1.064436 | 1061.520 | 1091.819 | beta |
2017-01-08 | rc_0_1 | 1 | capital | hardware | cpu | 1.028600 | 1.064536 | 1072.135 | 1102.798 | beta |
2017-01-09 | rc_0_1 | 1 | capital | hardware | cpu | 1.028656 | 1.064637 | 1082.857 | 1113.887 | beta |
2017-01-10 | rc_0_1 | 1 | capital | hardware | cpu | 1.028713 | 1.064737 | 1093.685 | 1125.088 | live |
2017-01-09 | rc_0_2 | 2 | capital | software | analysis | 1.028656 | 1.064637 | 2000.000 | 2057.313 | beta |
2017-01-10 | rc_0_2 | 3 | capital | software | analysis | 1.028713 | 1.064737 | 3000.000 | 3086.139 | live |
Allows you to model staff costs, given assumptions of staff % utilisation on the project.
Two different sets of assumptions are needed here: % utilisation, and a ratecard.
The ratecard looks like this
price_in_original_currency | real_or_nominal | id | currency | price_frequency | annual_percentage_increase | category_1 | category_2 | category_3 |
---|---|---|---|---|---|---|---|---|
20 | real | TA | GBP | working_day | 0 | staff | technical | technical architect |
10 | real | PM | GBP | working_day | 0 | staff | non technical | product manager |
The staff utilisation assumptions look like this:
## Warning: Duplicated column names deduplicated: 'TA' => 'TA_1' [3]
date | TA | TA_1 | PM |
---|---|---|---|
2017-01-01 | 0.5 | 1 | 0.25 |
2017-01-05 | 1.0 | 1 | 0.25 |
2017-01-10 | 0.5 | 1 | 0.25 |
The output looks like this:
date | id | quantity | category_1 | category_2 | category_3 | gdp_deflator | green_book_discount | cost_gbp_real | cost_gbp_nominal | period |
---|---|---|---|---|---|---|---|---|---|---|
2017-01-01 | su_0_TA | 0.5 | staff | technical | technical architect | 1.028204 | 1.063835 | 7.142857 | 7.344312 | alpha |
2017-01-02 | su_0_TA | 0.5 | staff | technical | technical architect | 1.028260 | 1.063935 | 7.142857 | 7.344716 | alpha |
2017-01-03 | su_0_TA | 0.5 | staff | technical | technical architect | 1.028317 | 1.064035 | 7.142857 | 7.345120 | alpha |
2017-01-04 | su_0_TA | 0.5 | staff | technical | technical architect | 1.028373 | 1.064135 | 7.142857 | 7.345524 | alpha |
2017-01-05 | su_0_TA | 1.0 | staff | technical | technical architect | 1.028430 | 1.064236 | 14.285714 | 14.691857 | beta |
2017-01-06 | su_0_TA | 1.0 | staff | technical | technical architect | 1.028487 | 1.064336 | 14.285714 | 14.692665 | beta |
2017-01-07 | su_0_TA | 1.0 | staff | technical | technical architect | 1.028543 | 1.064436 | 14.285714 | 14.693473 | beta |
2017-01-08 | su_0_TA | 1.0 | staff | technical | technical architect | 1.028600 | 1.064536 | 14.285714 | 14.694281 | beta |
2017-01-09 | su_0_TA | 1.0 | staff | technical | technical architect | 1.028656 | 1.064637 | 14.285714 | 14.695090 | beta |
2017-01-10 | su_0_TA | 0.5 | staff | technical | technical architect | 1.028713 | 1.064737 | 7.142857 | 7.347949 | live |
(Note only the first 10 rows of the output are show. Note also costs are spread equally throughout the week, so £50 a week = ~£7.14 a day, including Sat and Sun)
This type of cost is to deal with costs which are proportional to the number of users.
It's possible to model both costs which are directly proportional to the number of users (e.g. each user needs a Github account), or to deal with costs which grow with the number of users (e.g. each user uses an additional 2gb of storage each month)
The input assumptions look like this:
Number of users (this will be linearly interpolated):
date | num_users |
---|---|
2017-01-01 | 1 |
2017-01-03 | 1 |
2017-01-05 | 1 |
2017-01-06 | 1 |
2017-01-07 | 2 |
Cost assumptions:
price_in_original_currency | real_or_nominal | currency | pricefrequency | fixed_initial_quantity_per_user | growth_in_cost_percent_per_annum | growth_in_cost_absolute_per_annum | growth_in_quantity_absolute_per_annum_per_user | category_1 | category_2 | category_3 |
---|---|---|---|---|---|---|---|---|---|---|
60 | real | GBP | month | 0 | 0.000 | 0 | 365.25 | infrastructure | storage | amazon s3 |
30 | real | GBP | month | 0 | 0.000 | 0 | 730.50 | infrastructure | compute | ec2 |
1 | real | GBP | day | 10 | 0.000 | 0 | 0.00 | infrastructure | accounts | github |
14 | real | GBP | week | 100 | 1383.244 | 0 | 365.25 | infrastructure | storate | ebs |
The output looks like this:
date | id | quantity | category_1 | category_2 | category_3 | gdp_deflator | green_book_discount | cost_gbp_real | cost_gbp_nominal | period |
---|---|---|---|---|---|---|---|---|---|---|
2017-01-01 | uvc_0_1 | 1 | infrastructure | storage | amazon s3 | 1.028204 | 1.063835 | 1.971253 | 2.026849 | alpha |
2017-01-02 | uvc_0_1 | 2 | infrastructure | storage | amazon s3 | 1.028260 | 1.063935 | 3.942505 | 4.053921 | alpha |
2017-01-03 | uvc_0_1 | 3 | infrastructure | storage | amazon s3 | 1.028317 | 1.064035 | 5.913758 | 6.081217 | alpha |
2017-01-04 | uvc_0_1 | 4 | infrastructure | storage | amazon s3 | 1.028373 | 1.064135 | 7.885010 | 8.108735 | alpha |
2017-01-05 | uvc_0_1 | 5 | infrastructure | storage | amazon s3 | 1.028430 | 1.064236 | 9.856263 | 10.136476 | beta |
2017-01-06 | uvc_0_1 | 6 | infrastructure | storage | amazon s3 | 1.028487 | 1.064336 | 11.827515 | 12.164440 | beta |
2017-01-07 | uvc_0_1 | 8 | infrastructure | storage | amazon s3 | 1.028543 | 1.064436 | 15.770020 | 16.220146 | beta |
2017-01-08 | uvc_0_1 | 10 | infrastructure | storage | amazon s3 | 1.028600 | 1.064536 | 19.712526 | 20.276298 | beta |
2017-01-09 | uvc_0_1 | 12 | infrastructure | storage | amazon s3 | 1.028656 | 1.064637 | 23.655031 | 24.332896 | beta |
2017-01-10 | uvc_0_1 | 14 | infrastructure | storage | amazon s3 | 1.028713 | 1.064737 | 27.597536 | 28.389940 | live |
(again, only the first 10 records are shown)
Running the full cost model amounts to loading in assumptions, and using the run cost model
function.
# The 'key dates' file specifies the time period over which the cost model produces estimates
key_dates <- readr::read_csv("assumptions/key_dates.csv", col_types=readr::cols())
# Read in assumptions from files
users <- readr::read_csv("assumptions/users.csv", col_types=readr::cols())
staff_utilisation <- readr::read_csv("assumptions/staff_utilisation.csv", col_types=readr::cols())
rate_card <- readr::read_csv("assumptions/rate_card.csv", col_types=readr::cols())
recurring_costs <- readr::read_csv("assumptions/recurring_cost.csv", col_types=readr::cols())
oneoff_costs <- readr::read_csv("assumptions/oneoff_costs.csv", col_types=readr::cols())
user_variable_costs <- readr::read_csv("assumptions/user_variable_costs.csv", col_types =readr::cols())
# Add each set of assumptions to model
cost_model <- create_cost_model(key_dates) %>%
add_oneoff_costs(oneoff_costs) %>%
add_recurring_cost(recurring_costs) %>%
add_user_variable_costs(users, user_variable_costs) %>%
add_staff_utilisation(staff_utilisation, rate_card)
# Run model
cost_model <- run_cost_model(cost_model)
# Extract cost dataframe from model - use this if you want to do crosstabs or filters
cost_model$cost_dataframe
# Run a Shiny app to interactively explore your data
shiny_vis(cost_model)
# View a hierarchical bubble chart of the model
shiny_bubble(cost_model)