A Python package for reading data from the Inputs, Assumptions and Scenarios Report (IASR) Microsoft Excel workbook published by the Australian Energy Market Operator for use in their Integrated System Plan modelling.
pip install isp-workbook-parser
- Load a workbook using Parser (see examples below).
  - While we do not include workbooks with the package distribution, you can find the versions for which table configurations are written within workbooks/<version>.
- Table configuration files for data tables are located in src/config/<version>.
  - These specify the name, location, columns and data range of tables to be extracted from a particular workbook version. Optionally, they can also specify rows to skip (e.g. where AEMO has struck through a row to indicate that the data is no longer used) and columns with merged rows, so that both are handled when the table is read.
  - These are included with the package distributions.
- Parser loads the MS Excel workbook and, by default, checks whether the workbook version is supported by looking for configuration files for that version in the package.
  - If they are present, Parser can use these configuration files to parse the data tables and save them as CSVs.
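The support check described above can be pictured as a directory lookup. The following is a minimal sketch under stated assumptions, not the package's actual implementation: the helper name is_version_supported, the config_root layout, the .yaml extension and the version strings "6.0" and "5.2" are all placeholders chosen for illustration.

```python
from pathlib import Path
import tempfile


def is_version_supported(config_root: Path, version: str) -> bool:
    """Return True if config_root/<version> holds at least one YAML file.

    Hypothetical helper: the real package may locate its configs differently.
    """
    version_dir = config_root / version
    if not version_dir.is_dir():
        return False
    return any(version_dir.glob("*.yaml"))


def demo_version_check() -> tuple[bool, bool]:
    """Build a throwaway config tree, then check a known and an unknown version."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "6.0").mkdir()  # placeholder version directory
        (root / "6.0" / "example_table.yaml").write_text("placeholder: true\n")
        return is_version_supported(root, "6.0"), is_version_supported(root, "5.2")
```

In this sketch a version counts as supported only when its directory exists and contains at least one configuration file, so an empty directory is treated the same as a missing one.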
Note
This package makes some opinionated decisions when processing tables. For example, tables with multiple header rows are reduced to a single header, data in merged cells is inferred from surrounding cells, and notes and footnotes are dropped (amongst other ways in which the data is sanitised). For more detail, refer to the docstrings and code in read_table.py and sanitisers.py.
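To make two of these decisions concrete, here is a minimal, standard-library-only sketch of forward filling merged cells and collapsing several header rows into one. It is an illustration of the general technique, not the code in read_table.py or sanitisers.py: the function names, the "_" separator and the sample values are assumptions.

```python
from typing import Optional


def forward_fill(values: list[Optional[str]]) -> list[Optional[str]]:
    """Replace each None with the most recent non-None value before it."""
    filled: list[Optional[str]] = []
    last: Optional[str] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def flatten_headers(header_rows: list[list[Optional[str]]], sep: str = "_") -> list[str]:
    """Collapse several header rows into one by joining the labels in each column."""
    # A merged header cell only holds a value in its first column,
    # so fill across each row before joining down the columns.
    filled_rows = [forward_fill(row) for row in header_rows]
    flat = []
    for column_labels in zip(*filled_rows):
        parts = [label for label in column_labels if label is not None]
        flat.append(sep.join(parts))
    return flat


# Illustrative data only: a two-row header where "Cost" spans two columns,
# and a column whose region labels were merged across rows.
example_headers = [["Region", "Cost", None], [None, "2024", "2025"]]
example_column = ["NSW", None, None, "VIC", None]
```

With this sample input, flatten_headers(example_headers) yields one label per column and forward_fill(example_column) repeats each region label down the rows it was merged across.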
Table configuration file attributes
- name: the table name
- sheet_name: the sheet where the table is located
  - N.B. there may be spaces at the end of sheet names in the workbook
- header_rows: the Excel row(s) containing the table column names
  - A single row of table column names (e.g. 6)
  - Or a list of row numbers for the table header, sorted in ascending order (e.g. [6, 7, 8])
- end_row: the last row of table data
- column_range: the Excel column range of the table in alphabetical/Excel format, e.g. "B:F"
- skip_rows: optional, Excel row(s) in the table that should not be read in
  - A single row (e.g. 15)
  - Or a list of rows (e.g. [15, 16])
- columns_with_merged_rows: optional, Excel column(s) with merged rows
  - A single column in alphabetical format (e.g. "B")
  - Or a list of columns in alphabetical format (e.g. ["B", "D"])
- forward_fill_values: optional, specifies whether table values should be forward filled
  - Defaults to True to handle merged cells in tables
  - Should be set to False where there are empty columns
Refer to the contributing instructions for details on how to contribute table configuration (YAML) files to this repository and package.
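The attributes listed above can be read as a small schema. The sketch below mirrors them in a hypothetical ExampleTableConfig dataclass, with helpers that interpret column_range and the row attributes. It is not the package's TableConfig: the class name, the helper names and the sanity checks (for instance, that end_row falls after the last header row) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


def column_letter_to_index(letters: str) -> int:
    """Convert an Excel column label to a zero-based index ("A" -> 0, "AA" -> 26)."""
    if not letters:
        raise ValueError("Empty column label")
    index = 0
    for char in letters.upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"Invalid column label: {letters!r}")
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def column_range_to_indices(column_range: str) -> tuple[int, int]:
    """Split a range such as "B:F" into zero-based (first, last) column indices."""
    start, end = column_range.split(":")
    first, last = column_letter_to_index(start), column_letter_to_index(end)
    if first > last:
        raise ValueError(f"Column range is reversed: {column_range!r}")
    return first, last


def as_row_list(rows: Union[int, list[int], None]) -> list[int]:
    """Normalise a single row, a list of rows, or None into a list of rows."""
    if rows is None:
        return []
    if isinstance(rows, int):
        return [rows]
    return list(rows)


@dataclass
class ExampleTableConfig:
    """Hypothetical stand-in that mirrors the documented attributes."""

    name: str
    sheet_name: str
    header_rows: Union[int, list[int]]
    end_row: int
    column_range: str
    skip_rows: Union[int, list[int], None] = None
    columns_with_merged_rows: Union[str, list[str], None] = None
    forward_fill_values: bool = True

    def __post_init__(self) -> None:
        headers = as_row_list(self.header_rows)
        if not headers:
            raise ValueError("header_rows must name at least one row")
        if headers != sorted(headers):
            raise ValueError("header_rows must be sorted in ascending order")
        # Assumed sanity check: table data ends below the header.
        if self.end_row <= headers[-1]:
            raise ValueError("end_row must come after the last header row")
        column_range_to_indices(self.column_range)  # raises if malformed
```

For example, ExampleTableConfig(name="table_name", sheet_name="sheet_name", header_rows=[6, 7, 8], end_row=21, column_range="B:F", skip_rows=15) passes these checks, while a header_rows value of [8, 6] is rejected because it is not in ascending order.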
Export all the data tables the package has a config file for to CSV files.
from isp_workbook_parser import Parser
workbook = Parser("<path/to/workbook>/2024-isp-inputs-and-assumptions-workbook.xlsx")
workbook.save_tables('<path/to/output directory>')
Return a dictionary that maps each sheet name in the workbook to a list of the names of the tables on that sheet. For a given workbook version, this only includes tables the package has a configuration file for.
from isp_workbook_parser import Parser
workbook = Parser("<path/to/workbook>/2024-isp-inputs-and-assumptions-workbook.xlsx")
names = workbook.get_table_names()
names['Build limits']
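Because the result is a plain dictionary of lists, it can be searched or flattened with ordinary Python. The sketch below works on a hand-written dictionary of the same shape so that it runs without a workbook. The helper names and every table name in it are placeholders, and "Example sheet" is an invented sheet name; only the sheet name "Build limits" comes from the example above.

```python
from typing import Optional


def all_table_names(names_by_sheet: dict[str, list[str]]) -> list[str]:
    """Flatten the sheet -> tables mapping into a single list of table names."""
    return [table for tables in names_by_sheet.values() for table in tables]


def find_sheet(names_by_sheet: dict[str, list[str]], table_name: str) -> Optional[str]:
    """Return the sheet that holds table_name, or None if it is not listed."""
    for sheet, tables in names_by_sheet.items():
        if table_name in tables:
            return sheet
    return None


# Placeholder data with the same shape as the dictionary described above.
example_names = {
    "Build limits": ["example_table_a", "example_table_b"],
    "Example sheet": ["example_table_c"],
}
```

Passing the real dictionary from get_table_names() in place of example_names should work the same way, assuming it has the sheet-name-to-list shape described above.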
Get a single table as a pandas DataFrame.
from isp_workbook_parser import Parser
workbook = Parser("<path/to/workbook>/2024-isp-inputs-and-assumptions-workbook.xlsx")
table = workbook.get_table("retirement_costs")
Get a table by directly providing the table config.
from isp_workbook_parser import Parser, TableConfig
workbook = Parser("<path/to/workbook>/2024-isp-inputs-and-assumptions-workbook.xlsx")
table_config = TableConfig(
    name="table_name",
    sheet_name="sheet_name",
    header_rows=5,
    end_row=21,
    column_range="B:J",
)
workbook.get_table_from_config(table_config)
Interested in contributing to the source code or adding table configurations? Check out the contributing instructions, which also include steps to install isp-workbook-parser for development.
Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
isp-workbook-parser was created as a part of the OpenISP project. It is licensed under the terms of the GNU GPL-3.0-or-later licence.