An R package to assist with downloading, importing and manipulating the (Australian) Geocoded National Address File (G-NAF).
Addresses are a cultural artefact, created from language rather than rules and legislation *
G-NAF is Australia's most trusted authoritative (g)eocoded - (n)ational (a)ddress (f)ile.
More from: https://psma.com.au/product/gnaf/
PSMA's G-NAF dataset contains all physical addresses in Australia. It's the most trusted source of geocoded addresses for Australian businesses and governments.
Before use, users should read the G-NAF End User Licence Agreement.
G-NAF is released on a quarterly basis and is available from data.gov.au.
- A downloaded copy of G-NAF, available from data.gov.au.
- Sufficient RAM; the amount needed depends on the function call used.
- At least 32 GB is recommended to build the entire country (as of the February 2020 extract, the largest function call results in an object of 15.2M rows x 52 variables, ~10 GB of RAM).
- However, importing a single jurisdiction can take as little as a few hundred MB.
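As a rough way to relate these figures to a smaller import, the sketch below scales the quoted national figures (about 15.2 million rows occupying roughly 10 GB) linearly by row count. The function `estimate_gnaf_gb()` is hypothetical and not part of `gnaf.r`, and linear scaling is an assumption: actual memory use depends on which variables are kept and how many distinct strings each column holds.

```r
# Hypothetical helper (not part of gnaf.r): scale the figures quoted above
# linearly by row count to get a ballpark memory requirement in GB.
estimate_gnaf_gb <- function(n_rows,
                             ref_rows = 15.2e6,  # rows in the national build quoted above
                             ref_gb = 10) {      # approximate size of that build in GB
    stopifnot(is.numeric(n_rows), all(n_rows >= 0))
    n_rows / ref_rows * ref_gb
}

# Ballpark for a jurisdiction with roughly 3 million addresses.
estimate_gnaf_gb(3e6)
```

Treat the result as an order-of-magnitude guide only; intermediate copies made while building the object can push peak usage well above the final object size.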
Please note, the package is not on CRAN.
Installing from GitHub:
# Install `remotes` if it isn't already installed.
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# Install the `gnaf.r` package.
remotes::install_github("KyleHaynes/gnaf.r")
The following three steps can be completed manually or by calling get_gnaf() (see the example below).
- Download G-NAF from data.gov.au: https://data.gov.au/dataset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/details?q=G-NAF
- NOTE: File size is ~1.5GB compressed / ~7.7GB uncompressed.
- Extract the content of the compressed download to a desired location.
- Note down the location of the extracted directory (and the sibling month/year folder). E.g. "C:/temp/G-NAF/G-NAF FEBRUARY 2020".
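If the steps are done manually, the path noted in the last step can also be located programmatically. The following is a minimal sketch in base R, assuming the extracted folder follows the `G-NAF <MONTH> <YEAR>` naming shown above; `find_gnaf_root()` is a hypothetical helper, not a function exported by `gnaf.r`.

```r
# Hypothetical helper (not part of gnaf.r): search an extraction directory for
# folders named like "G-NAF FEBRUARY 2020" and return their full paths.
find_gnaf_root <- function(extract_dir) {
    stopifnot(dir.exists(extract_dir))
    dirs <- list.dirs(extract_dir, full.names = TRUE, recursive = TRUE)
    # Keep only directories whose final path component matches the pattern.
    dirs[grepl("^G-NAF [A-Z]+ [0-9]{4}$", basename(dirs))]
}

# Example: build a throwaway directory tree that mimics the extracted layout.
tmp <- file.path(tempdir(), "gnaf_demo")
dir.create(file.path(tmp, "G-NAF", "G-NAF FEBRUARY 2020"),
           recursive = TRUE, showWarnings = FALSE)
find_gnaf_root(tmp)
```

The returned path is the kind of value that would be supplied as the `dir` argument of `setup()`.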
# Load the package.
library("gnaf.r")
# Steps 1-3 in the `Prerequisite steps` section above can be completed from within R.
# Note: If G-NAF is already downloaded, you can skip this function call.
# Download and unpack G-NAF to the "c:/temp/" folder.
get_gnaf(dest_folder = "c:/temp")
# Verbose output example:
# ------------------
# The download is approximately 1.5Gb, depending on your internet speed, the
# following may take a while.
# The G-NAF zip file is currently being downloaded to: C:\temp\feb20_gnaf_pipeseparatedvalue.zip
# ------------------
# G-NAF has been download and is now uncompressing.
# ------------------
# You can now call the `setup()` to begin the initial setup of G-NAF. Be sure to toggle the
# `states` argument to only import relevant jurisdictions.
# Example setup call: setup(dir = "C:\\temp\\G-NAF\\G-NAF NOVEMBER 2020", states = "qld")
# Set up the session before importing G-NAF. This step has two primary purposes.
# 1. Define the location of the G-NAF (month year) root path (./G-NAF <MONTH> <YEAR>).
# 2. Define which jurisdictions to import (case insensitive regex on State abbreviations).
setup(dir = "C:/temp/G-NAF/G-NAF NOVEMBER 2020", states = "qld")
# Import G-NAF for Queensland.
gnaf <- build_gnaf()
# Import again, setting `simple = TRUE` to remove variables that are potentially
# unrelated to the address (i.e. reduce the output to just address information).
gnaf_simple <- build_gnaf(simple = TRUE)
# Inspect the structure of each object.
str(gnaf)
# Classes ‘data.table’ and 'data.frame': 590395 obs. of 48 variables:
# $ ADDRESS_DETAIL_PID : chr "GAACT714845933" "GAACT714845934" "GAACT714845935" "GAACT714845936" ...
# $ BUILDING_NAME : chr "" "" "" "" ...
# $ LOT_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ FLAT_NUMBER_PREFIX : chr "" "" "" "" ...
# $ FLAT_TYPE : chr NA NA NA NA ...
# $ FLAT_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ FLAT_NUMBER_SUFFIX : chr "" "" "" "" ...
# $ LEVEL_TYPE : chr NA NA NA NA ...
# $ LEVEL_NUMBER_PREFIX : chr "" "" "" "" ...
# $ LEVEL_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ NUMBER_FIRST_PREFIX : chr NA NA NA NA ...
# $ NUMBER_FIRST : int 6 3 26 17 5 24 7 5 22 9 ...
# $ NUMBER_FIRST_SUFFIX : chr "" "" "" "" ...
# $ NUMBER_LAST : int NA NA NA NA NA NA NA NA NA NA ...
# $ NUMBER_LAST_SUFFIX : chr NA NA NA NA ...
# $ STREET_NAME : chr "PACKHAM" "BUNKER" "JAUNCEY" "GEEVES" ...
# $ STREET_TYPE : chr "PLACE" "PLACE" "COURT" "COURT" ...
# $ STREET_SUFFIX : chr NA NA NA NA ...
# $ LOCALITY_NAME : chr "CHARNWOOD" "CHARNWOOD" "CHARNWOOD" "CHARNWOOD" ...
# $ STATE_NAME : chr "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" ...
# $ POSTCODE : int 2615 2615 2615 2615 2902 2615 2902 2615 2615 2902 ...
# $ LONGITUDE : num 149 149 149 149 149 ...
# $ LATITUDE : num -35.2 -35.2 -35.2 -35.2 -35.4 ...
# $ MB_2011_CODE : chr "80006300000" "80006310000" "80006380000" "80006280000" ...
# $ MB_2016_CODE : chr "80006300000" "80006310000" "80006380000" "80006280000" ...
# $ STREET_LOCALITY_PID : chr "ACT3857" "ACT3807" "ACT3833" "ACT3826" ...
# $ LOCALITY_PID : chr "ACT570" "ACT570" "ACT570" "ACT570" ...
# $ ALIAS_PRINCIPAL : chr "P" "P" "P" "P" ...
# $ LEGAL_PARCEL_ID : chr "BELC/CHAR/15/16/" "BELC/CHAR/17/2/" "BELC/CHAR/83/3/" "BELC/CHAR/29/9/" ...
# $ CONFIDENCE : int 2 2 2 2 2 2 2 2 2 2 ...
# $ ADDRESS_SITE_PID : int 710446419 710446420 710446421 710446422 710446424 710446425 710446427 710446428 710446429 710446430 ...
# $ LEVEL_GEOCODED_CODE : int 7 7 7 7 7 7 7 7 7 7 ...
# $ GNAF_PROPERTY_PID : chr "1026280" "1026283" "351430" "343650" ...
# $ PRIMARY_SECONDARY : chr "" "" "" "" ...
# $ PRIMARY_POSTCODE : int NA NA NA NA NA NA NA NA NA NA ...
# $ GNAF_LOCALITY_PID : int 500219587 500219587 500219587 500219587 500219628 500219587 500219628 500219587 500219587 500219628 ...
# $ GNAF_RELIABILITY_CODE : int 5 5 5 5 5 5 5 5 5 5 ...
# $ GNAF_STREET_PID : int 502493439 502490407 502492206 502491587 502492926 502492206 502492926 502490407 502492206 502492926 ...
# $ GNAF_STREET_CONFIDENCE : int 2 2 2 -1 2 2 2 2 2 2 ...
# $ GNAF_RELIABILITY_CODE_street_locality: int 4 4 4 4 4 4 4 4 4 4 ...
# $ ADDRESS_DEFAULT_GEOCODE_PID :integer64 3006501997 3006502410 3006610521 3006506877 3006499300 3006448778 3006616267 3006485909 ...
# $ GEOCODE_TYPE_CODE : chr "FCS" "FCS" "FCS" "FCS" ...
# $ ADDRESS_MESH_BLOCK_2011_PID : chr "ACT43994755" "ACT43994756" "ACT43994757" "ACT43994758" ...
# $ MB_MATCH_CODE : int 1 1 1 1 1 1 1 1 1 1 ...
# $ ADDRESS_MESH_BLOCK_2016_PID : chr "ACT1547490736" "ACT1547490737" "ACT1547490738" "ACT1547490739" ...
# $ MB_MATCH_CODE_locality : int 1 1 1 1 1 1 1 1 1 1 ...
# $ LOCALITY_CLASS : chr "GAZETTED LOCALITY" "GAZETTED LOCALITY" "GAZETTED LOCALITY" "GAZETTED LOCALITY" ...
# $ STREET_CLASS : chr "CONFIRMED" "CONFIRMED" "CONFIRMED" "CONFIRMED" ...
# - attr(*, ".internal.selfref")=<externalptr>
# - attr(*, "sorted")= chr "ADDRESS_DETAIL_PID"
str(gnaf_simple)
# Classes ‘data.table’ and 'data.frame': 590395 obs. of 25 variables:
# $ ADDRESS_DETAIL_PID : chr "GAACT714845933" "GAACT714845934" "GAACT714845935" "GAACT714845936" ...
# $ BUILDING_NAME : chr "" "" "" "" ...
# $ LOT_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ FLAT_NUMBER_PREFIX : chr "" "" "" "" ...
# $ FLAT_TYPE : chr NA NA NA NA ...
# $ FLAT_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ FLAT_NUMBER_SUFFIX : chr "" "" "" "" ...
# $ LEVEL_TYPE : chr NA NA NA NA ...
# $ LEVEL_NUMBER_PREFIX: chr "" "" "" "" ...
# $ LEVEL_NUMBER : int NA NA NA NA NA NA NA NA NA NA ...
# $ NUMBER_FIRST_PREFIX: chr NA NA NA NA ...
# $ NUMBER_FIRST : int 6 3 26 17 5 24 7 5 22 9 ...
# $ NUMBER_FIRST_SUFFIX: chr "" "" "" "" ...
# $ NUMBER_LAST : int NA NA NA NA NA NA NA NA NA NA ...
# $ NUMBER_LAST_SUFFIX : chr NA NA NA NA ...
# $ STREET_NAME : chr "PACKHAM" "BUNKER" "JAUNCEY" "GEEVES" ...
# $ STREET_TYPE : chr "PLACE" "PLACE" "COURT" "COURT" ...
# $ STREET_SUFFIX : chr NA NA NA NA ...
# $ LOCALITY_NAME : chr "CHARNWOOD" "CHARNWOOD" "CHARNWOOD" "CHARNWOOD" ...
# $ STATE_NAME : chr "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" "AUSTRALIAN CAPITAL TERRITORY" ...
# $ POSTCODE : int 2615 2615 2615 2615 2902 2615 2902 2615 2615 2902 ...
# $ LONGITUDE : num 149 149 149 149 149 ...
# $ LATITUDE : num -35.2 -35.2 -35.2 -35.2 -35.4 ...
# $ MB_2011_CODE : chr "80006300000" "80006310000" "80006380000" "80006280000" ...
# $ MB_2016_CODE : chr "80006300000" "80006310000" "80006380000" "80006280000" ...
# - attr(*, ".internal.selfref")=<externalptr>
# - attr(*, "sorted")= chr "ADDRESS_DETAIL_PID"
# Size of each object (gigabytes).
format(object.size(gnaf), units = "Gb")
# [1] "0.3 Gb"
format(object.size(gnaf_simple), units = "Gb")
# [1] "0.1 Gb"
# Attempt to build the entire country (including Other Territories: "OT").
setup(dir = "C:/temp/G-NAF/G-NAF FEBRUARY 2020", states = "")
# Import all jurisdictions.
gnaf <- build_gnaf()
# Dimensions of output.
dim(gnaf)
# [1] 15271641 52
# Object size.
format(object.size(gnaf), units = "Gb")
# [1] "8.9 Gb"
# Frequency table by State.
gnaf[, .N, STATE_NAME]
# STATE_NAME N
# 1: AUSTRALIAN CAPITAL TERRITORY 242999
# 2: NEW SOUTH WALES 4749707
# 3: NORTHERN TERRITORY 113221
# 4: OTHER TERRITORIES 4362
# 5: QUEENSLAND 3219900
# 6: SOUTH AUSTRALIA 1163320
# 7: TASMANIA 347396
# 8: VICTORIA 3886769
# 9: WESTERN AUSTRALIA 1543967
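The `states` argument in the `setup()` calls above is described as a case-insensitive regular expression on state abbreviations. The sketch below illustrates that kind of matching with base R's `grepl()`; it is an assumption about the behaviour, not the package's actual implementation, and the abbreviation vector and `match_states()` are defined here for illustration only.

```r
# Jurisdiction abbreviations (assumed for illustration; "OT" = Other Territories).
abbreviations <- c("ACT", "NSW", "NT", "OT", "QLD", "SA", "TAS", "VIC", "WA")

# Hypothetical helper: return the abbreviations matched by a regex, ignoring case.
match_states <- function(pattern, abbr = abbreviations) {
    abbr[grepl(pattern, abbr, ignore.case = TRUE)]
}

match_states("qld")      # a single jurisdiction
match_states("qld|nsw")  # alternation selects several jurisdictions
match_states("")         # an empty pattern matches every jurisdiction
match_states("t")        # unanchored: any abbreviation containing "t"
match_states("^nt$")     # anchored: the Northern Territory only
```

Under this reading, `states = ""` selects the whole country because an empty pattern matches every string, and anchoring with `^` and `$` is the safest way to pick out a short abbreviation.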
Issues / Bugs / Suggestions: https://github.com/KyleHaynes/gnaf.r/issues
G-NAF ©PSMA Australia Limited licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
Incorporates or developed using G-NAF ©PSMA Australia Limited licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
Special thanks to the Turnbull Government for the innovative and invaluable step in making this data open to all Australians.