Aquius

Here+Us - An Alternative Approach to Public Transport Network Discovery

Description

Aquius at the University of York

Aquius visualises the links between people that are made possible by transport networks. The user clicks once on a location, and near-instantly all the routes, stops, services, and connected populations are summarised visually. Aquius answers the question, what services are available here? And to a degree, who am I connected to by those services? Population is a proxy for all manner of socio-economic activity and facilities, measured both in utility and in perception. Conceptually Aquius is half-way between the two common approaches to public transport information:

  1. Conventional journey planners are aimed at users that already have some understanding of what routes exist. Aquius summarises overall service patterns.
  2. Most network maps are pre-defined and necessarily simplified, typically showing an entire city or territory. Aquius relates to a bespoke location.

The application makes no server queries (once the initial dataset has been loaded), so responds near-instantly. That speed allows network discovery by trial and error, which changes the user dynamic from being told, to playful learning.

Aquius enables maps that are physically impossible to draw as single unified networks, such as cabotaged international routes where the available destinations change depending on where the passenger boards. For example, FlixBus can only carry passengers from places in Portugal to destinations in other countries, even where the bus itself serves many places in Portugal. The reverse is true for passengers travelling from Spain - a range of destinations within Portugal are offered, but none within Spain herself.

Aquius FlixBus at Braga, showing journeys to Spain, not to Portugal

Aquius is capable of displaying the largest urban networks in the world, such as Paris and New York City. Aquius can be configured to locate stops and services very precisely, or to summarise networks strategically. Aggregation of stops into local Wards allows exploration and analysis of every scheduled public transport service in Great Britain. Aquius can even allow proposed route and service changes to be rapidly visualised and assessed in context, prior to any detailed scheduling or modelling, as exemplified for Barcelona.

Aquius at Torrassa with Barcelona Metro Line 9/10 Connection

Some caveats:

  • Aquius maps straight-line links, not the precise geographic route taken. This allows services that do not stop at all intermediate stops to be clearly differentiated. It also makes it technically possible for an internet client to work with a large transport network. Aquius is however limited to conventional scheduled public transport with fixed stopping points.
  • Aquius summarises the patterns of fixed public transport networks. It presumes a degree of network stability over time, and cannot sensibly be used to describe a transport network that is in constant flux. The Aquius data structure allows filtering by time period, but such periods must be pre-defined and cannot offer the same precision as schedule-based systems.
  • Aquius only shows the direct service from here, not journeys achieved by interchange. If the intention is to show all possible interchanges without letting users explore the possibilities for themselves, then Aquius is not the logical platform to use: Displaying all possible interchanges ultimately results in a map of every service which fails to convey what is genuinely local to here.
  • As an internet application, Aquius is limited by the filesize of its datasets, which are currently always loaded as a single file. So while multiple local networks can be aggregated into a single large national dataset, such a dataset may be many megabytes in size, implying unacceptably long load times for online users.

Ready to explore? Try a live demonstration!

The remainder of this document provides a technical manual for Aquius as software. For a transport-orientated review of Aquius, see Aquius – An Alternative Approach to Public Transport Network Discovery.

Manual

User FAQ

How are services counted?

Broadly, every service leaving from a node (stop, station) within here is counted once. If that service also arrives at a node within here, it is additionally counted at that node as an arrival, thus within here, services are summarised in both directions. Outside here, only services from here to there are summarised. Any specific local setdown and pickup conditions are taken into consideration. Services (typically trains) that split mid-journey are counted once over common sections. Individual services that combine more than one product together on the same vehicle are counted no more than once, by the longer distance component unless that has been filtered out by the choice of network.

How are people counted?

The population is that of the local area containing one or more nodes (stops, stations) linked to here by the services shown, including the population of here itself. Each node is assigned to just one such geographic area. The geography of these areas is defined by the dataset creator, but is intended to broadly convey the natural catchment or hinterland of the service, and typically uses local administrative boundaries (such as districts or municipalities) to reflect local definitions of place. This population may be additionally factored by Connectivity, as described in the next paragraph.

What is connectivity?

Connectivity factors the population linked (as described above) by the service level linking it - every unique service linking here with the local area is counted once. The Connectivity slider can be moved to reflect one of four broad service expectations, the defaults summarised in the table below (for left to right on the slider, read down the table). The slider attempts to capture broad differences in network perception, for example that 14 trains per day from London to Paris is considered a "good" service, while operating 14 daily within either city would be almost imperceptible. Except for Any, which does not factor population, the precise formula is: 1 - ( 1 / (service * factor)), if the result is greater than 0, with the default factor values: 2 (long distance), 0.2 (local/interurban) and 0.02 (city). The dataset creator or host may change the factor value (see Configuration key connectivity), but not the formula.

| Connectivity Expectation | 0% Minimum Service | 50% Factor Service | 95% Factor Service |
| --- | --- | --- | --- |
| Any | 0 | Entire population always counted | Entire population always counted |
| Long Distance (low frequency) | 0.5 | 1 | 10 |
| Local/Interurban (mid frequency) | 5 | 10 | 100 |
| City (high frequency) | 50 | 100 | 1000 |
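
The thresholds in the table can be recovered by inverting the formula above. The following is a minimal sketch, assuming the default factor values quoted in the text; the function name `serviceForShare` is hypothetical and not part of Aquius.

```javascript
// Hypothetical helper (not part of Aquius): the service level at which a
// given share of the population is counted, obtained by rearranging
// share = 1 - (1 / (service * factor)).
function serviceForShare(share, factor) {
  if (!(factor > 0) || share < 0 || share >= 1) {
    return NaN; // Outside the range the formula can describe
  }
  return 1 / ((1 - share) * factor);
}

// Default factors as quoted in the text
const defaultFactors = { longDistance: 2, local: 0.2, city: 0.02 };

Object.keys(defaultFactors).forEach(function (name) {
  // Service needed for 0%, 50% and 95% of the population to be counted
  const row = [0, 0.5, 0.95].map(function (share) {
    return serviceForShare(share, defaultFactors[name]).toFixed(1);
  });
  console.log(name + ": " + row.join(" / "));
});
```

Each printed row should correspond to one row of the table, read left to right.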

What do the line widths and circle diameters indicate?

Links and stops are scaled by the logarithm of the service (such as total daily trains), so at high service levels the visual display is only slightly increased. Increasing the scale increases the visual distinction between service levels, but may flood the view in urban areas. The original numbers and associated reference information can be seen by clicking on the link or node. The area of each population circle is in proportion to the number of residents. The original numbers are displayed in a tooltip, visible when mousing over (or similar) the circle. The here circle defines the exact geographic area of here, that is, the area searched to find local stops.

How can everything be displayed?

Zoom out a lot, then click... The result may be visually hard to digest, and laggy - especially with an unfiltered network or when showing multiple map layers: Aquius wasn't really intended to display everything. Hosts can limit this behaviour (by increasing the value of Configuration key minZoom).

Quick Setup

In its most basic form, Aquius simply builds into a specific HTML element on a web page, the most basic configuration placed in the document body:

<div id="aquius-div-id" style="height:100%;"></div>
<script src="absolute/url/to/aquius.js" async></script>
<script>window.addEventListener("load", function() {
  aquius.init("aquius-div-id");
});</script>

Note older browsers also require the html and body elements to be styled with a height of 100% before correctly displaying the div at 100%. Absolute paths are recommended to produce portable embedded code, but not enforced.

The first argument of the function aquius.init() requires a valid id referring to an empty element on the page. The second optional argument is an Object containing keyed options which override the default configuration. Two option keys are worth introducing here:

  • dataset: String (quoted) of the full path and filename of the JSON file containing the network data (the default is empty).
  • uiHash: Boolean (true or false, no quotes) indicating whether Aquius records its current state as a hash in the browser's URL bar (great for sharing, but may interfere with other elements of web page design).

Other options are documented in the Configuration section below. Here is an example with the Spanish Railway dataset:

<div id="aquius" style="height:100%;"></div>
<script src="https://timhowgego.github.io/Aquius/dist/aquius.min.js"
  async></script>
<script>window.addEventListener("load", function() {
  aquius.init("aquius", {
    "dataset": "https://timhowgego.github.io/AquiusData/es-rail/20-jul-2018.json",
    "uiHash": true
  });
});</script>

Current script files can be found on GitHub. Sample datasets are also available.

Tip: Aquius can also be used as a stand-alone library: aquius.here() accepts and returns data objects without script loading or user interface. Alternatively, Aquius can be built into an existing Leaflet map object by sending that object to aquius.init() as the value of Configuration key map.

Limitations

  • Aquius is conceptually inaccessible to those with severe visual impairment ("blind people"), with no non-visual alternative available.
  • Internet Explorer 9 is the oldest vintage of browser theoretically supported, although a modern browser will be faster (both in use of Promise to load data and Canvas to render data). Mobile and tablet devices are supported, although older devices may lag when interacting with the map interface.
  • Aquius is written in pure Javascript, automatically loading its one dependency at runtime, Leaflet. Aquius can produce graphically intensive results, so be cautious before embedding multiple instances in the same page, or adding Aquius to pages that are already cluttered.
  • Aquius works efficiently for practical queries within large conurbations, or for inter-regional networks. The current extreme test case is a single query containing every bus stop and weekly bus service in the whole of Greater Manchester - a network consisting of about 5 million different stop-time combinations: This takes just under a second for Aquius to query on a slow 2010-era computer. A third of that second is for processing, and two thirds for map rendering: Aquius' primary bottleneck is the ability of the browser to render large numbers of visual objects on a canvas.

Known Issues

Aquius is fundamentally inaccessible to those with severe visual impairment: The limitation lies in the concept, not primarily the implementation. Aquius can't even read out the stop names, since it doesn't necessarily know anything about them except their coordinates. Genuine solutions are more radical than marginal, implying a quite separate application. For example, conversion of the map into a 3D soundscape, or allowing users to walk a route as if playing a MUD.

Multiple base maps can be added but only one may be displayed: A user selection and associated customisation was envisaged for the future.

Population bubbles are not necessarily centered on towns: These are typically located at the centroid of the underlying administrative area, which does not necessarily relate to the main settlement. Their purpose is simply to indicate the presence of people at a broad scale, not to map nuances in local population distribution.

Circular services that constitute two routes in different directions, that share some stops but not all, display with the service in both directions serving the entire shared part of the loop: Circular services that operate in both directions normally halve the total service to represent the journey possibilities either clockwise or counter-clockwise, without needing to decide which direction to travel in to reach any one stop. Circular services that take different routes depending on their direction cannot simply be halved in this manner, even over the common section, because the service level in each direction is not necessarily the same. Consequently Aquius would have to understand which direction to travel in order to reach each destination the fastest. That would be technically possible by calculating distance, but would remain prone to misinterpretation, because a service with a significantly higher service frequency in one direction might reasonably be used to make journeys round almost the entire loop, regardless of distance. The safest assumption is that services can be ridden round the loop in either direction. In practice this issue only rarely arises, notably in Parla.

Circular services that were originally described in GTFS as a sequence of trips (blocks) may count the service totals correctly on the map (for nodes and links), but under-count those totals in summary statistics: This occurs where trips return to their origin node only when commencing the next trip in the sequence, such that each individual trip is not itself circular. GTFS To Aquius processes the sequence by joining the trips together, passing all the stops multiple times (accurate map totals), but holding the data as one continuous link (counted in the summary statistics).

Passenger journey opportunities are not always accurately presented where pickup/setdown restrictions apply, and multiple nodes on the same service are included within here: Assuming the dataset is correctly structured (see link property block in the Data Structure), Aquius should always count services accurately at each node and on each link. However the combination may be visually misleading, perhaps suggesting a link between nodes which are actually being counted only in regard to travel to destinations further along the route. The underlying problem lies in cabotage restrictions: Routes where pickup and setdown criteria vary depending on the passenger journey being undertaken, not the vehicle journey. This poses a logical problem for Aquius, since it needs to both display the links from each separate node and acknowledge that these separate links are provided by the exact same vehicle journey.

Link (route), node (stop), place names cannot be localised: Most text within Aquius can be localised (translated), but nodes, places and routes cannot. Many different references can be assigned to the same thing, so multiple languages are supported. But these have no language identifier, so cannot be matched to the user's locale setting. This is a compromise between dataset filesize (node names, in particular, often bloat files, and even adding a localisation key to single names would tend to increase that bloat by about 50%), practical reality (few agencies provide multilingual information in a structured format, often opting for single names such as "oneLanguage / anotherLanguage"), and the importance of this information within Aquius (strictly secondary reference information).

Link reference data does not change to match the selected service filter: The reference data ascribed to each link describes all the service information that has been used to build the link. The service filter changes the total number of services, but does not know which pieces of reference data pertain to which service within the link, so continues to display all the reference data. Reference data is primarily intended to convey route-level information, which should not vary from trip to trip. While Aquius can display very detailed reference data, such as trip-specific train numbers which vary between different periods of time, such information is intended as a marginal benefit, especially to assist in debugging data, and should be presented with caution.

Configuration

As described in the Quick Setup section, the second optional argument of aquius.init() takes an Object containing keys and respective values. Any key not specified will use the default. Bespoke options must conform to the expected data type.

Tip: Clicking the Embed link on the bottom of the Layer Control will produce a dummy HTML file containing the configuration at the time the Embed was produced.

Data Sources

All except dataset, introduced in the Quick Setup section, can be happily ignored in most cases.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| base | Array | See below | Array of objects containing base layer tile maps, described below |
| connectivity | float | 1.0 | Factor for connectivity calculation: population * ( 1 - ( 1 / ( service * ( 2 / ( power(10, configuration.p) / 10 )) * connectivity))), where p is greater than 0 |
| dataObject | Object | {} | JSON-like network data as an Object: Used in preference to dataset |
| dataset | string | "" | JSON file containing network data: Recommended full URL, not just filename |
| leaflet | Object | {} | Active Leaflet library Object L: Used in preference to loading own library |
| locale | string | "en-US" | Default locale, BCP 47-style. Not to be confused with user selection, t |
| map | Object | {} | Active Leaflet map Object: Used in preference to own map |
| network | Array | [] | Extension of network: Array of products, Object of locale keyed names, described below |
| networkAdd | boolean | true | Append this network extension to dataset defaults. Set false to replace defaults |
| translation | Object | {} | Custom translations, format described below |

For base mapping, base is a complex Array containing one or more tile layers, each represented as an Object. Within, the key url contains a string URI of the resource, potentially formatted as described by Leaflet. WMS layers require a specific key type: "wms". Finally a key called options may be provided, itself containing an Object of keys and values, matching Leaflet's options. Or in summary:

"base": [
  {
    "url": "https://maps.wikimedia.org/osm-intl/{z}/{x}/{y}.png", 
    "type": "",
      // Optional, if WMS: "wms"
    "options": {
      // Optional, but attribution is always advisable
      "attribution": "&copy; OpenStreetMap contributors"
    }
  }
    // Extendable for multiple maps
]

The default locale needs to be fully translated, since it becomes the default should any other language choice not be translated.

The extension of network allows extra network filters to be appended to the defaults provided in the JSON dataset. For example, in the Spanish Railways dataset a separate network filter for FEVE could be created using the product's ID, here 14, and its name. Multiple products or networks can be added in this way. See the Data Structure section for more details. To replace the defaults in the dataset, set configuration networkAdd to false.

"network": [ 
  [ 
    [14],
      // Product ID(s)
    {"en-US": "FEVE"},
      // Locale:Name, must include the default locale
    {}
      // Optional properties
  ] 
  // Extendable for multiple networks
],
"networkAdd": false
  // Replace dataset defaults with this network selection

The translation Object allows bespoke locales to be hacked in. Bespoke translations take precedence over everything else. Even network names can be hacked by referencing key network0, where the final integer is the index position in the network Array. While this is not the optimal way to perform proper translation, it may prove convenient. The structure of translation matches that of getDefaultTranslation() within aquius.init(). Currently translatable strings are listed below. Missing translations default to locale, else are rendered blank.

"translation": {
  "xx-XX": {
    // BCP 47-style locale
    "lang": "X-ish",
      // Required language name in that locale
    "connectivity": "Connectivity",
      // Translate values into locale, leave keys alone
    "connectivityRange": "Any - Frequent",
    "embed": "Embed",
    "export": "Export",
    "here": "Location",
    "language": "Language",
    "link": "Services",
    "network": "Network",
    "node": "Stops",
    "place": "People",
    "scale": "Scale",
    "service": "Period"
  }
    // Extendable for multiple locales
}

The remaining configurations are far more basic.

Styling

Formatting of visual elements, mostly on the map.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| hereColor | string | "#080" | CSS Color for here layer circle strokes |
| linkColor | string | "#f00" | CSS Color for link (service) layer strokes |
| linkScale | float | 1.0 | Scale factor for link (service) layer strokes: ceil( log( 1 + ( service * ( 1 / ( scale * 4 ) ) ) ) * scale * 4) |
| minWidth | integer | 2 | Minimum pixel width of links, regardless of scaling. Assists click usability |
| minZoom | integer | 0 | Minimum map zoom. Sets a soft cap on query complexity |
| nodeColor | string | "#333" | CSS Color for node (stop) layer circle strokes |
| nodeScale | float | 1.0 | Scale factor for node (stop) layer circles: ceil( log( 1 + ( service * ( 1 / ( scale * 4) ) ) ) * scale * 2) |
| panelOpacity | float | 0.7 | CSS Opacity for background of the bottom-left summary panel |
| panelScale | float | 1.0 | Scale factor for text on the bottom-left summary panel |
| placeColor | string | "#00f" | CSS Color of place (population) layer circle fill |
| placeOpacity | float | 0.5 | CSS Opacity of place (population) layer circle fill: 0-1 |
| placeScale | float | 1.0 | Scale factor for place (population) layer circles: ceil( sqrt( people * scale / 666) ) |
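
The three scaling formulas in the table can be written out as plain functions. This is a sketch only: the function names are hypothetical, `log` is assumed to be the natural logarithm (the table does not say which base is used), and applying minWidth as a floor on the link width is an assumption about how the two settings combine.

```javascript
// Hypothetical helpers (not part of Aquius), transcribed from the table.
// Assumption: "log" is the natural logarithm, as in Math.log.

function linkWidth(service, scale, minWidth) {
  const width = Math.ceil(
    Math.log(1 + (service * (1 / (scale * 4)))) * scale * 4
  );
  // Assumption: minWidth acts as a lower bound on the computed width
  return Math.max(width, minWidth);
}

function nodeSize(service, scale) {
  return Math.ceil(
    Math.log(1 + (service * (1 / (scale * 4)))) * scale * 2
  );
}

function placeSize(people, scale) {
  // Square root keeps circle area proportional to population
  return Math.ceil(Math.sqrt(people * scale / 666));
}

// Logarithmic scaling: a hundredfold increase in service adds
// comparatively little width
[1, 10, 100, 1000].forEach(function (service) {
  console.log(service, linkWidth(service, 1, 2), nodeSize(service, 1));
});
```

Raising the scale argument widens the gap between low and high service levels, which matches the behaviour described for the scale slider.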

Caution: Colors accept any CSS format, but be wary of introducing transparency this way, because it tends to slow down rendering. Transparent link lines will render with ugly joins.

User Interface

Enable or disable User Interface components. These won't necessarily block the associated code if it can be entered by an alternative means, such as the hash.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| uiConnectivity | boolean | true | Enables connectivity slider (if the dataset's place length > 0) |
| uiHash | boolean | false | Enables recording of the user state in the URL's hash |
| uiLocale | boolean | true | Enables locale selector |
| uiNetwork | boolean | true | Enables network selector (if the dataset's network length > 1) |
| uiPanel | boolean | true | Enables summary statistic panel |
| uiScale | boolean | true | Enables scale slider |
| uiService | boolean | true | Enables service selector (if the dataset's service length > 1) |
| uiShare | boolean | true | Enables embed and export |
| uiStore | boolean | true | Enables browser session storage of user state |

User State

As seen in the URL hash (if uiHash is true). Coordinates are always WGS 84. Here click defines the centre of the search.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| c | float | -0.89 | Here click Longitude |
| k | float | 41.66 | Here click Latitude |
| m | integer | 11 | Here click zoom |
| n | integer | 0 | User selected network filter: Must match range of network in dataset |
| p | integer | 0 | User selected connectivity setting |
| v | string | "hlnp" | User selected map layers by first letter: here, link, node, place |
| r | integer | 0 | User selected service filter: Must match range of service in dataset |
| s | integer | 5 | User selected global scale factor: 0 to 10 |
| t | string | "en-US" | User selected locale: BCP 47-style |
| x | float | -3.689 | Map view Longitude |
| y | float | 40.405 | Map view Latitude |
| z | integer | 6 | Map view zoom |

Tip: Instead of specifying s, consider altering the corresponding linkScale, nodeScale, and/or placeScale. Instead of specifying t, consider altering the default locale.

Here Queries

Aquius can also be used as a stand-alone library via aquius.here(), which accepts and returns data Objects without script loading or user interface. Arguments, in order:

  1. dataObject - Object containing a Data Structure.
  2. options - optional Object of key:value pairs.

Possible options:

  • callback - function to receive the result, which should accept three arguments: (0) error (javascript Error), (1) output (as returned without callback, described below), and (2) options (as submitted, which also allows bespoke objects to be passed through to the callback).
  • connectivity - float factor applied to weight population by service level: population * ( 1 - ( 1 / (service * connectivity))), if result is greater than zero. If connectivity is missing or less than or equal to 0, population is not factored. The user interface calculates connectivity as: configuration.connectivity * ( 2 / ( power(10, configuration.p) / 10 )), producing factors 2, 0.2, and 0.02 - broadly matching long distance, local/inter-urban and city expectations.
  • geoJSON - Array of strings describing map layers to be outputted in GeoJSON format ("here", "link", "node" and/or "place").
  • network - integer index of network to filter by.
  • place - Array of integer dataObject place indices to define here (in addition to any range/x/y criteria). Place-based criteria do not currently return a specific geographic feature equivalent of here.
  • range - float distance from here to be searched for nodes, in metres.
  • sanitize - boolean check data integrity. Checks occur unless set to false. Repeat queries with the same dataObject can safely set sanitize to false.
  • service - integer index of service to filter by.
  • x - float longitude of here in WGS 84.
  • y - float latitude of here in WGS 84.
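
The connectivity option described above can be illustrated with two small functions: one deriving the factor the way the user interface is said to, and one applying it to a population. This is a sketch of the stated formulas only; the function names are hypothetical, and returning 0 for a slider position of 0 is an assumption made so that "Any" leaves population unfactored.

```javascript
// Hypothetical helpers (not part of Aquius), following the formulas in
// the connectivity option.

// Factor as the user interface is described as calculating it, where p
// is the connectivity slider position (user state key p).
function uiConnectivity(configConnectivity, p) {
  if (!(p > 0)) {
    return 0; // Assumption: position 0 ("Any") means no factoring
  }
  return configConnectivity * (2 / (Math.pow(10, p) / 10));
}

// Population weighted by service level. A missing or non-positive
// connectivity leaves the population unfactored.
function weightedPopulation(population, service, connectivity) {
  if (!(connectivity > 0)) {
    return population;
  }
  const share = 1 - (1 / (service * connectivity));
  return share > 0 ? population * share : 0;
}

// 1000 residents linked by 10 daily services, at each slider position
[0, 1, 2, 3].forEach(function (p) {
  const factor = uiConnectivity(1, p);
  console.log(p, factor, weightedPopulation(1000, 10, factor));
});
```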

Caution: Sanitize does not fix logical errors within the dataObject, and should not be used to check data quality. Sanitize merely replaces missing or incomplete structures with zero-value defaults, typically causing bad data to be ignored without throwing errors.

Without a callback, calls to aquius.here() return a JSON-like Object. On error, that Object contains a key error.

If geoJSON is specified a GeoJSON-style Object with a FeatureCollection is returned. In addition to the standard geometry data, each Feature has two or more properties, which can be referenced when applying styling in your GIS application:

  • type - "here", "link" (routes), "node" (stops), "place" (demographics).
  • value - numeric value associated with each (such as daily services or resident population).
  • node - array of reference data objects relating to the node itself.
  • link - array of reference data objects relating to the links at the node, or the links contained on the line.
  • place - array of reference data objects relating to the place itself.

The information contained within keys node and link is that otherwise displayed in popup boxes when clicking on nodes or links in the map view. The existence of keys node, link and place will depend on the dataset. The potential format of the objects within reference property of node, link and place are described in the Data Structure.
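
As an illustration of styling by the type property, the sketch below maps each Feature to a colour. The function name is hypothetical, the colours are simply the Aquius styling defaults, and the returned object uses option names familiar from Leaflet path styling; other GIS tools will expect a different shape.

```javascript
// Hypothetical style helper (not part of Aquius). Colours reuse the
// defaults of hereColor, linkColor, nodeColor and placeColor.
const TYPE_COLORS = {
  here: "#080",
  link: "#f00",
  node: "#333",
  place: "#00f"
};

function styleForFeature(feature) {
  const properties = (feature && feature.properties) || {};
  const isPlace = properties.type === "place";
  return {
    // Grey fallback for any unrecognised type
    color: TYPE_COLORS[properties.type] || "#999",
    // Places are drawn as filled circles, others as strokes only
    fillOpacity: isPlace ? 0.5 : 0
  };
}

// Illustrative Feature, shaped as described above (values invented)
const sampleFeature = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [-0.89, 41.66] },
  properties: { type: "node", value: 12 }
};
console.log(styleForFeature(sampleFeature));
```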

Otherwise the JSON-like Object will contain summary, an Object containing link, node and place totals, and geometry for here, link, node and place. Each geometry key contains an Array of features, where each feature is an Object with a key value (the associated numeric value, such as number of services) and either circle (here, node, place) or polyline (link). Circles consist of an Array containing a single x, y pair of WGS 84 coordinates. Polylines consist of an Array of Arrays in route order, each child Array containing a similar pair of x, y coordinates. The (sanitized) dataObject will also be returned as a key, allowing subsequent queries with the returned dataObject to be passed with sanitize false, which speeds up the query slightly.
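
The reuse of a sanitized dataObject across repeat queries might be wrapped as follows. This is a sketch under assumptions: `createHereQuery` is a hypothetical name, the library is passed in as an argument so the pattern can be tried with a stand-in, and the returned key is assumed to be named dataObject.

```javascript
// Hypothetical wrapper (not part of Aquius). aquiusLib is any object
// exposing here(dataObject, options) as described above.
function createHereQuery(aquiusLib, dataObject) {
  let current = dataObject;
  let sanitized = false;

  return function query(x, y, range) {
    const output = aquiusLib.here(current, {
      x: x,
      y: y,
      range: range,
      // Only the first query needs the integrity check
      sanitize: !sanitized
    });
    if (output && "error" in output) {
      throw output.error instanceof Error ?
        output.error : new Error(String(output.error));
    }
    // Assumption: the sanitized data is returned under key dataObject
    if (output && output.dataObject) {
      current = output.dataObject;
      sanitized = true;
    }
    return output;
  };
}

// Usage, assuming aquius and a loaded dataObject are available:
// const query = createHereQuery(aquius, myDataObject);
// const first = query(-0.89, 41.66, 5000);
// const second = query(-3.689, 40.405, 5000);
```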

Caution: Both link outputs mirror the internal construction of Aquius' map, which tries to find adjoining links with the same service frequency and attach them to one continuous polyline. The logic reduces the number of objects, but does not find all logical links, nor does it necessarily link the paths taken by individual services. If you need to map individual routes, interrogate the original link in the original dataObject.

GTFS To Aquius

General Transit Feed Specification is the most widely used interchange format for public transport schedule data. A script is available that automatically converts single GTFS archives into Aquius datasets. This script is currently under development, requiring both features and testing, so check the output carefully. A live demonstration is available here. Alternatively, run the gtfs.min.js file privately, either:

  1. With a user interface: Within a webpage, load the script and call gtfsToAquius.init("aquius-div-id"), where "aquius-div-id" is the ID of an empty element on the page.
  2. From another script: Call gtfsToAquius.process(gtfs, options). Required value gtfs is an Object consisting of keys representing the name of the GTFS file without extension, whose value is an array containing one or more blocks of raw text from the GTFS file - for example, "calendar": ["raw,csv,string,data"]. If the file is split into multiple blocks (to allow very large files to be handled) the first block must contain at least the first header line. options is an optional Object that may contain the following keys, each value itself an Object:
  • callback - function to receive the result, which should accept three arguments: error (javascript Error), output (as returned without callback, described below), options (as submitted, which also allows bespoke objects to be passed through to the callback).
  • config - contains key:value pairs for optional configuration settings, as described in the next section.
  • geojson - is the content of a GeoJSON file pre-parsed into an Object, as detailed in a subsequent section.

Without callback, the function returns an Object with possible keys:

  • aquius - as dataObject.
  • config - as config, but with defaults or calculated values applied.
  • error - Array of error message strings.
  • summary - Object containing summary network (productFilter-serviceFilter matrix) and service (service histogram).
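
The gtfs argument described above might be assembled from raw file text as in the sketch below. The helper name `toGtfsInput` and its blockSize parameter are hypothetical. Cutting blocks only at line ends is a cautious assumption; the text requires only that the first block contains the header line.

```javascript
// Hypothetical helper (not part of GTFS To Aquius): builds the gtfs
// Object from a map of filename to raw text.
function toGtfsInput(files, blockSize) {
  const gtfs = {};
  Object.keys(files).forEach(function (filename) {
    const key = filename.replace(/\.txt$/i, ""); // Name without extension
    const text = files[filename];
    const blocks = [];
    if (!(blockSize > 0)) {
      blocks.push(text); // Single block
    } else {
      let start = 0;
      while (start < text.length) {
        // End each block at the first line end at or beyond blockSize,
        // so the first block always holds the complete header line
        let end = text.indexOf("\n", start + blockSize);
        end = end === -1 ? text.length : end + 1;
        blocks.push(text.slice(start, end));
        start = end;
      }
      if (blocks.length === 0) {
        blocks.push(text); // Empty file still yields one block
      }
    }
    gtfs[key] = blocks;
  });
  return gtfs;
}

// Usage, assuming gtfsToAquius is loaded and rawText holds file content:
// const output = gtfsToAquius.process(
//   toGtfsInput({ "calendar.txt": rawText }), { config: {} });
```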

Caution: Runtime is typically about a second per 10 megabytes of GTFS text data (with roughly half that time spent processing the Comma Separated Values), plus time to assign stops (nodes) to population (places), which varies depending on the number and complexity of GeoJSON boundaries processed. The single-operator networks found in most GTFS archives should process within 5-10 seconds, but very complex multi-operator conurbations or regions may take longer. Processing requires operating system memory of up to 10 times the total size of the raw text data. If processing takes more than a minute, it is highly likely that the machine has run out of free physical memory, and processing should be aborted.

Tip: There are two ways to work around any memory limitation, both of which can be used together if required:

  1. Run GTFS to Aquius As Node Script (detailed below). Node can specifically be allocated system memory that is not necessarily available to browser processes.
  2. Copy all the GTFS files and divide stop_times.txt (which is almost always far larger than any other file, and thus most likely the trigger for any memory issue) into two or more pieces, each placed in a separate file directory. The stop_times.txt divide must not break a trip (divide immediately before a stop_sequence 0) and for optimum efficiency should break between different route_id. The first (header) line of stop_times.txt must be present as the first line of each file piece. Add to each directory identical config.json files and a complete copy of all the other GTFS files used - except frequencies.txt, which if present must only be added in one directory. GTFS file shapes.txt (which is sometimes large) is not used by GTFS to Aquius, so can also be skipped. Run GTFS to Aquius on each directory separately, then use Merge Aquius to merge the two (or more) outputs together. GTFS to Aquius will skip any (not-frequency) link it cannot find stop_times for, while Merge Aquius automatically merges the duplicate nodes created.
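
The stop_times.txt division in the second workaround could be sketched as below. The function is hypothetical and makes simplifying assumptions: rows of the same trip are adjacent in the file, fields before trip_id contain no quoted commas, and a change of trip_id (rather than a stop_sequence of 0) marks a safe place to divide. It does not attempt to break between different route_id.

```javascript
// Hypothetical helper (not part of Aquius): divides stop_times text into
// pieces that each start with the header and never break a trip.
function splitStopTimes(text, pieceCount) {
  const lines = text.replace(/^\uFEFF/, "").split(/\r?\n/)
    .filter(function (line) { return line.length > 0; });
  const header = lines[0];
  const rows = lines.slice(1);
  const tripColumn = header.split(",")
    .map(function (name) { return name.trim(); })
    .indexOf("trip_id");
  if (tripColumn < 0) {
    throw new Error("stop_times header lacks trip_id");
  }

  const target = Math.ceil(rows.length / pieceCount); // Rows per piece
  const pieces = [];
  let current = [];
  let previousTrip = null;

  rows.forEach(function (row) {
    const trip = row.split(",")[tripColumn];
    // Only divide once the piece is full and a new trip begins
    if (current.length >= target && trip !== previousTrip &&
      pieces.length < pieceCount - 1) {
      pieces.push([header].concat(current).join("\n") + "\n");
      current = [];
    }
    current.push(row);
    previousTrip = trip;
  });
  pieces.push([header].concat(current).join("\n") + "\n");
  return pieces;
}
```

Each returned string would be written as stop_times.txt in its own directory, alongside copies of the other GTFS files as described above.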

As Node Script

A Node script is available to run GTFS to Aquius from the command line: dist/gtfs-node.js. Recommended when handling large (100+ MB) GTFS datasets. Thereafter the limit is Node and ultimately System memory. Try granting Node up to 10MB of memory per MB of GTFS text. For example, for 30GB: node --max-old-space-size=30000.
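
As a rough illustration of the "10MB of memory per MB of GTFS text" guideline, the arithmetic can be expressed as a small helper. The function name suggestNodeMemoryFlag is hypothetical and the multiplier is only the rule of thumb quoted above, not a measured requirement.

```javascript
// Hypothetical helper: turn a GTFS text size in bytes into a Node memory flag.
// multiplier = 10 reflects the rule of thumb above (MB of memory per MB of GTFS text).
function suggestNodeMemoryFlag(gtfsTextBytes, multiplier = 10) {
  const gtfsMegabytes = gtfsTextBytes / (1024 * 1024);
  const memoryMegabytes = Math.ceil(gtfsMegabytes * multiplier);
  return "--max-old-space-size=" + memoryMegabytes;
}
```

For example, 300 MB of GTFS text gives --max-old-space-size=3000. The value actually usable is still limited by the physical memory of the machine.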

Setup is the same as the browser version: Extract the GTFS into a directory, alongside any config.json and boundaries.geojson. Other operations, such as unzipping the GTFS, or managing config files, will need to be done using other bespoke scripts. Then command line: node path/to/gtfs-node.js path/to/data_directory/. Or with more memory: node --max-old-space-size=30000 path/to/gtfs-node.js path/to/data_directory/.

Outputs added (created or overwritten) in the data directory:

  • aquius.json - output for use in the wider Aquius ecosystem.
  • config.auto - augmented or generated config.json - rename to config.json to reuse in future GTFS processing.
  • histogram.csv - proportion of services by hour of day, useful for quality assurance.
  • network.csv - service totals by network, useful for quality assurance.

Configuration File

By default GTFS To Aquius simply analyses services over the next 7 days, producing average daily service totals, filtered by agency (operator). GTFS To Aquius accepts and produces a file called config.json, which, in the absence of a detailed user interface, is the only way to customise the GTFS processing. The minimum content of config.json is empty, viz: {}. To this Object one or more key: value pairs may be added. Currently supported keys are:

Key | Type | Default | Description
agencyPrefix | string | "" | Prefix string to all agency names in this GTFS file. Helpful to maintain uniqueness of names when processing batches of GTFS files. Only applied when networkFilter references are created automatically, so will not override reference names already specified in config.json
allowBlock | boolean | false | Process block_id as a sequence of inter-operated trips, described below
allowCabotage | boolean | false | Process duplicate vehicle trips with varying pickup/setdown restrictions as cabotage, described below
allowCode | boolean | true | Include stop codes
allowColor | boolean | true | Include route-specific colors
allowDuplication | boolean | false | Include duplicate vehicle trips (same route, service period, direction and stop times)
allowDuration | boolean | false | Include array of crude average total minutes per trip by service period (experimental)
allowDwell | boolean | false | Include array of crude average minute dwell times per trip by node (experimental)
allowHeadsign | boolean | false | Include trip-specific headsigns (information may be redundant if using allowRoute)
allowName | boolean | true | Include stop names (increases file size significantly)
allowNoc | boolean | false | Append Great Britain NOC to operator name if available (for newly generated network filter only)
allowRoute | boolean | true | Include route-specific short names
allowRouteLong | boolean | false | Include route-specific long names
allowRouteUrl | boolean | true | Include URLs for routes (can increase file size significantly unless URLs conform to logical repetitive style)
allowSplit | boolean | false | Include trips on the same route (service period and direction) which share at least splitMinimumJoin (but not all) stop times as "split"
allowStopUrl | boolean | true | Include URLs for stops (can increase file size significantly unless URLs conform to logical repetitive style)
allowWaypoint | boolean | true | Include stops with no pickup and no setdown as dummy routing nodes
allowZeroCoordinate | boolean | true | Include stops with 0,0 coordinates, else stops are skipped
codeAsStopId | boolean | false | Use stop_id as stop code (requires allowCode=true)
coordinatePrecision | integer | 5 | Coordinate decimal places (smaller values tend to group clusters of stops), described below
duplicationRouteOnly | boolean | true | Restrict duplicate check to services on the same route
fromDate | YYYYMMDD dateString | Today, or if today is not in GTFS's dates, middle of the available dates | Start date for service pattern analysis (inclusive)
inGeojson | boolean | true | If geojson boundaries are provided, only services at stops within a boundary will be analysed. If false, assigns stops to nearest boundary (by centroid)
isCircular | array | [] | GTFS "route_id" (strings) to be referenced as circular. If empty (default), GTFS to Aquius follows its own logic, described below
meta | object | {"schema": "0"} | As Data Structure meta key
mirrorLink | boolean | true | Services mirrored in reverse are combined into the same link. Reduces filesize, but can distort service averages
modeInclude | array | [] | Mode GTFS route_type as strings to limit analysis to, all if empty (for example, for funicular only, ["7", "1400"])
networkFilter | object | {"type": "agency"} | Group services by, using network definitions, detailed below
nodeGeojson | object | {} | Cache containing node "x:y": Geojson "x:y" (both coordinates at coordinatePrecision), described below
option | object | {} | As Configuration/Data Structure option key
placeNameProperty | string | "name" | Field name in GeoJSON properties containing the name or identifier of the place
populationProperty | string | "population" | Field name in GeoJSON properties containing the number of people (or equivalent demographic statistic)
productOverride | object | {} | Properties applied to all links with the same product ID, detailed below
routeExclude | array | [] | GTFS "route_id" (strings) to be excluded from analysis
routeInclude | array | [] | GTFS "route_id" (strings) to be included in analysis, all if empty
routeOverride | object | {} | Properties applied to routes, by GTFS "route_id" key, detailed below
serviceFilter | object | {} | Group services by, using service definitions, detailed below
servicePer | integer | 1 | Service average per period in days (1 gives daily totals, 7 gives weekly totals), regardless of fromDate/toDate
splitMinimumJoin | integer | 2 | Minimum number of concurrent nodes that split services must share
stopExclude | array | [] | GTFS "stop_id" (strings) to be excluded from analysis
stopInclude | array | [] | GTFS "stop_id" (strings) to be included in analysis, all if empty
stopOverride | object | {} | Properties applied to stops, by GTFS "stop_id" key, detailed below
stopPlace | boolean | false | Group and merge stops by their respective place centroid (assumes geojson)
toDate | YYYYMMDD dateString | Week after fromDate | End date for service pattern analysis (inclusive)
translation | object | {} | As Configuration/Data Structure translation key

Tip: The fastest way to start building a config.json file is to run GTFS To Aquius once, download and edit the resulting config.json, then use that file in subsequent GTFS To Aquius processing.
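
As an illustration of the keys listed above, a small config.json could be assembled and serialised as below. The dates and chosen values are arbitrary examples, not recommendations. The key names come from the table above, and "3" is the GTFS route_type for bus.

```javascript
// Example configuration using keys from the table above (values are illustrative only)
const config = {
  fromDate: "20240101",      // start of analysis, YYYYMMDD, inclusive
  toDate: "20240107",        // end of analysis, YYYYMMDD, inclusive
  servicePer: 1,             // daily service totals
  modeInclude: ["3"],        // GTFS route_type 3 (bus) only
  coordinatePrecision: 4,    // fewer decimal places groups nearby stops
  allowName: false           // omit stop names to reduce file size
};

// Text suitable for saving as config.json alongside the GTFS files
const configJson = JSON.stringify(config, null, 2);
```

Any key omitted from the object falls back to the default listed in the table.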

Blocks

GTFS "block_id" can be used to group sequences of trips together. Some GTFS sources use this value to communicate important routing information, notably circular (see below) or "town" services where one trip operates into another for the benefit of through passengers. Potentially, without processing the "block_id" one segment of the journey may be missed entirely. In these cases allowBlock should be set true, which will allow GTFS to Aquius to merge all the block trips together as one continuous link. However some GTFS sources use "block_id" literally, as an identifier of a vehicle diagram (operational allocation) on a route that offers no practical reason for passengers to continue from one trip to the next. Such use of "block_id" can result in excessively large Aquius files because blocks will duplicate the route's stops multiple times - and potentially also duplicate those extended stop sequences for many different vehicle diagrams. Consequently "block_id" processing is skipped by default, with trips processed separately. This default is typically the most reliable, but if key segments of route are missing from output, and GTFS trips.txt includes a "block_id" column, try setting allowBlock to true.

Circulars

Circular services are where one journey seamlessly operates into the next as a continuous loop. If circular services are not correctly flagged circular, the Aquius output will double-count those services at their start/end node, and (for circular services defined separately in each direction) may imply passenger journey opportunities that are achievable only by the least direct route around the circle.

The key isCircular allows specific "route_id" values to be flagged as circular. Values reference the first column of GTFS routes.txt. If one or more route is specified in this way, only those routes specified will be flagged as circular. This gives absolute control over what services are assigned circular, and should be used if the default logic fails to correctly differentiate circular routes.

If isCircular is empty, GTFS to Aquius will attempt to evaluate whether a route is circular, viz:

  1. Start and end stops are the same, and the route contains multiple trips with a shared (non-empty) "block_id". Loops should be coded with a single "block_id" that groups all such journeys, but this convention is not universal, and often the "block_id" is empty.
  2. Start and end stops are the same, with the stop immediately after the start and the stop prior to the end greater than 200 metres from each other. Aquius processes basic lasso routes as uni-directional, so should not flag such routes as circular. Complex loops, such as figures-of-eight (dual loop with a common mid-point) or dual lassos (common middle section with loops at each end), should be flagged as circular. Although the "outbound" and "inbound" segment of a simple lasso route may serve the same location along the same street, the actual stops served may differ because they are on opposite sides of that street, hence the range test.

Tip: To disable the default logic without actually assigning any route as circular specify "isCircular": [null]. This gives the array a length greater than 0 (thus disables the default logic) without ever matching any valid "route_id" (which cannot be null).
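
The range test in rule 2 above can be sketched as follows. This is an illustration of the idea only, not the code used by GTFS to Aquius: the function names, the stop object shape ({id, x, y} with WGS 84 longitude x and latitude y) and the use of the haversine formula are all assumptions.

```javascript
// Great-circle distance in metres between two {x: longitude, y: latitude} points
function haversineMetres(a, b) {
  const toRadians = (degrees) => (degrees * Math.PI) / 180;
  const earthRadius = 6371000;
  const dLat = toRadians(b.y - a.y);
  const dLon = toRadians(b.x - a.x);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.y)) * Math.cos(toRadians(b.y)) * Math.sin(dLon / 2) ** 2;
  return 2 * earthRadius * Math.asin(Math.sqrt(h));
}

// Hypothetical check mirroring rule 2: same start and end stop, and the stop
// after the start lies more than rangeMetres from the stop before the end.
function looksCircular(stops, rangeMetres = 200) {
  if (stops.length < 3) {
    return false;
  }
  const first = stops[0];
  const last = stops[stops.length - 1];
  if (first.id !== last.id) {
    return false;
  }
  const second = stops[1];
  const penultimate = stops[stops.length - 2];
  return haversineMetres(second, penultimate) > rangeMetres;
}
```

A simple lasso, whose outbound and inbound stops face each other across a street, fails the range test and so is left as uni-directional, while a loop that returns by a different street passes it.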

Coordinates

Lowering the value of coordinatePrecision reduces the size of the Aquius output file, but not just because the coordinate data stored is shorter: At lower coordinatePrecision, stops that are in close proximity will tend to merge into single nodes, which in turn tends to result in fewer unique links between nodes. Such merges reflect mathematical rounding, and will not necessarily group clusters of stops in a way that users or operators might consider logical. Networks with widespread use of pickup and setdown restrictions may be rendered illogical by excessive grouping. Aggregation of different stops into the same node may also create false duplicate trips, since duplication is assessed by node, not original stop - in most cases setting allowDuplication to true will avoid this issue (as detailed below). If the precise identification of individual stops is important, increase the default coordinatePrecision to avoid grouping. Very low coordinatePrecision values effectively transform the entire network into a fixed "Vortex Grid", useful for strategic geospatial analysis. For greater control over such Vortex Grids, specify the desired grid as GeoJSON boundaries and set stopPlace to true, which groups all nodes on the centroid of their respective (GeoJSON) place.
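
The merging effect of rounding can be demonstrated with a short sketch. The helper names and the rounding method (Math.round to a number of decimal places) are assumptions for illustration, and GTFS to Aquius may round differently.

```javascript
// Hypothetical node key: coordinates rounded to coordinatePrecision decimal places
function nodeKey(x, y, coordinatePrecision) {
  const factor = Math.pow(10, coordinatePrecision);
  const round = (value) => Math.round(value * factor) / factor;
  return round(x) + ":" + round(y);
}

// Group stop IDs by their rounded coordinate key
function groupStops(stops, coordinatePrecision) {
  const nodes = new Map();
  for (const stop of stops) {
    const key = nodeKey(stop.x, stop.y, coordinatePrecision);
    if (!nodes.has(key)) {
      nodes.set(key, []);
    }
    nodes.get(key).push(stop.id);
  }
  return nodes;
}

// Two stops a few tens of metres apart (example coordinates)
const nearbyStops = [
  { id: "stop_a", x: -1.08012, y: 53.95901 },
  { id: "stop_b", x: -1.08034, y: 53.95912 }
];
const atFive = groupStops(nearbyStops, 5);  // stops remain separate nodes
const atThree = groupStops(nearbyStops, 3); // stops merge into one node
```

The merge at precision 3 is purely an artefact of rounding: the two stops happen to fall in the same rounded cell, which need not correspond to any cluster a passenger would recognise.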

Duplication

A duplicate is a vehicle trip which shares all its stop times with another trip on the same route (unless duplicationRouteOnly is false), and in the same direction and same (calendar) day. Duplicates normally add no additional passenger journey opportunities, merely capacity, so by default these duplicate trips are removed (allowDuplication is false). If the Aquius output specifically analyses capacity or counts vehicles, set allowDuplication to true. Highly serviced routes with imprecise schedules may need allowDuplication as true to avoid falsely identified duplicates being removed. Networks analysed at low coordinatePrecision may need allowDuplication as true to avoid falsely created duplicates being removed - separate trips that originally served different stops at the same time having been grouped into single nodes. By default duplicates between different routes are excluded, since high-frequency urban corridors may schedule more than one trip per minute between the same stops. This route limitation can be overridden by setting duplicationRouteOnly to false, which would then, for example, allow split services (see below) which have been allocated to different routes to be grouped properly.
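
The duplicate definition above can be sketched as a signature comparison. This is a simplified illustration, not the GTFS to Aquius implementation: the trip object shape (routeId, directionId, date, stopTimes as node and time pairs) and the function name are assumptions.

```javascript
// Hypothetical duplicate filter: a trip is dropped if an earlier trip has the same
// direction, day and node/time sequence (and the same route when duplicationRouteOnly is true).
function dropDuplicateTrips(trips, duplicationRouteOnly = true) {
  const seen = new Set();
  return trips.filter((trip) => {
    const signature = JSON.stringify([
      duplicationRouteOnly ? trip.routeId : null,
      trip.directionId,
      trip.date,
      trip.stopTimes.map((stopTime) => [stopTime.node, stopTime.time])
    ]);
    if (seen.has(signature)) {
      return false; // same stop times already counted once
    }
    seen.add(signature);
    return true;
  });
}
```

Because the signature is built from nodes rather than original stops, two trips serving different stops that round to the same node at the same times would be treated as duplicates, which is the low coordinatePrecision caveat described above.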

A split is a vehicle trip that duplicates at least two, but not all, of its stop times with another trip on the same route (and in the same service period and direction). If allowSplit is set true, the duplicate trip will have its unique stops attributed as a "split", which means Aquius does not double-count the service over common sections of route. The original trip remains unchanged. By default, split services must share at least 2 concurrent nodes, a number that may be redefined using splitMinimumJoin. Genuine split operation occurs in multi-vehicle modes such as rail. Note that where such split trips have been misappropriated to indicate guaranteed connections onto entirely different routes, GTFS to Aquius will double-count the service at stops common to both routes because it only considers duplication within routes. In such cases, set allowSplit to false and maintain the default allowDuplication as false to discard such trips entirely.

The two prior duplication definitions do not consider any pickup or setdown restrictions, only that the vehicle trip is duplicated or split. If allowCabotage is set true, trips that would be categorised as duplicate or split, but contain pickup or setdown criteria which differ from the original trip, are processed differently: Such duplicates are grouped to indicate they are part of the same vehicle journey, and thus should only be counted once as a group, but are not removed because each expresses a unique combination of passenger journey opportunities, depending on the origin and destination of that passenger. Technically these are assigned an Aquius "block". Services operating between different administrative jurisdictions, especially different nations, may be restricted in the points between which passengers can be carried - for example, an international service may be able to pick up passengers at stops within one country and set them down in another, but not convey passengers solely within either country. Each set of restrictions may be represented in the GTFS file as a duplicate trip.

Overrides

Configuration key productOverride allows colors to be applied to all links of the same product, where product is defined by networkFilter.type (above). The productOverride key's value is an Object whose key names refer to GTFS codes (agency_id values for "agency", or route_type values for "mode"), and whose values consist of an Object of properties to override any (and missing) GTFS values. Currently only two keys are supported: "route_color" and "route_text_color", each value is a string containing a 6-character hexadecimal HTML-style color (matching GTFS specification). These allow colors to be added by product, for example to apply agency-specific colors to their respective operations.

Configuration key routeOverride allows bespoke colors or route names to be applied by route (taking precedence). The routeOverride key's value is an Object whose keys are GTFS "route_id" values. The value of each "route_id" key is an Object containing keys "route_short_name", "route_color", "route_text_color" and/or "route_url", with content matching GTFS specification.

Configuration key stopOverride allows poorly geocoded stops to be given valid coordinates, and bespoke (user friendly) codes, names, and URLs to be specified. The stopOverride key's value is an Object whose keys are GTFS "stop_id" values. The value of each "stop_id" key is an Object containing override coordinates (WGS 84 floats) "x" and "y", and/or "stop_code", "stop_name" and "stop_url". If GTFS to Aquius encounters 0,0 coordinates it will automatically add these stops to stopOverride for manual editing. Alternatively 0,0 coordinate stops can be skipped entirely by setting config allowZeroCoordinate to false.
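
The three override keys described above might be combined as in the following sketch. All identifiers (agency_1, route_42, stop_1001), colors, coordinates and URLs are invented placeholders. Only the key names are taken from the descriptions above.

```javascript
// Illustrative override section of config.json (all IDs and values are placeholders)
const overrides = {
  productOverride: {
    // Key is an agency_id (networkFilter type "agency") or a route_type (type "mode")
    "agency_1": { route_color: "005EB8", route_text_color: "FFFFFF" }
  },
  routeOverride: {
    // Key is a GTFS route_id
    "route_42": {
      route_short_name: "42",
      route_color: "E4002B",
      route_text_color: "FFFFFF",
      route_url: "https://example.com/routes/42"
    }
  },
  stopOverride: {
    // Key is a GTFS stop_id, with x as longitude and y as latitude (WGS 84)
    "stop_1001": {
      x: -1.0932,
      y: 53.958,
      stop_code: "1001",
      stop_name: "Railway Station",
      stop_url: "https://example.com/stops/1001"
    }
  }
};

const overridesJson = JSON.stringify(overrides, null, 2);
```

Colors are written as 6-character hexadecimal strings without a leading hash, matching the GTFS convention noted above.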

Network Filter

Configuration key networkFilter defines groups of product IDs which the user can select to filter the results displayed. These filters are held in the network key of the Data Structure. networkFilter is an Object consisting of one or more keys:

  • type - string either "agency", which assigns a product code for each operator identified in the GTFS (default), or "mode", which assigns a product code for each vehicle type identified in the GTFS (only the original types and "supported" extensions will be named).
  • network - Array containing network filter definitions, each an Array consisting of: first, an Array of the GTFS codes (agency_id values for "agency", or route_type values for "mode") included in the filter, and second, an Object of localised names in the style {"en-US": "English name"}. If the GTFS file contains no agency_id, specify the agency_name instead.
  • reference - Object whose keys are GTFS codes (agency_id values for "agency", or route_type values for "mode") and whose value is an Object of localised names in the style {"en-US": "English name"}, which allows the respective names to be specified.

Tip: The easiest way to build bespoke network filters is to initially specify only type, process the GTFS data once, then manually edit the config.json produced. If using GTFS To Aquius via its user interface, a rough count of routes and services by each productFilter will be produced after processing, allowing the most important categories to be identified.

Caution: Pre-defining the networkFilter will prevent GTFS To Aquius adding or removing entries, so any new operators or modes subsequently added to the GTFS source will need to be added manually.
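
A networkFilter of type "mode" might look like the sketch below. The groupings and names are arbitrary examples. The route_type codes used (0 tram, 1 metro, 2 rail, 3 bus) are the original GTFS values.

```javascript
// Illustrative networkFilter grouping GTFS route_type codes (names are examples only)
const networkFilter = {
  type: "mode",
  network: [
    // Each definition: [array of GTFS codes, object of localised names]
    [["0", "1", "2", "3"], { "en-US": "All modes" }],
    [["0", "1", "2"], { "en-US": "Rail, Metro and Tram" }],
    [["3"], { "en-US": "Bus" }]
  ],
  reference: {
    // Name of each individual GTFS code
    "0": { "en-US": "Tram" },
    "1": { "en-US": "Metro" },
    "2": { "en-US": "Rail" },
    "3": { "en-US": "Bus" }
  }
};

const networkFilterJson = JSON.stringify({ networkFilter: networkFilter }, null, 2);
```

As the caution above notes, a pre-defined filter like this is fixed, so a mode later added to the GTFS source (for example ferry, route_type 4) would need to be added by hand.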

Service Filter

Configuration key serviceFilter can be used (in addition to any productFilter) to filter or summarise the number of services by different time periods. serviceFilter is an Object consisting of one or more keys:

  • type - string currently always "period".
  • period - Array of time period definitions which are applied in GTFS processing.

Each time period definition within period is an Object consisting of one or more keys:

  • day - Array of days of the week, lowercase English (as used in GTFS Calendar headers). The day evaluated is that assigned by date in the GTFS.
  • time - Array of time periods, each an Object with optional keys: start and end are strings in the format "HH:MM:SS" (as in GTFS times). Multiple sets of time periods can be specified as separate objects in the array.
  • name - Object consisting of locale:name pairs, each name in that locale.

In categorising services by time, GTFS To Aquius interprets times as equal to or after start, but before end. Only the keys specified are evaluated, thus a start time with no end time will analyse the entire service at and after that start time. For scheduled trips, each vehicle journey is evaluated by its mean time - the average of the earliest and latest times in the trip schedule. For example, a trip that leaves its first stop at 09:00:00 and arrives at its final stop at 10:00:00 would match any time criteria that included 09:30:00. Frequency-based services count the number of services based on the average headway(s) during the time period defined.
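
The mean time rule for scheduled trips can be sketched as below. The function names are hypothetical and the code only illustrates the comparison described above (at or after start, before end). It does not cover frequency-based services.

```javascript
// Convert a GTFS "HH:MM:SS" string to seconds; hours may exceed 24
function toSeconds(hhmmss) {
  const [hours, minutes, seconds] = hhmmss.split(":").map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Hypothetical check: does the trip's mean time fall within one time condition?
// tripTimes is an array of "HH:MM:SS" strings; condition may hold start and/or end.
function tripMatchesTime(tripTimes, condition) {
  const seconds = tripTimes.map(toSeconds);
  const mean = (Math.min(...seconds) + Math.max(...seconds)) / 2;
  if ("start" in condition && mean < toSeconds(condition.start)) {
    return false; // mean time is before the period starts
  }
  if ("end" in condition && mean >= toSeconds(condition.end)) {
    return false; // mean time is at or after the period ends
  }
  return true;
}
```

With this logic the 09:00:00 to 10:00:00 trip from the example above has a mean time of 09:30:00, so it matches a period starting at 09:30:00 but not one ending at 09:30:00.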

Caution: GTFS to Aquius naively mirrors whatever convention the GTFS file has used for handling services operated wholly or partly after midnight: Some operators consider all trips that commence after midnight to be on a new day. Others continue the previous day into the next until a notional "end of service". Days are continued into the next by adding 24 hours to the clock time (for example 01:10:00 on the next day becomes 25:10:00 on the original day). Early morning services may require two time conditions, as shown in the example below. Night services are easily double-counted, so if in doubt spot-check the output of such periods against published timetables.

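The example defines a single period with no "day" key, so it applies to all days. Its first time condition matches services before 06:00 recorded on the current day. Its second matches services from midnight until before 06:00 on the following day, where these have been recorded as 24:00:00 to 30:00:00 on the original day.

{
  "serviceFilter": {
    "type": "period",
    "period": [
      {
        "name": {"en-US": "00:00-06:00 Catchall"},
        "time": [
          {"end": "06:00:00"},
          {"start": "24:00:00", "end": "30:00:00"}
        ]
      }
    ]
  }
}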

Caution: The serviceFilter always applies a single time criteria to the whole journey, an assumption that will become progressively less realistic the longer the GTFS network's average vehicle journey. For example, an urban network may be usefully differentiated between morning peak and inter-peak because most vehicle journeys on urban networks are completed within an hour, and thus the resulting analysis will be accurate within 30 minutes at all nodes on the route. In contrast, inter-regional vehicle journey duration may be much longer, and such detailed time periods risk misrepresenting passenger journey opportunities at certain nodes: For example, a 4-hour vehicle journey that commences at 07:30 might match a morning peak definition at its origin, but not by the time it reaches its final destination at 11:30. Such networks may be better summarised more broadly - perhaps morning, afternoon and evening. Long-distance or international networks, where vehicle journeys routinely span whole days, may be unsuitable for any form of serviceFilter.

Tip: Service totals within defined time periods are still calculated as specified by servicePer - with default 1, the total service per day. This is a pragmatic way of fairly summarising unfamiliar networks with different periods of operation. However if serviceFilter periods exclude the times of day when the network is closed, the servicePer setting may be set per hour (0.04167), which may make it easier to compare periods of unequal duration, especially on metro networks with defined opening and closing times.

GeoJSON File

Optionally, a GeoJSON file can be provided containing population data, which allows Aquius to summarise the people served by a network. The file must end in the extension .json or .geojson, must use (standard) WGS 84 coordinates, and must contain either Polygon or MultiPolygon geographic boundaries. Each feature should have a property containing the number of people (or equivalent demographic statistic), either using field name "population", or that defined in config.json as populationProperty. Excessively large or complex boundary files may delay processing, so before processing GTFS To Aquius, consider reducing the geographic area to only that required, or simplifying the geometry.

GTFS To Aquius will attempt to assign each node (stop) to the boundary it falls within. For consistent results, boundaries should not overlap and specific populations should not be counted more than once. The choice of boundaries should be appropriate for the scale and scope of the services within the GTFS file: Not so small as to routinely exclude nodes used by a local population, but not so large as to suggest unrealistic hinterlands or catchment areas. For example, an entire city may reasonably have access to an inter-regional network whose only stop is in the city centre, and thus city-level boundaries might be appropriate at inter-regional level. In contrast, an urban network within a city should use more detailed boundaries that reflect the inherently local nature of the areas served. Note that the population summaries produced by Aquius are not intended to be precise, rather to provide a broad summary of where people are relative to nearby routes, and to allow basic comparison of differences in network connectivity.

If configuration key inGeojson is true (the default), the entire dataset will be limited to services between stops within the GeoJSON boundaries. If configuration key inGeojson is false (and a GeoJSON file is provided), any stops in the dataset that fall outside a boundary will be associated to the nearest boundary, calculated on the distance between the stop and the boundary centroid. The false option is useful for relating stops that fall just outside administrative boundaries, such as ferry terminals. Note that, as currently implemented, in GeoJSON to Aquius the false option simply retains the stop without assigning it to a boundary.
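
The assignment of stops to boundaries, including the nearest-centroid fallback used when inGeojson is false, can be sketched as follows. This is a simplified illustration, not the GTFS to Aquius code: the function names are hypothetical, the containment test is a standard ray-casting test, the centroid is approximated as the mean of the outer ring's vertices, and distance is measured in plain coordinate degrees.

```javascript
// Ray-casting test: is point (x, y) inside a closed ring of [x, y] positions?
function pointInRing(x, y, ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) {
      inside = !inside;
    }
  }
  return inside;
}

// Polygon or MultiPolygon feature: inside an outer ring and not inside any of its holes
function inFeature(x, y, feature) {
  const geometry = feature.geometry;
  const polygons =
    geometry.type === "Polygon" ? [geometry.coordinates] : geometry.coordinates;
  return polygons.some(
    (rings) =>
      pointInRing(x, y, rings[0]) &&
      !rings.slice(1).some((hole) => pointInRing(x, y, hole))
  );
}

// Crude centroid: mean of the first outer ring's vertices, excluding the closing vertex
function roughCentroid(feature) {
  const geometry = feature.geometry;
  const outer =
    geometry.type === "Polygon" ? geometry.coordinates[0] : geometry.coordinates[0][0];
  const vertices = outer.slice(0, -1);
  const sum = vertices.reduce((acc, [vx, vy]) => [acc[0] + vx, acc[1] + vy], [0, 0]);
  return [sum[0] / vertices.length, sum[1] / vertices.length];
}

// Returns the index of the assigned feature, or -1 if the stop is to be excluded
function assignPlace(x, y, features, inGeojson = true) {
  const hit = features.findIndex((feature) => inFeature(x, y, feature));
  if (hit !== -1 || inGeojson) {
    return hit;
  }
  let best = -1;
  let bestDistance = Infinity;
  features.forEach((feature, index) => {
    const [cx, cy] = roughCentroid(feature);
    const distance = (cx - x) ** 2 + (cy - y) ** 2;
    if (distance < bestDistance) {
      bestDistance = distance;
      best = index;
    }
  });
  return best;
}
```

A stop just outside every boundary, such as a ferry terminal, returns -1 with inGeojson true, but is attached to the boundary with the closest centroid when inGeojson is false.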

Processing can be time consuming for GeoJSON files containing thousands of boundaries, so the results are cached as configuration key nodeGeojson. If populated, this key will assign nodes to boundaries based on the centroids provided, without first searching the entire GeoJSON file for a match. For extremely large datasets, this key can potentially be pre-populated with data calculated by (computationally more efficient) GIS software. The coordinate precision within nodeGeojson must match that of the key coordinatePrecision for the cache to be used. Any node not found in the cache will be processed as normal, so the cache need not be complete.

Caution: Caching with nodeGeojson assumes boundaries have unique centroids. Nodes cannot be reliably cached where different boundaries share the same centroid, for example perfect concentric rings. To work round this limitation exclude such nodes from nodeGeojson, or do not specify the key in the configuration (which defaults to empty).

GeoJSON to Aquius

This tool converts geographic network data in GeoJSON format into an Aquius dataset file. GeoJSON lines become Aquius links, GeoJSON points become Aquius nodes, and GeoJSON boundaries become Aquius places (population). Lines must be provided, other data is optional. If points are provided, they must match the coordinates of the start or end of lines to be processed usefully. GeoJSON files must be projected using WGS 84 (which is normally the default for the format).

Non-spatial data may be attached as GeoJSON field names (technically called properties). This data typically matches the Aquius Data Structure (for links and nodes respectively), except each piece of data takes a GeoJSON field of its own. For example, data items that Aquius contains within a reference Object within a properties Object are instead exposed as a top level GeoJSON property. Data that is otherwise held as an Array (notably product and service keys) is instead provided in the GeoJSON as a comma-separated string. For example, a network with two service filter definitions should attach a service property to each GeoJSON line with the string value "10,20", where 10 is the service count associated with the first filter and 20 the second filter.
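
As an illustration of the comma-separated convention described above, the sketch below shows a GeoJSON line feature carrying "service" and "product" strings, and a helper that converts such a string back to numbers. The coordinates, name and values are invented, the property names are the defaults assumed by this tool, and parseList is a hypothetical helper rather than part of GeoJSON to Aquius.

```javascript
// Example GeoJSON line (link) with list data held as comma-separated strings
const lineFeature = {
  type: "Feature",
  geometry: {
    type: "LineString",
    coordinates: [[-1.0932, 53.958], [-1.5491, 53.7955]]
  },
  properties: {
    name: "X1",       // service name
    service: "10,20", // service count per service filter, in filter order
    product: "0"      // product list, here a single entry
  }
};

// Hypothetical helper: "10,20" becomes [10, 20]
function parseList(value) {
  return String(value)
    .split(",")
    .map((item) => Number(item.trim()));
}

const serviceCounts = parseList(lineFeature.properties.service);
```

Here the first service filter would be credited with 10 services and the second with 20.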

GeoJSON files can be crafted to match the expected property names, or bespoke names in GeoJSON files can be assigned using a config.json file, as listed below. The optional file config.json is also important for defining product and service filters (which are otherwise left empty), and adding other header data, such as meta names and translations.

Key | Type | Default | Description
blockProperty | string | "block" | Field name in GeoJSON properties containing block
circularProperty | string | "circular" | Field name in GeoJSON properties containing circular
colorProperty | string | "color" | Field name in GeoJSON properties containing service color (6-hex, no hash, as GTFS specification)
coordinatePrecision | integer | 5 | Coordinate decimal places (as GTFS to Aquius Coordinates)
directionProperty | string | "direction" | Field name in GeoJSON properties containing direction
inGeojson | boolean | true | If geojson boundaries are provided, only services at nodes within a boundary will be analysed
linkNameProperty | string | "name" | Field name in GeoJSON properties containing service name
linkUrlProperty | string | "url" | Field name in GeoJSON properties containing service url
meta | object | {} | As Data Structure meta key
network | object | {} | As Data Structure network key
nodeCodeProperty | string | "code" | Field name in GeoJSON properties containing node code
nodeNameProperty | string | "name" | Field name in GeoJSON properties containing node name
nodeUrlProperty | string | "url" | Field name in GeoJSON properties containing node url
option | object | {} | As Configuration/Data Structure option key
placeNameProperty | string | "name" | Field name in GeoJSON properties containing the name or identifier of the place
populationProperty | string | "population" | Field name in GeoJSON properties containing the number of people (or equivalent demographic statistic)
product | Array | [] | As Data Structure network reference.product key (products in index order, referencing productProperty data)
productProperty | string | "product" | Field name in GeoJSON properties containing product array as comma-separated string
service | object | {} | As Data Structure service key
serviceProperty | string | "service" | Field name in GeoJSON properties containing service array as comma-separated string
sharedProperty | string | "shared" | Field name in GeoJSON properties containing shared product ID
textColorProperty | string | "text" | Field name in GeoJSON properties containing service text color (6-hex, no hash, as GTFS specification)
translation | object | {} | As Configuration/Data Structure translation key

Caution: Pickup, setdown, and split are not currently supported. Product and service filters are specified exactly as they appear in the output file, as described in Data Structure: There is no logical processing of product and service data of the type performed by GTFS To Aquius.

This script is currently under development, requiring both features and testing, so check the output carefully. A live demonstration is available here. Alternatively, run the geojson.min.js file privately, either:

  1. With a user interface: Within a webpage, load the script and call geojsonToAquius.init("aquius-div-id"), where "aquius-div-id" is the ID of an empty element on the page.
  2. From another script: Call geojsonToAquius.cartograph(geojson, options).

Required value geojson is an Array consisting of GeoJSON Objects - the original file content already processed by JSON.parse(). options is an optional Object that may contain the following keys, each value itself an Object:

  • callback - function to receive the result, which should accept 3 Object: error (javascript Error), output (as returned without callback, described below), options (as submitted, which also allows bespoke objects to be passed through to the callback).
  • config - contains key:value pairs for optional configuration settings. The minimum content of config.json is empty, viz: {}. To this Object one or more key: value pairs may be added, as listed above.

Without callback, the function returns an Object with possible keys:

  • aquius - as dataObject.
  • config - as config, but with defaults or calculated values applied.
  • error - Array of error message strings.
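
Calling the converter from another script, without a callback, might look like the sketch below. The wrapper name convertGeojson is hypothetical, and the tool parameter stands in for the geojsonToAquius object so that the sketch does not depend on how the script has been loaded. Only the cartograph(geojson, options) call and the aquius, config and error keys are taken from the description above.

```javascript
// Hypothetical wrapper around geojsonToAquius.cartograph(geojson, options).
// geojsonTexts: array of GeoJSON file contents as strings. config: optional settings.
function convertGeojson(tool, geojsonTexts, config) {
  // cartograph expects already-parsed GeoJSON Objects
  const geojson = geojsonTexts.map((text) => JSON.parse(text));
  const result = tool.cartograph(geojson, { config: config || {} });
  return {
    aquius: result.aquius,         // the Aquius dataObject, if produced
    config: result.config,         // config with defaults or calculated values applied
    messages: result.error || []   // error message strings, if any
  };
}
```

In a page where geojson.min.js has been loaded, the call would be convertGeojson(geojsonToAquius, [fileText], {}). Whether an aquius object is still returned when error messages are present is not stated above, so callers should check both.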

Merge Aquius

This tool merges Aquius dataset files into a single file. The tool cannot understand these files beyond their technical structure, so the files should be produced in a similar manner:

  • Files must not duplicate one another's service count - else service totals will erroneously add.
  • Files must all share the same or very similar service filters - the filter indices and definitions are blindly assumed comparable.
  • Files should use (if any) the same place/demographic data - the original boundaries are not present, so no assessment can be made of overlaps (in future it should become possible to merge a GeoJSON file containing this data, but currently such data must be built into the original Aquius files, prior to merging).

Nodes and places are grouped by shared coordinates - the number of decimal places can be set as configuration key coordinatePrecision (described below). Products are grouped on name - all translations must be identical. Meta, option and translation content will be copied, but cannot always be merged - configuration keys can be used to supply definitions.

This script is currently under development, requiring both features and testing, so check the output carefully. A live demonstration is available here. Alternatively, run the merge.min.js file privately, either:

  1. With a user interface: Within a webpage, load the script and call mergeAquius.init("aquius-div-id"), where "aquius-div-id" is the ID of an empty element on the page.
  2. From another script: Call mergeAquius.merge(input, options).

Required value input is an Array consisting of one or more dataObject in the order to be processed. As a minimum, these must have meta.schema, link and node keys. options is an optional Object that may contain the following keys, each value itself an Object:

  • callback - function to receive the result, which should accept three arguments: error (javascript Error), output (as returned without callback, described below), options (as submitted, which also allows bespoke objects to be passed through to the callback).
  • config - contains key:value pairs for optional configuration settings. The minimum content of config.json is empty, viz. {}. To this Object one or more key: value pairs may be added. Currently supported keys are:
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| coordinatePrecision | integer | 5 | Coordinate decimal places (as GTFS to Aquius, smaller values tend to group clusters of stops) |
| meta | object | {} | As Data Structure meta key |
| option | object | {} | As Configuration/Data Structure option key |
| translation | object | {} | As Configuration/Data Structure translation key |

Without callback, the function returns an Object with possible keys:

  • aquius - as dataObject.
  • config - as config, but with defaults or calculated values applied.
  • error - Array of error message strings.
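A minimal sketch of calling the merge from another script follows. The helper names isMergeable and mergeDatasets are hypothetical, and the merge function is passed in as an argument (expected to be mergeAquius.merge) so that the sketch does not depend on merge.min.js being loaded. It assumes the callback's error argument is empty when the merge succeeds.

```javascript
// Hypothetical helper: checks for the minimum keys that input requires
// (meta.schema, link and node).
function isMergeable(dataObject) {
  return Boolean(
    dataObject &&
    typeof dataObject === "object" &&
    dataObject.meta &&
    "schema" in dataObject.meta &&
    Array.isArray(dataObject.link) &&
    Array.isArray(dataObject.node)
  );
}

// Hypothetical wrapper: drops unusable input, then calls merge with a config
// and a callback. done receives (error, output).
function mergeDatasets(merge, input, config, done) {
  const usable = input.filter(isMergeable);
  merge(usable, {
    "config": config,
    "callback": function (error, output, options) {
      if (error) {
        // Assumption: error is only set when the merge failed
        done(error, null);
      } else {
        // output has possible keys aquius, config and error
        done(null, output);
      }
    }
  });
}

// Usage, assuming merge.min.js is loaded and datasetA and datasetB are
// dataObjects already processed by JSON.parse():
// mergeDatasets(mergeAquius.merge, [datasetA, datasetB],
//   {"coordinatePrecision": 4},
//   function (error, output) { console.log(error, output); });
```

Even on success the output's error key may list messages, so it is worth inspecting before using the merged aquius dataset.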

Caution: Merge Aquius is intended to merge sets of files created in a similar manner. Merging an ad hoc sequence of Aquius files may appear successful, but the actual services presented may be extremely inconsistent, especially if service filters differ or different analysis periods have been used.

Tip: The original dataset files are processed in order of filename, which allows processing order to be controlled. The product filters of the first file will appear at the top of the merged product filter. The nodes in the first file will tend to hold smaller index values, which may have a small impact on final file size. The first file is the first source consulted for meta, option, translation (all unless defined by configuration), and service filter.

Data Structure

Aquius requires a network dataset JSON file to work with. The dataset file uses a custom data structure, one intended to be sufficiently compact to load quickly, and thus shorn of much human readability and structural flexibility. The dataset file will require custom pre-processing by the creator of the network. GTFS To Aquius or GeoJSON to Aquius can be used to automate this pre-processing. Aquius performs some basic checks on data integrity (of minimum types and lengths) that should catch the more heinous errors, but it is incumbent on the creator of the dataset to control the quality of the data therein. JSON files must be encoded to UTF-8.

Meta

The most basic dataset is an Object with a key "meta", that key containing another Object with key:value pairs. The only required key is schema, which is currently always a string "0". Other options are as shown in the example below:

{
  "meta": {
    "attribution": {
      "en-US": "Copyright and attribution",
        // Short, text only
      "es-ES": "Derechos"
    },
    "description": {
      "en-US": "Human readable description"
        // For future use
    },
    "name": {
      "en-US": "Human readable name",
        // Short, text only
      "es-ES": "Nombre"
    },
    "schema": "0",
      // Required, always "0"
    "url": "absolute/url/to/more/human/readable/information"
      // Will be wrapped around name
  }
}

Translation

An optional key translation may contain an Object with the same structure as that described for the translation Configuration option. Translations in the dataset take precedence over every translation except any in the translation option. If the dataset's network consists of Stations and not generic Stops, the dataset's translation key is the best place to redefine that, for example:

"translation": {
  "en-US": {
    "node": "Stations"
  },
  "es-ES": {
    "node": "Estaciones"
  }
}

Option

An optional key option may contain an Object with the same structure as the Configuration second argument of aquius.init(), described earlier. Keys id, dataset, network and translation are ignored, all in their own way redundant. It is recommended to set a sensible initial User State for the map (map view, here click), but this key can also be used to apply Styling (color, scale), or even control the User Interface or set alternative base mapping. The option key only takes precedence over Aquius' defaults, not over valid hash or Configuration options.

Reference

An optional key reference may contain an Object, itself containing keys whose value is an Array of verbose recurrent data. Specific values within the dataset may reference this recurrent data by index position, which avoids repeating identical data throughout the dataset and thus reduces filesize. Possible reference keys are:

  • color - Array of string HTML color codes. Currently only 6-hexadecimal styles are supported, including the leading hash, for example #1e00ff.
  • product - Array of Object translations of product names, for example {"en-US":"English Name"}, whose index position corresponds to product ID.
  • url - Array of string URLs. URLs may contain one or more expression [*] which will be automatically replaced by a link/node specific identifier, such as the route number.

Network

Each link (detailed below) is categorised with an integer product ID. The definition of a product is flexible: The network might be organised by different brands, operators, or vehicles. One or more product IDs are grouped into network filters, each network filter becoming an option for the user. Products can be added to more than one network filter, and there is no limit on the total number of filters, beyond practical usability: An interface with a hundred network filters would be hard to both digest and navigate.

The dataset's network key consists of an Array of network filters, in the order they are to be presented in the User Interface. This order should be kept constant once the dataset is released, since each network filter is referenced in hashable options by its index in the Array. Each network filter itself consists of an Array of three parts:

  1. Array containing integer product IDs of those products that make up the network filter.
  2. Object containing key:text, where locale is the key, and text is a string containing the translated network filter name.
  3. Object containing optional properties, reserved for future use.
"network": [
  [
    [1, 2, 3],
    {"en-US": "All 3 products"},
    {}
  ],
  [
    [1, 3],
    {"en-US": "Just 1 and 3"},
    {}
  ]
]
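
How a network filter selects links can be sketched as below. The function name linksInNetworkFilter is hypothetical, the link lines use the [products, services, nodes, properties] structure described under Link, and matching a link when any one of its products appears in the filter is an assumption made for illustration.

```javascript
const network = [
  [[1, 2, 3], {"en-US": "All 3 products"}, {}],
  [[1, 3], {"en-US": "Just 1 and 3"}, {}]
];

// Illustrative link data: [products, services, nodes, properties]
const link = [
  [[1], [10], [0, 1], {}],
  [[2], [4], [1, 2], {}],
  [[3], [6], [0, 2], {}]
];

// Hypothetical helper: returns the lines of link data that have at least one
// product within the network filter at filterIndex.
function linksInNetworkFilter(link, network, filterIndex) {
  const products = network[filterIndex][0];
  return link.filter(function (line) {
    return line[0].some(function (productId) {
      return products.indexOf(productId) !== -1;
    });
  });
}

const allProducts = linksInNetworkFilter(link, network, 0);
const oneAndThree = linksInNetworkFilter(link, network, 1);
console.log(allProducts.length, oneAndThree.length);
```

Because the filter is addressed by its index position, reordering the network Array after release would change which filter an existing hash refers to.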

Service

Each link (detailed below) is categorised with one or more counts of the number of services (typically vehicle journeys) associated with the link. The precise variable is flexible - for example, it could be used to indicate total vehicle capacity - but ensure the link key in translation contains an appropriate description.

Networks may be adequately described with just one service count per link. However service allows the same link to be described for different time periods - for example, 2 journeys in the morning and 3 in the afternoon - especially important to differentiate services that are not provided at marginal times, such as evenings or weekends. The service key defines those time periods as service filters. Like network, each filter consists of a 3-part Array:

  1. Array of the index positions within each link service array that are to be summed to produce the total service count.
  2. Object containing the localised description of the filter.
  3. Object containing optional properties, reserved for future use.

In the example below, the corresponding link service array would consist of an Array of two numbers, the first morning, and second afternoon. The first filter would sum both, while the second "morning" filter would take only the first service count, the third "afternoon" filter only the second count.

"service": [
  [
    [0, 1],
      // Index positions in link service array
    {"en-US": "All day"},
    {}
  ],
  [
    [0],
    {"en-US": "Morning"},
    {}
  ],
  [
    [1],
    {"en-US": "Afternoon"},
    {}
  ]
]

Caution: Providing more than one index position in link service array should only be done where the values sum without invalidating the service count metric used. For example, if the metric used is "services per day", summing two days together will falsely double the service level "per day". Aquius does not know how the original metric was constructed, so cannot make logical assumptions such as averaging instead of summing.
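
The summing described above can be sketched as follows, using the 2 morning and 3 afternoon journeys mentioned earlier. The function name serviceTotal is hypothetical, and treating a missing position as zero is an assumption.

```javascript
const service = [
  [[0, 1], {"en-US": "All day"}, {}],
  [[0], {"en-US": "Morning"}, {}],
  [[1], {"en-US": "Afternoon"}, {}]
];

// Hypothetical helper: sums the positions of a link service array that are
// listed by the service filter at filterIndex.
function serviceTotal(linkService, service, filterIndex) {
  return service[filterIndex][0].reduce(function (total, position) {
    // Assumption: a position missing from the link counts as zero
    return total + (linkService[position] || 0);
  }, 0);
}

// Link service array: 2 journeys in the morning, 3 in the afternoon
const linkService = [2, 3];

const allDay = serviceTotal(linkService, service, 0);
const morning = serviceTotal(linkService, service, 1);
const afternoon = serviceTotal(linkService, service, 2);
console.log(allDay, morning, afternoon);
```

The sketch only ever adds, which is why a metric such as "services per day" would be overstated if two days were listed in the same filter.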

Link

The link key contains an Array of lines of link data. Each line of link data is defined as a route upon which the entire service has identical stopping points and identical product. On some networks, every daily service will become a separate line of link data, on others almost the whole day's service can be attached to a single line of link data. Each line of link data is itself an Array consisting of 4 parts:

  1. Product networks (Array of integer) - list of the Product IDs associated with this link, as described in Network above.
  2. Service levels (Array of float) - where each value in the array contains a count indicative of service level, such as total vehicle journeys operated (the sum of both directions, unless assigned the property direction below). To allow filtering, the position in the array must correspond to a position defined in the service key.
  3. Nodes served (Array of integer) - the Node ID of each point the service stops to serve passengers, in order. Routes are presumed to also operate in the reverse direction, but, as described below, the route can be defined as one direction only, in which case the start point is only the first point in the Array. Node IDs reference an index position in node, and if the link is populated with data, so must node (and in turn place).
  4. Properties (Object) - an extendable area for keys containing optional data or indicating special processing conditions. In many cases this will be empty, viz. {}. Optional keys are described below (short forms are recommended to reduce filesize):
  • block or b - integer ID unique to a group of links which are actually provided by exactly the same vehicle journey(s), but where each link contains different properties. The ID has no meaning beyond uniquely defining the group. A block simply prevents the service total assigned to the group being counted more than once. If the properties of the block are the same, with the product and potentially the nodes differing, define a shared instead (which is generally simpler to manage and faster to process). Unlike shared, a block must contain links with the same service total, has no defined parent, can define groups of links which have the same product, and specifically identifies the links it is common to. The block is primarily used for cabotage, where pickup and setdown restrictions vary depending on the passenger journey being undertaken, not the vehicle journey. FlixBus, for example, manage such restrictions by creating multiple copies of each vehicle journey, each copy with different pickup and setdown conditions.
  • circular or c - boolean true or integer 1 indicates operation is actually a continuous loop, where the start and end points are the same node. Only the nodes for one complete loop should be included - the notional start and end node thus included twice. Circular services are processed so that their duplicated start/end node is only counted once. Figure-of-eight loops are intentionally double-counted at the midpoint each service passes twice per journey, since such services may reasonably be considered to offer two completely different routes to passengers, however this does result in arithmetic quirks (as demonstrated by Madrid Atocha's C-7).
  • direction or d - boolean true or integer 1 indicates operation is only in the direction recorded, not also in the opposite direction. As noted under Known Issues, services that are both circular and directional will produce numeric quirks. Tip: Services that loop only at one end of a route ("lasso" routes) should be recorded as uni-directional with nodes on the common section recorded twice, once in each direction - not recorded as circular.
  • duration or m - Array containing numeric average total minutes per trip by service period (experimental, not used in Aquius user interface, works best with service period time ranges).
  • dwell or w - Array containing numeric average total minutes of dwell time in node order (experimental, not used in Aquius user interface).
  • pickup or u - Array containing integer Node IDs describing nodes on the service's route where passengers can only board (get on), not alight (get off), expressed relative to order of the Nodes served. For a link summarising both directions, a pickup condition automatically becomes a setdown condition when that node order is reversed. If pickup and setdown are not mirrored thus, define two separate links, one in each direction.
  • reference or r - Array containing one or more Object of descriptive data associated with the routes within the link - for example, route headcodes or display colors. Possible keys and values are described below.
  • setdown or s - Array containing integer Node IDs describing nodes on the service's route where passengers can only alight (get off), not board (get on), expressed relative to order of the Nodes served. For a link summarising both directions, a setdown condition automatically becomes a pickup condition when that node order is reversed. If setdown and pickup are not mirrored thus, define two separate links, one in each direction.
  • shared or h - integer Product ID of the parent service. Shared allows an existing parent service to be assigned an additional child service of a different product category. The specific parent service is not specifically identified, only its product. Over common sections of route, only the parent will be processed and shown, however if the network is filtered to exclude the parent, the child is processed. The parent service should be defined as the longer of the two routes, such that the parent includes all the stops of the child. Shared was originally required to describe Renfe's practice of selling (state supported) local journey fare products on sections of (theoretically commercial) long distance services, but such operations are also common in aviation, where a single flight may carry seats sold by more than one airline.
  • split or t - Array containing integer Node IDs describing the unique nodes on the service's route. Split is assigned to one half of a service operated as two portions attached together over a common part of route. Splits can be effected at either or both ends of the route. In theory more than two portions can be handled by assigning a split to every portion except the first. Like shared services, the companion service is not specifically identified, however a split should be of the same Product ID as its companion service (else to avoid miscalculations network needs to be constructed so that both Product IDs fall into the same categories). Railway services south of London were built on this style of operation, while operators such as Renfe only split trains operated on very long distance routes.

Possible keys within the reference Object, all optional, although n is strongly recommended:

  • c - reference.color index associated with the route line.
  • i - reference code used in URL (see u below).
  • n - short name or human-readable reference code.
  • t - reference.color index associated with the route text (must contrast against the color of the line).
  • u - reference.url index of link providing further human-readable information. The optional [*] in that URL will be replaced by i if available, else n, else removed.
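
The [*] substitution for u can be sketched as below. The function name resolveReferenceUrl and the example.com URLs are hypothetical, and inserting the identifier without any URL encoding is an assumption.

```javascript
// Illustrative dataset reference key, holding hypothetical URLs
const reference = {
  "url": [
    "https://example.com/route/[*]",
    "https://example.com/network"
  ]
};

// Hypothetical helper: resolves the URL for one reference Object, replacing
// every [*] with i if available, else n, else removing it.
function resolveReferenceUrl(item, reference) {
  if (!reference.url || reference.url[item.u] === undefined) {
    return null;
  }
  let identifier = "";
  if ("i" in item) {
    identifier = String(item.i);
  } else if ("n" in item) {
    identifier = String(item.n);
  }
  // Assumption: the identifier is inserted as-is, without URL encoding
  return reference.url[item.u].split("[*]").join(identifier);
}

const withName = resolveReferenceUrl({"n": "L1", "u": 0}, reference);
const withCode = resolveReferenceUrl({"i": "101", "n": "L1", "u": 0}, reference);
const withNeither = resolveReferenceUrl({"u": 0}, reference);
console.log(withName, withCode, withNeither);
```

Storing the URL pattern once in reference.url and only a short i or n per route is what keeps the file size down.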

References should be consistently presented across the dataset - for example always "L1", not also "l1" and "line one". References should also be unique within localities - for example, the dataset may contain several different services referenced "L1", but should they serve the same node they will be aggregated together as "L1".

Caution: Keys split, and perhaps shared, can be hacked to mimic guaranteed onward connections to key destinations, especially from isolated branch lines or feeder services. However this feature should not be abused to imply all possible interchanges, and it may be more sensible to let users discover for themselves the options available at the end of the line.

Caution: The use of keys containing lists of nodes (pickup, setdown and split) should be avoided where they contain a node that is duplicated within the link - for example, the start/end node of a circular - because Aquius will attempt to apply the criteria to each duplicate, not just one.

Tip: Links follow the node order regardless of any pickup and setdown restrictions, so by assigning a node as both pickup and setdown only (thus allowing neither boarding nor alighting) the route shape can be altered without adding a stop.
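
The mirroring of pickup and setdown on a link that summarises both directions can be sketched as follows. The function name reverseDirection is hypothetical, and this illustrates the rule as documented rather than how Aquius implements it internally.

```javascript
// Hypothetical helper: returns a view of a line of link data as seen in the
// opposite direction, or null if the link is one direction only.
// line is [products, services, nodes, properties].
function reverseDirection(line) {
  const properties = line[3];
  if (properties.direction || properties.d) {
    return null;
  }
  const pickup = properties.pickup || properties.u;
  const setdown = properties.setdown || properties.s;

  // Copy the other properties, then swap pickup and setdown (short forms)
  const reversed = Object.assign({}, properties);
  delete reversed.pickup;
  delete reversed.u;
  delete reversed.setdown;
  delete reversed.s;
  if (pickup) {
    reversed.s = pickup.slice();
  }
  if (setdown) {
    reversed.u = setdown.slice();
  }

  // Service counts are left untouched: they already sum both directions
  return [line[0], line[1], line[2].slice().reverse(), reversed];
}

// Passengers may only board at node 0 when travelling 0 > 1 > 2
const bothWays = [[1], [4], [0, 1, 2], {"u": [0]}];
const back = reverseDirection(bothWays);
console.log(JSON.stringify(back));

const oneWay = [[1], [2], [0, 1], {"d": 1}];
console.log(reverseDirection(oneWay));
```

In the reverse view node 0 becomes the last stop and is setdown only, which is the expected behaviour at the terminus of a restricted route.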

Node

The node key contains an Array of node (stop, station) information. Each node is referenced (within link) by its index position. The format is simple:

  1. Longitude (float) - "x" coordinate of the node in WGS 84.
  2. Latitude (float) - "y" coordinate of the node in WGS 84.
  3. Properties (Object) - an extendable area for keys related to this node. Minimum empty, viz. {}.

Optional properties keys are:

  • place or p - integer index position in place (described below) for the place that contains this node. Recommended.
  • reference or r - Array containing one or more Object of descriptive data associated with the stop or stops within the node - for example, names or URLs containing further information.

Possible reference keys and values are described below, all optional, although n is strongly recommended:

  • i - reference code used in URL (see u below).
  • n - short name or human-readable reference code.
  • u - reference.url index of link providing further human-readable information. The optional [*] in that URL will be replaced by i if available, else n, else removed.

Tip: To reduce dataset file size, restrict the accuracy of coordinates to only that needed - metres, not millimetres. Likewise, while URLs and detailed names may provide useful reference information, these can inflate file size dramatically when lengthy or when attached to every node or service.

Place

The place key has a similar structure to node above - each place an Array referenced by index position. Place can be an entirely empty Array, viz. []. Places are intended to quickly summarise local demographics - how many people are connected together. Places are assigned simply to nodes (see node above), so each node has just one demographic association. For example, the Spanish Railway dataset uses the census of local municipalities, since municipalities tend to self-define the concept of locality, with both cities and villages falling into single municipalities. It is not possible to change the population catchment for specific Products within the same network, so the dataset creator will need to find an acceptable compromise that represents the realistic catchment of a node. Aquius was originally intended simply to show the broad presence of people nearby. As is, if precise catchment is important, networks containing a mix of intra and inter-urban services may be best split into two completely separate datasets, to be shown in two separate instances of Aquius.

  1. Longitude (float) - "x" coordinate of the place in WGS 84.
  2. Latitude (float) - "y" coordinate of the place in WGS 84.
  3. Properties (Object) - an extendable area for keys related to this place. Minimum empty, viz. {}.

Optional properties keys are:

  • population or p - integer total resident population, or equivalent statistic (recommended).
  • reference or r - Array containing one or more Object of descriptive data associated with the place. The only supported key is n - short name or human-readable reference code.

Tip: For bespoke analysis, the population can be hacked for any geospatial data that sums.
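
The relationship between node and place can be sketched as below. The function name connectedPopulation is hypothetical, the coordinates and populations are round illustrative figures (not census values), and counting each place only once is an assumption about how a connected population would be summed.

```javascript
// Illustrative place data: [longitude, latitude, properties]
const place = [
  [-3.7, 40.42, {"p": 3200000, "r": [{"n": "Madrid"}]}],
  [2.17, 41.39, {"p": 1600000, "r": [{"n": "Barcelona"}]}]
];

// Illustrative node data: property p is the index position in place
const node = [
  [-3.6905, 40.4065, {"p": 0, "r": [{"n": "Madrid Atocha"}]}],
  [-3.6823, 40.4722, {"p": 0, "r": [{"n": "Madrid Chamartín"}]}],
  [2.1404, 41.3789, {"p": 1, "r": [{"n": "Barcelona Sants"}]}]
];

// Hypothetical helper: sums the population of the places containing the
// given nodes, counting each place once (assumption).
function connectedPopulation(nodeIds, node, place) {
  const seen = new Set();
  let total = 0;
  nodeIds.forEach(function (nodeId) {
    const nodeProperties = node[nodeId][2];
    const placeId = "place" in nodeProperties ? nodeProperties.place : nodeProperties.p;
    if (placeId === undefined || seen.has(placeId) || !place[placeId]) {
      return;
    }
    seen.add(placeId);
    const placeProperties = place[placeId][2];
    const population = "population" in placeProperties ?
      placeProperties.population : placeProperties.p;
    total += population || 0;
  });
  return total;
}

const allThree = connectedPopulation([0, 1, 2], node, place);
const madridOnly = connectedPopulation([0, 1], node, place);
console.log(allThree, madridOnly);
```

Both Madrid nodes share one place, so serving a second station in the same municipality does not add to the total in this sketch.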

License

Aquius, with no dataset, is freely reusable and modifiable under an MIT License.

Dataset copyright will differ, and no licensing guarantees can be given unless made explicit by all entities represented within the dataset. Be warned that no protection is afforded by the logical nonsense of a public transport operator attempting to deny the public dissemination of their public operations. Nor should government-owned companies or state concessionaires be naively presumed to operate in some kind of public domain. Railways, in particular, can accumulate all manner of arcane legislation and strategic national paranoias. In the era of Google many public transport operators have grown less controlling of their information channels, but some traditional entities, such as Renfe, are not yet beyond claiming basic observable information to be a trade secret. Your mileage may vary.

Contributing

Contributors are most welcome. Check the Known Issues before making suggestions. Try to establish a consensus before augmenting data structures.