ACAI Place Data

Initial Draft Edition, v1.00, 2024-10-08

Rick Brannan, rick.brannan@biblionexus.org, and Jessica Parks, jessica.parks@biblionexus.org, 2024-05-10

Context

BiblioNexus and Biblica are working together to create data representing explicit and implicit instances of people and places (and other things) in the Bible. This information also includes entity-level information (for places, place-level information), such as a title/default form (and localizations), a description (and localizations), and other data.

This information is being compiled as part of the ACAI project, the Aquifer Concept Architecture for Information, which itself is a part of the Bible Aquifer project.

Sources

  • Biblica/Clear-Bible's Macula Hebrew
  • Biblica/Clear-Bible's Macula Greek
  • Biblica/Clear-Bible's speaker-quotations, an attempt to identify the original language words, in both the Old and New Testaments, translated as quotations (material using "double" and 'single' quotation marks) in various English Bibles. It also attempts to associate speakers with the quotations, where possible, using data from Faith Comes By Hearing.
  • United Bible Societies' UBS Dictionary of Biblical Hebrew (UBSDBH) and UBS Dictionary of the Greek New Testament (UBSDGNT). See SemanticDictionary.org for an implementation and the git repo ubs-open-license for data (English, French, Spanish, and Chinese). Macula Hebrew and Greek encode domains and references from these resources at the word level for most OT and NT words.
  • openbible.info Bible Geocoding Data
  • Robert Rouse's theographic-bible-metadata (aka viz.bible)
  • STEPBible TIPNR
  • Copenhagen Alliance versification-specification. Bible references within this data reflect the 'ORG' scheme specified by the Copenhagen Alliance. This means that Old Testament references assume the versification structure of the Hebrew Bible, and New Testament references assume the structure of the Greek New Testament. For use with translations, the references may need to be converted to the Copenhagen Alliance 'ENG' scheme. The repo cited above has information and sample code for how to do that; if you need assistance, please contact us.

Each of these sources is available as CC-BY-4.0 or CC-BY-SA-4.0 licensed data.

In particular, the definitions provided by the UBS Dictionaries supply a solid starting point, and several of our English place descriptions are directly inspired by them. We owe much to the UBS team (and Reinier de Blois in particular) for their work and for licensing the material under CC-BY-SA-4.0.

In addition to these sources, BiblioNexus has done a significant amount of curation and supplementation to account for and model the data according to the needs of the ACAI project.

Aggregation

This process started with Biblica's Macula Hebrew and Macula Greek data, which have word-level semantic domain annotations from the UBS Dictionary of Biblical Hebrew (UBSDBH) and UBS Dictionary of Biblical Greek (UBSDBG). These annotations were used in an initial pass to identify explicitly named people and places.

We next processed data from openbible.info, viz.bible, and STEPBible's TIPNR data to prepare place data to be integrated into one cohesive set.

Upon processing these datasets, we realized that STEPBible's TIPNR data did the best job of grouping together the different ways of referring to the same location. So it made sense to begin with the TIPNR data as a base representation, incorporate the semantic, instance, and referent information aggregated from the UBS Dictionaries and the Macula datasets, and then fold in data from openbible.info and viz.bible.

We identified similar entities across the different datasets for merging by comparing their labels and known references. We then merged this data with data that modeled relationships between places (and, importantly, curated descriptions of the locations), which had been on a separate development/curation track. In that merge, we treated the separately curated data as primary and at times had to rearrange our snapshot of the TIPNR data to support the merge.
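The matching step above is described only at a high level. As a minimal sketch of one plausible way to do it, assume each source record can be reduced to a display label plus a set of BCV8 references; two records are then flagged as candidates for merging when their normalized labels agree and their reference sets overlap enough. The `Candidate` class, the normalization rules, the Jaccard overlap measure, and the 0.5 threshold are all hypothetical choices for illustration, not the project's actual procedure.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical, simplified view of one place record from a source dataset."""
    source: str            # e.g. "tipnr", "obi", "vizbible"
    label: str             # the source's display name for the place
    references: set[str]   # BCV8 references (BBCCCVVV) attached to the place


def normalize_label(label: str) -> str:
    """Strip diacritics, lowercase, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", without_marks.lower())


def reference_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two reference sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_same(a: Candidate, b: Candidate, threshold: float = 0.5) -> bool:
    """Flag a pair for (human-reviewed) merging: same label, enough shared references."""
    return (
        normalize_label(a.label) == normalize_label(b.label)
        and reference_overlap(a.references, b.references) >= threshold
    )


if __name__ == "__main__":
    # Illustrative records only; real source records carry much more data.
    tipnr = Candidate("tipnr", "Bethlehem", {"01035019", "40002001"})
    obi = Candidate("obi", "Béthlehem", {"01035019", "40002001", "40002005"})
    print(likely_same(tipnr, obi))  # True: labels normalize equally, overlap is 2/3
```

Requiring both signals keeps homonymous places (same name, different references) from being merged automatically; in practice a flagged pair would still go to a curator.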

Status / Completeness of Project

While we believe we've identified places explicitly named in the Bible and associated references with them, there are some aspects of this data that are not complete.

Areas Still Under Development

  • Subregions: A significant amount of work has been done with subregions, but there is still more to do.
  • Nearby Places: What exists is fairly subjective. We plan to use coordinate data to help identify nearby places and then curate that information down to relevant nearby places.

The work on Subregions and Nearby Places is incomplete and left as-is for the v1.0 release. We may or may not complete this work in the future.
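The coordinate-based approach planned for Nearby Places could start from a simple distance filter that proposes candidates for a curator to accept or reject. This is a minimal sketch, assuming point coordinates in decimal degrees and a hypothetical 10 km radius; in the data these would come from `geocoordinates` (preferably the `biblica-maps` source), but since the inner key names are not documented here, the sketch takes plain `(lat, lon)` floats. The function names and the sample coordinates (approximate values) are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_candidates(
    place_id: str,
    coords: dict[str, tuple[float, float]],
    radius_km: float = 10.0,
) -> list[tuple[str, float]]:
    """Return (id, distance_km) pairs within radius_km of place_id, nearest first.

    The result is input for human curation, not a finished nearby_places list.
    """
    lat, lon = coords[place_id]
    hits = []
    for other_id, (olat, olon) in coords.items():
        if other_id == place_id:
            continue
        distance = haversine_km(lat, lon, olat, olon)
        if distance <= radius_km:
            hits.append((other_id, distance))
    return sorted(hits, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    coords = {
        "place:Jerusalem": (31.778, 35.235),
        "place:Bethlehem": (31.705, 35.200),
        "place:Bethany": (31.771, 35.262),
        "place:Capernaum": (32.881, 35.575),
    }
    for pid, km in nearby_candidates("place:Jerusalem", coords):
        print(f"{pid}: {km:.1f} km")  # Bethany (~2.7 km), then Bethlehem (~8.8 km)
```

A fixed radius is crude (a city and a wilderness region need different thresholds), which is one reason the output is best treated as a shortlist rather than an answer.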

Word-level references, particularly within the Hebrew Bible, may also have omissions. The UBS Dictionary of Biblical Hebrew remains a work in progress, and not all word tokens of the Hebrew Bible have been analyzed. Also, the Macula analysis of the Hebrew Bible is not as developed as its Greek New Testament counterpart, and the referent data can only be considered an initial draft.

JSON Schema Documentation

This documentation represents the current (as of 2024-01-22) schema and is based on the Python dataclasses used while processing the place-specific data. We will do our best to keep this up to date, but there may be discrepancies.
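As a minimal sketch of consuming this data in Python, assuming each entry is a JSON object whose keys match the property names in the tables below, a partial dataclass can load entries and group alternate representations under their primary entry (the one where `primary_id` == `id`). Only a handful of fields are modeled, and `from_dict` and `group_by_primary` are hypothetical helpers, not the project's own dataclasses; the top-level layout of the JSON file (an array of entries, in the sample below) is also an assumption.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class AcaiPlaceEntry:
    """Partial, hypothetical mirror of the AcaiPlaceEntry schema (a subset of fields)."""
    id: str
    primary_id: str
    type: str = "place"
    referred_to_as: list[str] = field(default_factory=list)
    possibly_same_as: list[str] = field(default_factory=list)
    mentioned_in_bible: bool = False
    references: list[str] = field(default_factory=list)
    geocoordinates: dict[str, dict[str, str]] = field(default_factory=dict)

    @property
    def is_primary(self) -> bool:
        # The schema marks the primary entry as the one whose primary_id equals its id.
        return self.primary_id == self.id

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "AcaiPlaceEntry":
        # Keep only the keys this partial sketch models; silently ignore the rest.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


def group_by_primary(entries: list[AcaiPlaceEntry]) -> dict[str, list[AcaiPlaceEntry]]:
    """Collect every representation of a place under its primary entry's id."""
    groups: dict[str, list[AcaiPlaceEntry]] = {}
    for entry in entries:
        groups.setdefault(entry.primary_id, []).append(entry)
    return groups


if __name__ == "__main__":
    # Hypothetical input: a JSON array of entries (the real file layout may differ).
    raw = json.loads(
        '[{"id": "place:Rome", "primary_id": "place:Rome",'
        ' "references": ["45001007"], "lemmas": {"el": ["Ῥώμη"]}}]'
    )
    entries = [AcaiPlaceEntry.from_dict(item) for item in raw]
    print(entries[0].is_primary)  # True
```

Ignoring unmodeled keys keeps the sketch tolerant of the schema drift the paragraph above warns about.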

AcaiPlaceEntry

| property | type | description |
|---|---|---|
| id | string | A string representing the unique identifier of this place (e.g. `place:Rome`). |
| primary_id | string | A place that has multiple representations has a primary entry, identified by this string. The primary entry is the one where `primary_id` == `id`. |
| alternate_sources | AlternateSources dataclass | A dataclass that allows each possible alternate source to provide some information. |
| type | string | For places, `place` is the only valid type. |
| ubsdbg | list | A list of domain.article (`##.##`) strings representing the UBSDBG annotation. |
| ubsdbh | list | A list of strings representing the article identifier(s) from UBSDBH. |
| localizations | dict[str, dict[str, list]] | An object for collecting strings and other structures by language for localization purposes. This is where `preferred_label`, `alternate_labels`, and `description` are collected. |
| referred_to_as | list | A list of `id` values for places/locations that are considered functionally equivalent. |
| possibly_same_as | list | A list of `id` values for places/locations that may be the same location. This is less certain than `referred_to_as`. |
| tribal_area | string | The `id` of the ACAI group representing the tribe/nationality associated with the location. |
| associated_places | list | A list of `id` values for places/locations that are associated for some reason. An example is Sodom and Gomorrah. |
| nearby_places | list | A list of `id` values for places/locations that are nearby (as determined by human curation, not by geographic comparison). |
| subregion_of | list | A list of `id` values for places/locations of which the current location is considered a subregion. |
| mentioned_in_bible | bool | If `true`, this place is mentioned in the Bible (note: "Bible" here includes the canon of the Protestant edition of the NRSV with Apocrypha). |
| only_mentioned_in_apocrypha | bool | If `true`, this place is mentioned only in the apocryphal portions of the Bible (note: here "apocrypha" means the apocryphal books of the Protestant edition of the NRSV with Apocrypha). |
| non_biblical | bool | If `true`, there is no mention whatsoever of this location in the Bible. Such entries are included to provide context and typically also include `geocoordinates` data. |
| is_artifact | bool | The identification is restricted to a human-created structure, such as a gate or pillar. |
| is_person | bool | Some locations are innately associated with people. |
| added_entry | bool | Used only during creation of the data; not relevant for any future use. |
| lemmas | dict[str, list] | The key is the language (`el`, `he`, or `arc`), with lemmas as values in the list. |
| place_types | dict[str, list] | The key is the place-type scheme (`obi`, `vizbible`, `acai`, etc.), with values in the list. Note that `acai` is to be considered the default. These values, along with descriptions and hierarchy, are available here. |
| geocoordinates | dict[str, dict[str, str]] | A dictionary with various sources of latitude and longitude (point-based) for the location. The default (preferred) source is `biblica-maps`. |
| key_references | list | A list of BCV8-style references, where eight digits encode the zero-padded book (two digits), chapter (three digits), and verse (three digits) of the reference: `BBCCCVVV`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
| references | list | A list of BCV8-style references, where eight digits encode the zero-padded book (two digits), chapter (three digits), and verse (three digits) of the reference: `BBCCCVVV`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
| explicit_instances | dict[str, list] | The initial key is the edition (SBLGNT, WLC); the list contains corpus-specific word references, where a 13-character string encodes the reference to the word part position: `[on]BBCCCVVVWWWP`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
| pronominal_referents | dict[str, list] | The initial key is the edition (SBLGNT, WLC); the list contains corpus-specific word references, where a 13-character string encodes the reference to the word part position: `[on]BBCCCVVVWWWP`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
| subject_referents | dict[str, list] | The initial key is the edition (SBLGNT, WLC); the list contains corpus-specific word references, where a 13-character string encodes the reference to the word part position: `[on]BBCCCVVVWWWP`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
| speeches | dict[str, list[dict[str, obj]]] | The initial key is the edition (SBLGNT, WLC); the list contains word-level information about reported speech the entity is responsible for. Each speech has a `quote_type` property (usually `Normal`, but sometimes `Questioning` or other contextually important data) as well as `words`, a list of corpus-specific word references where a 13-character string encodes the reference to the word part position: `[on]BBCCCVVVWWWP`. Note that these references reflect the versification of the Hebrew Bible and Greek New Testament via the Copenhagen Alliance 'ORG' scheme. |
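The two reference encodings above are fixed-width, so they can be split by position. This is a minimal sketch, assuming the leading `[on]` character is literally `o` or `n` (read here as Old/New Testament, which is an interpretation) and that all other positions are ASCII digits; `Verse`, `WordRef`, `parse_bcv8`, and `parse_word_ref` are hypothetical names. The parsed values stay in the 'ORG' versification; converting to 'ENG' is a separate step not covered here.

```python
import re
from typing import NamedTuple

# Fixed-width layouts as documented: BBCCCVVV and [on]BBCCCVVVWWWP.
BCV8_RE = re.compile(r"([0-9]{2})([0-9]{3})([0-9]{3})")
WORD_RE = re.compile(r"([on])([0-9]{2})([0-9]{3})([0-9]{3})([0-9]{3})([0-9])")


class Verse(NamedTuple):
    book: int
    chapter: int
    verse: int


class WordRef(NamedTuple):
    testament: str  # "o" or "n" (interpreted as Old/New Testament; an assumption)
    verse: Verse
    word: int
    part: int


def parse_bcv8(ref: str) -> Verse:
    """Split an 8-digit BBCCCVVV reference (ORG versification) into integers."""
    m = BCV8_RE.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a BCV8 reference: {ref!r}")
    book, chapter, verse = (int(g) for g in m.groups())
    return Verse(book, chapter, verse)


def parse_word_ref(ref: str) -> WordRef:
    """Split a 13-character [on]BBCCCVVVWWWP word reference."""
    m = WORD_RE.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a word reference: {ref!r}")
    testament, book, chapter, verse, word, part = m.groups()
    return WordRef(testament, Verse(int(book), int(chapter), int(verse)), int(word), int(part))


if __name__ == "__main__":
    print(parse_bcv8("45001007"))          # Verse(book=45, chapter=1, verse=7)
    print(parse_word_ref("n450010070123"))  # constructed example, not taken from the data
```

Using `fullmatch` rather than `match` rejects strings with trailing characters, so malformed references fail loudly instead of being silently truncated.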

AlternateSources

| property | type | description |
|---|---|---|
| aquifer | list | A pipe-delimited string (`id\|name\|resource`) derived from aquifer.bible. |
| obi | list | Information derived from openbible.info. |
| ubsdbh | list | Information derived from the UBS Dictionary of Biblical Hebrew. |
| ubsdbg | list | Information derived from the UBS Dictionary of New Testament Greek. |
| digital_atlas_roman_empire | list | Relevant URL and ID from the Digital Atlas of the Roman Empire, extracted from OpenBible.info data. |
| pleiades | list | Relevant URL and ID from the Pleiades Project, extracted from OpenBible.info data. |
| tipnr | list | Relevant ID from Tyndale House's STEPBible project, Translators Individualized Proper Names with all References (TIPNR), extracted from OpenBible.info data. |
| wikidata | list | Relevant ID from Wikidata, extracted from OpenBible.info data. |
| wikipedia | list | Relevant ID from Wikipedia, extracted from OpenBible.info data. |
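Because the `aquifer` value packs three fields into a single string, a small parser makes the structure explicit. This is a minimal sketch, assuming each element of the `aquifer` list is exactly one `id|name|resource` string and that none of the three fields itself contains a pipe; `AquiferRef` and `parse_aquifer` are hypothetical names.

```python
from typing import NamedTuple


class AquiferRef(NamedTuple):
    id: str
    name: str
    resource: str


def parse_aquifer(value: str) -> AquiferRef:
    """Split one 'id|name|resource' string from alternate_sources.aquifer."""
    parts = value.split("|")
    if len(parts) != 3:
        # A stray pipe inside a field would land here rather than misalign fields.
        raise ValueError(f"expected 3 pipe-delimited fields, got {len(parts)}: {value!r}")
    return AquiferRef(parts[0], parts[1], parts[2])


if __name__ == "__main__":
    print(parse_aquifer("123|Rome|Example Resource"))  # made-up sample value
```

Applied to a loaded entry, something like `[parse_aquifer(v) for v in entry["alternate_sources"]["aquifer"]]` would yield one record per Aquifer link, again assuming the list-of-strings shape described above.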