This repo is about the digital processing of the transliterations of proto-cuneiform tablets from the Uruk IV-III periods.
Cuneiform tablets have been photographed, drawn as lineart, and transliterated in ATF files, in which the marks on a tablet are represented by ASCII characters.
While the ATF descriptions preserve an awesome amount of precise information about the marks that are visible in the clay and their spatial structure, it is not easy to process that information. Simple things are hard: counting and aggregating, let alone higher-level tasks such as clustering, collocation, and other statistical operations.
That is why we have converted the transliterations to another format, Text-Fabric, which is optimized for processing data, adding data, and sharing it.
We have also brought in photos and lineart, which can be used while computing, especially in Jupyter notebooks. Done this way, computer analysis turns into rich computational narratives.
We have chosen the Uruk IV-III periods (4000-3100 BC) as a starting corpus for testing our approach. This is a proto-cuneiform corpus of ca. 6000 tablets.
The second corpus we have brought into Text-Fabric is the Old Babylonian Letters.
We have downloaded transliterations and images from the Cuneiform Digital Library Initiative (CDLI). CDLI is a rich source of data that is available to the public and visible on its website, and large portions are conveniently downloadable. We are indebted to the creators and maintainers of the CDLI website.
On the search page we entered Uruk IV and Uruk III respectively under Chronology - period. On the results page, we chose Download all text. Below we list the download links per corpus.
In this repo we convert the following corpora to Text-Fabric:
- Uruk IV - 1861 texts
- Uruk III - 4882 texts
Note that these "corpora" are merely the results of a query by period. They are not corpora in the sense of an identified body of texts in which each individual text occupies a fixed position in the sequence.
The downloaded files contain metadata and transliterations. We have extracted the transliterations to separate files. We only use the excavation number from the metadata.
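The extraction step described above can be sketched as follows. This is a minimal illustration, not the conversion code used in this repo. It assumes that each record in the downloaded file starts with a `Primary publication:` line, that the excavation number sits on a line labelled `Excavation no.:`, and that each transliteration starts with an ATF header line such as `&P000001 = ...`. The function name `split_download`, the field labels, and the sample data are all hypothetical and should be adjusted to the actual download.

```python
import re

# Assumptions about the download layout (not confirmed by this README):
RECORD_START = "Primary publication:"    # assumed first metadata line of a record
EXCAVATION_FIELD = "Excavation no.:"     # assumed label of the excavation number
TABLET_START = re.compile(r"^&(P\d+)")   # ATF header line, e.g. "&P000001 = ..."


def split_download(text):
    """Split a download into {p_number: {"excavation": str, "atf": [lines]}}."""
    tablets = {}
    excavation = ""
    current = None  # P-number whose ATF lines are being collected
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith(RECORD_START):
            # A new record begins: forget the previous tablet's state.
            excavation, current = "", None
            continue
        if line.startswith(EXCAVATION_FIELD):
            excavation = line[len(EXCAVATION_FIELD):].strip()
            continue
        match = TABLET_START.match(line)
        if match:
            current = match.group(1)
            tablets[current] = {"excavation": excavation, "atf": [line]}
        elif current is not None and line:
            # Every non-empty line after the header belongs to the transliteration.
            tablets[current]["atf"].append(line)
    return tablets


# Hypothetical sample data, made up for illustration only.
SAMPLE = """\
Primary publication: hypothetical publication
Excavation no.: W 0001
Transliteration:
&P000001 = hypothetical tablet
@obverse
1. 1(N01) , SZE~a
Primary publication: another hypothetical publication
Excavation no.: W 0002
Transliteration:
&P000002 = another hypothetical tablet
@obverse
1. 2(N14) , GAL~a
"""

tablets = split_download(SAMPLE)
for p_number, info in tablets.items():
    print(p_number, info["excavation"], len(info["atf"]))
```

All metadata fields other than the excavation number are ignored, which mirrors the choice stated above. Writing each `atf` list to its own file would give the separate transliteration files.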
We have a specification of the transcription format and how we model the text in Text-Fabric.
We have checked the conversion from the ATF transliterations to Text-Fabric extensively. Cruelly, you might say. An account of the checking that we performed is in the checks notebook.
We have obtained three image sets from CDLI:
- photos of tablets;
- lineart images of tablets;
- lineart images of ideographs.
For details, see images.