fully functional; docs and tutorials may be expanded; not yet archived in zenodo and swh
dirkroorda committed Jan 12, 2023
1 parent 6549d3a commit 4226952
Showing 65 changed files with 82,255 additions and 54,620 deletions.
61 changes: 61 additions & 0 deletions README.md
@@ -1,3 +1,64 @@
# descartes-tf

Letters of Descartes in Text-Fabric with math display.
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)

![descartes](docs/images/logo.png)

# René Descartes - Brieven

In this repository we prepare the letters of
[Descartes](https://en.wikipedia.org/wiki/René_Descartes)
for the application of data science.

The source files are provided by the Huygens Institute, as the result of the CKCC project which was completed
in 2012.

We converted these files to a
[Text-Fabric](https://github.com/annotation/text-fabric)
representation.

The result can be readily loaded into Python programs for further processing.

See [about](about.md) for the provenance of the data.

See [transcription](transcription.md) for how the resulting data is modelled.

## How to use

### Having Text-Fabric installed

This data can be processed by
[Text-Fabric](https://annotation.github.io/text-fabric/tf).

Text-Fabric will automatically download the corpus data.

After [installing Text-Fabric](https://annotation.github.io/text-fabric/tf/about/install.html),
you can start the Text-Fabric browser with this command:

```sh
text-fabric CLARIAH/descartes-tei
```

Alternatively, you can work in a Jupyter notebook and say

```python
from tf.app import use

A = use('CLARIAH/descartes-tei')
```

In both cases the data is downloaded and ends up in your home directory,
under `text-fabric-data`.

See also
[start](https://nbviewer.jupyter.org/github/CLARIAH/descartes-tei/blob/master/tutorial/start.ipynb)
and
[search](https://nbviewer.jupyter.org/github/CLARIAH/descartes-tei/blob/master/tutorial/search.ipynb).

# Author

See [about](about.md) for the authors/editors of the data.

[Dirk Roorda](https://github.com/dirkroorda) is the author of the representation in Text-Fabric of the data,
and the tutorials and documentation.
25 changes: 24 additions & 1 deletion app/app.py
@@ -1,17 +1,40 @@
import types
from tf.advanced.find import loadModule
from tf.advanced.app import App


MODIFIERS = "italic margin sub sup".strip().split()


def fmt_layoutOrig(app, n, **kwargs):
    return app._wrapHtml(n, None)


class TfApp(App):
    def __init__(app, *args, silent=False, **kwargs):
        app.fmt_layoutOrig = types.MethodType(fmt_layoutOrig, app)

        super().__init__(*args, silent=silent, **kwargs)

        app.image = loadModule("image", *args)

        app.image.getImagery(app, silent, checkout=kwargs.get("checkout", ""))

        app.reinit()

    # PRETTY HELPERS
    # FORMAT support

    def _wrapHtml(app, n, kind):
        api = app.api
        F = api.F
        Fs = api.Fs
        trans = F.trans.v(n) or ""
        punc = F.punc.v(n) or ""
        material = f"{trans}{punc}"
        clses = " ".join(cf for cf in MODIFIERS if Fs(f"is{cf}").v(n))
        return f'<span class="{clses}">{material}</span>' if clses else f"{material}"

    # GRAPHICS Support

    def getGraphics(app, isPretty, n, nType, outer):
        result = ""
31 changes: 27 additions & 4 deletions app/config.yaml
@@ -1,23 +1,46 @@
apiVersion: 3
dataDisplay:
  exampleSectionHtml: <code>letter 1:1001</code>
  textFormats:
    layout-orig-full:
      method: layoutOrig
docs:
  docPage: about
  featureBase: 'https://github.com/{org}/{repo}/blob/master/docs/transcription{docExt}'
  featurePage: ''
interfaceDefaults:
  showGraphics: true
  showMath: true
  standardFeatures: false
  withLabels: true
provenanceSpec:
  corpus: Descartes = Descartes, all letters
  graphicsRelative: source/illustrations
  version: 0.9
  version: 1.0
  webBase: http://emlo-portal.bodleian.ox.ac.uk/collections/?catalogue=rene-descartes
  webHint: See how this corpus is included in the Bodleian catalog
  moduleSpecs:
  - corpus: Similar Sentences
    relative: parallels/tf
typeDisplay:
  volume:
    featuresBare: n
    label: '{n}'
    template: 'vol. {n}'
  page:
    label: '{n}'
    template: 'p. {n}'
  letter:
    featuresBare: id
    label: '{id} {date} from {sender} to {recipient}'
    template: '{id} {date} from {sender} to {recipient}'
    features: senderloc recipientloc
  p:
    featuresBare: n
    label: '{n}'
  sentence:
    label: '{n}'
    condense: true
  figure:
    label: '{url}'
    graphics: true
  formula:
    label: '{notation}'
    features: tex
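
The `label` and `template` entries under `typeDisplay` read like Python format strings whose placeholders are feature names. The sketch below shows that reading with plain `str.format_map`; it is an assumption about how such templates can be filled, not Text-Fabric's actual rendering code. The function `render_label` and the feature values are hypothetical and made up for illustration.

```python
class _EmptyWhenMissing(dict):
    """Dictionary that yields an empty string for features a node lacks."""

    def __missing__(self, key):
        return ""


def render_label(template, features):
    """Fill the {placeholders} of a label template with feature values.

    `features` maps feature names to values for one node (hypothetical
    stand-in for looking the values up in the corpus).
    """
    return template.format_map(_EmptyWhenMissing(features))


# Made-up feature values, for illustration only.
letter = {"id": "1001", "date": "1640-01-01", "sender": "A", "recipient": "B"}

print(render_label("{id} {date} from {sender} to {recipient}", letter))
print(render_label("vol. {n}", {"n": 3}))
```

Returning an empty string for a missing feature is a design choice of this sketch; it keeps a label renderable when a node has no value for one of the placeholders.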
17 changes: 17 additions & 0 deletions app/static/display.css
@@ -0,0 +1,17 @@
.italic {
    font-style: italic;
}
.margin {
    position: relative;
    top: -0.3em;
    font-weight: bold;
    color: #0000ee;
}
.sub {
    vertical-align: sub;
    font-size: small;
}
.sup {
    vertical-align: super;
    font-size: small;
}
49 changes: 33 additions & 16 deletions docs/transcription.md
@@ -22,6 +22,24 @@ postscriptum of the letter.

Letters may contain illustrations, symbols, and mathematical formulas.

### Sentences

We have added the concept of a sentence.
A sentence is a piece of text within a paragraph that is
terminated by a `.` .

Not every `.` acts as a sentence terminator, though: in
`Kal. Aprilis`, for example, it marks an abbreviation.

We have tried to exclude most of these cases.

The purpose of adding sentences was to have a convenient
division within paragraphs. This division can be used to
display manageable chunks of the corpus.

It can also be used to detect parallel passages, i.e. pieces
where Descartes repeats himself.
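
The splitting rule described above can be sketched as follows. This is a minimal illustration, not the conversion code used for this corpus: the function `split_sentences` and the abbreviation list `ABBREVIATIONS` are hypothetical, and the real exclusion rules are likely more extensive.

```python
# Hypothetical list of abbreviations whose trailing "." does not end a sentence.
ABBREVIATIONS = {"Kal.", "Id.", "Non."}


def split_sentences(words):
    """Group the words of one paragraph into sentences.

    A sentence ends at a word that ends in "." unless that word is a
    known abbreviation. A "." inside a word, as in a number like 3.14,
    never ends a sentence because the word does not end in ".".
    """
    sentences = []
    current = []
    for word in words:
        current.append(word)
        if word.endswith(".") and word not in ABBREVIATIONS:
            sentences.append(current)
            current = []
    if current:  # trailing words without a final full stop
        sentences.append(current)
    return sentences


paragraph = "Scripsi Kal. Aprilis. Vale.".split()
for n, sentence in enumerate(split_sentences(paragraph), start=1):
    # n plays the role of a sequence number within the paragraph
    print(n, " ".join(sentence))
```

Here `Kal.` does not close a sentence, so the paragraph yields two sentences rather than three.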

## Text-Fabric model

The Text-Fabric model views the text as a series of atomic units, called
@@ -30,24 +48,11 @@ The Text-Fabric model views the text as a series of atomic units, called
On top of that, more complex textual objects can be represented as *nodes*. In
this corpus we have node types for:

volume 9 75811.11 100
letter 725 941.10 100
page 2884 236.58 100
p 8438 80.86 100
postscriptum 56 46.79 0
head 725 23.37 2
address 86 15.22 0
closer 541 13.10 1
hi 5972 4.63 4
opener 545 1.97 0
formula 6200 1.27 1
figure 319 1.00 0
word 682300 1.00 100

[*word*](#node-type-word),
[*hi*](#node-type-hi),
[*figure*](#node-type-figure),
[*formula*](#node-type-formula),
[*sentence*](#node-type-sentence),
[*head*](#node-type-head),
[*opener*](#node-type-opener),
[*closer*](#node-type-closer),
@@ -109,7 +114,7 @@ feature | values | description
**ismargin** | `1` | indicates the word is in the margin
**issub** | `1` | indicates the word is in subscript
**issup** | `1` | indicates the word is in superscript
**typ** | `empty` | indicates the kind of word
**typ** | `empty` `formula` | indicates the kind of word

* **typ** = `empty`: a deliberately empty word, i.e. **trans** is empty or absent;
  **punc** may still contain something, typically a space
@@ -147,7 +152,18 @@ This gives you the opportunity to view the source code of formulas.

feature | values | description
------- | ------ | ------
**notation** | `A\over B` | TeX source code of a formula
**notation** | `TeX` | notation method of the formula
**tex** | `A\over B` | TeX source code of a formula

## Node type [*sentence*](#sentence)

A sentence, i.e. a part of a paragraph terminated by a full stop.
A `.` that is used for another purpose, e.g. in abbreviations and numbers,
does not count as a full stop.

feature | values | description
------- | ------ | ------
**n** | `1` `2` | sequence number of a sentence within the paragraph.

## Node type [*head*](#head)

@@ -230,6 +246,7 @@ The following text formats are defined (you can also list them with `T.formats`)
format | description
--- | ---
`text-orig-full` | the full text of all words
`layout-orig-full` | the full text of all words, with special formatting indicating special characteristics of the text.

The formats with `text` result in strings that are plain text, without additional formatting.
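
To illustrate the difference, the sketch below shows how a `layout` format can wrap a word in an HTML `span` whose classes follow the word's **isitalic**, **ismargin**, **issub** and **issup** features. It mirrors the `_wrapHtml` helper in `app/app.py` of this commit, but uses plain dictionaries as a stand-in for Text-Fabric feature lookups; the function `wrap_word` and the sample words are hypothetical.

```python
MODIFIERS = "italic margin sub sup".split()


def wrap_word(word):
    """Render one word as plain text or as a <span> with layout classes.

    `word` is a dict with optional keys "trans", "punc" and the flags
    "isitalic", "ismargin", "issub", "issup" (stand-ins for the word
    features of the same names).
    """
    trans = word.get("trans") or ""
    punc = word.get("punc") or ""
    material = f"{trans}{punc}"
    classes = " ".join(m for m in MODIFIERS if word.get(f"is{m}"))
    return f'<span class="{classes}">{material}</span>' if classes else material


# Made-up words, for illustration only.
words = [
    {"trans": "x"},
    {"trans": "2", "punc": " ", "issup": 1},
    {"trans": "nota", "punc": " ", "isitalic": 1, "ismargin": 1},
]
print("".join(wrap_word(w) for w in words))
```

Words without any of the four flags are emitted without a `span`, so unmarked text comes out the same as in the plain `text` format.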
