
Collection of 19th Century Portuguese Novels (1840-1910)

Contents

Currently, this collection contains 30 novels from the 19th century by 14 male Portuguese authors. In total, the texts amount to about 2.5 million words.

The collection was created by Ulrike Henny-Krahmer in the context of the junior research group Computational Literary Genre Stylistics (CLiGS) at the University of Würzburg, Germany.

So far, the collection is not balanced with regard to authors, decades or subgenres. It was initially created to compare historical and non-historical novels but has since been expanded. It is meant as a small collection for different kinds of experiments (for example, authorship attribution or topic modeling), so the selection of texts may have to be adjusted to the use case. Subgenres have been assigned tentatively and do not follow a strict typology. Where a novel is primarily associated with a particular literary current, that current (e.g. realist or expressionistic) is given instead of a genre label.

See the metadata.csv file for basic information about the novels. The following tables give an overview of the corpus's characteristics:

Number of texts and words by decade:

decade    texts   words
1840s         3    160k
1850s         1    197k
1860s         6    462k
1870s         8    510k
1880s         2    279k
1890s         3    186k
1900s         6    646k
1910s         1     66k
total        30  2,506k

Number of texts and words by subgenre:

subgenre          texts   words
autobiographical      1    197k
crime                 1     67k
expressionistic       2    107k
historical           10    797k
political             1    107k
realist               2    279k
sentimental           7    528k
social                5    388k
travel                1     38k
total                30  2,506k

Number of texts and words by author:

author                              texts   words
Botelho, Abel                           2    241k
Braga, Teófilo                          1     61k
Brandão, Zeferino                       4    187k
Campos, Alfredo                         1     28k
Campos Júnior, Antonio                  1    234k
Castelo Branco, Camilo                  7    591k
Dinis, Júlio                            2    220k
Garrett, Almeida                        1     38k
Herculano, Alexandre                    2    121k
Macedo, Diogo de                        1    244k
Mascarenhas, Miguel                     1     59k
Pimentel, Alberto                       1     57k
Queirós, Eça de                         4    518k
Queirós, Eça de & Ortigão, Ramalho      1     67k
Rodrigues, Manuel Maria                 1     60k
total                                  30  2,506k
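The overview tables are aggregates over per-novel metadata. As a minimal sketch of how such overviews could be recomputed from metadata.csv, the Python code below counts texts and sums word counts per category. The column names it relies on (a `year` column, a `words` column with each novel's word count, and whichever grouping column you pass, such as `subgenre`) are hypothetical; check them against the actual header of metadata.csv before use.

```python
import csv
from collections import defaultdict


def load_metadata(path="metadata.csv"):
    """Read the metadata table into a list of dicts, one per novel.

    Assumes a comma-separated file with a header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def decade(year):
    """Map a publication year such as 1865 to a decade label such as '1860s'."""
    return f"{int(year) // 10 * 10}s"


def summarize(rows, key):
    """Count texts and sum word counts for each value of the column `key`.

    `words` is a hypothetical column holding each novel's word count.
    """
    texts = defaultdict(int)
    words = defaultdict(int)
    for row in rows:
        texts[row[key]] += 1
        words[row[key]] += int(row["words"])
    return {value: (texts[value], words[value]) for value in sorted(texts)}


def thousands(n):
    """Format a word count in thousands, e.g. 2506000 -> '2,506k'."""
    return f"{round(n / 1000):,}k"


def format_table(summary, header):
    """Render a tab-separated overview with a final total row."""
    lines = [f"{header}\ttexts\twords"]
    for value, (n_texts, n_words) in summary.items():
        lines.append(f"{value}\t{n_texts}\t{thousands(n_words)}")
    total_texts = sum(n for n, _ in summary.values())
    total_words = sum(w for _, w in summary.values())
    lines.append(f"total\t{total_texts}\t{thousands(total_words)}")
    return "\n".join(lines)
```

For example, after `rows = load_metadata()`, setting `row["decade"] = decade(row["year"])` for each row and calling `format_table(summarize(rows, "decade"), "decade")` would give a by-decade overview. The total is rounded once from the exact word sum, so the rounded per-row figures need not add up to it exactly.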

Sources

The texts have been compiled from the following sources:

Formats

  • tei: Encoded following the Guidelines of the Text Encoding Initiative and valid against the CLiGS schema (file names: identifier.xml)
  • txt_id: Plain text containing only the main text of the novels, without title pages, prefaces and other parts considered paratexts (file names: identifier.txt)
  • annotated: TEI files further annotated with FreeLing and WordNet
  • pdf: Reading versions generated from the TEI files
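The relation between the tei and txt_id formats can be illustrated with a short Python sketch that uses only the standard library. It assumes that paratexts such as title pages and prefaces are encoded in `<front>` or `<back>` and that the main text sits in `<text>/<body>`, and it simply collects the headings and paragraphs of the body. This is not the script used to produce the txt_id files; the functions and the example identifier are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Clark-notation prefix for the TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Block-level elements whose text is kept (an assumption for this sketch).
KEEP = {f"{TEI}head", f"{TEI}p"}


def body_text(root):
    """Return headings and paragraphs of <text>/<body>, one per line.

    <front> and <back> are never visited, so anything encoded there
    (e.g. title pages or prefaces) is left out. Paragraphs nested inside
    other kept elements would be repeated; a fuller version would handle that.
    """
    body = root.find(f"{TEI}text/{TEI}body")
    if body is None:
        return ""
    blocks = []
    for elem in body.iter():
        if elem.tag in KEEP:
            # Collapse whitespace introduced by XML indentation and line breaks.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                blocks.append(text)
    return "\n".join(blocks)


def txt_name(xml_path):
    """Map identifier.xml to identifier.txt."""
    return Path(xml_path).with_suffix(".txt").name


def tei_to_txt(xml_path, out_dir):
    """Write the body text of one TEI file to out_dir/identifier.txt."""
    root = ET.parse(xml_path).getroot()
    out_path = Path(out_dir) / txt_name(xml_path)
    out_path.write_text(body_text(root) + "\n", encoding="utf-8")
    return out_path
```

Reading the file with `ET.parse` rather than decoding it first lets the parser honour the XML declaration's encoding.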

Schema

  • The TEI schema for both the basic and the linguistically annotated TEI files corresponds to the general CLiGS schema, which is available in the CLiGS reference repository.
  • The metadata keywords used in the text classification section of the TEI header are controlled by an external TEI keywords file and a Schematron file, both stored in the keywords folder.
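To show what keyword-controlled metadata looks like in practice, the following Python sketch reads the `<term>` elements under `teiHeader/profileDesc/textClass/keywords` and compares their values with a set of allowed values. The `type` attribute values (such as `subgenre`) and the vocabulary passed in are hypothetical. The authoritative checks are the keywords file and the Schematron file in the keywords folder, which this sketch does not read.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Location of classification keywords inside a TEI document.
TERM_PATH = (
    f"{TEI}teiHeader/{TEI}profileDesc/{TEI}textClass/{TEI}keywords/{TEI}term"
)


def class_keywords(root):
    """Map each <term>'s @type to the list of its values.

    Terms without a @type attribute are collected under the key None.
    """
    keywords = {}
    for term in root.iterfind(TERM_PATH):
        value = " ".join("".join(term.itertext()).split())
        keywords.setdefault(term.get("type"), []).append(value)
    return keywords


def unknown_terms(keywords, allowed):
    """Return (type, value) pairs whose value is not in allowed[type].

    `allowed` is a hypothetical dict mapping a term type to a set of values;
    types missing from it are treated as having no permitted values.
    """
    return [
        (term_type, value)
        for term_type, values in keywords.items()
        for value in values
        if value not in allowed.get(term_type, set())
    ]


def check_file(path, allowed):
    """Parse a TEI file and report keyword values outside the vocabulary."""
    return unknown_terms(class_keywords(ET.parse(path).getroot()), allowed)
```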

Data Curation

  • In some texts, the spelling stays close to that of the historical editions of the novels. Where this is the case, a note on spelling is given in the source description.

Copyright and Citation

  • The authors' copyright on these texts has expired. This collection is published under a Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0).
  • Please provide a reference if you use this data in your teaching or research. The following is a citation suggestion: Collection of 19th Century Portuguese Novels (1840-1910), edited by Ulrike Henny-Krahmer. Würzburg: CLiGS, 2017. https://github.com/cligs/textbox/master/portuguese/romances19/.