digitalheir/family-names-in-the-netherlands

Family names in the Netherlands

This project turns the Meertens Dutch Family Name Database into structured, machine-readable data. The database contains about 320 000 last names that were recorded in a 2007 census as well as in a 1947 census.

Usage

A basic list is available in multiple formats:

Format
CSV
JSON
Fusion table
.lst (Alphabetic list of natural names)
.lst (names with frequency >= 5)
.lst (names with frequency < 5)

Top 50 last names in the Netherlands

no. name count in 2007
1 de Jong 83782
2 Jansen 73533
3 de Vries 71065
4 van den Berg 57377
5 van Dijk 56408
6 Bakker 55273
7 Janssen 54040
8 Visser 49525
9 Smit 42275
10 Meijer 38472
11 de Boer 38191
12 Mulder 36207
13 de Groot 36032
14 Bos 35402
15 Vos 30279
16 Peters 30106
17 Hendriks 29492
18 Dekker 27946
19 van Leeuwen 27819
20 Brouwer 25419
21 de Wit 24055
22 Dijkstra 23510
23 Smits 23205
24 de Graaf 21004
25 van der Meer 20591
26 Kok 20325
27 Jacobs 20148
28 van der Linden 20132
29 Vermeulen 20110
30 de Haan 20011
31 van den Heuvel 19899
32 van den Broek 18447
33 van der Veen 18366
34 de Bruin 17593
35 Schouten 17147
36 van Beek 16708
37 van der Heijden 16663
38 de Bruijn 16562
39 Willems 16508
40 van Vliet 16346
41 Maas 15620
42 Hoekstra 15613
43 Verhoeven 15525
44 Koster 15346
45 van Dam 15288
46 Prins 14894
47 Huisman 14682
48 Blom 14679
49 Peeters 14054
50 de Jonge 13989

Procedure

We scrape the Meertens website to generate a CSV file that contains, for each family name: the name itself, the number of times it was counted in 2007, and its lemma (the 'base' form of a name that has multiple variants; e.g. Jansen is the lemma for both Janßen and Jansen).
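To illustrate how the lemma ties variants together, here is a minimal sketch that groups names by their lemma. The function name group_by_lemma and the (name, lemma) tuple layout are assumptions made for this example; they are not taken from the project's code.

```python
from collections import defaultdict


def group_by_lemma(rows):
    """Group family names by lemma.

    `rows` is an iterable of (name, lemma) pairs; this layout is a
    hypothetical one chosen for the example, not the project's format.
    """
    groups = defaultdict(list)
    for name, lemma in rows:
        groups[lemma].append(name)
    return dict(groups)


# Sample pairs taken from the example rows in this README.
variants = group_by_lemma([
    ("Jansen", "Jansen"),
    ("Janßen", "Jansen"),
    ("Trompetter", "Trompetter"),
])
```

With the sample pairs above, both Jansen and Janßen end up in the list stored under the lemma Jansen.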

We then apply some formatting. Mainly, we add a column for the 'natural name', which moves the trailing prefix of a database name to the front, so that a name such as Veld, in 't becomes in 't Veld.
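The 'natural name' transformation can be sketched as follows. This is not the project's actual implementation; the function name natural_name is hypothetical, and splitting on the last comma is an assumption inferred from the example rows in this README (e.g. Oosten, zich noemende Heijkoop, van 't).

```python
def natural_name(db_name: str) -> str:
    """Move the part after the last ", " to the front of the name.

    Names without a comma are returned unchanged.
    """
    head, sep, prefix = db_name.rpartition(", ")
    if not sep:
        # No ", " present: the database name is already in natural order.
        return db_name
    return f"{prefix} {head}"
```

Splitting on the last comma rather than the first keeps names that contain a comma of their own intact, such as van 't Oosten, zich noemende Heijkoop.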

Note that the list still contains some strange cases, such as in 'tVeld (missing space) and van 0s (with the number 0 instead of the letter O), but we do not attempt to correct these.
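If you want to flag such cases yourself, a rough heuristic might look like the sketch below. The function name looks_suspicious and both rules are assumptions made for illustration; they only cover the two kinds of oddities mentioned above and may miss others or flag legitimate names.

```python
import re


def looks_suspicious(name: str) -> bool:
    """Heuristically flag names that contain a digit or a glued 't."""
    # A digit inside a name, as in "van 0s".
    has_digit = re.search(r"\d", name) is not None
    # 't directly followed by a non-space character, as in "in 'tVeld".
    glued_t = re.search(r"'t(?=\S)", name) is not None
    return has_digit or glued_t
```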

Example

Below are some (a-)typical examples of rows you'd find in the family_names_in_the_netherlands_with_natural_name.csv file.

natural name | meertens db name | href | count in 2007 | lemma
Jansen | Jansen | link | 73.533 | Jansen
Janßen | Janßen | link | < 5 | Jansen
Trompeter | Trompeter | link | 0 | Trompeter
Trompetter | Trompetter | link | 457 | Trompetter
Trompper | Trompper | link | 37 | Trompper
Trompslager | Trompslager | link | 0 | Trompslager
Trompé | Trompé | link | 6 | Trompe (é)
Van 't Veld | Veld, Van 't | link | < 5 | Veld, van 't
van 't Veld | Veld, van 't | link | 431 | Veld, van 't
van 't Veldt | Veldt, van 't | link | 37 | Veldt, van 't
van 't Velt | Velt, van 't | link | 17 | Velt, van 't
van 't Oosten, zich noemende Heijkoop | Oosten, zich noemende Heijkoop, van 't | link | 6 |
Prinses der Nederlan Hare Koninklijke Hoogheid Máxima | Hare Koninklijke Hoogheid Máxima, Prinses der Nederlan | link | 0 | Hare Koninklijke Hoogheid Máxima, Prinses der Nederlanden, Prinses van Oranje-Nassau, Mevrouw van Amsberg
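
The count column in these example rows mixes plain integers, a dot as thousands separator (73.533) and the marker < 5. A minimal sketch for turning such values into numbers is shown below. It assumes the values in the CSV file look like the ones in the example rows; the function name parse_count and the choice to return None for < 5 are illustrative, not part of the project.

```python
from typing import Optional


def parse_count(raw: str) -> Optional[int]:
    """Parse a count such as "73.533", "457", "0" or "< 5".

    Returns None for "< 5", where no exact count is given.
    """
    text = raw.strip()
    if text.startswith("<"):
        # "< 5": fewer than five bearers, exact number unknown.
        return None
    # Drop the dot used as thousands separator, e.g. "73.533" -> "73533".
    return int(text.replace(".", ""))
```

Returning None keeps "fewer than five" distinct from a count of 0, which also occurs in the data.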

License

Code is available under the MIT License.

Data is available under the CC0 License.

More information

Source URL
Dutch name (Wikipedia) https://en.wikipedia.org/wiki/Dutch_name
Meertens Dutch Family Name Database http://www.meertens.knaw.nl/nfb/

Contact

Maarten Trompper (maartentrompper@freedom.nl)
