A set of scripts for generating various linguistic datasets from a Wikipedia dump. Built for testing nlp-compromise and wtf_wikipedia, but helpful for any sort of analysis. MIT licensed.