PGN Helper is a collection of Python scripts for downloading and processing PGN files.
With the fetch_files_from_pgnmentor.py script you can bulk-download all the PGN files from one of the three categories into which the website pgnmentor.com organizes them: by player, by opening, or by event.
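As a rough illustration of the link-collection step behind a bulk download, here is a minimal sketch. It is not the code of fetch_files_from_pgnmentor.py: it uses the standard library's html.parser instead of beautifulsoup4 so that it runs without extra packages, and the sample HTML, the base URL and the assumption that downloadable files end in .pgn or .zip are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links ending in a given suffix."""

    def __init__(self, base_url, suffixes=(".pgn", ".zip")):
        super().__init__()
        self.base_url = base_url
        self.suffixes = suffixes  # assumed file extensions, not confirmed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.suffixes):
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicate links
                self.links.append(url)


def collect_links(html, base_url):
    """Return the download links found in an HTML page, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment, for illustration only.
sample = '<a href="players/Adams.zip">Adams</a> <a href="about.html">About</a>'
links = collect_links(sample, "https://www.pgnmentor.com/files.html")
```

Each collected URL could then be fetched with urllib.request.urlretrieve or a similar call.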
Given a folder ('pgn') of PGN files, the create_opening_book.py script creates a JSON file representing the tree of the first n moves played in every game of each PGN file by players whose Elo rating is greater than x, where n and x are parameters of the script.
For instance, here is what a JSON opening book generated by the script with 2500+ players and 2 moves looks like (# is the number of times the move has been played):
{
  "e4": {
    "#": 478088,
    "e5": {
      "#": 118698,
      "Nf3": {
        "#": 74146,
        "Nf6": {
          "#": 4778
        },
        "Nc6": {
          "#": 31850
        },
        "d6": {
          "#": 445
        }
      },
      ...
    }
    ...
  }
  ...
}
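To make the structure concrete, here is a minimal sketch of how such a tree could be built and queried. It is not the code of create_opening_book.py: the function names add_game and replies are hypothetical, games are assumed to be already parsed into lists of SAN moves, and n is assumed to count full moves (two plies each), as the example above suggests.

```python
def add_game(book, moves, max_moves):
    """Add the first max_moves full moves of one game to the book."""
    node = book
    # One full move is two plies (White's move and Black's reply).
    for san in moves[: 2 * max_moves]:
        child = node.setdefault(san, {"#": 0})
        child["#"] += 1  # count how often this move was played here
        node = child
    return book


def replies(book, line):
    """Return (move, count) pairs played after `line`, most frequent first."""
    node = book
    for san in line:
        node = node[san]
    pairs = [(move, child["#"]) for move, child in node.items() if move != "#"]
    return sorted(pairs, key=lambda pair: -pair[1])


book = {}
add_game(book, ["e4", "e5", "Nf3", "Nc6"], max_moves=2)
add_game(book, ["e4", "e5", "Nf3", "Nf6"], max_moves=2)
add_game(book, ["e4", "c5", "Nf3", "d6"], max_moves=2)

print(replies(book, ["e4"]))
```

Because every node is a plain dictionary, the result can be written out directly with json.dump.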
Remember that you can, and should, minify the JSON file to save some space!
I also made a few opening books, each built from 2500+ players but with a different number of moves, so that you can download them directly.
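One possible way to minify is with Python's standard json module; the sketch below assumes nothing about the script itself, and the helper name minify_json is hypothetical.

```python
import json


def minify_json(text):
    """Re-serialize JSON text without any optional whitespace."""
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    return json.dumps(json.loads(text), separators=(",", ":"))


pretty = """
{
  "e4": {
    "#": 1,
    "e5": {
      "#": 1
    }
  }
}
"""
compact = minify_json(pretty)
print(compact)
```

For a file on disk, read its contents, pass them through minify_json, and write the result back.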
The only dependency is beautifulsoup4: pip install beautifulsoup4