This is a very simple implementation of the initial filtering solution I described on the VNDB discussion board. It involves three parts, though the scripts do not follow them exactly because of technical limitations:
- Compare the VNDB release `extlink` (Steam) with the CnGal `SteamId`, and pick out the CnGal entries that have no matching VNDB release (either the Steam release is missing on VNDB or the game was never released on Steam)
- Compare the VNDB `alttitle` with the CnGal `name`, and again pick out the unmatched CnGal entries
- Compare release dates (!). This could be wrong because of a bug on the CnGal side, but I expect it to leave you with significantly fewer VNs to check
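The first two checks above amount to a sequential set-membership filter: drop every CnGal entry whose Steam ID appears on VNDB, then drop every remaining entry whose name equals a VNDB `alttitle`. The sketch below assumes both datasets are already loaded as lists of dicts. The CnGal keys `SteamId` and `name` follow the names above, while the flattened VNDB keys `steam_id` and `alttitle` and the function `unmatched_cngal` are hypothetical and need not match what the scripts actually produce. The date check is left out because, as noted, it is unreliable.

```python
from __future__ import annotations


def unmatched_cngal(cngal: list[dict], vndb: list[dict]) -> list[dict]:
    """Return CnGal entries matched by neither Steam ID nor alttitle.

    Hypothetical input shapes:
      cngal: [{"name": str, "SteamId": str | int | None}, ...]
      vndb:  [{"steam_id": str | int | None, "alttitle": str | None}, ...]
    """
    # Normalise Steam IDs to strings so that 100 and "100" compare equal
    # (assumption: one side may store app IDs as ints, the other as strings).
    steam_ids = {str(r["steam_id"]) for r in vndb if r.get("steam_id")}
    # Collect non-empty, whitespace-trimmed VNDB alternative titles.
    alttitles = {(r.get("alttitle") or "").strip() for r in vndb} - {""}

    leftover = []
    for entry in cngal:
        # Check 1: a VNDB release already links to the same Steam app.
        sid = str(entry.get("SteamId") or "")
        if sid and sid in steam_ids:
            continue
        # Check 2: the CnGal name exactly equals some VNDB alttitle.
        name = (entry.get("name") or "").strip()
        if name and name in alttitles:
            continue
        # Neither check matched: this entry still needs manual review.
        leftover.append(entry)
    return leftover
```

Exact title equality is deliberately strict here; near-misses such as punctuation or spacing differences fall through to the leftover list, which is where a fuzzy comparison would take over.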
- `zh-rel-on-vndb.py`: filter the zh-Hans & zh-Hant releases on VNDB whose parent VN's original language is Chinese.
- `cngal-data-format.py`: convert the exported CnGal entries to the same format as the VNDB data. Requires the JSON exported from the CnGal data page.
- `diff-cngal-vndb.py`: compare the CnGal data with the Chinese VNs already on VNDB, and split the results into groups for later review.
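For readers wondering how the VNDB side of the data can be fetched, here is a sketch of a request body for VNDB's HTTP API ("Kana") that selects the releases `zh-rel-on-vndb.py` is described as filtering. It is not taken from the script. The filter names (`lang`, `vn`, `olang`, `extlink`), the field list, the page size of 100 and the endpoint URL reflect my reading of the Kana documentation and should be verified there. Treating `zh`, `zh-Hans` and `zh-Hant` all as "Chinese" original languages is also an assumption. The `steam_only` switch loosely mirrors the "only Steam releases" step below but is a hypothetical parameter, not the script's `-s` flag.

```python
import json
import urllib.request

# VNDB Kana API endpoint for releases (verify against the current docs).
KANA_RELEASE_URL = "https://api.vndb.org/kana/release"


def build_release_query(page: int = 1, steam_only: bool = False) -> dict:
    """Build a request body for Chinese releases of Chinese-original VNs."""
    filters = [
        "and",
        # The release itself must be in Simplified or Traditional Chinese.
        ["or", ["lang", "=", "zh-Hans"], ["lang", "=", "zh-Hant"]],
        # The parent VN's original language must be some form of Chinese
        # (assumption: any of these three codes may be used as olang).
        ["vn", "=", ["or",
                     ["olang", "=", "zh"],
                     ["olang", "=", "zh-Hans"],
                     ["olang", "=", "zh-Hant"]]],
    ]
    if steam_only:
        # Keep only releases that carry a Steam external link.
        filters.append(["extlink", "=", "steam"])
    return {
        "filters": filters,
        "fields": "id, title, alttitle, released, extlinks.name, extlinks.id",
        "results": 100,  # page size; 100 is the maximum per the docs as I read them
        "page": page,
    }


def fetch_page(query: dict) -> dict:
    """POST one query to the API and return the decoded JSON (needs network)."""
    req = urllib.request.Request(
        KANA_RELEASE_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To collect everything, call `fetch_page(build_release_query(page=n))` for increasing `n` while the response's `more` field (again per the docs, as I understand them) is true.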
Install Python & GNU Make, clone the repo and simply run `make`. Everything should be done now; just check `output` for the results.

To clean up the data and start over, run `make clean`.
Alternatively, run each step manually:

```shell
pip install -r requirements.txt
# Format CnGal data
python cngal-data-format.py
# Get VNDB data
# Get only Steam releases
python zh-rel-on-vndb.py -p 7 -s 1
# Get all zh-Hans & zh-Hant releases
python zh-rel-on-vndb.py -p 14 -s 0
# Diff CnGal & VNDB data
# Perform a fuzzy comparison
python diff-cngal-vndb.py -m 75 -n 50
```
- `cngal-releas-*`: formatted CnGal data
- `vndb-release-*`: formatted VNDB data
- `miss-*`: CnGal entries missing from VNDB; add these first
- `fuzzy-*`: CnGal entries possibly missing from VNDB; check these next
- `match-*`: CnGal entries already on VNDB; verify these last
- Add glob support in `cngal-data-format.py`
- Make the Steam filter optional in `zh-rel-on-vndb.py` for better fuzzy finding
- Add Makefile
- Use the VNDB database dump for more complete matching (or update filters wisely)
- Sort fuzzy output by similarity in descending order
- Make metadata more informative
- Support more metadata like producers, staff & characters
Data from CnGal and VNDB is subject to their respective licenses; see each project's website or source code repository for details. The scripts included in this repo are licensed under MIT.