Calling variants in aligned nucleotide sequences is a critical component of genotyping experiments. Many tools exist for this purpose, but the outputs they yield are highly suspect because there is no established baseline against which those results can be compared and interpreted. We propose to establish such a baseline for commonly used alignment and variant-calling tools, and then to design an automated framework that lets researchers actively and visually compare and validate the results obtained from those tools. The framework will provide a plug-and-play environment for selecting benchmarked tools so that researchers can confidently assess the effect of tool choice on their results.