fake-data-generator

Automatically exported from code.google.com/p/fake-data-generator

Testing data mining algorithms is hard. If all you have are real-world data sets, then you don't actually know what patterns they contain. If you did, you probably wouldn't be using a data mining algorithm.

This toolset is designed to create data sets with known properties and relations of varying complexity. It produces a full listing of the model it used to generate each data set, so the model that a machine learning algorithm learns from the data can be compared against the true generating model. Varying amounts of noise and bias can be introduced to simulate the imperfections of real-world data.
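
The idea can be illustrated with a minimal Python sketch. This is not the toolset's actual interface: `generate_dataset` and `fit_slope_1d` are hypothetical names, and the linear relation, Gaussian noise, and constant bias offset are assumptions chosen only to keep the example short. It generates data from a known model, returns that model alongside the data, and then checks how closely a simple least-squares fit recovers it.

```python
import random


def generate_dataset(n_rows, weights, intercept, noise_sd=0.0, bias=0.0, seed=0):
    """Return (model, rows) where rows follow a known linear relation.

    Hypothetical helper for illustration; not part of fake-data-generator.
    """
    rng = random.Random(seed)
    # The "model listing": everything needed to reproduce the data.
    model = {"weights": list(weights), "intercept": intercept,
             "noise_sd": noise_sd, "bias": bias, "seed": seed}
    rows = []
    for _ in range(n_rows):
        xs = [rng.uniform(-1.0, 1.0) for _ in weights]
        y = intercept + sum(w * x for w, x in zip(weights, xs))
        # Noise is a random per-row error; bias is a constant systematic offset.
        y += rng.gauss(0.0, noise_sd) + bias
        rows.append((xs, y))
    return model, rows


def fit_slope_1d(rows):
    """Ordinary least-squares slope and intercept for single-feature rows."""
    n = len(rows)
    mean_x = sum(xs[0] for xs, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((xs[0] - mean_x) ** 2 for xs, _ in rows)
    sxy = sum((xs[0] - mean_x) * (y - mean_y) for xs, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


model, rows = generate_dataset(500, weights=[2.0], intercept=1.0,
                               noise_sd=0.1, bias=0.5)
slope, fitted_intercept = fit_slope_1d(rows)
print("true slope:", model["weights"][0], "recovered:", round(slope, 3))
print("true intercept:", model["intercept"], "recovered:", round(fitted_intercept, 3))
```

Because the generating model is known, the effect of each imperfection is visible directly: in this sketch the noise only scatters the points around the line, while the bias shifts the recovered intercept away from the true value by roughly the bias amount.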

License

GNU GPL v2