Skip to content

Incremental classification and regression tree learning algorithm.

License

Notifications You must be signed in to change notification settings

lucentcosmos/HoeffdingTree

Repository files navigation

HoeffdingTree

Implementation of algorithms for incremental classification [1,2] and regression [3,6] tree learning from time-changing data streams. See [8] for detailed description of the main ideas behind these algorithms.

As of this writing this is still work in progress. The plan is to integrate this into QMiner soon. (Note that this code requires GLib.)

Note: If you'll try to build this, note that GLib uses v0.10 version of libuv; so run git clone -b v0.10 https://github.com/joyent/libuv.git in case you need to build it

Note: We started integrating this code into QMiner.

Note: I will no longer maintain this repository. As of now the learner will be availble through QMiner. I might clean up the code so others can use it, but I believe it will be much, much nicer when exposed through QMiner Javascript API. It will be easy to use, assume zero C++ knowledge, require no recompilation, etc. See QMiner documentation for details.

Note: Check out incremental classification and regression tree learning in QMiner here. User code is in src/ht.js. (This is a working version on my fork. For the stable one go here.)

Simple usage example

Configuration file describes data stream. Below is a simple config example.

dataFormat: (status,age,sex,survived)
status: discrete(first,second,third,crew)
age: discrete(adult,child)
sex: discrete(male,female)
survived: discrete(yes,no)

Future version of the HoeffdingTree --- to be available in QMiner --- will accept JSON configuration.

{
	"dataFormat": ["status", "sex", "age", "survived"],
	"sex": {
		"type": "discrete",
		"values": ["male", "female"]
	},
	"status": {
		"type": "discrete",
		"values": ["first", "second", "third", "crew"]
	},
	"age": {
		"type": "discete",
		"values": ["child", "adult"]
	},
	"survived": {
		"type": "discrete",
		"values": ["yes", "no"]
	}
}

The goal is to expose it through QMiner Javascript API. (This will happen the next few days.)

Simple usage example.

// ... 
int main(int argc, char** argv) {
	// create environment
	Env = TEnv(argc, argv, TNotify::StdNotify);
	// pass some parameters 
	const bool ConceptDriftP = Env.IsArgStr("drift"); // use version that handles concept-drift? 
	const double SplitConfidence = Env.GetIfArgPrefixFlt("-splitConfidence:", 1e-6, "Split confidence"); // 1e-6 
	const double TieBreaking = Env.GetIfArgPrefixFlt("-tieBreaking:", 0.01, "Tie breaking"); // 1e-2 
	const TStr ConfigFNm = Env.GetIfArgPrefixStr("-config:", "titanic.config", "Config file");
	const TStr AttrHeuristic= Env.GetIfArgPrefixStr("-attrEval:", "InfoGain", "Attribute evaluation heuristic");
	const int GracePeriod = Env.GetIfArgPrefixInt("-gracePeriod:", 300, "Grace period"); // 3e2 
	const int DriftCheck = Env.GetIfArgPrefixInt("-driftCheck:", 10000, "Drift check"); // 1e4 
	const int WindowSize = Env.GetIfArgPrefixInt("-windowSize:", 100000, "Window size"); // 1e5 
	
	// usage example 
	PHoeffdingTree ht = THoeffdingTree::New("docs/" + ConfigFNm, GracePeriod, SplitConfidence, TieBreaking,
		DriftCheck, WindowSize);
	ht->SetAdaptive(ConceptDriftP);
	ProcessData("data/titanic-220M.dat", ht);
	ht->Export("exports/titanic-"+TInt(ht->ExportN).GetStr()+".gv", TExportType::DOT);
	return 0;
}
// process data line-by-line 
void ProcessData(const TStr& FileNm, PHoeffdingTree HoeffdingTree) {
	Assert(TFile::Exists(FileNm));
	TFIn FIn(FileNm);
	TStr Line;
	while (FIn.GetNextLn(Line)) { HoeffdingTree->Process(Line, ','); }
}

References

About

Incremental classification and regression tree learning algorithm.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published