GitHub - zhangjiajin/streamDM: Stream Data Mining Library for Spark Streaming

# streamDM for Spark Streaming

streamDM is new open-source software for mining big data streams using Spark Streaming, started at Huawei Noah's Ark Lab. streamDM is licensed under the Apache Software License v2.0.

## Big Data Stream Learning

Big data stream learning is more challenging than batch or offline learning, because the data distribution may change over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once, or must be summarized with a small memory footprint, so the learning algorithms must be very efficient.
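To make the single-pass, small-memory constraint concrete, here is a minimal sketch in plain Python. It is not streamDM or Spark code: the `RunningStats` class and `summarize` function are hypothetical names introduced only for illustration, and the technique shown is Welford's online algorithm for mean and variance, chosen as a simple example of a constant-memory summary.

```python
class RunningStats:
    """Single-pass mean/variance summary (Welford's online algorithm).

    Hypothetical illustration class, not part of streamDM.
    """

    def __init__(self):
        self.n = 0        # number of examples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Each example is folded into the summary once and then discarded,
        # so memory use stays constant regardless of stream length.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two examples.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(stream):
    """Consume an iterator exactly once and return its summary."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(stats.n, stats.mean, stats.variance)
```

The summary holds three numbers no matter how many examples pass through. A summary like this does not by itself react to a changing distribution; handling that would need something extra, such as windowing or decay.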

## Spark Streaming

Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible and programmable framework for massively distributed processing of datasets, which it represents as Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which are then processed by the Spark engine to generate the results.

Spark Streaming data is organized into DStreams, each represented internally as a sequence of RDDs.
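The micro-batch idea can be sketched without Spark. The plain-Python example below does not use the Spark Streaming API; `micro_batches` and `process_batch` are hypothetical helpers. It assumes events arrive ordered by timestamp and, as a simplification, skips intervals that contain no events.

```python
from itertools import groupby


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive time-window batches.

    Assumes events are ordered by timestamp. Each yielded list plays the role
    of one batch in the sequence that makes up the stream.
    """
    def window_of(event):
        return int(event[0] // batch_interval)

    for window, group in groupby(events, key=window_of):
        yield window, [value for _, value in group]


def process_batch(values):
    # Stand-in for the per-batch job that the processing engine would run.
    return sum(values)


events = [(0.1, 1), (0.4, 2), (1.2, 3), (1.9, 4), (3.5, 5)]
results = [
    (window, process_batch(batch))
    for window, batch in micro_batches(events, batch_interval=1.0)
]
print(results)
```

Here the stream is cut by a fixed time interval rather than by a count of records, which mirrors the description above of dividing incoming data into batches before processing.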

## Included Methods

In this first pre-release of StreamDM, we have implemented:

In the next releases we plan to add:

Random Forests
Frequent Itemset Miner: IncMine

## Going Further

For a quick introduction to running StreamDM, refer to the Getting Started document. The StreamDM Programming Guide presents a detailed view of StreamDM. The full API documentation can be consulted here.

## Mailing lists

### User support and questions mailing list:

streamdm-user@googlegroups.com

### Development related discussions:

streamdm-dev@googlegroups.com

