GSoC_2018_project_arrow
Now that more and more data science projects use Apache Arrow as their in-memory backend,
or at least support exporting their data into Arrow buffers (see, for example, SPARK-13534), it would be great if some of Shogun's CFeatures
classes could use an Arrow buffer as their memory backend.
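The core idea is that a feature object should read from a buffer it does not own, instead of copying data into its own storage. The sketch below illustrates this under stated assumptions: `ColumnarFeatureView` is a hypothetical name, not an existing Shogun or Arrow class, and the columns are plain `double` buffers standing in for Arrow arrays (one buffer per feature, all of equal length).

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, non-owning view over columnar feature data: one contiguous
// buffer of doubles per feature (column), all with the same length.
// The view never copies or frees the buffers; whoever owns them (for
// example an Arrow array) must keep them alive while the view is in use.
class ColumnarFeatureView {
public:
    ColumnarFeatureView(std::vector<const double*> columns, std::size_t num_vectors)
        : columns_(std::move(columns)), num_vectors_(num_vectors) {}

    std::size_t num_features() const { return columns_.size(); }
    std::size_t num_vectors() const { return num_vectors_; }

    // Element access reads straight from the external buffer (zero-copy).
    double at(std::size_t feature, std::size_t vec) const {
        if (feature >= columns_.size() || vec >= num_vectors_)
            throw std::out_of_range("ColumnarFeatureView::at");
        return columns_[feature][vec];
    }

    // Materialising one feature vector (a row across all columns) does copy,
    // because its values live in separate column buffers.
    std::vector<double> feature_vector(std::size_t vec) const {
        std::vector<double> out;
        out.reserve(columns_.size());
        for (std::size_t f = 0; f < columns_.size(); ++f)
            out.push_back(at(f, vec));
        return out;
    }

private:
    std::vector<const double*> columns_;  // borrowed, not owned
    std::size_t num_vectors_;
};
```

Because the view only stores pointers, a change made through the owner of a buffer is immediately visible through the view. The trade-off is lifetime management: the view is only valid as long as the external buffers are.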
Difficulty: Medium.
You need to know:
- C++
- basic software engineering
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized, language-independent columnar memory format for flat and hierarchical data. It provides computational libraries as well as zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
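To give a feel for the columnar format: in Arrow, a primitive column such as float64 is stored as a contiguous values buffer plus an optional validity bitmap, where bit i (least-significant-bit order) is 1 if slot i holds a valid value. The struct below is a simplified, hypothetical view written for illustration only; it is not the Arrow C++ API, which exposes this through classes such as `arrow::DoubleArray`.

```cpp
#include <cstdint>

// Simplified, hypothetical view of one Arrow-style float64 column:
// a contiguous values buffer plus an optional validity bitmap.
struct Float64ColumnView {
    std::int64_t length;           // number of slots in the column
    const std::uint8_t* validity;  // validity bitmap; nullptr means "no nulls"
    const double* values;          // `length` contiguous values

    // Bit i of the bitmap (least-significant-bit order) is 1 if slot i is valid.
    bool is_valid(std::int64_t i) const {
        return validity == nullptr || ((validity[i / 8] >> (i % 8)) & 1) != 0;
    }

    // The content of a null slot is unspecified, so null slots are skipped.
    double sum_valid() const {
        double total = 0.0;
        for (std::int64_t i = 0; i < length; ++i)
            if (is_valid(i)) total += values[i];
        return total;
    }
};
```

For example, a bitmap byte of `0x0B` (binary `00001011`) marks slots 0, 1 and 3 as valid and slot 2 as null. Since the values are contiguous, a numeric kernel can read them in place without any conversion step.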
Using Arrow as the memory backend of CFeatures would, for example, let us work directly on a pandas DataFrame via pyarrow.
In the long run, as Arrow supports more and more languages, we could gradually get rid of some of the SWIG-based typemaps,
which would significantly reduce the memory footprint and improve performance.
Start by checking out the prototype in the feature/arrow branch.