-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arff deserialiser #4627
base: develop
Are you sure you want to change the base?
arff deserialiser #4627
Conversation
714905e
to
192281a
Compare
src/shogun/io/ARFFFile.cpp
Outdated
} | ||
for (auto iter = line.begin(); iter != line.end(); ++iter) | ||
{ | ||
if (iter->front() == '\'' || iter->front() == '\"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vigsterkr does stuff like this work independent of platform?
src/shogun/io/ARFFFile.cpp
Outdated
std::vector<std::string> elems; | ||
auto innner_string = | ||
m_current_line.substr(strlen(m_attribute_string)); | ||
split(innner_string, " ,\t\r\f\v", std::back_inserter(elems)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vigsterkr and this?
1667862
to
a0a778e
Compare
@karlnapf how does shogun handle missing data values? It is possible to have NaN values ("?" in arff files) ? |
Yes you can add it under the toy data in |
idk to be honest :D we might have to think/work a bit for that |
Nice! |
For unittest I could try and use strings? I can then feed the stream directly, rather than a file. Would just have to refactor the class a bit though by adding a constructor for streams rather than file names |
I think that’d be a nice feature anyways or? |
Yes, I need it for openml stuff! |
5ccd879
to
6b855f6
Compare
So datetime somewhat works, but might have to add strptime definition for the windows build? (see https://stackoverflow.com/a/321877). The conversion to the c++ convention of datetime format needs to be tested more too. I convert dates to unix timestamps btw, because it is the only representation that makes sense to me that can be stored in a numeric matrix |
@karlnapf I am not sure how to do the parsing of strings though, for that we would almost need a |
Great. Ill check it out |
Uh that will be tricky with parameter framework. Also So one thing might be to think about a matrix-like structure to handle non-primitives. I mean std::vector is fine with the param framework We have started using variant in the pipeline...although that causes all sort of headaches (needs an override clone/equals, but no idea how to make put/get work atm), but that might be fixed in any.h. Maybe a good point to start thinking about how to handle variant .... @vigsterkr ? |
583a724
to
101ef8d
Compare
@karlnapf @vigsterkr so the return type is now
The missing data can be ignored for now I think, and just be skipped by the parser, and then one day we can add imputers. |
also the error seems to be due to |
4174168
to
ada3794
Compare
class ARFFSerializer | ||
{ | ||
public: | ||
ARFFSerializer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlnapf I had some time so I wrote this (pretty bad) serialiser.. Would it make sense to add a visitor pattern to SGObjects? Or at least for features?
977b9f8
to
63015ec
Compare
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Simple ARFF deserialiser. See the spec here.
@karlnapf shall I add a ARFF file as an example and a unittest/meta example?
TODO: