Implement read_csv method. #151
Comments
Hello, I am interested in this issue. Would you mind sharing your design for this method?
@yingmanwumen I've updated the description of this PR, please feel free to ask questions or share your opinions here.
@tushushu
I would be glad to know what you think. I'm still a beginner and don't know much about design, so thank you for your patience : )
@yingmanwumen My idea is to use the Rust csv crate to read the
Hello, I have read the source code and found several issues here:
/// Abstract List with generic type elements.
pub trait List<T>
where
    T: PartialEq + Clone,
    Self: Sized,
{
    ...
}
// ... Some other code

// Imports needed by the proof of concept below.
use csv::Reader;
use pyo3::exceptions::PyIOError;
use pyo3::prelude::*;
use std::fs::File;

// A simple proof of concept of my idea
fn do_read_csv(py: Python, mut reader: Reader<File>) -> PyResult<Vec<PyObject>> {
    // Try to create a `Vec<PyObject>`.
    // It may be complex to map a schema of arbitrary length onto Rust types,
    // e.g. mapping "{ "foo": "int", "bar": "string" }" onto `Integer64` and `StringList`.
    let ulist = PyModule::import(py, "ulist.core")?
        .call_method1("UltraFastList", (StringList::new(vec![]),))?;
    let mut vec_list: Vec<PyObject> = Vec::new();
    vec_list.push(ulist.into());
    for result in reader.records() {
        match result {
            Err(err) => return Err(PyIOError::new_err(err.to_string())),
            Ok(record) => {
                for field in record.iter() {
                    // Just a simple test of how to call the `append` method from Rust.
                    // I didn't find another way to call `append` directly in Rust,
                    // so if everything is done in Rust, things will be much more complicated.
                    vec_list[0].call_method1(py, "append", (field,))?;
                }
            }
        }
    }
    Ok(vec_list)
}
So, after thinking about it for quite a while, I think it's better to leave the I/O to Python to reduce complexity. There may be a simpler way to do this in Rust that I haven't found; if there is a better way, please let me know. I would also like to hear your opinion.
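One possible way to handle the schema-mapping concern raised in the code comments above is an enum with one variant per supported dtype, so a schema of any length becomes a plain vector of columns. The following is only a minimal sketch under assumptions: `Column`, `from_dtype`, `push_field` and `columns_from_schema` are hypothetical names, the dtype strings "int" and "string" are taken from the example schema in the comment, and plain `Vec`s stand in for ulist's list types.

```rust
/// One column whose element type is chosen at run time (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Column {
    Int(Vec<i64>),
    Str(Vec<String>),
}

impl Column {
    /// Create an empty column from a dtype name found in the schema.
    pub fn from_dtype(dtype: &str) -> Result<Column, String> {
        match dtype {
            "int" => Ok(Column::Int(Vec::new())),
            "string" => Ok(Column::Str(Vec::new())),
            other => Err(format!("unsupported dtype: {}", other)),
        }
    }

    /// Parse one raw CSV field and append it to this column.
    pub fn push_field(&mut self, field: &str) -> Result<(), String> {
        match self {
            Column::Int(values) => {
                let parsed = field
                    .trim()
                    .parse::<i64>()
                    .map_err(|e| format!("cannot parse {:?} as int: {}", field, e))?;
                values.push(parsed);
            }
            Column::Str(values) => values.push(field.to_string()),
        }
        Ok(())
    }
}

/// Build one empty column per (name, dtype) schema entry, preserving order.
pub fn columns_from_schema(schema: &[(&str, &str)]) -> Result<Vec<(String, Column)>, String> {
    let mut columns = Vec::with_capacity(schema.len());
    for (name, dtype) in schema {
        columns.push((name.to_string(), Column::from_dtype(dtype)?));
    }
    Ok(columns)
}
```

With this shape, calling `columns_from_schema(&[("foo", "int"), ("bar", "string")])` yields two empty columns, and each CSV record can be fed field by field into `push_field`. Whether this is simpler than doing the I/O in Python is still an open question.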
Another issue is that not all fields in
I am working on another branch which allows
@yingmanwumen Thanks very much for the effort and hard work. Could you look at this issue first to see if it can solve our problem? If not, then let's try a function like this:
struct AnotherStruct {
    a: Vec<StringList>,
    b: Vec<IntList>,
    c: Vec<BoolList>,
}
fn read_csv(...) -> AnotherStruct
And let Python merge these lists (vectors) together, which will be very fast. If this still does not work, then let's just implement it in pure Python. Thanks again and have a nice day~
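A rough sketch of what the struct-of-vectors idea above could look like follows. It is not the project's implementation. It assumes plain `Vec`s as stand-ins for `StringList`, `IntList` and `BoolList`, uses the hypothetical names `TypedColumns`, `Dtype` and `read_csv_text`, and splits naively on commas (no quoted fields), which a real version would leave to the csv crate.

```rust
/// Stand-ins for ulist's list types so the sketch compiles on its own (assumption).
type StringList = Vec<String>;
type IntList = Vec<i64>;
type BoolList = Vec<bool>;

/// Dtype of one CSV column, as given by the caller (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dtype {
    Str,
    Int,
    Bool,
}

/// Columns grouped by element type, mirroring `AnotherStruct` above.
#[derive(Debug, Default, PartialEq)]
pub struct TypedColumns {
    pub strings: Vec<StringList>,
    pub ints: Vec<IntList>,
    pub bools: Vec<BoolList>,
}

/// Parse CSV text with a header line into typed columns.
pub fn read_csv_text(text: &str, dtypes: &[Dtype]) -> Result<TypedColumns, String> {
    let mut out = TypedColumns::default();
    // slots[i] is the index of CSV column i inside its type group.
    let mut slots = Vec::with_capacity(dtypes.len());
    for dtype in dtypes {
        let slot = match dtype {
            Dtype::Str => { out.strings.push(Vec::new()); out.strings.len() - 1 }
            Dtype::Int => { out.ints.push(Vec::new()); out.ints.len() - 1 }
            Dtype::Bool => { out.bools.push(Vec::new()); out.bools.len() - 1 }
        };
        slots.push(slot);
    }
    // Skip the header line, then parse every non-empty record.
    for (line_no, line) in text.lines().enumerate().skip(1) {
        if line.is_empty() {
            continue;
        }
        let fields: Vec<&str> = line.split(',').collect();
        if fields.len() != dtypes.len() {
            return Err(format!(
                "line {}: expected {} fields, found {}",
                line_no + 1, dtypes.len(), fields.len()
            ));
        }
        for ((field, dtype), slot) in fields.iter().zip(dtypes).zip(&slots) {
            match dtype {
                Dtype::Str => out.strings[*slot].push(field.to_string()),
                Dtype::Int => {
                    let v = field.trim().parse::<i64>()
                        .map_err(|e| format!("line {}: {}", line_no + 1, e))?;
                    out.ints[*slot].push(v);
                }
                Dtype::Bool => {
                    let v = field.trim().parse::<bool>()
                        .map_err(|e| format!("line {}: {}", line_no + 1, e))?;
                    out.bools[*slot].push(v);
                }
            }
        }
    }
    Ok(out)
}
```

Since the caller already knows the dtype of every column, the Python side can restore the original column order by walking the schema and taking the next list from the matching group.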
That's awesome, let me know if there is any problem~
OK, I saw that ^_^
The branch to handle missing values has been merged into the main branch. You may need to merge the main branch into your branch first, and then fix some branch conflicts. @yingmanwumen
@yingmanwumen I wrote a half-done
In order to make the function efficient enough, we'd better use a Rust library to read the .csv file rather than Python. The first version of the read_csv function looks like this:
In the next version, the parameter schema is optional. When it is set to None, the read_csv function will automatically detect the name and dtype of each column in the .csv file.
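The auto-detection for a missing schema could be sketched as below. This is only an illustration: `InferredDtype`, `infer_dtype` and `infer_schema` are hypothetical names, the set of dtypes (bool, int, float, string) is an assumption, and the naive comma split ignores quoted fields. Each column gets the narrowest type that parses every value, falling back to string.

```rust
use std::str::FromStr;

/// Dtypes the sketch can detect (assumed set, not taken from ulist).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InferredDtype {
    Bool,
    Int,
    Float,
    Str,
}

/// True if every value parses as `T` after trimming whitespace.
fn all_parse<T: FromStr>(values: &[&str]) -> bool {
    values.iter().all(|v| v.trim().parse::<T>().is_ok())
}

/// Pick the narrowest dtype that fits every value; empty columns are strings.
pub fn infer_dtype(values: &[&str]) -> InferredDtype {
    if values.is_empty() {
        InferredDtype::Str
    } else if all_parse::<bool>(values) {
        InferredDtype::Bool
    } else if all_parse::<i64>(values) {
        InferredDtype::Int
    } else if all_parse::<f64>(values) {
        InferredDtype::Float
    } else {
        InferredDtype::Str
    }
}

/// Read column names from the header line and infer a dtype per column.
pub fn infer_schema(text: &str) -> Vec<(String, InferredDtype)> {
    let mut lines = text.lines();
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').collect(),
        None => return Vec::new(),
    };
    let rows: Vec<Vec<&str>> = lines
        .filter(|l| !l.is_empty())
        .map(|l| l.split(',').collect())
        .collect();
    header
        .iter()
        .enumerate()
        .map(|(i, name)| {
            // Gather column i from every row that has enough fields.
            let column: Vec<&str> = rows.iter().filter_map(|row| row.get(i).copied()).collect();
            (name.trim().to_string(), infer_dtype(&column))
        })
        .collect()
}
```

A real implementation would probably sample only the first rows of a large file and would need a rule for missing values, neither of which is covered here.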