Add integration test for scan rows with selection #2158
Changes from 7 commits
b701d22
af8b54f
333288d
f277b24
4b27280
7153bee
25cb93d
881752c
3f991d4
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,7 @@ use crate::arrow::record_reader::{ | |
buffer::{BufferQueue, ScalarBuffer, ValuesBuffer}, | ||
definition_levels::{DefinitionLevelBuffer, DefinitionLevelBufferDecoder}, | ||
}; | ||
use crate::column::page::PageIterator; | ||
use crate::column::{ | ||
page::PageReader, | ||
reader::{ | ||
|
@@ -184,11 +185,24 @@ where | |
/// # Returns | ||
/// | ||
/// Number of records skipped | ||
pub fn skip_records(&mut self, num_records: usize) -> Result<usize> { | ||
pub fn skip_records( | ||
&mut self, | ||
num_records: usize, | ||
pages: &mut dyn PageIterator, | ||
Ted-Jiang marked this conversation as resolved.
|
||
) -> Result<usize> { | ||
// First need to clear the buffer | ||
let end_of_column = match self.column_reader.as_mut() { | ||
Some(reader) => !reader.has_next()?, | ||
None => return Ok(0), | ||
None => { | ||
// If we skip records before all read operation | ||
// we need set `column_reader` by `set_page_reader` | ||
if let Some(page_reader) = pages.next() { | ||
Review comment: This fixes skipping before any read operation; `column_reader` needs to be set first. |
||
self.set_page_reader(page_reader?)?; | ||
false | ||
Review comment: This is wrong, as it will now only mark `end_of_column` when it reaches the end of the file, instead of the end of a column chunk within a row group. This will break record delimiting for repeated fields.
Reply: @tustvold I moved it out to
I think in this situation, only skipping the first page without reading any records, the |
||
} else { | ||
return Ok(0); | ||
} | ||
} | ||
}; | ||
|
||
let (buffered_records, buffered_values) = | ||
|
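The objection about `end_of_column` rests on how records are delimited in a repeated field: a repetition level of 0 starts a new record, so the last record in a buffer is only known to be complete once the end of the column chunk has been signalled. The following is a minimal sketch of that rule. `count_complete_records` is a hypothetical helper written for illustration, not part of the parquet crate, and it assumes the buffer begins at a record boundary.

```rust
/// Hypothetical helper, not part of the parquet crate: counts the records
/// known to be complete in a buffer of repetition levels.
/// Assumes the buffer starts at a record boundary (first level is 0).
fn count_complete_records(rep_levels: &[i16], end_of_column: bool) -> usize {
    // Every repetition level of 0 marks the start of a new record.
    let starts = rep_levels.iter().filter(|&&level| level == 0).count();
    if end_of_column {
        // Nothing more can follow, so the last record is complete too.
        starts
    } else {
        // The last record may continue in the next page of this column chunk.
        starts.saturating_sub(1)
    }
}

fn main() {
    // Three records: [a, b], [c], [d, e, f]
    let rep_levels: [i16; 6] = [0, 1, 0, 0, 1, 1];
    println!("{}", count_complete_records(&rep_levels, false));
    println!("{}", count_complete_records(&rep_levels, true));
}
```

If `end_of_column` were only set at the end of the file, the final record of every earlier column chunk would stay in the "possibly incomplete" state, which is the breakage the review comment describes.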
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -299,7 +299,7 @@ where | |
let mut remaining = num_records; | ||
while remaining != 0 { | ||
if self.num_buffered_values == self.num_decoded_values { | ||
let metadata = match self.page_reader.peek_next_page()? { | ||
let mut metadata = match self.page_reader.peek_next_page()? { | ||
None => return Ok(num_records - remaining), | ||
Some(metadata) => metadata, | ||
}; | ||
|
@@ -312,13 +312,20 @@ where | |
|
||
// If page has less rows than the remaining records to | ||
// be skipped, skip entire page | ||
if metadata.num_rows < remaining { | ||
while metadata.num_rows < remaining { | ||
Review comment: Why is this necessary? There is already an outer while loop.
Reply: Because I first added the code below
if we still use
Reply: This while loop should result in the same behaviour as the previous
Reply: Oh... it's a useless loop. |
||
self.page_reader.skip_next_page()?; | ||
remaining -= metadata.num_rows; | ||
continue; | ||
metadata = match self.page_reader.peek_next_page()? { | ||
None => return Ok(num_records - remaining), | ||
Some(metadata) => metadata, | ||
}; | ||
} | ||
// because self.num_buffered_values == self.num_decoded_values means | ||
// we need reads a new page and set up the decoders for levels | ||
self.read_new_page()?; | ||
Review comment: Perhaps we could check the return value of this, and short-circuit if it returns false? |
||
} | ||
|
||
// start skip values in page level | ||
let to_read = remaining | ||
.min((self.num_buffered_values - self.num_decoded_values) as usize); | ||
|
||
|
Review comment: Fix wrong logic: the `remaining` records need to be read.
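The two points raised on the skip loop can be seen in a small stand-alone model: `if ... continue` already covers several whole-page skips because the outer `while` re-peeks the next page, and the page load can short-circuit when it reports that nothing was read. This is only a sketch under simplifying assumptions: `ColumnModel` and its fields are hypothetical, the column is flat (one value per record), and errors are left out, so it does not mirror the crate's actual reader.

```rust
use std::collections::VecDeque;

/// Hypothetical model of a reader over a flat (non-repeated) column.
/// Each queue entry is the row count of one unread page.
struct ColumnModel {
    pages: VecDeque<usize>,
    /// Rows of the current page that are loaded but not yet consumed.
    buffered: usize,
}

impl ColumnModel {
    /// Loads the next page; returns false when no page is left.
    fn read_new_page(&mut self) -> bool {
        match self.pages.pop_front() {
            Some(rows) => {
                self.buffered = rows;
                true
            }
            None => false,
        }
    }

    /// Skips up to `num_records` records and returns how many were skipped.
    fn skip_records(&mut self, num_records: usize) -> usize {
        let mut remaining = num_records;
        while remaining != 0 {
            if self.buffered == 0 {
                // Peek at the next page; stop if the column is exhausted.
                let rows = match self.pages.front() {
                    None => return num_records - remaining,
                    Some(&rows) => rows,
                };
                // A page with fewer rows than remain is dropped whole.
                // `continue` re-enters the outer loop, which peeks again,
                // so no inner `while` is needed.
                if rows < remaining {
                    self.pages.pop_front();
                    remaining -= rows;
                    continue;
                }
                // Short-circuit if no page could be loaded.
                if !self.read_new_page() {
                    return num_records - remaining;
                }
            }
            // Skip within the currently loaded page.
            let to_skip = remaining.min(self.buffered);
            self.buffered -= to_skip;
            remaining -= to_skip;
        }
        num_records - remaining
    }
}

fn main() {
    let mut column = ColumnModel {
        pages: VecDeque::from(vec![3, 4, 5]),
        buffered: 0,
    };
    let skipped = column.skip_records(8);
    println!("skipped {skipped}, {} rows left in the current page", column.buffered);
}
```

In this model the first two pages (3 and 4 rows) are skipped whole and the last record is skipped inside the third page, all through the outer loop alone.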