-
Notifications
You must be signed in to change notification settings - Fork 164
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Feature: added parquet sampling (#1070)
* parquet sampling function developed in data_utils.py; Added sample_nrows argument in ParquetData class; Added test_len_sampled_data in test_parquet_data.py * resolved conflict with dev, added more tests * fixed sample empty column bug * fixed comments in data_utils.py, including: 1. added type of return in sample_parquet function; 2. changed variable names in sample_parquet function to more descriptive names (select -> sample_index, out -> sample_df); 3. created convert_unicode_col_to_utf8 function to reduce repeating code in sample_parquet and read_parquet_df functions * 1. renamed variable names in covert_unicode_col_to_utf8 function (data_utils.py) to be more descriptive (types -> input_column_types, col -> iter_column), other part unchanged 2. test_parquet_data.py, move import statement to the top of file 3. test_parquet_data.py, merged all tests about parquet sample feature to their original tests * checked the datatype and input file path before and after reload with sampling option enabled * test * delete test edit in avro_data.py, updated fastavro version in requirment.txt * remove fastavro.reader type * change fastavro version back to original * 1. sample_parquet function description 2. test_len_data method keep one sample length test 3. remove sampling test in test_specifying_data_type 4. remove sampling test in test_reload_data
- Loading branch information
Showing
4 changed files
with
138 additions
and
31 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters