[SPARK-13255][SQL] Update vectorized reader to directly return ColumnarBatch instead of InternalRows. #11435

Closed (10 commits)

Commits on Mar 4, 2016

  1. [SPARK-13255][SQL] Update vectorized reader to directly return ColumnarBatch instead of InternalRows.
    
    Currently, the Parquet reader returns rows one at a time, which is bad for performance. This patch
    updates the reader to return ColumnarBatches directly. This is enabled only with whole-stage
    codegen, which is currently the only operator able to consume ColumnarBatches (instead
    of rows). The current implementation is a bit of a hack to get this working, and we should
    refactor these low-level interfaces further to make this work better.
    
    Results:
    TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
    ---------------------------------------------------------------------------------
    q55 (before)                             8897 / 9265         12.9          77.2
    q55                                      5486 / 5753         21.0          47.6
    nongli committed Mar 4, 2016
    59dec91
  2. Rebase fixes.

    nongli committed Mar 4, 2016
    058556c
  3. Fix partition columns.

    nongli committed Mar 4, 2016
    2330576
  4. Import order fixes

    nongli committed Mar 4, 2016
    42875ac
  5. Fix use after free issue.

    nongli committed Mar 4, 2016
    cab64e5
  6.
    f35394c
  7. CR

    nongli committed Mar 4, 2016
    3450313
  8. Fix batching.

    nongli committed Mar 4, 2016
    f5f1e2b
  9. Fix test for bucketed tables.

    nongli committed Mar 4, 2016
    ed79eee
  10. CR

    nongli committed Mar 4, 2016
    48102e3
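
The first commit message contrasts a reader that hands back rows one by one with a reader that hands back whole columnar batches. As a rough illustration of that difference only, here is a minimal Python sketch. It is not Spark's code or API: `ColumnBatch`, `read_rows`, `read_batches`, and the sample data are hypothetical names invented for this example, and the real reader works on Parquet files rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class ColumnBatch:
    """Hypothetical stand-in for a columnar batch: one list of values per column."""
    columns: Dict[str, List[int]]
    num_rows: int


def read_rows(data: Dict[str, List[int]]) -> Iterator[Tuple[int, ...]]:
    """Row-at-a-time reader: builds one tuple per row before handing it over."""
    names = list(data)
    num_rows = len(data[names[0]]) if names else 0
    for i in range(num_rows):
        yield tuple(data[name][i] for name in names)


def read_batches(data: Dict[str, List[int]], batch_size: int) -> Iterator[ColumnBatch]:
    """Batch reader: hands over slices of each column, batch_size rows at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    names = list(data)
    num_rows = len(data[names[0]]) if names else 0
    for start in range(0, num_rows, batch_size):
        end = min(start + batch_size, num_rows)
        yield ColumnBatch({name: data[name][start:end] for name in names}, end - start)


def sum_price_by_rows(data: Dict[str, List[int]]) -> int:
    """Consumer that touches every row tuple to read a single column."""
    price_idx = list(data).index("price")
    return sum(row[price_idx] for row in read_rows(data))


def sum_price_by_batches(data: Dict[str, List[int]], batch_size: int) -> int:
    """Consumer that reads only the needed column from each batch."""
    return sum(sum(batch.columns["price"]) for batch in read_batches(data, batch_size))


# Made-up sample data, for illustration only.
data = {"item_id": [1, 2, 3, 4, 5], "price": [10, 20, 30, 40, 50]}
```

Both consumers compute the same total (150 for the sample data). The batch version creates one object per batch instead of one per row and reads only the column it needs, which is the kind of per-row overhead the commit message describes avoiding.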