
Use stats to guide query parallelization #49

Open
jtaylor-sfdc opened this issue Feb 14, 2013 · 4 comments

@jtaylor-sfdc
Contributor

We're currently not using stats (beyond a table-wide min key/max key cached per client connection) to guide parallelization. If a query targets just a few regions, we can't evenly divide the work among threads, because we don't know the data distribution. A separate issue targets gathering and maintaining the stats, while this issue is focused on using them.

The main changes are:

  1. Create a PTableStats interface that encapsulates the stats information (and implements the Writable interface so that it can be serialized back from the server). A rough sketch follows this list.
  2. Add a stats member variable to PTable to hold this.
  3. From MetaDataEndPointImpl, look up the stats row for the table in the stats table. If the stats have changed, return a new PTable with the updated stats information. We may want to cache the stats row and have the stats gatherer invalidate the cached row when it's updated, so we don't always have to do a scan for it. Additionally, it would be ideal if we could use the same split policy on the stats table that we use on the system table, to guarantee co-location of data (for the sake of caching).
  4. Modify the client-side parallelization (ParallelIterators.getSplits()) to use this information to guide how the scans are chunked up at query time.
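
A rough sketch of what that interface might look like; the guidepost representation (sorted byte[] split keys per region, keyed by region start key) is my assumption, not a settled design:

    import java.util.List;
    import java.util.SortedMap;
    import org.apache.hadoop.io.Writable;

    // Sketch only: encapsulates per-table stats and is serialized back
    // from the server via the Writable interface. Names are illustrative.
    public interface PTableStats extends Writable {
        // Guideposts per region: sorted keys that subdivide a region into
        // roughly equal-sized chunks, keyed by region start key (the map
        // should be ordered with Bytes.BYTES_COMPARATOR).
        SortedMap<byte[], List<byte[]>> getGuidePosts();
    }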

This should help boost query performance, especially in cases where the data is highly skewed. It's likely the cause of the slowness reported in #47.

@ghost ghost assigned tonyhuang Feb 14, 2013
@jtaylor-sfdc jtaylor-sfdc mentioned this issue Feb 20, 2013
@jtaylor-sfdc
Contributor Author

I'm going to split this up into multiple enhancement issues. It turns out that Jesse is already working on the stats system table, so I'll split that out from the other items (1-4 above). Tony - how about you start by adding the new PTableStats interface stored off of PTable? You can dummy up the data until Jesse is ready, and figure out how to change ParallelIterators to take advantage of the new byte[] guideposts; a rough sketch of that chunking is below.
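
Not the actual implementation, but here's a hypothetical illustration of how ParallelIterators.getSplits() could carve a region's key range into chunks using guideposts (class and method shape are made up):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper: turns a region's [start, end) range plus the
    // sorted guideposts inside it into one (startKey, endKey) pair per chunk.
    final class GuidepostSplitter {
        static List<byte[][]> getSplits(byte[] regionStart, byte[] regionEnd,
                                        List<byte[]> guideposts) {
            List<byte[][]> splits = new ArrayList<byte[][]>();
            byte[] lower = regionStart;
            for (byte[] gp : guideposts) {
                splits.add(new byte[][] { lower, gp }); // scan [lower, gp)
                lower = gp;
            }
            splits.add(new byte[][] { lower, regionEnd }); // final chunk
            return splits;
        }
    }

Each chunk would then become one scan handed to its own thread, so even a query that hits a single skewed region can be parallelized.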

@jyates
Contributor

jyates commented Apr 5, 2013

Pushed up a branch with the initial implementation of hbase-stat (from an internal project). Right now it builds on its own and produces its own jar. If you'd like, James, I can work on a couple more commits on top of it to get it into shape for building along with the rest of Phoenix. However, it wouldn't be all that terrible if we wanted to release it separately from Phoenix. Your call.

@tonyhuang
Contributor

Hi Jesse, can you send a pull request to James so we can pull the package into the Phoenix project?

Thanks
Tony

@mujtabachohan
Contributor

Stats needed to decide when to use Skip Scan

Schema
CREATE TABLE T (HOST VARCHAR NOT NULL, DATE DATE NOT NULL, CF1.A BIGINT, CF1.B BIGINT, CF2.C BIGINT CONSTRAINT PK PRIMARY KEY (HOST, DATE))

Query example
SELECT COUNT(1) FROM T WHERE HOST LIKE '?%' AND DATE > CURRENT_DATE() - ?;
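
For concreteness, a query of this shape might be issued through Phoenix's JDBC driver roughly like this (the connection URL, bind values, and class name are assumptions for the example):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class SkipScanQueryExample {
        public static void main(String[] args) throws Exception {
            // "localhost" stands in for the cluster's ZooKeeper quorum.
            Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT COUNT(1) FROM T WHERE HOST LIKE ? AND DATE > CURRENT_DATE() - ?");
            stmt.setString(1, "sf%"); // leading-PK prefix filter
            stmt.setInt(2, 7);        // trailing-PK filter: last 7 days
            ResultSet rs = stmt.executeQuery();
            if (rs.next()) {
                System.out.println("count = " + rs.getLong(1));
            }
            conn.close();
        }
    }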

Factor NOT affecting performance
Percentage of rows filtered by the leading PK: same performance curve as a range scan.

Factors affecting performance

Percentage of rows filtered by the trailing PK: affected by the cardinality of the leading PK. With a low-cardinality leading PK, the skip scan gets faster as the number of rows matching the filter decreases; at worst (all rows match) it is equal to a scan. With a high-cardinality leading PK, it is always slower than a scan and shows the same performance curve as a range scan.

Cardinality of the leading PK: as elaborated above, skip scan performance depends on the cardinality of the leading PK. Performance gets better as the number of rows filtered by the trailing PK gets lower.

Therefore the main factor affecting skip scan performance is the cardinality of the leading PK (see the graph below, which shows query time on the Y axis and the percentage of rows filtered by the trailing PK on the X axis). Our test results show that the skip scan should be used when the cardinality of the leading PK is less than 75% (i.e. fewer than 3 out of every 4 leading PK values are unique). A sketch of that decision rule follows the test note below.

[Graph: query time (Y axis) vs. percentage of rows filtered by trailing PK (X axis)]
Legend: P1.2 - Phoenix 1.2 using Skip Scan; P1.1 - Phoenix 1.1 using Range Scan

Note: this test used 20M rows on a 4-machine cluster running HBase 0.94.6.
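
Expressed as a hypothetical decision rule (the 0.75 threshold is just the crossover point measured above; the helper and its inputs are made up):

    // Hypothetical decision rule based on the measurements above.
    final class SkipScanHeuristic {
        // leadingPkCardinality: number of distinct leading-PK values;
        // rowCount: total rows in the table (both would come from stats).
        static boolean shouldUseSkipScan(long leadingPkCardinality, long rowCount) {
            return rowCount > 0
                && (double) leadingPkCardinality / rowCount < 0.75;
        }
    }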
