Skip to content

Run test queries

Jeff LeFevre edited this page Jun 1, 2020 · 76 revisions

Be sure you have started a real or virtual Ceph cluster as per the Build page.


CREATE STORAGE POOL for the test data The below commands assume you have built SkyhookDM and are in the build dir. For a non-virtual cluster, you can remove the bin/ prefix.

bin/rados mkpool tpchdata;  # here we will use poolname=tpchdata for all of the below test queries.
# Alternativey, if that fails use this command:
bin/ceph osd pool create tpchdata 128 128 replicated; # +/- 256 placement groups works well with 1--8 OSDs in practice.  You should use a power of 2.

GET TEST DATA. Each object contains 10 rows, and is formatted as per type indicated, where type is one of SkyFormatType. There are 2 test data objects for each supported data format.

# choose one object format type
OBJ_TYPE=SFT_JSON;
OBJ_TYPE=SFT_ARROW;
OBJ_TYPE=SFT_FLATBUF_FLEX_ROW; 

# get the sample data
OBJ_BASE_NAME=skyhook.${OBJ_TYPE}.lineitem;
for i in {0..1}; do 
    rm -rf ${OBJ_BASE_NAME}.$i;   # remove the old test objects
    wget https://users.soe.ucsc.edu/~jlefevre/skyhookdb/testdata/${OBJ_BASE_NAME}.$i;
done;

STORE TEST DATA into Ceph objects. Setting the PATH variable is only needed when using a virtual dev cluster from the current build dir.

yes | PATH=$PATH:bin ../src/progly/rados-store-glob.sh tpchdata  public  lineitem skyhook.${OBJ_TYPE}.lineitem.*

VERIFY DATA OBJECTS LOADED, should see tpchdata pool with 2 objects stored.

bin/rados df; 
bin/rados --pool tpchdata ls -

To REMOVE an object

bin/rados --pool <poolname> rm <objname>

SHOW CLI Help.

bin/run-query --help;

SHOW the table schema.

  • i.e., DESCRIBE TABLE T, add the --header flag and --limit 0 flag
  • the --header flag can be added to any query to show the schema of the query result.
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --header --limit 0;

EXECUTE EXAMPLE QUERIES. "CLS" execution indicates partial query processing to be done by each Ceph storage server (using Ceph's object class mechanism) before returning data to client, else client retrieves all object data from storage servers and processes queries entirely. Examples use TPC-H LINEITEM table. OPTIONAL FLAGS:

  • Add or remove --quiet flag to see summary only or result data
  • Add or remove --use-cls flag to push execution onto storage servers using object classes ('cls')

SELECT/PROJECT QUERIES.


SELECT * FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem";

SELECT * FROM lineitem; --expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "*";

SELECT * FROM lineitem WHERE orderkey<=5 AND linenumber>4; -- expected=4/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "orderkey,leq,5;linenumber,gt,4";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --project "orderkey,tax,comment,linenumber,returnflag"

SELECT linenumber,returnflag FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --project "linenumber,returnflag";

SELECT linenumber FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --project "linenumber";

SELECT returnflag FROM lineitem; -- expected=20/20;
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --project "returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<3; -- expected=7/20 
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "orderkey,lt,3"  --project "orderkey,tax,comment,linenumber,returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey>=3; -- expected=13/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "orderkey,geq,3"  --project "orderkey,tax,comment,linenumber,returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<2; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "orderkey,lt,2"  --project "orderkey,tax,comment,linenumber,returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<=1; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "orderkey,leq,1"   --project "orderkey,tax,comment,linenumber,returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey=1; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select  "orderkey,eq,1"   --project "orderkey,tax,comment,linenumber,returnflag";

SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<1; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,lt,1" --project "orderkey,tax,comment,linenumber,returnflag";

SELECT * FROM lineitem WHERE comment  LIKE %bold%; -- expected=1/20 
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,bold";

SELECT * FROM lineitem WHERE comment  LIKE %bld%; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,bld";

SELECT * FROM lineitem WHERE comment  LIKE %[o|u]%; -- expected=18/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,[o|u]";

SELECT * FROM lineitem WHERE comment  LIKE %[o|u]s%; -- expected=8/20;  
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,[o|u]s";

SELECT * FROM lineitem WHERE comment  LIKE %[o|u]x%; -- expected=3/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,[o|u]x";

SELECT * FROM lineitem WHERE comment  LIKE %[o]%; -- expected=13/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,[o]";

SELECT * FROM lineitem WHERE comment  LIKE %[u]x%; -- expected=14/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,[u]";

SELECT * FROM lineitem WHERE comment  LIKE %u%; -- expected=14/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,u";

SELECT * FROM lineitem WHERE comment  LIKE %.+%; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "comment,like,.+" ;

SELECT * FROM lineitem WHERE shipdate BEFORE 1992-04-27; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "shipdate,before,1992-04-27";

SELECT * FROM lineitem WHERE shipdate = 1992-04-27; -- expected=1/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "shipdate,eq,1992-04-27";

SELECT * FROM lineitem WHERE shipdate AFTER 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select "shipdate,after,1992-04-27";

SELECT * FROM lineitem WHERE shipdate < 1992-04-27; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,lt,1992-04-27";

SELECT * FROM lineitem WHERE shipdate <= 1992-04-27; -- expected=1/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,leq,1992-04-27";

SELECT * FROM lineitem WHERE shipdate > 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,gt,1992-04-27";

SELECT * FROM lineitem WHERE shipdate != 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,ne,1992-04-27";

SELECT * FROM lineitem WHERE shipdate >= 1992-04-27; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem"  --select  "shipdate,geq,1992-04-27";

SELECT linenumber,shipdate FROM lineitem WHERE shipdate >= 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,ne,1992-04-27" --project "linenumber,shipdate";

BASIC AGGREGATION QUERIES. Supports min/max/sum/count with multiple predicates

SELECT SUM(tax) FROM lineitem WHERE orderkey>=5;
-- returns 1 result row from each object, containing the local sum (note:sum starting from 0 as noted below)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,geq,5;tax,sum,0;" --project "tax"

INDEX CREATION. Regular (primary/secondary) indexes may have up to 4 integer cols, may be unique cols or not (primary key cols in lineitem table=linenumber,orderkey). RID indexes do not require col names, simply specify _RID_INDEX_ for cols arg. TXT indexes are single column of type string/varchar, and may optionally specify delims and stopwords. NOTE: --use-cls flag is necessary to create object-local indexes on each storage server.

CREATE INDEX on RID; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols _RID_INDEX_ --use-cls 
CREATE INDEX on linenumber; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols linenumber --use-cls
CREATE INDEX on linenumber,orderkey; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols linenumber,orderkey --use-cls
CREATE INDEX on orderkey,linenumber; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,linenumber --use-cls;
CREATE INDEX on orderkey,partkey,suppkey,linenumber; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,partkey,suppkey,linenumber --use-cls
CREATE INDEX on orderkey,partkey,suppkey; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,partkey,suppkey --use-cls
CREATE TXT INDEX on comment; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords, use char 'i' as text delim  (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --index-delims "i" --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords, use chars 'aeiou' as text delim  (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --index-delims "aeiou" --use-cls

VIEW the keys/vals in omap (RocksDB). (to see keys on both test objects, use obj.0 or obj.1)

bin/rados -p tpchdata listomapkeys obj.0;
bin/rados -p tpchdata listomapvals obj.0;

VIEW each osd's log for cls messages (if compiled with debug code). NOTE: Use osd.0, 1, or 2 since there are 3 osds in a virtual test cluster or the osd number in a real cluster. From a vstart cluster

grep "<cls>" <build-path>/skyhookdm-ceph/build/out/osd.0.log

From a real cluster,

grep "<cls>" /var/log/ceph/ceph-osd.0.log

INDEX QUERIES. If index exists, will first pushdown index preds to index, then apply selection preds to rows returned by index.

SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=1 AND orderkey=71;
-- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000001-00000000000000000071
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,1;orderkey,eq,71" --use-cls --project "linenumber,orderkey,extendedprice"
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=0 AND orderkey=71;  -- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000000-00000000000000000071
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,0;orderkey,eq,71" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=3 AND orderkey=67; -- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000003-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,3;orderkey,eq,67" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey=67;-- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67"  --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey=67 AND tax>=0.05; -- expected x/20 rows (x/20 rows match the tax predicate), omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67" --select "tax,leq,0.05" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey>=67;  -- expect ASSERT ERR SkyIndexUnsupportedOpType due to >= since only equality preds currently supported
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,geq,67" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE (linenumber=6 AND orderkey=67) AND linenumber>3; -- expected  x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
-- testing additional select predicates are applied after index predicates
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67" --select "linenumber,gt,3" --use-cls


DATA TRANSFORMATIONS. This converts existing format on disk (currently flatbuffer by default) to Apache Arrow format/Parquet on disk. Specify project cols to create an Arrow buffer with only those cols projected (defaults to all cols if none specified). The original data is then a sequence of Arrow buffers on disk, each with a subset of all cols.


bin/run-query  --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --transform-db --transform-format-type arrow --project extendedprice


PostgreSQL queries. Requires Postgres10.0+ installed with File FDW extension. Regardless of underlying data format stored in Skyhook (row/col flatbuffer/arrow, json, etc.), results are returned from Skyhook as tuples to Postgres in PG_BINARY data format. Results are serialized into a postgres foreign table. Views can then be defined over this table for further analysis without returning to the foreign base data. The foreign table schema should match the schema of the base data, this assumption is currently not verfied.

bin/run-query  --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --output-format SFT_PG_BINARY

Clone this wiki locally