-
Notifications
You must be signed in to change notification settings - Fork 10
Run test queries
Be sure you have started a real or virtual Ceph cluster as per the Build page.
CREATE STORAGE POOL for the test data
The below commands assume you have built SkyhookDM and are in the build dir. For a non-virtual cluster, you can remove the bin/
prefix.
bin/rados mkpool tpchdata; # here we will use poolname=tpchdata for all of the below test queries.
# Alternativey, if that fails use this command:
bin/ceph osd pool create tpchdata 128 128 replicated; # +/- 256 placement groups works well with 1--8 OSDs in practice. You should use a power of 2.
GET TEST DATA. Each object contains 10 rows, and is formatted as per type indicated, where type is one of SkyFormatType. There are 2 test data objects for each supported data format.
# choose one object format type
OBJ_TYPE=SFT_JSON;
OBJ_TYPE=SFT_ARROW;
OBJ_TYPE=SFT_FLATBUF_FLEX_ROW;
# get the sample data
OBJ_BASE_NAME=skyhook.${OBJ_TYPE}.lineitem;
for i in {0..1}; do
rm -rf ${OBJ_BASE_NAME}.$i; # remove the old test objects
wget https://users.soe.ucsc.edu/~jlefevre/skyhookdb/testdata/${OBJ_BASE_NAME}.$i;
done;
STORE TEST DATA into Ceph objects. Setting the PATH variable is only needed when using a virtual dev cluster from the current build dir.
yes | PATH=$PATH:bin ../src/progly/rados-store-glob.sh tpchdata public lineitem skyhook.${OBJ_TYPE}.lineitem.*
VERIFY DATA OBJECTS LOADED, should see tpchdata pool with 2 objects stored.
bin/rados df;
bin/rados --pool tpchdata ls -
To REMOVE an object
bin/rados --pool <poolname> rm <objname>
SHOW CLI Help.
bin/run-query --help;
SHOW the table schema.
- i.e.,
DESCRIBE TABLE T
, add the--header
flag and--limit 0
flag - the
--header
flag can be added to any query to show the schema of the query result.
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --header --limit 0;
EXECUTE EXAMPLE QUERIES. "CLS" execution indicates partial query processing to be done by each Ceph storage server (using Ceph's object class mechanism) before returning data to client, else client retrieves all object data from storage servers and processes queries entirely. Examples use TPC-H LINEITEM table. OPTIONAL FLAGS:
- Add or remove
--quiet
flag to see summary only or result data - Add or remove
--use-cls
flag to push execution onto storage servers using object classes ('cls')
SELECT/PROJECT QUERIES.
SELECT * FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem";
SELECT * FROM lineitem; --expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "*";
SELECT * FROM lineitem WHERE orderkey<=5 AND linenumber>4; -- expected=4/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,leq,5;linenumber,gt,4";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --project "orderkey,tax,comment,linenumber,returnflag"
SELECT linenumber,returnflag FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --project "linenumber,returnflag";
SELECT linenumber FROM lineitem; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --project "linenumber";
SELECT returnflag FROM lineitem; -- expected=20/20;
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --project "returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<3; -- expected=7/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,lt,3" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey>=3; -- expected=13/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,geq,3" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<2; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,lt,2" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<=1; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,leq,1" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey=1; -- expected=6/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,eq,1" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT orderkey,tax,comment,linenumber,returnflag FROM lineitem WHERE orderkey<1; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,lt,1" --project "orderkey,tax,comment,linenumber,returnflag";
SELECT * FROM lineitem WHERE comment LIKE %bold%; -- expected=1/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,bold";
SELECT * FROM lineitem WHERE comment LIKE %bld%; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,bld";
SELECT * FROM lineitem WHERE comment LIKE %[o|u]%; -- expected=18/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,[o|u]";
SELECT * FROM lineitem WHERE comment LIKE %[o|u]s%; -- expected=8/20;
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,[o|u]s";
SELECT * FROM lineitem WHERE comment LIKE %[o|u]x%; -- expected=3/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,[o|u]x";
SELECT * FROM lineitem WHERE comment LIKE %[o]%; -- expected=13/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,[o]";
SELECT * FROM lineitem WHERE comment LIKE %[u]x%; -- expected=14/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,[u]";
SELECT * FROM lineitem WHERE comment LIKE %u%; -- expected=14/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,u";
SELECT * FROM lineitem WHERE comment LIKE %.+%; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "comment,like,.+" ;
SELECT * FROM lineitem WHERE shipdate BEFORE 1992-04-27; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,before,1992-04-27";
SELECT * FROM lineitem WHERE shipdate = 1992-04-27; -- expected=1/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,eq,1992-04-27";
SELECT * FROM lineitem WHERE shipdate AFTER 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,after,1992-04-27";
SELECT * FROM lineitem WHERE shipdate < 1992-04-27; -- expected=0/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,lt,1992-04-27";
SELECT * FROM lineitem WHERE shipdate <= 1992-04-27; -- expected=1/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,leq,1992-04-27";
SELECT * FROM lineitem WHERE shipdate > 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,gt,1992-04-27";
SELECT * FROM lineitem WHERE shipdate != 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,ne,1992-04-27";
SELECT * FROM lineitem WHERE shipdate >= 1992-04-27; -- expected=20/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,geq,1992-04-27";
SELECT linenumber,shipdate FROM lineitem WHERE shipdate >= 1992-04-27; -- expected=19/20
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "shipdate,ne,1992-04-27" --project "linenumber,shipdate";
BASIC AGGREGATION QUERIES. Supports min/max/sum/count with multiple predicates
SELECT SUM(tax) FROM lineitem WHERE orderkey>=5;
-- returns 1 result row from each object, containing the local sum (note:sum starting from 0 as noted below)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --select "orderkey,geq,5;tax,sum,0;" --project "tax"
INDEX CREATION. Regular (primary/secondary) indexes may have up to 4 integer cols, may be unique cols or not (primary key cols in lineitem table=linenumber,orderkey). RID indexes do not require col names, simply specify _RID_INDEX_
for cols arg. TXT indexes are single column of type string/varchar, and may optionally specify delims and stopwords. NOTE: --use-cls
flag is necessary to create object-local indexes on each storage server.
CREATE INDEX on RID; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols _RID_INDEX_ --use-cls
CREATE INDEX on linenumber; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols linenumber --use-cls
CREATE INDEX on linenumber,orderkey; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols linenumber,orderkey --use-cls
CREATE INDEX on orderkey,linenumber; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,linenumber --use-cls;
CREATE INDEX on orderkey,partkey,suppkey,linenumber; -- (unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,partkey,suppkey,linenumber --use-cls
CREATE INDEX on orderkey,partkey,suppkey; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols orderkey,partkey,suppkey --use-cls
CREATE TXT INDEX on comment; -- (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords, use char 'i' as text delim (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --index-delims "i" --use-cls
CREATE TXT INDEX on comment; -- ignore stopwords, use chars 'aeiou' as text delim (non-unique)
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-create --index-cols comment --index-ignore-stopwords --index-delims "aeiou" --use-cls
VIEW the keys/vals in omap (RocksDB). (to see keys on both test objects, use obj.0 or obj.1)
bin/rados -p tpchdata listomapkeys obj.0;
bin/rados -p tpchdata listomapvals obj.0;
VIEW each osd's log for cls messages (if compiled with debug code). NOTE: Use osd.0, 1, or 2 since there are 3 osds in a virtual test cluster or the osd number in a real cluster. From a vstart cluster
grep "<cls>" <build-path>/skyhookdm-ceph/build/out/osd.0.log
From a real cluster,
grep "<cls>" /var/log/ceph/ceph-osd.0.log
INDEX QUERIES. If index exists, will first pushdown index preds to index, then apply selection preds to rows returned by index.
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=1 AND orderkey=71;
-- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000001-00000000000000000071
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,1;orderkey,eq,71" --use-cls --project "linenumber,orderkey,extendedprice"
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=0 AND orderkey=71; -- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000000-00000000000000000071
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,0;orderkey,eq,71" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=3 AND orderkey=67; -- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000003-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,3;orderkey,eq,67" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey=67;-- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey=67 AND tax>=0.05; -- expected x/20 rows (x/20 rows match the tax predicate), omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67" --select "tax,leq,0.05" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE linenumber=6 AND orderkey>=67; -- expect ASSERT ERR SkyIndexUnsupportedOpType due to >= since only equality preds currently supported
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,geq,67" --use-cls
SELECT linenumber,orderkey,extendedprice FROM lineitem WHERE (linenumber=6 AND orderkey=67) AND linenumber>3; -- expected x/20 rows, omap key search on IDX_REC:*-LINEITEM:LINENUMBER-ORDERKEY:00000000000000000006-00000000000000000067
-- testing additional select predicates are applied after index predicates
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --index-read --index-cols linenumber,orderkey --index-preds "linenumber,eq,6;orderkey,eq,67" --select "linenumber,gt,3" --use-cls
DATA TRANSFORMATIONS. This converts existing format on disk (currently flatbuffer by default) to Apache Arrow format/Parquet on disk. Specify project cols to create an Arrow buffer with only those cols projected (defaults to all cols if none specified). The original data is then a sequence of Arrow buffers on disk, each with a subset of all cols.
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --transform-db --transform-format-type arrow --project extendedprice
PostgreSQL queries. Requires Postgres10.0+ installed with File FDW extension. Regardless of underlying data format stored in Skyhook (row/col flatbuffer/arrow, json, etc.), results are returned from Skyhook as tuples to Postgres in PG_BINARY data format. Results are serialized into a postgres foreign table. Views can then be defined over this table for further analysis without returning to the foreign base data. The foreign table schema should match the schema of the base data, this assumption is currently not verfied.
bin/run-query --num-objs 2 --pool tpchdata --oid-prefix "public" --table-name "lineitem" --output-format SFT_PG_BINARY
Concepts
> Architecture
> Data formats
> > Flatbuffers
> > Arrow
> Test Integration
Tutorials
> Build
> Dev/Test environment
> Run test queries
> Ceph-SkyhookDM cluster setup
Technical Reports
> Google Summer of Code 2019 Report
> Google Summer of Code 2020 Report
> Flatbuffers and Flexbuffers access experiments
Archives
> CloudLab Ceph Deployment Notes
> Deploy Notes
> Running CloudLab Experiments
> Installing a Non Release Fork on CloudLab
> Installing with Skyhook-Ansible
> FBU Queries (PDSW19)
> Paper Experiments
> Skyhook Development on CloudLab
> Building Deb Files for Cloudlab Installs (Sp2019)
> CephFS FUSE on CloudLab