<html>
<head><title>CORES</title></head>
<body>
<h1>Introduction</h1>
<p>This project, CORES (Column-Oriented Regeneration Embedding Scheme), pushes highly selective filters down into column-based storage, where each filter consists of several filtering conditions on a field. Applying the filtering conditions during the column scan in storage tends to reduce both the I/O and the deserialization cost, by means of a fine-grained composition based on bitsets. CORES also generalizes this technique with two pairwise operations, rollup and drilldown, so that a series of conjunctive filters can effectively deliver their payloads in a nested schema. It can be applied to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries.
</p>
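<p>The bitset composition described above can be illustrated with a minimal sketch. It is not taken from the CORES code base: the class <code>NestedBitsetSketch</code>, the offset array <code>orderStart</code>, and the exact semantics given to <code>rollup</code> and <code>drilldown</code> here are assumptions made for illustration. In this reading, <code>drilldown</code> carries a parent-level bitset down to the child rows, and <code>rollup</code> marks a parent when at least one of its child rows is still set.</p>

```java
import java.util.BitSet;

final class NestedBitsetSketch {
    /**
     * Assumed rollup semantics: parent p is set when any child row in
     * [childStart[p], childStart[p + 1]) is set.
     */
    static BitSet rollup(BitSet childHits, int[] childStart) {
        int parents = childStart.length - 1;
        BitSet parentHits = new BitSet(parents);
        for (int p = 0; p < parents; p++) {
            int next = childHits.nextSetBit(childStart[p]);
            if (next >= 0 && next < childStart[p + 1]) {
                parentHits.set(p);
            }
        }
        return parentHits;
    }

    /** Assumed drilldown semantics: every child row of a set parent is set. */
    static BitSet drilldown(BitSet parentHits, int[] childStart) {
        BitSet childHits = new BitSet(childStart[childStart.length - 1]);
        for (int p = parentHits.nextSetBit(0); p >= 0; p = parentHits.nextSetBit(p + 1)) {
            childHits.set(childStart[p], childStart[p + 1]); // end index is exclusive
        }
        return childHits;
    }

    public static void main(String[] args) {
        // Hypothetical data: three customers own orders [0,2), [2,3) and [3,6).
        int[] orderStart = {0, 2, 3, 6};
        int[] customerAge = {30, 17, 45};
        double[] orderPrice = {5.0, 120.0, 300.0, 8.0, 9.0, 150.0};

        // Filter on the parent level: age >= 18.
        BitSet customers = new BitSet(customerAge.length);
        for (int i = 0; i < customerAge.length; i++) {
            if (customerAge[i] >= 18) customers.set(i);
        }

        // Filter on the child level: price > 100.
        BitSet orders = new BitSet(orderPrice.length);
        for (int i = 0; i < orderPrice.length; i++) {
            if (orderPrice[i] > 100.0) orders.set(i);
        }

        // Conjunction: bring the customer filter down to order rows, then AND.
        orders.and(drilldown(customers, orderStart));
        // Customers that still own at least one matching order.
        BitSet matchingCustomers = rollup(orders, orderStart);

        System.out.println(orders + " " + matchingCustomers); // prints "{1, 5} {0, 2}"
    }
}
```

<p>Only the rows whose bits survive the conjunction would then need to be read and deserialized from the remaining columns, which is where the I/O saving is expected to come from.</p>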
<p>This code is released under the Apache License. See LICENSE.txt and NOTICE.txt for more information.</p>
<h1>Important Implementations</h1>
<p>1. cores.avro.FilterBatchColumnReader, the class that reads CORES files with filters. It evaluates the filter columns, initializes a bitset, propagates it along the schema path, and then reads the columns to be fetched according to the bitset.</p>
<p>2. cores.avro.FilterOperator, the interface that defines the two functions of a filter, getName() and isMatch(T t). getName() returns the name of the filter column; isMatch(T t) returns whether t is matched by the filter.</p>
<p>3. cores.avro.mapreduce.NeciFilterRecordReader, the class that extends org.apache.hadoop.mapreduce.RecordReader and uses cores.avro.FilterBatchColumnReader to read CORES files in HDFS.</p>
<p>4. cores.core.UnionOutputBuffer/UnionInputBuffer, the classes that write/read columns of UNION type.</p>
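<p>As a rough illustration of the filter interface described in item 2, the sketch below declares a local stand-in with the two methods named there and applies it to an in-memory column. The stand-in interface, the <code>MinAgeFilter</code> class, the <code>apply</code> helper, and the sample data are all hypothetical; the real cores.avro.FilterOperator and FilterBatchColumnReader may differ in signature and behavior.</p>

```java
import java.util.BitSet;
import java.util.List;

final class FilterOperatorSketch {
    /** Local stand-in for cores.avro.FilterOperator (assumed shape, two methods only). */
    interface FilterOperator<T> {
        String getName();      // name of the column the filter applies to
        boolean isMatch(T t);  // whether the value is matched by the filter
    }

    /** Hypothetical filter: matches values of column "age" that are at least min. */
    static final class MinAgeFilter implements FilterOperator<Integer> {
        private final int min;

        MinAgeFilter(int min) {
            this.min = min;
        }

        @Override
        public String getName() {
            return "age";
        }

        @Override
        public boolean isMatch(Integer age) {
            return age >= min;
        }
    }

    /** Scans the filter column and records the positions of matching rows. */
    static <T> BitSet apply(FilterOperator<T> filter, List<T> column) {
        BitSet hits = new BitSet(column.size());
        for (int i = 0; i < column.size(); i++) {
            if (filter.isMatch(column.get(i))) {
                hits.set(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FilterOperator<Integer> filter = new MinAgeFilter(18);
        List<Integer> age = List.of(30, 17, 45);
        List<String> name = List.of("ann", "bob", "cat");

        BitSet hits = apply(filter, age);
        // Read the payload column only at the positions kept by the bitset.
        for (int i = hits.nextSetBit(0); i >= 0; i = hits.nextSetBit(i + 1)) {
            System.out.println(filter.getName() + " matched row " + i + ": " + name.get(i));
        }
    }
}
```

<p>Keeping the filter behind a small interface like this lets a reader evaluate any per-column condition without knowing its details, which is presumably why the column name and the match test are the only two functions exposed.</p>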
<h3>test</h3>
<p>Contains the test framework. Several tests ensure that the examples run. The test framework uses TestNG.</p>
<h1>State-of-the-art Comparison</h1>
<h3><a href="https://github.com/liyang0920/cores">project</a></h3>
<p><a href="https://github.com/liyang0920/cores/tree/master/avro/src/test/java/local/cores/query">cores test</a><br/>
<a href="https://github.com/liyang0920/cores/tree/master/avro/src/test/java/local/avro/query">avro test</a><br/>
<a href="https://github.com/liyang0920/cores/tree/master/avro/src/test/java/local/trevni/query">trevni test</a><br/>
<a href="https://github.com/liyang0920/cores/tree/master/avro/src/test/java/local/parquet/query">parquet test</a><br/>
<a href="https://github.com/lwhay/spark-tpch/tree/parquet/src/main/scala/cores">Spark with hdfs/parquet/json/avro</a><br/>
<a href="ch07/readme.html">ch07</a> Hive with MapReduce and Tez<br/>
(*) There is no build file at present, but the Eclipse and IntelliJ IDEA project files work.
</p>
</body></html>