add tpch-dbgen (ydb-platform#4658)
iddqdex authored May 18, 2024
1 parent 70713cc commit c3a3f37
Showing 100 changed files with 30,331 additions and 0 deletions.
320 changes: 320 additions & 0 deletions ydb/library/benchmarks/gen/tpch-dbgen/LICENSE

Large diffs are not rendered by default.

220 changes: 220 additions & 0 deletions ydb/library/benchmarks/gen/tpch-dbgen/PORTING.NOTES
@@ -0,0 +1,220 @@
# @(#)PORTING.NOTES 2.1.8.1

Table of Contents
==================
1. General Program Structure
2. Naming Conventions and Variable Usage
3. Porting Procedures
4. Compilation Options
5. Customizing QGEN
6. Further Enhancements
7. Known Porting Problems
8. Reporting Problems

1. General Program Structure

The code provided with TPC-H and TPC-R benchmarks includes a database
population generator (DBGEN) and a query template translator (QGEN). It
is written in ANSI-C, and is meant to be easily portable to a broad variety
of platforms. The program is composed of five source files and some
support and header files. The main modules are:

    build.c:   each table in the database schema is represented by a
               routine mk_XXXX, which populates a structure
               representing one row in table XXXX.
               See Also: dsstypes.h, bm_utils.c, rnd.*
    print.c:   each table in the database schema is represented by a
               routine pr_XXXX, which prints the contents of a
               structure representing one row in table XXXX.
               See Also: dsstypes.h, dss.h
    driver.c:  this module contains the main control functions for
               DBGEN, including command line parsing, distribution
               management, database scaling and the calls to mk_XXXX
               and pr_XXXX for each table generated.
    qgen.c:    this module contains the main control functions for
               QGEN, including query template parsing.
    varsub.c:  each query template includes one or more parameter
               substitution points; this routine handles the
               parameter generation for the TPC-H/TPC-R benchmark.
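
The division of labor between build.c, print.c and driver.c can be
sketched as follows. This is only an illustration under stated
assumptions: the "demo" table, the demo_row_t structure, the
"Demo#" name format and the '|' separator are all hypothetical and
are not taken from the DBGEN sources. It also uses snprintf(), which
is C99 rather than strict ANSI C.

```c
#include <stdio.h>

/* Hypothetical row layout; the real structures live in dsstypes.h. */
typedef struct {
    long key;
    char name[32];
} demo_row_t;

/* mk_demo: populate one row of the imaginary "demo" table (build.c role). */
void mk_demo(long index, demo_row_t *row)
{
    row->key = index;
    snprintf(row->name, sizeof row->name, "Demo#%09ld", index);
}

/* pr_demo: print one row as '|'-separated fields (print.c role).
 * Returns 0 on success, -1 if the write failed. */
int pr_demo(const demo_row_t *row, FILE *out)
{
    return fprintf(out, "%ld|%s\n", row->key, row->name) < 0 ? -1 : 0;
}

/* gen_demo: build and print rows 1..count (driver.c role). */
int gen_demo(long count, FILE *out)
{
    demo_row_t row;
    long i;

    for (i = 1; i <= count; i++) {
        mk_demo(i, &row);
        if (pr_demo(&row, out) != 0)
            return -1;
    }
    return 0;
}
```

Calling gen_demo(2, stdout) would emit two rows. Keeping row
construction separate from row output is what allows an alternative
output routine to be substituted without touching the generation
logic.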

The support utilities provide a generalized set of functions for data
generation and include:

bm_utils.c: data type generators, string management and
portability routines.

rnd.*: a general purpose random number generator used
throughout the code.

dss.h:
shared.h: a set of '#defines' for limits, formats and fixed
values
dsstypes.h: structure definitions for each table definition

2. Naming Conventions and Variable Usage

Since DBGEN will be maintained by a large number of people, it is
particularly important to observe the coding, variable naming and usage
conventions detailed here.

#define
--------
All #define directives are found in header files (*.h). In general,
the header files segregate variables and macros as follows:
rnd.h -- anything exclusively referenced by rnd.c
dss.h -- general defines for the benchmark, including *all*
extern declarations (see below).
shared.h -- defines related to the tuple definitions in
dsstypes.h. Isolated to ease automatic processing needed by many
direct load routines (see below).
dsstypes.h -- structure definitions and typedef directives to
detail the contents of each table's tuples.
config.h -- any porting and configuration related defines should
go here, to localize the changes necessary to move the suite
from one machine to another.
tpcd.h -- defines related to QGEN, rather than DBGEN

extern
------
DBGEN and QGEN make extensive use of extern declarations. This could
probably stand to be changed at some point, but has made the rapid
turnaround of prototypes easier. In order to be sure that each
declaration was matched by exactly one definition per executable,
they are all declared as EXTERN, a macro dependent on DECLARER. In
any module that defines DECLARER, all variables declared EXTERN will
be defined as globals. DECLARER should be defined only in modules
containing a main() routine.
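
A minimal single-file sketch of this pattern is shown here. It assumes
that EXTERN expands to nothing when DECLARER is defined and to
"extern" otherwise; the variable and function names (demo_scale,
demo_verbose, set_demo_scale, get_demo_scale) are hypothetical and
exist only for illustration. In a real build the macro block and the
EXTERN declarations would sit in a shared header.

```c
#include <stdio.h>

/* In this sketch the file plays the role of the module containing main(),
 * so it defines DECLARER before the shared declarations are seen. */
#define DECLARER

/* --- what a shared header might contain (assumed layout) --- */
#ifdef DECLARER
#define EXTERN              /* empty: each declaration becomes a definition */
#else
#define EXTERN extern       /* other modules only see a declaration */
#endif

EXTERN long demo_scale;     /* hypothetical global */
EXTERN int  demo_verbose;   /* hypothetical global */

/* --- ordinary module code using the globals --- */
void set_demo_scale(long value)
{
    demo_scale = value;
    if (demo_verbose)
        printf("scale set to %ld\n", demo_scale);
}

long get_demo_scale(void)
{
    return demo_scale;
}
```

Because exactly one module per executable defines DECLARER, each
global gets storage once, while every other module that includes the
same header sees only an extern declaration.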

Naming Conventions
------------------
defines
    o All defines use upper case
    o All defines use a table prefix, if appropriate:
        O_*     relates to orders table
        L_*     relates to lineitem table
        P_*     relates to part table
        PS_*    relates to partsupplier table
        C_*     relates to customer table
        S_*     relates to supplier table
        N_*     relates to nation table
        R_*     relates to region table
        T_*     relates to time table
    o All defines have a usage prefix, if appropriate:
        *_TAG   environment variable name
        *_DFLT  environment variable default
        *_MAX   upper bound
        *_MIN   lower bound
        *_LEN   average length
        *_SD    random number seed (see rnd.*)
        *_FMT   printf format string
        *_SCL   divisor (for scaled arithmetic)
        *_SIZE  tuple length
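
As an illustration of how these conventions combine, the sketch below
defines a few macros for an imaginary table with the prefix X_. The
prefix, the names and every value are hypothetical and do not appear
in the DBGEN headers; only the suffix meanings come from the list
above.

```c
/* Hypothetical defines for an imaginary table "X"; values are illustrative. */
#define X_QTY_MIN    1      /* *_MIN: lower bound */
#define X_QTY_MAX    50     /* *_MAX: upper bound */
#define X_PRICE_SCL  100    /* *_SCL: divisor for scaled arithmetic */

/* Clamp a quantity into [X_QTY_MIN, X_QTY_MAX]. */
long x_clamp_qty(long qty)
{
    if (qty < X_QTY_MIN)
        return X_QTY_MIN;
    if (qty > X_QTY_MAX)
        return X_QTY_MAX;
    return qty;
}

/* Whole-unit part of a scaled price, e.g. 12345 scaled by 100 gives 123. */
long x_price_whole(long scaled_price)
{
    return scaled_price / X_PRICE_SCL;
}
```

Reading a name such as X_QTY_MAX then tells a maintainer both which
table the value belongs to and how it is meant to be used.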

3. Porting Procedures

The code provided should be easily portable to any machine providing an
ANSI C compiler.
-- Copy makefile.suite to makefile
-- Edit the makefile to match the name of your C compiler
and to include appropriate compilation options in the CFLAGS
definition
-- make.

Special care should be taken in modifying any of the monetary
calculations in DBGEN. These have proven to be particularly sensitive to
portability problems. If you decide to create the routines for inline
data load (see below), be sure to compare the resulting data to that
generated by a flat file data generation to be sure that all numeric
conversions have been correct.

If the compile generates errors, refer to "Compilation Options", below.
The problem you are encountering may already have been addressed in the
code.

If the compile is successful, but QGEN is not generating the appropriate
query syntax for your environment, refer to "Customizing QGEN", below.

For other problems, refer to "Reporting Problems" at the end of this
document.

4. Compilation Options

config.h and makefile.suite contain a number of compile time options intended
to make the process of porting the code provided with TPC-H/TPC-R as easy as
possible on a broad range of platforms. Most ports should consist of reviewing
the possible settings described in config.h and modifying the makefile
to employ them appropriately.

5. Customizing QGEN

QGEN relies on a number of vendor-specific conventions to generate
appropriate query syntax. These are controlled by #defines in tpcd.h,
and enabled by a #define in config.h. If you find that the syntax
generated by QGEN is not sufficient for your environment you will need
to modify these two files. It is strongly recommended that you not change
the general organization of the files.

Currently defined options are:

VTAG -- marks a variable substitution point [:]
QDIR_TAG -- environment variable which points to query templates
[DSS_QUERY]
GEN_QUERY_PLAN -- syntax to generate a query plan ["Set Explain On;"]
START_TRAN -- syntax to begin a transaction ["Begin Work;"]
END_TRAN -- syntax to end a transaction ["Commit Work;"]
SET_OUTPUT -- syntax to redirect query output ["Output to"]
SET_ROWCOUNT -- syntax to set the number of rows returned
["{return %d rows}"]
SET_DBASE -- syntax to connect to a database
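
A vendor-specific block of this kind might look like the sketch below.
The macro values are the defaults quoted in brackets above; the
MY_VENDOR switch and the demo_rowcount() helper are hypothetical, and
treating VTAG as a character constant is an assumption. SET_DBASE is
left out because no default is given for it here. The helper uses
snprintf(), which is C99 rather than strict ANSI C.

```c
#include <stdio.h>

/* Hypothetical vendor switch; per the text it would be enabled in config.h. */
#define MY_VENDOR

/* Assumed shape of a per-vendor section in tpcd.h. */
#ifdef MY_VENDOR
#define VTAG            ':'                 /* variable substitution marker */
#define QDIR_TAG        "DSS_QUERY"         /* env var naming the template dir */
#define GEN_QUERY_PLAN  "Set Explain On;"
#define START_TRAN      "Begin Work;"
#define END_TRAN        "Commit Work;"
#define SET_OUTPUT      "Output to"
#define SET_ROWCOUNT    "{return %d rows}"
#endif

/* Hypothetical helper: render the row-limit clause for a given row count.
 * Returns the number of characters that the full clause requires. */
int demo_rowcount(char *buf, size_t buflen, int rows)
{
    return snprintf(buf, buflen, SET_ROWCOUNT, rows);
}
```

Grouping all dialect differences behind one switch means that support
for another DBMS can be added as a new block of defines without
changing the template-processing code.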

6. Further Enhancements

load_stub.c provides entry points for two likely enhancements.

The ld_XXXX routines make it possible to load the
database directly from DBGEN without first writing the database
population out to the filesystem. This may prove particularly useful
when loading larger database populations. Be particularly careful about
monetary amounts. To assure portability, all monetary calculations are
done using long integers (which hold money amounts as a number of
pennies). These will need to be scaled to dollars and cents (by dividing
by 100), before the values are presented to the DBMS.
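
One way to perform that scaling without introducing floating-point
rounding is sketched below. The function name format_money() is
hypothetical and is not part of load_stub.c; it assumes the amount is
a long holding whole pennies, as described above, and it relies on
snprintf(), which is C99 rather than strict ANSI C.

```c
#include <stdio.h>

/* Render a penny amount as dollars and cents using integer arithmetic only.
 * Returns the number of characters that the full string requires. */
int format_money(long pennies, char *buf, size_t buflen)
{
    const char *sign = (pennies < 0) ? "-" : "";
    /* Take the magnitude in unsigned arithmetic so the most negative
     * long does not overflow when negated. */
    unsigned long magnitude = (pennies < 0)
        ? 0UL - (unsigned long)pennies
        : (unsigned long)pennies;

    return snprintf(buf, buflen, "%s%lu.%02lu",
                    sign, magnitude / 100, magnitude % 100);
}
```

For example, format_money(123456, buf, sizeof buf) leaves "1234.56" in
buf. Whatever conversion is used, the loaded values should still be
compared against a flat file generation.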

The hd_XXXX routines allow header information to be written before the
creation of the flat files. This should allow systems that require
formatting information in database load files to use DBGEN with only
a small amount of custom code.

qgen.c defines the translation table for query templates in the
routine qsub().

varsub.c defines the parameter substitutions in the routine varsub().

If you are porting DBGEN to a machine that supports a native word
size larger than 32 bits, you may wish to modify the default values for
BITS_PER_LONG and MAX_LONG. These values are used in the generation of
the sparse primary keys in the order and lineitem tables. The code has
been structured to run on any machine supporting a 32 bit long, but
may be slightly more efficient on machines that are able to make use of
a larger native type.

7. Known Porting Problems

The current codeline will not compile under SunOS 4.1. Solaris 2.4 and later
are supported, and anyone wishing to use DBGEN on a Sun platform is
encouraged to use one of these OS releases.


8. Reporting Problems

The code provided with TPC-H/TPC-R has been written to be easily portable,
and has been tested on a wide variety of platforms. If you have any
trouble porting the code to your platform, please help us to correct
the problem in a later release by sending the following information
to the TPC D subcommittee:

Computer Make and Model
Compiler Type and Revision Number
Brief Description of the problem
Suggested modification to correct the problem

5 changes: 5 additions & 0 deletions ydb/library/benchmarks/gen/tpch-dbgen/answers/q1.out
@@ -0,0 +1,5 @@
l|l|sum_qty |sum_base_price |sum_disc_price |sum_charge |avg_qty |avg_price |avg_disc |count_order
A|F|37734107.00|56586554400.73|53758257134.87|55909065222.83|25.52|38273.13|0.05| 1478493
N|F|991417.00|1487504710.38|1413082168.05|1469649223.19|25.52|38284.47|0.05| 38854
N|O|74476040.00|111701729697.74|106118230307.61|110367043872.50|25.50|38249.12|0.05| 2920374
R|F|37719753.00|56568041380.90|53741292684.60|55889619119.83|25.51|38250.85|0.05| 1478870
21 changes: 21 additions & 0 deletions ydb/library/benchmarks/gen/tpch-dbgen/answers/q10.out
@@ -0,0 +1,21 @@
c_custkey |c_name |revenue |c_acctbal |n_name |c_address |c_phone |c_comment
57040|Customer#000057040 |734235.25|632.87|JAPAN |Eioyzjf4pp |22-895-641-3466|sits. slyly regular requests sleep alongside of the regular inst
143347|Customer#000143347 |721002.69|2557.47|EGYPT |1aReFYv,Kw4 |14-742-935-3718|ggle carefully enticing requests. final deposits use bold, bold pinto beans. ironic, idle re
60838|Customer#000060838 |679127.31|2454.77|BRAZIL |64EaJ5vMAHWJlBOxJklpNc2RJiWE |12-913-494-9813| need to boost against the slyly regular account
101998|Customer#000101998 |637029.57|3790.89|UNITED KINGDOM |01c9CILnNtfOQYmZj |33-593-865-6378|ress foxes wake slyly after the bold excuses. ironic platelets are furiously carefully bold theodolites
125341|Customer#000125341 |633508.09|4983.51|GERMANY |S29ODD6bceU8QSuuEJznkNaK |17-582-695-5962|arefully even depths. blithely even excuses sleep furiously. foxes use except the dependencies. ca
25501|Customer#000025501 |620269.78|7725.04|ETHIOPIA | W556MXuoiaYCCZamJI,Rn0B4ACUGdkQ8DZ |15-874-808-6793|he pending instructions wake carefully at the pinto beans. regular, final instructions along the slyly fina
115831|Customer#000115831 |596423.87|5098.10|FRANCE |rFeBbEEyk dl ne7zV5fDrmiq1oK09wV7pxqCgIc|16-715-386-3788|l somas sleep. furiously final deposits wake blithely regular pinto b
84223|Customer#000084223 |594998.02|528.65|UNITED KINGDOM |nAVZCs6BaWap rrM27N 2qBnzc5WBauxbA |33-442-824-8191| slyly final deposits haggle regular, pending dependencies. pending escapades wake
54289|Customer#000054289 |585603.39|5583.02|IRAN |vXCxoCsU0Bad5JQI ,oobkZ |20-834-292-4707|ely special foxes are quickly finally ironic p
39922|Customer#000039922 |584878.11|7321.11|GERMANY |Zgy4s50l2GKN4pLDPBU8m342gIw6R |17-147-757-8036|y final requests. furiously final foxes cajole blithely special platelets. f
6226|Customer#000006226 |576783.76|2230.09|UNITED KINGDOM |8gPu8,NPGkfyQQ0hcIYUGPIBWc,ybP5g, |33-657-701-3391|ending platelets along the express deposits cajole carefully final
922|Customer#000000922 |576767.53|3869.25|GERMANY |Az9RFaut7NkPnc5zSD2PwHgVwr4jRzq |17-945-916-9648|luffily fluffy deposits. packages c
147946|Customer#000147946 |576455.13|2030.13|ALGERIA |iANyZHjqhyy7Ajah0pTrYyhJ |10-886-956-3143|ithely ironic deposits haggle blithely ironic requests. quickly regu
115640|Customer#000115640 |569341.19|6436.10|ARGENTINA |Vtgfia9qI 7EpHgecU1X |11-411-543-4901|ost slyly along the patterns; pinto be
73606|Customer#000073606 |568656.86|1785.67|JAPAN |xuR0Tro5yChDfOCrjkd2ol |22-437-653-6966|he furiously regular ideas. slowly
110246|Customer#000110246 |566842.98|7763.35|VIETNAM |7KzflgX MDOq7sOkI |31-943-426-9837|egular deposits serve blithely above the fl
142549|Customer#000142549 |563537.24|5085.99|INDONESIA |ChqEoK43OysjdHbtKCp6dKqjNyvvi9 |19-955-562-2398|sleep pending courts. ironic deposits against the carefully unusual platelets cajole carefully express accounts.
146149|Customer#000146149 |557254.99|1791.55|ROMANIA |s87fvzFQpU |29-744-164-6487| of the slyly silent accounts. quickly final accounts across the
52528|Customer#000052528 |556397.35|551.79|ARGENTINA |NFztyTOR10UOJ |11-208-192-3205| deposits hinder. blithely pending asymptotes breach slyly regular re
23431|Customer#000023431 |554269.54|3381.86|ROMANIA |HgiV0phqhaIa9aydNoIlb |29-915-458-2654|nusual, even instructions: furiously stealthy n