The TPCH database and the dbgen data generation utility, courtesy of www.tpc.org,
were developed to provide an approach to benchmarking and include:
dbgen utility, a utility to populate the database with a specified amount of data (Scale Factor)
tpch benchmark queries, a set of pre-defined data warehouse queries to run against the database
We will show the details of creating the tpch database and populating it with data
generated by the dbgen utility.
1 Download dbgen
The tpch dbgen utility generates, by default, a set of flat files suitable for loading into
the tpch schema, with their size based on the “Scale Factor” argument.
A scale factor of 1 produces a complete data set of approximately 1 GB,
a scale factor of 10 produces a data set of approximately 10 GB, and so on.
Download the dbgen source code:
$ git clone https://github.com/electrum/tpch-dbgen.git
You need the gcc compiler installed on your machine.
2 Compile dbgen
In the downloaded directory (tpch-dbgen), edit the file makefile.suite and set
the following variables to the appropriate values:
CC=gcc
DATABASE=INFORMIX
MACHINE=LINUX
WORKLOAD=TPCH
Then run the make utility:
$ make -f makefile.suite
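As an alternative to editing the file, make accepts variable assignments on its command line, and these take precedence over the assignments inside the makefile. The following is a minimal sketch, assuming makefile.suite reads the four variables listed above; build_dbgen is a hypothetical helper name used here for illustration and is not part of dbgen.

```shell
#!/bin/sh
# build_dbgen is a hypothetical helper, not part of the dbgen distribution.
# It runs make inside the given source directory and passes the four
# settings on the command line, where they override the values written
# in makefile.suite.
build_dbgen() {
    src_dir="${1:-tpch-dbgen}"
    (
        cd "$src_dir" || exit 1
        make -f makefile.suite \
            CC=gcc DATABASE=INFORMIX MACHINE=LINUX WORKLOAD=TPCH
    )
}
```

Calling build_dbgen tpch-dbgen from the parent directory leaves makefile.suite untouched, which keeps the cloned repository clean.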
3 Test dbgen
Now you are ready to generate the tpch files.
Change to the directory where you want to generate the
tpch files. For example, create a subdirectory under the directory that contains dbgen:
$ mkdir data
$ cd data
Copy the dbgen executable file and the dists.dss file into it:
$ cp ../dbgen .
$ cp ../dists.dss .
Run dbgen for the appropriate database size factor (1 GB in the sample):
$ ./dbgen -s 1
Generation may take a while. When completed, you can see the resulting files.
$ ls -l
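By default dbgen writes one pipe-delimited .tbl file per table, eight in total. A quick sanity check after generation could look like the sketch below; check_tbl_files is a hypothetical helper name, and the row counts it prints are simply the line counts of each file.

```shell
#!/bin/sh
# check_tbl_files is a hypothetical helper, not part of dbgen.
# It prints the line count of each of the eight .tbl files that dbgen
# writes by default and returns non-zero if any of them is missing.
check_tbl_files() {
    data_dir="${1:-.}"
    missing=0
    for table in customer lineitem nation orders part partsupp region supplier; do
        file="$data_dir/$table.tbl"
        if [ -f "$file" ]; then
            # Each line of a .tbl file holds one row, fields separated by '|'.
            rows=$(wc -l < "$file" | tr -d ' ')
            printf '%s %s rows\n' "$table" "$rows"
        else
            printf '%s MISSING\n' "$table" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as check_tbl_files . from inside the data directory; a non-zero exit status means at least one file was not generated.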
As a sample, generation of
TPCH scale 10 on an Intel NUC i7-8550U, 1.9 GHz, with an NVMe disk takes 2 minutes, and its load takes 5 minutes.
4 Scale factor
The database will be sized according to the selected scale factor.
4.1 Table sizes
4.2 Number of rows according to scale
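In the TPC-H specification the nation and region tables have a fixed size, while the other six tables grow linearly with the scale factor. The sketch below prints those nominal row counts for an integer scale factor; expected_rows is a hypothetical helper name, and the lineitem figure is approximate (dbgen produces 6,001,215 rows at scale 1, not exactly 6,000,000).

```shell
#!/bin/sh
# expected_rows is a hypothetical helper, not part of dbgen.
# It prints the nominal TPC-H row count of each table for an integer
# scale factor given as the first argument (default 1).
expected_rows() {
    sf="${1:-1}"
    # region and nation do not depend on the scale factor.
    printf '%-9s %d\n' region   5
    printf '%-9s %d\n' nation   25
    # The remaining tables grow linearly with the scale factor.
    printf '%-9s %d\n' supplier "$((sf * 10000))"
    printf '%-9s %d\n' customer "$((sf * 150000))"
    printf '%-9s %d\n' part     "$((sf * 200000))"
    printf '%-9s %d\n' partsupp "$((sf * 800000))"
    printf '%-9s %d\n' orders   "$((sf * 1500000))"
    # lineitem is approximate; the exact count varies slightly per scale.
    printf '%-9s %d\n' lineitem "$((sf * 6000000))"
}
```

For example, expected_rows 10 gives the figures to compare against wc -l on the generated files for a scale 10 data set.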