I. Dataset Preparation
The dataset can be the one from the official website, or it can be generated with ssb-dbgen.
1. Prepare the data
The tables generated here end up with about 6 billion rows in total, roughly 300 GB of data.
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen/
make
1.1 Generate the data
# -s specifies roughly how many GB of data to generate
$ ./dbgen -s 40 -T c
$ ./dbgen -s 40 -T l
$ ./dbgen -s 40 -T p
$ ./dbgen -s 40 -T s
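After generation it can be worth a quick sanity check of the .tbl files (an optional sketch; the file names are dbgen's defaults in the ssb-dbgen directory):

# row counts and on-disk sizes of the generated files
$ wc -l customer.tbl part.tbl supplier.tbl lineorder.tbl
$ du -sh *.tbl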
1.2 Create the tables
CREATE TABLE customer
(
    C_CUSTKEY       UInt32,
    C_NAME          String,
    C_ADDRESS       String,
    C_CITY          LowCardinality(String),
    C_NATION        LowCardinality(String),
    C_REGION        LowCardinality(String),
    C_PHONE         String,
    C_MKTSEGMENT    LowCardinality(String)
)
ENGINE = MergeTree ORDER BY (C_CUSTKEY);

CREATE TABLE lineorder
(
    LO_ORDERKEY         UInt32,
    LO_LINENUMBER       UInt8,
    LO_CUSTKEY          UInt32,
    LO_PARTKEY          UInt32,
    LO_SUPPKEY          UInt32,
    LO_ORDERDATE        Date,
    LO_ORDERPRIORITY    LowCardinality(String),
    LO_SHIPPRIORITY     UInt8,
    LO_QUANTITY         UInt8,
    LO_EXTENDEDPRICE    UInt32,
    LO_ORDTOTALPRICE    UInt32,
    LO_DISCOUNT         UInt8,
    LO_REVENUE          UInt32,
    LO_SUPPLYCOST       UInt32,
    LO_TAX              UInt8,
    LO_COMMITDATE       Date,
    LO_SHIPMODE         LowCardinality(String)
)
ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);

CREATE TABLE part
(
    P_PARTKEY       UInt32,
    P_NAME          String,
    P_MFGR          LowCardinality(String),
    P_CATEGORY      LowCardinality(String),
    P_BRAND         LowCardinality(String),
    P_COLOR         LowCardinality(String),
    P_TYPE          LowCardinality(String),
    P_SIZE          UInt8,
    P_CONTAINER     LowCardinality(String)
)
ENGINE = MergeTree ORDER BY P_PARTKEY;

CREATE TABLE supplier
(
    S_SUPPKEY       UInt32,
    S_NAME          String,
    S_ADDRESS       String,
    S_CITY          LowCardinality(String),
    S_NATION        LowCardinality(String),
    S_REGION        LowCardinality(String),
    S_PHONE         String
)
ENGINE = MergeTree ORDER BY S_SUPPKEY;
1.3 Import the data
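The INSERT commands below write into a database named db_bench, so the DDL above is assumed to have been run inside that database; a minimal sketch to create it first:

$ clickhouse-client --query "CREATE DATABASE IF NOT EXISTS db_bench"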
$ clickhouse-client --query "INSERT INTO db_bench.customer FORMAT CSV" < customer.tbl
$ clickhouse-client --query "INSERT INTO db_bench.part FORMAT CSV" < part.tbl
$ clickhouse-client --query "INSERT INTO db_bench.supplier FORMAT CSV" < supplier.tbl
$ clickhouse-client --query "INSERT INTO db_bench.lineorder FORMAT CSV" < lineorder.tbl
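Before denormalizing, it can help to confirm the imports landed (an optional sketch; the database name matches the commands above):

$ clickhouse-client --query "SELECT count() FROM db_bench.lineorder"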
1.4 Join the tables into a flat table
This step took about two hours and used 29 GB of memory.
# This operation is memory-hungry, so set the memory limit beforehand.
SET max_memory_usage = 30000000000;

CREATE TABLE lineorder_flat
ENGINE = MergeTree ORDER BY (LO_ORDERDATE, LO_ORDERKEY)
AS SELECT
    l.LO_ORDERKEY AS LO_ORDERKEY,
    l.LO_LINENUMBER AS LO_LINENUMBER,
    l.LO_CUSTKEY AS LO_CUSTKEY,
    l.LO_PARTKEY AS LO_PARTKEY,
    l.LO_SUPPKEY AS LO_SUPPKEY,
    l.LO_ORDERDATE AS LO_ORDERDATE,
    l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
    l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
    l.LO_QUANTITY AS LO_QUANTITY,
    l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
    l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
    l.LO_DISCOUNT AS LO_DISCOUNT,
    l.LO_REVENUE AS LO_REVENUE,
    l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
    l.LO_TAX AS LO_TAX,
    l.LO_COMMITDATE AS LO_COMMITDATE,
    l.LO_SHIPMODE AS LO_SHIPMODE,
    c.C_NAME AS C_NAME,
    c.C_ADDRESS AS C_ADDRESS,
    c.C_CITY AS C_CITY,
    c.C_NATION AS C_NATION,
    c.C_REGION AS C_REGION,
    c.C_PHONE AS C_PHONE,
    c.C_MKTSEGMENT AS C_MKTSEGMENT,
    s.S_NAME AS S_NAME,
    s.S_ADDRESS AS S_ADDRESS,
    s.S_CITY AS S_CITY,
    s.S_NATION AS S_NATION,
    s.S_REGION AS S_REGION,
    s.S_PHONE AS S_PHONE,
    p.P_NAME AS P_NAME,
    p.P_MFGR AS P_MFGR,
    p.P_CATEGORY AS P_CATEGORY,
    p.P_BRAND AS P_BRAND,
    p.P_COLOR AS P_COLOR,
    p.P_TYPE AS P_TYPE,
    p.P_SIZE AS P_SIZE,
    p.P_CONTAINER AS P_CONTAINER
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;
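Once the CTAS finishes, the size of the denormalized table can be checked from system.parts (an optional sketch; the database and table names are the ones used above):

$ clickhouse-client --query "SELECT sum(rows) AS total_rows, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE database = 'db_bench' AND table = 'lineorder_flat' AND active"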
II. Benchmarking
1. Using clickhouse-benchmark
1.1 Basic usage
# Any of the following invocations works
$ clickhouse-benchmark --query ["single query"] [keys]
$ echo "single query" | clickhouse-benchmark [keys]
$ clickhouse-benchmark [keys] <<< "single query"
clickhouse-benchmark [keys] < queries_file;
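For example, a simple single-server run (the query text here is only illustrative):

# run the query 10 times against the local server
$ echo "SELECT count() FROM db_bench.lineorder_flat" | clickhouse-benchmark -i 10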
# Compare the performance of two ClickHouse servers
$ echo "SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000" | clickhouse-benchmark --host=localhost --port=9001 --host=localhost --port=9000 -i 10
1.2 Parameters
--query=QUERY — the query to execute. If this parameter is not passed, clickhouse-benchmark reads queries from standard input.
-c N, --concurrency=N — the number of queries clickhouse-benchmark sends simultaneously. Default: 1.
-d N, --delay=N — the interval between intermediate reports, in seconds (set 0 to disable reports). Default: 1.
-h HOST, --host=HOST — the server host. Default: localhost. In comparison mode you can use several -h keys.
-p N, --port=N — the server port. Default: 9000. In comparison mode you can use several -p keys.
-i N, --iterations=N — the total number of queries. Default: 0 (repeat forever).
-r, --randomize — execute the queries in random order when there is more than one input query.
-s, --secure — use a TLS connection.
-t N, --timelimit=N — the time limit in seconds. clickhouse-benchmark stops sending queries when the specified time limit is reached. Default: 0 (time limit disabled).
--confidence=N — the confidence level for the T-test. Possible values: 0 (80%), 1 (90%), 2 (95%), 3 (98%), 4 (99%), 5 (99.5%). Default: 5. In comparison mode, clickhouse-benchmark performs an independent two-sample Student's t-test to determine whether the two distributions differ at the selected confidence level.
--cumulative — print cumulative data instead of per-interval data.
--database=DATABASE_NAME — the ClickHouse database name. Default: default.
--json=FILEPATH — JSON output. If this key is set, clickhouse-benchmark writes the report to the specified JSON file.
--user=USERNAME — the ClickHouse user name. Default: default.
--password=PSWD — the ClickHouse user password. Default: empty string.
--stacktrace — stack trace output. If this key is set, clickhouse-benchmark prints the stack traces of exceptions.
--stage=WORD — the query processing stage on the server. ClickHouse stops query processing at the specified stage and returns the answer to clickhouse-benchmark. Possible values: complete, fetch_columns, with_mergeable_state. Default: complete.
--help — show the help message.
To apply settings to the queries, pass them as keys of the form --<session setting name>=SETTING_VALUE, for example --max_memory_usage=1048576.
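Putting a few of these keys together (an illustrative sketch; the queries file name and the values are arbitrary):

# 4 concurrent clients, 100 iterations, a JSON report, and a per-query memory cap via a session setting
$ clickhouse-benchmark -c 4 -i 100 --json=report.json --max_memory_usage=10000000000 < queries.sql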
1.3 Interpreting the results
# Number of executed queries, shown in the "Queries executed:" field.
Queries executed: 72 (1800.000%).

# The status string below contains, in order:
# - the endpoint of the ClickHouse server;
# - queries: the number of processed queries;
# - QPS: how many queries the server performed per second during the period specified by --delay;
# - RPS: how many rows the server read per second during the period specified by --delay;
# - MiB/s: how many mebibytes the server read per second during the period specified by --delay;
# - result RPS: how many rows the server placed into the query result per second during the period specified by --delay;
# - result MiB/s: how many mebibytes the server placed into the query result per second during the period specified by --delay.
localhost:9000, queries 2, QPS: 0.156, RPS: 432704682.870, MiB/s: 1370.478, result RPS: 2.185, result MiB/s: 0.000.
# Percentiles of query execution time.
0.000% 0.217 sec.
10.000% 0.217 sec.
20.000% 0.217 sec.
30.000% 0.217 sec.
40.000% 0.217 sec.
50.000% 12.594 sec.
60.000% 12.594 sec.
70.000% 12.594 sec.
80.000% 12.594 sec.
90.000% 12.594 sec.
95.000% 12.594 sec.
99.000% 12.594 sec.
99.900% 12.594 sec.
99.990% 12.594 sec.
2. Basic tests
The benchmark itself is described on the official website, where the specific SQL queries can also be found. I put them into four SQL files with the following contents:
# test1.sql
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM db_bench.lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM db_bench.lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM db_bench.lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

# test2.sql
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM db_bench.lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM db_bench.lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM db_bench.lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;

# test3.sql
SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM db_bench.lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION, S_NATION, year ORDER BY year ASC, revenue DESC;
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM db_bench.lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC, revenue DESC;
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM db_bench.lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC, revenue DESC;
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM db_bench.lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC, revenue DESC;

# test4.sql
SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM db_bench.lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, C_NATION ORDER BY year ASC, C_NATION ASC;
SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM db_bench.lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, S_NATION, P_CATEGORY ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;
SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM db_bench.lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year, S_CITY, P_BRAND ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;

2.1 Test method
clickhouse-benchmark < test1.sql
clickhouse-benchmark < test2.sql
clickhouse-benchmark < test3.sql
clickhouse-benchmark < test4.sql

2.2 Test results
# test1
Queries executed: 921 (30700.000%).

localhost:9000, queries 2, QPS: 5.558, RPS: 263878534.377, MiB/s: 2012.050, result RPS: 5.558, result MiB/s: 0.000.

0.000% 0.091 sec.
10.000% 0.091 sec.
20.000% 0.091 sec.
30.000% 0.091 sec.
40.000% 0.091 sec.
50.000% 0.268 sec.
60.000% 0.268 sec.
70.000% 0.268 sec.
80.000% 0.268 sec.
90.000% 0.268 sec.
95.000% 0.268 sec.
99.000% 0.268 sec.
99.900% 0.268 sec.

# test2
Queries executed: 32 (1066.667%).

localhost:9000, queries 1, QPS: 0.054, RPS: 326066467.053, MiB/s: 2797.293, result RPS: 3.043, result MiB/s: 0.000.

0.000% 18.401 sec.
10.000% 18.401 sec.
20.000% 18.401 sec.
30.000% 18.401 sec.
40.000% 18.401 sec.
50.000% 18.401 sec.
60.000% 18.401 sec.
70.000% 18.401 sec.
80.000% 18.401 sec.
90.000% 18.401 sec.
95.000% 18.401 sec.
99.000% 18.401 sec.
99.900% 18.401 sec.
99.990% 18.401 sec.

# test3
localhost:9000, queries 73, QPS: 0.082, RPS: 340111314.396, MiB/s: 2527.187, result RPS: 15.938, result MiB/s: 0.000.

0.000% 0.182 sec.
10.000% 0.217 sec.
20.000% 0.230 sec.
30.000% 10.547 sec.
40.000% 12.614 sec.
50.000% 14.860 sec.
60.000% 16.560 sec.
70.000% 18.072 sec.
80.000% 18.285 sec.
90.000% 19.915 sec.
95.000% 19.962 sec.
99.000% 20.011 sec.
99.900% 20.059 sec.
99.990% 20.059 sec.

# test4
Queries executed: 3 (100.000%).

localhost:9000, queries 1, QPS: 0.474, RPS: 683988835.693, MiB/s: 9777.042, result RPS: 378.949, result MiB/s: 0.004.

0.000% 2.111 sec.
10.000% 2.111 sec.
20.000% 2.111 sec.
30.000% 2.111 sec.
40.000% 2.111 sec.
50.000% 2.111 sec.
60.000% 2.111 sec.
70.000% 2.111 sec.
80.000% 2.111 sec.
90.000% 2.111 sec.
95.000% 2.111 sec.
99.000% 2.111 sec.
99.900% 2.111 sec.
99.990% 2.111 sec.

2.3 CPU usage
  PID USER   PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME COMMAND
 7031 999    20   0  0.257t 1.470g  99080 S  4656  0.8  3643:13 clickhouse-serv

2.4 Data reads
Conclusion: reads are very fast; both the rows read per second and the volume of data read per second are huge. Reading is very CPU-intensive, but memory usage stays remarkably low.
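The per-query read statistics behind figures like RPS and MiB/s can also be pulled from the server side (a sketch, assuming query logging is enabled on the server):

# last 10 finished queries with their duration, rows read, and bytes read
$ clickhouse-client --query "SELECT query_duration_ms, read_rows, formatReadableSize(read_bytes) AS read FROM system.query_log WHERE type = 'QueryFinish' ORDER BY event_time DESC LIMIT 10"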