---
title: "ClickHouse Data Queries"
date: 2020-10-08T19:27:00+08:00
lastmod: 2020-10-08T22:17:00+08:00
tags: []
categories: ["clickhouse"]
---

# Query Notes
- **Avoid `SELECT *` queries**

# WITH
- **A subquery in a WITH clause may return only one row**
- Define a variable
```sql
WITH 10 AS var_name SELECT ...
```

- Call a function
```sql
WITH SUM(column_name) AS with_name SELECT ...
```

- Define a subquery
```sql
WITH (
SELECT ...
) AS with_name
SELECT ...
```

- WITH clauses can be nested inside subqueries

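As a rough illustration of the subquery form, the sketch below computes each table's share of total disk usage. It assumes the `system.parts` columns `table`, `bytes_on_disk`, and `active`; the aliases `total_bytes`, `table_bytes`, and `share` are made up for this example.

```sql
-- The scalar subquery returns exactly one row, as WITH requires
WITH (
    SELECT sum(bytes_on_disk)
    FROM system.parts
    WHERE active
) AS total_bytes
SELECT
    table,
    sum(bytes_on_disk) AS table_bytes,
    table_bytes / total_bytes AS share  -- reuses the WITH value like a constant
FROM system.parts
WHERE active
GROUP BY table
ORDER BY table_bytes DESC;
```
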
# FROM
- Accepts tables, table functions, and subqueries
- The FINAL modifier forces a merge at query time; it hurts performance and should be avoided where possible

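The three kinds of source can be sketched as follows. `system.databases` and `numbers()` ship with ClickHouse; `user_profile` and its columns are hypothetical and stand in for a table whose engine benefits from FINAL, such as ReplacingMergeTree.

```sql
-- 1. A regular table
SELECT name FROM system.databases;

-- 2. A table function: numbers(3) yields 0, 1, 2
SELECT number FROM numbers(3);

-- 3. A subquery
SELECT max(n) AS max_n
FROM (SELECT number * 2 AS n FROM numbers(3));

-- FINAL on the hypothetical user_profile table:
-- rows are merged while reading, which is slower than a plain read
SELECT user_id, name FROM user_profile FINAL;
```
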
# SAMPLE
- Returns sampled data to reduce query load; suited to approximate queries
- Works only on MergeTree-family tables that declare a SAMPLE BY sampling expression
- The virtual column `_sample_factor` holds the sampling coefficient
- No sampling
```sql
SELECT ... SAMPLE 0
-- or
SELECT ... SAMPLE 1
```

- SAMPLE factor: factor is the sampling ratio, in the range 0~1
```sql
SELECT ... SAMPLE 0.1
-- or
SELECT ... SAMPLE 1/10
```

- SAMPLE rows: samples **approximately** that many rows; the value must be greater than 1
```sql
SELECT ... SAMPLE 10000
```

- SAMPLE factor OFFSET n: skips the first n\*100% of the data, then samples by factor; both values are in the range 0~1
```sql
SELECT ... SAMPLE 0.4 OFFSET 0.5
```

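A minimal sketch of a sampled table and two approximate counts follows. The table `hits_sampled` and its columns are hypothetical; the one hard requirement illustrated is that the SAMPLE BY expression also appears in the sorting key.

```sql
-- Hypothetical table: the sampling expression must be part of the primary key
CREATE TABLE hits_sampled
(
    event_date Date,
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, intHash32(user_id))
SAMPLE BY intHash32(user_id);

-- Ratio sampling: read about 10% of the data and scale the count by hand
SELECT count() * 10 AS approx_hits
FROM hits_sampled SAMPLE 0.1;

-- Row-count sampling: the ratio is not known up front,
-- so let _sample_factor do the scaling
SELECT sum(_sample_factor) AS approx_hits
FROM hits_sampled SAMPLE 10000;
```
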
# ARRAY JOIN
- Joins a table with one of its own array or nested columns, expanding the array or nested column into multiple rows
- Supports INNER and LEFT; INNER is the default
```sql
SELECT ... FROM table_name ARRAY JOIN column_name AS alias_name
SELECT ... FROM table_name LEFT ARRAY JOIN column_name AS alias_name
```

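The difference between the two modes shows up on rows whose array is empty. The sketch below uses a throwaway Memory table; the name `arrays_test` and its data are illustrative only.

```sql
CREATE TABLE arrays_test (s String, arr Array(UInt8)) ENGINE = Memory;

INSERT INTO arrays_test VALUES ('Hello', [1, 2]), ('World', [3, 4, 5]), ('Goodbye', []);

-- INNER (default): one output row per array element;
-- 'Goodbye' has an empty array and disappears from the result
SELECT s, arr, a
FROM arrays_test
ARRAY JOIN arr AS a;

-- LEFT: 'Goodbye' is kept, with a set to the type default (0)
SELECT s, arr, a
FROM arrays_test
LEFT ARRAY JOIN arr AS a;
```
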
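# JOIN
## Join Strictness
- ALL: the default; when a left-table row matches several right-table rows, all matching right-table rows are returned
- ANY: when a left-table row matches several right-table rows, only the first matching right-table row is returned
- ASOF: adds a fuzzy (closest-match) join condition; the column used must have an ordered type such as integer, float, or date
```sql
SELECT ... FROM table_a ASOF INNER JOIN table_b USING(key_1, key_2)
-- key_1 is the join key; key_2 is the fuzzy-match column
```
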
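A common use of ASOF is attaching the most recent earlier record to each row. The following is a sketch under assumed names: the tables `trades` and `quotes` and all of their columns are hypothetical.

```sql
CREATE TABLE trades (symbol String, ts DateTime, qty UInt32)    ENGINE = Memory;
CREATE TABLE quotes (symbol String, ts DateTime, price Float64) ENGINE = Memory;

-- For every trade, pick the latest quote of the same symbol
-- whose timestamp is not later than the trade's timestamp
SELECT t.symbol, t.ts, t.qty, q.price
FROM trades AS t
ASOF LEFT JOIN quotes AS q
    ON t.symbol = q.symbol   -- exact-match join key
   AND t.ts >= q.ts;         -- closest-match condition
```
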
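## Join Types
- INNER: inner join; returns only the intersection
- OUTER: outer join
  - LEFT: returns every left-table row; matching right-table rows are attached, and where there is no match the right-table columns are filled with their default values
  - RIGHT: the mirror image of LEFT
  - FULL: performs LEFT first, then adds the remaining unmatched right-table rows as RIGHT would

- CROSS: cross join; returns the Cartesian product
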
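The default-value filling of LEFT can be seen without creating any tables. This sketch joins two inline subqueries; the aliases `l`, `r`, `id`, and `name` are arbitrary.

```sql
-- ids 0 and 2 have no match on the right side
SELECT l.id AS id, r.name AS name
FROM (SELECT number AS id FROM numbers(3)) AS l
LEFT JOIN (SELECT toUInt64(1) AS id, 'one' AS name) AS r
    ON l.id = r.id
ORDER BY id;

-- With the default join_use_nulls = 0, unmatched rows get the
-- column's default value (an empty string for name).
-- To get NULL instead:
SET join_use_nulls = 1;
```
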
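## JOIN Query Optimization
- Large table on the left, small table on the right: the right table is loaded into memory
- JOIN queries are not cached; consider implementing a query cache in the application
- When enriching rows with many dimension attributes, prefer dictionaries over JOIN queries
- USING shorthand
```sql
SELECT ... FROM table_1 INNER JOIN table_2 USING key_1
```
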
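Replacing a dimension JOIN with a dictionary lookup might look like the sketch below. The dictionary `user_dim`, its attribute `city`, and the table `events` are all hypothetical; `dictGet` is the built-in lookup function.

```sql
-- Assumes a dictionary user_dim keyed by a UInt64 id
-- with a String attribute named city
SELECT
    user_id,
    dictGet('user_dim', 'city', toUInt64(user_id)) AS city
FROM events
LIMIT 10;
```

The lookup runs per row against the in-memory dictionary, so no right-hand table has to be built for the query.
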
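# PREWHERE
- Works only with MergeTree-family table engines
- How it differs from WHERE:
  - First reads only the columns referenced in PREWHERE and applies the filter
  - Then reads the columns listed in SELECT, only for the data that passed the filter

- ClickHouse automatically moves WHERE conditions to PREWHERE when conditions are suitable
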
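An explicit PREWHERE could be written as in this sketch, where the table `hits` and its columns are hypothetical. The idea is to put the cheap, selective condition in PREWHERE so the wide columns are read only for surviving rows.

```sql
-- event_date is read and filtered first;
-- url and title are read only for the blocks that remain
SELECT url, title
FROM hits
PREWHERE event_date = today()
WHERE url LIKE '%clickhouse%';

-- The automatic WHERE -> PREWHERE rewrite is controlled by this setting
-- (enabled by default)
SET optimize_move_to_prewhere = 1;
```
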
# GROUP BY
- WITH ROLLUP: rolls the data up over the aggregation keys from right to left, producing subtotals and a grand total from the aggregate functions
```sql
SELECT table, name, SUM(bytes_on_disk) FROM system.parts
GROUP BY table,name
WITH ROLLUP
ORDER BY table
```

- WITH CUBE: produces subtotals for every combination of the aggregation keys
```sql
SELECT ...
GROUP BY key1,key2,key3, ...
WITH CUBE
...
```

- WITH TOTALS: after the regular aggregation finishes, adds one extra row summarizing all the data
```sql
SELECT ...
GROUP BY key1
WITH TOTALS
...
```

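Two concrete queries against system tables may make the modifiers easier to try out. They assume the `system.parts` columns `database`, `table`, `bytes_on_disk`, and `active`, and the `system.tables` column `database`; the aliases are arbitrary.

```sql
-- CUBE: subtotals per database, per table name, and a grand total.
-- In subtotal rows the rolled-up keys appear as empty strings.
SELECT database, table, sum(bytes_on_disk) AS bytes
FROM system.parts
WHERE active
GROUP BY database, table WITH CUBE
ORDER BY database, table;

-- TOTALS: the per-database counts plus one extra summary row
SELECT database, count() AS tables
FROM system.tables
GROUP BY database WITH TOTALS;
```
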
# HAVING
- Must be used together with GROUP BY; applies a second filter to the aggregated result
```sql
SELECT ... GROUP BY ... HAVING ...
```

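For instance, the following sketch keeps only tables whose active parts exceed 1 MiB on disk. The threshold and the alias `bytes` are arbitrary choices for illustration.

```sql
SELECT table, sum(bytes_on_disk) AS bytes
FROM system.parts
WHERE active              -- row-level filter, applied before aggregation
GROUP BY table
HAVING bytes > 1048576    -- group-level filter, applied after aggregation
ORDER BY bytes DESC;
```
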
# ORDER BY
- Defaults to ASC (ascending)
- NULLS LAST (default): other values -> NaN -> NULL
- NULLS FIRST: NULL -> NaN -> other values

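A small self-contained query for comparing the two modifiers; the inline array and the alias `x` are illustrative only.

```sql
-- The array mixes ordinary numbers, NaN, and NULL
SELECT x
FROM (SELECT arrayJoin([1, NULL, nan, 3]) AS x)
ORDER BY x ASC NULLS FIRST;

-- Same data with the default placement of NULL and NaN
SELECT x
FROM (SELECT arrayJoin([1, NULL, nan, 3]) AS x)
ORDER BY x ASC NULLS LAST;
```
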
# LIMIT BY
- Returns at most the first n rows of each group
```sql
LIMIT n BY key1,key2 ...
```

- Supports OFFSET
```sql
LIMIT n OFFSET m BY key1,key2 ...
-- shorthand
LIMIT m,n BY key1,key2 ...
```

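The per-group behavior can be checked with `numbers()`. In this sketch the alias `k` is arbitrary; the ten input rows fall into three groups by `number % 3`.

```sql
-- Groups: k=0 -> 0,3,6,9   k=1 -> 1,4,7   k=2 -> 2,5,8
SELECT number % 3 AS k, number
FROM numbers(10)
ORDER BY k, number
LIMIT 2 BY k;
-- keeps (0,0) (0,3) (1,1) (1,4) (2,2) (2,5)

-- Skip the first row of each group, then keep two
SELECT number % 3 AS k, number
FROM numbers(10)
ORDER BY k, number
LIMIT 1, 2 BY k;
-- keeps (0,3) (0,6) (1,4) (1,7) (2,5) (2,8)
```
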
# LIMIT
- Returns the first n rows
```sql
LIMIT n
LIMIT n OFFSET m
LIMIT m,n
```

- Pair it with ORDER BY to guarantee a global order

# SELECT
- Select the columns whose names match a regular expression
```sql
SELECT COLUMNS('^n'), COLUMNS('p') FROM system.databases
```

# DISTINCT
- Removes duplicate rows
- DISTINCT runs first, ORDER BY afterwards

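A short sketch of the order of operations, with `k` as an arbitrary alias: duplicates are removed first, and only the remaining rows are sorted.

```sql
-- number % 3 produces 0,1,2,0,1,2,... ; DISTINCT leaves three rows,
-- which ORDER BY then sorts as 2, 1, 0
SELECT DISTINCT number % 3 AS k
FROM numbers(10)
ORDER BY k DESC;
```
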
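# UNION ALL
- Combines the subqueries on its left and right and returns their results together; it can be repeated to combine more than two queries
```sql
SELECT c1, c2 FROM t1 UNION ALL SELECT c3, c4 FROM t2
```

- Both sides must have the same number of columns with compatible types; the result takes its column names from the left-hand query
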
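A table-free sketch of chaining and column naming; the aliases `id` and `label` are illustrative.

```sql
-- Three queries chained together; the result columns are named
-- id and label, taken from the first query. Row order is not guaranteed.
SELECT 1 AS id, 'a' AS label
UNION ALL
SELECT 2, 'b'
UNION ALL
SELECT 3, 'c';
```
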
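# SQL Execution Plan
- Set the log level to DEBUG or TRACE to see the SQL execution log
- Logs appear only after the SQL has actually run; for large queries, adding LIMIT is recommended
- **Do not use `SELECT *` queries**
- Use indexes wherever possible and avoid full table scans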
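
One way to get these logs in an interactive `clickhouse-client` session is the `send_logs_level` setting, sketched below. The query itself is only a placeholder; any statement works.

```sql
-- Stream server-side logs for this session back to the client
SET send_logs_level = 'trace';

-- The log lines printed alongside the result show, among other things,
-- which parts and how many rows were read
SELECT count()
FROM system.parts
WHERE active;
```
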