---
title: "ClickHouse Data Queries"
date: 2020-10-08T19:27:00+08:00
lastmod: 2020-10-08T22:17:00+08:00
tags: []
categories: ["clickhouse"]
---
# Query Tips
- **Avoid `SELECT *`**: ClickHouse is column-oriented, so every extra column means more data read from disk
# WITH
- **A subquery in WITH must return exactly one row** (its result is used as a scalar value)
- Define a variable
```sql
WITH 10 AS var_name SELECT ...
```
- Call a function and name its result
```sql
WITH SUM(column_name) AS with_name SELECT ...
```
- Define a subquery
```sql
WITH (
    SELECT ...
) AS with_name
SELECT ...
```
- WITH clauses can be nested inside subqueries
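A runnable sketch of the WITH forms above, assuming a ClickHouse version that accepts `WITH <expr> AS <alias>`. It uses the built-in `numbers()` table function so no table is needed; `threshold`, `max_n`, and `total` are illustrative aliases.

```sql
-- Constant and scalar subquery in WITH (the subquery must return exactly one row)
WITH
    10 AS threshold,
    (SELECT max(number) FROM numbers(100)) AS max_n
SELECT number, max_n
FROM numbers(20)
WHERE number > threshold
ORDER BY number
LIMIT 3;
-- 11 99
-- 12 99
-- 13 99

-- Alias for an aggregate expression
WITH sum(number) AS total
SELECT total
FROM numbers(10);
-- 45
```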
# FROM
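- The source can be a table, a table function, or a subquery
- The FINAL modifier forces data parts to be merged at query time before results are returned; it lowers performance and should be avoided where possible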
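A sketch of the three kinds of FROM source and of what FINAL changes. `demo_replacing` is a hypothetical table created only for this illustration; ReplacingMergeTree is chosen because FINAL is most often used with it to collapse rows that share a sorting key.

```sql
-- 1. Table function as the source
SELECT count() FROM numbers(1000);                         -- 1000

-- 2. Subquery as the source
SELECT avg(n) FROM (SELECT number AS n FROM numbers(10));  -- 4.5

-- 3. Table as the source, with and without FINAL (hypothetical table)
CREATE TABLE demo_replacing
(
    id UInt32,
    v  String
)
ENGINE = ReplacingMergeTree()
ORDER BY id;

INSERT INTO demo_replacing VALUES (1, 'old');
INSERT INTO demo_replacing VALUES (1, 'new');  -- second insert lands in a separate data part

SELECT count() FROM demo_replacing;        -- may still be 2 until a background merge runs
SELECT count() FROM demo_replacing FINAL;  -- 1: rows with the same id are merged at query time
```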
# SAMPLE
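- Returns a sample of the data to reduce query load; suited to approximate queries
- Only works on MergeTree-family tables whose definition declares a SAMPLE BY sampling expression
- The virtual column `_sample_factor` holds the sampling coefficient (the inverse of the sampled fraction, e.g. 10 for SAMPLE 0.1)
- No sampling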
```sql
SELECT ... SAMPLE 0
-- or
SELECT ... SAMPLE 1
```
- SAMPLE factor: factor is the sampling fraction, between 0 and 1
```sql
SELECT ... SAMPLE 0.1
-- or
SELECT ... SAMPLE 1/10
```
- SAMPLE rows: samples an **approximate** number of rows; the value must be greater than 1
```sql
SELECT ... SAMPLE 10000
```
- SAMPLE factor OFFSET n: skips the first n of the data (as a fraction, e.g. 0.5 = 50%), then samples with factor; both values lie between 0 and 1
```sql
SELECT ... SAMPLE 0.4 OFFSET 0.5
```
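A sketch of SAMPLE on a hypothetical `demo_hits` table. The sampling expression must evaluate to an unsigned integer and be part of the sorting key, which is why the table sorts and samples by `intHash32(user_id)`; multiplying by `_sample_factor` scales a sampled sum back up to an estimate of the full total.

```sql
-- Hypothetical table: the SAMPLE BY expression is unsigned and contained in ORDER BY
CREATE TABLE demo_hits
(
    user_id UInt64,
    views   UInt32
)
ENGINE = MergeTree()
ORDER BY intHash32(user_id)
SAMPLE BY intHash32(user_id);

INSERT INTO demo_hits SELECT number, 1 FROM numbers(100000);

-- Read roughly 10% of the rows, then scale the aggregate back up
SELECT
    count()                     AS sampled_rows,     -- roughly 10000
    sum(views * _sample_factor) AS estimated_views,  -- roughly 100000
    any(_sample_factor)         AS factor
FROM demo_hits
SAMPLE 0.1;
```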
# ARRAY JOIN
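- Joins a table with one of its own array or Nested columns, unfolding each array element into a separate row
- Supports INNER and LEFT, INNER by default: INNER drops rows whose array is empty, while LEFT keeps them with the element type's default value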
```sql
SELECT ... FROM table_name ARRAY JOIN column_name AS alias_name
SELECT ... FROM table_name LEFT ARRAY JOIN column_name AS alias_name
```
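A runnable comparison of the two ARRAY JOIN kinds on an inline two-row dataset (the `id`, `tags`, and `tag` names are illustrative), showing what happens to a row whose array is empty.

```sql
-- INNER (default): the row with an empty array disappears
SELECT id, tag
FROM
(
    SELECT 1 AS id, ['a', 'b'] AS tags
    UNION ALL
    SELECT 2 AS id, CAST([], 'Array(String)') AS tags
)
ARRAY JOIN tags AS tag
ORDER BY id, tag;
-- 1 a
-- 1 b

-- LEFT: the empty array still yields one row, filled with the default value ''
SELECT id, tag
FROM
(
    SELECT 1 AS id, ['a', 'b'] AS tags
    UNION ALL
    SELECT 2 AS id, CAST([], 'Array(String)') AS tags
)
LEFT ARRAY JOIN tags AS tag
ORDER BY id, tag;
-- 1 a
-- 1 b
-- 2 (empty string)
```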
# JOIN
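## Join strictness
- ALL: the default. When a left-table row matches several right-table rows, all matching rows are returned
- ANY: when a left-table row matches several right-table rows, only the first match is returned
- ASOF: adds a fuzzy (closest-match) join condition; the column it uses must have an ordered type such as an integer, a float, or a date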
```sql
SELECT ... FROM table_a ASOF INNER JOIN table_b USING(key_1, key_2)
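-- key_1 is the equality join key; key_2, the last USING column, is the fuzzy-match column (closest right-side value <= the left-side value)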
```
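A sketch contrasting the three strictness levels on inline data. The column names (`k`, `v`, `sym`, `ts`, `price`) are hypothetical; the ASOF part uses the ON form, where `t.ts >= q.ts` makes each left row take the right row with the largest `ts` not exceeding its own.

```sql
-- Right side: two rows with k = 1 (v = 'x' and v = 'y')
-- ALL (default): both matches are returned, 2 rows
SELECT l.k, r.v
FROM (SELECT toUInt64(1) AS k) AS l
ALL LEFT JOIN (SELECT toUInt64(1) AS k, arrayJoin(['x', 'y']) AS v) AS r
ON l.k = r.k;

-- ANY: only one of the matches is returned, 1 row
SELECT l.k, r.v
FROM (SELECT toUInt64(1) AS k) AS l
ANY LEFT JOIN (SELECT toUInt64(1) AS k, arrayJoin(['x', 'y']) AS v) AS r
ON l.k = r.k;

-- ASOF: left ts = 3, 7; right ts = 0, 2, 4, 6, 8 with price = 0..4
SELECT t.ts, q.ts AS quote_ts, q.price
FROM (SELECT 'A' AS sym, toUInt32(number * 4 + 3) AS ts FROM numbers(2)) AS t
ASOF INNER JOIN
     (SELECT 'A' AS sym, toUInt32(number * 2) AS ts, number AS price FROM numbers(5)) AS q
ON t.sym = q.sym AND t.ts >= q.ts
ORDER BY t.ts;
-- 3 2 1
-- 7 6 3
```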
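## Join types
- INNER: inner join, returns only the rows that match on both sides
- OUTER: outer join
  - LEFT: all left-table rows are returned; matching right-table columns are filled in, and non-matching ones get the column type's default value (or NULL with `join_use_nulls = 1`)
  - RIGHT: the reverse of LEFT
  - FULL: a LEFT join first, then the remaining unmatched right-table rows are added as in RIGHT
- CROSS: cross join, returns the Cartesian product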
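A sketch of how LEFT JOIN fills unmatched rows, on inline data with illustrative names `l`, `r`, `k`, `v`: by default the missing right-side value is the type's default, and the `join_use_nulls` setting switches that to NULL. The last query checks the CROSS JOIN row count.

```sql
-- Left side: k = 0, 1, 2; right side only has k = 1
SELECT l.k, r.v
FROM (SELECT number AS k FROM numbers(3)) AS l
LEFT JOIN (SELECT toUInt64(1) AS k, 'one' AS v) AS r ON l.k = r.k
ORDER BY l.k;
-- 0 ''   (no match: String default)
-- 1 one
-- 2 ''

-- Same join, but unmatched right-side columns become NULL
SELECT l.k, r.v
FROM (SELECT number AS k FROM numbers(3)) AS l
LEFT JOIN (SELECT toUInt64(1) AS k, 'one' AS v) AS r ON l.k = r.k
ORDER BY l.k
SETTINGS join_use_nulls = 1;
-- 0 NULL
-- 1 one
-- 2 NULL

-- CROSS JOIN: 3 x 4 = 12 rows
SELECT count() FROM numbers(3) AS a CROSS JOIN numbers(4) AS b;
```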
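## JOIN optimization
- Big table on the left, small table on the right: the right table is loaded into memory
- JOIN results are not cached, so an identical JOIN runs again in full each time; consider caching results in the application
- When filling in many dimension attributes, use dictionaries (queried with `dictGet`) instead of JOIN
- USING shorthand, for when the join key has the same name in both tables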
```sql
SELECT ... FROM table_1 INNER JOIN table_2 USING (key_1)
```
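A sketch of the "big table left, small table right" rule combined with the USING shorthand. Both sides are generated with `numbers()`; `facts`, `dim`, `k`, and `name` are hypothetical names standing in for a large fact table and a small dimension table.

```sql
-- 1,000,000-row side on the left, 10-row side on the right:
-- only the small right side is built into the in-memory hash table.
-- USING (k): the key appears once in the result, without a table prefix.
SELECT k, name, count() AS c
FROM (SELECT toUInt64(number % 10) AS k FROM numbers(1000000)) AS facts
INNER JOIN (SELECT number AS k, concat('dim_', toString(number)) AS name FROM numbers(10)) AS dim
USING (k)
GROUP BY k, name
ORDER BY k
LIMIT 3;
-- 0 dim_0 100000
-- 1 dim_1 100000
-- 2 dim_2 100000
```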
# PREWHERE
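- Only works with MergeTree-family table engines
- Differs from WHERE in that:
  - it first reads only the columns referenced in PREWHERE and filters on them
  - it then reads the columns listed in SELECT only for the rows that passed the filter
- ClickHouse automatically moves suitable WHERE conditions to PREWHERE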
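A sketch of PREWHERE on a hypothetical `demo_events` table: the narrow `status` column is read and filtered first, and the wide `payload` column is read only for the surviving rows. Whether ClickHouse moves WHERE to PREWHERE automatically is controlled by the `optimize_move_to_prewhere` setting, enabled by default.

```sql
-- Hypothetical table with one narrow filter column and one wide column
CREATE TABLE demo_events
(
    id      UInt64,
    status  UInt8,
    payload String
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO demo_events
SELECT number, number % 100, repeat('x', 100) FROM numbers(100000);

-- status is read first; payload is read only for rows where status = 0
SELECT id, length(payload)
FROM demo_events
PREWHERE status = 0
ORDER BY id
LIMIT 3;
-- 0 100
-- 100 100
-- 200 100
```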
# GROUP BY
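- WITH ROLLUP: rolls the data up along the aggregation keys from right to left, producing group subtotals and finally a grand total with the aggregate functions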
```sql
SELECT table, name, SUM(bytes_on_disk) FROM system.parts
GROUP BY table,name
WITH ROLLUP
ORDER BY table
```
- WITH CUBE: produces subtotals for every combination of the aggregation keys
```sql
SELECT ...
GROUP BY key1,key2,key3, ...
WITH CUBE
...
```
- WITH TOTALS: after the regular aggregation, adds one extra row that aggregates all of the data
```sql
SELECT ...
GROUP BY key1
WITH TOTALS
...
```
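A runnable sketch of the three GROUP BY modifiers on generated numbers; `parity`, `half`, `k`, and `c` are illustrative aliases. The ROLLUP and CUBE keys are strings so that rolled-up key values, which ClickHouse fills with the type's default, show up as empty strings rather than an ambiguous 0.

```sql
-- 12 numbers split by two keys: 4 detail groups of 3 rows each
SELECT
    if(number % 2 = 0, 'even', 'odd') AS parity,
    if(number < 6, 'low', 'high')     AS half,
    count()                           AS c
FROM numbers(12)
GROUP BY parity, half
WITH ROLLUP
ORDER BY parity, half;
-- 7 rows: 4 detail rows (c = 3), 2 per-parity subtotals (half = '', c = 6),
-- 1 grand total (parity = '' and half = '', c = 12)

-- WITH CUBE also adds 2 per-half subtotals (parity = '', c = 6): 9 rows
SELECT
    if(number % 2 = 0, 'even', 'odd') AS parity,
    if(number < 6, 'low', 'high')     AS half,
    count()                           AS c
FROM numbers(12)
GROUP BY parity, half
WITH CUBE
ORDER BY parity, half;

-- WITH TOTALS: groups k = 0, 1, 2 with c = 4, 3, 3, plus a totals row with c = 10
SELECT number % 3 AS k, count() AS c
FROM numbers(10)
GROUP BY k
WITH TOTALS
ORDER BY k;
```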
# HAVING
- Filters the aggregated results a second time; it requires aggregation, which normally means GROUP BY
```sql
SELECT ... GROUP BY ... HAVING ...
```
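A minimal example on generated data (`k` and `c` are illustrative aliases): group 0..9 by remainder mod 3 and keep only the groups with more than 3 rows. Unlike WHERE, which filters rows before aggregation, HAVING filters the aggregated groups.

```sql
SELECT number % 3 AS k, count() AS c
FROM numbers(10)
GROUP BY k
HAVING count() > 3
ORDER BY k;
-- 0 4   (k = 1 and k = 2 have only 3 rows each)
```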
# ORDER BY
- Ascending (ASC) by default
- NULLS LAST (default): other values -> NaN -> NULL
- NULLS FIRST: NULL -> NaN -> other values
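A runnable sketch of the two NULL orderings, using an inline array that mixes regular values, `nan`, and NULL (the column name `x` is illustrative).

```sql
-- Default (NULLS LAST)
SELECT x FROM (SELECT arrayJoin([2, nan, NULL, 1]) AS x) ORDER BY x;
-- 1, 2, nan, NULL

-- NULLS FIRST
SELECT x FROM (SELECT arrayJoin([2, nan, NULL, 1]) AS x) ORDER BY x NULLS FIRST;
-- NULL, nan, 1, 2
```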
# LIMIT BY
- Returns at most the first n rows for each distinct combination of the BY keys
```sql
LIMIT n BY key1,key2 ...
```
- Supports OFFSET
```sql
LIMIT n OFFSET m BY key1,key2 ...
-- shorthand
LIMIT m,n BY key1,key2 ...
```
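A runnable sketch of LIMIT BY on `numbers(10)` grouped by remainder mod 3 (`k` is an illustrative alias). LIMIT BY is applied after ORDER BY, so sorting by `number DESC` within each `k` keeps the largest values per group.

```sql
-- Top 2 per group
SELECT number % 3 AS k, number
FROM numbers(10)
ORDER BY k, number DESC
LIMIT 2 BY k;
-- k = 0: 9, 6 | k = 1: 7, 4 | k = 2: 8, 5

-- Skip 1 row per group, then take 2 (same as LIMIT 2 OFFSET 1 BY k)
SELECT number % 3 AS k, number
FROM numbers(10)
ORDER BY k, number DESC
LIMIT 1, 2 BY k;
-- k = 0: 6, 3 | k = 1: 4, 1 | k = 2: 5, 2
```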
# LIMIT
- Returns the first n rows
```sql
LIMIT n
LIMIT n OFFSET m
LIMIT m,n
```
- Pair it with ORDER BY; otherwise which rows come back is not guaranteed
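A sketch of the three LIMIT forms on `numbers(100)`, each with ORDER BY so the returned rows are deterministic.

```sql
SELECT number FROM numbers(100) ORDER BY number DESC LIMIT 3;           -- 99, 98, 97
SELECT number FROM numbers(100) ORDER BY number DESC LIMIT 3 OFFSET 2;  -- 97, 96, 95
SELECT number FROM numbers(100) ORDER BY number DESC LIMIT 2, 3;        -- 97, 96, 95
```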
# SELECT
- Select the columns whose names match a regular expression
```sql
SELECT COLUMNS('^n'), COLUMNS('p') FROM system.databases
```
# DISTINCT
- Removes duplicate rows
- DISTINCT is executed before ORDER BY
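A minimal sketch (`k` is an illustrative alias): DISTINCT first reduces the ten remainders to three values, and ORDER BY then sorts those three rows.

```sql
SELECT DISTINCT number % 3 AS k
FROM numbers(10)
ORDER BY k DESC;
-- 2, 1, 0
```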
# UNION ALL
- Combines the results of the queries on both sides; can be repeated to combine several queries
```sql
SELECT c1, c2 FROM t1 UNION ALL SELECT c3, c4 FROM t2
```
- Both sides must have the same number of columns with compatible types; result column names come from the left-hand query
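A minimal sketch showing that the result takes its column names from the left query; the aliases `a`, `b`, `c`, `d` are illustrative.

```sql
SELECT 1 AS a, 'x' AS b
UNION ALL
SELECT 2 AS c, 'y' AS d;
-- two rows; the result columns are named a and b
```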
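# SQL execution plan
- Set the log level to DEBUG or TRACE (for example with the `send_logs_level` setting in clickhouse-client) to see the SQL execution log
- Logs are produced only once the SQL actually runs, so for queries that scan a lot of data, add LIMIT
- **Don't use `SELECT *`**
- Use indexes wherever possible to avoid full table scans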
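A sketch of viewing the execution log from clickhouse-client, assuming the `send_logs_level` setting (accepted values include `'debug'` and `'trace'`). The query has to run for real to produce logs, and LIMIT lets it stop reading early; the exact log lines vary by version.

```sql
-- Stream server-side logs for this session at trace level
SET send_logs_level = 'trace';

-- Runs for real, so logs are emitted; LIMIT keeps the work small
SELECT number
FROM numbers(1000000)
WHERE number % 7 = 0
LIMIT 5;
```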