MySQL Advanced Query Techniques

Learn advanced MySQL query techniques and master methods for querying complex data.

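1. Aggregate Functions

Aggregate functions perform a calculation over a set of values and return a single value. They are commonly used for statistics and data analysis. MySQL provides many aggregate functions:

1.1 Common Aggregate Functions

  • COUNT(): counts rows
  • SUM(): returns the sum of numeric values
  • AVG(): returns the average of numeric values
  • MAX(): returns the maximum value
  • MIN(): returns the minimum value

1.2 Aggregate Function Examples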

-- Count all users
SELECT COUNT(*) AS user_count FROM users;

-- Count users with a non-NULL email (COUNT(column) skips NULL values)
SELECT COUNT(email) AS user_count FROM users;

-- Calculate the total salary
SELECT SUM(salary) AS total_salary FROM employees;

-- Calculate the average salary
SELECT AVG(salary) AS avg_salary FROM employees;

-- Get the highest salary
SELECT MAX(salary) AS max_salary FROM employees;

-- Get the lowest salary
SELECT MIN(salary) AS min_salary FROM employees;
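
The examples above touch on NULL handling only for COUNT. As a minimal sketch, the query below shows how the same rule affects other aggregates. It assumes the employees table has a department column (as in the later sections) and a nullable bonus column; bonus is hypothetical and used only for illustration.

```sql
-- `bonus` is a hypothetical nullable column, not part of the original schema.
SELECT COUNT(*)                   AS all_rows,             -- every row
       COUNT(bonus)               AS rows_with_bonus,      -- rows where bonus IS NOT NULL
       COUNT(DISTINCT department) AS distinct_departments, -- each department counted once
       AVG(bonus)                 AS avg_of_non_null,      -- NULL rows are left out entirely
       AVG(COALESCE(bonus, 0))    AS avg_null_as_zero      -- NULL rows counted as 0
FROM employees;
```

The last two columns usually differ: AVG(bonus) divides by the number of non-NULL rows, while the COALESCE version divides by the total row count.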

2. Grouped Queries

A grouped query groups rows by one or more columns and then applies aggregate functions to each group. Grouping is done with the GROUP BY clause.

2.1 Basic Grouped Queries

-- Group by department and count the employees in each department
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

-- Group by department and calculate the average salary of each department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Group by department and gender and count the employees of each gender in each department
SELECT department, gender, COUNT(*) AS employee_count
FROM employees
GROUP BY department, gender;

2.2 Filtering Groups

Use the HAVING clause to filter grouped results (similar to WHERE, but applied after grouping):

-- Keep departments with more than 5 employees
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

-- Keep departments whose average salary is above 8000
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 8000;
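
WHERE and HAVING can appear in the same statement, which makes the difference between them easier to see. The sketch below assumes the employees table also has the hire_date column used later in this tutorial; the cutoff date is an arbitrary illustration.

```sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2023-01-01'   -- row filter: applied before grouping
GROUP BY department
HAVING AVG(salary) > 8000         -- group filter: applied after aggregation
ORDER BY avg_salary DESC;
```

Conditions that do not involve an aggregate generally belong in WHERE, so that fewer rows reach the grouping step.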

2.3 Sorting Groups

-- Group by department, calculate the average salary, and sort by average salary in descending order
SELECT department, AVG(salary) AS avg_salary 
FROM employees 
GROUP BY department 
ORDER BY avg_salary DESC;

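3. Subqueries

A subquery is a query nested inside another SQL statement; it is also called an inner query. A subquery can return a single value, multiple values, or a result set.

3.1 Single-Row Subqueries

A subquery that returns a single value, typically used in comparison operations:

-- Find employees whose salary is above the average salary
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Find employees in the same department as 张三
SELECT * FROM employees
WHERE department = (SELECT department FROM employees WHERE name = '张三');

-- Find the employee(s) with the highest salary
SELECT * FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);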
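
A comparison such as = (subquery) only works while the subquery returns at most one row. If, for example, two employees share the name 张三, MySQL rejects the query with error 1242 (Subquery returns more than 1 row). One possible way to guard against that, sketched here under the same assumed employees table, is to switch to IN:

```sql
-- IN accepts any number of rows from the subquery,
-- so duplicate names no longer cause error 1242.
SELECT * FROM employees
WHERE department IN (SELECT department FROM employees WHERE name = '张三');
```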

3.2 Multi-Row Subqueries

A subquery that returns multiple values, typically used with operators such as IN, ANY, and ALL:

-- Find employees in the Technology or Sales department
-- (assumes employees.department stores the department name, as in the queries below)
SELECT * FROM employees
WHERE department IN (SELECT department_name FROM departments WHERE department_name IN ('Technology', 'Sales'));

-- Find employees in other departments who earn more than at least one Technology employee
SELECT * FROM employees
WHERE department != 'Technology'
AND salary > ANY (SELECT salary FROM employees WHERE department = 'Technology');

-- Find employees in other departments who earn more than every Technology employee
SELECT * FROM employees
WHERE department != 'Technology'
AND salary > ALL (SELECT salary FROM employees WHERE department = 'Technology');
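
One way to read ANY and ALL is through MIN and MAX. The sketch below rewrites the two queries above in that form. It assumes the Technology department has at least one employee and that salary is never NULL; outside those assumptions the results can differ (for instance, > ALL over an empty set is true for every row, while > MAX over an empty set is not).

```sql
-- "> ANY (set)" behaves like "> MIN(set)": above the lowest Technology salary
SELECT * FROM employees
WHERE department != 'Technology'
  AND salary > (SELECT MIN(salary) FROM employees WHERE department = 'Technology');

-- "> ALL (set)" behaves like "> MAX(set)": above the highest Technology salary
SELECT * FROM employees
WHERE department != 'Technology'
  AND salary > (SELECT MAX(salary) FROM employees WHERE department = 'Technology');
```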

3.3 Correlated Subqueries

The subquery references a column of the outer query, which makes it correlated:

-- Find employees whose salary is above the average salary of their own department
SELECT e1.* FROM employees e1
WHERE e1.salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department = e1.department);

-- Find the earliest-hired employee(s) in each department
SELECT e1.* FROM employees e1
WHERE e1.hire_date = (SELECT MIN(hire_date) FROM employees e2 WHERE e2.department = e1.department);
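
A correlated subquery is conceptually evaluated once per outer row. The same result can often be expressed as a join against a grouped derived table, which computes each department's average only once. The following is a sketch of that alternative for the first query above, not a claim about which form the optimizer runs faster; the alias names are illustrative.

```sql
SELECT e.*
FROM employees e
JOIN (SELECT department, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department) dept_avg        -- one row per department
  ON dept_avg.department = e.department
WHERE e.salary > dept_avg.avg_salary;
```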

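3.4 Subqueries as Derived Tables

The result of a subquery can be used as a temporary (derived) table:

-- Get the average salary and employee count of each department
-- (assumes employees.department stores the department name, as in the earlier examples)
SELECT d.department_name, t.avg_salary, t.employee_count
FROM departments d
JOIN (SELECT department, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
      FROM employees
      GROUP BY department) t
ON d.department_name = t.department;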
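
In MySQL 8.0 and later, a derived table can also be written as a common table expression (WITH clause), which names the subquery up front. The sketch below restates the query above in that form, under the assumption that employees.department holds the department name; dept_stats is an illustrative name.

```sql
WITH dept_stats AS (
    SELECT department,
           AVG(salary) AS avg_salary,
           COUNT(*)    AS employee_count
    FROM employees
    GROUP BY department
)
SELECT d.department_name, s.avg_salary, s.employee_count
FROM departments d
JOIN dept_stats s ON d.department_name = s.department;
```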

4. Union Queries

A union query merges the result sets of two or more SELECT statements using the UNION or UNION ALL operator.

4.1 The UNION Operator

Merges the result sets and removes duplicate rows:

-- Merge the results from two tables
SELECT name, age FROM students
UNION
SELECT name, age FROM teachers;

4.2 The UNION ALL Operator

Merges the result sets but keeps duplicate rows:

-- Merge the results from two tables (duplicates kept)
SELECT name, age FROM students
UNION ALL
SELECT name, age FROM teachers;

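4.3 Requirements for Union Queries

  • Every SELECT statement must have the same number of columns
  • Corresponding columns must have compatible data types
  • The columns must be in the same order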
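
The requirements above can be illustrated with a query that adds a literal column to each branch and sorts the combined result. This is a sketch using the students and teachers tables from the earlier examples; the role column and the LIMIT value are illustrative additions.

```sql
-- Both branches return three columns in the same order: name, age, role.
-- Column names in the result come from the first SELECT.
(SELECT name, age, 'student' AS role FROM students)
UNION ALL
(SELECT name, age, 'teacher' AS role FROM teachers)
ORDER BY age DESC, name   -- applies to the combined result
LIMIT 10;
```

Wrapping each branch in parentheses makes it explicit that ORDER BY and LIMIT apply to the whole union rather than to the last SELECT.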

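5. Advanced WHERE Clauses

5.1 Regular Expressions

Use the REGEXP operator for regular-expression matching:

-- Find users whose email ends with gmail.com
-- (the backslash is doubled: the string parser consumes one, the regex engine needs the other)
SELECT * FROM users WHERE email REGEXP 'gmail\\.com$';

-- Find users whose username starts with a and ends with e
SELECT * FROM users WHERE username REGEXP '^a.*e$';

-- Find records whose phone number has the format XXX-XXXX-XXXX
SELECT * FROM contacts WHERE phone REGEXP '^[0-9]{3}-[0-9]{4}-[0-9]{4}$';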
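
Whether REGEXP treats upper and lower case as equal depends on the collation of the operands. In MySQL 8.0 and later, the REGEXP_LIKE() function accepts an optional match-type argument that sets this explicitly. A minimal sketch against the same users table:

```sql
-- 'c' forces case-sensitive matching: matches 'alice' but not 'Alice'
SELECT * FROM users WHERE REGEXP_LIKE(username, '^a.*e$', 'c');

-- 'i' forces case-insensitive matching: matches both 'alice' and 'Alice'
SELECT * FROM users WHERE REGEXP_LIKE(username, '^a.*e$', 'i');
```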

5.2 Advanced Pattern Matching with LIKE

-- Find users whose username contains admin
SELECT * FROM users WHERE username LIKE '%admin%';

-- Find users whose username starts with a
SELECT * FROM users WHERE username LIKE 'a%';

-- Find users whose username ends with e
SELECT * FROM users WHERE username LIKE '%e';

-- Find users whose username is exactly 5 characters long
SELECT * FROM users WHERE username LIKE '_____';
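
Because % and _ are wildcards in LIKE, matching them literally requires an escape character. The sketch below shows the default backslash escape and the ESCAPE clause; the choice of | as the escape character is arbitrary.

```sql
-- Usernames containing a literal underscore (default escape character is \)
SELECT * FROM users WHERE username LIKE '%\_%';

-- The same match with an explicitly chosen escape character
SELECT * FROM users WHERE username LIKE '%|_%' ESCAPE '|';

-- Usernames containing a literal percent sign
SELECT * FROM users WHERE username LIKE '%|%%' ESCAPE '|';
```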

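6. Window Functions

Window functions (MySQL 8.0+) perform calculations over a subset of the result set without changing the number of rows in the result.

6.1 Ranking Functions

-- Rank by salary, descending (ties share a rank; later ranks skip numbers)
-- (rank and dense_rank are reserved words in MySQL 8.0, so the aliases use other names)
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM employees;

-- Rank by salary, descending (ties share a rank; later ranks do not skip numbers)
SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank FROM employees;

-- Number rows by salary, descending (sequential row numbers; ties are not grouped)
SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num FROM employees;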
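
The difference between the three ranking functions is easiest to see side by side on data that contains a tie. The sketch below builds four illustrative rows inline with a common table expression, so it does not depend on any existing table; the names and salaries are made up.

```sql
WITH sample (name, salary) AS (
    SELECT 'A', 10000 UNION ALL
    SELECT 'B', 9000  UNION ALL
    SELECT 'C', 9000  UNION ALL
    SELECT 'D', 8000
)
SELECT name, salary,
       RANK()       OVER w AS salary_rank,        -- 1, 2, 2, 4
       DENSE_RANK() OVER w AS salary_dense_rank,  -- 1, 2, 2, 3
       ROW_NUMBER() OVER w AS row_num             -- 1, 2, 3, 4
FROM sample
WINDOW w AS (ORDER BY salary DESC);
```

B and C tie at 9000, so they share rank 2 under both RANK() and DENSE_RANK(), while ROW_NUMBER() assigns them 2 and 3 in an unspecified order.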

6.2 Partitioned Window Functions

-- Partition by department and rank by salary (descending) within each department
SELECT name, department, salary, 
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank 
FROM employees;
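
Aggregate functions can also be used as window functions, which shows what "without changing the number of rows" means in practice: every employee row is kept, and the department-level value is attached to it. A minimal sketch on the same assumed employees table:

```sql
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department)          AS dept_avg_salary,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
```

Compare this with GROUP BY department, which would return one row per department instead of one row per employee.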

7. Complex Query Examples

7.1 Complete Example

-- Create the sample data
CREATE DATABASE IF NOT EXISTS company_db;
USE company_db;

CREATE TABLE IF NOT EXISTS departments (
    id INT PRIMARY KEY AUTO_INCREMENT,
    department_name VARCHAR(50) NOT NULL
);

CREATE TABLE IF NOT EXISTS employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    department_id INT NOT NULL,
    salary DECIMAL(10, 2) NOT NULL,
    hire_date DATE NOT NULL,
    FOREIGN KEY (department_id) REFERENCES departments(id)
);

-- Insert data
INSERT INTO departments (department_name) VALUES
('Technology'),
('Sales'),
('Human Resources'),
('Finance');

INSERT INTO employees (name, department_id, salary, hire_date) VALUES
('张三', 1, 9000.00, '2023-01-15'),
('李四', 2, 6500.00, '2023-02-20'),
('王五', 1, 10000.00, '2023-03-10'),
('赵六', 3, 5000.00, '2023-04-05'),
('钱七', 1, 8500.00, '2023-05-18'),
('孙八', 2, 7000.00, '2023-06-22'),
('周九', 4, 6000.00, '2023-07-30'),
('吴十', 2, 7500.00, '2023-08-15');

-- 1. Average salary and employee count for each department
SELECT d.department_name, 
       COUNT(e.id) AS employee_count, 
       AVG(e.salary) AS avg_salary 
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.department_name;

-- 2. Employees whose salary is above the average of their own department
SELECT e1.name, d.department_name, e1.salary,
       (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e1.department_id) AS dept_avg_salary
FROM employees e1
JOIN departments d ON e1.department_id = d.id
WHERE e1.salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e1.department_id);

-- 3. For each department, get the highest-paid employee(s)
SELECT d.department_name, e.name, e.salary
FROM departments d
JOIN employees e ON d.id = e.department_id
WHERE (e.department_id, e.salary) IN (
    SELECT department_id, MAX(salary) 
    FROM employees 
    GROUP BY department_id
);

-- 4. Rank employees within each department using a window function
SELECT d.department_name, e.name, e.salary,
       RANK() OVER (PARTITION BY d.department_name ORDER BY e.salary DESC) AS dept_rank
FROM departments d
JOIN employees e ON d.id = e.department_id;
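
A window function cannot be used directly in WHERE, because it is evaluated after the WHERE clause. To keep only the top rows of each department, the ranking can be wrapped in a common table expression and filtered outside. The sketch below returns the two highest-paid employees per department for the sample data above; the cutoff of 2 and the names ranked and rn are illustrative.

```sql
WITH ranked AS (
    SELECT d.department_name, e.name, e.salary,
           ROW_NUMBER() OVER (PARTITION BY e.department_id
                              ORDER BY e.salary DESC, e.id) AS rn  -- e.id breaks salary ties
    FROM employees e
    JOIN departments d ON d.id = e.department_id
)
SELECT department_name, name, salary
FROM ranked
WHERE rn <= 2
ORDER BY department_name, salary DESC;
```

With the sample rows, department 1 yields 王五 and 张三, department 2 yields 吴十 and 孙八, and the two single-person departments each yield their only employee.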

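Practice Exercises

  1. Create a database named school_db
  2. In the database, create a students table with fields such as id, name, class, score, and gender
  3. Insert 10 student records into the table, spread across 3 different classes
  4. Use an aggregate function to calculate the average score of all students
  5. Group by class and calculate the average score and number of students in each class
  6. Find the highest-scoring student in each class
  7. Use a subquery to find students whose score is above the school-wide average
  8. Use a union query to merge the student information of different classes
  9. Use a window function to rank students by score
  10. Use a regular expression to find students whose name contains "小"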