HiveQL Basic Syntax

Learn the basic syntax of HiveQL, including database and table creation, data loading, basic queries, and delete operations.

1. HiveQL Introduction

HiveQL (Hive Query Language) is the SQL-like query language provided by Hive. It lets users apply familiar SQL syntax to query and analyze large-scale data stored in the Hadoop distributed storage system.

1.1 Differences between HiveQL and standard SQL

  • HiveQL supports features specific to the Hadoop ecosystem, such as partitioning and bucketing.
  • HiveQL does not support every standard SQL feature: early versions had no transactions, and update and delete operations are restricted.
  • HiveQL queries are translated into MapReduce, Tez, or Spark jobs for execution.
  • HiveQL supports user-defined functions and extensions.
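
To see how a statement is translated into execution stages, Hive can print the plan instead of running the query. The following is a minimal sketch, assuming the employees table from section 3.1 already exists and that the Tez engine is installed on the cluster (both are assumptions, not requirements stated in this tutorial):

```sql
-- Choose the execution engine for the current session (mr, tez, or spark)
SET hive.execution.engine=tez;

-- Print the execution plan without running the query
EXPLAIN
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;
```

The plan lists the stages and operators Hive would run, which helps when checking how a query maps onto the underlying engine.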

1.2 Main features of HiveQL

  • Creating, modifying, and dropping databases and tables
  • Loading and exporting data
  • Querying and analyzing data
  • Aggregating data and computing statistics
  • Joins and subqueries
  • Managing partitions and buckets

2. Database operations

2.1 Creating a database

Basic syntax for creating a database:

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

Example:

-- Create a database named test_db
CREATE DATABASE test_db;

-- Create a database with a comment, location, and properties
CREATE DATABASE IF NOT EXISTS test_db 
COMMENT 'This is a test database'
LOCATION '/user/hive/warehouse/test_db.db'
WITH DBPROPERTIES ('creator'='admin', 'create_date'='2025-01-15');

2.2 Viewing databases

-- List all databases
SHOW DATABASES;

-- List databases whose names match a pattern
SHOW DATABASES LIKE 'test*';

-- Show database details
DESCRIBE DATABASE test_db;

-- Show extended database information (including DBPROPERTIES)
DESCRIBE DATABASE EXTENDED test_db;

2.3 Using a database

-- Switch to the specified database
USE test_db;

2.4 Modifying a database

-- Modify database properties
ALTER DATABASE test_db SET DBPROPERTIES ('creator'='user1');

-- Modify the database location
-- (only changes the default directory for new tables; existing data is not moved)
ALTER DATABASE test_db SET LOCATION '/new/path';

2.5 Dropping a database

-- Drop an empty database
DROP DATABASE test_db;

-- Force-drop a database together with the tables it contains
DROP DATABASE IF EXISTS test_db CASCADE;

3. Table operations

3.1 Creating a table

Basic syntax for creating a table:

CREATE TABLE [IF NOT EXISTS] table_name
(col_name data_type [COMMENT col_comment], ...)
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) 
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)];

Data types

  • Primitive data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, STRING, VARCHAR, CHAR, TIMESTAMP, DATE
  • Complex data types: ARRAY, MAP, STRUCT, UNIONTYPE

Storage formats

  • TEXTFILE: plain-text file format (default)
  • ORC: optimized row-columnar storage format
  • PARQUET: columnar storage format
  • SEQUENCEFILE: binary sequence file format
  • AVRO: serialization format whose schemas are defined in JSON

Example:

-- Create a basic table
CREATE TABLE IF NOT EXISTS employees (
    id INT,
    name STRING,
    age INT,
    department STRING,
    salary DOUBLE
) COMMENT 'Employee information'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/employees';

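-- Create a partitioned table
-- (timestamp and date are reserved words in recent Hive versions, so they are quoted with backticks)
CREATE TABLE IF NOT EXISTS logs (
    id INT,
    user_id STRING,
    action STRING,
    `timestamp` BIGINT
) COMMENT 'User logs'
PARTITIONED BY (`date` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;  -- text format, because LOAD DATA (section 4.1) copies files without converting them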

-- Create a table with complex types
CREATE TABLE IF NOT EXISTS students (
    id INT,
    name STRING,
    courses ARRAY<STRING>,
    scores MAP<STRING, INT>,
    address STRUCT<city:STRING, street:STRING, zip:INT>
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
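
Once a table with complex types exists, its columns can be read with index, key, and dot notation. The sketch below assumes the students table above has been loaded with lines shaped like the sample in the comment; the sample row itself is made up for illustration:

```sql
-- Assumed sample line in the data file (matches the delimiters declared above):
-- 1,Alice,math|physics,math:90|physics:85,Beijing|Main St|100000
SELECT
    name,
    courses[0]     AS first_course,  -- ARRAY elements are indexed from 0
    SIZE(courses)  AS course_count,  -- number of elements in the array
    scores['math'] AS math_score,    -- MAP lookup by key
    address.city   AS city           -- STRUCT field access with dot notation
FROM students;
```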

3.2 Viewing tables

-- List all tables in the current database
SHOW TABLES;

-- List tables whose names match a pattern
SHOW TABLES LIKE 'emp*';

-- Show the table structure
DESCRIBE employees;
DESC employees;

-- Show detailed table information
DESCRIBE EXTENDED employees;
DESC FORMATTED employees;

3.3 Modifying a table

-- Rename a table
-- (the statements below are independent examples and assume the table is still named employees)
ALTER TABLE employees RENAME TO new_employees;

-- Add a column
ALTER TABLE employees ADD COLUMNS (email STRING COMMENT 'Employee email');

-- Change a column (renames age to employee_age)
ALTER TABLE employees CHANGE COLUMN age employee_age INT COMMENT 'Employee age';

-- Replace the entire column list
ALTER TABLE employees REPLACE COLUMNS (
    id INT,
    name STRING,
    age INT,
    email STRING
);

-- Modify table properties
ALTER TABLE employees SET TBLPROPERTIES ('creator'='admin');

3.4 Dropping a table

-- Drop a table
DROP TABLE IF EXISTS employees;

-- Remove all rows from a table (the table structure is kept)
TRUNCATE TABLE employees;
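
DROP and TRUNCATE act on a whole table. Row-level UPDATE and DELETE are restricted to transactional (ACID) tables. The following is a sketch under stated assumptions: the employees_acid table is hypothetical, and the session settings shown are the ones commonly needed, although the exact requirements depend on the Hive version and cluster configuration:

```sql
-- Session settings commonly required for ACID tables (cluster-dependent assumption)
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical transactional table; ACID tables are stored as ORC
CREATE TABLE IF NOT EXISTS employees_acid (
    id INT,
    name STRING,
    department STRING,
    salary DOUBLE
)
CLUSTERED BY (id) INTO 4 BUCKETS  -- bucketing is required for ACID tables before Hive 3
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level changes
UPDATE employees_acid SET salary = salary * 1.1 WHERE department = 'IT';
DELETE FROM employees_acid WHERE id = 1;
```

The CLUSTERED BY clause also illustrates bucketing from the CREATE TABLE syntax in section 3.1: rows are hashed on id into a fixed number of files.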

4. Data loading and export

4.1 Loading data into a table

Loading data from the local file system:

LOAD DATA LOCAL INPATH '/path/to/local/file' 
[OVERWRITE] INTO TABLE table_name 
[PARTITION (partition_column=partition_value, ...)];

Loading data from HDFS:

LOAD DATA INPATH '/path/to/hdfs/file' 
[OVERWRITE] INTO TABLE table_name 
[PARTITION (partition_column=partition_value, ...)];

Example:

-- Load data from the local file system into the employees table
LOAD DATA LOCAL INPATH '/home/user/employees.csv' 
INTO TABLE employees;

-- Load data from HDFS, overwriting the existing data
LOAD DATA INPATH '/user/hive/input/employees.csv' 
OVERWRITE INTO TABLE employees;

-- Load data into a specific partition (date is a reserved word, so it is quoted)
LOAD DATA LOCAL INPATH '/home/user/logs_2025-01-15.txt'
INTO TABLE logs
PARTITION (`date`='2025-01-15');
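
After loading, partitions can be inspected and managed directly. This is a brief sketch, assuming the partitioned logs table from section 3.1 and the partition value used above; the second partition value is only an illustration:

```sql
-- List the partitions that exist for the logs table
SHOW PARTITIONS logs;

-- Filtering on the partition column lets Hive read only the matching partition directory
SELECT user_id, action
FROM logs
WHERE `date` = '2025-01-15';

-- Register an empty partition ahead of loading, then drop it again
ALTER TABLE logs ADD IF NOT EXISTS PARTITION (`date`='2025-01-16');
ALTER TABLE logs DROP IF EXISTS PARTITION (`date`='2025-01-16');
```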

4.2 Inserting data from query results

-- Insert a single row
INSERT INTO TABLE employees VALUES (1, 'John', 30, 'IT', 50000.0);

-- Insert data from a query result
INSERT OVERWRITE TABLE employees_backup 
SELECT * FROM employees WHERE department='IT';

-- Multi-table insert (one scan of the source table feeds several targets)
FROM employees
INSERT OVERWRITE TABLE it_employees SELECT * WHERE department='IT'
INSERT OVERWRITE TABLE hr_employees SELECT * WHERE department='HR';
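
Query results can also be written into a partitioned table, with Hive deriving each row's partition from the data. The sketch below assumes the logs table from section 3.1 and a hypothetical unpartitioned staging table named logs_staging; the column event_date is likewise an assumption:

```sql
-- Enable dynamic partitioning for the session
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- logs_staging is a hypothetical table with columns
-- id, user_id, action, timestamp, and event_date
INSERT OVERWRITE TABLE logs PARTITION (`date`)
SELECT id, user_id, action, `timestamp`, event_date
FROM logs_staging;
```

The partition column must come last in the SELECT list; each distinct event_date value becomes one partition of logs.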

4.3 Exporting data

-- Export query results to an HDFS directory
INSERT OVERWRITE DIRECTORY '/user/hive/output/employees' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
SELECT * FROM employees;

-- Export a table's data and metadata with EXPORT TABLE (the target is an HDFS path, not the local file system)
EXPORT TABLE employees TO '/user/hive/export/employees';
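
EXPORT TABLE writes table data together with its metadata to a path on the cluster file system. To write query results to a directory on the local file system instead, INSERT OVERWRITE accepts the LOCAL keyword. A minimal sketch, in which the directory path is a placeholder:

```sql
-- Write query results as comma-separated text files into a local directory
-- (placeholder path; existing contents of the directory are overwritten)
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_export/employees'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM employees;
```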

5. Basic query operations

5.1 The SELECT statement

Basic syntax:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list [ASC|DESC]]
[LIMIT [offset,] row_count];

Example:

-- Select all columns
SELECT * FROM employees;

-- Select specific columns
SELECT id, name, department FROM employees;

-- Select distinct values
SELECT DISTINCT department FROM employees;

-- Use column aliases
SELECT id AS employee_id, name AS employee_name FROM employees;

-- Filter with conditions
SELECT * FROM employees WHERE department='IT' AND salary > 50000;

-- Sort the results
SELECT * FROM employees ORDER BY salary DESC, name ASC;

-- Limit the number of rows returned
SELECT * FROM employees LIMIT 10;
SELECT * FROM employees LIMIT 5 OFFSET 10;

5.2 Aggregate functions

Common aggregate functions:

  • COUNT(): counts rows
  • SUM(): computes the sum
  • AVG(): computes the average
  • MIN(): returns the minimum value
  • MAX(): returns the maximum value

Example:

-- Count all employees
SELECT COUNT(*) AS total_employees FROM employees;

-- Compute the average salary
SELECT AVG(salary) AS avg_salary FROM employees;

-- Count employees per department
SELECT department, COUNT(*) AS emp_count FROM employees GROUP BY department;

-- Compute the average salary per department and filter on it
SELECT department, AVG(salary) AS avg_salary 
FROM employees 
GROUP BY department 
HAVING AVG(salary) > 50000;
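
Several aggregates can be computed in a single pass over the same grouping. The following sketch assumes the employees table from section 3.1 with its original columns:

```sql
SELECT
    department,
    COUNT(*)              AS emp_count,
    COUNT(DISTINCT age)   AS distinct_ages,
    MIN(salary)           AS min_salary,
    MAX(salary)           AS max_salary,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
```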

5.3 Join queries

Supported join types:

  • INNER JOIN: inner join
  • LEFT JOIN: left outer join
  • RIGHT JOIN: right outer join
  • FULL OUTER JOIN: full outer join
  • CROSS JOIN: cross join (Cartesian product)

Example:

-- Create the departments table
CREATE TABLE departments (
    dept_id INT,
    dept_name STRING,
    location STRING
);

-- Inner join
SELECT e.id, e.name, e.department, d.location 
FROM employees e 
INNER JOIN departments d ON e.department = d.dept_name;

-- Left join
SELECT e.id, e.name, d.dept_name, d.location 
FROM employees e 
LEFT JOIN departments d ON e.department = d.dept_name;

-- Right join
SELECT e.id, e.name, d.dept_name, d.location 
FROM employees e 
RIGHT JOIN departments d ON e.department = d.dept_name;

-- Full outer join
SELECT e.id, e.name, d.dept_name, d.location 
FROM employees e 
FULL OUTER JOIN departments d ON e.department = d.dept_name;
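
The list of join types also includes CROSS JOIN, which takes no join condition. A minimal sketch using the same employees and departments tables:

```sql
-- Pair every employee with every department row (Cartesian product)
SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d;
```

The result has one row per combination, so its size is the product of the two row counts. Some clusters are configured with strict checks that reject Cartesian products, in which case this statement fails until that check is relaxed.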

5.4 Subqueries

Example:

-- Subquery used as a condition
SELECT * FROM employees 
WHERE department IN (SELECT dept_name FROM departments WHERE location='Beijing');

-- Subquery used as a table
SELECT dept_name, emp_count 
FROM (
    SELECT department AS dept_name, COUNT(*) AS emp_count 
    FROM employees 
    GROUP BY department
) AS dept_stats 
WHERE emp_count > 10;

-- EXISTS subquery
SELECT * FROM employees e 
WHERE EXISTS (SELECT 1 FROM departments d WHERE e.department = d.dept_name AND d.location='Shanghai');
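
A derived-table subquery can also supply a single aggregate value to compare against. The sketch below finds employees who earn more than the overall average, assuming the employees table from section 3.1; it is written as a join against a one-row subquery so that it does not rely on scalar subqueries in the WHERE clause, which older Hive versions do not accept:

```sql
SELECT e.id, e.name, e.salary, s.avg_salary
FROM employees e
CROSS JOIN (
    -- One-row result holding the overall average salary
    SELECT AVG(salary) AS avg_salary
    FROM employees
) s
WHERE e.salary > s.avg_salary;
```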

6. Common functions

6.1 String functions

-- String length
SELECT LENGTH(name) FROM employees;

-- String concatenation
SELECT CONCAT(name, ' - ', department) FROM employees;

-- String splitting (returns an array)
SELECT SPLIT('a,b,c', ',');

-- String replacement
SELECT REPLACE('Hello World', 'World', 'Hive');

-- Substring extraction (positions start at 1)
SELECT SUBSTRING(name, 1, 3) FROM employees;

-- Convert to uppercase / lowercase
SELECT UPPER(name), LOWER(name) FROM employees;

6.2 Numeric functions

-- Absolute value
SELECT ABS(-100);

-- Round up / round down
SELECT CEIL(3.14), FLOOR(3.14);

-- Round to a given number of decimal places
SELECT ROUND(3.14159, 2);

-- Random number
SELECT RAND();

-- Exponentiation
SELECT POW(2, 3);

-- Modulo
SELECT MOD(10, 3);

6.3 Date functions

-- Current date / timestamp
SELECT CURRENT_DATE(), CURRENT_TIMESTAMP();

-- Date formatting
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyy-MM-dd HH:mm:ss');

-- Difference between two dates, in days
SELECT DATEDIFF('2025-01-15', '2025-01-01');

-- Add days to a date
SELECT DATE_ADD('2025-01-01', 10);

-- Extract date parts
SELECT YEAR('2025-01-15'), MONTH('2025-01-15'), DAY('2025-01-15');
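
These functions are often combined to turn stored epoch values into calendar dates. The following sketch assumes that the timestamp column of the logs table from section 3.1 holds Unix time in seconds, which this tutorial does not state explicitly:

```sql
SELECT
    id,
    FROM_UNIXTIME(`timestamp`)          AS event_time,  -- formatted as yyyy-MM-dd HH:mm:ss
    TO_DATE(FROM_UNIXTIME(`timestamp`)) AS event_date,  -- date part only
    DATEDIFF(CURRENT_DATE(), TO_DATE(FROM_UNIXTIME(`timestamp`))) AS days_ago
FROM logs;
```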

6.4 Conditional functions

-- CASE expression
SELECT name, salary, 
    CASE 
        WHEN salary > 80000 THEN 'High' 
        WHEN salary > 50000 THEN 'Medium' 
        ELSE 'Low' 
    END AS salary_level 
FROM employees;

-- IF function
SELECT name, IF(salary > 50000, 'High', 'Low') AS salary_level FROM employees;

-- COALESCE function
SELECT COALESCE(email, 'no-email@example.com') FROM employees;

Practice exercises

Exercise 1: Database operations

  1. Create a database named company_db with a suitable comment and properties
  2. List all databases and confirm that company_db has been created
  3. Switch to the company_db database
  4. View the details of company_db
  5. Modify the properties of company_db

Exercise 2: Table operations

  1. In company_db, create an employees table with the columns id, name, age, department, and salary
  2. Create an orders table partitioned by date
  3. View the structure and detailed information of the employees table
  4. Add an email column to the employees table
  5. Rename the employees table to staff

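Exercise 3: Data loading and queries

  1. Create a local CSV file containing employee data
  2. Load the local CSV file into the staff table
  3. Query all employee information
  4. Query the employees in the IT department, sorted by salary in descending order
  5. Compute the number of employees and the average salary for each department
  6. Query the employees whose salary is above the average salary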

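Exercise 4: Using functions

  1. Use string functions to process employee names
  2. Use numeric functions to compute various statistics on employee salaries
  3. Use date functions to process date data
  4. Use conditional functions to classify employee salaries into levels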

7. Summary

This tutorial introduced the basic syntax and common operations of HiveQL, including:

  • Basic HiveQL concepts and how HiveQL differs from standard SQL
  • Creating, viewing, using, and dropping databases
  • Creating, modifying, viewing, and dropping tables
  • Methods for loading and exporting data
  • Basic query operations, including SELECT, WHERE, ORDER BY, and LIMIT
  • Aggregate functions and grouped queries
  • Joins and subqueries
  • Using common functions

After working through this tutorial, you should have a command of basic HiveQL syntax and be able to use HiveQL for basic data querying and analysis.