Hive Introduction, Installation, and Configuration
1. Hive Basic Concepts
Apache Hive is a data warehouse tool built on top of Hadoop. It provides an SQL-like query language, HiveQL, that lets users query and analyze large-scale data stored in HDFS or other distributed storage systems using familiar SQL syntax.
1.1 Core Features of Hive
- SQL-like syntax: HiveQL closely resembles SQL, lowering the learning curve
- Scalability: built on Hadoop, supports horizontal scaling
- Flexibility: supports many storage formats and compression codecs
- Data warehouse features: supports partitioning, bucketing, indexing, and more
- Integration: integrates seamlessly with the Hadoop ecosystem
1.2 Differences Between Hive and Traditional Databases
| Feature | Hive | Traditional database |
|---|---|---|
| Query language | HiveQL (SQL-like) | SQL |
| Execution engine | MapReduce/Tez/Spark | Built-in execution engine |
| Latency | High (batch processing) | Low (real-time processing) |
| Data scale | PB scale | GB/TB scale |
| Update operations | Limited support | Full support |
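Because HiveQL follows SQL so closely, everyday tasks look familiar. A minimal sketch of a partitioned table and an aggregation query (the `page_views` table and its columns are hypothetical examples, not part of any standard schema):

```sql
-- Hypothetical table over data stored in HDFS
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING,
  view_ts TIMESTAMP
)
PARTITIONED BY (dt STRING)
STORED AS ORC;

-- Standard SQL-style aggregation, executed by Hive as a batch job
SELECT url, COUNT(*) AS views
FROM page_views
WHERE dt = '2024-01-01'
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```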
2. Hive Architecture
Hive uses a layered architecture with the following main components:
2.1 User Interface Layer
- CLI: command-line interface for interacting with Hive directly
- HiveServer2: supports JDBC/ODBC connections, allowing remote access
- Web UI: provides a web interface for management and monitoring
2.2 Driver Layer
- JDBC driver: Java Database Connectivity driver
- ODBC driver: Open Database Connectivity driver
- Thrift client: client based on the Thrift protocol
2.3 Core Service Layer
- Query parser: converts HiveQL into an abstract syntax tree
- Query optimizer: optimizes the query plan to improve execution efficiency
- Execution plan generator: produces an executable job plan
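The output of this parse-optimize-plan pipeline can be inspected with `EXPLAIN`, which prints the execution plan Hive generated for a statement (the table name here is a hypothetical example):

```sql
EXPLAIN SELECT COUNT(*) FROM page_views WHERE dt = '2024-01-01';
```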
2.4 Metastore
Stores Hive's metadata, including:
- Database and table structures
- Partition and bucket information
- Column types and properties
- Table storage locations and formats
2.5 Execution Engines
- MapReduce: the traditional batch-processing engine
- Tez: a DAG-based, optimized execution engine
- Spark: a fast in-memory compute engine
3. Preparing to Install Hive
3.1 Environment Requirements
- Java: JDK 1.8 or later
- Hadoop: Hadoop 2.x or 3.x
- Operating system: Linux (CentOS or Ubuntu recommended)
3.2 Installing Hadoop (Brief)
If Hadoop is not installed yet, you can follow these steps:

```bash
# Download Hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
# Extract the archive
tar -zxvf hadoop-3.3.4.tar.gz
# Configure environment variables
echo 'export HADOOP_HOME=/path/to/hadoop-3.3.4' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc
```
3.3 Configuring Hadoop
The core Hadoop configuration files that need to be edited:
- core-site.xml: core Hadoop parameters
- hdfs-site.xml: HDFS parameters
- yarn-site.xml: YARN parameters
- mapred-site.xml: MapReduce parameters
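As a minimal sketch, a single-node core-site.xml often only needs the default filesystem URI; the host, port, and temp directory below are assumptions for a local setup, not values from this tutorial:

```xml
<configuration>
  <!-- URI of the default filesystem; NameNode assumed on localhost:9000 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Base directory for Hadoop temporary files (hypothetical path) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-data</value>
  </property>
</configuration>
```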
4. Hive Installation and Configuration
4.1 Downloading Hive

```bash
# Download Hive 3.1.3
wget https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
# Extract the archive
tar -zxvf apache-hive-3.1.3-bin.tar.gz
```
4.2 Configuring Environment Variables

```bash
echo 'export HIVE_HOME=/path/to/apache-hive-3.1.3-bin' >> ~/.bashrc
echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc
source ~/.bashrc
```
4.3 Configuring Hive's Core Files
4.3.1 Creating hive-site.xml

```bash
cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml
```
4.3.2 Configuring the Metastore
By default, Hive stores its metadata in an embedded Derby database, but in production environments MySQL or PostgreSQL is normally used instead.
To use MySQL as the metastore:
1. Install MySQL:

```bash
sudo yum install mysql-server mysql-devel
```

2. Create the Hive database and user:

```bash
# Log in to MySQL as root, then run the statements below
mysql -u root -p
```

```sql
CREATE DATABASE hive_meta;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON hive_meta.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;
```

3. Add the connection settings to hive-site.xml:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
```

4. Download the MySQL JDBC driver and copy it into Hive's lib directory:

```bash
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.28.tar.gz
tar -zxvf mysql-connector-java-8.0.28.tar.gz
cp mysql-connector-java-8.0.28/mysql-connector-java-8.0.28.jar $HIVE_HOME/lib/
```
4.4 Initializing the Metastore Schema

```bash
schematool -dbType mysql -initSchema
```
4.5 Starting Hive

```bash
# Start the Hive command-line client
hive

# Or start the Beeline client
beeline -u jdbc:hive2://localhost:10000
```
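Once a client starts, a few smoke-test statements confirm the installation end to end (the database and table names below are arbitrary examples):

```sql
SHOW DATABASES;
CREATE DATABASE demo;
USE demo;
CREATE TABLE t (id INT, name STRING);
SHOW TABLES;
```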
5. Hive Cluster Installation
In production environments, Hive is usually deployed on a cluster. An outline of the steps:
5.1 Deployment Planning
- Master node: install the Hive service and HiveServer2
- Worker nodes: no Hive installation required
- Metastore: a standalone MySQL or PostgreSQL database
5.2 Installation Steps
1. Install Hive on the master node (as in the single-node setup above)
2. Configure the metastore (as above)
3. Configure the Hive environment variables on all nodes
4. Start HiveServer2:

```bash
hiveserver2 &
```

5. Verify the connection:

```bash
beeline -u jdbc:hive2://master:10000 -n username
```
6. Common Problems and Solutions
6.1 Metastore Initialization Fails
Problem: the schematool command fails.
Solutions:
- Check that the MySQL service is running
- Check that the MySQL username and password are correct
- Check that the MySQL driver JAR has been placed in the $HIVE_HOME/lib directory
6.2 HiveServer2 Fails to Start
Problem: HiveServer2 does not start after running the hiveserver2 command.
Solutions:
- Check that the Hadoop cluster is running normally
- Check that hive-site.xml is configured correctly
- Check the log files ($HIVE_HOME/logs) for detailed error information
6.3 Client Connection Fails
Problem: Beeline fails to connect to HiveServer2.
Solutions:
- Check that the network connection is working
- Check that HiveServer2 is running
- Check the firewall settings and make sure port 10000 is open
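The last two checks can be scripted. A minimal sketch that probes a TCP port using bash's /dev/tcp pseudo-device (host and port are parameters; HiveServer2's default port 10000 is assumed):

```shell
# Report whether a TCP port is reachable; defaults target HiveServer2.
host="${1:-localhost}"
port="${2:-10000}"
if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
  echo "port ${port} on ${host}: open"
else
  echo "port ${port} on ${host}: closed or unreachable"
fi
```

If the port reports closed while HiveServer2 is running locally, the firewall is the usual suspect.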
Hands-on Exercises
Exercise 1: Single-Node Hive Installation
- Prepare a Linux server
- Install JDK 1.8
- Install Hadoop 3.x
- Install MySQL
- Install and configure Hive
- Initialize the metastore
- Start the Hive client and run a simple query
Exercise 2: Hive Cluster Installation
- Prepare 3 Linux servers
- Set up a Hadoop cluster
- Install and configure Hive on the master node
- Configure MySQL as the metastore
- Start HiveServer2
- Connect to Hive from another node using Beeline
Exercise 3: Configuration Tuning
- Change the log path in hive-site.xml
- Set the Hive execution engine to Tez
- Configure Hive's memory usage
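As a sketch for this exercise, the following hive-site.xml properties control the query log location, the execution engine, and a common Tez memory setting. The values are illustrative assumptions, and Tez must already be installed on the cluster for `hive.execution.engine=tez` to take effect:

```xml
<!-- Illustrative overrides for hive-site.xml -->
<property>
  <name>hive.querylog.location</name>
  <value>/var/log/hive/querylogs</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<!-- Tez container size in MB (tune to your cluster) -->
<property>
  <name>hive.tez.container.size</name>
  <value>2048</value>
</property>
```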
7. Summary
This tutorial introduced Hive's basic concepts, architecture, and installation and configuration procedures, including:
- Hive's core features and how it differs from traditional databases
- Hive's layered architecture design
- Detailed steps for a single-node Hive installation
- Using MySQL as the metastore
- The basic workflow for installing Hive on a cluster
- Solutions to common problems
After working through this tutorial, you should be able to install and configure a Hive environment and start using Hive to query and analyze data.