1. Hive version and download location
2. Building the project
3. Configuring Hive
4. Importing and debugging the project in IntelliJ IDEA

1. Hive version and download location

http://archive.apache.org/dist/hive/hive-1.2.1/
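As a minimal sketch, the source tarball can be fetched from that directory on the command line. The file name apache-hive-1.2.1-src.tar.gz is an assumption based on the source directory name used later in this post, and hive_src_url is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the source tarball URL for a given Hive version.
# Assumes the archive uses the apache-hive-<version>-src.tar.gz naming.
hive_src_url() {
    echo "http://archive.apache.org/dist/hive/hive-$1/apache-hive-$1-src.tar.gz"
}

HIVE_VERSION=1.2.1
echo "Source tarball: $(hive_src_url "$HIVE_VERSION")"

# To download and unpack (requires network access), uncomment:
# curl -fLO "$(hive_src_url "$HIVE_VERSION")"
# tar -xzf "apache-hive-$HIVE_VERSION-src.tar.gz"
```

The download lines are left commented out so the snippet can be run without touching the network.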

2. Building the project

Dependencies: Apache Maven 3.6.0, JDK 1.8.0_144, Hadoop 2.X
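Before building, it can help to confirm that these tools are actually on the PATH. The following is a rough sketch; have_tool is a hypothetical helper, and the command names mvn, java and hadoop are assumed to be how the three dependencies are exposed on your machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report where each build dependency lives, or that it is missing.
for tool in mvn java hadoop; do
    if have_tool "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

This only checks that the tools are present; mvn -version and java -version can then be used to compare against the versions listed above.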

Build command: mvn clean package -Phadoop-2 -DskipTests -Pdist

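Here, clean deletes the target directories left by earlier builds (such as packaging/target under the source root); -Pdist activates the profile named dist in pom.xml; -Phadoop-2 builds against Hadoop 2; and -DskipTests skips running the tests. Once the command finishes, the complete built project can be found in "apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin". Before this compiled Hive can be used, it needs some configuration.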
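A quick way to confirm that the build produced a usable distribution is to check for a few expected entries in the output directory. This is only a sketch: is_hive_dist is a hypothetical helper, and the bin/hive, lib and conf entries it looks for are assumed to be part of the distribution layout.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR looks like a built Hive distribution.
# Assumes the distribution contains bin/hive, lib/ and conf/.
is_hive_dist() {
    [ -f "$1/bin/hive" ] && [ -d "$1/lib" ] && [ -d "$1/conf" ]
}

# Path relative to the directory that contains the unpacked source tree.
DIST=apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin

if is_hive_dist "$DIST"; then
    echo "Build output found: $DIST"
else
    echo "Build output not found: $DIST"
fi
```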

3. Configuring Hive

To make the hive command available in a terminal, we append the following to .bashrc (.bash_profile on Mac OS):

export HIVE_HOME=/Users/pwrliang/Projects/apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
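If the profile is edited from a script, it is worth avoiding duplicate entries when the script runs more than once. The sketch below uses a hypothetical helper, append_once, and writes to a temporary file for demonstration; for real use, PROFILE would point at ~/.bashrc (or ~/.bash_profile on Mac OS).

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if that exact line is absent.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo target; replace with ~/.bashrc or ~/.bash_profile for real use.
PROFILE=$(mktemp)

append_once "$PROFILE" 'export HIVE_HOME=/Users/pwrliang/Projects/apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin'
append_once "$PROFILE" 'export PATH=$PATH:$HIVE_HOME/bin'
# Running the same call again does not add a second copy of the line.
append_once "$PROFILE" 'export PATH=$PATH:$HIVE_HOME/bin'

cat "$PROFILE"
```

The single quotes keep $PATH and $HIVE_HOME unexpanded, so they are resolved each time the profile is sourced rather than when the script runs.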

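We add the following to conf/hive-site.xml to configure where the metadata is stored. The configuration below keeps the metadata in a Derby database. Setting mapreduce.framework.name to local means that SQL is executed in local mode wherever possible.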

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<configuration>
   <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse-testing</value>
      <description>Local or HDFS directory where Hive keeps table contents.</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=/Users/pwrliang/Projects/apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin/metastore_db;create=true</value>
      <description>The JDBC connection URL.</description>
   </property>
   <property>
      <name>mapreduce.framework.name</name>
      <value>local</value>
   </property>
   <property>
     <name>hive.querylog.location</name>
     <value>/tmp/hive-log/${user.name}</value>
     <description>Location of Hive run time structured log file</description>
   </property>
</configuration>

Next, we edit hive-env.sh and set the locations of HADOOP_HOME and HIVE_CONF_DIR in it:

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/Users/pwrliang/hadoop-2.7.7

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/Users/pwrliang/Projects/apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin/apache-hive-1.2.1-bin/conf

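Once the changes above are in place, we run source ~/.bashrc so that the environment variables take effect in the current terminal. Then we run hive and create a new table to check that the environment is configured correctly.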
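The table-creation check can also be scripted. The following is a sketch only: the table name smoke_test and its columns are made up for illustration, and the statements are run with hive -f when hive is found on the PATH.

```shell
#!/bin/sh
# Write a few HiveQL statements to a temporary file.
# The table name smoke_test and its columns are illustrative assumptions.
SMOKE_SQL=$(mktemp)
cat > "$SMOKE_SQL" <<'EOF'
CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
SHOW TABLES;
DROP TABLE smoke_test;
EOF

# Run the statements only if the hive command is available.
if command -v hive >/dev/null 2>&1; then
    hive -f "$SMOKE_SQL"
else
    echo "hive not found on PATH; check HIVE_HOME and re-source your profile"
fi
```

If the setup works, the SHOW TABLES output should list smoke_test before the table is dropped again.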

4. Importing and debugging the project in IntelliJ IDEA

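Click File -> Close to close the current project, then click Import Project and import "apache-hive-1.2.1-src". When the import finishes, we open the Maven panel on the right side of IDEA, expand Profiles, and check hadoop-2.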

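We click Run -> Edit Configurations, select Remote under Templates, and create a new remote debugging configuration. Then we use the following command to start a hive process with remote debugging enabled: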

hive --debug -hiveconf hive.root.logger=DEBUG,console

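Next, we run the remote configuration just created in IDEA to connect to the waiting hive process. From there, we can set breakpoints in IDEA and step through Hive to debug and analyze it.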
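The port in the IDEA Remote configuration has to match the port on which the hive process listens. As a hedged sketch, the command can be written with the port made explicit. The port= and mainSuspend= parameters and the default of 8000 are assumptions taken from the usage text printed by hive --help --debug, so verify them against your build. hive_debug_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: print the hive command line for a debug session on PORT.
# The port= and mainSuspend= parameters are assumed from `hive --help --debug`.
hive_debug_cmd() {
    echo "hive --debug:port=$1,mainSuspend=y -hiveconf hive.root.logger=DEBUG,console"
}

# Assumed default debug port; use the same value in the IDEA Remote configuration.
DEBUG_PORT=8000
hive_debug_cmd "$DEBUG_PORT"
```

With mainSuspend=y the JVM is expected to wait for the debugger to attach, which matches the "waiting hive" behavior described above.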

