The following content is original; please credit the source when reposting!

Hue deployment and component configuration notes


Environment setup

  • Install dependencies

sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

  • Install the JDK (the command below installs OpenJDK 1.8)

sudo yum install java-1.8.0-openjdk

  • Install Maven

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz
tar -zxvf apache-maven-3.5.4-bin.tar.gz -C /opt/dp
ln -s /opt/dp/apache-maven-3.5.4 /opt/dp/maven
vim .bashrc
         export MAVEN_HOME=/opt/dp/maven
         export PATH=$PATH:$MAVEN_HOME/bin
source .bashrc
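A quick sanity check that Maven now resolves on the PATH:

mvn -version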

  • Install Hue (the build uses Maven and needs internet access to compile)

git clone https://github.com/cloudera/hue.git
cd hue
make apps
build/env/bin/hue runserver
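The development server binds to 127.0.0.1:8000 by default (Django's runserver default); a quick check from another shell:

curl -I http://127.0.0.1:8000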

  • Initialize the database (SQLite by default; see below if you need to switch)

build/env/bin/hue migrate

  • Start kt_renewer

Create the credential-cache directory (owned by the user the project runs as), or edit the config file (change ccache_path); see the sketch below.
Start: build/env/bin/hue kt_renewer
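A minimal sketch of the directory setup, assuming the default ccache_path=/var/run/hue/hue_krb5_ccache from the Kerberos section below and that Hue runs as user dp (both are assumptions; adjust to your environment):

sudo mkdir -p /var/run/hue        # holds the Kerberos credential cache
sudo chown dp:dp /var/run/hue     # must be writable by the user running Hue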

  • Start Hue

build/env/bin/hue runserver 0.0.0.0:8081
To start via Apache instead: /sbin/service httpd restart



Apache load-balancing configuration

  • Install Apache

yum install httpd
yum install -y httpd-devel
yum install mod_wsgi

  • Edit the configuration file

Add to httpd.conf: LoadModule wsgi_module modules/mod_wsgi.so
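To verify the module is actually loaded after the edit, list the shared modules:

httpd -M | grep wsgi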

  • Add your own config file to implement load balancing and session stickiness:

vim hue_httpd_ke.conf
#<VirtualHost *:8082>
LoadModule wsgi_module modules/mod_wsgi.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_http_module modules/mod_proxy_http.so

WSGIScriptAlias / /home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop/wsgi.py
WSGIPythonPath /home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop:/home/dp/hue-cdh6.0.0-release/build/env/lib/python2.7/site-packages
WSGIDaemonProcess hue_httpd_project home=/home/dp/hue-cdh6.0.0-release python-path=/home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop:/home/dp/hue-cdh6.0.0-release/build/env/lib/python2.7/site-packages threads=30
WSGIProcessGroup hue_httpd_project

<Directory /home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop>
<Files wsgi.py>
Order Deny,Allow

# If apache 2.4
Require all granted

# otherwise
# Allow from all

# Some systems, like Redhat, lock down /var/run, so you may need to change where to store the socket with:
# WSGISocketPrefix run/wsgi
</Files>
</Directory>
Header add Set-Cookie "routeId=.%{BALANCER_WORKER_ROUTE}e;path=/" env=BALANCER_ROUTE_CHANGED
ProxyRequests Off

<Location /balancer-manager>
SetHandler balancer-manager
Order Deny,Allow
Allow from all
</Location>
ProxyPass /balancer-manager !

#Round-robin load balancing
<Proxy "balancer://mycluster">
BalancerMember http://hue-test158.dp.jpushoa.com route=hue158
BalancerMember http://hue-test166.dp.jpushoa.com route=hue166
</Proxy>
ProxyPass / balancer://mycluster/ stickysession=routeId

ErrorLog /etc/httpd/logs/hue.error.log
LogLevel warn
#</VirtualHost>
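Drop the file into Apache's include directory and validate the syntax before restarting (the conf.d path assumes a stock Red Hat layout):

sudo cp hue_httpd_ke.conf /etc/httpd/conf.d/
sudo apachectl configtest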

  • Patch the source

# The change below appears to work around a Python version issue; it is harmless either way
vim /home/dp/hue-cdh6.0.0-release/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py
from pkg_resources.py31compat import makedirs
makedirs(dirname, exist_ok=True)
# For some reason setting runcpserver to false in the config file had no effect, so I commented the code out of the source directly. If anyone knows why, please email me the answer: 669090202@qq.com
vim /home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop/__init__.py
#from desktop.supervisor import DjangoCommandSupervisee

#SUPERVISOR_SPEC = DjangoCommandSupervisee(
# "runcpserver", drop_root=False)

#Changed because the log package could not be found after deploying under Apache, so the import style was switched
vim /home/dp/hue-cdh6.0.0-release/desktop/core/src/desktop/settings.py
import desktop.log

  • Restart Apache

       chmod -R 777 /home/dp/
       /sbin/service httpd restart



Component configuration

In development mode the configuration file is desktop/conf/pseudo-distributed.ini

  • Kerberos configuration

[[kerberos]]

# Path to Hue's Kerberos keytab file

hue_keytab=/home/dp/hue-cdh6.0.0-release/hue.keytab
# Kerberos principal name for Hue
hue_principal=hue
# Frequency in seconds with which Hue will renew its keytab
## keytab_reinit_frequency=3600
# Path to keep Kerberos credentials cached
## ccache_path=/var/run/hue/hue_krb5_ccache
# Path to kinit
kinit_path=/bin/kinit

# Mutual authentication from the server, attaches HTTP GSSAPI/Kerberos Authentication to the given Request object
## mutual_authentication="OPTIONAL" or "REQUIRED" or "DISABLED"
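A quick way to verify the keytab and principal before starting kt_renewer:

klist -kt /home/dp/hue-cdh6.0.0-release/hue.keytab        # list the principals stored in the keytab
kinit -kt /home/dp/hue-cdh6.0.0-release/hue.keytab hue    # obtain a ticket as the hue principal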

  • MySQL configuration (switching from SQLite to MySQL requires re-running the database migration; see the sketch after these settings). The project connects to the database through several mechanisms and interfaces, e.g. rdbms, so to be safe add the settings under each of them.

[[database]]
engine=mysql
host=
port=
user=<username>
password=<password>
name=

# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"

# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
name=mysqldb

# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql

# IP or hostname of the database to connect to.
host='your host'

# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306

# Username to authenticate with when connecting to the database.
user='your user'

# Password matching the username to authenticate with when
# connecting to the database.
password='your password'
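A minimal sketch of the switch, assuming the database name mysqldb from the block above and a MySQL root account (adjust credentials to your environment):

mysql -u root -p -e "CREATE DATABASE mysqldb DEFAULT CHARACTER SET utf8;"   # create the target database
build/env/bin/hue migrate                                                   # re-run the schema migration against MySQL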

  • HDFS configuration

[[hdfs_clusters]]
# HA support by using HttpFs

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=viewfs://cluster8

# NameNode logical name.
## logical_name=

# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.

webhdfs_url=http://nfjd-hadoop02-node179.jpushoa.com:14000/webhdfs/v1

# Change this if your HDFS cluster is Kerberos-secured
security_enabled=true

# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True

# Directory of the Hadoop configuration
hadoop_conf_dir=/etc/hadoop/conf
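Since security_enabled=true, probing the HttpFs endpoint requires a valid Kerberos ticket; a quick connectivity check:

kinit    # obtain a ticket first
curl --negotiate -u : "http://nfjd-hadoop02-node179.jpushoa.com:14000/webhdfs/v1/?op=LISTSTATUS"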

  • Hive configuration

[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=nfjd-hadoop-test03.jpushoa.com
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/etc/hive/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
# Choose whether to use the old GetLog() thrift call from before Hive 0.14 to retrieve the logs.
# If false, use the FetchResults() thrift call from Hive 1.0 or more instead.
## use_get_log_api=false
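To confirm HiveServer2 is reachable on that host and port, a beeline probe works; the hive/_HOST service principal and EXAMPLE.COM realm below are assumptions, substitute your own:

beeline -u "jdbc:hive2://nfjd-hadoop-test03.jpushoa.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"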

  • YARN configuration (HA)

[[yarn_clusters]]

[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=nfjd-hadoop-test02.jpushoa.com

# The port where the ResourceManager IPC listens on
resourcemanager_port=8032

# Whether to submit jobs to this cluster
submit_to=True

# Resource Manager logical name (required for HA)
logical_name=yarnRM

# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false

# URL of the ResourceManager API
resourcemanager_api_url=http://nfjd-hadoop-test02.jpushoa.com:8088

# URL of the ProxyServer API
proxy_api_url=http://nfjd-hadoop-test02.jpushoa.com:8088

# URL of the HistoryServer API
history_server_api_url=http://nfjd-hadoop-test02.jpushoa.com:19888

# URL of the Spark History Server
## spark_history_server_url=http://localhost:18088

# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True

# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.

[[[ha]]]
# Resource Manager logical name (required for HA)
logical_name=yarnRM
resourcemanager_host=nfjd-hadoop-test03.jpushoa.com
# Un-comment to enable
submit_to=True
# URL of the ResourceManager API
resourcemanager_api_url=http://nfjd-hadoop-test03.jpushoa.com:8088
history_server_api_url=http://nfjd-hadoop-test03.jpushoa.com:19888
proxy_api_url=http://nfjd-hadoop-test03.jpushoa.com:8088
# ...
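To check which ResourceManager is currently active, query each RM's cluster-info REST endpoint and look at the haState field:

curl "http://nfjd-hadoop-test02.jpushoa.com:8088/ws/v1/cluster/info"
curl "http://nfjd-hadoop-test03.jpushoa.com:8088/ws/v1/cluster/info"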

  • Oozie configuration

    [liboozie]
    # The URL where the Oozie service runs on. This is required in order for
    # users to submit jobs. Empty value disables the config check.
    oozie_url=http://nfjd-hadoop-test02.jpushoa.com:11000/oozie

    # Requires FQDN in oozie_url if enabled
    security_enabled=true

    # Location on HDFS where the workflows/coordinator are deployed when submitted.
    remote_deployement_dir=/user/hue/jobsub
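If the Oozie client is installed, a quick status check against the same URL should report "System mode: NORMAL":

    oozie admin -oozie http://nfjd-hadoop-test02.jpushoa.com:11000/oozie -status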