Spark on Kyuubi with YARN

Testing Spark on YARN

Make sure Spark works and can run jobs on YARN. You can verify this with the demo below:

bin/spark-submit \
--master yarn \
--class org.apache.spark.examples.SparkPi \
examples/jars/spark-examples_2.12-3.2.1.jar \
10
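
If the Pi example completes, it should also be visible on the YARN side. A quick way to confirm, assuming the yarn CLI from your Hadoop installation is on the PATH:

# The finished SparkPi run should show up in the application list
yarn application -list -appStates FINISHED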

Configuring Kyuubi

Edit conf/kyuubi-defaults.conf under the Kyuubi configuration directory and add the following settings:

conf/kyuubi-defaults.conf
kyuubi.authentication           NONE
kyuubi.frontend.bind.host       localhost
kyuubi.frontend.bind.port       10009

# Configure Kyuubi HA through ZooKeeper
# Note: Kyuubi must be started on multiple machines so that several instances
# register themselves in ZooKeeper; only then does HA actually take effect
kyuubi.ha.enabled true
kyuubi.ha.zookeeper.quorum localhost
kyuubi.ha.zookeeper.client.port 2181
kyuubi.ha.zookeeper.session.timeout 600000

# Default queue to submit to
spark.yarn.queue default
# master must be set to yarn for engines to be submitted to the YARN cluster
spark.master yarn
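
HA depends on ZooKeeper, so it is worth confirming that the quorum configured above (localhost:2181) is reachable before starting Kyuubi. A minimal check, assuming a local ZooKeeper installation with zkCli.sh available:

# Lists the root znodes if the ZooKeeper server at localhost:2181 is up
bin/zkCli.sh -server localhost:2181 ls /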

Edit kyuubi-env.sh under the Kyuubi configuration directory:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_291.jdk/Contents/Home
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.3.1/libexec
export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop
export SPARK_HOME=/Users/liu/Documents/spark-3.2.1-bin-hadoop3.2/
export SPARK_CONF_DIR=/Users/liu/Documents/spark-3.2.1-bin-hadoop3.2/conf
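
The paths above come from a local macOS setup; adjust them to your own environment. As a quick sanity check that the variables point at real installations (run from the Kyuubi home directory):

# Both files must exist for Kyuubi to be able to submit Spark engines to YARN
source conf/kyuubi-env.sh
ls "$SPARK_HOME/bin/spark-submit" "$HADOOP_CONF_DIR/yarn-site.xml"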

Starting Kyuubi

bin/kyuubi start
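
Once it has started, the server should be listening on the frontend port configured earlier (10009). A quick check, assuming a standard JDK and macOS/Linux tooling:

# jps should list a KyuubiServer process, and something should be listening on 10009
jps | grep -i kyuubi
lsof -i :10009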

Connecting to Spark via Beeline

Connect directly via host:port
bin/beeline

!connect jdbc:hive2://localhost:10009/;#spark.master=yarn;spark.yarn.queue=thequeue
Connect via ZooKeeper service discovery
!connect jdbc:hive2://localhost/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;hive.server2.proxy.user=liu
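
Once connected, run any simple statement to make Kyuubi launch a Spark engine; the engine should then show up as an application in YARN. A non-interactive equivalent, assuming the unauthenticated local server and the user liu from the setup above:

# Forces Kyuubi to start a Spark engine on YARN for user liu
bin/beeline -u "jdbc:hive2://localhost:10009/" -n liu -e "SELECT 1"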


Inspect the information stored in ZooKeeper.
Each serviceUri entry represents one Kyuubi server. Start Kyuubi the same way on other machines so that multiple instances register themselves in ZooKeeper; only then does HA actually take effect.
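
One way to see these entries is to list the Kyuubi namespace in ZooKeeper with zkCli.sh; with the defaults above the namespace is /kyuubi, and each child znode carries a serviceUri (the exact node names vary by Kyuubi version):

# Each child node under /kyuubi corresponds to one registered Kyuubi instance
bin/zkCli.sh -server localhost:2181 ls /kyuubi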