This article is a step-by-step guide to setting up Apache ZooKeeper. The goal is to set up ZooKeeper on a 5-node Hadoop cluster.
The table below lists all the nodes where ZooKeeper will be set up, along with their IP addresses and their roles in the Hadoop cluster.
|Hostname||IP Address||Cluster Role|
Follow the steps below to set up Apache ZooKeeper.
Download a stable release of ZooKeeper from one of the Apache mirror sites. The file name has the form zookeeper-x.y.z.tar.gz (e.g. zookeeper-3.4.9.tar.gz).
Let us start by unpacking the tarball on all the nodes in the cluster.
tar xvfz zookeeper-3.4.9.tar.gz
After unpacking, you will get the directory zookeeper-3.4.9. This directory, with its full path from the root directory, is referred to as ZOOKEEPER_HOME.
Create the directories below on the nodes where ZooKeeper will be set up. In my case, that is all the nodes in the cluster.
mkdir /opt/hadoop/zookeeper
mkdir /opt/hadoop/zookeeper/data1
The configuration file can have any name and can be placed anywhere, but the default configuration file for ZooKeeper is $ZOOKEEPER_HOME/conf/zoo.cfg. A non-default configuration file must be specified explicitly when starting ZooKeeper.
cat $ZOOKEEPER_HOME/conf/zoo.cfg
clientPort=2181
initLimit=10
syncLimit=5
dataDir=/opt/hadoop/zookeeper/data1
tickTime=2000
server.1=orcl1:2888:3888
server.2=orcl2:2888:3888
server.3=orcl3:2888:3888
server.4=orcl4:2888:3888
server.5=orcl5:2888:3888
dataDir is the directory where ZooKeeper stores snapshots of its in-memory database as well as the database transaction logs.
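The initLimit and syncLimit settings are counted in ticks, and tickTime gives the length of one tick in milliseconds, so the effective timeouts are the products of those values. The sketch below works that arithmetic out from a configuration file. The function name cfg_timeouts_ms is hypothetical and not part of ZooKeeper, and the sketch assumes plain key=value lines like the ones in zoo.cfg above.

```shell
# Hypothetical helper (not part of ZooKeeper): print the initLimit and
# syncLimit timeouts of a zoo.cfg file in milliseconds.
# Assumes plain key=value lines, as in the zoo.cfg shown in this article.
cfg_timeouts_ms() {
    awk -F= '
        $1 == "tickTime"  { tick = $2 }        # length of one tick in ms
        $1 == "initLimit" { init_ticks = $2 }  # ticks allowed for the initial sync
        $1 == "syncLimit" { sync_ticks = $2 }  # ticks a follower may lag behind
        END {
            printf "initLimit=%dms syncLimit=%dms\n", init_ticks * tick, sync_ticks * tick
        }
    ' "$1"
}

# Usage: cfg_timeouts_ms "$ZOOKEEPER_HOME/conf/zoo.cfg"
```

With the values used in this article (tickTime=2000, initLimit=10, syncLimit=5), followers get 20 seconds to connect and sync with the leader at startup and may fall at most 10 seconds behind afterwards.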
On all the nodes where ZooKeeper will be installed, ensure the environment variable below is also set:
ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.9
export ZOOKEEPER_HOME
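A typo in this path only shows up later, when the start script cannot be found, so it can help to check the variable right after setting it. The following is a minimal sketch. The function name is_zookeeper_home is hypothetical, and the check assumes the standard tarball layout with bin/zkServer.sh and a conf directory.

```shell
# Hypothetical check (not part of ZooKeeper): succeed only if the given
# directory looks like an unpacked ZooKeeper release.
# Assumes the tarball layout: bin/zkServer.sh and conf/ under the root.
is_zookeeper_home() {
    [ -n "$1" ] && [ -x "$1/bin/zkServer.sh" ] && [ -d "$1/conf" ]
}

# Usage:
#   is_zookeeper_home "$ZOOKEEPER_HOME" || echo "ZOOKEEPER_HOME looks wrong: $ZOOKEEPER_HOME"
```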
On each node, create a file named myid in the directory that dataDir points to in zoo.cfg. Here, that is /opt/hadoop/zookeeper/data1/myid. The myid file on each node must contain a single number that matches the number in that node's server.N entry in the configuration file. For example, server.1=orcl1:2888:3888 means the myid file on node orcl1 must contain 1.
The table below shows the value to set in the myid file on each server.
|Node||Value in myid file|
|orcl1||1|
|orcl2||2|
|orcl3||3|
|orcl4||4|
|orcl5||5|
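Since the myid value is fully determined by the server.N entries in zoo.cfg, it can be looked up instead of typed by hand on each node. The sketch below shows one way to do that. The function name myid_for_host is hypothetical, and the usage line assumes that the short hostname of each node is exactly the name used in zoo.cfg.

```shell
# Hypothetical helper (not part of ZooKeeper): print the server id that
# zoo.cfg assigns to a host.
#   $1 = path to zoo.cfg, $2 = hostname as written in the server.N lines
# Lines are split on "=" and ":" so that server.1=orcl1:2888:3888 gives
# field 1 = "server.1" and field 2 = "orcl1".
myid_for_host() {
    awk -F'[=:]' -v host="$2" '
        $1 ~ /^server\.[0-9]+$/ && $2 == host {
            sub(/^server\./, "", $1)   # keep only the numeric id
            print $1
        }
    ' "$1"
}

# Usage on each node (assumes hostname -s matches the name in zoo.cfg):
#   myid_for_host "$ZOOKEEPER_HOME/conf/zoo.cfg" "$(hostname -s)" > /opt/hadoop/zookeeper/data1/myid
```

If the function prints nothing, the host is not listed in the configuration file, and the myid file should not be written.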
To start ZooKeeper, use the command below. It must be run on every node where ZooKeeper is set up.
$ZOOKEEPER_HOME/bin/zkServer.sh start [<full path to non-default config file>]
The output of the above command is shown below.
ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
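To check the status of ZooKeeper on a node, run the command below.
$ZOOKEEPER_HOME/bin/zkServer.sh status
If you get a Connection refused error, please refer here.
On a healthy node, the output of the status command looks like this.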
ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
Execute all the commands below on each node. If the reply imok is not received, something is wrong and troubleshooting is needed.
$ echo ruok | nc orcl1 2181
imok
$ echo ruok | nc orcl2 2181
imok
$ echo ruok | nc orcl3 2181
imok
$ echo ruok | nc orcl4 2181
imok
$ echo ruok | nc orcl5 2181
imok
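The five probes above can also be run in a loop. The sketch below is one way to do it. The function names ruok_verdict and check_cluster are hypothetical, the client port 2181 is taken from zoo.cfg in this article, and the -w 2 timeout is an assumption to keep nc from hanging on an unreachable node.

```shell
# Hypothetical helper: turn the reply to a ruok probe into a verdict line.
#   $1 = host, $2 = reply received from the server (may be empty)
ruok_verdict() {
    if [ "$2" = "imok" ]; then
        echo "$1: OK"
    else
        echo "$1: FAILED"
    fi
}

# Hypothetical helper: probe every host given as an argument on client
# port 2181. An unreachable host produces an empty reply and is reported
# as FAILED.
check_cluster() {
    for host in "$@"; do
        reply=$(echo ruok | nc -w 2 "$host" 2181 2>/dev/null)
        ruok_verdict "$host" "$reply"
    done
}

# Usage: check_cluster orcl1 orcl2 orcl3 orcl4 orcl5
```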
To get more detailed status information, use the command below.
$ echo mntr | nc <node name> 2181
The output of the above command is shown below.
zk_version 3.4.9-1757313, built on 08/23/2016 06:05 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 2
zk_packets_sent 1
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_open_file_descriptor_count 21
zk_max_file_descriptor_count 65535
Run the command echo mntr | nc <server where ZooKeeper is running> 2181 against every node where ZooKeeper is running to see its status.
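When checking several nodes, the zk_server_state line is usually the most interesting one: in a healthy ensemble one node reports leader and the rest report follower. The sketch below pulls out just that field. The function names server_state and cluster_states are hypothetical, and the -w 2 timeout is an assumption to keep nc from hanging.

```shell
# Hypothetical helper: read mntr output on stdin and print the value of
# the zk_server_state line (for example leader or follower).
server_state() {
    awk '$1 == "zk_server_state" { print $2 }'
}

# Hypothetical helper: print "<host> <state>" for each host given as an
# argument. A host that does not answer is reported as unreachable.
cluster_states() {
    for host in "$@"; do
        state=$(echo mntr | nc -w 2 "$host" 2181 2>/dev/null | server_state)
        echo "$host ${state:-unreachable}"
    done
}

# Usage: cluster_states orcl1 orcl2 orcl3 orcl4 orcl5
```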