Setting up ZooKeeper on a 5-node Hadoop cluster

This article is a step-by-step description of setting up Apache ZooKeeper on a 5-node Hadoop cluster.

The table below lists the nodes where ZooKeeper will be set up, their IP addresses and their roles in the Hadoop cluster.

Hostname    IP Address      Cluster Role
orcl2       192.168.1.76    Secondary NameNode

Follow the steps below to set up Apache ZooKeeper.

Download the ZooKeeper binary

Download a stable release of ZooKeeper from one of the Apache mirror sites. The file name has the form zookeeper-x.y.z.tar.gz (e.g. zookeeper-3.4.9.tar.gz).

Unpack the binary tarball

Let us start by unpacking the tarball on all the nodes in the cluster.

tar xvfz zookeeper-3.4.9.tar.gz

Unpacking creates the directory zookeeper-3.4.9. The full path of this directory, starting from the root directory, is referred to as ZOOKEEPER_HOME in the rest of this article.

Make required directory

Create the directories below on the nodes where ZooKeeper will be set up. In my case, that is all the nodes in the cluster.

mkdir /opt/hadoop/zookeeper
mkdir /opt/hadoop/zookeeper/data1

Configuration file

The configuration file can have any name and can be placed anywhere. However, the default configuration file for ZooKeeper is $ZOOKEEPER_HOME/conf/zoo.cfg. A non-default configuration file must be specified explicitly when starting ZooKeeper.

cat $ZOOKEEPER_HOME/conf/zoo.cfg

dataDir is the directory where ZooKeeper stores the snapshots of its in-memory database and, unless dataLogDir is set to a separate location, the transaction logs of the database as well.
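
For reference, a minimal sketch of a zoo.cfg for an ensemble like this one can be generated as shown below. The dataDir, the client port 2181, the hostnames orcl1 to orcl5 and the entry server.1=orcl1:2888:3888 are taken from this article. The tickTime, initLimit and syncLimit values are the ones from ZooKeeper's sample configuration and are assumptions here, as are the ids given to orcl2 through orcl5 and the scratch output path CFG.

```shell
# Sketch only: write a candidate zoo.cfg to a scratch file for review.
# CFG is a hypothetical output path; copy the result to
# $ZOOKEEPER_HOME/conf/zoo.cfg once it looks right.
CFG="${CFG:-${TMPDIR:-/tmp}/zoo.cfg.sketch}"

{
  # Timing values are assumed (taken from ZooKeeper's sample configuration).
  echo "tickTime=2000"
  echo "initLimit=10"
  echo "syncLimit=5"
  # dataDir and clientPort match the values used in this article.
  echo "dataDir=/opt/hadoop/zookeeper/data1"
  echo "clientPort=2181"
  # One server.N line per node; beyond server.1, the orclN -> N mapping is assumed.
  for n in 1 2 3 4 5; do
    echo "server.$n=orcl$n:2888:3888"
  done
} > "$CFG"

cat "$CFG"
```

The same file must be present on every node of the ensemble, so it is usually easier to generate it once and copy it than to edit it by hand on each server.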

Environment Setting

On all the nodes where ZooKeeper will be installed, ensure that the environment variable below is set.
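
The commands in this article rely on $ZOOKEEPER_HOME, so a minimal sketch of the setting could look like the following. The install path is an assumption inferred from the "Using config:" line printed when ZooKeeper starts; adjust it to wherever the tarball was unpacked. Adding the bin directory to PATH is an optional convenience, not something this article requires.

```shell
# Assumed install location (inferred from the "Using config:" startup line).
export ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.9

# Optional: make the ZooKeeper scripts reachable without the full path.
export PATH="$PATH:$ZOOKEEPER_HOME/bin"
```

Placing these lines in the shell profile of the user that runs ZooKeeper keeps the setting across logins.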

Create myid file

On each node, create a file named myid in the directory that dataDir points to in zoo.cfg. Here, that is /opt/hadoop/zookeeper/data1/myid. The myid file on each node must contain a single number, and that number must match the id given to the node in the configuration file. For example, the configuration has server.1=orcl1:2888:3888, so the myid file on node orcl1 must contain 1.
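
The step above can be sketched as a small helper. It assumes that every host is named orcl<N> and that zoo.cfg contains a matching server.<N> entry for it; only the orcl1 to 1 pairing is confirmed by this article, and the function name write_myid is hypothetical.

```shell
# Sketch: derive the id from the host name and write it to <data_dir>/myid.
# Assumes hosts are named orcl<N> and zoo.cfg has server.<N>=orcl<N>:2888:3888.
write_myid() {
  host="$1"       # e.g. orcl1
  data_dir="$2"   # the directory that dataDir points to
  id="${host#orcl}"   # strip the assumed "orcl" prefix
  case "$id" in
    ''|*[!0-9]*)
      echo "cannot derive an id from host name: $host" >&2
      return 1
      ;;
  esac
  echo "$id" > "$data_dir/myid"
}
```

On a node, it could be called as write_myid "$(hostname -s)" /opt/hadoop/zookeeper/data1. If the host names do not follow this pattern, write the number into the file by hand instead.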

The table below shows the value to be set in the myid file on each server.

Node    Value in myid file

Starting ZooKeeper

To start ZooKeeper, use the below command. This command needs to be run on all nodes where ZooKeeper is set up.

$ZOOKEEPER_HOME/bin/zkServer.sh start [<full path to non-default config file>]

The output from the above command is as below.

ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

If you get a Connection refused error, please refer here

To check the status of ZooKeeper on a node, use the below command.

$ZOOKEEPER_HOME/bin/zkServer.sh status

The output from the above command is as below.

ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower

Execute all the commands below on each node. Each command should print imok. If it does not, that ZooKeeper server has a problem that needs troubleshooting.

$ echo ruok | nc orcl1 2181

$ echo ruok | nc orcl2 2181

$ echo ruok | nc orcl3 2181

$ echo ruok | nc orcl4 2181

$ echo ruok | nc orcl5 2181
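
The five checks above can also be wrapped in a loop. The sketch below is one way to do it; the function names probe and check_nodes are hypothetical, and the 2-second nc timeout (-w 2) is an added assumption so that an unreachable node does not stall the loop.

```shell
# probe sends the four-letter word "ruok" to one node, as in the commands above.
# -w 2 is an assumed 2-second timeout.
probe() {
  echo ruok | nc -w 2 "$1" 2181
}

# check_nodes prints one line per node and returns non-zero if any node failed.
check_nodes() {
  failed=0
  for node in "$@"; do
    if [ "$(probe "$node" 2>/dev/null)" = "imok" ]; then
      echo "$node: imok"
    else
      echo "$node: NOT OK"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling check_nodes orcl1 orcl2 orcl3 orcl4 orcl5 then gives a one-line summary per node, and the exit status can be used in a monitoring script.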

To get more detailed status information, use the below command.

$ echo mntr | nc <node name> 2181

The output from the above command is shown below.

zk_version      3.4.9-1757313, built on 08/23/2016 06:05 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received     2
zk_packets_sent 1
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count     0
zk_approximate_data_size        27
zk_open_file_descriptor_count   21
zk_max_file_descriptor_count    65535

Execute the command echo mntr | nc <server where ZooKeeper is running> 2181 against every node where ZooKeeper is running to see the status of each server.
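
When only one field of the mntr output is of interest, for example to see which node is currently the leader, the output can be filtered. The sketch below assumes the key and value layout shown in the sample output above; the function name server_state is hypothetical, and the nc timeout in the commented usage is an added assumption.

```shell
# server_state reads mntr output on stdin and prints the zk_server_state value
# (for example "follower" or "leader").
server_state() {
  awk '$1 == "zk_server_state" { print $2 }'
}

# Usage sketch (needs a running ensemble, so it is left commented out):
# for node in orcl1 orcl2 orcl3 orcl4 orcl5; do
#   echo "$node: $(echo mntr | nc -w 2 "$node" 2181 | server_state)"
# done
```

In a healthy ensemble, exactly one node should report leader and the rest should report follower.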