A step by step guide for a high availability Solr environment

As discussed in our previous blog, Here’s everything you need to know about Solr, Solr is an open-source enterprise-search platform from the Apache Lucene project. The platform provides full-text search, hit highlighting, database integration, rich document handling, and many more features that significantly speed up your search responses.

In this step-by-step guide, we’ll walk you through installing Solr and Zookeeper in order to achieve high availability. We will kick off with the installation of Zookeeper, which provides the coordination layer for our Solr instances.

Solr & Zookeeper on Linux

For this example, we’ll show how to install Solr and Zookeeper clusters on Linux using CentOS. This test environment features 3 Solr instances and 3 Zookeeper nodes.

We will use these IPs for the VMs:

Solr1 -> 33.33.33.10 
Solr2 -> 33.33.33.11
Solr3 -> 33.33.33.12
Zookeeper1 -> 33.33.33.13
Zookeeper2 -> 33.33.33.14
Zookeeper3 -> 33.33.33.15

Test environment setup

In order to create a high availability scenario, we have chosen the setup shown in the diagram below. The diagram also features the sizing (CPU, RAM and disk size) for each node and instance.

Installing Zookeeper

1. Download the latest Zookeeper, extract it and install it

On each Zookeeper node, go to /opt, download the latest Zookeeper release, extract it and prepare the data directory and configuration:

cd /opt
wget https://downloads.apache.org/zookeeper/zookeeper-3.5.7/apache-zookeeper-3.5.7-bin.tar.gz
tar -xzf apache-zookeeper-3.5.7-bin.tar.gz
mkdir /var/lib/zookeeper
ln -s apache-zookeeper-3.5.7-bin zookeeper
cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg

2. Add the following config to the zoo.cfg file, making sure you replace the IPs with the actual IPs of your Zookeeper instances

[root@zookeeper1 opt]# cat zookeeper/conf/zoo.cfg
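For reference, a minimal zoo.cfg for a three-node ensemble such as this one, assuming the default client port 2181 and the standard quorum ports 2888/3888, looks like this:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=33.33.33.13:2888:3888
server.2=33.33.33.14:2888:3888
server.3=33.33.33.15:2888:3888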

3. Now it's time to set the ID for each zookeeper instance

On each Zookeeper node, edit /var/lib/zookeeper/myid and add a single line containing only that machine's id. This id must be unique for each Zookeeper instance.

##Zookeeper1
[root@zookeeper1 opt]# cat /var/lib/zookeeper/myid
1
[root@zookeeper1 opt]#
##Zookeeper2
[root@zookeeper2 opt]# cat /var/lib/zookeeper/myid
2
[root@zookeeper2 opt]#
##Zookeeper3
[root@zookeeper3 opt]# cat /var/lib/zookeeper/myid
3
[root@zookeeper3 opt]#

4. Next adjust the memory settings

Remove the following from the zkEnv.sh file:

vim /opt/zookeeper/bin/zkEnv.sh
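In a stock zkEnv.sh from ZooKeeper 3.5.x, the heap defaults to remove look roughly like this (the exact wording can differ between versions):

# default heap for zookeeper server
ZK_SERVER_HEAP="${ZK_SERVER_HEAP:-1000}"
export SERVER_JVMFLAGS="-Xmx${ZK_SERVER_HEAP}m $SERVER_JVMFLAGS"

# default heap for zookeeper client
ZK_CLIENT_HEAP="${ZK_CLIENT_HEAP:-256}"
export CLIENT_JVMFLAGS="-Xmx${ZK_CLIENT_HEAP}m $CLIENT_JVMFLAGS"

Removing them ensures that the heap size defined in java.env (next step) is the only one passed to the JVM.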

5. Create a new Java environment file

Create a new Java environment file /opt/zookeeper/conf/java.env and add the JVM start parameters.

[root@zookeeper1 opt]# cat /opt/zookeeper/conf/java.env
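The file only needs to set the JVM flags that zkEnv.sh picks up; assuming for illustration a 1 GB heap, it could look like this:

SERVER_JVMFLAGS="-Xms1024m -Xmx1024m"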

6. Edit zkServer.sh and allow the four-letter-word commands

Find the "start" function and add "-Dzookeeper.4lw.commands.whitelist=*". In our case, the start command would go like this:

nohup "$JAVA" $ZOO_DATADIR_AUTOCREATE "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \

7. Once this is done, you can go ahead and start each Zookeeper instance, one by one

/opt/zookeeper/bin/zkServer.sh start

8. After all 3 Zookeeper nodes are started

Check to see which one is the leader and which ones are the followers. In our case, Zookeeper2 is the leader.

[root@zookeeper2 ~]# echo mntr | nc 127.0.0.1 2181

and the other ones are the followers:

[root@zookeeper1 ~]# echo mntr | nc 127.0.0.1 2181
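In the mntr output, the line that identifies each node's role is zk_server_state; on the elected node it reads:

zk_server_state leader

while the followers report zk_server_state follower.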

Installing Solr

Now it’s time to install Solr by following these steps:

1. Download the latest Solr, extract it and install it as follows

On each Solr node, go to /opt, download the latest Solr release and run the bundled installation script:

# --> Go to /opt and download Solr
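For illustration, assuming Solr 8.5.1 and the service installation script bundled with the distribution, the commands look like this:

cd /opt
wget https://archive.apache.org/dist/lucene/solr/8.5.1/solr-8.5.1.tgz
tar xzf solr-8.5.1.tgz solr-8.5.1/bin/install_solr_service.sh --strip-components=2
./install_solr_service.sh solr-8.5.1.tgz

The installer creates the solr service and the /etc/default/solr.in.sh file used in the next steps.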

2. Edit the /etc/default/solr.in.sh file on all 3 Solr instances and uncomment & adjust the memory settings

[root@solr1 opt]# cat /etc/default/solr.in.sh | grep SOLR_JAVA_MEM
SOLR_JAVA_MEM="-Xms2g -Xmx2g"
[root@solr1 opt]#

3. Edit your /etc/default/solr.in.sh file on all 3 Solr instances and add the IPs of the Zookeeper ensemble

[root@solr1 ~]# cat /etc/default/solr.in.sh | grep ZK_HOST
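With the Zookeeper ensemble from this guide and the default client port 2181, the setting looks like this:

ZK_HOST="33.33.33.13:2181,33.33.33.14:2181,33.33.33.15:2181"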

4. Once all the Zookeeper nodes are up and running and Solr is configured, you may go ahead and start your Solr cluster

service solr start
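To verify that each node came up correctly, you can check the service status on every Solr host:

service solr status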

Additional operational steps

Getting the health check of your collection (the IPs are from the Zookeeper ensemble)

/opt/solr/bin/solr healthcheck -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15

Back-up via snapshot

A snapshot is a piece of metadata referring to the specific Lucene index commit. Solr guarantees it is preserved during the lifetime of the snapshot, in spite of subsequent index optimisations. This enables a Solr collection snapshot to provide a point-in-time, consistent state of index data even in the presence of concurrent index operations.

The snapshot creation is very fast since it just persists the snapshot metadata and does not copy the associated index files.

1. Create a snapshot of your collection (the IPs are from the Zookeeper ensemble).
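The snapshot is created with the --create option of snapshotscli.sh; assuming the snapshot name used in the following steps, the command is:

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --create my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15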

2. The describe snapshot command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --describe my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15

3. For a list of all the snapshots in a collection, issue the command on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --list -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
my-2nd-snapshot
my-first-snapshot

4. There is an option to export the created snapshot to disk. For this, access to a shared drive is needed (e.g. an NFS share): all nodes should have the shared drive mounted at the same location/mount point, and the “solr” user should have write permissions. Our Solr instances have the shared drive mounted under /mnt/nfs/var/nfs.

The export command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.

[root@solr3 /]# ll /mnt/nfs/var/nfs
total 0
[root@solr3 /]#
[root@solr2 logs]# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --export my-first-snapshot -c collection1 -d /mnt/nfs/var/nfs -z 33.33.33.13,33.33.33.14,33.33.33.15
Done. GoodBye!
[root@solr2 logs]#
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 4
drwxrwxr-x 4 solr solr 4096 Jul  1 16:09 my-first-snapshot
[root@solr3 /]#

This concludes our installation guide for Zookeeper and Solr to achieve high availability. Bear in mind that there are many more options you can use in order to personalize your environment and make sure you have the perfect blend of functionality and performance.

Choosing the right Solr integration for your business

At Netcentric, our Solr experts have experience dealing with various integrations across multiple platforms. This enables us to offer a wide range of services that transform your Solr implementation in conjunction with the Adobe Stack. By making sure that all your needs are covered, we can add value to each line of business that features Solr and Adobe Stack instances.

Get in touch with our experts today to discover how we can help take your Solr implementation to the next level.

Contact us