A step by step guide for a high availability Solr environment
As discussed in our previous blog, Here's everything you need to know about Solr, Solr is an open-source enterprise-search platform from the Apache Lucene project. It provides full-text search, hit highlighting, database integration, rich document handling, and many more features that significantly speed up your search responses.
In this step-by-step guide, we'll walk you through installing Solr and Zookeeper in order to achieve high availability. We will kick off with the installation of Zookeeper, which provides the coordination framework for our Solr instances.
Solr & Zookeeper on Linux
For this example, we’ll show how to install Solr and Zookeeper clusters on Linux using CentOS. This test environment features 3 Solr instances and 3 Zookeeper nodes.
We will use these IPs for the VMs:
Solr1 -> 33.33.33.10
Solr2 -> 33.33.33.11
Solr3 -> 33.33.33.12
Zookeeper1 -> 33.33.33.13
Zookeeper2 -> 33.33.33.14
Zookeeper3 -> 33.33.33.15
Test environment setup
To create a high-availability scenario, we have chosen the setup shown below. The diagram also features the sizing (CPU, RAM and disk size) for each node and instance.
Installing Zookeeper
1. Download the latest Zookeeper, extract it and install it
cd /opt
wget https://downloads.apache.org/zookeeper/zookeeper-3.5.7/apache-zookeeper-3.5.7-bin.tar.gz
tar -xzf apache-zookeeper-3.5.7-bin.tar.gz
mkdir /var/lib/zookeeper
ln -s apache-zookeeper-3.5.7-bin zookeeper
cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
2. Add the following config to the zoo.cfg file, making sure you replace the IPs with the actual IPs of your Zookeeper instances
[root@zookeeper1 opt]# cat zookeeper/conf/zoo.cfg
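The file contents are not reproduced above; a minimal zoo.cfg for this three-node ensemble might look like the following. The timing values are the zoo_sample.cfg defaults; only the dataDir and server lines are specific to this guide.

```shell
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=33.33.33.13:2888:3888
server.2=33.33.33.14:2888:3888
server.3=33.33.33.15:2888:3888
```

The server.N lines use two ports: 2888 for follower-to-leader connections and 3888 for leader election.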
3. Now it's time to set the ID for each Zookeeper instance
On each Zookeeper node, edit /var/lib/zookeeper/myid and add a single line containing only the text of that machine's id. This id must be unique for each Zookeeper instance.
##Zookeeper1
[root@zookeeper1 opt]# cat /var/lib/zookeeper/myid
1
[root@zookeeper1 opt]#
##Zookeeper2
[root@zookeeper2 opt]# cat /var/lib/zookeeper/myid
2
[root@zookeeper2 opt]#
##Zookeeper3
[root@zookeeper3 opt]# cat /var/lib/zookeeper/myid
3
[root@zookeeper3 opt]#
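The listings above can be produced with a one-line command on each node; for example, on Zookeeper1 (run the equivalent with 2 and 3 on the other nodes):

```shell
# Write this node's unique id into the myid file (id 1 on Zookeeper1)
echo "1" > /var/lib/zookeeper/myid
```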
4. Next, adjust the memory settings
Remove the default memory settings from the zkEnv.sh file, so that the values from java.env take effect:
vim /opt/zookeeper/bin/zkEnv.sh
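The snippet to remove is not reproduced in the original. In a stock ZooKeeper 3.5.x zkEnv.sh the default heap settings are typically lines like the following; verify against your own copy before deleting:

```shell
# Default heap limits set by zkEnv.sh; remove (or comment out)
# so the JVMFLAGS from java.env take effect instead
ZK_SERVER_HEAP="${ZK_SERVER_HEAP:-1000}"
export SERVER_JVMFLAGS="-Xmx${ZK_SERVER_HEAP}m $SERVER_JVMFLAGS"
```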
5. Create a new Java environment file
Create a new Java environment file /opt/zookeeper/conf/java.env and add start parameters
[root@zookeeper1 opt]# cat /opt/zookeeper/conf/java.env
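The file contents are not shown above; a java.env that pins the Zookeeper heap might look like this. The sizes are placeholders, so tune them to the sizing in your diagram:

```shell
# JVM start parameters for Zookeeper; heap sizes are illustrative
export JVMFLAGS="-Xms512m -Xmx1g"
```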
6. Edit zkServer.sh and allow four-letter commands
Find the "start" function and add "-Dzookeeper.4lw.commands.whitelist=*". In our case, the start command begins like this:
nohup "$JAVA" $ZOO_DATADIR_AUTOCREATE "-Dzookeeper.4lw.commands.whitelist=*" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \
7. Once this is done, you can go ahead and start each Zookeeper instance, one by one
/opt/zookeeper/bin/zkServer.sh start
8. After all 3 Zookeeper nodes are started
Check to see which one is the leader and which ones are the followers. In our case, Zookeeper2 is the leader.
[root@zookeeper2 ~]# echo mntr | nc 127.0.0.1 2181
and the other ones are the followers:
[root@zookeeper1 ~]# echo mntr | nc 127.0.0.1 2181
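The mntr output itself is not reproduced above; the field that distinguishes the roles is zk_server_state, which reports "leader" on the leader node and "follower" on the others. If nc is not installed, the bundled script can report the role directly (assuming the same install path as above):

```shell
# Prints the role of the local Zookeeper instance,
# e.g. "Mode: leader" or "Mode: follower"
/opt/zookeeper/bin/zkServer.sh status
```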
Installing Solr
Now it’s time to install Solr by following these steps:
1. Download the latest Solr, extract it and install it as follows
# --> Go to /opt and download Solr
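The download commands are not shown in the original; a sketch follows, assuming Solr 8.5.0 (substitute the version you want) and the service installer script that ships inside the Solr tarball:

```shell
cd /opt
# Download the Solr release (8.5.0 is an assumption; pick your version)
wget https://archive.apache.org/dist/lucene/solr/8.5.0/solr-8.5.0.tgz
# Extract only the installer script from the tarball
tar xzf solr-8.5.0.tgz solr-8.5.0/bin/install_solr_service.sh --strip-components=2
# Install Solr as a service; this creates the "solr" user
# and the /etc/default/solr.in.sh file edited in the next steps
./install_solr_service.sh solr-8.5.0.tgz
```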
2. Edit the /etc/default/solr.in.sh file on all 3 Solr instances and uncomment & adjust the memory settings
[root@solr1 opt]# cat /etc/default/solr.in.sh | grep SOLR_JAVA_MEM
SOLR_JAVA_MEM="-Xms2g -Xmx2g"
[root@solr1 opt]#
3. Edit your /etc/default/solr.in.sh file on all 3 Solr instances and add the IPs of the Zookeeper ensemble
[root@solr1 ~]# cat /etc/default/solr.in.sh | grep ZK_HOST
ZK_HOST="33.33.33.13:2181,33.33.33.14:2181,33.33.33.15:2181"
[root@solr1 ~]#
4. Once all the Zookeeper nodes are up and running and Solr is configured, you may go ahead and start your Solr cluster
service solr start
Additional operational steps
Getting the health check of your collection (the IPs are from the Zookeeper ensemble)
/opt/solr/bin/solr healthcheck -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
Back-up via snapshot
A snapshot is a piece of metadata referring to the specific Lucene index commit. Solr guarantees it is preserved during the lifetime of the snapshot, in spite of subsequent index optimisations. This enables a Solr collection snapshot to provide a point-in-time, consistent state of index data even in the presence of concurrent index operations.
The snapshot creation is very fast since it just persists the snapshot metadata and does not copy the associated index files.
1. Create the snapshot. The command can be issued on any of the Solr instances (not just on the master). The IPs are from the Zookeeper ensemble.
/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --create my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
2. The describe snapshot command can be issued on any of the Solr instances (not just on the master). The IPs are from the Zookeeper ensemble.
/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --describe my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
3. For a list of all the snapshots in a collection, issue the command on any of the Solr instances (not just on the master). The IPs are from the Zookeeper ensemble.
/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --list -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
my-2nd-snapshot
my-first-snapshot
4. There is an option to export the created snapshot to disk. For this, access to a shared drive is needed (e.g. an NFS share); all nodes should have the shared drive mounted at the same location/mount point, and the "solr" user should have write permissions. Our Solr instances have the shared drive mounted under /mnt/nfs/var/nfs.
The command can be issued on any of the Solr instances (not just on the master). The IPs are from the Zookeeper ensemble.
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 0
[root@solr3 /]#
[root@solr2 logs]# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --export my-first-snapshot -c collection1 -d /mnt/nfs/var/nfs -z 33.33.33.13,33.33.33.14,33.33.33.15
Done. GoodBye!
[root@solr2 logs]#
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 4
drwxrwxr-x 4 solr solr 4096 Jul 1 16:09 my-first-snapshot
[root@solr3 /]#
This concludes our installation guide for Zookeeper and Solr to achieve high availability. Bear in mind that there are many more options you can use in order to personalize your environment and make sure you have the perfect blend of functionality and performance.