As discussed in our previous blog, "Here’s everything you need to know about Solr", Solr is an open-source enterprise-search platform from the Apache Lucene project. The platform provides full-text search, hit highlighting, database integration, rich document handling, and many more features that significantly speed up your search responses.
In this step-by-step guide, we’ll walk you through installing Solr and Zookeeper in order to achieve high availability. We will kick off with the installation of Zookeeper, which provides the coordination layer for our Solr instances.
For this example, we’ll show how to install Solr and Zookeeper clusters on Linux using CentOS. This test environment features 3 Solr instances and 3 Zookeeper nodes.
We will use these IPs for the VMs:
Solr1 -> 33.33.33.10
Solr2 -> 33.33.33.11
Solr3 -> 33.33.33.12
Zookeeper1 -> 33.33.33.13
Zookeeper2 -> 33.33.33.14
Zookeeper3 -> 33.33.33.15
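We refer to the VMs by IP throughout the guide. If you prefer hostnames, one option (purely a sketch; the hostnames below simply match the shell prompts seen later and are not required anywhere else) is to add them to /etc/hosts on every VM:

cat >> /etc/hosts <<'EOF'
33.33.33.10 solr1
33.33.33.11 solr2
33.33.33.12 solr3
33.33.33.13 zookeeper1
33.33.33.14 zookeeper2
33.33.33.15 zookeeper3
EOF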
Download the latest Zookeeper, extract it and install it.
cd /opt
wget https://downloads.apache.org/zookeeper/zookeeper-3.5.7/apache-zookeeper-3.5.7-bin.tar.gz
tar xzf apache-zookeeper-3.5.7-bin.tar.gz
mkdir /var/lib/zookeeper
ln -s apache-zookeeper-3.5.7-bin zookeeper
cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
Edit zoo.cfg, keep the sample defaults, and append the data directory and the three ensemble members at the end of the file. The resulting file looks like this:

[root@zookeeper1 opt]# cat zookeeper/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=0
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataDir=/var/lib/zookeeper
server.1=33.33.33.13:2888:3888
server.2=33.33.33.14:2888:3888
server.3=33.33.33.15:2888:3888
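If you prefer to script this edit rather than open an editor, a minimal sketch (assuming you start from the freshly copied zoo_sample.cfg) is to append the settings:

cat >> /opt/zookeeper/conf/zoo.cfg <<'EOF'
dataDir=/var/lib/zookeeper
server.1=33.33.33.13:2888:3888
server.2=33.33.33.14:2888:3888
server.3=33.33.33.15:2888:3888
EOF

Repeat this on all three Zookeeper nodes; zoo.cfg is identical everywhere, only the myid file below differs per node.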
On each Zookeeper node, create /var/lib/zookeeper/myid with a single line containing only that machine's id. This id must be unique for each Zookeeper instance.
##Zookeeper1
[root@zookeeper1 opt]# cat /var/lib/zookeeper/myid
1
[root@zookeeper1 opt]#
##Zookeeper2
[root@zookeeper2 opt]# cat /var/lib/zookeeper/myid
2
[root@zookeeper2 opt]#
##Zookeeper3
[root@zookeeper3 opt]# cat /var/lib/zookeeper/myid
3
[root@zookeeper3 opt]#
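A quick way to create these files (a sketch; run the matching line on the matching node) is:

# On zookeeper1
echo 1 > /var/lib/zookeeper/myid
# On zookeeper2
echo 2 > /var/lib/zookeeper/myid
# On zookeeper3
echo 3 > /var/lib/zookeeper/myid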
Remove the following from the zkEnv.sh file:
vim /opt/zookeeper/bin/zkEnv.sh

# remove these lines from the end of the file:

# default heap for zookeeper server
ZK_SERVER_HEAP="${ZK_SERVER_HEAP:-1000}"
export SERVER_JVMFLAGS="-Xmx${ZK_SERVER_HEAP}m $SERVER_JVMFLAGS"
# default heap for zookeeper client
ZK_CLIENT_HEAP="${ZK_CLIENT_HEAP:-256}"
export CLIENT_JVMFLAGS="-Xmx${ZK_CLIENT_HEAP}m $CLIENT_JVMFLAGS"
Create a new Java environment file, /opt/zookeeper/conf/java.env, and add the startup parameters:

[root@zookeeper1 opt]# cat /opt/zookeeper/conf/java.env
export JVMFLAGS="-Xmx2g -Xms2g"
export ZOO_LOG_DIR="/opt/zookeeper/logs"
In /opt/zookeeper/bin/zkServer.sh, find the "start" function and add "-Dzookeeper.4lw.commands.whitelist=*" to the Java command line (this whitelists the four-letter-word commands such as mntr, which we use below). In our case, the start command looks like this:
nohup "$JAVA" $ZOO_DATADIR_AUTOCREATE "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \ "-Dzookeeper.log.file=${ZOO_LOG_FILE}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" "-Dzookeeper.4lw.commands.whitelist=*" \ -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \ -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null & |
Start Zookeeper on all three nodes:

/opt/zookeeper/bin/zkServer.sh start
Check which node is the leader and which ones are the followers. In our case, Zookeeper2 is the leader.
[root@zookeeper2 ~]# echo mntr | nc 127.0.0.1 2181
zk_version	3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
zk_avg_latency	0
zk_max_latency	0
zk_min_latency	0
zk_packets_received	1
zk_packets_sent	0
zk_num_alive_connections	1
zk_outstanding_requests	0
zk_server_state	leader
zk_znode_count	5
zk_watch_count	0
zk_ephemerals_count	0
zk_approximate_data_size	182
zk_open_file_descriptor_count	59
zk_max_file_descriptor_count	4096
zk_followers	2
zk_synced_followers	2
zk_pending_syncs	0
zk_last_proposal_size	-1
zk_max_proposal_size	-1
zk_min_proposal_size	-1
[root@zookeeper2 ~]#
The other two nodes report themselves as followers:
[root@zookeeper1 ~]# echo mntr | nc 127.0.0.1 2181
zk_version	3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
zk_avg_latency	0
zk_max_latency	0
zk_min_latency	0
zk_packets_received	1
zk_packets_sent	0
zk_num_alive_connections	1
zk_outstanding_requests	0
zk_server_state	follower
zk_znode_count	5
zk_watch_count	0
zk_ephemerals_count	0
zk_approximate_data_size	182
zk_open_file_descriptor_count	57
zk_max_file_descriptor_count	4096
[root@zookeeper1 ~]#
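If you only need the role and not the full metrics, zkServer.sh can report it directly (the exact output shape may vary slightly between Zookeeper versions):

/opt/zookeeper/bin/zkServer.sh status
# Typical output on a follower:
# ZooKeeper JMX enabled by default
# Using config: /opt/zookeeper/bin/../conf/zoo.cfg
# Mode: follower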
Now it’s time to install Solr by following these steps:
Download the latest Solr, extract it and install it as follows.
# --> Go to /opt and download Solr
cd /opt
wget https://archive.apache.org/dist/lucene/solr/8.5.0/solr-8.5.0.tgz

# --> Extract just the installation script from the archive
tar xzf solr-8.5.0.tgz solr-8.5.0/bin/install_solr_service.sh --strip-components=2

# --> Execute the script against the archive with some parameters
# --> -i -> Directory to extract the Solr installation archive; defaults to /opt. The specified path must exist prior to using this script.
# --> -d -> Directory for live / writable Solr files, such as logs, pid files, and index data; defaults to /var/solr
# --> -u -> User to own the Solr files and run the Solr process as; defaults to solr. This script will create the specified user account if it does not exist.
# --> -s -> Service name; defaults to solr
# --> -p -> Port Solr should bind to; default is 8983
./install_solr_service.sh solr-8.5.0.tgz -i /opt -d /var/solr -u solr -s solr -p 8983

Extracting solr-8.5.0.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-8.5.0 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] *** Your Max Processes Limit is currently 3894.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Warning: Available entropy is low. As a result, use of the UUIDField, SSL, or any other features that require
RNG might not work properly. To check for the amount of available entropy, use 'cat /proc/sys/kernel/random/entropy_avail'.
Waiting up to 180 seconds to see Solr running on port 8983 [-]
Started Solr server on port 8983 (pid=5500). Happy searching!

Found 1 Solr nodes:

Solr process 5500 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:24",
  "startTime":"2020-03-30T15:48:15.498Z",
  "uptime":"0 days, 0 hours, 0 minutes, 13 seconds",
  "memory":"140.5 MB (%27.4) of 512 MB"}

[root@solr2 opt]# service solr status

Found 1 Solr nodes:

Solr process 5500 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:24",
  "startTime":"2020-03-30T15:48:15.498Z",
  "uptime":"0 days, 0 hours, 0 minutes, 59 seconds",
  "memory":"145.5 MB (%28.4) of 512 MB"}
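The installer warns about the open file and max process limits. One way to address this (a sketch, assuming pam_limits is active for the solr user; adapt it to your own policy and re-login or restart Solr afterwards) is to raise the limits in /etc/security/limits.conf:

cat >> /etc/security/limits.conf <<'EOF'
solr soft nofile 65000
solr hard nofile 65000
solr soft nproc  65000
solr hard nproc  65000
EOF

Alternatively, set SOLR_ULIMIT_CHECKS to false in /etc/default/solr.in.sh to silence the warning, as the installer output itself suggests.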
On each Solr node, set the Java heap for Solr in /etc/default/solr.in.sh:

[root@solr1 opt]# cat /etc/default/solr.in.sh | grep SOLR_JAVA_MEM
SOLR_JAVA_MEM="-Xms2g -Xmx2g"
[root@solr1 opt]#
Point Solr at the Zookeeper ensemble by setting ZK_HOST in the same file:

[root@solr1 ~]# cat /etc/default/solr.in.sh | grep ZK_HOST
ZK_HOST="33.33.33.13,33.33.33.14,33.33.33.15"
[root@solr1 ~]#
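Both settings live in /etc/default/solr.in.sh on every Solr node. If you prefer to script the change rather than edit the file by hand, a minimal sketch (assuming the variables are only present as commented-out defaults, so an appended assignment takes effect) is:

echo 'SOLR_JAVA_MEM="-Xms2g -Xmx2g"' >> /etc/default/solr.in.sh
echo 'ZK_HOST="33.33.33.13,33.33.33.14,33.33.33.15"' >> /etc/default/solr.in.sh
# Assumption not shown in the guide: setting SOLR_HOST to the node's own IP
# (e.g. 33.33.33.10 on Solr1) makes the node register in Zookeeper by IP,
# which matches the replica URLs in the healthcheck output further down.
# echo 'SOLR_HOST="33.33.33.10"' >> /etc/default/solr.in.sh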
Start Solr on each node (restart it if it is already running, so the new settings are picked up):

service solr start
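The health check below assumes a collection named collection1 already exists. If you still need to create it, a minimal sketch (run once on any Solr node; the shard and replica counts are chosen to match the output below, and the default configset is assumed) is:

sudo -u solr /opt/solr/bin/solr create -c collection1 -shards 1 -replicationFactor 3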
Get the health check of your collection (the IPs are those of the Zookeeper ensemble):
/opt/solr/bin/solr healthcheck -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
{
  "collection":"collection1",
  "status":"healthy",
  "numDocs":0,
  "numShards":1,
  "shards":[{
      "shard":"shard1",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node3",
          "url":"http://33.33.33.12:8983/solr/collection1_shard1_replica_n1/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 1 hours, 32 minutes, 16 seconds",
          "memory":"552.9 MB (%27) of 2 GB"},
        {
          "name":"core_node5",
          "url":"http://33.33.33.10:8983/solr/collection1_shard1_replica_n2/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 1 hours, 34 minutes, 9 seconds",
          "memory":"502.5 MB (%24.5) of 2 GB",
          "leader":true},
        {
          "name":"core_node6",
          "url":"http://33.33.33.11:8983/solr/collection1_shard1_replica_n4/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 1 hours, 32 minutes, 57 seconds",
          "memory":"64.5 MB (%3.1) of 2 GB"}]}]}
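As an additional check, you can query the Collections API for the cluster state from any node (a sketch; any of the three Solr IPs works):

curl "http://33.33.33.10:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1"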
Back-up via snapshot
A snapshot is a piece of metadata referring to a specific Lucene index commit. Solr guarantees that this commit is preserved for the lifetime of the snapshot, despite subsequent index optimisations. This enables a Solr collection snapshot to provide a point-in-time, consistent state of the index data even in the presence of concurrent index operations.
The snapshot creation is very fast since it just persists the snapshot metadata and does not copy the associated index files.
1. Create the snapshot (see the sketch below). The IPs are from the Zookeeper ensemble.
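A minimal sketch for creating the snapshot, assuming the --create option of the same snapshotscli.sh script used below for describe, list, and export:

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --create my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15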
2. The describe snapshot command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.
/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --describe my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15

Name: my-first-snapshot
Status: Successful
Time of creation: Sun, 3 May 2020 08:37:18 EDT
Total number of cores with snapshot: 3
-----------------------------------
Core [name=collection1_shard1_replica_n1, leader=false, generation=1, indexDirPath=/var/solr/data/collection1_shard1_replica_n1/data/index/]
Core [name=collection1_shard1_replica_n4, leader=false, generation=2, indexDirPath=/var/solr/data/collection1_shard1_replica_n4/data/index/]
Core [name=collection1_shard1_replica_n2, leader=true, generation=1, indexDirPath=/var/solr/data/collection1_shard1_replica_n2/data/index/]
3. To list all the snapshots in a collection, issue the command on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.
/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --list -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
my-2nd-snapshot
my-first-snapshot
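If you need to clean up old snapshots later, the same script also offers a --delete option (a sketch; my-2nd-snapshot is one of the snapshots listed above):

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --delete my-2nd-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15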
4. There is an option to export the created snapshot to disk. For this, access to a shared drive is needed (e.g. an NFS share); all nodes should have the shared drive mounted at the same location/mount point, and the "solr" user should have write permissions on it. Our Solr instances have the shared drive mounted under /mnt/nfs/var/nfs.
The export command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 0
[root@solr3 /]#
[root@solr2 logs]# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --export my-first-snapshot -c collection1 -d /mnt/nfs/var/nfs -z 33.33.33.13,33.33.33.14,33.33.33.15
Done. GoodBye!
[root@solr2 logs]#
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 4
drwxrwxr-x 4 solr solr 4096 Jul 1 16:09 my-first-snapshot
[root@solr3 /]#
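The exported snapshot is written in Solr's backup format, so it should be restorable later through the Collections API RESTORE action. A sketch, assuming the backup name matches the exported directory and using restored_collection1 as an example target collection name:

curl "http://33.33.33.10:8983/solr/admin/collections?action=RESTORE&name=my-first-snapshot&location=/mnt/nfs/var/nfs&collection=restored_collection1"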
This concludes our installation guide for Zookeeper and Solr to achieve high availability. Bear in mind that there are many more options you can use in order to personalize your environment and make sure you have the perfect blend of functionality and performance.