
A step by step guide for a high availability Solr environment

As discussed in our previous blog post, "Here’s everything you need to know about Solr", Solr is an open-source enterprise search platform from the Apache Lucene project. The platform provides full-text search, hit highlighting, database integration, rich document handling, and many more features that significantly speed up your search responses.

 

In this step-by-step guide, we’ll walk you through installing Solr and Zookeeper to achieve high availability. We start with Zookeeper, since it provides the coordination layer that our Solr instances rely on.

Solr & Zookeeper on Linux

For this example, we’ll show how to install Solr and Zookeeper clusters on Linux using CentOS. This test environment features 3 Solr instances and 3 Zookeeper nodes. 

 

We will use these IPs for the VMs:

 

Solr1 -> 33.33.33.10
Solr2 -> 33.33.33.11
Solr3 -> 33.33.33.12
Zookeeper1 -> 33.33.33.13
Zookeeper2 -> 33.33.33.14
Zookeeper3 -> 33.33.33.15
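
The commands later in this guide all pass the Zookeeper ensemble through the -z option, so it can help to build that comma-separated value once. A minimal sketch: the variable names ZK_NODES and ZK_HOSTS are our own illustration, not something Solr or Zookeeper requires.

```shell
# Zookeeper ensemble IPs from the list above (space-separated).
# ZK_NODES and ZK_HOSTS are illustrative names, not Solr settings.
ZK_NODES="33.33.33.13 33.33.33.14 33.33.33.15"

# Turn the list into the comma-separated form passed to -z.
ZK_HOSTS=$(echo "$ZK_NODES" | tr ' ' ',')

echo "$ZK_HOSTS"
```

The resulting value can then be reused wherever a command expects -z, instead of retyping the three addresses.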

Test environment setup

To create a high availability scenario, we chose the setup shown in the diagram below. The diagram also lists the sizing (CPU, RAM and disk size) for each node and instance.

Installing Zookeeper

Installing Solr

Now it’s time to install Solr by following these steps:

Additional operational steps

Running a health check on your collection (the IPs are from the Zookeeper ensemble):
/opt/solr/bin/solr healthcheck -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
{
  "collection":"collection1",
  "status":"healthy",
  "numDocs":0,
  "numShards":1,
  "shards":[{
    "shard":"shard1",
    "status":"healthy",
    "replicas":[
      {
        "name":"core_node3",
        "url":"http://33.33.33.12:8983/solr/collection1_shard1_replica_n1/",
        "numDocs":0,
        "status":"active",
        "uptime":"0 days, 1 hours, 32 minutes, 16 seconds",
        "memory":"552.9 MB (%27) of 2 GB"},
      {
        "name":"core_node5",
        "url":"http://33.33.33.10:8983/solr/collection1_shard1_replica_n2/",
        "numDocs":0,
        "status":"active",
        "uptime":"0 days, 1 hours, 34 minutes, 9 seconds",
        "memory":"502.5 MB (%24.5) of 2 GB",
        "leader":true},
      {
        "name":"core_node6",
        "url":"http://33.33.33.11:8983/solr/collection1_shard1_replica_n4/",
        "numDocs":0,
        "status":"active",
        "uptime":"0 days, 1 hours, 32 minutes, 57 seconds",
        "memory":"64.5 MB (%3.1) of 2 GB"}]}]}
Back-up via snapshot

A snapshot is a piece of metadata referring to a specific Lucene index commit. Solr guarantees that this commit is preserved for the lifetime of the snapshot, in spite of subsequent index optimisations. This enables a Solr collection snapshot to provide a point-in-time, consistent state of index data even in the presence of concurrent index operations.

 

The snapshot creation is very fast since it just persists the snapshot metadata and does not copy the associated index files.

 

1. Create the snapshot. The IPs are from the Zookeeper ensemble.
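
Snapshot creation can be sketched as follows. This assumes the same snapshotscli.sh script used in the next steps accepts a --create option with the same -c and -z arguments; the wrapper function create_snapshot and the SNAPSHOT_CLI variable are our own illustration, so please verify the option against the script's --help output on your Solr version.

```shell
# Path to the snapshot script, as used elsewhere in this guide.
# SNAPSHOT_CLI is an illustrative variable, not a Solr setting.
SNAPSHOT_CLI="${SNAPSHOT_CLI:-/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh}"

# Hypothetical wrapper: $1 = snapshot name, $2 = collection, $3 = Zookeeper hosts.
# Assumes the script supports --create; check with --help before relying on it.
create_snapshot() {
  "$SNAPSHOT_CLI" --create "$1" -c "$2" -z "$3"
}

# Example call on a Solr node (commented out, as it needs a running cluster):
# create_snapshot my-first-snapshot collection1 33.33.33.13,33.33.33.14,33.33.33.15
```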

 

2. The describe snapshot command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.

 

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --describe my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
Name: my-first-snapshot
Status: Successful
Time of creation: Sun, 3 May 2020 08:37:18 EDT
Total number of cores with snapshot: 3
-----------------------------------
Core [name=collection1_shard1_replica_n1, leader=false, generation=1, indexDirPath=/var/solr/data/collection1_shard1_replica_n1/data/index/]
Core [name=collection1_shard1_replica_n4, leader=false, generation=2, indexDirPath=/var/solr/data/collection1_shard1_replica_n4/data/index/]
Core [name=collection1_shard1_replica_n2, leader=true, generation=1, indexDirPath=/var/solr/data/collection1_shard1_replica_n2/data/index/]
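
When scripting around the describe output, it can be handy to pull out which core is currently the leader. A small sketch, assuming the "Core [name=..., leader=..., ...]" line format shown above; leader_core is a hypothetical helper name of ours.

```shell
# Hypothetical helper: reads describe output on stdin and prints the name
# of every core whose line is marked leader=true.
leader_core() {
  sed -n 's/^Core \[name=\([^,]*\), leader=true,.*/\1/p'
}

# Example call on a Solr node (commented out, as it needs a running cluster):
# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --describe my-first-snapshot -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15 | leader_core
```

With the output shown above, this would print collection1_shard1_replica_n2.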

 

3. To list all the snapshots in a collection, issue the command on any Solr instance (not just on the leader). The IPs are from the Zookeeper ensemble.

 

/opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --list -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15
my-2nd-snapshot
my-first-snapshot
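
Because the list output is one snapshot name per line, a script can check whether a given snapshot exists before describing or exporting it. A minimal sketch; snapshot_exists is a hypothetical helper of ours and assumes the plain one-name-per-line output shown above.

```shell
# Hypothetical helper: reads the --list output on stdin and succeeds if a
# line matches the given snapshot name exactly.
snapshot_exists() {
  # -F: fixed string, -x: whole-line match, -q: no output, only exit status
  grep -Fxq -- "$1"
}

# Example call on a Solr node (commented out, as it needs a running cluster):
# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --list -c collection1 -z 33.33.33.13,33.33.33.14,33.33.33.15 | snapshot_exists my-first-snapshot
```

The whole-line match matters here: a plain substring search for "my-first" would also succeed for any longer name that happens to contain it.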

 

4. A created snapshot can also be exported to disk. This requires access to a shared drive (e.g. an NFS share): all nodes should have the shared drive mounted at the same mount point, and the “solr” user should have write permissions on it. Our Solr instances have the shared drive mounted under /mnt/nfs/var/nfs.

 

The export command can be issued on any of the Solr instances (not just on the leader). The IPs are from the Zookeeper ensemble.

 

[root@solr3 /]# ll /mnt/nfs/var/nfs
total 0
[root@solr3 /]#
[root@solr2 logs]# /opt/solr/server/scripts/cloud-scripts/snapshotscli.sh --export my-first-snapshot -c collection1 -d /mnt/nfs/var/nfs -z 33.33.33.13,33.33.33.14,33.33.33.15
Done. GoodBye!
[root@solr2 logs]#
[root@solr3 /]# ll /mnt/nfs/var/nfs
total 4
drwxrwxr-x 4 solr solr 4096 Jul  1 16:09 my-first-snapshot
[root@solr3 /]#
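
Since the export depends on the shared drive being mounted and writable on every node, a quick pre-flight check can save a failed export. A sketch under our own assumptions: check_export_dir is a hypothetical helper, and it tests the permissions of whichever user runs it, so it should be run as the “solr” user to be meaningful.

```shell
# Hypothetical helper: $1 = export directory on the shared drive.
# Succeeds only if the directory exists and the current user can write to it.
check_export_dir() {
  if [ ! -d "$1" ]; then
    echo "missing directory: $1" >&2
    return 1
  fi
  if [ ! -w "$1" ]; then
    echo "not writable by $(id -un): $1" >&2
    return 1
  fi
  return 0
}

# Example call on each Solr node (commented out; the path is the mount point
# used in this guide):
# sudo -u solr sh -c '. ./check_export_dir.sh; check_export_dir /mnt/nfs/var/nfs'
```

The file name check_export_dir.sh in the example call is only a placeholder for wherever the function is saved.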

 

This concludes our installation guide for Zookeeper and Solr to achieve high availability. Bear in mind that there are many more options you can use in order to personalize your environment and make sure you have the perfect blend of functionality and performance.

Choosing the right Solr integration for your business

At Netcentric, our Solr experts have experience dealing with various integrations across multiple platforms. This enables us to offer a wide range of services that transform your Solr implementation in conjunction with the Adobe Stack. By making sure that all your needs are covered, we can add value to each line of business that features Solr and Adobe Stack instances.

Get in touch with our experts today to discover how we can help take your Solr implementation to the next level.


Bogdan Martin

Senior System Engineer


Alexandru Ionescu

System Engineer
