FileCloud is a hyper-secure file storage, sharing and collaboration platform that provides a powerful set of tools for admins and users to manage their data. This includes High Availability (HA) architecture support and content management functionalities, specifically content search via FileCloud’s Solr integration.
Solr is an open-source content indexing and search application developed and distributed by Apache. This application is included with FileCloud installations.
Pacemaker and Corosync are open-source software solutions maintained by ClusterLabs. These solutions provide cluster management capabilities to client servers. Specifically, Pacemaker is a resource manager tool used on computer clusters for HA architecture, whereas Corosync supports cluster membership and messaging.
By configuring Solr HA in FileCloud with Pacemaker and Corosync, the admin can strengthen redundancy, improve the overall resiliency of backend software components through quorate, resource-driven clusters, and gain fine-tuned management capabilities within and between nodes.
This step-by-step guide will outline how to manually configure Solr HA with Pacemaker and Corosync in FileCloud.
solr01 – Solr host – cluster member
solr02 – Solr host – cluster member
solr03 – quorum-device – quorum for cluster
solr-ha – proxy-ha host
NFSShare – NFS resource which can be mounted on solr01 and solr02
The example laid out in this blog post uses CentOS 7 (CentOS Linux release 7.9.2009 (Core)).
The installation process for Pacemaker and Corosync clusters is largely the same regardless of the Linux distribution (Ubuntu, Fedora, RedHat, or Debian), although the package manager commands may differ.
Install all available patches using the following command:
Command(as root):
yum update
After installing the necessary patches, reboot the system. This step must be completed for all three hosts: solr01, solr02, and solr03.
Then, install the package that provides the necessary NFS client utilities.
command(as root):
yum install -y nfs-utils
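Once nfs-utils is installed, you can optionally confirm that the NFS server exports the share you plan to use. This is only a sanity check; ip_nfs_server below is a placeholder for your actual NFS server address.
command(as root):
showmount -e ip_nfs_server
The export list returned should include the path you intend to mount later (for example /path/to/nfs_resource).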
Next, wget must be installed.
command(as root):
yum install -y wget
Installing Solr in your FileCloud instance is (naturally) a critical part of configuring Solr HA. As indicated above, Solr runs on dedicated hosts that are members of the cluster. These hosts must be configured individually.
Beginning with solr01, prepare a clean Linux-based OS (such as CentOS 7, used in this example). You may also use another operating system according to your preference.
On the clean OS, download the FileCloud installation script: filecloud-liu.sh (official installation script).
If any issues arise related to the REMI repo, the alternative can be used: https://zerg.codelathe.com/url/qj2ny2yrvtzehjsk
Create the following folder: /opt/solrfcdata
Command(as root):
mkdir /opt/solrfcdata
The NFS filesystem should be mounted under the following:
Command(as root):
mount -t nfs ip_nfs_server:/path/to/nfs_resource /opt/solrfcdata
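To confirm that the share is mounted correctly before continuing (a quick, optional check):
command(as root):
df -h /opt/solrfcdata
The output should show the NFS export as the filesystem backing /opt/solrfcdata.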
Next, start the Solr component installation using the FileCloud installation script:
command(as root):
sh ./filecloud-liu.sh
Follow the instructions until reaching the selection screen.
Select the “solr” option and press Enter. The installation process may take a few minutes. Wait for confirmation that the installation has been completed.
Host: solr01, solr02
By default, Solr will try to bind to localhost only. Modify the configuration so that Solr binds to the external interface.
Modify the following file: /opt/solr/server/etc/jetty-http.xml
Change the following line in the file.
Original Line:
<Set name="host"><Property name="jetty.host" default="127.0.0.1" /></Set>
New Line:
<Set name="host"><Property name="jetty.host" default="0.0.0.0" /></Set>
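To confirm that the edit was applied (optional quick check):
command(as root):
grep jetty.host /opt/solr/server/etc/jetty-http.xml
The returned line should now contain default="0.0.0.0".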
Solr was started with the FileCloud installation. Before proceeding, stop the Solr service.
Host: solr01, solr02
command(as root):
/etc/init.d/solr stop
Remove the following file: /etc/init.d/solr
command(as root):
rm /etc/init.d/solr
Create a new file:
command(as root):
touch /etc/systemd/system/solrd.service
Edit this new file and copy the contents specified below to this file:
command(as root):
vi /etc/systemd/system/solrd.service
Copied Content:
### Beginning of File ###
[Unit]
Description=Apache SOLR

[Service]
User=solr
LimitNOFILE=65000
LimitNPROC=65000
Type=forking
Restart=no
ExecStart=/opt/solr/bin/solr start
ExecStop=/opt/solr/bin/solr stop
### End of File ###
Save the file before continuing.
Host: solr01, solr02
command(as root):
systemctl daemon-reload
systemctl stop solrd
These commands should not return any errors. Next, start the service and check its status:
command(as root):
systemctl start solrd
systemctl status solrd
Expected Output: solrd should be reported as active (running).
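In addition to systemctl status, you can optionally verify that Solr is now listening on all interfaces rather than only on localhost. This assumes the ss utility (part of iproute2) is available:
command(as root):
ss -tlnp | grep 8983
The listening address should show 0.0.0.0:8983, *:8983, or [::]:8983 rather than 127.0.0.1:8983.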
Folder: /opt/solrfcdata
Host: solr02
Critical Note: THIS COMMAND SHOULD BE ISSUED ONLY ON solr02.
command(as root):
systemctl stop solrd
rm -rf /opt/solrfcdata/*
Open the firewall port if required; the example below is for CentOS.
Host: solr01, solr02
command(as root):
firewall-cmd --permanent --add-port 8983/tcp
firewall-cmd --reload
With these steps completed, the Solr installation has been carried out to successfully prepare the environment for HA clusters.
Host: solr01, solr02, solr03
Add the entries for all 3 cluster nodes to /etc/hosts, so that the file reads as follows:
corresponding_ip solr01
corresponding_ip solr02
corresponding_ip solr03
Example:
File: cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.59 solr01
192.168.101.60 solr02
192.168.101.61 solr03
hosts: solr01 and solr02
command(as root):
yum -y install pacemaker pcs corosync-qdevice sbd
hosts: solr01 and solr02
command(as root):
systemctl start pcsd
systemctl enable pcsd
hosts: solr01, solr02
Set the same password for all hosts for the hacluster user.
command(as root):
passwd hacluster
Make a note of the hacluster password, as it will be needed in later steps.
hosts: solr01 and solr02
command(as root):
firewall-cmd --add-service=high-availability --permanent
firewall-cmd --reload
hosts: Solr01
command(as root):
pcs cluster auth solr01 solr02
Username: hacluster
Password: the "secret_password" set in the previous step.
Expected Output:
solr01 Authorized
solr02 Authorized
hosts: solr01
command(as root):
pcs cluster setup --name solr_cluster solr01 solr02
hosts: solr01
command(as root):
pcs cluster start --all
pcs cluster enable --all
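At this point you can optionally check that both nodes have joined the cluster; the exact output will vary, since no resources have been configured yet:
command(as root):
pcs status
Both solr01 and solr02 should be listed as online.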
Install the required software on solr03 (quorum-only cluster node).
Host: solr03
command(as root):
yum install pcs corosync-qnetd
Host: solr03
command(as root):
systemctl enable pcsd.service
systemctl start pcsd.service
Host: solr03
command(as root):
pcs qdevice setup model net --enable --start
Open the firewall for this traffic if required (example below for CentOS).
Host: solr03
command(as root):
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
Set the password for the hacluster user on solr03.
Host: solr03
command(as root):
passwd hacluster
Set the password for the hacluster user. This should be the same password used on solr01 and solr02.
Host: solr01
command(as root):
pcs cluster auth solr03
Username: hacluster
Password:
Host: solr01
command(as root):
pcs quorum device add model net host=solr03 algorithm=lms
Verification:
Host: solr01
command(as root):
pcs quorum status
Expected Output:
Quorum information
------------------
Date:             Wed Aug 3 10:27:26 2022
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          2/9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         2          1    A,V,NMW solr02
         1          1    A,V,NMW solr01 (local)
         0          1            Qdevice
The softdog kernel module should load automatically after the system reboots. Configure this as follows:
Host: solr01, solr02
command(as root):
echo softdog > /etc/modules-load.d/watchdog.conf
Host: solr01, solr02
command(as root):
reboot
Carry out the reboots in sequence, one node at a time.
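After each node comes back online, you can confirm that the watchdog module was loaded (optional check):
command(as root):
lsmod | grep softdog
Once both nodes are back online and the module is present, enable SBD for the cluster: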
Host: solr01, solr02
command(as root):
pcs stonith sbd enable
Host: solr01
command(as root):
pcs cluster stop --all
pcs cluster start --all
Host: solr01
command(as root):
pcs stonith sbd status
Expected Output:
SBD STATUS
<node name>: <installed> | <enabled> | <running>
solr01: YES | YES | YES
solr02: YES | YES | YES
Host: solr01
command(as root):
pcs resource create NFSMount Filesystem device=192.168.101.70:/mnt/rhvmnfs/solrnfs directory=/opt/solrfcdata fstype=nfs --group solr
Legend:
The parameter device should point to the nfs server and nfs share being used in the configuration.
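Optionally, you can review the resource definition that was just created before checking the overall cluster status. The command below uses the pcs syntax shipped with CentOS 7 (pcs 0.9.x); newer pcs releases use pcs resource config instead.
command(as root):
pcs resource show NFSMount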
Verification:
Host: solr01
command(as root):
pcs status
Expected Output:
Cluster name: solr_cluster
Stack: corosync
Current DC: solr01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Aug 3 12:22:36 2022
Last change: Wed Aug 3 12:20:35 2022 by root via cibadmin on solr01

2 nodes configured
1 resource instance configured

Online: [ solr01 solr02 ]

Full list of resources:

 Resource Group: solr
     NFSMount   (ocf::heartbeat:Filesystem):    Started solr01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled
Host: solr01
command(as root):
pcs resource update NFSMount meta on-fail=fence
Host: solr01
command(as root):
pcs resource create solrd systemd:solrd --group solr
Verification:
Host: solr01
command(as root):
pcs status
Expected Output:
Cluster name: solr_cluster
Stack: corosync
Current DC: solr01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Aug 3 12:25:45 2022
Last change: Wed Aug 3 12:25:22 2022 by root via cibadmin on solr01

2 nodes configured
2 resource instances configured

Online: [ solr01 solr02 ]

Full list of resources:

 Resource Group: solr
     NFSMount   (ocf::heartbeat:Filesystem):    Started solr01
     solrd      (systemd:solrd):    Started solr02

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled
Host: solr01
command(as root):
pcs property set stonith-watchdog-timeout=36
pcs property set no-quorum-policy=suicide
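To confirm that both properties were applied (optional check):
command(as root):
pcs property list
The output should include stonith-watchdog-timeout: 36 and no-quorum-policy: suicide.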
Our example uses CentOS.
Host: solr-ha
command(as root):
yum install -y haproxy
Configure haproxy to redirect requests to the active Solr node.
Host: solr-ha
Back up the file /etc/haproxy/haproxy.cfg:
command(as root):
mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg_bck
File: /etc/haproxy/haproxy.cfg
Add the content below into the empty file.
#### beginning of /etc/haproxy/haproxy.cfg ###
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend solr_front *:8983
    default_backend solr_back

backend static
    balance roundrobin
    server  static 127.0.0.1:4331 check

backend solr_back
    server solr01 solr01:8983 check
    server solr02 solr02:8983 check
#### end of /etc/haproxy/haproxy.cfg ###
Ensure that parameters solr01/solr02 point to the full DNS name or to the IP of the cluster nodes.
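Before starting the service, you can optionally validate the configuration file syntax:
command(as root):
haproxy -c -f /etc/haproxy/haproxy.cfg
haproxy should report that the configuration file is valid.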
Host: solr-ha
command(as root):
systemctl enable haproxy
systemctl start haproxy
The Solr service will now be available on host solr-ha on port 8983, regardless of whether it is actually running on solr01 or solr02.
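A simple way to confirm the whole chain works is to query Solr through the proxy. Here, solr-ha is the proxy hostname from this example, and curl is assumed to be installed on the machine issuing the request:
command(as root):
curl http://solr-ha:8983/solr/
An HTTP response indicates that haproxy is successfully forwarding requests to the active Solr node.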
Congratulations! If you followed these step-by-step instructions, you have successfully configured Solr for high availability with Pacemaker and Corosync. This configuration will improve redundancy and security for your critical data.
For any questions on Solr or High-Availability architecture, schedule a consultation or configuration support session.
Article written by Marek Frueauff, Solutions Architect
Edited by Katie Gerhardt, Junior Product Marketing Manager
Below are the key terms used in this article, listed in alphabetical order.
Term | Definition |
---|---|
Cluster | A group of servers or other IT systems whose primary purpose is to perform the same (or a similar) function in order to achieve one or both of the following outcomes: High Availability or Load Balancing. |
Cluster Quorum | A server or other system that is part of the cluster and performs a particular role: verifying which production cluster nodes (servers) can be reached and checking their health status. If cluster members go missing, the quorum system decides whether the remaining servers can continue providing services or should be treated as unhealthy. The main purpose of the cluster quorum system is to avoid the split brain scenario. |
Corosync | A typical part of a High Availability architecture on Linux or Unix systems, usually deployed alongside Pacemaker. Corosync is the communication engine responsible for cluster membership and for keeping cluster nodes (servers) in sync. |
Firewall | Software or hardware that can inspect and manipulate network traffic based on multiple rules. Modern firewall implementations can operate on multiple network layers (usually 3 through 7), including inspection of frame content. |
Firewall-cmd | The command-line interface to the modern built-in Linux firewall (firewalld). |
NFS | Network File System, a filesystem that is network-based by design. It is a common method of sharing file resources in Unix environments. Thanks to its long history, it has been implemented on almost all operating systems and is very popular and widely used. |
Pacemaker | Open-source cluster resource manager and part of a typical High Availability setup on Linux systems, providing resource management and failover capabilities. |
Proxy | A software or hardware solution that provides a gateway between two networks that are separated by design. A proxy is usually installed between the public Internet and a local network and allows certain communications between those network segments based on predefined rules. A proxy can also be used for other purposes, such as load balancing: for example, redirecting incoming connections from one network to multiple hosts in another network segment. |
Proxy-HA | A specific implementation of the proxy mechanism to provide a highly available service, usually associated with a single host (server). In our example, proxy-ha is used to determine where services are currently running (on which cluster server) and redirect all incoming requests to the active node. |
Resource Group | A logical organizational unit within the Pacemaker cluster implementation that enables control of the dependencies between particular resources managed by the cluster. For example, an NFS server that shares files must be started after the filesystem where the files reside, and on the same cluster node (server); this control can be easily achieved using Resource Groups. |
QDevice | The software implementation of quorum functionality in a Pacemaker cluster setup. It is installed on a cluster host that performs the quorum role only and never provides any other services. |
SBD | Stonith Block Device: an implementation of an additional communication and stonith mechanism on top of a block device shared between cluster nodes (servers). In some cases, SBD can be used in diskless mode (as in our example); to operate in this mode, the watchdog mechanism must be enabled/installed. |
Solr | An advanced, open-source search and indexing system maintained and developed by Apache. It is part of the standard FileCloud installation. |
Split Brain | A very dangerous scenario in any cluster environment, in which a node or nodes lose the ability to communicate with the rest of the node population due to an environment malfunction (most often lost network connectivity). In this situation, a separated node may “think” that it is the “last man standing” and claim all cluster resources to begin providing all services. This claim is repeated by all cluster nodes, leading to disagreement on which node should remain active and which services the cluster should provide. Each cluster implementation has multiple built-in mechanisms to prevent this situation, which can easily lead to data corruption. One such mechanism is stonith, which is activated as soon as a node loses its “quorate” status, indicating a high probability that the node is not visible to the rest of the environment. |
Stonith | Shoot The Other Node In The Head is a mechanism that allows an immediate restart (without any shutdown procedure) of any node in the cluster. This mechanism is extremely important to prevent potential data corruption caused by misbehaving cluster nodes. |
SystemV | The name of the former Linux approach to starting and stopping system services (daemons). |
SystemD | The name of the modern Linux approach to starting and stopping system services (daemons) and much more. Each modern Linux distribution now uses systemd as the main mechanism to manage system services. |
Watchdog | The software or hardware mechanism that works like a delayed bomb detonator. The watchdog is periodically pinged by the system (approximately every 5 seconds) to reset the countdown procedure. If the countdown reaches 0, watchdog will reset the operating system immediately. Watchdog is used with Pacemaker in clusters to ensure that nodes remain recognized within the cluster community. In the event of a lost connection (which is the typical reason behind the Split Brain scenario), Watchdog enables an immediate reboot of the node. |