Handling maintenance outages
Part of maintaining a Bravura Security Fabric instance involves performing operating system updates and other maintenance processes that lead to service outages and, in the case of a replicated instance, may lead to loss of data and even configuration synchronization.
Outages may be needed when:
Applying an operating system update on an application node or database node
Upgrading the backend database engine
Running maintenance on the backend database (rebuilding indexes, data defragmentation, etc.)
Manually propagating the database
Other database/network outages
As a result, certain precautions have to be taken.
Single-node environment
In a single-node environment there is little that can be done about the outage, other than making it clean, so that no errors are recorded in the log:
Before a maintenance window, it is recommended that you disable any scheduled jobs.
From the Manage the system menu, navigate to Maintenance > Scheduled jobs, and disable any jobs that are scheduled for the maintenance time.
If any such jobs have already started, wait for them to end, or close them safely.
Before doing any maintenance that changes the configuration, especially when upgrading Microsoft SQL Server (for which a migration may be required), back up the server in question (a snapshot for a VM, a disk image on bare metal) in case something goes wrong and you need to roll back.
Note that upgrading MSSQL to a newer version may not officially be supported by Bravura Security; in those cases, the MSSQL server could still work in compatibility mode. Ensure that you test in a separate environment before applying in production.
See Installing database and database client software for a list of database versions officially supported.
If the maintenance is to be done on the Bravura Security Fabric application node, arrange with the proxy service used in the network environment, or with the DNS resolver, to forward calls to a static page on some other web server that describes the outage and the planned service restoration time; otherwise users will be faced with a "service unavailable" error from their browsers if they try to access the product web interface.
If the maintenance is to be done on the backend database and that is running on a different server:
Disable Bravura Security tasks in the OS task scheduler.
Disable the Bravura Security External Database Replicator tasks in a single-node instance or on the secondary node(s) of a replicated instance. See the File Synchronization section in the Replication and Recovery documentation.
Stop the Microsoft IIS service (w3svc) or redirect HTTP requests to the application to a static page as mentioned above.
Stop and disable the <iddb>.
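The task-scheduler steps above could be scripted along the following lines. This is only a minimal sketch, assuming the ScheduledTasks module is available (Windows 8 / Windows Server 2012 or later). The name pattern is a hypothetical placeholder, because the actual task names depend on the installation.

```powershell
# Hypothetical placeholder: a wildcard that matches this instance's task names.
$taskPattern = '<task-name-pattern>'

# List the matching tasks first so the selection can be reviewed.
$tasks = Get-ScheduledTask -TaskName $taskPattern
$tasks | Format-Table TaskPath, TaskName, State

# Disable the matching tasks for the duration of the maintenance window.
$tasks | Disable-ScheduledTask | Out-Null

# After maintenance, the same tasks can be re-enabled:
# $tasks | Enable-ScheduledTask | Out-Null
```

Reviewing the listed tasks before disabling them helps avoid touching unrelated tasks that happen to match the pattern.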
To automate the service part of this, in PowerShell:
Get-Service w3svc,*_<instancename> | ForEach-Object { C:\Windows\System32\sc.exe stop $_.Name }
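Because sc returns as soon as the stop request has been sent, it can be useful to confirm that the services have actually stopped before starting database maintenance. The following is a minimal sketch under that assumption. The 60-second limit is an arbitrary choice, and `<instancename>` is the same placeholder as in the command above.

```powershell
# Placeholder: substitute the real instance name.
$instance = '<instancename>'

# Wait for each service to report Stopped, up to 60 seconds per service.
foreach ($svc in Get-Service -Name 'w3svc', "*_$instance") {
    try {
        $svc.WaitForStatus('Stopped', [TimeSpan]::FromSeconds(60))
        Write-Output "$($svc.Name) is stopped."
    } catch {
        # WaitForStatus throws when the timeout expires; report the current state.
        $svc.Refresh()
        Write-Warning "$($svc.Name) is still $($svc.Status) after 60 seconds."
    }
}
```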
When database maintenance is completed, restore these processes in reverse order:
<iddb> and the rest of the non-disabled Bravura Security Fabric services, IIS, scheduled tasks.
Get-Service *_<instancename>,w3svc | ForEach-Object { C:\Windows\System32\sc.exe start $_.Name }
Monitor idmsuite.log when the product services (especially iddb) are starting up, and address any errors before continuing.
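One way to watch the log during startup is to follow it from PowerShell and show only the lines of interest. This is a sketch, not a documented procedure. The log path is a placeholder, and the search pattern assumes that problem lines contain the words "Error" or "Warning", which may not match the actual log format.

```powershell
# Placeholder: replace with the actual location of idmsuite.log.
$logPath = '<path-to-instance-logs>\idmsuite.log'

# Assumption: lines of interest contain "Error" or "Warning" (case-insensitive).
$pattern = 'Error|Warning'

# Start from the last 10 lines, then keep following the file as it grows.
# Press Ctrl+C to stop.
Get-Content -Path $logPath -Tail 10 -Wait |
    Select-String -Pattern $pattern
```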
Multi-node environment
In order to provide service continuity and failover, Bravura Security Fabric offers application-level data and configuration replication. That allows one application node to be taken down for maintenance as described above, while the other nodes continue to respond to user requests.
When this happens, the nodes that remain running will queue any changes in their respective instance db\replication directories.
When the free space available on the disk drops to 10% (that is, 90% of the disk partition or virtual disk is used by the queues, logs, or other files), the product stops responding to user requests. Administrators with the "Configure replication" privilege can increase the replication thresholds.
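While a node is down, it may help to keep an eye on both the queue size and the remaining disk space on the nodes that are still running. The following is a minimal sketch. The directory placeholder, the drive letter, and the helper name `Get-FreeSpacePercent` are all assumptions made for illustration, and the 10% figure is simply the value quoted above.

```powershell
# Hypothetical helper: percentage of free space, rounded to one decimal place.
function Get-FreeSpacePercent {
    param(
        [Parameter(Mandatory)] [double] $FreeBytes,
        [Parameter(Mandatory)] [double] $TotalBytes
    )
    if ($TotalBytes -le 0) { throw 'TotalBytes must be greater than zero.' }
    [math]::Round(100 * $FreeBytes / $TotalBytes, 1)
}

# Placeholders: adjust to where the instance is installed.
$queueDir = '<instance-directory>\db\replication'
$driveId  = 'C:'

# Total size of the files currently queued in the replication directory.
$queueBytes = (Get-ChildItem -Path $queueDir -Recurse -File |
    Measure-Object -Property Length -Sum).Sum
Write-Output ("Queue size: {0:N1} MB" -f ($queueBytes / 1MB))

# Free space on the volume that holds the queue.
$disk = Get-CimInstance -ClassName Win32_LogicalDisk -Filter "DeviceID='$driveId'"
$freePercent = Get-FreeSpacePercent -FreeBytes $disk.FreeSpace -TotalBytes $disk.Size

# 10 is the figure quoted above; adjust if the thresholds were changed.
if ($freePercent -le 10) {
    Write-Warning "Only $freePercent% free on $driveId."
} else {
    Write-Output "$freePercent% free on $driveId."
}
```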
If upgrading the MSSQL service, note that until all replicated nodes' databases are upgraded, the <iddb> will warn that there are different database versions, both on the Database replication page and in idmsuite.log.
After performing maintenance on the server or backend, bring the node back up, and look at the page for each node under Manage the system > Maintenance > Database replication. Scroll down on the page to verify that the queue is empty between the current node and all others listed in separate sections on that page.
See also
Replication and Recovery provides more detailed information on restoring nodes in a replicated environment.