Single Bravura Security Fabric server fails
Users may report that they cannot log in to the failed server at all (no login page is displayed), and that other servers display the following error message:
Changes to this instance are temporarily disallowed. Please contact the Bravura Security Fabric administrator. Due to a problem in the replication environment all pages except Database replication and System logs are temporarily disabled.
By default, the DB COMMIT SUSPEND event (Manage the system > Maintenance > Options) is configured to send an email as soon as a Bravura Security Fabric server enters this state.
| What stops working | What continues to work | Possible causes | Data loss | Resolution |
|---|---|---|---|---|
| | Other servers continue to function normally unless their replication queues reach their limit. If the queue fills up on another server, that server switches to DB COMMIT SUSPEND mode. In that case, removing the non-functional server from replication is the only possible action. | A problem occurs on a single Bravura Security Fabric server. This may be for a variety of reasons. | No data loss, or (due to an unavoidable race condition) minimal data loss if updates on target systems were not yet committed to the database when the damaged server went offline. | Fix the failed server if it can be done in time. See Time available to fix problems. Other servers will continue to function in the meantime. See Troubleshooting Bravura Security Fabric server failures for fixes to some possible failures. If the server cannot be fixed quickly or is permanently damaged, promptly remove it from the replication configuration on the other servers, as described in Removing a node from replication. If the failed server can be recovered (for example, by installing new hardware), synchronize the node with the already-running replicated nodes, using the process described in Synchronizing a new node with an existing set of Bravura Security Fabric replicas. If the failed server was acting as the primary, it may be necessary to promote one of the secondary nodes so that it can initiate resynchronization. Update the list of scheduled jobs so that the most up-to-date replica acts as the primary, then resynchronize the replacement node. Once the replacement has been confirmed as functional, it can be promoted to primary in the same way. |
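
The Resolution column combines several conditions, so it may help to see them as a decision flow. The following is a minimal sketch of one reading of that column, not a product API: the `FailedNode` class, its fields, and the `recovery_steps` function are hypothetical names introduced for illustration, and the step ordering is an interpretation of the text above.

```python
from dataclasses import dataclass


@dataclass
class FailedNode:
    """Hypothetical summary of the failed server (not a product data structure)."""
    fixable_in_time: bool  # can it be repaired before other servers' queues fill?
    recoverable: bool      # can it eventually be rebuilt, e.g. with new hardware?
    was_primary: bool      # was it acting as the primary node?


def recovery_steps(node: FailedNode) -> list[str]:
    """Return the ordered actions suggested by the Resolution column."""
    if node.fixable_in_time:
        return ["Fix the failed server; other servers keep running in the meantime"]

    # The server cannot be fixed quickly or is permanently damaged.
    steps = ["Remove the failed node from the replication configuration on the other servers"]
    if node.was_primary:
        # The source says this "may be necessary"; treat it as a prompt to check.
        steps.append("Promote the most up-to-date secondary and update the scheduled jobs")
    if node.recoverable:
        steps.append("Resynchronize the replacement node with the running replicas")
        if node.was_primary:
            steps.append("Once confirmed functional, optionally promote the replacement to primary")
    return steps


if __name__ == "__main__":
    for step in recovery_steps(FailedNode(fixable_in_time=False, recoverable=True, was_primary=True)):
        print("-", step)
```

The sketch only orders the actions; each one still has to be carried out using the procedures referenced in the table.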