Replication in detail at a service level

The built-in application-level replication between Bravura Security Fabric nodes forwards calls to stored procedures known to alter the state of the database. Calls to stored procedures that do not alter the user data or product configuration are not replicated.

To describe the replication process, a reference model will be used:

Assume there are two application nodes (A-A and A-B).
Each application node has its own database node (DB-A and DB-B).
Each application node has one writer thread (W-A and W-B), responsible for writing updates that were initiated on another node to its local database.
The writer threads (W-A and W-B) each have their own queues on the local file system. These are used to retry database stored procedure calls that failed due to connectivity or database problems.
Each application node also has one propagation thread per replicated peer, responsible for sending updates to peer application nodes. Thread P-A runs on A-A and sends updates to W-B. Thread P-B runs on A-B and sends updates to W-A.
The propagation threads (P-A and P-B) each have their own queues on the local file system. These are used to retry connections to the writer threads on peer application nodes (W-A and W-B) in the event of connectivity or service outages.
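
The reference model above can be sketched as a small data structure. This is an illustration only, not product code: the class names (AppNode, WriterThread, PropagationThread) and the build_two_node_model helper are hypothetical, and in-memory deques stand in for the on-disk queues.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WriterThread:
    """Applies updates that originated on a peer to the local database."""
    name: str
    queue: deque = field(default_factory=deque)  # stands in for the on-disk retry queue


@dataclass
class PropagationThread:
    """Sends locally initiated updates to one peer's writer thread."""
    name: str
    target: WriterThread
    queue: deque = field(default_factory=deque)  # stands in for the on-disk retry queue


@dataclass
class AppNode:
    """One application node with its own database node."""
    name: str
    database: str
    writer: WriterThread
    propagators: list = field(default_factory=list)  # one per replicated peer


def build_two_node_model():
    """Wire up the two-node model: P-A feeds W-B, and P-B feeds W-A."""
    w_a, w_b = WriterThread("W-A"), WriterThread("W-B")
    a_a = AppNode("A-A", "DB-A", w_a)
    a_b = AppNode("A-B", "DB-B", w_b)
    a_a.propagators.append(PropagationThread("P-A", target=w_b))
    a_b.propagators.append(PropagationThread("P-B", target=w_a))
    return a_a, a_b
```

With more than two nodes, each AppNode would hold one PropagationThread per peer while still having a single WriterThread.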

This arrangement is highly reliable – it connects simple components in a fault-tolerant fashion, as illustrated below.

At runtime, replication operates as follows:

The user or an automation process changes relevant data on application node A-A.
The application on A-A calls a stored procedure P1 on DB-A.
A-A also queues P1 for replication to DB-B through P-A.
If transmission from P-A to W-B fails, P-A will continue to retry the transmission – either when the next stored procedure arrives (that is, try to send {P1,P2}) or after five minutes, whichever happens first.
Every transmission failure is logged in idmsuite.log on A-A.
All communication between nodes is encrypted with AES, using a shared-key handshake that authenticates both endpoints.
P1 will only be removed from P-A’s queue once transmission to W-B has been confirmed. Confirmation in this case means that W-B received P1 and inserted it into its queue, not that it has completed executing P1 on DB-B.
W-B will process transactions, including P1, from its queue and attempt to issue them to DB-B.
The transaction includes its source, so the audit data will contain the name of the application node where the transaction originated, regardless of which database it is written to.
If DB-B reports successful execution, P1 will be removed from W-B’s queue.
If DB-B reports an error when attempting to run P1, such as an attempted violation of an integrity constraint, the full error, as reported by the database client, will be logged in idmsuite.log on A-B.
A summary of the P1 stored procedure call (including a timestamp of the failure, the name of the stored procedure, its arguments and a short failure string) will be recorded in a sproc-failure log in the db\ directory of the instance on A-B. After that, P1 will be removed from W-B’s queue.
If W-B experiences a database connectivity error (that is, W-B cannot contact its database client, or its database client fails to send P1 to DB-B) then P1 will not be removed from W-B’s queue. W-B will retry database updates every 30 seconds until connectivity to DB-B is re-established.
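
The queue handling in the steps above can be sketched as follows. This is a minimal simulation under stated assumptions, not the product's implementation: the function names (propagate, drain_writer), the two exception classes, and the in-memory deques and list that stand in for the on-disk queues and the sproc-failure log are all hypothetical.

```python
from collections import deque


class ConnectivityError(Exception):
    """Peer or database unreachable: the call stays queued and is retried."""


class DatabaseError(Exception):
    """Database rejected the call (e.g. constraint violation): logged, then dropped."""


def propagate(p_queue, send):
    """Send all queued calls as one batch; dequeue only after the peer confirms receipt."""
    if not p_queue:
        return True
    batch = list(p_queue)
    try:
        send(batch)  # success means the peer writer queued the batch, not that it ran it
    except ConnectivityError:
        return False  # everything stays queued; the caller retries later
    for _ in batch:
        p_queue.popleft()
    return True


def drain_writer(w_queue, execute, failure_log):
    """Apply queued calls in order; stop at the first connectivity error."""
    while w_queue:
        call = w_queue[0]
        try:
            execute(call)
        except ConnectivityError:
            return False  # call stays at the head of the queue for the next retry
        except DatabaseError as exc:
            failure_log.append((call, str(exc)))  # summary record of the failed call
        w_queue.popleft()  # reached after success or after a logged database error
    return True
```

The retry timing is left to the caller: in the process described above, propagate would be invoked again when the next stored procedure arrives or after five minutes, and drain_writer every 30 seconds until database connectivity returns.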
