Impact of environment

Normally, application-level replication is fast and efficient. However, several environmental factors can delay the point at which data and configuration changes made on one node become available on the other replicated application nodes.

For example, if a database deadlock prevents changes from being saved to a database, none of the application nodes connected to that database, not even the one where the change originated, will be able to display the change.

The subsections below are listed in order of how strongly each factor can affect the delay in recording and transmitting changes, from most to least:

Database server issues
Application server resources
Network latency
Bandwidth constraints

Impact of database server issues

Many issues can affect the proper functioning of the back-end database, and these are the source of the majority of reported product issues and outages.

Database server resources: The database server has to be correctly provisioned, and its requirements as specified by Microsoft have to be met. The next section has notes on what can affect application services; for the database, the important resources are RAM size and speed, single-CPU speed, and disk I/O.
Free disk space available has to be 1.5 times the size of the database files. Even more space is needed on the disk where the database’s tempdb files are stored, depending on the frequency of maintenance. During auto discovery, several large tables are completely replaced, which leads to large transactions, with tempdb files growing and shrinking very fast.
Database configuration: Follow the procedures detailed in Installing database and database client software. If your database administrator does not agree with the settings, check with support@bravurasecurity.com about the impact that any deviation from the listed configuration may have.
Database-user mapping: If the database login configured for use by Bravura Security Fabric does not have enough privileges, or has been granted the sysadmin role, it can fail to run stored procedures in the correct context, or fail to find them at all. The correct settings are also listed in Installing database and database client software.
Database maintenance: This is important, especially once a table's row count passes the one million mark. Maintenance such as reindexing, defragmenting, and cleanup of obsolete historical data can double or triple performance. Such maintenance has to be scheduled outside of working hours for most users, and also outside of application-scheduled tasks that change the database drastically, such as psupdate, idtrack, and autores.
- Ideally, the Database Service (iddb) should be stopped and disabled on all application nodes that use a database where maintenance is being performed. This prevents table and cross-sproc locking, and avoids changing indexes while a complex sproc is running with an already-calculated execution plan.
- It is also recommended to wait for the replication queues to empty after database maintenance completes and the affected application nodes are brought back online, before taking down another database node for maintenance.
Database schema and stored procedures: These are also very important. They are designed with existing use cases in mind, and with each version both are improved and optimized for the types of loads and data patterns our clients require. The MSSQL Activity Monitor tends to point to "missing indexes" or "better index configuration" whenever long-running sprocs are involved. However, in most cases those "solutions" would increase the database size considerably.

Impact of application server resources

The first step toward a performant Bravura Security Fabric instance is to provision all production application servers according to the installation Server requirements.

The following server resources are important:

Multiple CPUs: the more CPUs there are and the faster they run, the more binaries can run at the same time.
Disk I/O is crucial for the application server. Services cache most files they need as they start up, but a large part of the automation in the application will use some sort of scripting in Python and components, which are all loaded on demand. High disk latency (imposed by network conditions at the data center) can easily slow down any automated process, especially when millions of records are read from list databases.
RAM is important: the absolute minimum is 4GB, and 8GB is an acceptable value that leaves room for troubleshooting tools to run when needed.
- In the current version, RAM use will peak during psupdate and skin generation (node.exe will grow to 1.2GB as it compiles the various Angular apps that make up the web-based interface).
- Under high use (60-100 simultaneous users per node), the Ajax service can grow to 2GB or more.
- The more concurrent agents that run from the primary or from a proxy server, the more RAM is needed.
- Starting with Bravura Security Fabric version 12, list files from target systems are SQLite databases instead of structured text files, which will increase the listing and filtering performance, at the expense of more RAM use per agent, for the iddiscover service, and for any other process that handles list files.

Impact of network latency

In the context of file/registry replication, each transmission is followed by an acknowledgment from the peer application node. This means that the latency overhead of replication is approximately the number of files to be transferred multiplied by the packet response time. For example, replicating 100 files over a network with 100ms latency in each direction (a 200ms response time per file) introduces a total of 20 seconds of packet latency overhead per pair of replicated nodes.
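The file/registry estimate can be reproduced with a short calculation. This is only a sketch: the function and parameter names are hypothetical, and it assumes that the 100ms figure is the latency in one direction, so that each file costs one full round trip (transmission plus acknowledgment).

```python
def file_replication_overhead_seconds(num_files: int, one_way_latency_ms: int) -> float:
    """Approximate latency overhead for one pair of replicated nodes.

    Assumption: each file is sent and then acknowledged, so every file
    costs one round trip (two one-way latencies).
    """
    round_trip_ms = 2 * one_way_latency_ms
    return num_files * round_trip_ms / 1000


# The example from the text: 100 files over a 100ms network.
overhead = file_replication_overhead_seconds(num_files=100, one_way_latency_ms=100)
print(f"Latency overhead per node pair: {overhead:.0f} seconds")
```

The overhead is per pair of nodes, so it has to be counted once for every pair that replicates with each other.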

In the context of database replication, each batch of stored procedure calls incurs the latency twice for every TCP receive window's worth of data. The TCP receive window size is an operating system parameter, typically between 8KB and 64KB. In general, throughput drops as latency increases, but this is not normally a problem so long as latency is well under 500ms per packet and overall transaction volumes are "normal".
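One way to read the database replication statement is as a rough cost model: a batch is sent in chunks no larger than the TCP receive window, and each chunk costs two latencies. The sketch below rests on that simplified assumption only; the function name, the batch size, and the latency value are hypothetical, and real TCP behavior (window scaling, congestion control) is not modeled.

```python
def batch_latency_cost_ms(batch_bytes: int, window_bytes: int, latency_ms: int) -> int:
    """Latency cost of one batch under a simplified model.

    Assumption: the batch is split into receive-window-sized chunks and
    each chunk incurs the network latency twice.
    """
    windows = (batch_bytes + window_bytes - 1) // window_bytes  # ceiling division
    return windows * 2 * latency_ms


# Hypothetical 1,000,000-byte batch over a 100ms link, at both ends of the
# typical window range mentioned in the text.
for window in (8 * 1024, 64 * 1024):
    cost = batch_latency_cost_ms(1_000_000, window, 100)
    print(f"window={window} bytes -> about {cost / 1000:.1f} s of latency cost")
```

Under this model a smaller receive window multiplies the latency cost, which is why latency matters more than raw bandwidth for sporadic batches.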

Best practice

Bravura Security recommends placing all Bravura Security Fabric servers which make up a database node at locations with no more than 150ms latency between them.

In practice, high network latency (where application nodes are on opposite sides of a continent or separated by an ocean) has the following impact:

The time required to complete file/registry replication during nightly auto-discovery will grow by several minutes per pair of replicated application nodes.
The time required to complete database replication for large volumes of data – during nightly auto discovery and under heavy load conditions – can grow from seconds to minutes.

Impact of bandwidth constraints

File/registry replication is impacted by bandwidth between replicated nodes in the sense that the time required to transmit changes from one node to another will be at least the size of the change divided by the bandwidth available. For example, if a 100MByte executable is installed on the primary node and 10Mbps is available between application nodes, then the time to transmit the change will be at least:

$$T = \frac{100 \times 10^{6}\ \text{bytes} \times 10\ \text{bits/byte}}{10 \times 10^{6}\ \text{bits/sec}} = 100\ \text{sec}$$

In practice, latency adds further delay to this calculation, as described in Impact of network latency, so even in the best case the actual time for the example above would be more than 100sec.
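The lower bound on transfer time can be computed for other file sizes and link speeds with a small helper. This is a minimal sketch with hypothetical names; it keeps the 10 bits per byte used in the example above as a default and ignores latency, so the result is a floor rather than a prediction.

```python
def min_transfer_seconds(size_bytes: int, bandwidth_bps: int, bits_per_byte: int = 10) -> float:
    """Lower bound on the time to transmit a change between two nodes.

    The default of 10 bits per byte follows the worked example in the text;
    pass bits_per_byte=8 for a raw payload figure.
    """
    return size_bytes * bits_per_byte / bandwidth_bps


# The example from the text: a 100MByte executable over a 10Mbps link.
seconds = min_transfer_seconds(100 * 10**6, 10 * 10**6)
print(f"At least {seconds:.0f} seconds")
```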

Similarly, bandwidth affects database replication by limiting the rate at which stored procedure calls can be forwarded from one node to another. Since stored procedures arrive at each application node sporadically, they are more affected by latency, which adds a "fixed cost" to each batch of stored procedures. In practice, only transfers of relatively large data sets, for example during auto-discovery, are noticeably affected by bandwidth.

Best practice

Bravura Security recommends placing Bravura Security Fabric database nodes at locations with at least 5Mbps bandwidth available between them.

In practice, low network bandwidth, where application nodes have less than 1Mbps of bandwidth available to propagate changes from one to another, has the following impacts:

The time required to complete file/registry replication during nightly auto-discovery will grow – with the delay being determined by (a) the volume of data that needs to be forwarded and (b) the available bandwidth.
The time required to complete database replication for large volumes of data (that is, during nightly auto-discovery and under heavy load conditions) can grow. In some cases, such as a high load (for example, gigabyte volumes) combined with very low bandwidth (for example, 100kbps), a substantial backlog can develop.

Estimating bandwidth requirements

Nightly auto-discovery

The bulk of data transmission between application nodes during the nightly auto-discovery process is to transfer list files from the primary application node, where they are generated, to all other nodes. Since compression is used, the total data transmitted will generally be less than half of the disk space consumed on the primary node by these files.

For example, if lists of users, groups, account attributes and computer groups on the primary node consume 50MB of disk space, then no more than 25MB of data will be transmitted during nightly auto-discovery to transfer this data set to each secondary application node.
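The same rule of thumb can be applied to any list file size. The sketch below is illustrative only: the function name and the number of secondary nodes are hypothetical, and the 0.5 ratio is the "less than half" upper bound stated above, not a measured compression ratio.

```python
def max_autodiscovery_transfer_bytes(list_files_bytes: int, secondary_nodes: int = 1) -> float:
    """Upper bound on data sent from the primary during nightly auto-discovery.

    Assumption: compressed list files take at most half of their size on
    disk, and the full set is sent once to every secondary node.
    """
    compression_upper_bound = 0.5
    return list_files_bytes * compression_upper_bound * secondary_nodes


# The example from the text: 50MB of list files on the primary node.
per_node = max_autodiscovery_transfer_bytes(50 * 10**6)
# Hypothetical deployment with three secondary application nodes.
total = max_autodiscovery_transfer_bytes(50 * 10**6, secondary_nodes=3)
print(f"Per secondary node: up to {per_node / 10**6:.0f}MB")
print(f"Three secondary nodes: up to {total / 10**6:.0f}MB")
```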

Real time database replication

The volume of data replication between servers depends on the workload generated by each server. Some rough rules of thumb are:

With Bravura Pass, every user login session, either to change passwords or enroll security questions, will generate about 29 replicating procedures.
With Bravura Identity, every workflow request (input, approvals, fulfillment) will generate about 200 replicating procedures.
With Bravura Privilege, every scheduled password randomization will generate about 20 replicating procedures.
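These rules of thumb can be combined into a rough count of replicating procedures for a given workload. The sketch below uses only the per-event figures listed above; the dictionary keys, the function name, and the daily workload numbers are hypothetical and should be replaced with values measured on your own instance.

```python
# Approximate replicating procedures per event, from the rules of thumb above.
PROCEDURES_PER_EVENT = {
    "pass_login_session": 29,
    "identity_workflow_request": 200,
    "privilege_password_randomization": 20,
}


def estimate_replicating_procedures(event_counts: dict) -> int:
    """Total replicating procedures for a workload, keyed by event type."""
    return sum(PROCEDURES_PER_EVENT[event] * count for event, count in event_counts.items())


# Hypothetical daily workload for one application node.
daily_workload = {
    "pass_login_session": 1000,
    "identity_workflow_request": 50,
    "privilege_password_randomization": 200,
}
print(estimate_replicating_procedures(daily_workload))
```

The result is a count of procedure calls, not a data volume; converting it to bandwidth would need an average size per call, which is not given here.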
