Conflict detection

Randomizations with uncertain outcomes

Replication works extremely well in most cases without any additional complexity. When it comes to password randomization, however, merely maintaining synchronization is not good enough. Not only must all nodes record the same password for an account, but that password must also be the correct password for the account on its target system. Because replication is delayed, if two nodes perform a randomization simultaneously, the value recorded for the password on each node is unpredictable: incoming replication messages from a replica overwrite the currently stored value. Depending on the account, the impact of a wrong password in a node’s database can range from completely inconsequential to a catastrophic outage. Bravura Privilege is now capable of detecting cases where the passwords stored in its database may be incorrect or out of synchronization with its replicas and has a conflict resolution module to correct them.

Consider a replicated environment with only two nodes: A and B. When A randomizes an account’s password, it immediately writes to its database the password it intends to set and replicates that same value to B. B follows exactly the same process when it performs a randomization. If B begins a randomization at nearly the same time as A, the password that ends up stored in each database depends on how the constituent operations of B’s randomization interleave with those of A’s randomization. In the following three scenarios, involving two nodes randomizing the same account, different replication timing leads to different, sometimes incorrect, results.

Case 1: Serialized randomization

A writes provisional password #1 into its database
A launches its connector
A’s connector completes successfully
A records the completion of randomization #1
B receives provisional password #1 from A
B receives the completion of randomization #1 from A
B writes provisional password #2 into its database
B launches its connector
B’s connector completes successfully
B records the completion of randomization #2
A receives provisional password #2 from B
A receives the completion of randomization #2 from B

First A records password #1, then B records password #1, then B records password #2, then A records password #2. Ignoring any replication delay, both nodes correctly agree (that is, they have stored the same password in their databases) at all times as to the current password. Ideally, all randomizations follow this flow: each node is allowed to complete and fully replicate each randomization before any other node begins a new one.

Case 2: Perfectly parallel randomization

Suppose now that node B initiates its randomization earlier:

A writes provisional password #1 into its database
B writes provisional password #2 into its database
A launches its connector
B launches its connector
A’s connector completes successfully
B’s connector completes successfully
A records the completion of randomization #1
B records the completion of randomization #2
A receives provisional password #2 from B
A receives the completion of randomization #2 from B
B receives provisional password #1 from A
B receives the completion of randomization #1 from A

In this case, password #2 is the last password that was stored at A while password #1 is the last password that was stored at B. The nodes disagree and the instance is desynchronized. Because these are two completely independent nodes, each corresponding pair of steps may occur at exactly the same time on each node. By the time A learns that B is in the process of randomizing, its randomization is already complete, and vice versa.

Case 3: Interleaved parallel randomization

A writes provisional password #1 into its database
B writes provisional password #2 into its database
A launches its connector
B launches its connector
A’s connector completes successfully
B’s connector completes successfully
B records the completion of randomization #2
A receives provisional password #2 from B
A receives the completion of randomization #2 from B
A records the completion of randomization #1
B receives provisional password #1 from A
B receives the completion of randomization #1 from A

In this variant, A and B at first disagree about the provisional password. Then, because A is a little slower than B to record its completion, both nodes first store B’s password #2 and are in agreement, and then both store A’s password #1. However, A’s connector completed before B’s connector, so B’s password #2 was the last one set on the target system. Both nodes therefore agree on password #1 while the current password on the target system is password #2.
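The three cases can be replayed with a small simulation. This is only a sketch under simplifying assumptions: the event names (`write`, `connector_ok`, `record`, `receive`) and the `replay` function are hypothetical, each pair of "receives provisional password" and "receives the completion" steps is collapsed into one `receive` event, and the last value stored on a node always wins.

```python
def replay(events):
    """Apply events in order; return (stored at A, stored at B, password on target)."""
    stored = {"A": None, "B": None}
    target = None
    for kind, node, password in events:
        if kind == "connector_ok":
            target = password          # the connector set this password on the target
        else:                          # "write", "record", or "receive"
            stored[node] = password    # the most recent value overwrites the stored one
    return stored["A"], stored["B"], target

CASE_1 = [  # serialized
    ("write", "A", "pw1"), ("connector_ok", "A", "pw1"), ("record", "A", "pw1"),
    ("receive", "B", "pw1"),
    ("write", "B", "pw2"), ("connector_ok", "B", "pw2"), ("record", "B", "pw2"),
    ("receive", "A", "pw2"),
]
CASE_2 = [  # perfectly parallel
    ("write", "A", "pw1"), ("write", "B", "pw2"),
    ("connector_ok", "A", "pw1"), ("connector_ok", "B", "pw2"),
    ("record", "A", "pw1"), ("record", "B", "pw2"),
    ("receive", "A", "pw2"), ("receive", "B", "pw1"),
]
CASE_3 = [  # interleaved parallel
    ("write", "A", "pw1"), ("write", "B", "pw2"),
    ("connector_ok", "A", "pw1"), ("connector_ok", "B", "pw2"),
    ("record", "B", "pw2"), ("receive", "A", "pw2"),
    ("record", "A", "pw1"), ("receive", "B", "pw1"),
]

for name, case in (("Case 1", CASE_1), ("Case 2", CASE_2), ("Case 3", CASE_3)):
    print(name, replay(case))
# Case 1 ('pw2', 'pw2', 'pw2')   nodes agree and match the target
# Case 2 ('pw2', 'pw1', 'pw2')   nodes disagree
# Case 3 ('pw1', 'pw1', 'pw2')   nodes agree, but on the wrong password
```

Case 3 is the most dangerous outcome in this model because a simple comparison between nodes would not reveal the problem.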

Case 4: Incomplete randomization

This is another case that is handled by Bravura Security Fabric's password conflict detection technology, though it is unrelated to replication. If a node launches a connector to perform a randomization and the connector stops responding or times out, there is no way to know whether the connector was actually able to set the password on the target. It may be that the connector stopped responding immediately after startup and wasn't even able to connect to the target, or it may be that the target accepted the password randomization and network latency caused its acknowledgment to the connector to be dropped, leading to a timeout.
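The following sketch illustrates why a timeout is ambiguous. Everything in it is hypothetical: `FakeTarget`, the two connector functions, and `run_connector` are illustrative stand-ins, not part of the product. Both connectors time out from the caller's point of view, yet they leave the target in different states.

```python
class FakeTarget:
    """Stand-in for a target system holding one account's password."""
    def __init__(self):
        self.password = "old"

def hangs_before_connecting(target, new_password):
    # The connector never reaches the target, so the password is unchanged.
    raise TimeoutError("connector never reached the target")

def ack_dropped(target, new_password):
    # The target accepts the change, but the acknowledgment is lost.
    target.password = new_password
    raise TimeoutError("target accepted, acknowledgment lost")

def run_connector(connector, target, new_password):
    """Classify the outcome as seen by the node that launched the connector."""
    try:
        accepted = connector(target, new_password)
    except TimeoutError:
        return "uncertain"   # the node cannot tell which of the two cases occurred
    return "confirmed" if accepted else "failed"

t1, t2 = FakeTarget(), FakeTarget()
print(run_connector(hangs_before_connecting, t1, "new"), t1.password)
print(run_connector(ack_dropped, t2, "new"), t2.password)
```

In both runs the node sees only "uncertain", which is why such a randomization has to be treated as a conflict rather than as a success or a failure.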

Ancestry trees

An ancestry tree is used to detect and correct simultaneous randomizations. In an ancestry tree, each randomization is linked to the last successful randomization that came before it (this randomization is called its parent), according to the node that issued it. This linkage creates a tree of randomizations. The root of the tree is the earliest randomization, and the tip is the latest randomization.

In the ideal case of a single node with no conflicts, each randomization has at most one child and at most one parent, and the tree is entirely vertical, as in Figure 1, “No conflicts”. With multiple nodes, each node maintains its own copy of the randomization tree, as in Figure 2, “Copy”. When a node receives a replication message informing it of a randomization, it incorporates that randomization into its tree, as in Figure 3, “Incorporation”. A replication message that creates a new randomization includes information about its parent and the node that initiated the randomization.
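A minimal sketch of such a tree follows. The class and field names (`Randomization`, `AncestryTree`, `key`, `parent`, `node`) are assumptions made for illustration, as is the choice to ignore a message whose key is already present; the product's actual storage is described in the schema section.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Randomization:
    key: str               # unique identifier of this randomization (hypothetical name)
    parent: Optional[str]  # key of the last successful randomization; None for the root
    node: str              # node that initiated the randomization

class AncestryTree:
    """One node's local copy of an account's randomization tree."""
    def __init__(self):
        self.entries = {}    # key -> Randomization
        self.children = {}   # parent key -> list of child keys

    def incorporate(self, r):
        """Add a local or replicated randomization to the tree."""
        if r.key in self.entries:
            return  # assumption: a repeated message changes nothing
        self.entries[r.key] = r
        if r.parent is not None:
            self.children.setdefault(r.parent, []).append(r.key)

    def tip(self, root):
        """Follow single-child links from root; stops at a leaf or at a fork."""
        current = root
        while len(self.children.get(current, [])) == 1:
            current = self.children[current][0]
        return current

tree_a = AncestryTree()
tree_a.incorporate(Randomization("r1", None, "A"))
tree_a.incorporate(Randomization("r2", "r1", "B"))  # replicated from node B
print(tree_a.tip("r1"))
```

Because each message carries its parent, a node can place a replicated randomization correctly regardless of when the message arrives.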

Figure 1. No conflicts

A simple randomization tree with no conflicts. Randomization 1 is the root and Randomization 3 is the tip

Figure 2. Copy

Node B performs a randomization and Node A has not yet processed its replication message. The replication message for Randomization 2 records that it was performed by Node B (in green), and that its parent is Randomization 1

Figure 3. Incorporation

Node A incorporates Node B’s randomization into its tree

Simultaneous randomizations are defined as two randomizations with the same parent. Such randomizations trigger a conflict called a tree conflict. In Figure 4, “Simultaneous randomization”, both nodes have performed different randomizations with the same parent. When their replication messages to each other are processed, they create a complex tree, shown in Figure 5, “Conflict detection”. Each node checks its randomization tree with each addition and searches for such conflicts.
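The definition above translates into a short check: group randomizations by parent and report any parent with more than one child. This sketch applies the definition literally to `(key, parent)` pairs; the function name and input shape are hypothetical, and it does not consider password status, which the text has not yet introduced.

```python
from collections import defaultdict

def find_tree_conflicts(randomizations):
    """Return {parent: [children]} for every parent with more than one child.

    randomizations -- iterable of (key, parent_key) pairs; parent_key is None for the root.
    """
    children = defaultdict(list)
    for key, parent in randomizations:
        if parent is not None:
            children[parent].append(key)
    return {parent: kids for parent, kids in children.items() if len(kids) > 1}

# Nodes A and B both randomize on top of r1 before hearing from each other.
tree = [("r1", None), ("r2a", "r1"), ("r2b", "r1")]
print(find_tree_conflicts(tree))
# {'r1': ['r2a', 'r2b']}
```

Running the check after every addition, as the text describes, means a conflict is found as soon as the second sibling is incorporated.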

Figure 4. Simultaneous randomization

Nodes A (in red) and B (in green) randomize the same account simultaneously and replicate to each other

Figure 5. Conflict detection

Nodes A (in red) and B (in green) randomize the same account simultaneously and replicate to each other

Schema - tables used for randomization

There are four primary tables used for randomization:

wstnpwdhis: This table contains all passwords that are known to have been successfully set on an account. Each password is identified by its sigkey, a globally unique string, and a reference to the account the password belongs to.
wstnpwdcur: This table has one row for each account. It tracks account-level metadata like the first time the account’s password was randomized, and has a reference to the sigkey of the current password for each account (located in wstnpwdhis).
wstnpwd_working: This table contains each account’s randomization tree.
wstnpwd_working_his: This table holds all passwords that are known to have been unsuccessfully set on an account. When a randomization fails, its password is moved to this table. If a password with a status of U (see below) is rejected during password conflict resolution, it is also moved to this table.

Every password that has ever been generated is retained in one of these tables. Passwords are discarded only by the rmidarchivehis program or a similar administrator-controlled process.

All passwords have a status, regardless of table location (though some tables imply some subsets of these statuses). The important statuses for conflict resolution are:

P: Pending. The password has been generated and a connector launch is imminent or in progress, but definitely not complete. Passwords in this status will be periodically timed out by a poll loop in the idarch service. Timed out pending passwords have their status set to U.
C: Confirmed. The connector attempted to set this password and was met with a successful acknowledgment from the target system. All passwords in wstnpwdhis implicitly have this status.
U: Uncertain. The connector was launched but stopped responding or timed out. The password may or may not have been successfully set on the target. The presence of a password with this status always causes accounts to be marked conflicted, even if the randomization tree is otherwise conflict-free. These types of conflicts are called incomplete randomization conflicts.
F: Failed. The connector attempted to set this password and was rejected by the target system.
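The statuses and the pending timeout can be sketched as follows. The status letters come from the list above; the record layout (`status`, `started_at`), the function names, and the timeout value are assumptions for illustration only, not the idarch service's actual logic.

```python
from enum import Enum

class Status(Enum):
    PENDING = "P"
    CONFIRMED = "C"
    UNCERTAIN = "U"
    FAILED = "F"

def expire_pending(passwords, now, timeout_seconds):
    """Mark pending passwords older than the timeout as uncertain; return how many changed."""
    expired = 0
    for pw in passwords:
        if pw["status"] is Status.PENDING and now - pw["started_at"] > timeout_seconds:
            pw["status"] = Status.UNCERTAIN
            expired += 1
    return expired

def has_incomplete_randomization(passwords):
    """Any uncertain password marks the account conflicted, whatever the tree looks like."""
    return any(pw["status"] is Status.UNCERTAIN for pw in passwords)

working = [
    {"status": Status.PENDING, "started_at": 0},     # launched long ago, never completed
    {"status": Status.PENDING, "started_at": 950},   # still within the timeout
]
print(expire_pending(working, now=1000, timeout_seconds=300))
print(has_incomplete_randomization(working))
```

Timing out a pending password into U rather than F reflects the same uncertainty as a connector timeout: the password may already be in effect on the target.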

Each node’s randomization tree is rooted at the sigkey specified in wstnpwdcur, which marks the current password for the account. This is the password that will be disclosed when checked out.

Randomization proceeds through a four-stage process:

A random password is generated and placed in wstnpwd_working with its parent set to the current password and its status set to P. The generated password is provided to the connector, which attempts to set it on the target system.
The password’s status is changed depending on the outcome of the connector execution. If the connector timed out or stopped responding, its status is updated to U. If the connector reports that it successfully changed the password, its status is updated to C. If the connector reports that it failed to change the password, its status is updated to F.
The resulting randomization tree is examined for conflicts. If there are conflicts, the account is marked conflicted. If not, the tree is walked from root to tip. Every confirmed password along the path is moved to wstnpwdhis, and the sigkey of the password at the tip is stored in wstnpwdcur, becoming the new root. Failed passwords encountered are moved to wstnpwd_working_his.
Conflict resolution is initiated for accounts in conflict. See Conflict resolution for more details of this process.
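The third stage, examining the tree and advancing the root, can be sketched as below. This is an illustration only: `settle_tree` and its data layout are hypothetical, and the sketch assumes that failed randomizations hang off the path as leaves and do not count as conflicting siblings, which the text does not state explicitly. Pending entries simply stop the walk.

```python
from collections import defaultdict

def settle_tree(root, working):
    """Check a working tree for conflicts and, if clean, walk it from root to tip.

    root    -- key of the current password (stand-in for the wstnpwdcur reference)
    working -- dict: key -> (parent_key, status letter)
    Returns (conflicted, new_root, confirmed_keys, failed_keys).
    """
    children = defaultdict(list)
    for key, (parent, _status) in working.items():
        children[parent].append(key)

    # Any uncertain password is an incomplete randomization conflict.
    uncertain = any(status == "U" for _parent, status in working.values())
    # Assumption: only non-failed siblings compete, so only they form a tree conflict.
    tree_conflict = any(
        sum(1 for k in kids if working[k][1] != "F") > 1
        for kids in children.values()
    )
    if uncertain or tree_conflict:
        return True, root, [], []

    confirmed, failed = [], []
    current = root
    while True:
        nxt = None
        for kid in children.get(current, []):
            status = working[kid][1]
            if status == "F":
                failed.append(kid)      # would move to wstnpwd_working_his
            elif status == "C":
                nxt = kid
        if nxt is None:
            break
        confirmed.append(nxt)           # would move to wstnpwdhis
        current = nxt
    return False, current, confirmed, failed

working = {"r2": ("r1", "C"), "rf": ("r2", "F"), "r3": ("r2", "C")}
print(settle_tree("r1", working))
# (False, 'r3', ['r2', 'r3'], ['rf'])
```

When a conflict is found, the sketch leaves the root untouched, matching the description that a conflicted account is only marked and then handed to conflict resolution.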

Implementation of the fourth stage differs depending on the node. On the node where randomization was initiated, the idarch service is signalled to begin resolution immediately. On other nodes it is not possible to signal any services because the tree validation takes place within a stored procedure in the context of the database service processing the incoming replication queue. Instead, the idarch service periodically polls the database to find accounts that have been marked conflicted by incoming replication messages. Note that conflicts are detected immediately when replicated, and the account marked appropriately; database polling is only required to initiate conflict resolution.
