Link between single Bravura Security Fabric server and its database goes offline

Users may see warning messages stating that the module is not licensed when they attempt to start a new login session or when their existing session is terminated. These errors are generated because the database service is unable to authenticate, log, or confirm requests. The most likely errors that users will see are:

invalid session key! please re-log in

or

This module is not enabled for use! Please call your help desk.

The logs may include messages such as the following:

iddb.exe [948,2296] Error: Failed to initialize the SQL Server OLE DB provider, ensure
it is installed [0x80004005]
iddb.exe [948,2296] Error: Got error [0x80004005], [2], [0x0], [0x4005]
iddb.exe [948,2296] Error: Provider error [HRESULT: 0X80004005
SQLSTATE: HYT00
Native Error: 0
Source: Microsoft SQL Native Client
Error message: Login timeout expired
HRESULT: 0X80004005
SQLSTATE: 08001
Native Error: 10061
Source: Microsoft SQL Native Client
Error message: An error has occurred while establishing a connection to the server.
When connecting to SQL Server 2005, this failure may be caused by the fact that under
the default settings SQL Server does not allow remote connections.
HRESULT: 0X80004005
SQLSTATE: 08001
Native Error: 10061
Error state: 1
Severity: 16
Source: Microsoft SQL Native Client
Error message: TCP Provider: No connection could be made because the target machine
actively

Any monitoring system that tracks the health of the database should also raise an alarm at this point.
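The log messages above can be scanned automatically to confirm that a failure is a database connectivity problem rather than a licensing problem. The following is a minimal sketch, not a Bravura Security Fabric tool: the function names are hypothetical, and the sketch assumes only the "SQLSTATE:" and "Native Error:" line layout shown in the sample log. The code meanings are the standard ODBC and Windows Sockets ones: HYT00 is a timeout, 08001 means the client could not establish a connection, and 10061 is a refused TCP connection.

```python
import re

# Standard meanings of the codes that appear in the sample log.
SQLSTATE_MEANINGS = {
    "HYT00": "timeout expired",
    "08001": "client unable to establish connection",
}
NATIVE_ERROR_MEANINGS = {
    "10061": "connection refused by the target machine (WSAECONNREFUSED)",
}

SQLSTATE_RE = re.compile(r"SQLSTATE:\s*(\w+)")
NATIVE_RE = re.compile(r"Native Error:\s*(\d+)")


def summarize_db_errors(log_text):
    """Return sorted, de-duplicated (sqlstate, native_error) pairs.

    Hypothetical helper: pairs each 'SQLSTATE:' line with the next
    'Native Error:' line, as in the sample log layout.
    """
    pairs = set()
    current_state = None
    for line in log_text.splitlines():
        match = SQLSTATE_RE.search(line)
        if match:
            current_state = match.group(1)
            continue
        match = NATIVE_RE.search(line)
        if match and current_state is not None:
            pairs.add((current_state, match.group(1)))
            current_state = None
    return sorted(pairs)


def is_connection_failure(pairs):
    """True if any pair carries a SQLSTATE listed as a connection problem."""
    return any(state in SQLSTATE_MEANINGS for state, _native in pairs)


def describe(pairs):
    """Render each pair as a human-readable line."""
    lines = []
    for state, native in pairs:
        state_text = SQLSTATE_MEANINGS.get(state, "unrecognized SQLSTATE")
        native_text = NATIVE_ERROR_MEANINGS.get(native, "no known meaning in this sketch")
        lines.append(f"SQLSTATE {state} ({state_text}); native error {native} ({native_text})")
    return lines
```

Passing the text of a log file to summarize_db_errors and then to describe gives a short summary that can be attached to a help desk ticket or fed to a monitoring check.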

What stops working

  • Users can no longer log into this server.

  • Users can no longer retrieve passwords from this server.

  • This server can no longer push password updates to target systems for which it is responsible.

  • Other servers detect that replication to this server is impossible, so they start queuing updates for this server and displaying alarm messages indicating that, when the queue fills, they will stop functioning normally.

  • If the queue is allowed to fill, which could take several hours to several days depending on activity level and queue size, other servers will suspend services; users will be unable to log in (since logins are logged in a replicated fashion) and will be unable to check out passwords.

    Effectively, the entire system goes into an alert state until the dead server is repaired or removed from replication. If a repair is not made before the replication queues on the other servers fill, the entire system eventually switches to a DB COMMIT SUSPEND state.

What continues to work

Other servers continue to function normally, unless their replication queues reach their limit.

If the queue fills on other servers, they switch to DB COMMIT SUSPEND mode. In that case, the only possible action is to remove the non-functional server from replication.

Possible Causes

A problem occurs on the network connecting a single Bravura Security Fabric server to its database server; this presumes that the two are not on shared hardware.

This may be caused by DNS problems; router, switch, or cabling problems; a failed NIC; or something else.

Data loss

Either no data loss or, due to an unavoidable race condition, minimal data loss if updates on target systems had not yet been committed to the database when the damaged server went offline.

Resolution

Network links and DNS problems should be diagnosed and repaired quickly. See Time available to fix problems.

If the server/database link cannot be fixed quickly, the affected Bravura Security Fabric server should be removed promptly from the replication configuration on the other Bravura Security Fabric servers. Instructions for this are in Removing a node from replication.

At a later date, the server should be returned to the replicating set using the instructions in Synchronizing a new node with an existing set of Bravura Security Fabric replicas.
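As a first diagnostic step for the network and DNS problems described above, the two most common failure points (name resolution and TCP reachability of the database listener) can be checked from the affected application server. The following is a minimal sketch under stated assumptions: check_db_link is a hypothetical helper, not part of the product, and port 1433 is only the SQL Server default, so the actual port of the deployment may differ. It distinguishes a DNS failure from a refused or timed-out connection, which corresponds to the native error 10061 seen in the logs.

```python
import socket


def check_db_link(host, port=1433, timeout=5.0):
    """Return (ok, detail) for a DNS lookup followed by a TCP connect.

    Hypothetical helper. 1433 is the SQL Server default port and is an
    assumption here; pass the port your database listener actually uses.
    """
    # Step 1: name resolution. A failure here points at DNS.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed for {host}: {exc}"

    # Step 2: TCP connect to each resolved address until one succeeds.
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return True, f"TCP connection to {sockaddr[0]}:{port} succeeded"
        except OSError as exc:
            # Refused, timed out, unreachable network, and so on.
            last_error = exc
    return False, f"TCP connection to {host}:{port} failed: {last_error}"
```

For example, calling check_db_link with the database server's host name returns a (False, "DNS lookup failed ...") pair when the name cannot be resolved, and a (False, "TCP connection ... failed ...") pair when the name resolves but the listener cannot be reached. A successful TCP connection only shows that the network path is open; it does not confirm that the database accepts logins.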