Deadlock between database and AppendOnlyPersistentMap

Description

We observed nodes locking up on Azure and OVH. The output in this ticket was captured from a run on OVH using Rick's cache-locking branch (https://github.com/corda/enterprise/tree/parkri-cache-locking).

A thread on the receiving node holds a lock in the PersistentMap while waiting to read from the database (see the stack trace below).

A different thread likely holds a lock in the database while waiting for the lock on the AppendOnlyMap, which closes the cycle: each thread waits for a lock the other one holds.
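The cycle described above is a lock-order inversion. The minimal sketch below models it with two plain `java.util.concurrent.locks.ReentrantLock` instances: `mapLock` stands in for the lock inside the map, and `dbLock` stands in for the lock the database holds. These names and the class `LockOrderInversion` are hypothetical; this is not Corda code and does not reproduce the node's actual call paths. Each thread holds its first lock until both have tried for their second, so the inversion is reproduced deterministically, and `tryLock` with a timeout is used only so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversion {
    // Hypothetical stand-ins: mapLock models the lock inside the persistent map,
    // dbLock models a lock held by the database on the data being read.
    static final ReentrantLock mapLock = new ReentrantLock();
    static final ReentrantLock dbLock = new ReentrantLock();

    /** Runs two threads that take the locks in opposite order; returns {receiverGotDb, otherGotMap}. */
    static boolean[] run(long timeoutMs) {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // "Receiver" thread: map lock first, then database (as in the observed stack trace).
        Thread receiver = new Thread(() ->
                attempt(mapLock, dbLock, 0, acquired, bothHoldFirst, bothTried, timeoutMs));
        // "Other" thread: database lock first, then map lock (the suspected counterpart).
        Thread other = new Thread(() ->
                attempt(dbLock, mapLock, 1, acquired, bothHoldFirst, bothTried, timeoutMs));
        receiver.start();
        other.start();
        try {
            receiver.join();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired;
    }

    static void attempt(ReentrantLock first, ReentrantLock second, int slot, boolean[] acquired,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried, long timeoutMs) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A real deadlock blocks forever here; the timeout lets the demo finish.
            boolean got = second.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            acquired[slot] = got;
            // Keep holding the first lock until both threads have made their attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] r = run(200);
        System.out.println("receiver (map -> db) acquired db lock: " + r[0]);
        System.out.println("other    (db -> map) acquired map lock: " + r[1]);
    }
}
```

Running `main` prints `false` for both attempts. Without the timeout, both threads would block indefinitely, which matches the lock-up seen on the nodes. In general, acquiring the two locks in one consistent order, or not holding the map lock across the database read, removes this kind of cycle.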

Full stack traces of the sending and receiving nodes are attached (node1 and node2).

On the SQL Server side, sp_who2 shows suspended SELECT statements.

Status

Assignee

Christian Sailer

Reporter

Thomas Schroeter

Labels

Priority

High

Fix versions

Ported to...

None

Feature Team

Performance and Platform Sustainability

CVSS Vector

None

Severity

Critical

Affects versions