Deadlock between database and AppendOnlyPersistentMap

Description

We observed nodes locking up on Azure and OVH. This is output captured from a run on OVH with Rick's cache locking branch (https://github.com/corda/enterprise/tree/parkri-cache-locking).

A thread on the receiving node holds a lock in the AppendOnlyPersistentMap while waiting to read from the DB (see the stack trace below).

A different thread is likely holding a lock in the DB while waiting for the lock on the AppendOnlyPersistentMap.
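
The suspected lock-order inversion can be reproduced in isolation. The following is only a minimal sketch, not the node's code: `mapLock` and `dbLock` are hypothetical stand-ins for the map's internal lock and a database lock (in the real node the second lock lives in the database, not in the JVM), and `tryLock` with a timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderDemo {
    // Hypothetical stand-ins: mapLock for the map's lock, dbLock for a database lock.
    private static final ReentrantLock mapLock = new ReentrantLock();
    private static final ReentrantLock dbLock = new ReentrantLock();

    /** Returns {thread A got its second lock, thread B got its second lock}. */
    static boolean[] run(long timeoutMs) throws InterruptedException {
        boolean[] acquired = new boolean[2];
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);

        // Thread A takes the map lock first, then wants the DB lock.
        Thread a = new Thread(() ->
                acquired[0] = holdThenTry(mapLock, dbLock, timeoutMs, bothHoldFirst, bothTried));
        // Thread B takes the DB lock first, then wants the map lock.
        Thread b = new Thread(() ->
                acquired[1] = holdThenTry(dbLock, mapLock, timeoutMs, bothHoldFirst, bothTried));

        a.start();
        b.start();
        a.join();
        b.join();
        return acquired;
    }

    private static boolean holdThenTry(ReentrantLock first, ReentrantLock second, long timeoutMs,
                                       CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();

            // With a plain lock() call this is where both threads would block forever.
            boolean got = second.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }

            // Keep holding the first lock until the other thread has also tried.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run(200);
        System.out.println("A acquired second lock: " + result[0]);
        System.out.println("B acquired second lock: " + result[1]);
    }
}
```

Neither thread obtains its second lock, because each one holds the lock the other needs. If both threads acquired the locks in the same order, the cycle could not form. A JVM thread dump would not report this as a deadlock in the real case, since one side of the cycle is held inside the database.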

The full stack-traces of the sending and receiving nodes are attached (node1 and node2).

sp_who2 shows suspended SELECT statements.

Assignee

Christian Sailer

Reporter

Thomas Schroeter

Epic Link

None

Priority

High

Engineering Teams

None

Fix versions

Affects versions

Ported to...

None

Sprint

None

Labels

Story Points / Dev Days

None

Feature Team

Performance and Platform Sustainability

Severity

Critical