We're updating the issue view to help you get more done. 

Deadlock between database and AppendOnlyPersistentMap

Description

We observed nodes locking up on Azure and OVH. This is output captured from a run on OVH with Rick's cache locking branch (https://github.com/corda/enterprise/tree/parkri-cache-locking).

A thread on the receiving node holds a lock in the PersistentMap, while waiting to read from the DB, see stack-trace below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 "flow-worker" #1658 daemon prio=5 os_prio=0 tid=0x00007fa9e40a9800 nid=0x811d runnable [0x00007fa66a0be000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:999) at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:989) at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at sun.security.ssl.InputRecord.read(InputRecord.java:503) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975) - locked <0x00007fac6d4b7f38> (a java.lang.Object) at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933) at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) - locked <0x00007fac6d4b9b70> (a sun.security.ssl.AppInputStream) at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1971) at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6362) - locked <0x00007fb3e20001a8> (a com.microsoft.sqlserver.jdbc.TDSReader) at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7620) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:576) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:508) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7233) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2869) - locked <0x00007fac6d4c2bf0> (a java.lang.Object) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:243) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:218) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:461) at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java) at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.executeUpdate(ResultSetReturnImpl.java:175) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3171) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3686) at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:90) at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:604) at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:478) at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:356) at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39) at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1454) at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1440) at net.corda.node.utilities.AppendOnlyPersistentMapBase.loadValue(AppendOnlyPersistentMap.kt:144) at net.corda.node.utilities.AppendOnlyPersistentMapBase.access$loadValue(AppendOnlyPersistentMap.kt:22) at net.corda.node.utilities.AppendOnlyPersistentMapBase$transactionalLoadValue$3.invoke(AppendOnlyPersistentMap.kt:159) at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional$Unknown$valueWithoutIsolationDelegate$1.invoke(AppendOnlyPersistentMap.kt:279) at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74) - locked <0x00007fb3e70008e0> (a kotlin.SynchronizedLazyImpl) at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional$Unknown.isPresent(AppendOnlyPersistentMap.kt:277) at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional.orElse(AppendOnlyPersistentMap.kt:250) at net.corda.node.utilities.AppendOnlyPersistentMapBase.get(AppendOnlyPersistentMap.kt:40) at net.corda.node.services.identity.PersistentIdentityService$certificateFromKey$1.invoke(PersistentIdentityService.kt:156) at net.corda.node.services.identity.PersistentIdentityService$certificateFromKey$1.invoke(PersistentIdentityService.kt:33) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205) at net.corda.node.services.identity.PersistentIdentityService.certificateFromKey(PersistentIdentityService.kt:156) at net.corda.node.services.identity.PersistentIdentityService.stripNotOurKeys(PersistentIdentityService.kt:194) at net.corda.node.services.keys.BasicHSMKeyManagementService$filterMyKeys$1.invoke(BasicHSMKeyManagementService.kt:106) at net.corda.node.services.keys.BasicHSMKeyManagementService$filterMyKeys$1.invoke(BasicHSMKeyManagementService.kt:32) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205) at net.corda.node.services.keys.BasicHSMKeyManagementService.filterMyKeys(BasicHSMKeyManagementService.kt:105) at net.corda.node.services.vault.NodeVaultService$makeUpdates$1.invoke(NodeVaultService.kt:230) at net.corda.node.services.vault.NodeVaultService.makeUpdates(NodeVaultService.kt:285) at net.corda.node.services.vault.NodeVaultService.access$makeUpdates(NodeVaultService.kt:59) at net.corda.node.services.vault.NodeVaultService$notifyAll$2.invoke(NodeVaultService.kt:210) at net.corda.node.services.vault.NodeVaultService.notifyAll(NodeVaultService.kt:221) at net.corda.node.services.api.ServiceHubInternal$Companion$recordTransactions$1.invoke(ServiceHubInternal.kt:106) at net.corda.node.services.api.ServiceHubInternal$Companion$recordTransactions$1.invoke(ServiceHubInternal.kt:51) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199) at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205) at net.corda.node.services.api.ServiceHubInternal$Companion.recordTransactions(ServiceHubInternal.kt:60) at net.corda.node.services.api.ServiceHubInternal$DefaultImpls.recordTransactions(ServiceHubInternal.kt:132) at net.corda.node.internal.AbstractNode$ServiceHubInternalImpl.recordTransactions(AbstractNode.kt:1005) at net.corda.core.internal.ResolveTransactionsFlow.call(ResolveTransactionsFlow.kt:102) at net.corda.core.internal.ResolveTransactionsFlow.call(ResolveTransactionsFlow.kt:25) at net.corda.node.services.statemachine.FlowStateMachineImpl.subFlow(FlowStateMachineImpl.kt:294) at net.corda.core.flows.FlowLogic.subFlow(FlowLogic.kt:311) at net.corda.core.flows.ReceiveTransactionFlow.call(ReceiveTransactionFlow.kt:46) at net.corda.core.flows.ReceiveTransactionFlow.call(ReceiveTransactionFlow.kt:27) at net.corda.node.services.statemachine.FlowStateMachineImpl.subFlow(FlowStateMachineImpl.kt:294) at net.corda.core.flows.FlowLogic.subFlow(FlowLogic.kt:311) at net.corda.core.flows.ReceiveFinalityFlow.call(FinalityFlow.kt:260)

A different thread is likely holding a lock in the DB, while waiting for the lock on the AppendOnlyMap.

The full stack-traces of the sending and receiving nodes are attached (node1 and node2).

sp_who2 shows suspended selects.

Status

Assignee

Christian Sailer

Reporter

Thomas Schroeter

Priority

High

Labels

Severity

Critical

Fix versions

Ported to...

None

Feature Team

Performance and Platform Sustainability

Affects versions

Corda Enterprise 4